Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputFormat tests for JDK 9+ #38504

Closed
wants to merge 4 commits into from

Conversation

sadikovi
Copy link
Contributor

@sadikovi sadikovi commented Nov 3, 2022

What changes were proposed in this pull request?

This PR is a follow-up for #38277.

This change is required due to test failures in JDK 11 and JDK 17. The patch disables the unit test for JDK 9+.
This is a temporary measure while I am debugging and working on the fix for higher versions of JDK.

Why are the changes needed?

Fixes the test failure in JDK 11.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A.

@sadikovi
Copy link
Contributor Author

sadikovi commented Nov 3, 2022

@HyukjinKwon @dongjoon-hyun I have ignored the test and also disabled the feature by default.
Let me know if you would like to revert the change completely instead.

@github-actions github-actions bot added the SQL label Nov 3, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what is the root cause of failure?

@@ -226,7 +226,8 @@ class HiveSerDeReadWriteSuite extends QueryTest with SQLTestUtils with TestHiveS
}
}

test("SPARK-40815: DelegateSymlinkTextInputFormat serialization") {
// Ignored due to JDK 11 failures reported in https://github.com/apache/spark/pull/38277.
ignore("SPARK-40815: DelegateSymlinkTextInputFormat serialization") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, the following will help us keep our test coverage in Java 8 CI.

SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. So I can keep the test enabled but it would be ignored with assume in JDK 11 and JDK 17?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes~

@sadikovi
Copy link
Contributor Author

sadikovi commented Nov 3, 2022

No, I don't know yet.
I have opened a PR to unblock the CI and I am planning to debug the failure later today.

@sadikovi sadikovi changed the title [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputSplit feature and ignore the test [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputFormat tests for JDK 9+ Nov 3, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I manually validated that the test cases are ignored correctly.

[info] HiveSerDeReadWriteSuite:
[info] - SPARK-40815: DelegateSymlinkTextInputFormat serialization !!! CANCELED !!! (15 milliseconds)
[info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) was true (HiveSerDeReadWriteSuite.scala:232)
...

I'm also digging this issue.

@HyukjinKwon
Copy link
Member

Merged to master.

cloud-fan pushed a commit that referenced this pull request Nov 10, 2022
…tInputFormat to avoid Hive ExecMapper.getDone() check

### What changes were proposed in this pull request?

Follow-up for #38504.

This PR fixes an issue where a unit test for SymlinkTextInputFormat would fail in JDK 11. The fix is to return the record reader directly instead of wrapping it with `HiveRecordReader`.

The issue could be reproduced by running a test right before SPARK-40815 tests with JDK 11 and appears to be related to `ExecMapper.getDone()` (https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/io/HiveRecordReader.java#L76). This check is for a static variable that is set when the map job is done to limit results. Note that there is no synchronisation around `getDone()` or `setDone()`, it is just a static mutable variable.

Once that is set, the record reader returns 0 records. I don't know why it works in JDK 8 but not in JDK 11 but my current suspicion is something related to class loader in JDK similar to this patch: apache/hive@a234475 (maybe this is the fix but I was not able to verify).

### Why are the changes needed?

Re-enables the test and fixes an issue with DelegateSymlinkTextInputFormat returning 0 records. Note that this is still an issue with SymlinkTextInputFormat as this needs to be addressed in Hive.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I re-enabled the unit tests that used to fail in JDK 11 and extended a test to cover LIMIT.

Closes #38544 from sadikovi/fix-symlink-test-2.

Authored-by: Ivan Sadikov <ivan.sadikov@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…tests for JDK 9+

### What changes were proposed in this pull request?

This PR is a follow-up for apache#38277.

This change is required due to test failures in JDK 11 and JDK 17. The patch disables the unit test for JDK 9+.
This is a temporary measure while I am debugging and working on the fix for higher versions of JDK.

### Why are the changes needed?

Fixes the test failure in JDK 11.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A.

Closes apache#38504 from sadikovi/fix-symlink-test.

Authored-by: Ivan Sadikov <ivan.sadikov@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…tInputFormat to avoid Hive ExecMapper.getDone() check

### What changes were proposed in this pull request?

Follow-up for apache#38504.

This PR fixes an issue where a unit test for SymlinkTextInputFormat would fail in JDK 11. The fix is to return the record reader directly instead of wrapping it with `HiveRecordReader`.

The issue could be reproduced by running a test right before SPARK-40815 tests with JDK 11 and appears to be related to `ExecMapper.getDone()` (https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/io/HiveRecordReader.java#L76). This check is for a static variable that is set when the map job is done to limit results. Note that there is no synchronisation around `getDone()` or `setDone()`, it is just a static mutable variable.

Once that is set, the record reader returns 0 records. I don't know why it works in JDK 8 but not in JDK 11 but my current suspicion is something related to class loader in JDK similar to this patch: apache/hive@a234475 (maybe this is the fix but I was not able to verify).

### Why are the changes needed?

Re-enables the test and fixes an issue with DelegateSymlinkTextInputFormat returning 0 records. Note that this is still an issue with SymlinkTextInputFormat as this needs to be addressed in Hive.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I re-enabled the unit tests that used to fail in JDK 11 and extended a test to cover LIMIT.

Closes apache#38544 from sadikovi/fix-symlink-test-2.

Authored-by: Ivan Sadikov <ivan.sadikov@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants