[SPARK-50716][CORE][FOLLOWUP] Fix the scenario in JavaUtils#deleteRecursivelyUsingJavaIO where BasicFileAttributes cannot be read.
#49357
Conversation
.github/workflows/build_and_test.yml (outdated):

```yaml
uses: ./.github/workflows/maven_test.yml
with:
  java: 21
  os: macos-15
```
Test with macOS; will revert after check.
```java
private static BasicFileAttributes readFileAttributes(File file) {
  try {
    // NOFOLLOW_LINKS reads the attributes of the link itself, not its target.
    return Files.readAttributes(
        file.toPath(), BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
  } catch (IOException e) {
    // Per this PR's description: an unreadable inode yields null.
    return null;
  }
}
```
- If the inode itself does not exist, it will not be possible to read the `BasicFileAttributes`.
- If it is a corrupted symbolic link, the `BasicFileAttributes` can be read, but `file.exists()` will return false.
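A small standalone illustration of the two cases (hypothetical paths; assumes a POSIX file system that supports symbolic links):

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class AttrDemo {
  public static void main(String[] args) throws IOException {
    Path missing = Path.of("/tmp/no-such-inode");  // hypothetical path
    Files.deleteIfExists(missing);

    // Case 1: no inode at all -> readAttributes throws NoSuchFileException.
    try {
      Files.readAttributes(missing, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
    } catch (IOException e) {
      System.out.println("case 1: " + e.getClass().getSimpleName());
    }

    // Case 2: broken symlink -> the link's own attributes are readable,
    // but File#exists() follows the link and reports false.
    Path link = Path.of("/tmp/broken-link");  // hypothetical path
    Files.deleteIfExists(link);
    Files.createSymbolicLink(link, missing);
    BasicFileAttributes attrs =
        Files.readAttributes(link, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
    System.out.println("case 2: isSymbolicLink=" + attrs.isSymbolicLink()
        + ", exists=" + link.toFile().exists());
  }
}
```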
It seems that scenario 1 was not covered before. I found that the code has logic that attempts to delete a potentially existing target before writing.
For example:
spark/core/src/main/scala/org/apache/spark/SparkContext.scala
Lines 1879 to 1888 in 91f3fdd
```scala
val dest = new File(
  root,
  if (uri.getFragment != null) uri.getFragment else source.getName)
logInfo(
  log"Unpacking an archive ${MDC(LogKeys.PATH, path)}" +
    log" (${MDC(LogKeys.BYTE_SIZE, source.length)} bytes)" +
    log" from ${MDC(LogKeys.SOURCE_PATH, source.getAbsolutePath)}" +
    log" to ${MDC(LogKeys.DESTINATION_PATH, dest.getAbsolutePath)}")
Utils.deleteRecursively(dest)
Utils.unpack(source, dest)
```
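Given that delete-before-write pattern, here is a minimal sketch of how `deleteRecursivelyUsingJavaIO` can tolerate an unreadable inode. Assumptions: the method and helper names follow the excerpts above; this is illustrative, not the verbatim Spark change.

```java
// Illustrative sketch, not the verbatim Spark change: treat unreadable
// attributes as "nothing to delete" so that cleaning up a non-existent
// destination stays a silent no-op.
private static void deleteRecursivelyUsingJavaIO(
    File file, FilenameFilter filter) throws IOException {
  BasicFileAttributes fileAttributes = readFileAttributes(file);
  // null: the attributes could not be read, e.g. the inode does not exist.
  // A broken symlink has readable attributes but file.exists() is false,
  // and it must still be deleted; only skip when it is not a symlink.
  if (fileAttributes == null ||
      (!fileAttributes.isSymbolicLink() && !file.exists())) {
    return;
  }
  // ... recursive deletion continues here ...
}
```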
Let's test it on macOS first.
It's getting a bit late in my time zone, so I need to sleep for 4 hours first.
Let's check the new test results and decide on the next steps. Of course, if you think the risk is too high, we can also revert this patch first. @dongjoon-hyun Thanks ~
We can wait :)
Thank you for taking a look at this. Please take a rest first and take your time, @LuciferYang!

This seems to be a known flaky test.

It seems the PR builder failure is reasonable. It means this PR is unable to reproduce the macOS Daily CI failures. WDYT? https://github.com/apache/spark/actions/runs/12622568770

@dongjoon-hyun Sorry, I didn't understand your meaning. The test was added previously to confirm whether this PR can fix the macOS daily test.

So, it's expected that this PR doesn't reproduce the previous issue, which indicates that it has fixed that problem.

@dongjoon-hyun Or should we revert #49347 first and resubmit it together with this one?

@HyukjinKwon this one is ready now.

I manually tested this flaky test locally, and it passed.
dongjoon-hyun left a comment:
Let's try this after merging. Thank you, @LuciferYang .
Merged to master~

After merging, macOS CI is triggered here.

The test case
Need to investigate further.

Regarding the failed case mentioned above, let's synchronize on the progress of the investigation. When the failure occurs, the following code is involved:
spark/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
Lines 603 to 617 in 204c672

I'm not very familiar with this part of the code, so I need to investigate it further. Do you have any suggestions on this? @dongjoon-hyun

To @LuciferYang, you can stop investigating this. IIUC, the situation is the same as before your PR: Kafka test suites have been unstable in the macOS environment. Thank you for your precious time.

OK ~




What changes were proposed in this pull request?
This PR adds protection against IOException (IOE) scenarios when reading the `BasicFileAttributes` of a file in the `deleteRecursivelyUsingJavaIO` method: it catches the IOE and returns null, then silently handles the case where `fileAttributes` is null in the subsequent logic. A behavioral sketch follows below.
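A minimal sketch of the intended outcome, assuming the public entry point `JavaUtils.deleteRecursively(File)`; the class name and path below are hypothetical and only illustrative.

```java
import java.io.File;
import java.io.IOException;
import org.apache.spark.network.util.JavaUtils;

// Hypothetical check: after this PR, deleting a path whose inode does not
// exist should be a silent no-op instead of throwing an IOException.
public class DeleteMissingPathCheck {
  public static void main(String[] args) throws IOException {
    File missing = new File("/tmp/spark-test-" + System.nanoTime());
    JavaUtils.deleteRecursively(missing);  // should return without throwing
    System.out.println("deleted silently, exists=" + missing.exists());
  }
}
```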
Why are the changes needed?

When the inode itself does not exist, it is impossible to read its `BasicFileAttributes`; an IOException (IOE) is thrown instead, which caused the failure of the macOS daily test.

#49347 aimed to fix the cleanup of symbolic links by moving the read of `BasicFileAttributes` before the `!file.exists` check, so that broken symbolic links are also handled. However, Spark has logic that first cleans up a potentially existing destination path before overwriting it, and the target path being cleaned up may itself be a non-existent inode, for example:

spark/core/src/main/scala/org/apache/spark/SparkContext.scala
Lines 1879 to 1888 in 91f3fdd
Therefore, additional protection is needed for this scenario to maintain compatibility with the old behavior.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No