
[SPARK-53205][CORE][SQL] Support createParentDirs in SparkFileUtils #51932

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-53205

@dongjoon-hyun
Member

@dongjoon-hyun commented Aug 8, 2025

What changes were proposed in this pull request?

This PR aims to support createParentDirs in SparkFileUtils.

Why are the changes needed?

To improve Spark's file utility functions.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

$ dev/scalastyle
Using SPARK_LOCAL_IP=localhost
Scalastyle checks passed.

$ build/sbt "core/testOnly *.RPackageUtilsSuite"
...
[info] Run completed in 2 seconds, 267 milliseconds.
[info] Total number of tests run: 3
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 3, failed 0, canceled 3, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 8 s, completed Aug 8, 2025, 10:07:12 AM

$ build/sbt "sql/testOnly *.Parquet*PartitionDiscoverySuite"
...
[info] Run completed in 21 seconds, 37 milliseconds.
[info] Total number of tests run: 74
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 74, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 34 s, completed Aug 8, 2025, 10:08:03 AM

$ build/sbt "hive/testOnly *.ParquetHadoopFsRelationSuite" -Phive
...
[info] Run completed in 28 seconds, 150 milliseconds.
[info] Total number of tests run: 40
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 40, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 44 s, completed Aug 8, 2025, 10:09:15 AM

@dongjoon-hyun
Member Author

Could you review this PR when you have some time, @viirya?

All Scala/Java tests passed.

    }
    val parent = file.getParentFile()
    if (parent != null) {
      Files.createDirectories(parent.toPath())
Member

What's the difference between this and parent.mkdirs()?

Btw, should we handle the following like com.google.common.io.Files?


    if (!parent.isDirectory()) {
      throw new IOException("Unable to create parent directories of " + file);
    }
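
For illustration, the check suggested above could be combined with `File.mkdirs()` roughly as follows. This is a minimal Java sketch and not Guava's or Spark's actual source; the class name `ParentDirsWithCheck` is hypothetical. The point of the post-check is that `mkdirs()` signals failure only through its return value, and `false` can also mean the directory already existed.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class, named only for this example.
class ParentDirsWithCheck {
    /**
     * Creates missing parent directories of {@code file} with File.mkdirs(),
     * then verifies that the parent really is a directory.
     */
    static void createParentDirs(File file) throws IOException {
        File parent = file.getParentFile();
        if (parent == null) {
            return; // the path has no parent component, nothing to create
        }
        // Return value ignored on purpose: false is also returned when
        // the directory already exists, which is not an error here.
        parent.mkdirs();
        if (!parent.isDirectory()) {
            throw new IOException("Unable to create parent directories of " + file);
        }
    }
}
```

With this shape, a regular file sitting where the parent directory should be is reported as an `IOException` instead of being silently ignored.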

Member Author

Files.createDirectories is a newer API than File.mkdirs. In our case, we just want to make sure that the directory exists. Since Files.createDirectories doesn't complain about an existing directory, it works for our use case.
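
The difference between the two APIs can be sketched in Java as below. The class and method names are made up for this example. It relies on two documented behaviors: `File.mkdirs()` returns `true` only when it actually created the directory, while `Files.createDirectories` returns normally for an existing directory and throws an `IOException` when the path exists but is not a directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, named only for this example.
class CreateDirsComparison {
    /** Reports what File.mkdirs() returns for the given path. */
    static boolean viaMkdirs(Path dir) {
        return dir.toFile().mkdirs();
    }

    /** Returns true if Files.createDirectories throws an IOException for the path. */
    static boolean createDirectoriesThrows(Path dir) {
        try {
            Files.createDirectories(dir);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("create-dirs-demo");
        Path nested = root.resolve("a").resolve("b");

        System.out.println(viaMkdirs(nested));               // creates a/b
        System.out.println(viaMkdirs(nested));               // nothing created this time
        System.out.println(createDirectoriesThrows(nested)); // existing directory is accepted

        // A regular file occupying the target path.
        Path blocker = Files.createFile(root.resolve("blocker"));
        System.out.println(viaMkdirs(blocker));               // failure reported as a boolean
        System.out.println(createDirectoriesThrows(blocker)); // failure reported as an exception
    }
}
```

So a caller of `mkdirs()` cannot tell "already existed" apart from "could not be created" without an extra check, whereas `Files.createDirectories` separates the two cases by itself.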

@dongjoon-hyun
Member Author

Thank you, @viirya. All tests passed.

@dongjoon-hyun
Member Author

Let me merge this. If we later need a use case that fails when the directory already exists, we can revise the method then.
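
If such a stricter variant were ever needed, one possible shape is sketched below in Java. This is purely an assumption about what "fail on an existing directory" could mean and is not part of this PR; `StrictParentDirs` and `createNewParentDir` are hypothetical names. It uses `Files.createDirectory` (singular), which fails when the target already exists.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical strict variant, not part of SparkFileUtils.
class StrictParentDirs {
    /** Creates the parent directory of {@code file}, failing if it already exists. */
    static void createNewParentDir(File file) throws IOException {
        File parent = file.getParentFile();
        if (parent == null) {
            return; // no parent component, nothing to create
        }
        Path parentPath = parent.toPath();
        Path grandparent = parentPath.getParent();
        if (grandparent != null) {
            // Ancestors are allowed to exist already.
            Files.createDirectories(grandparent);
        }
        // Throws an IOException (FileAlreadyExistsException on the default
        // file system) when the parent directory is already present.
        Files.createDirectory(parentPath);
    }
}
```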

Merged to master for Apache Spark 4.1.0.

@dongjoon-hyun deleted the SPARK-53205 branch on August 8, 2025 19:49

2 participants