Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 12 additions & 6 deletions core/src/main/scala/org/apache/spark/util/Utils.scala
Original file line number Diff line number Diff line change
Expand Up @@ -285,9 +285,11 @@ private[spark] object Utils extends Logging {
*/
def createDirectory(dir: File): Boolean = {
try {
// This sporadically fails - not sure why ... !dir.exists() && !dir.mkdirs()
// So attempting to create and then check if directory was created or not.
dir.mkdirs()
// SPARK-35907: The check was required by File.mkdirs() because it could sporadically
// fail silently. After switching to Files.createDirectories(), ideally, there should
// no longer be silent fails. But the check is kept for the safety concern. We can
// remove the check when we're sure that Files.createDirectories() would never fail silently.
Files.createDirectories(dir.toPath)
if ( !dir.exists() || !dir.isDirectory) {
logError(s"Failed to create directory " + dir)
}
Comment on lines 293 to 295
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ngone51 I understand you requested this check to be brought back in your previous comment, but it doesn't seem right to me. Based on the Javadoc for createDirectories, the contract is that the directory will be created successfully, or an exception will be thrown. Keeping the exists/isDirectory checks after is redundant. If we're worried about a JDK bug as described here, why do we trust the implementation of File#exists() and File#isDirectory() but not Files.createDirectories()? IMO the whole point of switching from File to the NIO APIs is that they have more sensible semantics and so we don't need to jump through hoops like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the Javadoc for createDirectories, the contract is that the directory will be created successfully, or an exception will be thrown.

I didn't find such a contract there, but only " Unlike the createDirectory method, an exception is not thrown if the directory could not be created because it already exists."...

If we're worried about a JDK bug as described here, why do we trust the implementation of File#exists() and File#isDirectory() but not Files.createDirectories()?

It's not about which method we should trust but which way is safer to go. If there's such a contract as you mentioned, I agree we don't need the check. Otherwise, I think it's safer to follow the original way to avoid suffering from the old issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "contract" is not used, but I think that is rarely the case in method descriptions. For example none of the methods on Dataset explicitly state contracts or guarantees, but we are expected to assume that the method will do what it says it does, and indicate an error condition using an exception otherwise. My impression here is that the weird semantics of File#mkdirs() are translating into FUD about the potential shortcomings of Files.createDirectories() and that, if the method were taken without that historical context, we wouldn't be worrying about it.

This all being said, it's only three lines of code extra, so of course it is no big detriment to be on the safe side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "contract" is not used, but I think that is rarely the case in method descriptions. For example none of the methods on Dataset explicitly state contracts or guarantees, but we are expected to assume that the method will do what it says it does, and indicate an error condition using an exception otherwise. My impression here is that the weird semantics of File#mkdirs() are translating into FUD about the potential shortcomings of Files.createDirectories() and that, if the method were taken without that historical context, we wouldn't be worrying about it.

This all being said, it's only three lines of code extra, so of course it is no big detriment to be on the safe side.

I agree with you in terms of safety and minimizing modification points.Do you have any questions with my current code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Shockang Could you leave a comment above the check to give the historical context, e.g.,

"The check was required by File#mkdirs() because it could sporadically fail silently. After switching to Files.createDirectories(), ideally, there should no longer be silent fails. But the check is kept for the safety concern. We can remove the check when we're sure that Files.createDirectories() would never fail silently."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any questions with my current code?

No, just wanted to explain my viewpoint. Current code is fine. I agree with @Ngone51 that a comment is helpful to explain the context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Shockang Could you leave a comment above the check to give the historical context, e.g.,

"The check was required by File#mkdirs() because it could sporadically fail silently. After switching to Files.createDirectories(), ideally, there should no longer be silent fails. But the check is kept for the safety concern. We can remove the check when we're sure that Files.createDirectories() would never fail silently."

This comment looks perfect. Thank you very much for your comments.

Expand Down Expand Up @@ -315,10 +317,14 @@ private[spark] object Utils extends Logging {
}
try {
dir = new File(root, namePrefix + "-" + UUID.randomUUID.toString)
if (dir.exists() || !dir.mkdirs()) {
// SPARK-35907:
// This could throw more meaningful exception information if directory creation failed.
Files.createDirectories(dir.toPath)
} catch {
case e @ (_ : IOException | _ : SecurityException) =>
logError(s"Failed to create directory $dir", e)
dir = null
}
} catch { case e: SecurityException => dir = null; }
}
}

dir.getCanonicalFile
Expand Down
62 changes: 62 additions & 0 deletions core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
Original file line number Diff line number Diff line change
Expand Up @@ -477,6 +477,68 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
}
}

test("SPARK-35907: createDirectory") {
val tmpDir = new File(System.getProperty("java.io.tmpdir"))
val testDir = new File(tmpDir, "createDirectory" + System.nanoTime())
val testDirPath = testDir.getCanonicalPath

// 1. Directory created successfully
val scenario1 = new File(testDir, "scenario1")
assert(Utils.createDirectory(scenario1))
assert(scenario1.exists())
assert(Utils.createDirectory(testDirPath, "scenario1").exists())

// 2. Illegal file path
val scenario2 = new File(testDir, "scenario2" * 256)
assert(!Utils.createDirectory(scenario2))
assert(!scenario2.exists())
assertThrows[IOException](Utils.createDirectory(testDirPath, "scenario2" * 256))

// 3. The parent directory cannot read
val scenario3 = new File(testDir, "scenario3")
assert(testDir.canRead)
assert(testDir.setReadable(false))
assert(Utils.createDirectory(scenario3))
assert(scenario3.exists())
assert(Utils.createDirectory(testDirPath, "scenario3").exists())
assert(testDir.setReadable(true))

// 4. The parent directory cannot write
val scenario4 = new File(testDir, "scenario4")
assert(testDir.canWrite)
assert(testDir.setWritable(false))
assert(!Utils.createDirectory(scenario4))
assert(!scenario4.exists())
assertThrows[IOException](Utils.createDirectory(testDirPath, "scenario4"))
assert(testDir.setWritable(true))

// 5. The parent directory cannot execute
val scenario5 = new File(testDir, "scenario5")
assert(testDir.canExecute)
assert(testDir.setExecutable(false))
assert(!Utils.createDirectory(scenario5))
assert(!scenario5.exists())
assertThrows[IOException](Utils.createDirectory(testDirPath, "scenario5"))
assert(testDir.setExecutable(true))

// The following 3 scenarios are only for the method: createDirectory(File)
// 6. Symbolic link
val scenario6 = java.nio.file.Files.createSymbolicLink(new File(testDir, "scenario6")
.toPath, scenario1.toPath).toFile
assert(!Utils.createDirectory(scenario6))
assert(scenario6.exists())

// 7. Directory exists
assert(scenario1.exists())
assert(Utils.createDirectory(scenario1))
assert(scenario1.exists())

// 8. Not directory
val scenario8 = new File(testDir.getCanonicalPath + File.separator + "scenario8")
assert(scenario8.createNewFile())
assert(!Utils.createDirectory(scenario8))
}

test("doesDirectoryContainFilesNewerThan") {
// create some temporary directories and files
withTempDir { parent =>
Expand Down