Core: Update version-hint.txt atomically #1559

Merged
rdblue merged 5 commits into apache:master from lcspinter:issue-1496
Oct 29, 2020

@lcspinter
Contributor

Issue #1496 already discussed that we should find a way to update version-hint.txt atomically. At the moment, there is no known operation that would support atomic updates. Based on this, I think our best solution would be:

  1. Create a new file
  2. Delete old version-hint.txt
  3. Move the new file to version-hint.txt
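
The three steps above can be sketched roughly as follows. This is only an illustration: it uses java.nio.file on a local directory rather than the Hadoop FileSystem API that the PR changes, and the class and method names (`VersionHintSwap`, `writeVersionHint`, `readVersionHint`, `demo`) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

// Hypothetical stand-in for the proposed update sequence (local files, not Hadoop).
class VersionHintSwap {
  static void writeVersionHint(Path metadataDir, int version) throws IOException {
    Path hint = metadataDir.resolve("version-hint.txt");
    Path temp = metadataDir.resolve(UUID.randomUUID() + "-version-hint.temp");

    // 1. Create a new file under a unique name; CREATE_NEW fails if it already exists.
    Files.write(temp, String.valueOf(version).getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);

    // 2. Delete the old hint; a missing file is not treated as an error.
    Files.deleteIfExists(hint);

    // 3. Move the new file into place. Between steps 2 and 3 no hint file exists,
    //    and the move fails if another writer recreated the hint in the meantime.
    Files.move(temp, hint);
  }

  static int readVersionHint(Path metadataDir) throws IOException {
    byte[] bytes = Files.readAllBytes(metadataDir.resolve("version-hint.txt"));
    return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8).trim());
  }

  // Runs two consecutive updates in a scratch directory and returns the final hint.
  static int demo() {
    try {
      Path dir = Files.createTempDirectory("version-hint-demo");
      writeVersionHint(dir, 1);
      writeVersionHint(dir, 2);
      return readVersionHint(dir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("hint after two updates: " + demo());
  }
}
```

Readers never see a partially written hint with this sequence, because the content is complete before the file gets its final name. The sequence as a whole is still not atomic.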

@HeartSaVioR
Contributor

HeartSaVioR commented Oct 7, 2020

If I understand correctly, these operations should be applied regardless of whether a version file already exists - consider the case when two concurrent writers both see there's no existing version file and try to write to the path directly.

That said, delete -> rename can be (should be?) replaced with rename with overwrite = true. If I'm not missing anything, this ensures the last one wins (there's no way to ensure atomicity via atomic rename when the file may exist, so that's best effort) and a partial file is not exposed.

@pvary
Contributor

pvary commented Oct 7, 2020

That said, delete -> rename can be (should be?) replaced with rename with overwrite = true. If I'm not missing anything, this ensures last one wins and partial file is not exposed.

I think we cannot use rename with overwrite, since it behaves differently on different FileSystems:

Destination exists and is a file

Renaming a file atop an existing file is specified as failing, raising an exception.
- Local FileSystem : the rename succeeds; the destination file is replaced by the source file.
- HDFS : The rename fails, no exception is raised. Instead the method call simply returns false.

That is why we decided to use delete + rename, which works consistently on every FS.

@lcspinter
Contributor Author

@pvary Thanks for the review. I've committed an improved version of the fix.

@HeartSaVioR
Contributor

HeartSaVioR commented Oct 7, 2020

  • HDFS : The rename fails, no exception is raised. Instead the method call simply returns false.

I guess that doesn't describe the case when overwrite = true. (If it does, then HDFS is breaking the contract. I agree the local filesystem breaks the contract without the option, so we can't simply rely on the contract.)

delete -> rename would work if we simply do the delete regardless of whether the file exists, so it's not a big deal. I just wanted to try to ensure the last one wins, but with concurrent writers it wouldn't be strange for any of them to win.

@pvary
Contributor

pvary commented Oct 7, 2020

+1

Contributor

@HeartSaVioR HeartSaVioR left a comment

LGTM

@rdblue
Contributor

rdblue commented Oct 7, 2020

If there is a rename with overwrite option, then I would prefer to use that for the cases where it is atomic. That would be safer. (Side note: local FS doesn't provide atomic rename, even without overwrite, so you shouldn't be using Hadoop tables with it except for tests.)

Also, we don't care whether the version hint is completely up to date. We just want to be close to the real version. Being a version or two behind is okay.
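
One way a reader could tolerate a hint that is a version or two behind is to treat it as a starting point and probe forward. The sketch below is an assumption for illustration and is not taken from this PR; `StaleHintReader`, `resolveCurrentVersion`, and the `versionExists` check are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical reader that treats the version hint as a lower bound.
class StaleHintReader {
  // Starts at the hinted version and walks forward while the next version exists.
  static int resolveCurrentVersion(int hintedVersion, IntPredicate versionExists) {
    int version = hintedVersion;
    while (versionExists.test(version + 1)) {
      version++;
    }
    return version;
  }

  public static void main(String[] args) {
    // Pretend versions 1..5 were committed but the hint still says 3.
    Set<Integer> committed = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
    int current = resolveCurrentVersion(3, v -> committed.contains(v));
    System.out.println("resolved version: " + current); // resolved version: 5
  }
}
```

With a reader like this, a slightly stale hint only costs a few extra existence checks.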

@lcspinter
Contributor Author

Since the FileSystem API doesn't provide atomic renames, we chose to create a new file, delete the old file, and then rename the new file to the old path.
@rdblue If you don't have any more remarks, could you please push this change to master? Thanks

@lcspinter
Contributor Author

@rdblue When you have some time, could you please review my change? Thanks

@rdblue
Contributor

rdblue commented Oct 13, 2020

Does the file system API provide a rename with overwrite option? If so, we should use that regardless of whether it is atomic. If not, then we should move forward with delete and then rename.

@lcspinter
Contributor Author

@rdblue I agree, rename with overwrite would be the best solution, but on HDFS the rename fails without raising any exception if the destination exists.

@lcspinter
Contributor Author

@rdblue According to the FileSystem rename API doc:
- Local FileSystem : the rename succeeds; the destination file is replaced by the source file.
- HDFS : The rename fails, no exception is raised. Instead the method call simply returns false.

What if I try the rename with overwrite, and if it returns false, I fall back to delete and rename? What do you think?
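
The fallback idea could look roughly like the sketch below. It runs against a hypothetical in-memory `SimpleFs` interface, not the Hadoop FileSystem API; the fake `StrictRenameFs` mimics the documented HDFS behavior of returning false when the destination exists.

```java
import java.util.HashMap;
import java.util.Map;

class RenameFallback {
  // Hypothetical minimal file system used only for this illustration.
  interface SimpleFs {
    boolean rename(String src, String dst); // false if src is missing or dst exists
    boolean delete(String path);            // false if nothing was deleted
  }

  // In-memory fake: renaming onto an existing file returns false without throwing.
  static class StrictRenameFs implements SimpleFs {
    final Map<String, String> files = new HashMap<>();

    @Override
    public boolean rename(String src, String dst) {
      if (!files.containsKey(src) || files.containsKey(dst)) {
        return false;
      }
      files.put(dst, files.remove(src));
      return true;
    }

    @Override
    public boolean delete(String path) {
      return files.remove(path) != null;
    }
  }

  // Try the rename first; if it reports failure, delete the destination and retry.
  static boolean replace(SimpleFs fs, String src, String dst) {
    if (fs.rename(src, dst)) {
      return true;
    }
    fs.delete(dst); // result ignored: the destination may already be gone
    return fs.rename(src, dst);
  }

  public static void main(String[] args) {
    StrictRenameFs fs = new StrictRenameFs();
    fs.files.put("version-hint.txt", "1");
    fs.files.put("temp-version-hint", "2");
    System.out.println(replace(fs, "temp-version-hint", "version-hint.txt")); // true
    System.out.println(fs.files.get("version-hint.txt"));                      // 2
  }
}
```

The fallback path still has the same window as plain delete + rename: between the delete and the second rename, no destination file exists.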

@HeartSaVioR
Contributor

To clarify, the API I mentioned was FileContext.rename(src, dst, options), not FileSystem.rename(src, dst), which the filesystem doc documents.

I don't have an HDFS cluster right now, but the result of the rename operation against the local filesystem via FileContext is quite different from what the document says.

Below is the code you can run with spark-shell against Hadoop 2.7 & Hadoop 3.2.


import org.apache.hadoop.fs.{FileContext, Path}
import org.apache.hadoop.fs.Options.Rename
import org.apache.hadoop.fs.permission.FsPermission

val context = FileContext.getFileContext()

// assuming you have files `unit-tests.log` and `unit-tests-succeed.log` (different file size) in /tmp

val setupPath = new Path("/tmp/unit-tests-succeed.log")
val anotherFilePath = new Path("/tmp/unit-tests.log")
val sourceDirPath = new Path("/tmp/rename-experiment-src")
val destDirPath = new Path("/tmp/rename-experiment-dst")
val anotherFileSourcePath = new Path("/tmp/rename-experiment-src/unit-tests.log")
val sourcePath = new Path("/tmp/rename-experiment-src/unit-tests-succeed.log")
val destPath = new Path("/tmp/rename-experiment-dst/unit-tests-succeed.log")

// reset the experiment directories: delete them, then recreate them
context.delete(sourceDirPath, true)
context.delete(destDirPath, true)
context.mkdir(sourceDirPath, FsPermission.getDirDefault(), true)
context.mkdir(destDirPath, FsPermission.getDirDefault(), true)

// setup file
context.util.copy(setupPath, sourcePath)

// the file got moved
context.rename(sourcePath, destPath)

// check whether the file is moved
println(s"src path: ${context.util.exists(sourcePath)}")
println(s"dest path: ${context.util.exists(destPath)}")
println(s"content summary on dest path: ${context.util.getContentSummary(destPath)}")

// re-setup file
context.util.copy(setupPath, sourcePath)

// re-rename -> this will throw exception as file already exists
context.rename(sourcePath, destPath)

// setup another file
context.util.copy(anotherFilePath, anotherFileSourcePath)

// re-rename with overwrite option -> this will not throw exception
context.rename(anotherFileSourcePath, destPath, Rename.OVERWRITE)

// check whether the file is moved
println(s"src path: ${context.util.exists(anotherFileSourcePath)}")
println(s"dest path: ${context.util.exists(destPath)}")
println(s"content summary on dest path: ${context.util.getContentSummary(destPath)}")

It correctly fails when a file already exists at the destination, and correctly overwrites the destination with the new file if the overwrite option is provided.
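
For comparison, the same fail-then-overwrite pattern can be reproduced in plain Java with java.nio.file. This is an analogous sketch on the local file system only and says nothing about Hadoop or HDFS behavior; `MoveOverwriteDemo` and `run` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class MoveOverwriteDemo {
  // Returns "<plain move failed>:<destination content after the overwriting move>".
  static String run() {
    try {
      Path dir = Files.createTempDirectory("rename-experiment");
      Path dest = Files.write(dir.resolve("dest.log"), "old".getBytes(StandardCharsets.UTF_8));
      Path src = Files.write(dir.resolve("src.log"), "new".getBytes(StandardCharsets.UTF_8));

      // A move without options is expected to fail because the destination exists.
      boolean plainMoveFailed = false;
      try {
        Files.move(src, dest);
      } catch (FileAlreadyExistsException e) {
        plainMoveFailed = true;
      }

      // With REPLACE_EXISTING the destination is replaced by the source.
      Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING);

      String content = new String(Files.readAllBytes(dest), StandardCharsets.UTF_8);
      return plainMoveFailed + ":" + content;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```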

I also looked through the code path for how the namenode handles rename, and it leads to DistributedFileSystem.rename, whose javadoc says it guarantees atomicity.

rel/release-2.7.4

https://github.com/apache/hadoop/blob/cd915e1e8d9d0131462a0b7301586c175728a282/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L647-L653

rel/release-3.2.0

https://github.com/apache/hadoop/blob/e97acb3bd8f3befd27418996fa5d4b50bf2e17bf/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L892-L898

@HeartSaVioR
Contributor

(It probably depends on the preference of using FileContext though.)

  }

  private void writeVersionToPath(FileSystem fs, Path path, int versionToWrite) throws IOException {
    try (FSDataOutputStream out = fs.create(path, false)) {
Contributor

Boolean parameters need inline comments to explain what they are. What does false here do?

@rdblue
Contributor

rdblue commented Oct 23, 2020

Good to understand the points about rename with overwrite flag. Since there isn't an overwrite flag in the API we use, let's go with delete and rename for this. The current approach looks good to me.

I think the only thing that needs to be updated is that the boolean args need to be documented. I agree with the choice to fail the create if the unique file already exists, but the comment should be there so it can be understood easily.

@lcspinter
Contributor Author

@rdblue Thanks for the review. I updated the code with an inline comment.

    try {
      Path tempVersionHintFile = metadataPath(UUID.randomUUID().toString() + "-version-hint.temp");
      writeVersionToPath(fs, tempVersionHintFile, versionToWrite);
      fs.delete(versionHintFile, false);
Contributor

This one needs a comment to clarify false as well.

Contributor Author

Thanks. Fixed it.

@rdblue rdblue merged commit 2a371cf into apache:master Oct 29, 2020
@rdblue
Contributor

rdblue commented Oct 29, 2020

Thanks, @lcspinter! Merged.

@lcspinter lcspinter deleted the issue-1496 branch October 29, 2020 16:31
@lcspinter
Contributor Author

@rdblue Thanks!

resolves #1496

@rdblue rdblue added this to the Java 0.10.0 Release milestone Nov 16, 2020
@sylvon

sylvon commented Apr 17, 2021

@lcspinter @rdblue We have observed a few times in our jobs that when the program exits (say, due to OOM) right after the delete but before the rename, we are no longer able to find the table. What we have to do is manually find the latest version and craft a version-hint file by hand. Since we do this delete and rename operation, can we also generate a version hint by file listing when the file is not found?
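
A recovery by file listing, as suggested above, might look like the following sketch. It assumes metadata files are named `v<N>.metadata.json` (an assumption made for illustration, not stated in this thread) and uses java.nio.file on a local directory instead of the Hadoop FileSystem API. `VersionHintRecovery`, `latestVersionByListing`, and `demo` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class VersionHintRecovery {
  // Assumed naming scheme for metadata files: v1.metadata.json, v2.metadata.json, ...
  private static final Pattern METADATA_FILE = Pattern.compile("v(\\d+)\\.metadata\\.json");

  // Lists the directory and returns the highest version number found, if any.
  static OptionalInt latestVersionByListing(Path metadataDir) throws IOException {
    try (Stream<Path> files = Files.list(metadataDir)) {
      return files
          .map(p -> METADATA_FILE.matcher(p.getFileName().toString()))
          .filter(Matcher::matches)
          .mapToInt(m -> Integer.parseInt(m.group(1)))
          .max();
    }
  }

  // Builds a scratch directory with versions 1, 2, and 10 plus an unrelated file.
  static int demo() {
    try {
      Path dir = Files.createTempDirectory("metadata");
      for (int v : new int[] {1, 2, 10}) {
        Files.createFile(dir.resolve("v" + v + ".metadata.json"));
      }
      Files.createFile(dir.resolve("unrelated.txt"));
      return latestVersionByListing(dir).orElse(-1);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("recovered version: " + demo()); // recovered version: 10
  }
}
```

The versions are compared numerically, so `v10` sorts after `v2`. A real implementation would also have to decide what to do when listing is slow or unsupported on the underlying store.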
