HADOOP-18706: Improve S3ABlockOutputStream recovery #5563
Conversation
Force-pushed d0f3afd to 2d04346.
🎊 +1 overall
Interesting thought about recovery here; not something we had considered. Of course it doesn't work if you are writing to s3 as part of a yarn app, as the buffer dir is under $LOCAL_DIRS which is automatically cleaned up when the YARN container is destroyed.
I did recently add a command to the cloudstore project to list active multiparts better
https://github.com/steveloughran/cloudstore/blob/trunk/src/main/extra/org/apache/hadoop/fs/s3a/extra/ListMultiparts.java
If you really want upload to be recoverable then you need to be able to combine blocks on the hard disk with the in-progress multipart upload such that you can finish the upload, build the list of etags and then POST the complete operation.
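For illustration, a rough sketch of that recovery flow against the v1 AWS SDK (the SDK Hadoop 3.3 builds against); the bucket/key/uploadId inputs, the recovered block files, and the recoverUpload name are all assumptions, not anything in the patch:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.ListPartsRequest;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.PartListing;
import com.amazonaws.services.s3.model.PartSummary;
import com.amazonaws.services.s3.model.UploadPartRequest;

/**
 * Sketch: finish an interrupted MPU by combining the parts already in S3
 * with the block files still on disk, then POSTing the complete operation.
 * Assumes nextPartNumber is greater than any part number already uploaded.
 */
static void recoverUpload(AmazonS3 s3, String bucket, String key,
    String uploadId, List<File> remainingBlocks, int nextPartNumber) {
  List<PartETag> etags = new ArrayList<>();
  // collect the etags of the parts that already made it to S3
  // (pagination of a truncated listing is elided here)
  PartListing listing = s3.listParts(new ListPartsRequest(bucket, key, uploadId));
  for (PartSummary part : listing.getParts()) {
    etags.add(new PartETag(part.getPartNumber(), part.getETag()));
  }
  // upload the blocks recovered from the buffer dir as the remaining parts
  for (File block : remainingBlocks) {
    etags.add(s3.uploadPart(new UploadPartRequest()
        .withBucketName(bucket).withKey(key).withUploadId(uploadId)
        .withPartNumber(nextPartNumber++).withFile(block))
        .getPartETag());
  }
  // POST the complete operation with the full etag list
  s3.completeMultipartUpload(
      new CompleteMultipartUploadRequest(bucket, key, uploadId, etags));
}
```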
If you include the spanID in the filename, then if you are also collecting S3 server logs you can actually work out which paths the uploads with outstanding blocks were targeted at. Or, if you were being really clever, when an MPU was initiated you could save a SinglePendingCommit.json file to the local fs and so actually be able to do some more complicated recovery without even needing those server logs.
Anyway, please include the span ID in the filename. WriteOperationHelper.getAuditSpan()
returns this; the method needs to be pulled up to the WriteOperation interface for the BlockOutputStream to use. This gives extra opportunities to debug which path a block is targeted at, and isolate the case with >1 active write to the same destination.
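For illustration only, a hypothetical naming scheme along those lines; the format the patch actually generates may differ:

```java
// Hypothetical block-file name carrying part number, audit span ID and a
// sanitized copy of the destination key, so an orphaned block can be traced
// back to its upload. Illustrative only; not the patch's exact format.
static String blockFileName(String spanId, String key, long partNumber) {
  // '/' is illegal inside a filename component, so flatten the key
  String sanitizedKey = key.replace('/', '_');
  return String.format("s3ablock-%04d-%s-%s.tmp", partNumber, spanId, sanitizedKey);
}
```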
💔 -1 overall
@steveloughran thanks for your feedback. I've added the span ID to the file name as you suggested for better debugging.
With the part number and key derived from the local file name, I've been using calls to […] to finish the upload. I know it's not a typical use case to recover a partial upload rather than retry the entire file, but it's very helpful when using S3A as the underlying file system in Accumulo.
🎊 +1 overall
OK. I think the design is incomplete as it is; you would really want to be a bit more sophisticated. But this bit seems ready to go in: low risk and potentially useful for others. Now, test policy: which AWS S3 region did you run the full "mvn verify" tests for the hadoop-aws module in, and what options did you have on the command line?
@steveloughran I'm running the integration tests in us-east-2 and the only options I'm setting are the parallel test and thread count flags (-Dparallel-tests -DtestsThreadCount=…).
ok, and they all work? good to know. Don't be afraid to mention any which do fail, as they are often from different developer config, and it's good to find out what is wrong with an existing test (or worse, production code) before we ship...
I did see intermittent failures once with one of the tests in […].
Correction: after rebasing, all of the tests in […] were failing.
Never mind. I was running verify from a new terminal window where JAVA_HOME was set to version 18. That was causing issues with the com.google.inject:guice dependency in hadoop-yarn-server-resourcemanager. It looks like that version of Java was also messing up the maven-surefire-plugin because the […]. Anyway, rolling back the version of Java fixed the issue. I ran the scale tests too this time, so […].
Looks good, just two issues to worry about.

One bigger issue, which you already mentioned: excessively long filenames. S3 supports 1024 chars of path, so this should work through the other block buffers, and MUST work here too. Looking at a table of lengths, there's 255 chars to play with, including block id, span id etc. How about adding a new test case or modifying testRegularUpload() to create a file with a name > 256 chars, just to see what happens? Oh, and we have to remember about Windows too, though as the Java APIs go through the Unicode ones, its 255-char limit doesn't always hold.

Maybe the solution is to do some cutting down of paths such that the first few and final chars are always preserved (a sketch of that idea follows). Along with the span ID that should be good, though it does depend on the filenames generated... does Accumulo generate sufficiently unique ones that the last, say, 128 chars will be something you can map to an upload?
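A minimal sketch of that trimming idea, keeping the head and the tail of an over-long name so a trailing UUID stays recognizable; the head/tail split ratio here is an arbitrary choice:

```java
// Keep the first and last characters of an over-long filename component,
// assuming max is large enough to hold the head, the marker and the tail.
static String trimToLength(String name, int max) {
  if (name.length() <= max) {
    return name;
  }
  int head = max / 4;                 // arbitrary head/tail split
  int tail = max - head - 3;          // leave room for the "..." marker
  return name.substring(0, head) + "..." + name.substring(name.length() - tail);
}
```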
No problem, will do.
Initially I had some code in the S3ADataBlock to trim the key if the file name was nearing 255 chars, but after testing what would happen with a massive S3 key passed to […], the trimming turned out to be unnecessary. Here's a snippet from the javadocs for […]:
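The quoted javadoc did not survive extraction; it is presumably the one for java.io.File#createTempFile, which notes that the prefix "may first be adjusted to fit the limitations of the underlying platform". A small probe of that behavior, assuming a JDK that applies the adjustment rather than failing:

```java
import java.io.File;
import java.io.IOException;

public class TempNameLimitProbe {
  public static void main(String[] args) throws IOException {
    // 300-char prefix: longer than the 255-char filename component limit
    // on most Unix filesystems
    String prefix = new String(new char[300]).replace('\0', 'k');
    // Recent JDKs shorten the prefix to fit; older ones may throw
    // IOException here instead, so treat this as a probe, not a guarantee.
    File f = File.createTempFile(prefix, ".tmp");
    System.out.println(f.getName().length() + " -> " + f.getName());
    f.deleteOnExit();
  }
}
```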
With Accumulo, it's the WALs that are important to recover, and they're named with a UUID, so they're very unique even without a prefix. The full WAL key is built as such: […]. I'll get that test added and correct the style issue. Let me know if you'd like me to make any changes to the file name that's generated.
Force-pushed 6b43d44 to d61df93.
🎊 +1 overall
I'm happy with the new test. Before I merge, are there any extra assertions on the long file we could/should add? Or is the fact that create() worked enough?
```java
LOG.info(dataBlock.toString()); // block file name and location can be viewed in failsafe-report

// delete the block file
dataBlock.innerClose();
```
are there any more asserts here, e.g. that the file exists afterwards?
I added an assertion to make sure the tmp file is created.
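A hedged sketch of what that lifecycle assertion could look like (JUnit 4 asserts; the helper name is made up):

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.io.File;
import java.io.IOException;

// Hypothetical lifecycle asserts: the backing file exists while the block
// is open, and is gone once innerClose() deletes it.
static void assertBlockFileLifecycle(File blockFile,
    S3ADataBlocks.DataBlock dataBlock) throws IOException {
  assertTrue("block file was not created: " + blockFile, blockFile.exists());
  dataBlock.innerClose();
  assertFalse("block file still present after close: " + blockFile,
      blockFile.exists());
}
```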
```java
 */
@Test
public void testDiskBlockCreate() throws IOException {
  S3ADataBlocks.BlockFactory diskBlockFactory =
```
use try-with-resources, even if I doubt this is at risk of leaking things
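A sketch of the try-with-resources form being asked for, assuming the factory signatures visible in the checkstyle output later in this thread; BlockFactory is Closeable, so the factory is closed even if the test fails:

```java
try (S3ADataBlocks.BlockFactory diskBlockFactory =
         new S3ADataBlocks.DiskBlockFactory(fs)) {
  S3ADataBlocks.DataBlock dataBlock =
      diskBlockFactory.create("spanId", s3Key, 1, blockSize, null);
  // ... exercise the block, then clean up ...
  dataBlock.innerClose();
}
```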
🎊 +1 overall
looks good, just more detail on a failure:
```java
boolean created = Arrays.stream(
    Objects.requireNonNull(new File(getConfiguration().get("hadoop.tmp.dir")).listFiles()))
    .anyMatch(f -> f.getName().contains("very_long_s3_key"));
assertTrue(created);
```
Add a message to print if the assert is false: we need to be able to start debugging without having to work back from the first line of the stack trace as to what went wrong. Include that hadoop.tmp.dir value in the message too.
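One way the assert from the snippet above could be rephrased to carry that detail (the message wording is mine):

```java
String tmpDir = getConfiguration().get("hadoop.tmp.dir");
boolean created = Arrays.stream(
    Objects.requireNonNull(new File(tmpDir).listFiles()))
    .anyMatch(f -> f.getName().contains("very_long_s3_key"));
assertTrue("No block file containing \"very_long_s3_key\" found in " + tmpDir,
    created);
```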
Force-pushed c517ac3 to fd736d0.
🎊 +1 overall
Can you stop rebasing this during the review process? It makes it impossible for me to use the "review changes since your last commit" feature. Once you have confirmed that you are going to stop, I will review the PR again.
Sorry about that. I won't rebase anymore. |
looks good, final wrap up.
checkstyle issues
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:36:import java.nio.file.Files;:8: Unused import - java.nio.file.Files. [UnusedImports]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:96: "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +: '"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__"' has incorrect indentation level 6, expected level should be 8. [Indentation]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:115: diskBlockFactory.create("spanId", s3Key, 1, blockSize, null);: 'diskBlockFactory' has incorrect indentation level 11, expected level should be 13. [Indentation]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:119: Objects.requireNonNull(new File(tmpDir).listFiles())): 'Objects' has incorrect indentation level 8, expected level should be 10. [Indentation]
And a full test run with -Dscale; please state which AWS region and what command line options you set, e.g. -Dparallel-tests -DtestsThreadCount=8 -Dscale -Dprefetch.
Done. Integration tests were run in us-east-2 with the following options: […]
🎊 +1 overall
Thanks for the run. Final bit of outstanding style: I know there are lots already, but new code should always start well, whenever possible.
Shoot. That was in the last checkstyle run, and I thought I bumped it out to 13. My IDE says it was indented to 13, but it wasn't. I should have run checkstyle before committing. Anyway, my mistake. I reviewed the checkstyle report before committing this time and it should be good now.
🎊 +1 overall
Thanks, merged to trunk. Can you cherry-pick locally to branch-3.3, rerun the tests, and then put that up as a new PR? I will then merge that. FWIW I still think what you are trying to do for recovery is "bold", in the rock-climbing "I wonder if they will survive" meaning of the word, but the filename tracking could be useful in other ways.
This reverts commit 372631c. Reverted due to HADOOP-18744.
Description of PR
This PR improves the ability to recover partial S3A uploads.
How was this patch tested?
Unit testing and regression testing with Accumulo
For code changes: have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?