HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename #951
💔 -1 overall
This message was automatically generated.
Testing: S3A ireland. All good except for ITestCommitOperations.testBulkCommitFiles which only fails on parallel test runs. Which is very, very annoying, as it is hard to track down, especially as the scale tests now take 30 minutes. Plan: AncestorState.toString to list paths added, and assert to include the before and after string values. Hypotheses:
I have 2 failures while testing the latest PR against ireland:
🎊 +1 overall
I got 4 errors during verify against ireland:
@gabor, thanks for that. I have sometimes seen that failure on the ITestMagicCommitMR job, hence we now log when it was deleted. What was the actual time when the test was run? What I can do is add some extra diags in the operations where the committers update the DDB tables on commit, because this failure implies they didn't create an entry for the parent dir. This all happens in finishedWrite(), which first calls the metastore's addAncestors, which in DDB goes up the tree to find the first parent dir which is in the store and stops there. Then in the metastore.put() afterwards we add the new file and its parents, but skip those where there's already an entry. I wonder if we can/should do more here.
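A rough sketch of the ancestor walk described above -a hypothetical simplification for illustration, not the actual DynamoDBMetadataStore code- climbs from the file towards the root, collecting parent directories until one is found in the store:

```java
import java.util.*;

public class AncestorWalk {
    // Hypothetical sketch: the store is modelled as a plain set of known
    // directory paths. We collect every missing ancestor of a new file and
    // stop at the first one that already has an entry.
    public static List<String> missingAncestors(String filePath, Set<String> store) {
        List<String> missing = new ArrayList<>();
        String parent = parentOf(filePath);
        while (parent != null && !store.contains(parent)) {
            missing.add(parent);
            parent = parentOf(parent);
        }
        return missing;
    }

    public static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? null : path.substring(0, slash);
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>(Collections.singleton("/a"));
        // /a exists in the store; /a/b and /a/b/c must be created.
        System.out.println(missingAncestors("/a/b/c/file.txt", store));
    }
}
```

The failure mode discussed in the comment is what happens when this walk stops early but the subsequent put skips entries it wrongly believes exist.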
Thanks @steveloughran for providing this massive contribution. Looks good overall, but I proposed some changes and added some notes.
Running the tests with dynamo and local I get the same issues:
[ERROR] Errors:
[ERROR] ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:137->AbstractFSContractTestBase.assertIsDirectory:327 ? FileNotFound
[ERROR] ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:137->AbstractFSContractTestBase.assertIsDirectory:327 ? FileNotFound
[ERROR] ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:137->AbstractFSContractTestBase.assertIsDirectory:327 ? FileNotFound
[ERROR] ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:137->AbstractFSContractTestBase.assertIsDirectory:327 ? FileNotFound
dstMetas = new ArrayList<>();
}
// TODO S3Guard HADOOP-13761: retries when source paths are not visible yet
// Validation completed: time to begin the operation.
Maybe it would be worth creating a separate method for the validation; the other parts of this method could also be moved and tested separately from innerRename. This method is huge, 300+ lines; splitting it up would help maintainability.
That I will do. It might also line me up better for when I open up the rename/3 operation which will always throw an exception on any invalid state (rather than return "false" with no explanation)
Done. I'm not doing any tests on the validation alone, as the contract tests are expected to generate the failure conditions, but it does help isolate the two stages.
* @return the store context of this FS.
*/
@InterfaceAudience.Private
public StoreContext createStoreContext() {
This is a lot of parameters for a constructor. I think it would be worth using the builder pattern for readability and maintainability.
I must disagree. The builder pattern is best when you want to have partial config or support change where you don't want to add many, many constructors, and it substitutes for Java's lack of named params in constructors (compare with: Groovy, Scala, Python).
Here, all parameters must be supplied, and this is exclusively for use in the s3a connector. Nobody should be creating these elsewhere, and if they do, it's not my problem if it breaks.
String key,
S3AEncryptionMethods serverSideEncryptionAlgorithm,
String serverSideEncryptionKey,
String eTag,
String versionId) {
String versionId, final long len) {
nit: if it's one parameter per line we should keep it that way (add len on a new line).
done. IDE refactoring at work
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitOperations.java
* No attempt to use a builder here as outside tests
* this should only be created in the S3AFileSystem.
*/
public StoreContext(
This would be worth a builder pattern; as it is new with this PR, it will scale well in the future.
No, for the reasons as discussed.
- if we add more args, we want 100% of uses (production, test cases) to add every param. Doing that in the refactor operations of the IDE is straightforward.
- I'm thinking that as more FS-level operations are added (path to key, temp file...) I'd just add a new interface "FSLevelOperations" which we'd implement in S3A and for any test suite. This would avoid the need for a new field & constructor arg on every operation, though I'd still require code to call an explicit method for each such operation (i.e. no direct access to FSLevelOperations).
No need to do that now precisely because this is all private; we can do that on the next iteration
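The "FSLevelOperations" idea floated above might look something like this minimal sketch; the interface name comes from the comment, but the method and stub are illustrative guesswork rather than the actual S3A code:

```java
// Hypothetical sketch of an FS-level operations interface. S3AFileSystem
// would supply the real implementation; a test suite could supply a tiny
// stub like the one below without mocking the whole filesystem.
public interface FsLevelOperations {

    /** Map a filesystem path to an object-store key (illustrative method). */
    String pathToKey(String path);

    /** A trivial stub such as a test suite might provide. */
    static FsLevelOperations testStub() {
        return path -> path.startsWith("/") ? path.substring(1) : path;
    }
}
```

The design point in the thread is that callers would go through explicit methods rather than being handed the interface directly, keeping the surface area controlled.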
@@ -1474,6 +1474,18 @@ Caused by: java.lang.NullPointerException
... 1 more

### Error `Attempt to change a resource which is still in use: Table is being deleted`
As I see it, this is an entirely different issue, which should be resolved in a separate commit. Do you plan to do that instead of squashing all your changes into the same commit?
I've been working on this patch for too long and am wrapping up stuff to try and get all tests to work; a lot of work has gone into the ITestDynamoDB there. I don't want to split them up this late in the patch. Sorry.
Worth discussing I think--I'm with @bgaborg on this. I've been on projects in the past that auto-reject conflated commits. Why not maintain a list of commits on your branch and commit them intact instead of squashing them? Git rebase -i and force push (your personal branch only) are your friends here. Gives you optimal commit history and not that hard to manage IMO. Maybe try it out next time you are working on a patch set. I wouldn't force you to go break apart these commits this late in the game, though.
I do have branches of the unsquashed PRs, so I should be able to do that. By popular request I will give it a go, but leave them in here. If people get the other PR in first, I'll deal with that.
...tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
...ls/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStore.java
.../hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestProgressiveRenameTracker.java
🎊 +1 overall
Force-pushed 2fa4cb3 to da8e05a
🎊 +1 overall
🎊 +1 overall
💔 -1 overall
Still a couple more files to look at.. here's my comments so far.
/**
* Saves metadata for exactly one path, potentially
* using any bulk operation state to eliminate duplicate work.
*
So, if operationState is not null, may implementations defer the write to the metastore until a later time, or must they write immediately but are allowed to elide subsequent "duplicate" writes? I don't think the "when is it durable" contract is super important currently, but it might be worth clarifying if you roll another version of the patch.
None of the metastores are doing anything with delayed operations, just tracking it. Clarified in the javadocs
@Override
public int compare(Path pathL, Path pathR) {
  // exist fast on equal values.
nit: "exit fast"
done
}
if (compare > 0) {
  return -1;
}
Could this function just be return -super.compare(pathL, pathR)?
You'd think so, but when I tried, my sort tests were failing -and even stepping through the code with the debugger I couldn't work out why. Doing it like this fixed the tests.
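One well-known reason a hand-negated comparator can misbehave -whether or not it was the actual cause here is not confirmed- is that a comparator may legally return Integer.MIN_VALUE, and arithmetic negation of Integer.MIN_VALUE is still Integer.MIN_VALUE, so the sign never flips. Reversing each branch explicitly, as the patch does, sidesteps the issue entirely:

```java
import java.util.Comparator;

public class NegatedComparatorPitfall {
    // A legal-looking comparator implemented with subtraction. Its result
    // can overflow to Integer.MIN_VALUE, which negation cannot flip.
    public static final Comparator<Integer> BY_SUBTRACTION = (a, b) -> a - b;

    public static void main(String[] args) {
        int forward = BY_SUBTRACTION.compare(Integer.MIN_VALUE, 0);
        System.out.println(forward < 0);   // true: MIN_VALUE sorts first
        System.out.println(-forward < 0);  // still true: "-compare" failed to reverse
    }
}
```

Comparator.reversed() avoids the trap by swapping arguments rather than negating the result, which is also a safe alternative to branch-by-branch reversal.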
}
}

// outside the lock, the entriesToAdd list has all new entries to create.
Which chunks of data is this lock protecting? Since these are vanilla Lists, you need a lock to read as well or you get undefined results, right?
This is just a variable, the list of new entries to add. I exit the synchronized(this) block so that the move call doesn't block.
Looking at it, I think I'll add DurationInfo around it, and some more comments as to what is happening.
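The pattern being discussed -mutate the shared list only under the lock, then hand a thread-confined snapshot to the slow call outside the lock- can be sketched as follows; this is an illustrative simplification, not the actual rename-tracker code:

```java
import java.util.*;

public class LockScopeSketch {
    private final List<String> entries = new ArrayList<>();

    public void add(String e) {
        synchronized (this) {
            entries.add(e);
        }
    }

    public List<String> flushForMove() {
        List<String> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(entries);  // copy under the lock
            entries.clear();
        }
        // A slow call (e.g. metastore.move(snapshot)) would run here, outside
        // the lock; the snapshot is confined to this thread, so no further
        // locking is needed and other threads are never blocked by it.
        return snapshot;
    }
}
```

This answers the review question about reads: the vanilla list is only ever touched inside synchronized blocks; everything outside the lock operates on a private copy.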
// and finish off; including deleting source directories.
// TODO: is this making too many calls?
LOG.debug("Rename completed for {}", this);
getMetadataStore().move(pathsToDelete, destMetas, getOperationState());
No lock here on a read of pathsToDelete, etc. Per the previous comment, I want to understand which data you are guarding with the lock so we can ensure we have coverage.
* Originally it implemented the logic to probe for and add ancestors,
* but with the addition of a store-specific bulk operation state
* it became unworkable.
*
Wouldn't hold up this patch for it, but am curious how this became unworkable. I imagine de-duping metadata writes (ancestors) using a bulk context, such that S3A repeating ancestor writes within an op are not an issue.
Again, always trying to keep MetadataStore as simple as possible and specific to logging metadata operations on a FileSystem.
Ok I think I've gone through everything. I could spend more time meditating on the new rename code but these are my comments so far. Most are nits or discussion for fun, but there was one question about synchronization that I'd like clarification on.
...p-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStore.java
* lowest entry first.
*
* This is to ensure that if a client failed partway through the update,
* there will be no entries in the table which lack parent entries.
Nice use of topological sorting here.
thanks. We do need it, just for the extra resilience
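The "lowest entry first" ordering praised above amounts to sorting paths so that every parent precedes its children; a minimal illustration (not the actual S3Guard comparator, which works on Path objects) is to order by depth, then lexicographically for stability:

```java
import java.util.*;

public class TopologicalPathSort {
    // Sort path strings so any parent directory precedes its children:
    // shallower paths first, then natural order as a tiebreak.
    public static List<String> sortParentsFirst(Collection<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort(Comparator
            .comparingInt((String p) -> p.split("/").length)
            .thenComparing(Comparator.naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> out = sortParentsFirst(Arrays.asList(
            "/a/b/c/file", "/a", "/a/b", "/a/b/c"));
        System.out.println(out);  // [/a, /a/b, /a/b/c, /a/b/c/file]
    }
}
```

Writing entries in this order means a client that dies partway through has only written prefixes of the tree, so no orphaned child entries are left behind.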
* An attempt is made in {@link #deleteTestDirInTeardown()} to prune these test
* files.
*/
@SuppressWarnings("ThrowableNotThrown")
assertIsDirectory(readOnlyDir);
Path renamedDestPath = new Path(readOnlyDir, writableDir.getName());
assertRenameOutcome(roleFS, writableDir, readOnlyDir, true);
assertIsFile(renamedDestPath);
These tests seem to accomplish what you need. Did you think of reaching below into the MetadataStore (via S3AFileSystem.getMetadataStore()) and asserting on its state after failures? I don't see a specific need (you are getting good coverage through the FS)... just wondering if there are additional cases we could expose that way.
No, I didn't actually. Good point though.
🎊 +1 overall
// check if UNGUARDED_FLAG is passed and use NullMetadataStore in
// config to avoid side effects like creating the table if not exists
Configuration conf0 = getConf();
Please rename this to unguardedConf or something like that
done.
This is commit a22fc98 merged atop trunk and the HADOOP-16279 OOB Delete JIRA.
* OperationState as arg to the put operations
* Still some tuning/review of AddAncestors needed from where it was pushed into the metastore (so it can use/update the ancestors).
Change-Id: Idf34e5c7e88a765aa0aeadccd4f9bffdc8bca420
Change-Id: I5b0e5f0991cd26429f5ab9463073f79220ac9bd2
I'm going to close this and re-open a new patch with everything merged atop the OOB patch. It's not that they conflict functionality-wise; it's just that as they both pass down a param to the metastore put operations, they create merge conflicts. FWIW, I'm now unsure why the TTL needs to go down, rather than be set during the init phase.
…ameOperation class, as requested. This does provide just one place to look at the code.

There are eight methods in S3AFileSystem used during the rename. I've created an interface for this and the inner class for S3AFS which bridges to them. Looking at the methods you can see what things we should export in a lower-down layer in the S3AFS refactoring -ideally these should all be one level down. This takes most of the new code out of the S3AFS, though the new callback interface adds some again. What is key is that:
1. The new algorithm for renaming is in its own class, with src and dest params all as final fields.
2. It is broken up into separate methods for file and dir rename.
3. It has helper methods for queuing operations etc.

The StoreContext has backed off from lambda-expressions to invoke S3AFS ops as they were getting too many, moving to an interface and again, an implementation, ContextAccessors. This adds some more code in the S3AFS, but it makes it easier to see how the methods are being used, while still allowing tests to provide their own mock implementation class.

+ InternalConstants class for internal constants
+ Move the FunctionsRaisingIOE over to fs.impl. With the move to interfaces over l-expressions these aren't being used so much, but can be picked up elsewhere. Marked as private/unstable.

* Slightly better diags for the AbstractCommitTerasortITests on a missing _SUCCESS marker; and a cleanup operation at the end to delete the files (the normal per-fork paths aren't used, so they can avoid deletion).
* AbstractStoreOperation implements getConf() as it's that common to use.

Change-Id: I9e4420d343fb87422779c11ce89fe7710edd180c
Force-pushed a22fc98 to dbdbfe8
ok, I unintentionally force pushed rather than closed. Hopefully that won't be too disruptive. If need be I can switch to a new PR.
💔 -1 overall
…d from a get. This is to debug why some of the root contract tests are failing. I'm going to switch to trunk to see if it has the issue.
Change-Id: I3d7a177495e83880a179bc76ab81c8dfbaf0a53e
Last little refactoring caused whitespace issues. Will fix
* the add ancestors code is now pushed down into the stores, so they can use any bulk state tracking
* which also means that the stores need to do the patching of TTL values (now done)
* but they also need to do it in Put when completing ancestors there (not done)
* and the state tracking in DDB, with the addAncestor integration, doesn't overwrite grandparent paths which have been overwritten with a tombstone
Change-Id: I4009ed5ef03549453db2e8c7389903e4c66114a8
💔 -1 overall
* DynamoDB.completeAncestry() sets the TTL on entries it creates
* DynamoDB.addAncestors() doesn't just stop at the first entry, it goes up the tree. When it finds a tombstone or missing entry as a parent of a valid entry it logs @ warn and adds it to the list

As a result, there's now O(depth) gets in every finishedWrite, where before it stopped at the first entry (bad) -but in completeAncestry() a put was being done at O(depth) anyway. S3AFileSystem.finishedWrite() calls addAncestors before put(). For a bulk commit, because a bulk operation spans the add and the put, there's no duplication and the cost of a write is less: it's O(depth) with the put operations for only those files. For a normal single file write we can do the same by creating a temp bulk write instance and using it purely for the single operation. It does seem overkill, but it lets us glue together both operations in the sequence, which is the whole point.

Change-Id: I48ad7b2657b0d14fffc0318934af73bde8368482
💔 -1 overall
…d ancestors and put call are integrated.

finishedWrite() now creates a BulkUpdate if one wasn't already present, and closes it afterwards. This is to ensure that the findings of the addAncestors call are used in the putAndReturn call, which will not add a PUT request for entries we know exist. Makes the DDB cost of writing a single file depth * GET + (1 + missing parent count) * PUT. Before: depth * PUT, as well as extra GET/PUT calls in addAncestors. PUTs cost more than GET calls, so this is a net saving.

The failing test ITestCommitOperations was tracked down to clock skew triggering a writeback of the getFileStatus result on the probes after the first commit, so causing an intermittent failure in parallel test runs (under load == worse skew). Filed HADOOP-16382 for the underlying issue; for now, simply resetting the MetricDiff counter after the various probes.

Change-Id: I85f60bc517cb0ae683961607f1f48b6f35a7004b
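The request-count claim in that commit message can be sanity-checked with toy arithmetic; these are illustrative formulas restating the message, not measurements of the actual code:

```java
public class WriteCostSketch {
    // "depth" = number of ancestor directories of the new file;
    // "missingParents" = how many of those lack a metastore entry.
    public static int getsAfter(int depth)         { return depth; }              // one GET per ancestor
    public static int putsAfter(int missingParents) { return 1 + missingParents; } // the file + missing parents
    public static int putsBefore(int depth)        { return depth; }              // old scheme: PUT per level

    public static void main(String[] args) {
        // e.g. a file at depth 4 with one missing parent:
        // before: 4 PUTs; after: 4 GETs + 2 PUTs. DynamoDB writes cost
        // more than reads, so expensive PUTs are traded for cheap GETs.
        System.out.println(putsBefore(4) + " PUTs before vs "
            + getsAfter(4) + " GETs + " + putsAfter(1) + " PUTs after");
    }
}
```

The saving grows with depth whenever most parents already exist, which is the common case for repeated writes into the same directory tree.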
Latest patch: doing full matrix of test runs (s3guard/non, local/ddb, auth/non-auth)
The failing test ITestCommitOperations was tracked down to clock skew triggering a writeback of the getFileStatus result on the probes after the first commit, so causing an intermittent failure in parallel test runs (under load == worse skew). Filed HADOOP-16382 for the underlying issue; for now, simply resetting the MetricDiff counter after the various probes. I'm reaching that point where I can't see any more issues, and really need the insight/approval/criticism of others. In particular, feedback is strongly encouraged.
🎊 +1 overall
* minimising diff between trunk and branch
* completeAncestry doesn't break on the first ancestor found; it continues up the path. This is due diligence: I haven't encountered problems which arise from not doing this, I'm just making sure that those parents exist. Operations now span >1 write, and the normal file write includes the addAncestors check which builds up the same list, including probes for the files actually existing.
Change-Id: I2edf9de75ea2546de7f97322ee0bcf838dd7591b
Need to create the TTL time provider in the DB for a non-FS init, or else you get an NPE in the CLI prune.
I have some comments & questions attached, and I have some opinions about how RenameOperation could be better, and I'm dreading the backport pain that factoring rename() out is going to cause, but it's only going to get worse, so let's do it! I'm a +1 to committing if there's nothing else blocking it, and I think it's time to move on and address everything else independently.
@@ -418,9 +434,11 @@ private void initThreadPools(Configuration conf) {
unboundedThreadPool = new ThreadPoolExecutor(
maxThreads, Integer.MAX_VALUE,
This is where I had suggested we drop the first argument to 0 (for boundedThreadPool too) as that's core threads, not max threads. Otherwise we actually lock ourselves at the maximum and grow from there. Only if you rebuild and retest anyway - otherwise I'll submit a patch once this is in to avoid further conflicts...
@@ -689,6 +707,7 @@ public String getBucketLocation() throws IOException {
* @return the region in which a bucket is located
* @throws IOException on any failure.
*/
@VisibleForTesting
... I could kinda see clients wanting to use this. I know for HBOSS I almost did, when I was toying with a potential DynamoDB locking implementation.
The new StoreContext API exports this as an on-demand operation: fs.createStoreContext().getBucketLocation()
public void move(
    @Nullable Collection<Path> pathsToDelete,
    @Nullable Collection<PathMetadata> pathsToCreate,
    ITtlTimeProvider ttlTimeProvider,
I still don't immediately see ttlTimeProvider being used everywhere it's passed. Did we get to the bottom of that? Now's the time to remove it, maybe.
Yes, we did. I started a discussion about whether we want to pass it in the metastore init instead of to every method which will use it. We ended up passing it to every method instead of the init method. We need to fix that. If that is not fixed in this PR, I will fix it tomorrow under a new issue.
Makes sense. If you need to override for a single test operation, can always add a @VisibleForTesting setter.
@@ -899,6 +915,9 @@ public void addAncestors(
// a directory entry will go in.
PathMetadata directory = get(parent);
if (directory == null || directory.isDeleted()) {
  if (entryFound) {
    LOG.warn("Inconsistent S3Guard table: adding directory {}", parent);
  }
Interesting change to this function: slower but more robust (the removed break below, that is, not this log message).
Also, we might as well do the depth(path) get operations in parallel if they always happen, since the break behavior you removed is not configurable. In terms of write latency it would remove depth(path)-1 round trips (approx.). Proposing this as a followup JIRA, not doing it here.
// the maximum number of tasks cached if all threads are already uploading
public static final String MAX_TOTAL_TASKS = "fs.s3a.max.total.tasks";
Nit: remove empty line
…e a time source
Change-Id: Ic3ec71dc1d4c7bef4866ca4d598c20aba4e17575
Changes since last review LGTM. +1 overall.
@Nullable ITtlTimeProvider ttlTimeProvider) {
  return ttlTimeProvider != null ? ttlTimeProvider : timeProvider;
}
As we discussed w/ @bgaborg @mackrorysd this will go away soon, and is fine for now.
🎊 +1 overall
thanks for the reviews; I will commit as is and file some followups
Tested it with -Dscale against ireland. I have the following failures:
This is new for me:
We know about the other 3 testMRJob failures. I'm not happy that we have those issues, but we know about them. Have we created an issue already to stabilize those? I see some failures in the sequential-integration-tests as well, but those are still running. Not just timeouts - e.g. [ERROR] Tests run: 7, Failures: 1, Errors: 2, Skipped: 0, Time elapsed: 72.408 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.terasort.ITestTerasortMagicCommitter. I'll comment with the results once those are completed.
I have the sequential-integration-tests issues:
I'm a bit worried about the FNFEs here:
@bgaborg thanks for those results, we need to look at them to see if they are related.
@mackrorysd FWIW, I wasn't planning to backport this too far. All the new files are far away from existing code, so it's the DDB changes and the changes in S3AFS which will be the sources of merge pain.
Contributed by Steve Loughran.
This is the squashed patch of PR #843 commit 115fb77
Contains
HADOOP-13936. S3Guard: DynamoDB can go out of sync with S3AFileSystem.delete()
HADOOP-15604. Bulk commits of S3A MPUs place needless excessive load on S3 & S3Guard
HADOOP-15658. Memory leak in S3AOutputStream
HADOOP-16364. S3Guard table destroy to map IllegalArgumentExceptions to IOEs
This work adds to the S3Guard Metastore APIs:
* the notion of a "BulkOperation": a store-specific class which is requested before initiating bulk work (put, purge, rename) and which can then be used to cache table changes performed during the bulk operation. This allows renames and commit operations to avoid duplicate creation of parent entries in the tree: the store can track what is already created/found.
* the notion of a "RenameTracker", which factors out the task of updating a metastore with changes to the filesystem during a rename (files added + deleted) and after the completion of the operation, successful or not.

The original rename update -the one which failed to update the store until the end of the rename- is implemented as the DelayedUpdateRenameTracker, while a new ProgressiveRenameTracker updates the store as individual files are copied and when bulk deletes complete. To avoid performance problems, stores must provide a BulkOperation implementation which remembers ancestors added. The DynamoDBMetastore does this.

Some of the new features are implemented as part of a gradual refactoring of the S3AFileSystem itself: the handling of partial delete failures is in its own class, org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport, which, rather than being given a reference back to the owning S3AFileSystem, is handed a StoreContext containing restricted attributes and callbacks. As this refactoring continues in future patches, and the different layers of a new store model are factored out, this will be extended.
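The progressive-tracking idea summarized above -record each copied file in the store as it lands, so a partial failure still leaves the store reflecting what really moved- might be sketched as follows; the class and method names are illustrative, not the actual Hadoop API:

```java
import java.util.*;

public class ProgressiveTrackerSketch {
    // Each update is appended immediately rather than buffered until the
    // end of the rename (which is what a delayed tracker would do). If the
    // rename dies partway, everything recorded so far is already durable.
    private final List<String> storeUpdates = new ArrayList<>();

    public void fileCopied(String destPath) {
        storeUpdates.add("put " + destPath);       // new entry, written right away
    }

    public void sourceDeleted(String srcPath) {
        storeUpdates.add("tombstone " + srcPath);  // recorded when bulk deletes complete
    }

    public List<String> updatesSoFar() {
        return new ArrayList<>(storeUpdates);
    }
}
```

A delayed-update variant would buffer the same calls and flush them only once at completion, which is exactly the behavior that left the store inconsistent after partial rename failures.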
Change-Id: Ie0bd96ab861f0f30170b75f78e5503fc0e929524