HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename #951
💔 -1 overall
This message was automatically generated.
Testing: S3A ireland. All good except for ITestCommitOperations.testBulkCommitFiles which only fails on parallel test runs. Which is very, very annoying, as it is hard to track down, especially as the scale tests now take 30 minutes. Plan: AncestorState.toString to list paths added, and assert to include the before and after string values. Hypotheses:
I have 2 failures while testing the latest PR against ireland:
🎊 +1 overall
I got 4 errors during verify against ireland:
@gabor, thanks for that. I have sometimes seen that failure on the ITestMagicCommitMR job, hence we now log when it was deleted. What was the actual time when the test was run? What I can do is add some extra diags in the operations where the committers update the DDB tables on commit, because this failure implies they didn't create an entry for the parent dir. This all happens in finishedWrite(), which first calls the metastore's addAncestors, which in DDB goes up the tree to find the first parent dir which is in the store and stops there. Then in the metastore.put() afterwards we add the new file and its parents, but skip those where there's already an entry. I wonder if we can/should do more here.
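A rough sketch of the ancestor walk described above -a hypothetical simplification for illustration, not the actual DynamoDBMetadataStore code- climbs from the file towards the root, collecting parent directories until one is found in the store:

```java
import java.util.*;

public class AncestorWalk {
    // Hypothetical sketch: the store is modelled as a plain set of known
    // directory paths. We collect every missing ancestor of a new file and
    // stop at the first one that already has an entry.
    public static List<String> missingAncestors(String filePath, Set<String> store) {
        List<String> missing = new ArrayList<>();
        String parent = parentOf(filePath);
        while (parent != null && !store.contains(parent)) {
            missing.add(parent);
            parent = parentOf(parent);
        }
        return missing;
    }

    public static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? null : path.substring(0, slash);
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>(Collections.singleton("/a"));
        // /a exists in the store; /a/b and /a/b/c must be created.
        System.out.println(missingAncestors("/a/b/c/file.txt", store));
    }
}
```

The failure mode discussed in the comment is what happens when this walk stops early but the subsequent put skips entries it wrongly believes exist.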
Thanks @steveloughran for providing this massive contribution. Looks good overall, but I proposed some changes and added some notes.
Running the tests with dynamo and local I get the same issues:
[ERROR] Errors:
[ERROR] ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:137->AbstractFSContractTestBase.assertIsDirectory:327 ? FileNotFound
[ERROR] ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:137->AbstractFSContractTestBase.assertIsDirectory:327 ? FileNotFound
[ERROR] ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:137->AbstractFSContractTestBase.assertIsDirectory:327 ? FileNotFound
[ERROR] ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:137->AbstractFSContractTestBase.assertIsDirectory:327 ? FileNotFound
dstMetas = new ArrayList<>();
}
// TODO S3Guard HADOOP-13761: retries when source paths are not visible yet
// Validation completed: time to begin the operation.
Maybe it would be worth creating a separate method for the validation; the other parts of this method could also be moved and tested separately from innerRename. This method is huge, 300+ lines; splitting it up would help maintainability.
That I will do. It might also line me up better for when I open up the rename/3 operation which will always throw an exception on any invalid state (rather than return "false" with no explanation)
Done. I'm not doing any tests on the validation alone, as the contract tests are expected to generate the failure conditions, but it does help isolate the two stages.
* @return the store context of this FS.
*/
@InterfaceAudience.Private
public StoreContext createStoreContext() {
This is a lot of parameters for a constructor. I think it would be worth using the builder pattern for readability and maintainability.
I must disagree. The builder pattern is best when you want to have partial config or support change where you don't want to add many, many constructors, and it substitutes for Java's lack of named params in constructors (compare with: Groovy, Scala, Python).
Here, all parameters must be supplied, and this is exclusively for use in the s3a connector. Nobody should be creating these elsewhere, and if they do, it's not my problem if it breaks.
String key,
S3AEncryptionMethods serverSideEncryptionAlgorithm,
String serverSideEncryptionKey,
String eTag,
String versionId) {
String versionId, final long len) {
nit: if it's one parameter per line we should keep it that way (add len on a new line).
done. IDE refactoring at work
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitOperations.java
* No attempt to use a builder here as outside tests
* this should only be created in the S3AFileSystem.
*/
public StoreContext(
This would be worth a builder pattern; as it is new with this PR, it will scale well in the future.
No, for the reasons as discussed.
- if we add more args, we want 100% of uses (production, test cases) to add every param. Doing that in the refactor operations of the IDE is straightforward.
- I'm thinking that as more FS-level operations are added (path to key, temp file...) I'd just add a new interface "FSLevelOperations" which we'd implement in S3A and for any test suite. This would avoid the need for a new field & constructor arg on every operation, though I'd still require code to call an explicit method for each such operation (i.e. no direct access to FSLevelOperations).
No need to do that now precisely because this is all private; we can do that on the next iteration
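The "FSLevelOperations" idea floated above might look something like this minimal sketch; the interface name comes from the comment, but the method and stub are illustrative guesswork rather than the actual S3A code:

```java
// Hypothetical sketch of an FS-level operations interface. S3AFileSystem
// would supply the real implementation; a test suite could supply a tiny
// stub like the one below without mocking the whole filesystem.
public interface FsLevelOperations {

    /** Map a filesystem path to an object-store key (illustrative method). */
    String pathToKey(String path);

    /** A trivial stub such as a test suite might provide. */
    static FsLevelOperations testStub() {
        return path -> path.startsWith("/") ? path.substring(1) : path;
    }
}
```

The design point in the thread is that callers would go through explicit methods rather than being handed the interface directly, keeping the surface area controlled.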
@@ -1474,6 +1474,18 @@ Caused by: java.lang.NullPointerException
... 1 more

### Error `Attempt to change a resource which is still in use: Table is being deleted`
As I see it, this is an entirely different issue, which should be resolved in a separate commit. Do you plan to do that instead of squashing all your changes into the same commit?
I've been working on this patch for too long and am wrapping up stuff to try and get all tests to work; a lot of work has gone into the ITestDynamoDB there. I don't want to split them up this late in the patch. Sorry.
Worth discussing I think--I'm with @bgaborg on this. I've been on projects in the past that auto-reject conflated commits. Why not maintain a list of commits on your branch and commit them intact instead of squashing them? Git rebase -i and force push (your personal branch only) are your friends here. Gives you optimal commit history and not that hard to manage IMO. Maybe try it out next time you are working on a patch set. I wouldn't force you to go break apart these commits this late in the game, though.
I do have branches of the unsquashed PRs, so I should be able to do that. By popular request I will give it a go, but leave them in here. If people get the other PR in first, I'll deal with that.
...tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
...ls/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStore.java
.../hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestProgressiveRenameTracker.java
🎊 +1 overall
Force-pushed 2fa4cb3 to da8e05a
🎊 +1 overall
🎊 +1 overall
💔 -1 overall
Still a couple more files to look at.. here's my comments so far.
/**
* Saves metadata for exactly one path, potentially
* using any bulk operation state to eliminate duplicate work.
*
So, if operationState is not null, may implementations defer the write to the metastore until a later time, or must they write immediately but are allowed to elide subsequent "duplicate" writes? I don't think the "when is it durable" contract is super important currently, but it might be worth clarifying if you roll another version of the patch.
None of the metastores are doing anything with delayed operations, just tracking it. Clarified in the javadocs
@Override
public int compare(Path pathL, Path pathR) {
  // exist fast on equal values.
nit: "exit fast"
done
}
if (compare > 0) {
  return -1;
}
Could this function just be return -super.compare(pathL, pathR)?
You'd think so, but when I tried, my sort tests were failing -and even stepping through the code with the debugger I couldn't work out why. Doing it like this fixed the tests.
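One well-known reason a hand-negated comparator can misbehave -whether or not it was the actual cause here is not confirmed- is that a comparator may legally return Integer.MIN_VALUE, and arithmetic negation of Integer.MIN_VALUE is still Integer.MIN_VALUE, so the sign never flips. Reversing each branch explicitly, as the patch does, sidesteps the issue entirely:

```java
import java.util.Comparator;

public class NegatedComparatorPitfall {
    // A legal-looking comparator implemented with subtraction. Its result
    // can overflow to Integer.MIN_VALUE, which negation cannot flip.
    public static final Comparator<Integer> BY_SUBTRACTION = (a, b) -> a - b;

    public static void main(String[] args) {
        int forward = BY_SUBTRACTION.compare(Integer.MIN_VALUE, 0);
        System.out.println(forward < 0);   // true: MIN_VALUE sorts first
        System.out.println(-forward < 0);  // still true: "-compare" failed to reverse
    }
}
```

Comparator.reversed() avoids the trap by swapping arguments rather than negating the result, which is also a safe alternative to branch-by-branch reversal.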
}
}

// outside the lock, the entriesToAdd list has all new entries to create.
Which chunks of data is this lock protecting? Since these are vanilla Lists, you need a lock to read as well or you get undefined results, right?
This is just a variable, the list of new entries to add. I exit the synchronized(this) block so that the move call doesn't block.
Looking at it, I think I'll add DurationInfo around it, and some more comments as to what is happening.
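The pattern being discussed -mutate the shared list only under the lock, then hand a thread-confined snapshot to the slow call outside the lock- can be sketched as follows; this is an illustrative simplification, not the actual rename-tracker code:

```java
import java.util.*;

public class LockScopeSketch {
    private final List<String> entries = new ArrayList<>();

    public void add(String e) {
        synchronized (this) {
            entries.add(e);
        }
    }

    public List<String> flushForMove() {
        List<String> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(entries);  // copy under the lock
            entries.clear();
        }
        // A slow call (e.g. metastore.move(snapshot)) would run here, outside
        // the lock; the snapshot is confined to this thread, so no further
        // locking is needed and other threads are never blocked by it.
        return snapshot;
    }
}
```

This answers the review question about reads: the vanilla list is only ever touched inside synchronized blocks; everything outside the lock operates on a private copy.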
// and finish off; including deleting source directories.
// TODO: is this making too many calls?
LOG.debug("Rename completed for {}", this);
getMetadataStore().move(pathsToDelete, destMetas, getOperationState());
No lock here on a read of pathsToDelete, etc. Per the previous comment, I want to understand which data you are guarding with the lock so we can ensure we have coverage.
* Originally it implemented the logic to probe for and add ancestors,
* but with the addition of a store-specific bulk operation state
* it became unworkable.
*
Wouldn't hold up this patch for it, but am curious how this became unworkable. I imagine de-duping metadata writes (ancestors) using a bulk context, such that S3A repeating ancestor writes within an op are not an issue.
Again, always trying to keep MetadataStore as simple as possible and specific to logging metadata operations on a FileSystem.
Ok I think I've gone through everything. I could spend more time meditating on the new rename code but these are my comments so far. Most are nits or discussion for fun, but there was one question about synchronization that I'd like clarification on.
...p-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStore.java
* lowest entry first.
*
* This is to ensure that if a client failed partway through the update,
* there will be no entries in the table which lack parent entries.
Nice use of topological sorting here.
thanks. We do need it, just for the extra resilience
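The "lowest entry first" ordering praised above amounts to sorting paths so that every parent precedes its children; a minimal illustration (not the actual S3Guard comparator, which works on Path objects) is to order by depth, then lexicographically for stability:

```java
import java.util.*;

public class TopologicalPathSort {
    // Sort path strings so any parent directory precedes its children:
    // shallower paths first, then natural order as a tiebreak.
    public static List<String> sortParentsFirst(Collection<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort(Comparator
            .comparingInt((String p) -> p.split("/").length)
            .thenComparing(Comparator.naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> out = sortParentsFirst(Arrays.asList(
            "/a/b/c/file", "/a", "/a/b", "/a/b/c"));
        System.out.println(out);  // [/a, /a/b, /a/b/c, /a/b/c/file]
    }
}
```

Writing entries in this order means a client that dies partway through has only written prefixes of the tree, so no orphaned child entries are left behind.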
* An attempt is made in {@link #deleteTestDirInTeardown()} to prune these test
* files.
*/
@SuppressWarnings("ThrowableNotThrown")
assertIsDirectory(readOnlyDir);
Path renamedDestPath = new Path(readOnlyDir, writableDir.getName());
assertRenameOutcome(roleFS, writableDir, readOnlyDir, true);
assertIsFile(renamedDestPath);
These tests seem to accomplish what you need. Did you think of reaching below into the MetadataStore (via S3AFileSystem.getMetadataStore()) and asserting on its state after failures? I don't see a specific need (you are getting good coverage through the FS)... just wondering if there are additional cases we could expose that way.
No, I didn't actually. Good point though.
🎊 +1 overall
// check if UNGUARDED_FLAG is passed and use NullMetadataStore in
// config to avoid side effects like creating the table if not exists
Configuration conf0 = getConf();
Please rename this to unguardedConf or something like that
done.
This is commit a22fc98 merged atop trunk and the HADOOP-16279 OOB Delete JIRA.
* OperationState as arg to the put operations
* Still some tuning/review of AddAncestors needed from where it was pushed into the metastore (so it can use/update the ancestors).
Change-Id: Idf34e5c7e88a765aa0aeadccd4f9bffdc8bca420
Change-Id: I5b0e5f0991cd26429f5ab9463073f79220ac9bd2
I'm going to close this and re-open a new patch with everything merged atop the OOB patch. It's not that they conflict functionality-wise; it's just that as they both pass down a param to the metastore put operations, they create merge conflicts. FWIW, I'm now unsure why the TTL needs to go down, rather than be set during the init phase.
…ameOperation class, as requested. This does provide just one place to look at the code.

There are eight methods in S3AFileSystem used during the rename. I've created an interface for this and the inner class for S3AFS which bridges to them. Looking at the methods you can see what things we should export in a lower-down layer in the S3AFS refactoring -ideally these should all be one level down. This takes most of the new code out of the S3AFS, though the new callback interface adds some again. What is key is that:
1. The new algorithm for renaming is in its own class, with src and dest params all as final fields.
2. It is broken up into separate methods for file and dir rename.
3. It has helper methods for queuing operations etc.

The StoreContext has backed off from lambda-expressions to invoke S3AFS ops as they were getting too many, moving to an interface and again, an implementation, ContextAccessors. This adds some more code in the S3AFS, but it makes it easier to see how the methods are being used, while still allowing tests to provide their own mock implementation class.

+ InternalConstants class for internal constants
+ Move the FunctionsRaisingIOE over to fs.impl. With the move to interfaces over l-expressions these aren't being used so much, but can be picked up elsewhere. Marked as private/unstable.

* Slightly better diags for the AbstractCommitTerasortITests on a missing _SUCCESS marker; and a cleanup operation at the end to delete the files (the normal per-fork paths aren't used, so they can avoid deletion).
* AbstractStoreOperation implements getConf() as it's that common to use.

Change-Id: I9e4420d343fb87422779c11ce89fe7710edd180c
Force-pushed a22fc98 to dbdbfe8
ok, I unintentionally force pushed rather than closed. Hopefully that won't be too disruptive. If need be I can switch to a new PR.
💔 -1 overall
…d from a get. This is to debug why some of the root contract tests are failing. I'm going to switch to trunk to see if it has the issue.
Change-Id: I3d7a177495e83880a179bc76ab81c8dfbaf0a53e
Last little refactoring caused whitespace issues. Will fix
* the add ancestors code is now pushed down into the stores, so they can use any bulk state tracking
* which also means that the stores need to do the patching of TTL values (now done)
* but they also need to do it in Put when completing ancestors there (not done)
* and the state tracking in DDB, with the addAncestor integration, doesn't overwrite grandparent paths which have been overwritten with a tombstone
Change-Id: I4009ed5ef03549453db2e8c7389903e4c66114a8
💔 -1 overall
* DynamoDB.completeAncestry() sets the TTL on entries it creates
* DynamoDB.addAncestors() doesn't just stop at the first entry, it goes up the tree. When it finds a tombstone or missing entry as a parent of a valid entry it logs @ warn and adds it to the list

As a result, there's now O(depth) gets in every finishedWrite, where before it stopped at the first entry (bad) -but in completeAncestry() a put was being done at O(depth) anyway. S3AFileSystem.finishedWrite() calls addAncestors before put(). For a bulk commit, because a bulk operation spans the add and the put, there's no duplication and the cost of a write is less: it's O(depth) with the put operations for only those files. For a normal single file write we can do the same by creating a temp bulk write instance and using it purely for the single operation. It does seem overkill, but it lets us glue together both operations in the sequence, which is the whole point.

Change-Id: I48ad7b2657b0d14fffc0318934af73bde8368482
💔 -1 overall
…d ancestors and put call are integrated.

finishedWrite() now creates a BulkUpdate if one wasn't already present, and closes it afterwards. This is to ensure that the findings of the addAncestors call are used in the putAndReturn call, which will not add a PUT request for entries we know exist. Makes the DDB cost of writing a single file depth * GET + (1 + missing parent count) * PUT. Before: depth * PUT, as well as extra GET/PUT calls in addAncestors. PUTs cost more than GET calls, so this is a net saving.

The failing test ITestCommitOperations was tracked down to clock skew triggering a writeback of the getFileStatus result on the probes after the first commit, so causing an intermittent failure in parallel test runs (under load == worse skew). Filed HADOOP-16382 for the underlying issue; for now, simply resetting the MetricDiff counter after the various probes.

Change-Id: I85f60bc517cb0ae683961607f1f48b6f35a7004b
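The request-count claim in that commit message can be sanity-checked with toy arithmetic; these are illustrative formulas restating the message, not measurements of the actual code:

```java
public class WriteCostSketch {
    // "depth" = number of ancestor directories of the new file;
    // "missingParents" = how many of those lack a metastore entry.
    public static int getsAfter(int depth)         { return depth; }              // one GET per ancestor
    public static int putsAfter(int missingParents) { return 1 + missingParents; } // the file + missing parents
    public static int putsBefore(int depth)        { return depth; }              // old scheme: PUT per level

    public static void main(String[] args) {
        // e.g. a file at depth 4 with one missing parent:
        // before: 4 PUTs; after: 4 GETs + 2 PUTs. DynamoDB writes cost
        // more than reads, so expensive PUTs are traded for cheap GETs.
        System.out.println(putsBefore(4) + " PUTs before vs "
            + getsAfter(4) + " GETs + " + putsAfter(1) + " PUTs after");
    }
}
```

The saving grows with depth whenever most parents already exist, which is the common case for repeated writes into the same directory tree.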
Latest patch: doing full matrix of test runs (s3guard/non, local/ddb, auth/non-auth)
The failing test ITestCommitOperations was tracked down to clock skew triggering a writeback of the getFileStatus result on the probes after the first commit, so causing an intermittent failure in parallel test runs (under load == worse skew). Filed HADOOP-16382 for the underlying issue; for now, simply resetting the MetricDiff counter after the various probes. I'm reaching that point where I can't see any more issues, and really need the insight/approval/criticism of others. In particular, feedback is strongly encouraged.
🎊 +1 overall
* minimising diff between trunk and branch
* completeAncestry doesn't break on the first ancestor found; it continues up the path. This is due diligence: I haven't encountered problems which arise from not doing this, I'm just making sure that those parents exist. Operations now span >1 write, and the normal file write includes the addAncestors check which builds up the same list, including probes for the files actually existing.
Change-Id: I2edf9de75ea2546de7f97322ee0bcf838dd7591b
Need to create the TTL time provider in the DB for a non-FS init, or else you get an NPE in the CLI prune.
I have some comments & questions attached, and I have some opinions about how RenameOperation could be better, and I'm dreading the backport pain that factoring rename() out is going to cause, but it's only going to get worse, so let's do it! I'm a +1 to committing if there's nothing else blocking it, and I think it's time to move on and address everything else independently.
@@ -418,9 +434,11 @@ private void initThreadPools(Configuration conf) {
unboundedThreadPool = new ThreadPoolExecutor(
maxThreads, Integer.MAX_VALUE,
This is where I had suggested we drop the first argument to 0 (for boundedThreadPool too) as that's core threads, not max threads. Otherwise we actually lock ourselves at the maximum and grow from there. Only if you rebuild and retest anyway - otherwise I'll submit a patch once this is in to avoid further conflicts...
@@ -689,6 +707,7 @@ public String getBucketLocation() throws IOException {
* @return the region in which a bucket is located
* @throws IOException on any failure.
*/
@VisibleForTesting
... I could kinda see clients wanting to use this. I know for HBOSS I almost did, when I was toying with a potential DynamoDB locking implementation.
The new StoreContext API exports this as an on-demand operation: fs.createStoreContext().getBucketLocation()
public void move(
    @Nullable Collection<Path> pathsToDelete,
    @Nullable Collection<PathMetadata> pathsToCreate,
    ITtlTimeProvider ttlTimeProvider,
I still don't immediately see ttlTimeProvider being used everywhere it's passed. Did we get to the bottom of that? Now's the time to remove it, maybe.
Yes, we did. I started a discussion about whether we want to pass it in the metastore init instead of to every method which will use it. We ended up passing it to every method instead of the init method. We need to fix that. If that is not fixed in this PR, I will fix it tomorrow under a new issue.
Makes sense. If you need to override for a single test operation, can always add a @VisibleForTesting setter.
@@ -899,6 +915,9 @@ public void addAncestors(
// a directory entry will go in.
PathMetadata directory = get(parent);
if (directory == null || directory.isDeleted()) {
  if (entryFound) {
    LOG.warn("Inconsistent S3Guard table: adding directory {}", parent);
  }
Interesting change to this function: slower but more robust (the removed break below, that is, not this log message).
Also, we might as well do the depth(path) get operations in parallel if they always happen, since the break behavior you removed is not configurable. In terms of write latency it would remove depth(path)-1 round trips (approx.). Proposing this as a followup JIRA, not doing it here.
// the maximum number of tasks cached if all threads are already uploading
public static final String MAX_TOTAL_TASKS = "fs.s3a.max.total.tasks";
Nit: remove empty line
…e a time source
Change-Id: Ic3ec71dc1d4c7bef4866ca4d598c20aba4e17575
Changes since last review LGTM. +1 overall.
@Nullable ITtlTimeProvider ttlTimeProvider) {
  return ttlTimeProvider != null ? ttlTimeProvider : timeProvider;
}
As we discussed w/ @bgaborg @mackrorysd this will go away soon, and is fine for now.
🎊 +1 overall
thanks for the reviews; I will commit as is and file some followups
Tested it with -Dscale against ireland. I have the following failures:
This is new for me:
We know about the other 3 testMRJob failures. I'm not happy that we have those issues, but we know about them. Have we created an issue already to stabilize those? I see some failures in the sequential-integration-tests as well, but those are still running. Not just timeouts - e.g. [ERROR] Tests run: 7, Failures: 1, Errors: 2, Skipped: 0, Time elapsed: 72.408 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.terasort.ITestTerasortMagicCommitter. I'll comment with the results once those are completed.
I have the sequential-integration-tests issues:
I'm a bit worried about the FNFEs here:
@bgaborg thanks for those results, we need to look at them to see if they are related.
@mackrorysd FWIW, I wasn't planning to backport this too far. All the new files are far away from existing code, so it's the DDB changes and the changes in S3AFS which will be the sources of merge pain.
Contributed by Steve Loughran.
This is the squashed patch of PR #843 commit 115fb77
Contains
HADOOP-13936. S3Guard: DynamoDB can go out of sync with S3AFileSystem.delete()
HADOOP-15604. Bulk commits of S3A MPUs place needless excessive load on S3 & S3Guard
HADOOP-15658. Memory leak in S3AOutputStream
HADOOP-16364. S3Guard table destroy to map IllegalArgumentExceptions to IOEs
This work adds to the S3Guard Metastore APIs:
* the notion of a "BulkOperation": a store-specific class which is requested before initiating bulk work (put, purge, rename) and which can then be used to cache table changes performed during the bulk operation. This allows renames and commit operations to avoid duplicate creation of parent entries in the tree: the store can track what is already created/found.
* the notion of a "RenameTracker", which factors out the task of updating a metastore with changes to the filesystem during a rename (files added + deleted) and after the completion of the operation, successful or not.

The original rename update -the one which failed to update the store until the end of the rename- is implemented as the DelayedUpdateRenameTracker, while a new ProgressiveRenameTracker updates the store as individual files are copied and when bulk deletes complete. To avoid performance problems, stores must provide a BulkOperation implementation which remembers ancestors added. The DynamoDBMetastore does this.

Some of the new features are implemented as part of a gradual refactoring of the S3AFileSystem itself: the handling of partial delete failures is in its own class, org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport, which, rather than being given a reference back to the owning S3AFileSystem, is handed a StoreContext containing restricted attributes and callbacks. As this refactoring continues in future patches, and the different layers of a new store model are factored out, this will be extended.
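The progressive-tracking idea summarized above -record each copied file in the store as it lands, so a partial failure still leaves the store reflecting what really moved- might be sketched as follows; the class and method names are illustrative, not the actual Hadoop API:

```java
import java.util.*;

public class ProgressiveTrackerSketch {
    // Each update is appended immediately rather than buffered until the
    // end of the rename (which is what a delayed tracker would do). If the
    // rename dies partway, everything recorded so far is already durable.
    private final List<String> storeUpdates = new ArrayList<>();

    public void fileCopied(String destPath) {
        storeUpdates.add("put " + destPath);       // new entry, written right away
    }

    public void sourceDeleted(String srcPath) {
        storeUpdates.add("tombstone " + srcPath);  // recorded when bulk deletes complete
    }

    public List<String> updatesSoFar() {
        return new ArrayList<>(storeUpdates);
    }
}
```

A delayed-update variant would buffer the same calls and flush them only once at completion, which is exactly the behavior that left the store inconsistent after partial rename failures.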
Change-Id: Ie0bd96ab861f0f30170b75f78e5503fc0e929524