HADOOP-17559. S3guard import OOM. #2734

Closed


steveloughran
Contributor

Remove all tracking of files from DDB AncestorState; dirs in import tool.

Reduces size of the cache to O(dirs).
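To illustrate why dropping file entries bounds the cache at O(dirs), here is a minimal sketch. It is not the Hadoop `AncestorState` code: the class `DirOnlyAncestorState`, its methods, and the plain-string path handling are all hypothetical stand-ins chosen for the example.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for an ancestor-tracking cache.
 * Assumption: paths are absolute, '/'-separated strings; this is
 * not the S3Guard implementation.
 */
final class DirOnlyAncestorState {
  /** Only directories are remembered, so size is O(dirs), not O(files). */
  private final Set<String> dirs = new HashSet<>();

  /**
   * Record the parent directories of a file. The file path itself
   * is never stored.
   * @return the number of directories newly added
   */
  int addAncestorsOf(String filePath) {
    int added = 0;
    String parent = parentOf(filePath);
    // Walk upwards; stop at the root or at the first directory already
    // known, since its own ancestors were recorded when it was added.
    while (parent != null && dirs.add(parent)) {
      added++;
      parent = parentOf(parent);
    }
    return added;
  }

  int size() {
    return dirs.size();
  }

  boolean contains(String path) {
    return dirs.contains(path);
  }

  /** Parent of a path, or null when the path is directly under the root. */
  private static String parentOf(String path) {
    int slash = path.lastIndexOf('/');
    return slash <= 0 ? null : path.substring(0, slash);
  }
}
```

In this sketch, importing `/bucket/a/b/file1` and then `/bucket/a/b/file2` into an empty instance leaves three entries (`/bucket/a/b`, `/bucket/a`, `/bucket`), however many files sit in that directory.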

Test change is to reduce brittleness to clock skew on loaded test runs;
removes an intermittent failure where the existence assert was triggering
an S3Guard update, which then broke the assert about the number of writes.

Change-Id: I9251f64beb0fec225b0b4ba71bc16f3e116bc758

@steveloughran
Contributor Author

Tested: S3 London with S3Guard. Some failures I'm fixing in the audit patch; the only new failure was the intermittent one in ITestCommitOperations, which I suspected was a regression but tracked down in the logs to clock skew: the initial PUT of the output file uses the local clock, but S3 uses its own clock, and on a HEAD request this is updated.
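The interaction described above can be reproduced in a toy model. This is a sketch under stated assumptions, not S3Guard code: `SkewDemoStore`, its methods, and the millisecond timestamps are hypothetical, and taking the metric baseline after the existence probe is only one possible way to make such a test robust, not necessarily what this patch does.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of a metadata cache whose entries are stamped with the
 * client clock on PUT and corrected to the store's clock on a probe.
 * Hypothetical; for illustration only.
 */
final class SkewDemoStore {
  private final Map<String, Long> cachedModTime = new HashMap<>();
  private long recordWrites;

  /** PUT: the cache entry is stamped with the local (client) clock. */
  void put(String path, long localClockMillis) {
    cachedModTime.put(path, localClockMillis);
    recordWrites++;
  }

  /**
   * HEAD-style existence probe: if the store's timestamp differs from
   * the cached one, the entry is rewritten, which counts as a write.
   */
  boolean exists(String path, long serverClockMillis) {
    Long cached = cachedModTime.get(path);
    if (cached == null) {
      return false;
    }
    if (cached != serverClockMillis) {
      cachedModTime.put(path, serverClockMillis);
      recordWrites++;
    }
    return true;
  }

  /** Counter playing the role of a record-writes metric. */
  long recordWrites() {
    return recordWrites;
  }
}
```

In this model, a baseline taken before the probe sees one extra write whenever the two clocks disagree, while a baseline taken after the probe does not, because the entry already carries the store's timestamp.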

@steveloughran
Contributor Author

Stack from the test failure fixed here, as seen in the audit PR. Shows it's not directly related to this, though I had to step through to make sure.

java.lang.AssertionError: Number of records written after commit #2; first commit had 4; first commit ancestors CommitContext{operationState=AncestorState{operation=Commitid=55; dest=s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles/out; size=6; paths={s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles/out/file1 s3a://stevel-london/fork-0001 s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles/out s3a://stevel-london/fork-0001/test s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles}}}; second commit ancestors: CommitContext{operationState=AncestorState{operation=Commitid=55; dest=s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles/out; size=8; paths={s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles/out/file1 s3a://stevel-london/fork-0001 s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles/out s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles/out/subdir s3a://stevel-london/fork-0001/test s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles s3a://stevel-london/fork-0001/test/DELAY_LISTING_ME/testBulkCommitFiles/out/subdir/file2}}}: s3guard_metadatastore_record_writes expected:<2> but was:<3>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.apache.hadoop.fs.s3a.S3ATestUtils$MetricDiff.assertDiffEquals(S3ATestUtils.java:1001)
	at org.apache.hadoop.fs.s3a.commit.ITestCommitOperations.testBulkCommitFiles(ITestCommitOperations.java:722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

@steveloughran
Contributor Author

+Add a test to verify we can import with spaces in the filenames

@steveloughran steveloughran added the fs/s3 changes related to hadoop-aws; submitter must declare test endpoint label May 27, 2021
@apache apache deleted a comment from hadoop-yetus Oct 15, 2021
@apache apache deleted a comment from hadoop-yetus Oct 15, 2021
@apache apache deleted a comment from hadoop-yetus Oct 15, 2021
@steveloughran
Contributor Author

closing as unmerged; leaving the PR up
