HADOOP-18637. S3A to support upload of files greater than 2 GB using DiskBlocks #5543

Merged

steveloughran
Contributor

@steveloughran steveloughran commented Apr 11, 2023

Description of PR

#5481, with an extra commit to wrap it up.

Unlimited disk block size:

  • the disk block size for allocation requests is set to -1
  • this turns off capacity checks in the allocator
  • disk blocks no longer check for or report lack of space
  • the block output stream knows not to check for running out of space
  • tests to demonstrate this
  • I had to edit pom.xml to always get the full stack trace.
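The unlimited-block behaviour in the list above can be sketched in a few lines of Java. This is a minimal, hypothetical illustration, not the S3A source: the class `DiskBlockSketch`, its `UNLIMITED` constant and its methods are assumptions, loosely modelled on the `remainingCapacity()` method discussed in review. A negative limit disables every capacity check, so a single block can accumulate more than 2 GB.

```java
/**
 * Hypothetical sketch (not the actual S3A classes) of a disk-backed upload
 * block whose size limit may be "unlimited", signalled by a negative limit.
 */
class DiskBlockSketch {

  /** Assumed marker passed in allocation requests meaning "no size limit". */
  static final long UNLIMITED = -1;

  private final long limit;
  private long bytesWritten;

  DiskBlockSketch(long limit) {
    this.limit = limit;
  }

  /** A negative limit turns off all capacity checks. */
  boolean unlimited() {
    return limit < 0;
  }

  /**
   * Remaining capacity. For an unlimited block this sketch reports
   * Integer.MAX_VALUE so that callers narrowing the result to int
   * still see a positive value.
   */
  long remainingCapacity() {
    return unlimited() ? Integer.MAX_VALUE : limit - bytesWritten;
  }

  /** True if the block can accept the given number of bytes. */
  boolean hasCapacity(long bytes) {
    return unlimited() || bytes <= remainingCapacity();
  }

  /** Records a write of up to len bytes; returns how many were accepted. */
  int write(int len) {
    int accepted = (int) Math.min(len, remainingCapacity());
    bytesWritten += accepted;
    return accepted;
  }

  long dataSize() {
    return bytesWritten;
  }

  public static void main(String[] args) {
    DiskBlockSketch bounded = new DiskBlockSketch(10);
    System.out.println(bounded.write(16));       // 10: capped at the limit
    System.out.println(bounded.hasCapacity(1));  // false: block is full

    DiskBlockSketch unbounded = new DiskBlockSketch(UNLIMITED);
    for (int i = 0; i < 3; i++) {
      unbounded.write(Integer.MAX_VALUE);        // never capped
    }
    System.out.println(unbounded.dataSize());    // 6442450941, well over 2 GB
    System.out.println(unbounded.hasCapacity(Long.MAX_VALUE)); // true
  }
}
```

Only the bounded case ever caps a write; the unlimited block keeps accepting data, which mirrors the "allocator and block no longer check for space" items above.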

How was this patch tested?

In progress against S3 London (eu-west-2).

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

HarshitGupta and others added 18 commits March 13, 2023 14:40
* disk block size for allocation requests => -1
* this turns off capacity checks on allocator
* and disk blocks no longer worry about/report lack of space
* block output stream knows not to worry about running out of space
* tests to show this

+ had to edit pom.xml to always get the full stack trace.

Change-Id: I97374a046481165489274fa83202f6b1ebc3bafa
@steveloughran
Contributor Author

Test failures:

[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   ITestS3APrefetchingInputStream.testReadLargeFileFully:143 [Maxiumum named action_executor_acquired.max] 
Expecting:
 <0L>
to be greater than:
 <0L> 
[ERROR] Errors: 
[ERROR]   ITestS3ABucketExistence.testAccessPointProbingV2:171->expectUnknownStore:103->lambda$testAccessPointProbingV2$12:172 » IllegalArgument
[ERROR]   ITestS3ABucketExistence.testAccessPointRequired:188->expectUnknownStore:103->lambda$testAccessPointRequired$14:189 » IllegalArgument
[INFO] 

and


[ERROR] testReadLargeFileFully(org.apache.hadoop.fs.s3a.ITestS3APrefetchingInputStream)  Time elapsed: 89.65 s  <<< FAILURE!
java.lang.AssertionError: 
[Maxiumum named action_executor_acquired.max] 
Expecting:
 <0L>
to be greater than:
 <0L> 
        at org.apache.hadoop.fs.s3a.ITestS3APrefetchingInputStream.testReadLargeFileFully(ITestS3APrefetchingInputStream.java:143)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.lang.Thread.run(Thread.java:829)

The bucket ones look unrelated; they seem to be caused by my endpoint being set to eu-west-2:


[ERROR] testAccessPointRequired(org.apache.hadoop.fs.s3a.ITestS3ABucketExistence)  Time elapsed: 0.715 s  <<< ERROR!
java.lang.IllegalArgumentException: The region field of the ARN being passed as a bucket parameter to an S3 operation does not match the region the client was configured with. Provided region: 'eu-west-1'; client region: 'eu-west-2'.

Nothing has changed there, and I don't see any explicit setting of the region other than for the explicitly named buckets. I will need to test on hadoop-trunk to see if something else has changed.
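The exception comes from a region consistency check: an S3 access point ARN has the layout `arn:partition:service:region:account-id:resource`, and its region field must match the region the client was built for. The sketch below is a hypothetical reimplementation of that idea for illustration only; `ArnRegionCheck`, its methods, and the account ID and access point name in `main` are assumptions, not the AWS SDK's code.

```java
/**
 * Hypothetical illustration of the check behind the IllegalArgumentException:
 * the region embedded in an access point ARN must match the client's region.
 */
class ArnRegionCheck {

  /** Extracts the region from arn:partition:service:region:account-id:resource. */
  static String regionOf(String arn) {
    String[] fields = arn.split(":", 6);
    if (fields.length < 6 || !"arn".equals(fields[0])) {
      throw new IllegalArgumentException("Not an ARN: " + arn);
    }
    return fields[3];
  }

  /** Rejects an ARN whose region differs from the client's configured region. */
  static void checkRegion(String arn, String clientRegion) {
    String arnRegion = regionOf(arn);
    if (!arnRegion.equals(clientRegion)) {
      throw new IllegalArgumentException(String.format(
          "ARN region '%s' does not match client region '%s'",
          arnRegion, clientRegion));
    }
  }

  public static void main(String[] args) {
    // Placeholder account ID and access point name.
    String arn = "arn:aws:s3:eu-west-1:123456789012:accesspoint/example-ap";
    System.out.println(regionOf(arn));  // eu-west-1
    checkRegion(arn, "eu-west-1");      // matching regions: no exception
    try {
      checkRegion(arn, "eu-west-2");    // the mismatch seen in the failing tests
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the test endpoint set to eu-west-2, any probe of an eu-west-1 access point ARN would fail this kind of check before a request is sent, which is consistent with the failures appearing on trunk as well.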

@steveloughran
Contributor Author

OK, the trunk run failed too, with the same bucket probe errors. The other failure did not recur; maybe it's timing-related.

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   ITestS3ABucketExistence.testAccessPointProbingV2:171->expectUnknownStore:103->lambda$testAccessPointProbingV2$12:172 » IllegalArgument
[ERROR]   ITestS3ABucketExistence.testAccessPointRequired:188->expectUnknownStore:103->lambda$testAccessPointRequired$14:189 » IllegalArgument
[INFO] 
[ERROR] Tests run: 1158, Failures: 0, Errors: 2, Skipped: 52
[INFO] 

@steveloughran steveloughran changed the title S3/hadoop 18637 huge s3 files HADOOP-18637. S3A to support upload of files greater than 2 GB using DiskBlocks Apr 11, 2023
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 7s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 20s trunk passed
+1 💚 shadedclient 20m 43s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 21m 2s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 19s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 2 new + 9 unchanged - 0 fixed = 11 total (was 9)
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 5s the patch passed
+1 💚 shadedclient 20m 12s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 31s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
94m 47s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/1/artifact/out/Dockerfile
GITHUB PR #5543
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname Linux dbd69a77967d 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d990781
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/1/testReport/
Max. process+thread count 737 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@mukund-thakur mukund-thakur self-requested a review April 11, 2023 21:34
 @Override
 long remainingCapacity() {
-  return limit - bytesWritten;
+  return unlimited()
Contributor

remainingCapacity() returns a long, so shouldn't it be Long.MAX_VALUE?

Contributor

@mukund-thakur mukund-thakur Apr 11, 2023

Although I see we always cast it to int, so it should be fine. I think it is like that because we write the big file to disk in a loop.
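The int-cast point in this exchange can be checked directly: narrowing a long to an int keeps only the low 32 bits, so `(int) Long.MAX_VALUE` is -1, whereas `Integer.MAX_VALUE` survives the round trip. The caller `looksFull` below is a hypothetical pattern chosen for illustration, not code from the patch.

```java
/**
 * Shows why an "unlimited" capacity of Long.MAX_VALUE would be unsafe for
 * callers that narrow it to int, while Integer.MAX_VALUE is not.
 */
class CapacityCastDemo {

  /** Hypothetical caller that narrows the capacity straight to an int. */
  static boolean looksFull(long remainingCapacity) {
    int space = (int) remainingCapacity;  // keeps only the low 32 bits
    return space <= 0;
  }

  public static void main(String[] args) {
    System.out.println((int) Long.MAX_VALUE);          // -1
    System.out.println(looksFull(Long.MAX_VALUE));     // true: "unlimited" misread as full
    System.out.println(looksFull(Integer.MAX_VALUE));  // false
  }
}
```

So if callers always narrow to int, reporting Integer.MAX_VALUE for an unlimited block is the safer choice, and large files are still handled because data is written in a loop of int-sized chunks.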

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 47s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 48s trunk passed
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 33s trunk passed
+1 💚 mvnsite 0m 48s trunk passed
+1 💚 javadoc 0m 30s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 36s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 21s trunk passed
+1 💚 shadedclient 20m 21s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 20m 39s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 32s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 31s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 37s the patch passed
+1 💚 javadoc 0m 14s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 12s the patch passed
+1 💚 shadedclient 20m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 30s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
94m 57s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/2/artifact/out/Dockerfile
GITHUB PR #5543
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname Linux 5a06c41cf424 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9f07eba
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/2/testReport/
Max. process+thread count 729 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 48s trunk passed
+1 💚 compile 0m 45s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 41s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 0m 48s trunk passed
+1 💚 javadoc 0m 26s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 31s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 23s trunk passed
+1 💚 shadedclient 20m 34s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 20m 53s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 33s the patch passed
+1 💚 compile 0m 39s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 39s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 31s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 14s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 20m 6s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 35s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
95m 22s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/3/artifact/out/Dockerfile
GITHUB PR #5543
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname Linux 5ccf2de415b0 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9f07eba
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/3/testReport/
Max. process+thread count 700 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5543/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@mukund-thakur mukund-thakur left a comment

LGTM +1. Ran aws tests in us-west-1. All good.

@mukund-thakur mukund-thakur merged commit 7c3d94a into apache:trunk Apr 11, 2023
@steveloughran
Contributor Author

I have a follow-up for this feature, primarily to reject multipart copy requests when multipart uploads are disabled, plus a test to verify that calls don't get rejected when the threshold is large enough.
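The follow-up's rule can be sketched as a small guard. Everything here is a hypothetical illustration under stated assumptions: the class `MultipartCopyGuard`, its fields, the use of `UnsupportedOperationException`, and the rule that a copy needs multipart once its size exceeds the threshold are choices made for this sketch, not the follow-up's actual code.

```java
/**
 * Hypothetical guard: reject a copy that would need a multipart copy when
 * multipart operations are disabled; copies at or below the threshold pass.
 */
class MultipartCopyGuard {

  private final boolean multipartEnabled;
  private final long multipartThreshold;

  MultipartCopyGuard(boolean multipartEnabled, long multipartThreshold) {
    this.multipartEnabled = multipartEnabled;
    this.multipartThreshold = multipartThreshold;
  }

  /** Assumed rule: objects larger than the threshold need a multipart copy. */
  boolean requiresMultipart(long size) {
    return size > multipartThreshold;
  }

  /** Throws if the copy would need multipart while multipart is disabled. */
  void checkCopyAllowed(long size) {
    if (!multipartEnabled && requiresMultipart(size)) {
      throw new UnsupportedOperationException(
          "Multipart copy of " + size + " bytes rejected: multipart is disabled");
    }
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024;
    MultipartCopyGuard disabled = new MultipartCopyGuard(false, 128 * mb);
    disabled.checkCopyAllowed(64 * mb);            // below threshold: allowed
    try {
      disabled.checkCopyAllowed(5 * 1024 * mb);    // 5 GB: rejected
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
    // A large enough threshold means the same copy is not rejected.
    new MultipartCopyGuard(false, Long.MAX_VALUE).checkCopyAllowed(5 * 1024 * mb);
  }
}
```

The last call corresponds to the proposed test: with a large enough threshold, the guard never fires.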

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Apr 18, 2023
…DiskBlocks (apache#5543)

Contributed By: HarshitGupta and Steve Loughran
@steveloughran
Contributor Author

@HarshitGupta11 create a new PR with your change for Yetus to review, then we can merge through the GitHub UI. No need for code reviews, unless they relate to the backport itself.

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request May 8, 2023
…DiskBlocks (apache#5543)

Contributed By: HarshitGupta and Steve Loughran
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request May 10, 2023
…DiskBlocks (apache#5543)

Contributed By: HarshitGupta and Steve Loughran
ferdelyi pushed a commit to ferdelyi/hadoop that referenced this pull request May 26, 2023
…DiskBlocks (apache#5543)


Contributed By: HarshitGupta and Steve Loughran

3 participants