
HADOOP-18637: S3A to support upload of files greater than 2 GB using DiskBlocks #5481

Open · wants to merge 17 commits into trunk
Conversation

HarshitGupta11 (Contributor)

Description of PR

Use S3A DiskBlocks to support the upload of files greater than 2 GB. Currently, the maximum upload size of a single block is ~2 GB.
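A hedged sketch of how the new behaviour is driven from configuration, assuming the multipart-enable option name introduced by this PR alongside the existing fs.s3a.fast.upload.buffer option:

// sketch: disabling multipart uploads requires disk buffering, so a file
// of any size can be staged locally and then uploaded as one PUT
Configuration conf = new Configuration();
conf.setBoolean("fs.s3a.multipart.uploads.enabled", false);
conf.set("fs.s3a.fast.upload.buffer", "disk");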

How was this patch tested?

The patch was tested against us-west-2.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@mukund-thakur (Contributor) left a comment:

Added some comments.
Have you run all the buffer-type tests?
Add tests for setting and not setting the new configuration, validating the behavior:
when the option is not set, the upload should happen via multipart; otherwise, via a single PUT.

@@ -595,7 +596,7 @@ public void initialize(URI name, Configuration originalConf)
}
blockOutputBuffer = conf.getTrimmed(FAST_UPLOAD_BUFFER,
DEFAULT_FAST_UPLOAD_BUFFER);
partSize = ensureOutputParameterInRange(MULTIPART_SIZE, partSize);
//partSize = ensureOutputParameterInRange(MULTIPART_SIZE, partSize);
Contributor:

cut

@@ -1831,6 +1832,11 @@ private FSDataOutputStream innerCreateFile(
final PutObjectOptions putOptions =
new PutObjectOptions(keep, null, options.getHeaders());

if(!checkDiskBuffer(getConf())){
Contributor:

Just add a method validateOutputStreamConfiguration() and throw the exception in the implementation only.

@mukund-thakur (Contributor), Apr 6, 2023:

Just add a method validateOutputStreamConfiguration() and throw the exception in the implementation only.

This is still pending. I don't really mind leaving it as it is, but I think my suggestion is consistent with other parts of the code and is more readable.
CC @steveloughran
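A minimal sketch of the suggested refactor, assuming it lives in S3AFileSystem and wraps the existing checkDiskBuffer() logic; the names follow the reviewer's suggestion and the PathIOException detail is an assumption:

// sketch: validate the output stream configuration and throw inside
// the helper itself, instead of an inline if at the call site
private void validateOutputStreamConfiguration(final Path path,
    Configuration conf) throws PathIOException {
  if (!checkDiskBuffer(conf)) {
    throw new PathIOException(path.toString(),
        "Unable to create OutputStream with the given"
        + " multipart upload and buffer configuration.");
  }
}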

public static boolean checkDiskBuffer(Configuration conf){
boolean isAllowedMultipart = conf.getBoolean(ALLOW_MULTIPART_UPLOADS,
IS_ALLOWED_MULTIPART_UPLOADS_DEFAULT);
if (isAllowedMultipart) {
Contributor:

This is wrong here, I guess.
If isAllowedMultipart is enabled then FAST_UPLOAD_BUFFER must be disk, else we throw an error, right?

Contributor (author):

If multipart is disabled and the FAST_UPLOAD_BUFFER is not disk, then we throw an error.

@mukund-thakur (Contributor):

@steveloughran could you review this please? Thanks.

@steveloughran (Contributor) left a comment:

Did a review.

Add a test in ITestS3AConfiguration to verify that a forbidden config (multipart off and a non-disk buffer) raises an exception.
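A rough sketch of such a test, assuming the check surfaces at create() time; the test name, the use of S3ATestUtils.createTestFileSystem(), and the exact exception class are assumptions:

@Test
public void testMultipartOffNonDiskBufferRejected() throws Throwable {
  // forbidden combination: multipart disabled with a non-disk buffer
  conf = new Configuration();
  conf.setBoolean(Constants.MULTIPART_UPLOADS_ENABLED, false);
  conf.set(Constants.FAST_UPLOAD_BUFFER, Constants.FAST_UPLOAD_BYTEBUFFER);
  fs = S3ATestUtils.createTestFileSystem(conf);
  // LambdaTestUtils.intercept: expect the create call to fail
  intercept(IOException.class, () ->
      fs.create(new Path("/testMultipartOffNonDiskBufferRejected")));
}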

@steveloughran (Contributor) left a comment:

Final tuning.

* be used as the file size might be bigger than the buffer size that can be
* allocated.
* @param conf
* @return
Contributor:

Nit: document the conf argument and return value.

@@ -1859,7 +1859,7 @@ private FSDataOutputStream innerCreateFile(
.withPutOptions(putOptions)
.withIOStatisticsAggregator(
IOStatisticsContext.getCurrentIOStatisticsContext().getAggregator())
.withMultipartAllowed(getConf().getBoolean(
.withMultipartEnabled(getConf().getBoolean(
Contributor:

I think the multipart-enabled flag should be made a field and stored during initialize(), so we can save on scanning the conf map every time a file is created.
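A hedged sketch of that caching, using the constant names visible elsewhere in this PR:

// in S3AFileSystem.initialize(): read the flag once and keep it as a field
isMultipartUploadEnabled = conf.getBoolean(MULTIPART_UPLOADS_ENABLED,
    MULTIPART_UPLOAD_ENABLED_DEFAULT);

// in innerCreateFile(): use the field instead of re-reading the conf map
.withMultipartEnabled(isMultipartUploadEnabled)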

//First one being the creation of test/ directory marker
//Second being the creation of the file with tests3ascale/<file-name>
//Third being the creation of directory marker tests3ascale/ on the file delete
assertEquals(3L,
Contributor:

Use IOStatisticAssertions here; it generates AssertJ assertion chains from lookups, with automatically generated error text:

assertThatStatisticCounter(fs.getIOStatistics(), OBJECT_PUT_REQUESTS.getSymbol())
  .isEqualTo(3);

@mukund-thakur (Contributor) left a comment:

Looking good; added some minor comments.

Seems like these points still need to be addressed as discussed before.

  1. Error in staging committer based on new config.
  2. Error in magic committer based on new config.
  3. Error in write operations helper based on new config.

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 46m 46s Maven dependency ordering for branch
+1 💚 mvninstall 26m 12s trunk passed
+1 💚 compile 23m 1s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 20m 33s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 3m 40s trunk passed
+1 💚 mvnsite 2m 38s trunk passed
+1 💚 javadoc 1m 53s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 38s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 4m 4s trunk passed
+1 💚 shadedclient 21m 0s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 21m 24s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for patch
+1 💚 mvninstall 1m 33s the patch passed
+1 💚 compile 22m 32s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 22m 32s the patch passed
+1 💚 compile 20m 25s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 20m 25s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 3m 34s /results-checkstyle-root.txt root: The patch generated 4 new + 9 unchanged - 0 fixed = 13 total (was 9)
+1 💚 mvnsite 2m 38s the patch passed
+1 💚 javadoc 1m 44s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
-1 ❌ javadoc 0m 47s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u362-ga-0ubuntu120.04.1-b09 with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu120.04.1-b09 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 spotbugs 4m 12s the patch passed
+1 💚 shadedclient 21m 25s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 31s hadoop-common in the patch passed.
-1 ❌ unit 2m 25s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 1m 1s The patch does not generate ASF License warnings.
258m 23s
Reason Tests
Failed junit tests hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter
hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing
hadoop.fs.s3a.commit.staging.TestStagingCommitter
hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit
hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit
hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/6/artifact/out/Dockerfile
GITHUB PR #5481
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 4db319f228cd 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4e922b4
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/6/testReport/
Max. process+thread count 2152 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 1s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 16m 4s Maven dependency ordering for branch
+1 💚 mvninstall 25m 45s trunk passed
+1 💚 compile 23m 10s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 20m 57s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 3m 42s trunk passed
+1 💚 mvnsite 2m 34s trunk passed
+1 💚 javadoc 1m 39s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 23s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 4m 1s trunk passed
+1 💚 shadedclient 20m 49s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 21m 12s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 1m 30s the patch passed
+1 💚 compile 22m 28s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 22m 28s the patch passed
+1 💚 compile 20m 36s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 20m 36s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 3m 31s /results-checkstyle-root.txt root: The patch generated 4 new + 9 unchanged - 0 fixed = 13 total (was 9)
+1 💚 mvnsite 2m 38s the patch passed
+1 💚 javadoc 1m 45s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
-1 ❌ javadoc 0m 46s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u362-ga-0ubuntu120.04.1-b09 with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu120.04.1-b09 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 spotbugs 4m 9s the patch passed
+1 💚 shadedclient 20m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 30s hadoop-common in the patch passed.
-1 ❌ unit 2m 19s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 1m 1s The patch does not generate ASF License warnings.
226m 22s
Reason Tests
Failed junit tests hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing
hadoop.fs.s3a.commit.staging.TestStagingCommitter
hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale
hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit
hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter
hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/7/artifact/out/Dockerfile
GITHUB PR #5481
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 2cbac9757e23 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 13fc2d5
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/7/testReport/
Max. process+thread count 1291 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran (Contributor) left a comment:

Some minor comments.

Now, what about multipart uploads (as you mentioned to me offline)?

  1. The request factory changes guarantee it won't work, but it would be good to have it fail fast.
  2. s3afs.createMultipartUploader() should fail the way it does with isCSEEnabled; add a test to verify this.

Other than that, all looks great!

protected Configuration createScaleConfiguration() {
Configuration configuration = super.createScaleConfiguration();
configuration.setBoolean(Constants.MULTIPART_UPLOADS_ENABLED, false);
configuration.setLong(MULTIPART_SIZE, 53687091200L);
Contributor:

Is this some special value? If so, make it a constant and explain what it is.
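For reference, 53687091200 bytes is exactly 50 GiB; a sketch of the suggested constant (the name is hypothetical):

// 50 GiB: a part size far above the ~2 GB single-block limit, so with
// multipart disabled the upload can only succeed as a single PUT
public static final long SINGLE_PUT_PART_SIZE = 50L * 1024 * 1024 * 1024;

// usage in createScaleConfiguration():
configuration.setLong(MULTIPART_SIZE, SINGLE_PUT_PART_SIZE);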

@mukund-thakur (Contributor) left a comment:

Almost ready to go. Some minor tuning.

Path commitPath = getFileSystem().makeQualified(
new Path(getContract().getTestPath(), "/testpath"));
LOG.debug("{}", commitPath);
assertThrows(PathCommitException.class,
Contributor:

Same as above: use intercept() here.
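A hedged sketch of the intercept() form from LambdaTestUtils; the lambda body is an assumption standing in for whatever call assertThrows currently wraps:

// intercept() returns the caught exception and generates the failure
// text automatically, unlike a bare assertThrows
intercept(PathCommitException.class, () ->
    // hypothetical body: the committer construction under test
    createCommitter(commitPath, taskContext));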


@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 16m 17s Maven dependency ordering for branch
+1 💚 mvninstall 26m 43s trunk passed
+1 💚 compile 23m 6s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 20m 36s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 3m 47s trunk passed
+1 💚 mvnsite 2m 41s trunk passed
+1 💚 javadoc 1m 53s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 4m 3s trunk passed
+1 💚 shadedclient 20m 58s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 21m 22s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for patch
+1 💚 mvninstall 1m 31s the patch passed
+1 💚 compile 22m 21s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 22m 21s the patch passed
+1 💚 compile 20m 33s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 20m 33s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 3m 38s /results-checkstyle-root.txt root: The patch generated 3 new + 9 unchanged - 0 fixed = 12 total (was 9)
+1 💚 mvnsite 2m 38s the patch passed
+1 💚 javadoc 1m 45s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
-1 ❌ javadoc 0m 48s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u362-ga-0ubuntu120.04.1-b09 with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu120.04.1-b09 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 spotbugs 4m 9s the patch passed
+1 💚 shadedclient 20m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 21s hadoop-common in the patch passed.
-1 ❌ unit 2m 22s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 1m 2s The patch does not generate ASF License warnings.
227m 52s
Reason Tests
Failed junit tests hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter
hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing
hadoop.fs.s3a.commit.staging.TestStagingCommitter
hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit
hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit
hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/8/artifact/out/Dockerfile
GITHUB PR #5481
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 665b3a783820 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1476424
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/8/testReport/
Max. process+thread count 1259 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/8/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 45s trunk passed
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 38s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 0m 44s trunk passed
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 18s trunk passed
+1 💚 shadedclient 20m 25s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 20m 44s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 38s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 38s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 29s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 0m 18s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 3 new + 9 unchanged - 0 fixed = 12 total (was 9)
+1 💚 mvnsite 0m 34s the patch passed
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
-1 ❌ javadoc 0m 25s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u362-ga-0ubuntu120.04.1-b09 with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu120.04.1-b09 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 spotbugs 1m 6s the patch passed
+1 💚 shadedclient 20m 13s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 2m 5s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
93m 27s
Reason Tests
Failed junit tests hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing
hadoop.fs.s3a.commit.staging.TestStagingCommitter
hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale
hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit
hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter
hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/9/artifact/out/Dockerfile
GITHUB PR #5481
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 97ba7bf3ee91 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f18c0cb
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/9/testReport/
Max. process+thread count 758 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/9/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@mukund-thakur (Contributor) left a comment:

Have you run all the AWS integration tests, @HarshitGupta11? A lot of tests are failing.

@@ -369,6 +373,8 @@ private synchronized void uploadCurrentBlock(boolean isLast)
*/
@Retries.RetryTranslated
private void initMultipartUpload() throws IOException {
Preconditions.checkState(!isMultipartUploadEnabled,
Contributor:

This is wrong.
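The condition appears inverted: the guard should pass when multipart upload is enabled and fail when it is disabled. A sketch of the corrected check (my reading of the intent; the message matches the stack traces below):

// fail fast if something reaches multipart initialization while
// multipart uploads are disabled on this filesystem
Preconditions.checkState(isMultipartUploadEnabled,
    "multipart upload is disabled");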

Contributor:

+1

@mehakmeet (Contributor) left a comment:

Ran the AWS test suite on default settings and am seeing a lot of failures.
Some notable ones:

[ERROR] testAbortAfterTwoPartUpload(org.apache.hadoop.fs.s3a.scale.ITestS3AMultipartUploadSizeLimits)  Time elapsed: 9.331 s  <<< FAILURE!
java.lang.AssertionError: upload must not have completed: unexpectedly found s3a://mehakmeet-singh-data/fork-0001/test/testAbortAfterTwoPartUpload as  S3AFileStatus{path=s3a://mehakmeet-singh-data/fork-0001/test/testAbortAfterTwoPartUpload; isDirectory=false; length=5242880; replication=1; blocksize=33554432; modification_time=1681199546000; access_time=0; owner=mehakmeet.singh; group=mehakmeet.singh; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=7afe7425a06fe7b3eec28c310e4b5a7e versionId=null
	at org.junit.Assert.fail(Assert.java:89)
	at org.apache.hadoop.fs.contract.ContractTestUtils.assertPathDoesNotExist(ContractTestUtils.java:1018)
	at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertPathDoesNotExist(AbstractFSContractTestBase.java:330)
	at org.apache.hadoop.fs.s3a.scale.ITestS3AMultipartUploadSizeLimits.testAbortAfterTwoPartUpload(ITestS3AMultipartUploadSizeLimits.java:158)

I'm a little skeptical about this one; it could be my own machine/config issue, so I would like others to run it once (it still fails when run alone):

[ERROR] Failures:
[ERROR]   ITestS3ACommitterMRJob.test_200_execute:295->Assert.fail:89 Job job_1681203442293_0003 failed in state FAILED with cause Task failed task_1681203442293_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
.
Consult logs under /Users/mehakmeet.singh/workstation/osource/hadoop-trunk-review/hadoop/hadoop-tools/hadoop-aws/target/test/data/yarn-2023-04-11-14.26.54.78/yarn-2011546580

This one Mukund has highlighted already:

[ERROR] testCommitterWithDuplicatedCommit(org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocol)  Time elapsed: 6.838 s  <<< ERROR!
java.lang.IllegalStateException: multipart upload is disabled
	at org.apache.hadoop.util.Preconditions.checkState(Preconditions.java:269)
	at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.initMultipartUpload(S3ABlockOutputStream.java:376)
	at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.<init>(S3ABlockOutputStream.java:209)
[ERROR] testReplaceWithDeleteFailure(org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit)  Time elapsed: 3.09 s  <<< ERROR!
org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://bucket-name/output/path': Multipart uploads are disabled for the FileSystem, the committer can't proceed.
	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.<init>(AbstractS3ACommitter.java:221)

There are a few more failures, but it's best to run the test suite and debug the causes from there.

@@ -217,6 +217,10 @@ protected AbstractS3ACommitter(
LOG.debug("{} instantiated for job \"{}\" ID {} with destination {}",
role, jobName(context), jobIdString(context), outputPath);
S3AFileSystem fs = getDestS3AFS();
if (!fs.isMultipartUploadEnabled()) {
Contributor:

So we want to fail any S3A committer initialization if multipart is disabled? IIRC the magic committer does require multipart, but should we be failing for the others as well? CC @steveloughran

Also, it seems like a lot of tests are failing when I run the suite on default props (by default this should be true and not fail here); it could be due to the UTs using MockS3AFileSystem, which doesn't actually initialize and set the variable.

Contributor:

They all use multipart uploads, as that is how they write-but-don't-commit the data. This is something Harshit and I worked on.


@steveloughran (Contributor) left a comment:

Minor tunings for the production code, and a bit for the testing too.

@@ -414,6 +414,11 @@ public class S3AFileSystem extends FileSystem implements StreamCapabilities,
*/
private ArnResource accessPoint;

/**
* Is this S3A FS instance has multipart uploads enabled?
Contributor:

Grammar nit:
"Is multipart upload enabled?"

@@ -1854,7 +1863,8 @@ private FSDataOutputStream innerCreateFile(
.withCSEEnabled(isCSEEnabled)
.withPutOptions(putOptions)
.withIOStatisticsAggregator(
IOStatisticsContext.getCurrentIOStatisticsContext().getAggregator());
IOStatisticsContext.getCurrentIOStatisticsContext().getAggregator())
.withMultipartEnabled(isMultipartUploadEnabled);
Contributor:

Nit: indentation; this should be aligned with the .with call above.

MULTIPART_UPLOAD_ENABLED_DEFAULT);
if (isMultipartUploadEnabled) {
return true;
} else if (!isMultipartUploadEnabled && conf.get(FAST_UPLOAD_BUFFER)
Contributor:

Can be simplified to:

return isMultipartUploadEnabled
    || FAST_UPLOAD_BUFFER_DISK.equals(conf.get(FAST_UPLOAD_BUFFER, DEFAULT_FAST_UPLOAD_BUFFER));

That default in conf.get() is critical to prevent NPEs if the option is unset; moving the constant first is even more rigorous.

@Nullable final PutObjectOptions options) {
@Nullable final PutObjectOptions options) throws IOException {
if (!isMultipartUploadEnabled) {
throw new IOException("Multipart uploads are disabled on the given filesystem.");
Contributor:

Make this a PathIOException and include destKey; that gives a bit more detail:

throw new PathIOException(destKey, "Multipart uploads are disabled");

@hadoop-yetus:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 42s trunk passed
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 38s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 0m 46s trunk passed
+1 💚 javadoc 0m 33s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 20s trunk passed
+1 💚 shadedclient 20m 35s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 20m 53s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 6s the patch passed
+1 💚 shadedclient 20m 21s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 2m 5s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
93m 39s
Reason Tests
Failed junit tests hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter
hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing
hadoop.fs.s3a.commit.staging.TestStagingCommitter
hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit
hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit
hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/10/artifact/out/Dockerfile
GITHUB PR #5481
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 412427820e6b 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7207fdd
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/10/testReport/
Max. process+thread count 626 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5481/10/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran (Contributor):

Added a change in #5543 to pull in here.

Now, what do we do for large file renames? Currently the transfer manager uses that part size to trigger the use of MPUs in renames; it doesn't use our request factory, so it won't surface.

We could add a modified auditor which would trigger an exception on any MPU initialisation POST, then make sure the huge file renames don't trigger it...

@steveloughran (Contributor):

Or just bypass the transfer manager entirely in this world and do a single copy request?
