
HADOOP-18073. Backport of SDK V2 upgrade. #6303

Conversation

@ahmarsuhail (Contributor) commented Nov 28, 2023

Description of PR

This backports SDK V2 to 3.4.0.

HADOOP-18778 - Fixes failing tests when CSE is enabled. (#5763)

HADOOP-18073 - S3A: Upgrade AWS SDK to V2 (#5995)

HADOOP-18888 - createS3AsyncClient() always enables multipart uploads (#6056)

HADOOP-18889 - S3A v2 SDK third party support (#6141)

HADOOP-18932 - Upgrade AWS v2 SDK to 2.20.160 and v1 to 1.12.565 (#6178)

HADOOP-18939 - NPE in AWS v2 SDK RetryOnErrorCodeCondition.shouldRetry() (#6193)

HADOOP-18908 - Improve S3A region handling. (#6187)

HADOOP-18946 - TestErrorTranslation failure (#6205)

HADOOP-18945 - IAMInstanceCredentialsProvider failing. (#6202)

HADOOP-18889 - Third party storage followup. (#6186)

HADOOP-18948 - Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete (#6218)

HADOOP-18850 - Enable dual-layer server-side encryption with AWS KMS keys (#6140)

How was this patch tested?

Tested in eu-west-1 with `mvn -Dparallel-tests -DtestsThreadCount=16 clean verify` and scale tests enabled. All good.

ahmarsuhail and others added 11 commits November 27, 2023 12:15
Contributed By: Ahmar Suhail <ahmarsu@amazon.co.uk>
This patch migrates the S3A connector to use the V2 AWS SDK.

This is a significant change at the source code level.
Any applications using the internal extension/override points in
the filesystem connector are likely to break.

This includes but is not limited to:
- Code invoking methods on the S3AFileSystem class
  which used classes from the V1 SDK.
- The ability to define the factory for the `AmazonS3` client, and
  to retrieve it from the S3AFileSystem. There is a new factory
  API and a special interface S3AInternals to access a limited
  set of internal classes and operations (see the sketch after this list).
- Delegation token and auditing extensions.
- Classes trying to integrate with the AWS SDK.
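
As an illustration of the S3AInternals point above, here is a minimal sketch of the new access pattern. The accessor names `getS3AInternals()` and `getAmazonS3Client(String)` are assumptions here; consult the aws_sdk_upgrade document for the exact API in your release.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

import software.amazon.awssdk.services.s3.S3Client;

public class S3AInternalsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    S3AFileSystem fs =
        (S3AFileSystem) FileSystem.get(new URI("s3a://example-bucket/"), conf);
    // Direct retrieval of the client from S3AFileSystem is gone; the V2
    // S3Client is reached through the limited S3AInternals interface.
    // Method names here are assumed, not verified against every release.
    S3Client s3 = fs.getS3AInternals().getAmazonS3Client("diagnostics");
    System.out.println(s3.serviceName());
  }
}
```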

All standard V1 credential providers listed in the option
fs.s3a.aws.credentials.provider will be automatically remapped to their
V2 equivalent.
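
A minimal sketch of that remapping in practice; the specific V1 class below is an assumed example of a standard provider, and aws_sdk_upgrade has the full details.

```java
import org.apache.hadoop.conf.Configuration;

public class CredentialRemapSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // A V1 SDK provider class name; S3A remaps standard V1 names like this
    // to their V2 equivalents, so existing configurations keep working.
    // The choice of provider here is illustrative only.
    conf.set("fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.EnvironmentVariableCredentialsProvider");
  }
}
```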

Other V1 Credential Providers are supported, but only if the V1 SDK is
added back to the classpath.

The SDK Signing plugin has changed; all v1 signers are incompatible.
There is no support for the S3 "v2" signing algorithm.

Finally, the aws-sdk-bundle JAR has been replaced by the shaded V2
equivalent, "bundle.jar", which is now exported by the hadoop-aws module.

Consult the document aws_sdk_upgrade for the full details.

Contributed by Ahmar Suhail + some bits by Steve Loughran
HADOOP-18888. createS3AsyncClient() always enables multipart uploads (apache#6056)

* The multipart flag fs.s3a.multipart.uploads.enabled is passed to the async client created (see the sketch below)
* S3A connector bypasses the transfer manager entirely if disabled or for small files.
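
A minimal sketch of turning the flag off; the flag name is quoted from the commit text above, everything else is illustrative:

```java
import org.apache.hadoop.conf.Configuration;

public class MultipartFlagSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // With multipart disabled, the flag is passed through to the async
    // client and the connector bypasses the transfer manager.
    conf.setBoolean("fs.s3a.multipart.uploads.enabled", false);
    // ...create the s3a filesystem with this configuration as usual.
  }
}
```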

Contributed by Steve Loughran
Tune AWS v2 SDK changes based on testing with third party stores
including GCS.

Contains HADOOP-18889. S3A v2 SDK error translations and troubleshooting docs

* Changes needed to work with multiple third party stores
* New third_party_stores document on how to bind to and test
  third party stores, including google gcs (which works!)
* Troubleshooting docs mostly updated for v2 SDK

Exception translation/resilience

* New AWSUnsupportedFeatureException for unsupported/unavailable errors
* Handle 501 method unimplemented as one of these
* Error codes > 500 mapped to the AWSStatus500Exception if no explicit
  handler.
* Precondition errors handled a bit better
* GCS throttle exception also recognized.
* GCS raises 404 on a delete of a file which doesn't exist: swallow it.
* Error translation uses reflection to create an IOE of the right type.
  All IOEs at the bottom of an AWS stack chain are regenerated:
  a new exception of that specific type is created, with the top-level
  exception attached as its cause. This retains the whole stack chain
  (sketched after this list).
* Reduce the number of retries within the AWS SDK
* And those of s3a code.
* S3ARetryPolicy explicitly declares SocketException a connectivity failure,
  but treats its subclass BindException separately
* SocketTimeoutException is also considered a connectivity failure
* Log at debug whenever retry policies are looked up
* Reorder exceptions to alphabetical order, with commentary
* Review use of the Invoke.retry() method
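
The reflection-based regeneration noted above can be sketched as follows. This is an illustration only, not the actual S3A translation code; the method name `regenerate` is hypothetical.

```java
import java.io.IOException;

/** Illustration only: not the actual S3A error-translation code. */
public final class RegenerateSketch {

  /**
   * Create a new IOException of the same concrete type as the innermost
   * IOE in an AWS exception chain, with the original top-level exception
   * attached as its cause, so the whole stack chain is retained.
   */
  static IOException regenerate(IOException innermost, Exception topLevel) {
    try {
      IOException regenerated = innermost.getClass()
          .getConstructor(String.class)
          .newInstance(innermost.getMessage());
      regenerated.initCause(topLevel);
      return regenerated;
    } catch (ReflectiveOperationException e) {
      // No (String) constructor available: fall back to the original IOE.
      return innermost;
    }
  }
}
```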

 The reduction in retries is because testing made it clear that, against a
 bucket whose hostname doesn't resolve, even an UnknownHostException takes
 over 90s to eventually fail inside the SDK before it hits the s3a retry code.
 - Reducing the SDK retries means these escalate to our code better.
 - Cutting back on our own retries makes it a bit more responsive for most real
   deployments.
 - maybeTranslateNetworkException() and the s3a retry policy mean that
   UnknownHostException is recognised and fails fast.

Contributed by Steve Loughran
HADOOP-18932. Upgrade AWS v2 SDK to 2.20.160 and v1 to 1.12.565 (apache#6178)


v1 => 1.12.565
v2 => 2.20.160
Only the v2 one is distributed; v1 is needed in deployments only to support v1 credential providers

Contributed by Steve Loughran
HADOOP-18939. NPE in AWS v2 SDK RetryOnErrorCodeCondition.shouldRetry() (apache#6193)


Updates MultiObjectDeleteException to fill in the error details.

See also: aws/aws-sdk-java-v2#4600

Contributed by Steve Loughran
S3A region logic improved for better inference and
to be compatible with previous releases

1. If you are using an AWS S3 AccessPoint, its region is determined
   from the ARN itself.
2. If fs.s3a.endpoint.region is set and non-empty, it is used.
3. If fs.s3a.endpoint is an s3.*.amazonaws.com URL,
   the region is determined by parsing the URL.
   Note: vpce endpoints are not handled by this.
4. If fs.s3a.endpoint.region==null, and none could be determined
   from the endpoint, use us-east-2 as default.
5. If fs.s3a.endpoint.region=="" then it is handed off to
   the default AWS SDK resolution process (cases 2 and 5 are sketched below).

Consult the AWS SDK documentation for the details on its resolution
process, knowing that it is complicated and may use environment variables,
entries in ~/.aws/config, IAM instance information within
EC2 deployments and possibly even JSON resources on the classpath.
Put differently: it is somewhat brittle across deployments.
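
For the two explicit cases (2 and 5), a minimal sketch; the region value is illustrative:

```java
import org.apache.hadoop.conf.Configuration;

public class RegionConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Case 2: a non-empty region is used as-is.
    conf.set("fs.s3a.endpoint.region", "eu-west-1");
    // Case 5: an empty string hands resolution off to the
    // AWS SDK's own (more brittle) resolution chain.
    // conf.set("fs.s3a.endpoint.region", "");
  }
}
```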

Contributed by Ahmar Suhail
Fixes TestErrorTranslation.testMultiObjectExceptionFilledIn() failure
which came in with HADOOP-18939.

Contributed by Steve Loughran

This restores asynchronous retrieval/refresh of any AWS credentials provided by the
EC2 instance/container in which the process is running.

Contributed by Steve Loughran
Followup to HADOOP-18889 third party store support:
fixes some minor review comments which came in after the merge.
HADOOP-18948. Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete (apache#6218)


S3A directory delete and rename will optionally abort all pending multipart
uploads under the to-be-deleted paths when

fs.s3a.directory.operations.purge.uploads is true

It is off by default.

The filesystem's hasPathCapability("fs.s3a.directory.operations.purge.uploads")
probe will return true when this feature is enabled.
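
A minimal sketch of enabling the option and probing for it; the bucket URI is illustrative:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PurgeUploadsProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.s3a.directory.operations.purge.uploads", true);
    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
    // The path capability probe returns true when the feature is enabled.
    boolean enabled = fs.hasPathCapability(new Path("/"),
        "fs.s3a.directory.operations.purge.uploads");
    System.out.println("purge uploads on rename/delete: " + enabled);
  }
}
```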

Multipart uploads may accrue from interrupted data writes, uncommitted
staging/magic committer jobs and other operations/applications. On AWS S3,
lifecycle rules are the recommended way to clean these up; this change improves
support for stores which lack such rules.

Contributed by Steve Loughran
@steveloughran (Contributor) commented

LGTM. Rather than merge in, create a feature branch and then we can merge the chain in with a merge commit

@steveloughran (Contributor) left a comment

It's a big patch, but we've been working on it for a while. I am not going to suggest any changes because this is just a backport.

  1. create a feature branch
  2. we can merge this with a merge commit
  3. and we can see if any surprises do occur

After this, all v2 SDK patches must be backported too.

@jojochuang (Contributor) commented

@ahmarsuhail I saw you created a branch-3.4.0 branch out of branch-3.3.
Would you like to send out a heads-up to the dev mailing list so the rest of the community is aware of what's going to happen?

@hadoop-yetus commented

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|:-------:|:-------:|:-------:|
| +0 🆗 | reexec | 0m 30s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 5s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +0 🆗 | markdownlint | 0m 0s | | markdownlint was not available. |
| +0 🆗 | shelldocs | 0m 0s | | Shelldocs was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 101 new or modified test files. |
| | | | | _ branch-3.4.0 Compile Tests _ |
| +0 🆗 | mvndep | 13m 46s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 35m 27s | | branch-3.4.0 passed |
| +1 💚 | compile | 18m 59s | | branch-3.4.0 passed |
| +1 💚 | checkstyle | 2m 49s | | branch-3.4.0 passed |
| +1 💚 | mvnsite | 25m 38s | | branch-3.4.0 passed |
| +1 💚 | javadoc | 7m 23s | | branch-3.4.0 passed |
| +0 🆗 | spotbugs | 0m 18s | | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) |
| +1 💚 | shadedclient | 68m 0s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 55s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 35m 54s | | the patch passed |
| +1 💚 | compile | 18m 7s | | the patch passed |
| -1 ❌ | javac | 18m 7s | /results-compile-javac-root.txt | root generated 17 new + 1795 unchanged - 16 fixed = 1812 total (was 1811) |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 2m 44s | /results-checkstyle-root.txt | root: The patch generated 8 new + 71 unchanged - 17 fixed = 79 total (was 88) |
| +1 💚 | mvnsite | 22m 25s | | the patch passed |
| +1 💚 | shellcheck | 0m 0s | | No new issues. |
| +1 💚 | javadoc | 7m 15s | | the patch passed |
| +0 🆗 | spotbugs | 0m 19s | | hadoop-project has no data from spotbugs |
| +1 💚 | shadedclient | 68m 17s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 697m 17s | /patch-unit-root.txt | root in the patch passed. |
| +1 💚 | asflicense | 1m 32s | | The patch does not generate ASF License warnings. |
| | | 1042m 45s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.yarn.sls.appmaster.TestAMSimulator |
| | hadoop.yarn.client.api.impl.TestAMRMClient |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6303/1/artifact/out/Dockerfile |
| GITHUB PR | #6303 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint shellcheck shelldocs |
| uname | Linux 18634c217a78 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.4.0 / 0fe766f |
| Default Java | Private Build-1.8.0_362-8u372-gaus1-0ubuntu118.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6303/1/testReport/ |
| Max. process+thread count | 3722 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws . U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6303/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 shellcheck=0.4.6 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@ahmarsuhail (Contributor, Author) commented

Pushed to `feature-HADOOP-18073-sdk-v2-upgrade-3.4`. Closing this PR.

@jojochuang yes, will send out that email by tomorrow.

@hadoop-yetus commented

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|:-------:|:-------:|:-------:|
| +0 🆗 | reexec | 6m 13s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 4s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +0 🆗 | markdownlint | 0m 0s | | markdownlint was not available. |
| +0 🆗 | shelldocs | 0m 0s | | Shelldocs was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 104 new or modified test files. |
| | | | | _ branch-3.4.0 Compile Tests _ |
| +0 🆗 | mvndep | 13m 47s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 35m 59s | | branch-3.4.0 passed |
| +1 💚 | compile | 18m 28s | | branch-3.4.0 passed |
| +1 💚 | checkstyle | 2m 43s | | branch-3.4.0 passed |
| +1 💚 | mvnsite | 25m 33s | | branch-3.4.0 passed |
| +1 💚 | javadoc | 7m 27s | | branch-3.4.0 passed |
| +0 🆗 | spotbugs | 0m 19s | | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) |
| +1 💚 | shadedclient | 67m 41s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 57s | | Maven dependency ordering for patch |
| -1 ❌ | mvninstall | 0m 31s | /patch-mvninstall-hadoop-tools_hadoop-aws.txt | hadoop-aws in the patch failed. |
| -1 ❌ | mvninstall | 33m 42s | /patch-mvninstall-root.txt | root in the patch failed. |
| -1 ❌ | compile | 17m 0s | /patch-compile-root.txt | root in the patch failed. |
| -1 ❌ | javac | 17m 0s | /patch-compile-root.txt | root in the patch failed. |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 2m 40s | /results-checkstyle-root.txt | root: The patch generated 8 new + 71 unchanged - 17 fixed = 79 total (was 88) |
| -1 ❌ | mvnsite | 21m 16s | /patch-mvnsite-root.txt | root in the patch failed. |
| +1 💚 | shellcheck | 0m 0s | | No new issues. |
| +1 💚 | javadoc | 7m 14s | | the patch passed |
| +0 🆗 | spotbugs | 0m 20s | | hadoop-project has no data from spotbugs |
| -1 ❌ | spotbugs | 0m 35s | /patch-spotbugs-hadoop-tools_hadoop-aws.txt | hadoop-aws in the patch failed. |
| -1 ❌ | spotbugs | 32m 18s | /patch-spotbugs-root.txt | root in the patch failed. |
| +1 💚 | shadedclient | 45m 46s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 13m 30s | /patch-unit-root.txt | root in the patch failed. |
| +0 🆗 | asflicense | 0m 32s | | ASF License check generated no output? |
| | | 367m 0s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.io.erasurecode.rawcoder.TestRawErasureCoderBenchmark |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6303/2/artifact/out/Dockerfile |
| GITHUB PR | #6303 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint shellcheck shelldocs |
| uname | Linux a8a2e17cc32e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.4.0 / 1aeae7a |
| Default Java | Private Build-1.8.0_362-8u372-gaus1-0ubuntu118.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6303/2/testReport/ |
| Max. process+thread count | 559 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws . U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6303/2/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 shellcheck=0.4.6 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
