HADOOP-18073. Backport of SDK V2 upgrade. #6303
Conversation
Contributed By: Ahmar Suhail <ahmarsu@amazon.co.uk>
This patch migrates the S3A connector to use the V2 AWS SDK. This is a significant change at the source code level. Any applications using the internal extension/override points in the filesystem connector are likely to break. This includes but is not limited to:
- Code invoking methods on the S3AFileSystem class which used classes from the V1 SDK.
- The ability to define the factory for the `AmazonS3` client, and to retrieve it from the S3AFileSystem. There is a new factory API and a special interface S3AInternals to access a limited set of internal classes and operations.
- Delegation token and auditing extensions.
- Classes trying to integrate with the AWS SDK.

All standard V1 credential providers listed in the option fs.s3a.aws.credentials.provider will be automatically remapped to their V2 equivalents. Other V1 credential providers are supported, but only if the V1 SDK is added back to the classpath.

The SDK signing plugin has changed; all v1 signers are incompatible. There is no support for the S3 "v2" signing algorithm.

Finally, the aws-sdk-bundle JAR has been replaced by the shaded V2 equivalent, "bundle.jar", which is now exported by the hadoop-aws module.

Consult the document aws_sdk_upgrade for the full details.

Contributed by Ahmar Suhail + some bits by Steve Loughran
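The remapping described above means existing configurations can keep naming V1 credential providers. Below is a minimal, hypothetical sketch (not code from this patch) of a job configuration that still uses a V1 class name; the specific provider class and the bucket URI are placeholders chosen for illustration, assuming the provider is one of the standard ones that get remapped.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CredentialProviderRemapSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A V1 provider class name (assumed here to be one of the standard,
    // automatically remapped providers); after the upgrade the connector maps it
    // to its V2 software.amazon.awssdk equivalent without any job changes.
    conf.set("fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.EnvironmentVariableCredentialsProvider");
    // Placeholder bucket; replace with a real one to run.
    try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)) {
      fs.listStatus(new Path("/"));
    }
  }
}
```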
…oads (apache#6056)
- The multipart flag fs.s3a.multipart.uploads.enabled is passed to the async client created.
- The S3A connector bypasses the transfer manager entirely if multipart uploads are disabled or for small files.

Contributed by Steve Loughran
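For illustration only, a minimal sketch of turning the flag off so the connector bypasses the transfer manager, as described above; this is not taken from the patch itself.

```java
import org.apache.hadoop.conf.Configuration;

public class MultipartToggleSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Disable multipart uploads; the connector then writes objects directly
    // instead of going through the async transfer manager.
    conf.setBoolean("fs.s3a.multipart.uploads.enabled", false);
  }
}
```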
Tune AWS v2 SDK changes based on testing with third party stores including GCS.

Contains HADOOP-18889. S3A v2 SDK error translations and troubleshooting docs
- Changes needed to work with multiple third party stores
- New third_party_stores document on how to bind to and test third party stores, including google gcs (which works!)
- Troubleshooting docs mostly updated for v2 SDK

Exception translation/resilience
- New AWSUnsupportedFeatureException for unsupported/unavailable errors
- Handle 501 method unimplemented as one of these
- Error codes > 500 mapped to the AWSStatus500Exception if no explicit handler
- Precondition errors handled a bit better
- GCS throttle exception also recognized
- GCS raises 404 on a delete of a file which doesn't exist: swallow it
- Error translation uses reflection to create an IOE of the right type. All IOEs at the bottom of an AWS stack chain are regenerated, then a new exception of that specific type is created with the top-level exception as its cause. This is done to retain the whole stack chain.
- Reduce the number of retries within the AWS SDK, and those of the s3a code
- S3ARetryPolicy explicitly declares SocketException as a connectivity failure but subclasses BindException
- SocketTimeoutException also considered connectivity
- Log at debug whenever retry policies are looked up
- Reorder exceptions to alphabetical order, with commentary
- Review use of the Invoke.retry() method

The reduction in retries is because testing made it clear that, when connecting to a bucket which doesn't resolve, even an UnknownHostException can take over 90s to eventually fail, which then hits the s3a retry code.
- Reducing the SDK retries means these escalate to our code better.
- Cutting back on our own retries makes it a bit more responsive for most real deployments.
- maybeTranslateNetworkException() and the s3a retry policy mean that unknown host exceptions are recognised and fail fast.

Contributed by Steve Loughran
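As a rough illustration of the retry tuning discussed above, the sketch below sets the SDK-level and S3A-level retry options. The key names fs.s3a.attempts.maximum and fs.s3a.retry.limit are existing S3A options, but the values shown are arbitrary examples, not the new defaults introduced by this change.

```java
import org.apache.hadoop.conf.Configuration;

public class RetryTuningSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Attempts made inside the AWS SDK before a failure surfaces to S3A code.
    conf.setInt("fs.s3a.attempts.maximum", 2);
    // Retries performed by the S3A retry policy itself.
    conf.setInt("fs.s3a.retry.limit", 3);
  }
}
```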
…pache#6178)
v1 => 1.12.565
v2 => 2.20.160
Only the v2 one is distributed; v1 is needed in deployments only to support v1 credential providers.

Contributed by Steve Loughran
…() (apache#6193) MultiObjectDeleteException to fill in the error details See also: aws/aws-sdk-java-v2#4600 Contributed by Steve Loughran
S3A region logic improved for better inference and to be compatible with previous releases.
1. If you are using an AWS S3 AccessPoint, its region is determined from the ARN itself.
2. If fs.s3a.endpoint.region is set and non-empty, it is used.
3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, the region is determined by parsing the URL. Note: vpce endpoints are not handled by this.
4. If fs.s3a.endpoint.region==null, and none could be determined from the endpoint, use us-east-2 as default.
5. If fs.s3a.endpoint.region=="" then it is handed off to the default AWS SDK resolution process.

Consult the AWS SDK documentation for the details on its resolution process, knowing that it is complicated and may use environment variables, entries in ~/.aws/config, IAM instance information within EC2 deployments and possibly even JSON resources on the classpath. Put differently: it is somewhat brittle across deployments.

Contributed by Ahmar Suhail
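To make the resolution order concrete, here is a small sketch (illustration only; the region value is a placeholder) covering cases 2 and 5 above: setting an explicit region versus handing resolution to the SDK default chain with an empty string.

```java
import org.apache.hadoop.conf.Configuration;

public class RegionConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Case 2: an explicit, non-empty region wins over endpoint parsing.
    conf.set("fs.s3a.endpoint.region", "eu-west-1");
    // Case 5: an empty string defers to the AWS SDK's own resolution chain.
    // conf.set("fs.s3a.endpoint.region", "");
  }
}
```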
Fixes TestErrorTranslation.testMultiObjectExceptionFilledIn() failure which came in with HADOOP-18939. Contributed by Steve Loughran
This restores asynchronous retrieval/refresh of any AWS credentials provided by the EC2 instance/container in which the process is running. Contributed by Steve Loughran
Followup to HADOOP-18889 third party store support; Fix some minor review comments which came in after the merge.
…ds to purge on rename/delete (apache#6218)

S3A directory delete and rename will optionally abort all pending multipart uploads under their to-be-deleted paths when fs.s3a.directory.operations.purge.uploads is true. It is off by default.

The filesystem's hasPathCapability("fs.s3a.directory.operations.purge.uploads") probe will return true when this feature is enabled.

Multipart uploads may accrue from interrupted data writes, uncommitted staging/magic committer jobs and other operations/applications. On AWS S3, lifecycle rules are the recommended way to clean these up; this change improves support for stores which lack these rules.

Contributed by Steve Loughran
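A minimal sketch of how a client might enable the option and then probe for it with hasPathCapability(), using the option and capability names from the commit message above; the bucket URI and path are placeholders, and this is illustrative rather than code from the patch.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PurgeUploadsProbeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask S3A to abort pending multipart uploads under paths being deleted/renamed.
    conf.setBoolean("fs.s3a.directory.operations.purge.uploads", true);
    try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)) {
      // Probe whether the running connector has the feature enabled.
      boolean purges = fs.hasPathCapability(new Path("/"),
          "fs.s3a.directory.operations.purge.uploads");
      System.out.println("purge uploads enabled: " + purges);
    }
  }
}
```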
LGTM. Rather than merge in, create a feature branch and then we can merge the chain in with a merge commit
It's a big patch, but we've been working on it for a while. I am not going to suggest any changes because this is just a backport.
- create a feature branch
- we can merge this with a merge commit
- and we can see if any surprises do occur
After this, all v2 SDK patches must be backported too.
@ahmarsuhail I saw you created a branch-3.4.0 branch out of branch-3.3.
…KMS keys (apache#6140) Contributed by Viraj Jasani
pushed to
@jojochuang yes, will send out that email by tomorrow.
Description of PR
This backports the AWS SDK V2 upgrade to 3.4.0.
HADOOP-18778 - Fixes failing tests when CSE is enabled. (#5763)
HADOOP-18073 - S3A: Upgrade AWS SDK to V2 (#5995)
HADOOP-18888 - createS3AsyncClient() always enables multipart uploads (#6056)
HADOOP-18889 - S3A v2 SDK third party support (#6141)
HADOOP-18932 - Upgrade AWS v2 SDK to 2.20.160 and v1 to 1.12.565 (#6178)
HADOOP-18939 - NPE in AWS v2 SDK RetryOnErrorCodeCondition.shouldRetry() (#6193)
HADOOP-18908 - Improve S3A region handling. (#6187)
HADOOP-18946 - TestErrorTranslation failure (#6205)
HADOOP-18945 - IAMInstanceCredentialsProvider failing. (#6202)
HADOOP-18889 - Third party storage followup. (#6186)
HADOOP-18948 - Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete (#6218)
HADOOP-18850 - Enable dual-layer server-side encryption with AWS KMS keys (#6140)
How was this patch tested?
Tested in eu-west-1 with
mvn -Dparallel-tests -DtestsThreadCount=16 clean verify
and scale tests enabled. All good.