-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18382. SDK upgrade prerequisites #4698
HADOOP-18382. SDK upgrade prerequisites #4698
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think there are still some documentation and configuration files to be updated about deprecated classes like here
## <a name="authenticating"></a> Authenticating with S3 |
<name>fs.s3a.aws.credentials.provider</name> |
Also, yetus doesn’t seem to be happy with the deprecated classes. I’m wondering how we should address this because after this change is merged into trunk other people would get a lot of complaint from it as well.
Thanks @monthonk! @steveloughran how can the yetus deprecation errors be handled? Also for credential providers, do we want to allow custom V1 credential providers in delegation tokens and otherwise? They can be used with the V2 client like this (I've opened a draft PR with the first set of changes for the upgrade). I wasn't sure if we want to support them, will update the documentation I've written in this PR accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there's going to be a shim for credential providers, logging warnings there is needless and will scare people.
At the very least, we should not log if any of the credential providers are the ones in the hadoop code base itself (something in S3AUtils to verify this). If people are using/subclassing them in their own code, the deprecation warnings should give them a hint.
And wherever things are logged, a LogExactlyOnce log avoids flooding job logs with hundreds of warnings whenever filesystems are created. Hive creates and releases many file systems for many different users for example. See DefaultS3ClientFactory for an example use of this.
Yetus is complaining about deprecation in our own uses of the classes. You should be able to suppress that in your IDE for those methods; yetus/maven Will pick up those same annotations.
@@ -849,6 +849,10 @@ private void bindAWSClient(URI name, boolean dtEnabled) throws IOException { | |||
// with it if so. | |||
|
|||
LOG.debug("Using delegation tokens"); | |||
LOG.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to use a LogOnce; we should consider having a special log just for v2 warnings, so people can turn it off entirely (i would for example)
@@ -637,6 +637,11 @@ public static AWSCredentialProviderList buildAWSProviderList( | |||
AWSCredentialProviderList providers = new AWSCredentialProviderList(); | |||
for (Class<?> aClass : awsClasses) { | |||
|
|||
if (aClass.getName().contains(AWS_AUTH_CLASS_PREFIX)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, log through the proposed LogOnce log
This document explains the upcoming work for upgrading S3A to AWS SDK V2. | ||
This work is tracked in [HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073). | ||
|
||
## Why do we want to upgrade? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer "why the upgrade"
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
public static final List<Class<?>> | ||
STANDARD_AWS_PROVIDERS = Collections.unmodifiableList( | ||
Arrays.asList( | ||
TemporaryAWSCredentialsProvider.class, | ||
SimpleAWSCredentialsProvider.class, | ||
EnvironmentVariableCredentialsProvider.class, | ||
IAMInstanceCredentialsProvider.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using fully qualified class names to avoid deprecation warnings that happen on importing a deprecated class. not sure if there is a better way to suppress import warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just ignore them if there's no way to avoid
🎊 +1 overall
This message was automatically generated. |
@@ -233,7 +234,10 @@ needs the credentials needed to interact with buckets. | |||
|
|||
The client supports multiple authentication mechanisms and can be configured as to | |||
which mechanisms to use, and their order of use. Custom implementations | |||
of `com.amazonaws.auth.AWSCredentialsProvider` may also be used. | |||
of `com.amazonaws.auth.AWSCredentialsProvider` may also be used. | |||
However, with the upcoming upgrade to AWS Java SDK V2, these classes will need to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this look ok? Just to confirm:
- We are deprecating all classes that implement
com.amazonaws.auth.AWSCredentialsProvider
. - We will update internal hadoop credential provider classes to implement V2's credential provider. Class names will stay the same, so no changes need to be made to configs.
- We will also update delegation token binding classes and credential providers to V2, this is a breaking change but this is ok.
- We're using a shim for credential providers, so custom cred providers which implement V1's Credential provider will still work, but we want to encourage people to update these.
Thanks @steveloughran, I've suppressed deprecation warnings and used LogExactlyOnce. Yetus is happy now so this is now ready for review. Also had a couple of questions. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having some extra deprecation warnings isn't a blocker; especially if it's internal stuff. For that we just have to do it as "best effort" and not worry if yet as votes down a patch because of them.
Regarding the logging, I like it more now. However it is a bit scattered around the project.
How about you add a new class in fs.s3a.impl.V2Migration class to deal with V2 migration, hich creates the logs and has static methods which can be invoked to report the issues.
It's where extra things could go in future such as wrapping v1 credential providers with a bridging class.
V2Migration.v1ProviderReferenced(classname)
V2Migration.v1S3ClientRequested(reason)
V2Migration.wrapV1CredentialProvider()
...
`
This lets us isolate the changes (keeps S3AFileSystem complexity down, the ongoing struggle),
makes it easy to maintain etc.
One interesting thought it is that what is to be done about AWSCredentialProviderList?
If it were to become a list of V2 credential providers, and implement the new credential provider API then it could be where are part of the bridging takes place.
- All add(AWSCredentialsProvider) and constructors would wrap them with the bridging class.
- new add() methods to add v2 providers would go in
This means the delegation token api would not break, as its deploy() methods these lists. And by removing the deprecation tag there, fewer complaints.
for this pr, mark the class as final to find out who was subclassing something tagged as @Private.
public static final List<Class<?>> | ||
STANDARD_AWS_PROVIDERS = Collections.unmodifiableList( | ||
Arrays.asList( | ||
TemporaryAWSCredentialsProvider.class, | ||
SimpleAWSCredentialsProvider.class, | ||
EnvironmentVariableCredentialsProvider.class, | ||
IAMInstanceCredentialsProvider.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just ignore them if there's no way to avoid
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@steveloughran have added a new V2Migration class and reverted using fully qualified classnames. Some deprecation warnings now, but they are all for imports. For AWSCredentialProviderList, this is what I was thinking too. We can definitely prevent DT from breaking..but since usage is limited not sure if it's better to force people to update instead. have marked the class final for now. there's also a patch warning on yetus now..not sure why? i'd merged trunk into this branch a couple of days ago. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy. we are printing more warnings but the changes are well structured.
as they are seen by users, we may want to have a section in troubleshooting.md to explain what they mean, which is one of "stop it"
@@ -1241,8 +1234,7 @@ AmazonS3 getAmazonS3Client() { | |||
@VisibleForTesting | |||
public AmazonS3 getAmazonS3ClientForTesting(String reason) { | |||
LOG.warn("Access to S3A client requested, reason {}", reason); | |||
WARN_ON_GET_S3_CLIENT.warn( | |||
"getAmazonS3ClientForTesting() will be removed as part of upgrading S3A to AWS SDK V2"); | |||
V2Migration.v1S3ClientRequested(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a getClient() call too; i will comment on that in the conversation
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/V2Migration.java
Show resolved
Hide resolved
*/ | ||
@InterfaceAudience.Private | ||
@InterfaceStability.Evolving | ||
public class AWSCredentialProviderList implements AWSCredentialsProvider, | ||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am proposing the class does implement the v2 api. so no need to deprecate or break the delegation token binding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have removed this deprecation. Since this means DT bindings will not break, and custom cred providers there can continue to use V1 cred providers, do we want to remove the warning on use of DT's? I guess we still want to encourage people to update these credential providers so might be worth leaving in.
Looks good. Be aware that in CDP we will not backport this without the logs disabled, unless/until we have our own code all happy, which will involve a move to the v2 sdk elsewhere (STS interaction). getClient()the s3a FS class has a getClient() call too, which is only used internally or in external code which is doing things to get at it (the forTesting() one came in later, see). The internal stuff could make that private and instead pass it in to WriteOperationsHelper. really that class should
None of our subsidiary classes should have direct refs to the owning fs; it just complicates our life too much. seemed a good idea at the time though... what does that mean for that patch? nothing, except by moving the (split) callbacks into two interfaces/impls. that getClient call could be made private and migration would be easier. datatypesDoes the migration also mean that all the datatypes, s3Object, multipart input stream etc are all going to go too? |
Thanks @steveloughran, have made the suggested changes. And yes, data types are changing. getObject now returns ResponseInputStream<GetObjectResponse>. In this PR, I'm logging a warning on |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the changes; we are down to a couple of non-code nits now: javadocs and markdown tuning.
/** | ||
* Callbacks for WriteOperationHelper. | ||
*/ | ||
private final class WriteOperationHelperCallbacksImpl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like this. yes, it makes for a bigger patch, but it was needed anyway
@@ -157,7 +162,8 @@ protected WriteOperationHelper(S3AFileSystem owner, | |||
Configuration conf, | |||
S3AStatisticsContext statisticsContext, | |||
final AuditSpanSource auditSpanSource, | |||
final AuditSpan auditSpan) { | |||
final AuditSpan auditSpan, | |||
final WriteOperationHelperCallbacks writeOperationHelperCallbacks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: needs a javadoc entry
Custom signers will currently be implementing `com.amazonaws.auth.Signer` and will need to be | ||
updated to implement `software.amazon.awssdk.core.signer.Signer`. | ||
|
||
### <a name="GetObjectMetadataCalled"></a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afraid you need to strip the trailing spaces. we all do this on the markdowns
@@ -176,7 +177,8 @@ public void initialize(URI name, Configuration originalConf) | |||
conf, | |||
new EmptyS3AStatisticsContext(), | |||
noopAuditor(conf), | |||
AuditTestSupport.NOOP_SPAN); | |||
AuditTestSupport.NOOP_SPAN, | |||
new MinimalWriteOperationHelperCallbacks()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is why i like the callback interface/impl design
@steveloughran i think you may have viewed changes from a previous commit, may last commit addresses both of those issues: 3b87159 |
oh, you are right. ready to go in...yetus is just complaining about deprecation. +1, merging |
This patch prepares the hadoop-aws module for a future migration to using the v2 AWS SDK (HADOOP-18073) That upgrade will be incompatible; this patch prepares for it: -marks some credential providers and other classes and methods as @deprecated. -updates site documentation -reduces the visibility of the s3 client; other than for testing, it is kept private to the S3AFileSystem class. -logs some warnings when deprecated APIs are used. The warning messages are printed only once per JVM's life. To disable them, set the log level of org.apache.hadoop.fs.s3a.SDKV2Upgrade to ERROR Contributed by Ahmar Suhail
This patch prepares the hadoop-aws module for a future migration to using the v2 AWS SDK (HADOOP-18073) That upgrade will be incompatible; this patch prepares for it: -marks some credential providers and other classes and methods as @deprecated. -updates site documentation -reduces the visibility of the s3 client; other than for testing, it is kept private to the S3AFileSystem class. -logs some warnings when deprecated APIs are used. The warning messages are printed only once per JVM's life. To disable them, set the log level of org.apache.hadoop.fs.s3a.SDKV2Upgrade to ERROR Contributed by Ahmar Suhail
This patch prepares the hadoop-aws module for a future migration to using the v2 AWS SDK (HADOOP-18073) That upgrade will be incompatible; this patch prepares for it: -marks some credential providers and other classes and methods as @deprecated. -updates site documentation -reduces the visibility of the s3 client; other than for testing, it is kept private to the S3AFileSystem class. -logs some warnings when deprecated APIs are used. The warning messages are printed only once per JVM's life. To disable them, set the log level of org.apache.hadoop.fs.s3a.SDKV2Upgrade to ERROR Contributed by Ahmar Suhail
Description of PR
Adds in warn logs for things that will be removed in the upgrade. Also adds in documentation around the forthcoming changes.
How was this patch tested?
Tested in eu-west-1 by running
mvn -Dparallel-tests -DtestsThreadCount=16 clean verify
.ITestSelectLandsat
timing out, but this is a known issue.