HADOOP-18154. S3A Authentication to support WebIdentity #4070
base: trunk
Conversation
I've done the initial review.
Thanks for this; it looks like a low-risk new feature.
I can't see any easy tests for it, though.
What I will need is:
- docs
- you to run all the integration tests using this as auth so we know what breaks.
Things will break, but that should just be where the code has assumptions about authentication... the tests should recognise that these are only session credentials and skip those tests.
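A minimal sketch of what such a test guard could look like, assuming a hypothetical helper that reports whether only session credentials are configured (the helper and class name are illustrative, not part of this patch):

```java
// Sketch only: how an integration test might skip itself when the configured
// credentials are session credentials (e.g. web identity / OIDC).
// usingSessionCredentials() is a hypothetical helper, not part of this patch.
import org.junit.Assume;
import org.junit.Test;

public class ITestRequiresFullCredentialsExample {

  private boolean usingSessionCredentials() {
    // hypothetical check, e.g. inspect fs.s3a.aws.credentials.provider
    return true;
  }

  @Test
  public void testNeedsLongLivedCredentials() {
    Assume.assumeTrue("skipping: only session credentials are available",
        !usingSessionCredentials());
    // ... test body that assumes full, long-lived credentials ...
  }
}
```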
private String sessionName;
private IOException lookupIOE;

public OIDCTokenCredentialsProvider(Configuration conf) {
should credential providers be allowed to raise IOEs? we should be able to fix that
.roleSessionName(sessionName)
.build();
return credentialsProvider.getCredentials();
}
nit: same line as the }
if (!StringUtils.isEmpty(jwtPath) && !StringUtils.isEmpty(roleARN)) {
final AWSCredentialsProvider credentialsProvider =
    WebIdentityTokenCredentialsProvider.builder()
        .webIdentityTokenFile(jwtPath)
This will handle local files only, so it won't work for jobs across a cluster unless the token is already there.
Either cluster FS paths will be needed (download locally and then reference), or we require it on the host of the user launching a job and then include the token data in a delegation token which goes with it. That's a lot more powerful, but a lot more work; best to leave that for a follow-up patch.
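A rough illustration of the "download locally and then reference" idea; the paths and class name are made up for the sketch and are not part of this patch:

```java
// Sketch: copy a token from a cluster filesystem (e.g. HDFS) to a local file,
// then hand the local path to the credentials provider. Paths are placeholders.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TokenLocalizerSketch {
  public static String localizeToken(Configuration conf) throws Exception {
    Path remote = new Path("hdfs:///secrets/oidc-token");   // cluster FS copy of the token
    Path local = new Path("file:///tmp/oidc-token");        // local path the AWS SDK can read
    FileSystem fs = FileSystem.get(new URI("hdfs:///"), conf);
    fs.copyToLocalFile(remote, local);
    return local.toUri().getPath();                         // use as the jwt path option
  }
}
```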
Not sure I get your point. Our service runs across multiple Kubernetes pods (replicas) using a ServiceAccount, so each of those pods automatically gets a volume attached that points to a token file created with the same ServiceAccount signature. Perhaps I am missing other use cases within the Hadoop ecosystem?
I was just wondering how the secrets get around. For other credentials we can pick them up from the user launching, say, a distcp job, and they will get passed around. Alternatively, they can go into a cluster FS like HDFS.
If it works with your k8s setup, then the docs should say "mount a shared volume in your containers". Support for credential propagation can be added by someone else when they need it.
@@ -0,0 +1,79 @@
package org.apache.hadoop.fs.s3a;

import org.apache.commons.lang3.StringUtils;
Nit: we have some layout rules for imports. Here are my full IntelliJ settings for this, if it helps:
https://gist.github.com/steveloughran/817dd90e0f1775ce2b6f24684dfb078c
@@ -142,6 +142,10 @@ private Constants() {
public static final String ASSUMED_ROLE_CREDENTIALS_DEFAULT =
    SimpleAWSCredentialsProvider.NAME;

/**
 * Absolute path to the web identity token file
Nit: add a "." at the end of the sentence; javadoc likes that.
Also, in the docs, say "path in the local/mounted filesystem" so it is clear it is not a cluster FS like HDFS.
 * limitations under the License.
 */

package org.apache.hadoop.fs.s3a;
can you move to org.apache.hadoop.fs.s3a.auth
import com.amazonaws.auth.WebIdentityTokenCredentialsProvider;

import org.apache.commons.lang3.StringUtils;
you can cut this line
getClass().getSimpleName(),
jwtPath, roleARN, sessionName);
}
}
nit, add a newline
.build();
return credentialsProvider.getCredentials();
}
else throw new CredentialInitializationException(
better to throw explicit errors when the options are unset.
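A sketch of what explicit validation might look like, pieced together from the fragments in this diff; the error message text is invented and this is not the actual patch:

```java
// Sketch only: explicit per-option errors instead of one catch-all exception.
// Field names follow the fragments in this diff; message text is invented.
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.WebIdentityTokenCredentialsProvider;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.fs.s3a.CredentialInitializationException;

public class WebIdentityValidationSketch {
  public AWSCredentials getCredentials(String jwtPath, String roleARN, String sessionName) {
    if (StringUtils.isEmpty(jwtPath)) {
      throw new CredentialInitializationException(
          "Web identity token file path is not set");
    }
    if (StringUtils.isEmpty(roleARN)) {
      throw new CredentialInitializationException(
          "Role ARN for web identity authentication is not set");
    }
    final AWSCredentialsProvider credentialsProvider =
        WebIdentityTokenCredentialsProvider.builder()
            .webIdentityTokenFile(jwtPath)
            .roleArn(roleARN)
            .roleSessionName(sessionName)
            .build();
    return credentialsProvider.getCredentials();
  }
}
```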
Some small pieces of feedback. Forgive me, I'm not familiar with WebIdentity / JWT / OIDC.
/**
 * Absolute path to the web identity token file
 */
public static final String JWT_PATH = "fs.s3a.jwt.path";
If it's for OIDC / WebIdentity, can we change it to something like fs.s3a.oidc.jwt.path or fs.s3a.webidentity.jwt.path?
Also, add @value to the JavaDoc for IDE hints?
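For illustration, if those suggestions were applied the constant might end up looking roughly like this (fs.s3a.oidc.jwt.path is only the suggested name, not a settled one):

```java
/**
 * Absolute path in the local/mounted filesystem to the web identity token file.
 * Value: {@value}.
 */
public static final String JWT_PATH = "fs.s3a.oidc.jwt.path";
```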
/**
 * Support OpenID Connect (OIDC) token for authenticating with AWS.
 *
 * Please note that users may reference this class name from configuration
 * property fs.s3a.aws.credentials.provider. Therefore, changing the class name
 * would be a backward-incompatible change.
 *
 * This credential provider must not fail in creation because that will
 * break a chain of credential providers.
 */
Does this credential provider actually support more than just OpenID Connect - anything that vends an identity under a JWT?
This provider is allowing users to configure Role ARN, JWT path, and session name for the SDK WebIdentityTokenCredentialsProvider. Should we move to similar naming?
I'm going to propose we go with @dannycjones's suggestion: support the whole set of values and have a prefix for the ARN. We could have a property, but what should we do if it wasn't set?
I don't see any way to completely block the environment variable resolution, which is a pain. I also see in the internal library classes that sometimes roles are set up with an external ID, but that is not possible here. Is that an issue?
So as well as authenticating with a web identity token, we could use https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html to get role credentials for up to 12h, which could go into a delegation token.
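Roughly, that STS call with the v1 SDK looks like the sketch below; all values are placeholders and none of this is part of the patch:

```java
// Sketch of an AssumeRoleWithWebIdentity call with the v1 AWS SDK.
// Role ARN, session name and token are placeholders.
import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;
import com.amazonaws.services.securitytoken.model.AssumeRoleWithWebIdentityRequest;
import com.amazonaws.services.securitytoken.model.Credentials;

public class StsWebIdentitySketch {
  public static Credentials assumeRole(String webIdentityToken) {
    AWSSecurityTokenService sts = AWSSecurityTokenServiceClientBuilder.defaultClient();
    AssumeRoleWithWebIdentityRequest request = new AssumeRoleWithWebIdentityRequest()
        .withRoleArn("arn:aws:iam::123456789012:role/example-role")  // placeholder
        .withRoleSessionName("example-session")                      // placeholder
        .withWebIdentityToken(webIdentityToken)                      // JWT read from the token file
        .withDurationSeconds(12 * 3600);                             // up to 12h if the role allows it
    // the returned credentials (key, secret, session token, expiry) are what
    // could be marshalled into a delegation token
    return sts.assumeRoleWithWebIdentity(request).getCredentials();
  }
}
```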
We need this feature, can somebody please help?
Hey @steveloughran. Could you please provide a status update on this PR? Do you plan to merge it, or is it already outdated?
@insider89 there have not been any changes on this PR since it was last reviewed. If the author takes up the PR, addresses those issues etc. then we can work on getting it into Hadoop. I would suggest asking the author what the status is, and then working with them and other interested parties to get it into a state where the reviewers are happy. I am not set up to test this, so a key role of those who need this is to verify the patch works.
@jclarysse - I am looking for this same capability - we're deploying Trino / Hive Metastore (on top of Hadoop) in EKS, and of course service account -> IAM role mappings do not work as is. Is this PR merged anyplace where we can build it and try it out inside of Hive Metastore?
Note that hadoop-trunk will, once #5872 is in, move to the AWS SDK v2 only, with the other credential providers. There will be support for v1 credential providers, but only if the v1 AWS SDK is explicitly added to the classpath.
Trunk is on the AWS v2 SDK now; it's where features go in. Once something is in there, a patch for branch-3.3 may be considered; that would have to be a v1 implementation of the same feature, so I'd be reluctant to add it as it would be very different.
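For reference, the v2 SDK has a built-in equivalent of this PR's provider; a minimal sketch with placeholder values (not tied to any S3A configuration keys):

```java
// Sketch of the v2 SDK equivalent of this PR's provider; values are placeholders.
import java.nio.file.Paths;
import software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider;

public class WebIdentityV2Sketch {
  public static WebIdentityTokenFileCredentialsProvider create() {
    return WebIdentityTokenFileCredentialsProvider.builder()
        .roleArn("arn:aws:iam::123456789012:role/example-role")        // placeholder
        .roleSessionName("example-session")                             // placeholder
        .webIdentityTokenFile(Paths.get("/var/run/secrets/eks/token"))  // placeholder path
        .build();
  }
}
```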
Description of PR
The PR addresses a requirement to comply with the AWS security concept IAM Roles for Service Accounts (IRSA) while operating a service that isn't based on Apache Spark and that runs inside Amazon Elastic Kubernetes Service (EKS).
The code change consists of adding a new credentials provider class
org.apache.hadoop.fs.s3a.OIDCTokenCredentialsProvider
to the module hadoop-aws.

How was this patch tested?
No new unit test or integration test was created, on purpose. The patch was "only" tested against Hadoop release 2.10.1, as part of our specific use case based on the Delta Sharing service v0.4.0, along with the following Hadoop configuration (core-site.xml):
For code changes: