New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-19739][CORE] propagate S3 session token to cluser #17080
Conversation
LGTM. Verified option name in |
Test build #73501 has finished for PR 17080 at commit
|
@steveloughran I got it wrong |
I agree. I was just checking the files to make sure the strings were consistent/correct, rather than trusting the documentation |
@steveloughran OK, my fault, I got it wrong. |
\cc @srowen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC @steveloughran who might have an opinion, but if this is just a standard env property and standard S3 property, seems like it doesn't hurt.
if (System.getenv("AWS_SESSION_TOKEN") != null) { | ||
val sessionToken = System.getenv("AWS_SESSION_TOKEN") | ||
hadoopConf.set("fs.s3a.session.token", sessionToken) | ||
logDebug(s"Found 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY' and " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these debug statements that useful? they don't need to be interpolated strings, FWIW.
@@ -93,6 +93,16 @@ class SparkHadoopUtil extends Logging { | |||
hadoopConf.set("fs.s3.awsSecretAccessKey", accessKey) | |||
hadoopConf.set("fs.s3n.awsSecretAccessKey", accessKey) | |||
hadoopConf.set("fs.s3a.secret.key", accessKey) | |||
|
|||
if (System.getenv("AWS_SESSION_TOKEN") != null) { | |||
val sessionToken = System.getenv("AWS_SESSION_TOKEN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could get this value once, and then check it for null and also use its value. It would avoid duplicating the system prop name. You could also improve the similar check above.
@srowen dont worry, been tracking this: I filed the JIRA. Core code is good (i.e. property/env var names). One thing to bear in mind, the existing code propagates the env vars even if you are submitting work to a cluster running in EC2, which will be using EC2 instance metadata as a source for its credentials. Nobody has publicly complained about that in JIRA/stack overflow, and changing the behaviour may have adverse consequences. This patch does not change that situation |
Test build #73765 has finished for PR 17080 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but I'd remove the log statements.
val sessionToken = System.getenv("AWS_SESSION_TOKEN") | ||
if (sessionToken != null) { | ||
hadoopConf.set("fs.s3a.session.token", sessionToken) | ||
logDebug(s"Found 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY' and " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind removing the logs? They don't seem particularly useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. FYI @steveloughran, maybe users can got more datailed error message from aws-sdk.
Test build #73797 has finished for PR 17080 at commit
|
cc @vanzin |
Merged to master |
thanks. One thing I realised last night is that logging the session token, even at debug level, would have been a security risk. So it's very good that the log statement got cut, even at the cost of making it slightly harder to track down where credentials came from. One thing which could be added to all the setters would be the use of |
What changes were proposed in this pull request?
propagate S3 session token to cluser
How was this patch tested?
existing ut