
[SPARK-19739][CORE] propagate S3 session token to cluster #17080

Closed
wants to merge 3 commits into from

Conversation

uncleGen
Contributor

What changes were proposed in this pull request?

Propagate the S3 session token to the cluster.

How was this patch tested?

Existing unit tests.
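In outline, the patch extends the existing propagation of AWS credential env vars into the Hadoop configuration with the session token. A minimal sketch of the logic, using a plain mutable map as a stand-in for the Hadoop `Configuration` (the real change lives in `SparkHadoopUtil`):

```scala
import scala.collection.mutable

// Stand-in for org.apache.hadoop.conf.Configuration (illustration only).
val hadoopConf = mutable.Map.empty[String, String]

// Existing behaviour: copy the AWS credential env vars into the
// corresponding s3a configuration keys.
val keyId = System.getenv("AWS_ACCESS_KEY_ID")
val accessKey = System.getenv("AWS_SECRET_ACCESS_KEY")
if (keyId != null && accessKey != null) {
  hadoopConf("fs.s3a.access.key") = keyId
  hadoopConf("fs.s3a.secret.key") = accessKey

  // New in this patch: also propagate the session token, so temporary
  // (STS) credentials work against s3a.
  val sessionToken = System.getenv("AWS_SESSION_TOKEN")
  if (sessionToken != null) {
    hadoopConf("fs.s3a.session.token") = sessionToken
  }
}
```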

@steveloughran
Contributor

LGTM. Verified the option name in the `org.apache.hadoop.fs.s3a.Constants` file and the env var name in `com.amazonaws.SDKGlobalConfiguration`.

@SparkQA

SparkQA commented Feb 27, 2017

Test build #73501 has finished for PR 17080 at commit 0ae5aa7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Contributor Author

uncleGen commented Feb 28, 2017

@steveloughran IMHO, there is no need to use `org.apache.hadoop.fs.s3a.Constants` and `com.amazonaws.SDKGlobalConfiguration`; otherwise we would have to pull hadoop-aws and aws-java-sdk-core into Spark core.

I got it wrong

@steveloughran
Contributor

I agree. I was just checking the files to make sure the strings were consistent and correct, rather than trusting the documentation.

@uncleGen
Contributor Author

uncleGen commented Feb 28, 2017

@steveloughran OK, my fault, I got it wrong.

@uncleGen
Contributor Author

uncleGen commented Mar 2, 2017

cc @srowen

Member

@srowen srowen left a comment


CC @steveloughran who might have an opinion, but if this is just a standard env property and standard S3 property, seems like it doesn't hurt.

    if (System.getenv("AWS_SESSION_TOKEN") != null) {
      val sessionToken = System.getenv("AWS_SESSION_TOKEN")
      hadoopConf.set("fs.s3a.session.token", sessionToken)
      logDebug(s"Found 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY' and " +
Copy link
Member

Are these debug statements that useful? They don't need to be interpolated strings, FWIW.

@@ -93,6 +93,16 @@ class SparkHadoopUtil extends Logging {
      hadoopConf.set("fs.s3.awsSecretAccessKey", accessKey)
      hadoopConf.set("fs.s3n.awsSecretAccessKey", accessKey)
      hadoopConf.set("fs.s3a.secret.key", accessKey)

      if (System.getenv("AWS_SESSION_TOKEN") != null) {
        val sessionToken = System.getenv("AWS_SESSION_TOKEN")
Copy link
Member
You could get this value once, and then check it for null and also use its value. It would avoid duplicating the env var name. You could also improve the similar check above.
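The refactor being suggested can be sketched as follows, again with a plain mutable map standing in for the Hadoop `Configuration`: read the env var once, and use the same local for both the null check and the value.

```scala
import scala.collection.mutable

// Stand-in for the Hadoop Configuration (illustration only).
val hadoopConf = mutable.Map.empty[String, String]

// Read the env var exactly once; the local serves both the null
// check and the assignment, so the var name is not duplicated.
val sessionToken = System.getenv("AWS_SESSION_TOKEN")
if (sessionToken != null) {
  hadoopConf("fs.s3a.session.token") = sessionToken
}
```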

@steveloughran
Copy link
Contributor

@srowen don't worry, I've been tracking this: I filed the JIRA. The core code is good (i.e. the property/env var names).

One thing to bear in mind: the existing code propagates the env vars even if you are submitting work to a cluster running in EC2, which will be using EC2 instance metadata as a source for its credentials. Nobody has publicly complained about that in JIRA/Stack Overflow, and changing the behaviour may have adverse consequences. This patch does not change that situation.

@SparkQA

SparkQA commented Mar 2, 2017

Test build #73765 has finished for PR 17080 at commit d8fd8dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@vanzin vanzin left a comment


Looks good but I'd remove the log statements.

    val sessionToken = System.getenv("AWS_SESSION_TOKEN")
    if (sessionToken != null) {
      hadoopConf.set("fs.s3a.session.token", sessionToken)
      logDebug(s"Found 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY' and " +
Copy link
Contributor

Mind removing the logs? They don't seem particularly useful.

Contributor Author

@uncleGen uncleGen Mar 3, 2017


OK. FYI @steveloughran, maybe users can get a more detailed error message from the aws-sdk.

@SparkQA

SparkQA commented Mar 3, 2017

Test build #73797 has finished for PR 17080 at commit 976dc57.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Contributor Author

uncleGen commented Mar 3, 2017

cc @vanzin

@srowen
Member

srowen commented Mar 3, 2017

Merged to master

@asfgit asfgit closed this in fa50143 Mar 3, 2017
@steveloughran
Contributor

Thanks. One thing I realised last night is that logging the session token, even at debug level, would have been a security risk. So it's very good that the log statement got cut, even at the cost of making it slightly harder to track down where credentials came from.

One thing which could be added to all the setters would be the use of `Configuration.set(key, value, origin)`, setting the origin to something like "Env var $varname on $hostname", so the provenance does get tracked. There's not much offered in terms of examining that, though; hard to justify.
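The provenance idea could look roughly like this. Hadoop's `Configuration` does offer a three-argument `set(name, value, source)`; since pulling the hadoop-common jar into this sketch isn't practical, a minimal stand-in class is used below to show the shape of the idea (the token value is hypothetical, for illustration only):

```scala
import scala.collection.mutable

// Minimal stand-in for the idea behind Configuration.set(key, value, origin):
// record, next to each value, where it came from.
class TrackedConf {
  private val entries = mutable.Map.empty[String, (String, String)]

  def set(key: String, value: String, origin: String): Unit =
    entries(key) = (value, origin)

  def get(key: String): Option[String] = entries.get(key).map(_._1)
  def origin(key: String): Option[String] = entries.get(key).map(_._2)
}

val conf = new TrackedConf
val varName = "AWS_SESSION_TOKEN"
val hostname = java.net.InetAddress.getLocalHost.getHostName
// "example-token" is a placeholder, not a real credential.
conf.set("fs.s3a.session.token", "example-token", s"Env var $varName on $hostname")
```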
