
Spark-UI docker container : setting aws security credentials throws #75

Closed
fbielejec opened this issue Oct 15, 2020 · 6 comments

@fbielejec

After building the docker image and attempting to start it (env vars are exported):

docker run -it \
  -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=$LOG_DIR \
     -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID \
     -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY \
     -Dfs.s3n.awsAccessKeyId=$AWS_ACCESS_KEY_ID \
     -Dfs.s3n.awsSecretAccessKey=$AWS_SECRET_ACCESS_KEY" \
  -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"

I get:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/10/15 11:02:47 INFO HistoryServer: Started daemon with process name: 1@f3fe1ed456cf
20/10/15 11:02:47 INFO SignalUtils: Registered signal handler for TERM
20/10/15 11:02:47 INFO SignalUtils: Registered signal handler for HUP
20/10/15 11:02:47 INFO SignalUtils: Registered signal handler for INT
20/10/15 11:02:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/10/15 11:02:47 INFO SecurityManager: Changing view acls to: root
20/10/15 11:02:47 INFO SecurityManager: Changing modify acls to: root
20/10/15 11:02:47 INFO SecurityManager: Changing view acls groups to: 
20/10/15 11:02:47 INFO SecurityManager: Changing modify acls groups to: 
20/10/15 11:02:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/10/15 11:02:48 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
20/10/15 11:02:48 WARN FileSystem: S3FileSystem is deprecated and will be removed in future releases. Use NativeS3FileSystem or S3AFileSystem instead.
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:280)
	at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).
	at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:74)
	at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:94)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
	at com.sun.proxy.$Proxy5.initialize(Unknown Source)
	at org.apache.hadoop.fs.s3.S3FileSystem.initialize(S3FileSystem.java:111)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2812)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
	at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:117)
	at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:86)
	... 6 more

@mc-rca

mc-rca commented Oct 23, 2020

Seeing the same error. I fixed it by adding AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to the container environment directly. However, I think the underlying issue is that the wrong credentials provider is being picked up; in other words, the flag -Dspark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider seems to have no effect.
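
A minimal sketch of that workaround, assuming the same glue/sparkui:latest image and $LOG_DIR as in the original command (adjust for your own setup):

# Pass the standard AWS SDK credential variables straight into the container,
# so the default credential provider chain can pick them up from the environment:
docker run -it \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=$LOG_DIR" \
  -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"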

@Robinspecteur

I had the same issue. This was because I had set the LOG dir to S3 instead of S3A:

export LOG_DIR="s3://path_to_eventlog/"

instead of

export LOG_DIR="s3a://path_to_eventlog/"

@ctivanovich

ctivanovich commented Dec 21, 2020

@Robinspecteur Does pointing the variable to s3a require some special configuration on the S3 destination? Or is just setting a Glue job to write to regular S3 adequate?

I tried s3a in the address, but got a credential error:
No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Failed to connect to service endpoint

I also addressed another error (the documentation appears to be out of date) by setting the key properties to spark.hadoop.fs.s3a.awsAccessKeyId and spark.hadoop.fs.s3a.awsSecretAccessKey, as prompted by the error message.

Edit: For anyone who has this issue, using the OP's posted command worked. I am not sure why, but you need to pass the secret access key to both the s3a and s3n handlers.
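
For reference, a sketch of just the credential flags from the OP's command that this relies on, assuming they are still passed through SPARK_HISTORY_OPTS:

# Credentials for the s3a:// handler (S3A reads fs.s3a.access.key / fs.s3a.secret.key):
-Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID
-Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY
# Credentials for the s3n:// handler (S3N reads fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey):
-Dfs.s3n.awsAccessKeyId=$AWS_ACCESS_KEY_ID
-Dfs.s3n.awsSecretAccessKey=$AWS_SECRET_ACCESS_KEY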

@moomindani
Contributor

moomindani commented Dec 25, 2020

fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey are not the configuration properties for S3A; they belong to the legacy s3:// block filesystem (the S3FileSystem in the stack trace above), while S3N uses fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey.
They are also Hadoop-level parameters, so if you want to set them through Spark you need to prefix each configuration name with spark.hadoop. (i.e. spark.hadoop.fs.s3.awsAccessKeyId and spark.hadoop.fs.s3.awsSecretAccessKey).

BTW, when you use s3a:// as the prefix of LOG_DIR, you won't need to configure S3N credentials.
If you still see the issue with S3A, can you paste the entire command you used (with credentials masked), along with the value of $LOG_DIR?

Reference: https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties
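
For illustration only (not a snippet from the README), a minimal S3A-only variant of the command, assuming $LOG_DIR already uses the s3a:// scheme and the same glue/sparkui:latest image:

# $LOG_DIR must start with s3a:// here, e.g. s3a://path_to_eventlog/
docker run -it \
  -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=$LOG_DIR \
     -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID \
     -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" \
  -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"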

@moomindani
Contributor

Let me close this since there have been no updates for several months.
Please reopen if you still see the issue.

@nelson-lark

For anyone stumbling upon this from Google, this is what worked for me.
The S3 log location needed to use the s3a scheme instead of s3:

export LOG_DIR=s3a://aws-glue-assets-123456-us-east-1/sparkHistoryLogs/

the docker command:

docker run -itd \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  -e AWS_SESSION_TOKEN="$AWS_SESSION_TOKEN" \
  -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS \
     -Dspark.history.fs.logDirectory=$LOG_DIR \
     -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain" \
  -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"

Importantly, notice that the environment vars are set explicitly in the docker container in place of the following options from the readme:

 -Dspark.hadoop.fs.s3.access.key=$AWS_ACCESS_KEY_ID
 -Dspark.hadoop.fs.s3.secret.key=$AWS_SECRET_ACCESS_KEY
 -Dspark.hadoop.fs.s3.session.token=$AWS_SESSION_TOKEN

I may make a pull request after playing with it some more, but in case I don't, I just wanted to leave this here for anyone.
