Closed

Description
Please fill out the form below.
System Information
- Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): I am running the sample notebook that ships with new SageMaker notebook instances: sample-notebooks/sagemaker-spark/pyspark_mnist/pyspark_mnist_xgboost.ipynb
- Framework Version:
- Python Version: Python 3.6.4
- CPU or GPU: CPU
- Python SDK Version: sagemaker==1.2.2, sagemaker-pyspark==1.0.4
- Are you using a custom image: No
Describe the problem
When I run the Training and Hosting a Model cell as is, I get a ClientError. If I provide the IAM role myself (that is, without using role = sagemaker.get_execution_role()), the algorithm trains without any problem.
More precisely, the two roles are:
- from get_execution_role():
arn:aws:iam::<ACCOUNT>:role/service-role/AmazonSageMaker-ExecutionRole
- provided by me:
arn:aws:iam::<ACCOUNT>:role/AmazonSageMaker-ExecutionRole
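The two ARNs differ only in the path segment between role/ and the role name. Below is a minimal sketch of that comparison in plain Python. The helper name split_role_arn and the account ID 123456789012 are placeholders for illustration and are not part of the SageMaker SDK.

```python
def split_role_arn(arn):
    """Split an IAM role ARN into (account_id, path, role_name)."""
    prefix, sep, resource = arn.partition(":role/")
    if not sep:
        raise ValueError("not an IAM role ARN: %r" % arn)
    # prefix looks like "arn:aws:iam::<account>", so the account is field 4.
    account_id = prefix.split(":")[4]
    # Everything before the last "/" is the path; it is empty when there is none.
    path, _, role_name = resource.rpartition("/")
    return account_id, path, role_name


# Placeholder account ID; substitute your own.
from_sdk = "arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole"
provided = "arn:aws:iam::123456789012:role/AmazonSageMaker-ExecutionRole"

sdk_parts = split_role_arn(from_sdk)
my_parts = split_role_arn(provided)

same_role_name = sdk_parts[2] == my_parts[2]
same_path = sdk_parts[1] == my_parts[1]
print(sdk_parts, my_parts, same_role_name, same_path)
```

Here the role names match but the paths do not, which is consistent with SageMaker being unable to assume the service-role/ variant while the path-less ARN works.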
Minimal repro / logs
Py4JJavaError: An error occurred while calling o139.fit.
: java.lang.RuntimeException: Training job couldn't be completed.
at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.runTrainingJob(SageMakerEstimator.scala:407)
at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.fit(SageMakerEstimator.scala:311)
at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.fit(SageMakerEstimator.scala:175)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Training job 'trainingJob-665ebe56c361-2018-05-09T10-53-21-799' failed for reason: 'ClientError: SageMaker was unable to assume the role 'arn:aws:iam::<ACCOUNT>:role/service-role/AmazonSageMaker-ExecutionRole''
at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.awaitTrainingCompletion(SageMakerEstimator.scala:435)
at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.runTrainingJob(SageMakerEstimator.scala:409)
... 13 more
- Exact command to reproduce:
Run the Training and Hosting a Model cell of the tutorial notebook sample-notebooks/sagemaker-spark/pyspark_mnist/pyspark_mnist_xgboost.ipynb on a new SageMaker notebook instance.
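One way to check which ARN actually exists in the account is to ask IAM for the role by name. The Arn field of the GetRole response includes the role's real path. The sketch below assumes a boto3 IAM client whose caller has the iam:GetRole permission. The helper name lookup_role_arn and the stub client are hypothetical; the stub only stands in so the example runs without AWS credentials, and its account ID is a placeholder.

```python
def lookup_role_arn(iam_client, role_name):
    """Return the ARN (including any path) that IAM reports for role_name."""
    response = iam_client.get_role(RoleName=role_name)
    return response["Role"]["Arn"]


class StubIamClient:
    """Hypothetical stand-in for boto3.client("iam"), for offline use."""

    def get_role(self, RoleName):
        # Mimics only the part of the GetRole response used above.
        return {
            "Role": {
                "RoleName": RoleName,
                "Arn": "arn:aws:iam::123456789012:role/" + RoleName,
            }
        }


arn = lookup_role_arn(StubIamClient(), "AmazonSageMaker-ExecutionRole")
print(arn)
```

With real credentials, boto3.client("iam") would replace the stub, and the returned ARN could be passed to the estimator in place of the value from get_execution_role().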