Pyspark Mnist XGBoost tutorial  #174

@ghost

Description

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): I am running the sample notebook provided when running new sagemaker notebooks: sample-notebooks/sagemaker-spark/pyspark_mnist/pyspark_mnist_xgboost.ipynb
  • Framework Version:
  • Python Version: Python 3.6.4
  • CPU or GPU: CPU
  • Python SDK Version: sagemaker==1.2.2, sagemaker-pyspark==1.0.4
  • Are you using a custom image: No

Describe the problem

When I run the "Training and Hosting a Model" cell as is, I get a ClientError. If I provide the IAM role myself (instead of using role = sagemaker.get_execution_role()), the model trains without any problem.
More precisely, the two roles are:

  • from get_execution_role(): arn:aws:iam::<ACCOUNT>:role/service-role/AmazonSageMaker-ExecutionRole
  • provided by me: arn:aws:iam::<ACCOUNT>:role/AmazonSageMaker-ExecutionRole
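
To make the difference between the two ARNs explicit, here is a minimal sketch that splits an IAM role ARN into account, path, and role name, and rebuilds it without the path. The helper names `split_role_arn` and `without_path` are hypothetical, and the account ID `123456789012` is a placeholder standing in for <ACCOUNT>. Whether the path-less ARN is the right one to pass is an assumption based only on the behavior described above, not a confirmed fix.

```python
def split_role_arn(arn):
    """Split an IAM role ARN into (account, path, role_name).

    The path is the part between 'role/' and the role name,
    e.g. 'service-role'. It is empty when the role has no path.
    """
    prefix, sep, resource = arn.partition(":role/")
    if not sep or not resource or not prefix.startswith("arn:"):
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    # prefix looks like 'arn:<partition>:iam::<account>'
    account = prefix.split(":")[4]
    path, _, name = resource.rpartition("/")
    return account, path, name


def without_path(arn):
    """Rebuild the ARN with the path removed (hypothetical workaround)."""
    account, _, name = split_role_arn(arn)
    partition = arn.split(":")[1]
    return f"arn:{partition}:iam::{account}:role/{name}"


# Placeholder account ID; substitute your own.
from_sdk = "arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole"
print(split_role_arn(from_sdk))
print(without_path(from_sdk))
```

In the notebook, the resulting string would take the place of the value returned by sagemaker.get_execution_role() before the estimator is created.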

Minimal repro / logs

Py4JJavaError: An error occurred while calling o139.fit.
: java.lang.RuntimeException: Training job couldn't be completed.
	at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.runTrainingJob(SageMakerEstimator.scala:407)
	at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.fit(SageMakerEstimator.scala:311)
	at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.fit(SageMakerEstimator.scala:175)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Training job 'trainingJob-665ebe56c361-2018-05-09T10-53-21-799' failed for reason: 'ClientError: SageMaker was unable to assume the role 'arn:aws:iam::<ACCOUNT>:role/service-role/AmazonSageMaker-ExecutionRole''
	at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.awaitTrainingCompletion(SageMakerEstimator.scala:435)
	at com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator.runTrainingJob(SageMakerEstimator.scala:409)
	... 13 more
  • Exact command to reproduce:
    Run the "Training and Hosting a Model" cell of the tutorial sample-notebooks/sagemaker-spark/pyspark_mnist/pyspark_mnist_xgboost.ipynb on a new SageMaker notebook instance.
