Description
Please fill out the form below.
System Information
- Spark or PySpark: PySpark
- SDK Version:
- Spark Version: Spark 2.4.0
- Algorithm (e.g. KMeans): XGBoost
Describe the problem
I am running into three errors when calling SageMakerModel.fromModelS3Path() in my script running on EMR:
Error #1:
AttributeError: 'Option' object has no attribute '_java_obj'
Exception ignored in: <bound method JavaWrapper.__del__ of <sagemaker_pyspark.wrapper.ScalaMap object at 0x7f3a02df6ac8>>
Traceback (most recent call last):
Error #2:
There was an error calling SageMaker: An error occurred while calling z:com.amazonaws.services.sagemaker.sparksdk.SageMakerModel.fromModelS3Path. Trace:
py4j.Py4JException: Method fromModelS3Path([class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.Integer, class com.amazonaws.services.sagemaker.sparksdk.transformation.serializers.LibSVMRequestRowSerializer, class com.amazonaws.services.sagemaker.sparksdk.transformation.deserializers.XGBoostCSVRowDeserializer, class scala.collection.immutable.HashMap, class scala.Enumeration$Val, class com.amazonaws.services.sagemaker.AmazonSageMakerClient, class java.lang.Boolean, class com.amazonaws.services.sagemaker.sparksdk.RandomNamePolicy, class java.lang.String]) does not exist
Error #3:
When including --jar in spark-submit :
Exception in thread "main" java.io.FileNotFoundException: File file:/mnt/var/lib/hadoop/steps/s-xxxxxxx/SageMakerSparkApplicationJar.jar does not exist
I am able to run SageMakerModel.fromModelS3Path() smoothly within a SageMaker notebook, but the code fails inside the EMR cluster.
Minimal repro / logs
See the logs above.
- Exact command to reproduce:
The command I use to submit the application to EMR follows the documentation:
--packages com.amazonaws:aws-java-sdk:1.11.613 \
--deploy-mode cluster \
--conf spark.driver.userClassPathFirst=true \
--conf spark.executor.userClassPathFirst=true \
--jars SageMakerSparkApplicationJar.jar,...
When I include --jars SageMakerSparkApplicationJar.jar, I get Error #3 above.
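For what it's worth, in cluster deploy mode the driver runs on a cluster node, which resolves a relative jar path against its own filesystem; that can produce exactly this kind of FileNotFoundException. One common workaround (bucket name and script name below are hypothetical) is to reference the jar by an S3 URI, or by an absolute path that exists on every node:

```shell
# Hypothetical bucket and script names; the jar must be readable from the cluster
spark-submit \
  --deploy-mode cluster \
  --jars s3://my-bucket/jars/SageMakerSparkApplicationJar.jar \
  my_script.py
```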
When I do not include --jars SageMakerSparkApplicationJar.jar in spark-submit but instead add the following code to my main script, everything runs but produces Error #1 and Error #2 above:
import sagemaker_pyspark
from pyspark.sql import SparkSession
# Put the SageMaker Spark jars on the driver classpath
classpath = ":".join(sagemaker_pyspark.classpath_jars())
spark = SparkSession.builder.config("spark.driver.extraClassPath", classpath).getOrCreate()
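Since the spark-submit command sets userClassPathFirst for both driver and executor, the same jars presumably need to reach the executor classpath as well, not just the driver's. A minimal sketch, using hypothetical jar paths in place of sagemaker_pyspark.classpath_jars(), of building one colon-separated classpath and passing it to both settings:

```python
# Hypothetical jar paths standing in for sagemaker_pyspark.classpath_jars()
jars = [
    "/usr/local/lib/sagemaker-spark_2.11.jar",
    "/usr/local/lib/aws-java-sdk-core.jar",
]

# extraClassPath expects a colon-separated list on Linux
classpath = ":".join(jars)

# The same string would then be passed to both driver and executor, e.g.:
# spark = (SparkSession.builder
#          .config("spark.driver.extraClassPath", classpath)
#          .config("spark.executor.extraClassPath", classpath)
#          .getOrCreate())
print(classpath)
```

This is only a sketch of the classpath construction; whether it resolves the py4j signature mismatch in Error #2 depends on the SDK and Spark versions in play.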
KobaKhit