Labels: component: processing (relates to the SageMaker Processing Platform), type: documentation, type: question
Description
Since there was no activity in the discussion, I am moving it here in the hope of getting some input.
Discussed in #2912
Originally posted by JimFawkes February 8, 2022
We would like to run a SageMaker Processing job in our own VPC and security group. We assume that is what the `NetworkConfig` setting is for. Unfortunately, we are running into a rather unhelpful error:
```
UnexpectedStatusException: Error for Processing job test-job-2022-02-08-16-14-38-489: Failed. Reason: InternalServerError: We encountered an internal error. Please try again.
```
The following code produces the error:
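```python
import boto3
from sagemaker import Session
from sagemaker.network import NetworkConfig
# ScriptProcessor's canonical location is sagemaker.processing (not sagemaker.sklearn.processing)
from sagemaker.processing import ScriptProcessor

sm = boto3.client("sagemaker")
session = Session(default_bucket="our-s3-bucket-name", sagemaker_client=sm)

# Run the job inside our own VPC: these IDs end up in the
# NetworkConfig.VpcConfig of the CreateProcessingJob request.
networking = NetworkConfig(
    security_group_ids=["sg-12345"],
    subnets=["subnet-123345"],
)

# env_vars (a dict of environment variables) and script_location
# (a local path or S3 URI of the processing script) are defined
# elsewhere in our code and omitted here.
script_processor = ScriptProcessor(
    command=["python3"],
    image_uri="12345.dkr.ecr.eu-west-1.amazonaws.com/abc/efg",
    role="arn:aws:iam::12345:role/custom-service-role",
    instance_count=1,
    instance_type="ml.m5.large",
    base_job_name="test-job",
    sagemaker_session=session,
    env=env_vars,
    network_config=networking,
)

script_processor.run(code=script_location)
```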
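One cheap thing to rule out before digging into IAM is whether the subnets and security groups passed to `NetworkConfig` actually belong to the same VPC and whether the subnets still have free IP addresses. Below is a minimal sketch using the boto3 EC2 `describe_subnets` and `describe_security_groups` calls; the helper name `check_network_config` is hypothetical, and the client is passed in so the function does not depend on any particular credentials setup. If an ID does not exist, boto3 raises a `ClientError` (for example `InvalidSubnetID.NotFound`), which is itself a useful signal.

```python
from typing import Iterable, List


def check_network_config(
    ec2_client, subnet_ids: Iterable[str], security_group_ids: Iterable[str]
) -> List[str]:
    """Return human-readable problems; an empty list means no mismatch was detected."""
    problems = []
    subnets = ec2_client.describe_subnets(SubnetIds=list(subnet_ids))["Subnets"]
    groups = ec2_client.describe_security_groups(GroupIds=list(security_group_ids))[
        "SecurityGroups"
    ]

    # Every subnet and security group must live in one and the same VPC.
    vpcs = {s["VpcId"] for s in subnets} | {g["VpcId"] for g in groups}
    if len(vpcs) > 1:
        problems.append(
            f"subnets and security groups span multiple VPCs: {sorted(vpcs)}"
        )

    # SageMaker needs free IPs in the subnet to attach network interfaces.
    for subnet in subnets:
        if subnet.get("AvailableIpAddressCount", 1) == 0:
            problems.append(f"{subnet['SubnetId']} has no free IP addresses")
    return problems
```

Usage: pass `boto3.client("ec2", region_name="eu-west-1")` together with the same subnet and security group IDs given to `NetworkConfig`.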
We assumed that the error is caused by a missing permission in the SageMaker service role we pass to the job, but could not figure out which one.
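To test the missing-permission theory directly, the IAM policy simulator can be asked whether the role allows the EC2 actions that, as far as we understand the SageMaker documentation on VPC access, a processing job's execution role needs for creating and managing network interfaces. The action list below is our reading of that documentation and should be double-checked against it; the helper `denied_actions` is hypothetical. It uses boto3's IAM `simulate_principal_policy`, which requires `iam:SimulatePrincipalPolicy` for the caller and only evaluates the policies IAM can see, so an "allowed" result is a strong hint, not a guarantee.

```python
# EC2 actions we believe SageMaker needs for jobs running inside a VPC
# (assumption: verify against the current SageMaker documentation).
VPC_ACTIONS = [
    "ec2:CreateNetworkInterface",
    "ec2:CreateNetworkInterfacePermission",
    "ec2:DeleteNetworkInterface",
    "ec2:DeleteNetworkInterfacePermission",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeVpcs",
    "ec2:DescribeDhcpOptions",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
]


def denied_actions(iam_client, role_arn, actions=VPC_ACTIONS):
    """Return the actions the IAM policy simulator does not allow for role_arn."""
    denied = []
    kwargs = {"PolicySourceArn": role_arn, "ActionNames": list(actions)}
    while True:
        response = iam_client.simulate_principal_policy(**kwargs)
        for result in response["EvaluationResults"]:
            # EvalDecision is "allowed", "explicitDeny" or "implicitDeny".
            if result["EvalDecision"] != "allowed":
                denied.append(result["EvalActionName"])
        # Follow pagination until the simulator reports no more results.
        if not response.get("IsTruncated"):
            break
        kwargs["Marker"] = response["Marker"]
    return denied
```

Usage: `denied_actions(boto3.client("iam"), "arn:aws:iam::12345:role/custom-service-role")`; an empty list means the simulator allows every action in the list.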
Any tips to help us debug the error are greatly appreciated.
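A first debugging step is to look at what the SageMaker API itself recorded for the failed job, for example to confirm that the `NetworkConfig` really reached `CreateProcessingJob` and which role was used. A minimal sketch with boto3's `describe_processing_job`; the helper name `summarize_processing_job` is hypothetical and only picks out a few fields of the response.

```python
def summarize_processing_job(sm_client, job_name):
    """Collect the DescribeProcessingJob fields most useful for debugging a failure."""
    desc = sm_client.describe_processing_job(ProcessingJobName=job_name)
    # NetworkConfig / VpcConfig are absent when the job did not run in a VPC.
    vpc = desc.get("NetworkConfig", {}).get("VpcConfig", {})
    return {
        "status": desc["ProcessingJobStatus"],
        "failure_reason": desc.get("FailureReason"),
        "role_arn": desc.get("RoleArn"),
        "subnets": vpc.get("Subnets", []),
        "security_groups": vpc.get("SecurityGroupIds", []),
    }
```

Usage: `summarize_processing_job(boto3.client("sagemaker"), "test-job-2022-02-08-16-14-38-489")`. If the container got far enough to start, its output should also appear in CloudWatch Logs under the `/aws/sagemaker/ProcessingJobs` log group; an internal error raised before startup may leave no logs there at all.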