Which permissions are needed to allow us to use the NetworkConfig for processing Jobs? #3026

@JimFawkes

Since there was no activity in the discussion, I am moving this here in the hope of getting some input.

Discussed in #2912

Originally posted by JimFawkes February 8, 2022
We would like to run a SageMaker Processing job in our own VPC and security group. We assume that is what the `NetworkConfig` setting is for. Unfortunately, we are running into a rather unhelpful error:

UnexpectedStatusException: Error for Processing job test-job-2022-02-08-16-14-38-489: Failed. Reason: InternalServerError: We encountered an internal error.  Please try again.

The following code produces the error:

import boto3
from sagemaker import Session
from sagemaker.processing import ScriptProcessor
from sagemaker.network import NetworkConfig


sm = boto3.client("sagemaker")

session = Session(default_bucket="our-s3-bucket-name", sagemaker_client=sm)

networking = NetworkConfig(
    security_group_ids=["sg-12345"],
    subnets=["subnet-123345"]
)

script_processor = ScriptProcessor(
    command=["python3"],
    image_uri="12345.dkr.ecr.eu-west-1.amazonaws.com/abc/efg",
    role="arn:aws:iam::12345:role/custom-service-role",
    instance_count=1,
    instance_type="ml.m5.large",
    base_job_name="test-job",
    sagemaker_session=session,
    env=env_vars,
    network_config=networking
)

# Note: `env_vars` (used above) and `script_location` are defined elsewhere and not shown here.
script_processor.run(
    code=script_location,
)
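
To get more detail than the exception message, the job description can be inspected after the failure. The following is a minimal sketch, not a confirmed fix: it assumes a boto3 SageMaker client is passed in and reads fields from the `describe_processing_job` response. The helper name `summarize_processing_job` is hypothetical, and the `FailureReason` may well be the same generic message as above.

```python
def summarize_processing_job(sm_client, job_name):
    """Collect the status, failure reason, role, and VPC settings of a processing job.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    """
    desc = sm_client.describe_processing_job(ProcessingJobName=job_name)
    # NetworkConfig / VpcConfig are only present when the job was given a VPC setup.
    vpc = desc.get("NetworkConfig", {}).get("VpcConfig", {})
    return {
        "status": desc.get("ProcessingJobStatus"),
        "failure_reason": desc.get("FailureReason"),
        "role_arn": desc.get("RoleArn"),
        "security_group_ids": vpc.get("SecurityGroupIds", []),
        "subnets": vpc.get("Subnets", []),
    }

# Usage (requires AWS credentials), with the job name taken from the error above:
#   import boto3
#   sm = boto3.client("sagemaker")
#   print(summarize_processing_job(sm, "test-job-2022-02-08-16-14-38-489"))
```

Comparing the returned security group and subnet IDs with the intended ones at least confirms that the `NetworkConfig` reached the job as expected.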

We assume the error is due to a missing permission in the SageMaker service role we pass to the job, but we could not figure out which one.
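
One candidate worth checking is the set of EC2 network-interface actions that the SageMaker documentation lists for execution roles of jobs running in a private VPC. The sketch below builds such a policy statement and compares it against a list of already-allowed actions. It is an assumption that these actions are what is missing here, and the helper names `vpc_policy_document` and `missing_actions` are hypothetical.

```python
import fnmatch
import json

# EC2 actions documented for SageMaker execution roles when a job specifies a VPC.
# Assumption: this list reflects the documentation at the time of writing.
VPC_ACTIONS = [
    "ec2:CreateNetworkInterface",
    "ec2:CreateNetworkInterfacePermission",
    "ec2:DeleteNetworkInterface",
    "ec2:DeleteNetworkInterfacePermission",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeVpcs",
    "ec2:DescribeDhcpOptions",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
]


def vpc_policy_document(actions=VPC_ACTIONS):
    """Build an IAM policy document that allows the given actions on all resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"}
        ],
    }


def missing_actions(allowed, required=VPC_ACTIONS):
    """Return the required actions not covered by `allowed`.

    `allowed` may contain wildcards such as "ec2:*"; matching is case-insensitive.
    This ignores Deny statements, conditions, and resource restrictions.
    """
    patterns = [p.lower() for p in allowed]
    return [
        action
        for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    ]


print(json.dumps(vpc_policy_document(), indent=2))
```

For example, `missing_actions(["ec2:Describe*"])` returns the four create and delete actions, which would then be the ones to add to the role's policy.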

Any tips to help us debug the error are greatly appreciated.
