
Jumpstart with Cross-Account Role Assumption Fails with GetObject Access Denial #4043

@mencarellic

Describe the bug
When I use JumpStartModel from sagemaker.jumpstart.model with a cross-account assumed role obtained via Boto3, the JumpStartModel constructor fails with an AccessDenied error on an S3 GetObject call. The same code works without the AssumeRole step. However, due to compliance requirements, I need to assume the role before running the code.

I can also create the model, endpoint configuration, and endpoint with pure Boto3, but I lose some of the abstraction, so I'd prefer to use the SageMaker SDK.

To reproduce
Construct a JumpStartModel while using an assumed role. I am using this script:

import boto3
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

sagemaker_role = "arn:aws:iam::123456789012:role/sagemaker-role"
session_name = "AssumedRoleSession"
region = "us-west-2"
model_id = "huggingface-text2text-flan-t5-xxl-fp16"
model_version = "*"

sts_client = boto3.client("sts")
response = sts_client.assume_role(RoleArn=sagemaker_role, RoleSessionName=session_name)
assumed_session = boto3.Session(
    aws_access_key_id=response["Credentials"]["AccessKeyId"],
    aws_secret_access_key=response["Credentials"]["SecretAccessKey"],
    aws_session_token=response["Credentials"]["SessionToken"],
    region_name=region
)

sagemaker_session = sagemaker.Session(boto_session=assumed_session)
model = JumpStartModel(
    model_id=model_id,
    model_version=model_version,
    region=region,
    role=sagemaker_role,
    vpc_config={
        "SecurityGroupIds": [
            "sg-00000000000000000",
        ],
        "Subnets": [
            "subnet-00000000000000000",
            "subnet-11111111111111111",
        ],
    },
    sagemaker_session=sagemaker_session,
    enable_network_isolation=True,
)

# instance_type takes the ML instance type; endpoint_name takes the endpoint's name.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    endpoint_name="summary-v1-2023-08-03T00-00-00Z",
)

Expected behavior
I expect a model, endpoint configuration, and endpoint to be created based on the parameters specified.

Screenshots or logs
Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in create_jumpstart_model
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/model.py", line 266, in __init__
    if not _is_valid_model_id_hook():
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/model.py", line 259, in _is_valid_model_id_hook
    return is_valid_model_id(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/utils.py", line 592, in is_valid_model_id
    models_manifest_list = accessors.JumpStartModelsAccessor._get_manifest(region=region)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/accessors.py", line 97, in _get_manifest
    return JumpStartModelsAccessor._cache.get_manifest()  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/cache.py", line 342, in get_manifest
    manifest_dict = self._s3_cache.get(
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/utilities/cache.py", line 103, in get
    self.put(key)
  File "/usr/local/lib/python3.11/site-packages/sagemaker/utilities/cache.py", line 126, in put
    value = self._retrieval_function(  # type: ignore
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/cache.py", line 323, in _retrieval_function
    formatted_body, etag = self._get_json_file(s3_key, file_type)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/cache.py", line 266, in _get_json_file
    file_content, etag = self._get_json_file_and_etag_from_s3(key)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/cache.py", line 243, in _get_json_file_and_etag_from_s3
    response = self._s3_client.get_object(Bucket=self.s3_bucket_name, Key=key)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
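
The failing frame in the traceback is `self._s3_client.get_object(...)` inside `sagemaker/jumpstart/cache.py`, not a call made on the `sagemaker_session` passed to the constructor. One possible explanation (an assumption, not something this report confirms) is that the cache's S3 client resolves credentials from the default chain instead of the assumed session. A minimal sketch for checking which principal each credential source resolves to; `caller_arn` and `compare_principals` are hypothetical helper names, and the only AWS call used is STS `get_caller_identity`:

```python
def caller_arn(sts_client):
    """Return the ARN of the principal behind an STS client."""
    return sts_client.get_caller_identity()["Arn"]


def compare_principals(default_sts, assumed_sts):
    """Report the ARN each client resolves to and whether they match."""
    default_arn = caller_arn(default_sts)
    assumed_arn = caller_arn(assumed_sts)
    return {
        "default": default_arn,
        "assumed": assumed_arn,
        "same": default_arn == assumed_arn,
    }


# Usage with the script above (requires AWS credentials):
#   import boto3
#   print(compare_principals(boto3.client("sts"),
#                            assumed_session.client("sts")))
```

If the two ARNs differ, any client built outside `assumed_session` is acting as the original principal, which may not have the same S3 access.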

If I create an S3 client from the same assumed session and fetch what I assume is the manifest object (s3://jumpstart-cache-prod-us-west-2/models_manifest.json, though I may be wrong about the key), the download succeeds:

>>> s3_client = assumed_session.client("s3", region_name=region)
>>> s3_bucket = 'jumpstart-cache-prod-us-west-2'
>>> s3_key = 'models_manifest.json'
>>> local_file_path = 'models_manifest.json'
>>> s3_client.download_file(s3_bucket, s3_key, local_file_path)
>>> with open(local_file_path, 'r') as file:
...     first_10_lines = ''.join([next(file) for _ in range(10)])
...
>>> print(first_10_lines)
[
    {
        "model_id": "autogluon-classification-ensemble",
        "version": "1.1.1",
        "min_version": "2.103.0",
        "spec_key": "community_models/autogluon-classification-ensemble/specs_v1.1.1.json"
    },
    {
        "model_id": "autogluon-classification-ensemble",
        "version": "1.1.0",
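
The download above uses `download_file`, while the traceback fails on `get_object`. To narrow things down, the same `get_object` call can be repeated against the manifest with both the default credentials and the assumed session. This is a sketch only: `try_get_object` is a hypothetical helper, and the bucket and key are the ones guessed above rather than values confirmed from the SDK.

```python
def try_get_object(s3_client, bucket, key):
    """Attempt GetObject and return (ok, detail).

    On success, detail is the object's ETag; on failure, it is the
    service error code (for example "AccessDenied") when available,
    otherwise the exception class name.
    """
    try:
        response = s3_client.get_object(Bucket=bucket, Key=key)
    except Exception as exc:  # botocore ClientError carries a .response dict
        error = getattr(exc, "response", {}).get("Error", {})
        return False, error.get("Code", type(exc).__name__)
    return True, response.get("ETag")


# Usage with the script above (requires AWS credentials):
#   import boto3
#   bucket, key = "jumpstart-cache-prod-us-west-2", "models_manifest.json"
#   print("default:", try_get_object(boto3.client("s3", region_name=region), bucket, key))
#   print("assumed:", try_get_object(assumed_session.client("s3"), bucket, key))
```

If only the default-credential client is denied, that would be consistent with the manifest fetch not using the session passed to JumpStartModel.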

System information

  • SageMaker Python SDK version: 2.174.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans):
  • Framework version:
  • Python version: 3.11
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

