Description
Describe the bug
When I use JumpStartModel (from sagemaker.jumpstart.model) with a cross-account role assumed via Boto3, the JumpStartModel constructor fails with an AccessDenied error on a GetObject call. The same code works fine without the AssumeRole step. However, due to compliance requirements, I need to assume the role before running the code.
I can also create a model, endpoint, etc., using pure Boto3; however, I lose some of the abstraction, so I'd prefer to use the SageMaker Python SDK.
To reproduce
Construct a JumpStartModel while using an assumed role. I am using this script:
import boto3
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel
sagemaker_role = "arn:aws:iam::123456789012:role/sagemaker-role"
session_name = "AssumedRoleSession"
region = "us-west-2"
model_id = "huggingface-text2text-flan-t5-xxl-fp16"
model_version = "*"
sts_client = boto3.client("sts")
response = sts_client.assume_role(RoleArn=sagemaker_role, RoleSessionName=session_name)
assumed_session = boto3.Session(
    aws_access_key_id=response["Credentials"]["AccessKeyId"],
    aws_secret_access_key=response["Credentials"]["SecretAccessKey"],
    aws_session_token=response["Credentials"]["SessionToken"],
    region_name=region,
)
sagemaker_session = sagemaker.Session(boto_session=assumed_session)
model = JumpStartModel(
    model_id=model_id,
    model_version=model_version,
    region=region,
    role=sagemaker_role,
    vpc_config={
        "SecurityGroupIds": [
            "sg-00000000000000000",
        ],
        "Subnets": [
            "subnet-00000000000000000",
            "subnet-11111111111111111",
        ],
    },
    sagemaker_session=sagemaker_session,
    enable_network_isolation=True,
)
model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.12xlarge',
    endpoint_name='summary-v1-2023-08-03T00-00-00Z',
)
Expected behavior
I expect a model, endpoint configuration, and endpoint to be created based on the parameters specified.
Screenshots or logs
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in create_jumpstart_model
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/model.py", line 266, in __init__
    if not _is_valid_model_id_hook():
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/model.py", line 259, in _is_valid_model_id_hook
    return is_valid_model_id(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/utils.py", line 592, in is_valid_model_id
    models_manifest_list = accessors.JumpStartModelsAccessor._get_manifest(region=region)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/accessors.py", line 97, in _get_manifest
    return JumpStartModelsAccessor._cache.get_manifest()  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/cache.py", line 342, in get_manifest
    manifest_dict = self._s3_cache.get(
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/utilities/cache.py", line 103, in get
    self.put(key)
  File "/usr/local/lib/python3.11/site-packages/sagemaker/utilities/cache.py", line 126, in put
    value = self._retrieval_function(  # type: ignore
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/cache.py", line 323, in _retrieval_function
    formatted_body, etag = self._get_json_file(s3_key, file_type)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/cache.py", line 266, in _get_json_file
    file_content, etag = self._get_json_file_and_etag_from_s3(key)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sagemaker/jumpstart/cache.py", line 243, in _get_json_file_and_etag_from_s3
    response = self._s3_client.get_object(Bucket=self.s3_bucket_name, Key=key)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
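The traceback does not show which bucket and key the failing GetObject targets, or which credentials the cache's `_s3_client` picked up. A minimal sketch for surfacing that, assuming botocore emits its request and credential-resolution messages through the standard `logging` module under the `botocore` logger name (the helper name `enable_botocore_debug_logging` is hypothetical, not part of any SDK):

```python
import logging


def enable_botocore_debug_logging(stream=None):
    """Attach a DEBUG-level stream handler to the "botocore" logger.

    Hypothetical helper for diagnosis only; stream=None writes to sys.stderr.
    """
    logger = logging.getLogger("botocore")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Call this once before constructing JumpStartModel, then look in the output
# for the GetObject request parameters and the credential source messages.
# enable_botocore_debug_logging()
```

The output is verbose and may include request details, so it is probably best enabled only while reproducing the failure.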
If I create an S3 client from the same assumed session and fetch what I assume is the manifest object (s3://jumpstart-cache-prod-us-west-2/models_manifest.json, though it may be a different one), the download succeeds:
>>> s3_client = assumed_session.client("s3", region_name=region)
>>> s3_bucket = 'jumpstart-cache-prod-us-west-2'
>>> s3_key = 'models_manifest.json'
>>> local_file_path = 'models_manifest.json'
>>> s3_client.download_file(s3_bucket, s3_key, local_file_path)
>>> with open(local_file_path, 'r') as file:
...     first_10_lines = ''.join([next(file) for _ in range(10)])
...
>>> print(first_10_lines)
[
    {
        "model_id": "autogluon-classification-ensemble",
        "version": "1.1.1",
        "min_version": "2.103.0",
        "spec_key": "community_models/autogluon-classification-ensemble/specs_v1.1.1.json"
    },
    {
        "model_id": "autogluon-classification-ensemble",
        "version": "1.1.0",
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.174.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans):
- Framework version:
- Python version: 3.11
- CPU or GPU: GPU
- Custom Docker image (Y/N): N
Additional context
Add any other context about the problem here.
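Since the direct download with the assumed session works while the SDK call does not, it may help to confirm which principal each session actually resolves to. A minimal sketch, assuming each object passed in is a boto3-style session (anything with a `client("sts")` method); the helper names `caller_arn` and `describe_identities` are hypothetical:

```python
def caller_arn(session):
    """Return the ARN of the principal behind a boto3-style session."""
    return session.client("sts").get_caller_identity()["Arn"]


def describe_identities(sessions):
    """Map each label to the caller ARN of its session.

    `sessions` is a dict of label -> boto3-style session.
    """
    return {label: caller_arn(session) for label, session in sessions.items()}


# Example usage with the sessions from the script above (requires boto3):
# import boto3
# print(describe_identities({
#     "default": boto3.Session(region_name=region),
#     "assumed": assumed_session,
# }))
```

If the two ARNs differ, comparing them against the principal reported in the access-denied event (for example in CloudTrail, if available) would show which set of credentials the failing GetObject used.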