Allow storing inputs and outputs of FrameworkProcessor cross-accounts #2913

@ksonbol

Describe the feature you'd like
Definitions:

  • Account A: where I will run SageMaker FrameworkProcessor jobs.
  • Account B: where the data lives that I need to access for ProcessingInput.source and ProcessingOutput.destination.
  • Role A: IAM role in Account A that has the SageMakerFullAccess policy attached.
  • Role B: IAM role in Account B that has S3 read and write access to the needed data.
    Role A can assume Role B, and Role B has a trust relationship with Role A.

Problem:
I would like to create a FrameworkProcessor (specifically a TensorFlowProcessor) instance that runs in Account A but reads and writes data in Account B, so that I do not have to copy data back and forth between the two accounts.

How would this feature be used? Please describe.
A role parameter could be added to the ProcessingInput and ProcessingOutput classes. SageMaker would assume that role before accessing the data.

# Proposed usage: the role argument on ProcessingInput/ProcessingOutput does not exist today.
processor = TensorFlowProcessor(role=role_A, ...)
processor.run(
    inputs=[ProcessingInput(source=.., destination=.., role=role_B)],
    outputs=[ProcessingOutput(source=.., destination=.., role=role_B)],
    ...
)

Describe alternatives you've considered
There is a role parameter in the TensorFlowProcessor constructor. However, neither role works on its own:

  • Using Role A fails with the error "No S3 objects found under S3 URL...", because the objects exist in Account B, not in Account A.
  • Using Role B fails with the error:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateProcessingJob operation: RoleArn: Cross-account pass role is not allowed.

Reason: the role passed to the job must be defined in Account A and carry the SageMaker permissions for that account.

How can I tell SageMaker to use Role A for creating and running the processing job, but to assume Role B when accessing the datasets in Account B? For example, I am able to do that easily in SageMaker notebooks.
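For context, the notebook approach can be sketched roughly as below. This is a minimal sketch, assuming boto3 and the standard STS AssumeRole call. The helper names and the session name "cross-account-data" are hypothetical and chosen only for illustration. As far as I can tell, this only helps for S3 access done in my own code. It does not cover ProcessingInput and ProcessingOutput, where SageMaker moves the data itself using the job role.

```python
def assume_role_session_kwargs(sts_client, role_arn, session_name="cross-account-data"):
    """Assume role_arn via STS and return keyword arguments for boto3.Session.

    sts_client is any object exposing the STS assume_role call, for example
    boto3.client("sts") created with Role A's credentials.
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def s3_client_for_role(role_arn):
    """Return an S3 client that acts with the temporary credentials of role_arn."""
    import boto3  # imported here so the helper above stays usable without boto3

    session_kwargs = assume_role_session_kwargs(boto3.client("sts"), role_arn)
    return boto3.Session(**session_kwargs).client("s3")
```

In a notebook running as Role A, calling s3_client_for_role with the ARN of Role B would return an S3 client that reads and writes in Account B. The temporary credentials expire, so a long-running script would need to call it again.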

Please let me know if there is a way to achieve that with the current FrameworkProcessor implementation.
