Specifying a code location in ProcessingStep should be optional #4904

@system123

Describe the bug
When defining a ProcessingStep using the Python SDK, the pipeline compiler raises an error if the code= argument is not specified. However, the SDK documentation and source both declare code=None as the default (a default the compiler then rejects), and the AWS documentation for processing steps states that the code parameter may be None if the code already exists in the container. In this case, the image used by the ScriptProcessor already contains the code, and the processor defines how to execute it through its command= parameter.

To reproduce
Define a ProcessingStep without the code argument, as in the snippet below. Compiling the pipeline then raises the error shown under "Screenshots or logs".

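import sagemaker.processing
from sagemaker.workflow.steps import ProcessingStep

# script_processor, train_step, input_data_uri and evaluation_report are
# defined earlier in the pipeline. Note that code= is deliberately omitted.
evaluation_step = ProcessingStep(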
    name="EvaluateModel",
    processor=script_processor,
    inputs=[
        sagemaker.processing.ProcessingInput(
            source=train_step.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        sagemaker.processing.ProcessingInput(
            source=input_data_uri,
            destination="/opt/ml/processing/data",
        ),
    ],
    outputs=[
        sagemaker.processing.ProcessingOutput(
            output_name="evaluation",
            source="/opt/ml/processing/evaluation",
            destination="s3://my-bucket/models/"
        ),
    ],
    property_files=[evaluation_report],
)

Expected behavior
If a ScriptProcessor based on a custom image is used, the command should simply be run directly; no code needs to be uploaded or pulled into the container. The expected behaviour can currently be obtained with the SDK by pointing code at any dummy file on S3 or on the local machine. That file is pushed to the container, but the command specified by the ScriptProcessor is still what gets executed.
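
The dummy-file workaround described above could be sketched roughly as follows. This is only an illustration: write_placeholder_script and the file name placeholder.py are hypothetical names introduced here, not part of the SDK, and the ProcessingStep call is left as a comment because it needs the objects from the reproduction snippet.

```python
import tempfile
from pathlib import Path
from typing import Optional


def write_placeholder_script(directory: Optional[str] = None) -> str:
    """Create a no-op file to pass as code= (hypothetical helper, not an SDK API)."""
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / "placeholder.py"
    # The content is irrelevant; the file only has to exist so that code= is accepted.
    path.write_text("# Placeholder: the real code already lives in the container image.\n")
    return str(path)


placeholder_path = write_placeholder_script()

# Sketch of the workaround (not executed here; requires the sagemaker SDK and
# the script_processor, inputs and outputs from the reproduction above):
#
# evaluation_step = ProcessingStep(
#     name="EvaluateModel",
#     processor=script_processor,
#     code=placeholder_path,  # dummy file, only there to satisfy the compiler
#     ...
# )
```

The cost of this approach is an unnecessary upload on every run, which is why an optional code argument would be preferable.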

Screenshots or logs

ValueError: code None url scheme b'' is not recognized. Please pass a file path or S3 url
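
One plausible explanation for the b'' in this message, assuming the SDK parses the code argument with urllib.parse.urlparse (an assumption, not confirmed in this report): urlparse(None) returns a bytes-typed result whose scheme is b'', which matches neither "s3" nor the empty string. The function check_code_location below is a hypothetical stand-in for the SDK's validation, not its actual code.

```python
from urllib.parse import urlparse


def check_code_location(code):
    """Hypothetical stand-in for the SDK's validation of the code= argument."""
    parsed = urlparse(code)
    if parsed.scheme == "s3":
        return "s3"
    if parsed.scheme in ("", "file"):
        return "local"
    # With code=None the scheme is the bytes object b'', so neither branch
    # above matches and the check falls through to this error.
    raise ValueError(
        "code {} url scheme {} is not recognized. "
        "Please pass a file path or S3 url".format(code, parsed.scheme)
    )


try:
    check_code_location(None)
except ValueError as err:
    print(err)
```

If this is indeed the mechanism, an explicit None check before parsing would be enough to let the step fall back to the command already defined in the container.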
