Description
Describe the bug
When a ProcessingStep is defined with the Python SDK, the pipeline compiler raises an error if the code= argument is not specified. However, the SDK documentation and source both give code=None as the default (a value the compiler then rejects), and the AWS documentation for processing steps states that the code parameter may be None if the code already exists in the container. In this case the image used by the ScriptProcessor already contains the code, and the command= parameter defines how to execute it.
To reproduce
Defining a ProcessingStep without a code argument raises the error shown under "Screenshots or logs".
import sagemaker
from sagemaker.workflow.steps import ProcessingStep

# script_processor, train_step, input_data_uri and evaluation_report are defined elsewhere.
# Note that no code= argument is passed.
evaluation_step = ProcessingStep(
    name="EvaluateModel",
    processor=script_processor,
    inputs=[
        sagemaker.processing.ProcessingInput(
            source=train_step.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        sagemaker.processing.ProcessingInput(
            source=input_data_uri,
            destination="/opt/ml/processing/data",
        ),
    ],
    outputs=[
        sagemaker.processing.ProcessingOutput(
            output_name="evaluation",
            source="/opt/ml/processing/evaluation",
            destination="s3://my-bucket/models/",
        ),
    ],
    property_files=[evaluation_report],
)
Expected behavior
If a ScriptProcessor based on a custom image is used, the command should simply be run directly; no code needs to be uploaded or pulled into the container. The expected behaviour can currently be obtained with the SDK by pointing code to any dummy file on S3 or on the local machine. The file is then pushed to the container, but the command specified by the ScriptProcessor is still what gets executed.
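The dummy-file workaround can be sketched roughly as follows. This is only an illustration: the helper make_placeholder_script and the file name noop.py are hypothetical and not part of the SDK, and the ProcessingStep call is left in comments because it depends on the script_processor from the reproduction.

```python
import tempfile
from pathlib import Path


def make_placeholder_script(directory=None, filename="noop.py"):
    """Write a no-op placeholder script and return its path as a string.

    Hypothetical helper: the file exists only to satisfy the code= argument.
    """
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / filename
    path.write_text("# placeholder; the container's own command does the work\n")
    return str(path)


placeholder = make_placeholder_script()

# Usage with the step from the reproduction (not executed here):
# evaluation_step = ProcessingStep(
#     name="EvaluateModel",
#     processor=script_processor,
#     code=placeholder,  # dummy file; uploaded, but the processor's command still runs
#     ...                # inputs, outputs and property_files as above
# )
```

The placeholder is uploaded along with the job, so this only hides the problem; it does not remove the unnecessary upload.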
Screenshots or logs
ValueError: code None url scheme b'' is not recognized. Please pass a file path or S3 url
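The b'' in the message is consistent with None having been passed to urllib.parse.urlparse, which returns a bytes result with an empty scheme for that input. That the SDK validates code this way is an assumption based on the message text, and the S3 path below is made up for comparison. A small standard-library sketch:

```python
from urllib.parse import urlparse


def describe_code_scheme(code):
    """Return the URL scheme that urlparse reports for a code= value."""
    return urlparse(code).scheme


# None is coerced to an empty bytes URL, so the scheme comes back as b''.
none_scheme = describe_code_scheme(None)

# Hypothetical S3 location, shown only for contrast.
s3_scheme = describe_code_scheme("s3://my-bucket/scripts/evaluate.py")

print(repr(none_scheme), repr(s3_scheme))
```

A check of the form "scheme is neither a local path nor s3" would therefore reject the documented default of None with exactly this kind of message.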