
ScriptProcessor does not check local_code config before uploading code to S3 #3560

Open
lodo1995 opened this issue Dec 27, 2022 · 4 comments

Comments

@lodo1995

Describe the bug
When a LocalSession or LocalPipelineSession is configured to use local code, as follows

session.config = {'local': {'local_code': True}}

the code passed to a pipeline ProcessingStep or directly to the run method of a processor (ScriptProcessor, FrameworkProcessor, ...) should not be uploaded to S3.

However, ScriptProcessor does not honor this setting. Its _include_code_in_inputs method (called by _normalize_args of the base class Processor, which runs both when the processor is invoked directly and when it runs through a pipeline) always tries to upload the code to S3.

def _include_code_in_inputs(self, inputs, code, kms_key=None):
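
For context, the upload itself boils down to an unconditional call to the S3 uploader. The sketch below is a simplified approximation of that behavior (helper names and the exact key layout are illustrative, not copied verbatim from the SDK source); note that nothing consults the local.local_code setting:

from sagemaker import s3

def _include_code_in_inputs(self, inputs, code, kms_key=None):
    # Illustrative approximation: the user script is always uploaded to the
    # session's default bucket, regardless of the 'local.local_code' setting.
    desired_s3_uri = s3.s3_path_join(
        's3://',
        self.sagemaker_session.default_bucket(),
        self._current_job_name,
        'input',
        'code',
    )
    code_s3_uri = s3.S3Uploader.upload(
        local_path=code,
        desired_s3_uri=desired_s3_uri,
        kms_key=kms_key,
        sagemaker_session=self.sagemaker_session,
    )
    # ...the uploaded script is then added to `inputs` as a ProcessingInput
    # and set as the container entrypoint.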

Compare this with the Model class, used for example in the TrainingStep. Its _upload_code method checks the session configuration and skips the upload to S3 when local code is enabled.

def _upload_code(self, key_prefix: str, repack: bool = False) -> None:
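
The relevant part of that check looks roughly like the following (a simplified sketch, not the verbatim SDK source); an equivalent guard is what _include_code_in_inputs is missing:

from sagemaker import utils

def _upload_code(self, key_prefix: str, repack: bool = False) -> None:
    # Simplified sketch: when the session is in local mode and
    # 'local.local_code' is enabled, skip the S3 upload entirely.
    local_code = utils.get_config_value('local.local_code', self.sagemaker_session.config)
    if (self.sagemaker_session.local_mode and local_code) or self.entry_point is None:
        self.uploaded_code = None
        return
    # ...otherwise the entry point / source_dir is uploaded to S3 as usual.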

To reproduce
In the absence of any AWS credentials (which should not be needed when running completely locally), the following code fails while trying to upload the processing.py script to S3 (raising botocore.exceptions.NoCredentialsError). Note that, in addition to the following code, a processing.py file must exist in the working directory (its contents don't matter).

Code
import boto3
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import LocalPipelineSession
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor
from sagemaker.workflow.steps import ProcessingStep

role = 'arn:aws:iam::123456789012:role/MyRole'

local_pipeline_session = LocalPipelineSession(boto_session = boto3.Session(region_name = 'eu-west-1'))
local_pipeline_session.config = {'local': {'local_code': True}}

script_processor = ScriptProcessor(
    image_uri = 'docker.io/library/python:3.8',
    command = ['python'],
    instance_type = 'local',
    instance_count = 1,
    sagemaker_session = local_pipeline_session,
    role = role,
)

processing_step = ProcessingStep(
    name = 'Processing Step',
    processor = script_processor,
    code = 'processing.py',
    inputs = [
        ProcessingInput(
            source = './input-data',
            destination = '/opt/ml/processing/input',
        )
    ],
    outputs = [
        ProcessingOutput(
            source = '/opt/ml/processing/output',
            destination = './output-data',
        )
    ],
)

pipeline = Pipeline(
    name = 'MyPipeline',
    steps = [processing_step],
    sagemaker_session = local_pipeline_session
)

pipeline.upsert(role_arn = role)

pipeline_run = pipeline.start()

System information

  • SageMaker Python SDK version: 2.126.0
@lodo1995 lodo1995 added the bug label Dec 27, 2022
@clausagerskov

@lodo1995 Any developments? Is local development actually possible at the moment?

@lodo1995
Author

@clausagerskov In general, local development is partially possible: some things do work, others (such as the one described in this bug) don't. Your mileage may vary.

Regarding this specific bug, as far as I can tell no AWS developer has even looked at it, nor at any of the other bugs I opened. I don't have time to take care of all of this, so I decided to just avoid SageMaker for the time being.

@Adamwgoh

Adamwgoh commented Jan 9, 2024

Following this as well.

https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-local-mode.html

Although the documentation says local mode is supported, it still requires the user to upload the input code to S3, as described in this issue. If the upload is forced as a side effect, why is it required at all when local mode is meant to use local resources to run the pipeline?

@clausagerskov

I wonder if it is possible to emulate an S3 location without having to pay for some tool.
