
FrameworkProcessor doesn't install packages from requirements.txt when run in a SageMaker Project #3166

@mstfldmr

Description

Describe the bug
I created a pipeline with a single ProcessingStep based on FrameworkProcessor. When I upsert() and start() the pipeline from a notebook, it runs successfully.

The code lives in a SageMaker Project generated from an AWS-provided template. When I push a change to CodeCommit, the processing job fails because of a missing package: although the package is listed in source_dir/requirements.txt, it is never installed.

Traceback (most recent call last):
  File "preprocess.py", line 7, in <module>
    import sagemaker
ModuleNotFoundError: No module named 'sagemaker'
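
For context, the source_dir handed to the processor (see the reproduction below) is laid out roughly as follows; the requirements.txt content is inferred from the traceback:

preprocessing/               <- source_dir passed to get_run_args()
├── preprocess.py            <- entry point (code="preprocess.py")
└── requirements.txt         <- lists sagemaker; expected to be pip-installed before preprocess.py runs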

To reproduce

import os

import sagemaker
import sagemaker.session
import sagemaker.sklearn.estimator
from sagemaker.processing import FrameworkProcessor, ProcessingOutput
from sagemaker.workflow.parameters import ParameterInteger, ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

# BASE_DIR and get_session() are defined by the template-generated pipeline.py;
# BASE_DIR points at the directory containing this file.
BASE_DIR = os.path.dirname(os.path.realpath(__file__))


def get_pipeline(
    region,
    sagemaker_project_arn=None,
    role=None,
    default_bucket=None,
    model_package_group_name="MstfPackageGroup",
    pipeline_name="MstfPipeline",
    base_job_prefix="Mstf",
    feature_group_name="ninja-x5-y3-feature-group-02-14-30-03",
    from_date="2021-05-13",
    to_date="2022-05-13"
):

    sagemaker_session = get_session(region, default_bucket)
    s3_default_bucket = sagemaker_session.default_bucket()
    if role is None:
        role = sagemaker.session.get_execution_role(sagemaker_session)

    # parameters for pipeline execution
    processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
    processing_instance_type = ParameterString(
        name="ProcessingInstanceType", default_value="ml.m5.xlarge"
    )
    training_instance_type = ParameterString(
        name="TrainingInstanceType", default_value="ml.m5.xlarge"
    )
    model_approval_status = ParameterString(
        name="ModelApprovalStatus", default_value="PendingManualApproval"
    )
    
    feature_group_name_input = ParameterString(
        name="InputFeatureGroupName",
        default_value=feature_group_name
    )
    
    input_s3_bucket = ParameterString(
        name="DefaultS3Bucket",
        default_value=f"s3://{s3_default_bucket}/{base_job_prefix}"
    )
    
    input_from_date = ParameterString(
        name="DataFromDate",
        default_value=from_date
    )
    
    input_to_date = ParameterString(
        name="DataToDate",
        default_value=to_date
    )
    
    est_cls = sagemaker.sklearn.estimator.SKLearn
    framework_version_str = "0.23-1"

    script_processor = FrameworkProcessor(
        estimator_cls=est_cls,
        framework_version=framework_version_str,
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        sagemaker_session=sagemaker_session
    )
    
    processor_run_args = script_processor.get_run_args(
        code="preprocess.py",
        source_dir=os.path.join(BASE_DIR, "preprocessing"),
        inputs=[],
        outputs=[
            ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
            ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
            ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
        ],
        arguments=[
            "--feature_group_name", feature_group_name_input,
            "--input_s3_bucket", input_s3_bucket,
            "--from_date", input_from_date,
            "--to_date", input_to_date,
        ],
    )
    
    step_process = ProcessingStep(
        name="PreprocessMstfData",
        processor=script_processor,
        inputs=processor_run_args.inputs,
        outputs=processor_run_args.outputs,
        job_arguments=processor_run_args.arguments,
        code=processor_run_args.code,
    )


    # pipeline instance
    pipeline = Pipeline(
        name=pipeline_name,
        parameters=[
            processing_instance_type,
            processing_instance_count,
            training_instance_type,
            model_approval_status,
            feature_group_name_input,
            input_s3_bucket,
            input_from_date,
            input_to_date
        ],
        steps=[step_process],
        sagemaker_session=sagemaker_session,
    )
    return pipeline
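
For completeness, the pipeline is driven roughly like this from the notebook, where everything works; the region value is a placeholder:

import sagemaker

role = sagemaker.get_execution_role()                   # notebook execution role
pipeline = get_pipeline(region="eu-west-1", role=role)
pipeline.upsert(role_arn=role)                          # create or update the pipeline definition
execution = pipeline.start()                            # here the processing job installs requirements.txt and succeeds
execution.wait()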

Expected behavior
The sagemaker package should be installed, because it is listed in requirements.txt, and preprocess.py should then run successfully.
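
For reference, the top of preprocess.py looks roughly like the sketch below; only the failing import and the argument names are taken from the report, the rest is assumed:

# preprocess.py (sketch) -- entry point run by the ProcessingStep
import argparse

import sagemaker  # the import that fails when requirements.txt has not been installed

parser = argparse.ArgumentParser()
parser.add_argument("--feature_group_name")
parser.add_argument("--input_s3_bucket")
parser.add_argument("--from_date")
parser.add_argument("--to_date")
args = parser.parse_args()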

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.86.2
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): SKLearn
  • Framework version: 0.23-1
  • Python version: 3.7
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N
