Use of `source_dir` argument for `FrameworkProcessor` disables step caching functionality

**Describe the bug**
I am looking for a Processor that lets me specify a `requirements.txt` per container/step and lets containers `import` custom code from scripts I write, in addition to the script specified by the `code` argument. `sagemaker.processing.FrameworkProcessor` seems like a good option for this.

However, using the `source_dir` argument (which is what lets me `import` custom code) seems to prevent step caching from working.

**To reproduce**

Here is a configuration in which step caching does work.

I define
```Python
import os

BASE_DIR = os.path.dirname(os.path.realpath(__file__))
```

and the processing step is defined as
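```Python
from sagemaker.processing import FrameworkProcessor, ProcessingOutput
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.steps import CacheConfig, ProcessingStep

# processing_instance_type, processing_instance_count, base_job_name,
# pipeline_session, role, input_data, random_seed, train_fraction,
# validation_fraction and step_name are defined elsewhere in the pipeline script.
sklearn_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="0.23-1",
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name=base_job_name,
    sagemaker_session=pipeline_session,
    role=role,
)
step_args = sklearn_processor.run(
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    # Full path to the entry script; no source_dir is passed here.
    code=os.path.join(BASE_DIR, "preprocess.py"),
    arguments=[
        "--input-data", input_data,
        "--random-seed", random_seed,
        "--train-fraction", train_fraction,
        "--validation-fraction", validation_fraction,
    ],
)
step_process = ProcessingStep(
    name=step_name,
    step_args=step_args,
    cache_config=CacheConfig(enable_caching=True, expire_after="T12h"),
)
```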
This works as expected: the first run takes about 5 minutes, and subsequent runs take only a couple of seconds.

If I switch to 
```Python
        code="preprocess.py",
        source_dir=BASE_DIR,
```
then caching no longer works, and the pipeline takes about 5 minutes to execute every time.
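
One way to narrow this down might be to compare the pipeline definition generated in two separate runs of the pipeline script. This is only a diagnostic sketch: it assumes that a cache hit requires the step arguments to be identical between executions, and the helper names `diff_json` and `diff_definitions` are hypothetical, not part of the SDK. It takes two JSON strings (for example, the output of `pipeline.definition()` saved from two runs) and lists the paths at which they differ.

```python
import json


def diff_json(left, right, path=""):
    """Return the paths at which two parsed JSON documents differ."""
    if isinstance(left, dict) and isinstance(right, dict):
        diffs = []
        for key in sorted(set(left) | set(right)):
            child = f"{path}.{key}" if path else key
            if key not in left or key not in right:
                # Key present on one side only.
                diffs.append(child)
            else:
                diffs.extend(diff_json(left[key], right[key], child))
        return diffs
    if isinstance(left, list) and isinstance(right, list) and len(left) == len(right):
        diffs = []
        for index, (l_item, r_item) in enumerate(zip(left, right)):
            diffs.extend(diff_json(l_item, r_item, f"{path}[{index}]"))
        return diffs
    # Scalars, mismatched types, or lists of different length.
    return [] if left == right else [path]


def diff_definitions(first_json, second_json):
    """Compare two pipeline definitions given as JSON strings."""
    return diff_json(json.loads(first_json), json.loads(second_json))
```

If the list is non-empty only when `source_dir` is used, the reported paths (for instance an S3 URI that changes on every run) would point at the argument that invalidates the cache.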

Note: if I instead pass both the full path in `code` and `source_dir`, as in
```Python
        code=os.path.join(BASE_DIR, "preprocess.py"),
        source_dir=BASE_DIR,
```
then the SageMaker pipeline fails:
![image](https://github.com/aws/sagemaker-python-sdk/assets/86134108/b48a67c6-4ca8-4dca-8d1e-c30491d48818)
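
The two variants above suggest that, when `source_dir` is given, `code` is resolved relative to that directory. Under that assumption (it is an inference from the behaviour described here, not a confirmed statement about the SDK), a small helper can derive a consistent pair of values from a full script path. The name `split_code_path` is hypothetical.

```python
import os


def split_code_path(script_path, source_dir):
    """Return (code, source_dir) with code expressed relative to source_dir.

    Raises ValueError if the script does not live inside source_dir.
    """
    source_dir = os.path.realpath(source_dir)
    script_path = os.path.realpath(script_path)
    relative = os.path.relpath(script_path, source_dir)
    # A relative path that climbs out of source_dir means the script is outside it.
    if relative == os.pardir or relative.startswith(os.pardir + os.sep):
        raise ValueError(f"{script_path} is not inside {source_dir}")
    return relative, source_dir
```

For the layout in this report, `split_code_path(os.path.join(BASE_DIR, "preprocess.py"), BASE_DIR)` yields `"preprocess.py"` as the `code` value, which matches the variant that runs (but is not cached).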

**System information**
A description of your system. Please provide:
- **SageMaker Python SDK version**:
    sagemaker-2.196.0
- **Framework name (eg. PyTorch) or algorithm (eg. KMeans)**:
- **Framework version**:
- **Python version**:
    3.11
- **CPU or GPU**:
- **Custom Docker image (Y/N)**:


