Pipeline Repack step exec fails when _RepackModelStep.py and _repack_script_launcher.sh built from Windows OS context #3762

Mathonal · 2023-03-30T11:45:39Z

Describe the bug
RepackModel steps in a pipeline execution fail when the pipeline is built and upserted from a Windows environment.

To reproduce

Start from a SageMaker tutorial training pipeline.
Modify the Model and ModelStep to pass the entry_point parameter (as you would to prepare your model for inference).

from sagemaker.model import Model

model = Model(
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    entry_point="inference.py",
    sagemaker_session=pipeline_session,
    role=role,
)

Build and upsert the pipeline from a Windows environment (simulating a debug/test launch from a local IDE such as PyCharm).
You should get an error of this type during the execution:

With this in the CloudWatch logs of the failing step:

My understanding is:

During step creation/build, sagemaker.workflow._utils._RepackModelStep.py is called, more specifically the _inject_repack_script_and_launcher method. It writes a bash script file (_repack_script_launcher.sh) from a Python string variable. If this write is executed on Windows, carriage return characters appear to be written into the bash file, which is then pushed to the cloud with the rest of the pipeline for execution.
Once in the SageMaker pipeline execution environment (Linux), _repack_script_launcher.sh generates several errors during the repack_model step. It still manages to launch the _repack_model.py script, but passes a model_archive path with an extra trailing character, shown as #015 in the logs (the octal escape for a carriage return). The repack step then fails because it cannot find the object model.tar.gz#015, i.e. model.tar.gz\r.
Note: The exact same code (build, upsert, run) launched from our CI/CD (Linux environment) does not cause this error, suggesting that writing the bash script from a Python variable is not a problem when executed on Linux.
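To illustrate the suspected mechanism: on Windows, Python's text-mode open() translates "\n" to "\r\n" on write unless a newline argument is given. Below is a minimal sketch of writing a script with forced LF endings and checking the result. It is not the SDK's actual code; LAUNCHER_TEXT, write_script_lf, has_carriage_returns and demo are hypothetical names used only for this example.

```python
import os
import tempfile

# Hypothetical stand-in for the launcher script text; NOT the SDK's real content.
LAUNCHER_TEXT = '#!/bin/bash\nset -e\npython _repack_model.py "$@"\n'


def write_script_lf(path, text):
    """Write text with LF line endings regardless of the host OS."""
    # Normalize any CRLF already present in the string itself.
    normalized = text.replace("\r\n", "\n")
    # newline="\n" disables the "\n" -> os.linesep translation that
    # text mode would otherwise apply on Windows.
    with open(path, "w", newline="\n") as f:
        f.write(normalized)


def has_carriage_returns(path):
    """Return True if the file contains any carriage return byte."""
    with open(path, "rb") as f:
        return b"\r" in f.read()


def demo():
    """Write the sample launcher to a temp dir and report whether it has CRs."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "launcher.sh")
        write_script_lf(path, LAUNCHER_TEXT)
        return has_carriage_returns(path)
```

Running has_carriage_returns on the generated _repack_script_launcher.sh before upserting would be one way to confirm whether the file written on Windows really contains carriage returns.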

Expected behavior
Be able to build, upsert, and run a pipeline from anywhere, especially from my local IDE environment.

Temporary Workaround
I did not succeed in altering the string variable that is written to the bash file, or the way it is written on Windows, so that the file is still readable without error once transferred to the Linux environment.

So I duplicated sagemaker\workflow\_repack_model.py into my project code (in an MLOps tools folder) and added a small string correction to make sure that ".gz" are the last 3 characters of the model_archive path. This does nothing if the bash script was already written from a Linux environment (CI/CD).

I then alter the SageMaker SDK installation by overwriting sagemaker\workflow\_repack_model.py with my corrected file right after installing it, but this is obviously not a viable way to patch code.
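For reference, the kind of string correction described above could look like the sketch below. This is an illustration, not my exact patch and not SDK code; truncate_after_gz is a hypothetical helper name, and it assumes the archive name is expected to end in ".gz".

```python
def truncate_after_gz(model_archive):
    """Cut anything after the last ".gz" so the path ends with ".gz".

    A path that already ends in ".gz" (script written on Linux) is
    returned unchanged, as is a path that contains no ".gz" at all.
    """
    suffix = ".gz"
    idx = model_archive.rfind(suffix)
    if idx == -1:
        # No ".gz" found; leave the value alone rather than guess.
        return model_archive
    return model_archive[: idx + len(suffix)]
```

With this, a value such as "model.tar.gz\r" becomes "model.tar.gz", while a clean "model.tar.gz" passes through untouched.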

System information
A description of your system. Please provide:

SageMaker Python SDK version: 2.140.0
Framework name (eg. PyTorch) or algorithm (eg. KMeans): basic random forest algo from Scikit
Framework version:
Python version: 3.9
CPU or GPU: CPU
Custom Docker image (Y/N): Y

Additional context
Schneider Electric AI-HUB accounts


qidewenwhen · 2023-04-25T21:27:01Z

Hi @Mathonal, thanks for reaching out!
I really appreciate your effort in providing all these details, doing the investigation, and presenting the workaround! The root cause you identified makes sense to me.
Currently the SageMaker Python SDK supports Unix/Linux and Mac OS only, see https://github.com/aws/sagemaker-python-sdk#supported-operating-systems.
However, this is a good callout for supporting Windows environments. I'll re-label this issue as "feature request" and bring it up with my internal team to evaluate.

qidewenwhen · 2023-05-01T23:47:09Z

Synced up with the internal team. Given that the entire SageMaker PySDK does not support Windows OS, I will remove the component: pipelines tag and leave this feature request in the general PySDK queue.
I will notify the SageMaker PySDK team offline about this as well.

Mathonal added the bug label Mar 30, 2023

svia3 added the component: pipelines Relates to the SageMaker Pipeline Platform label Apr 10, 2023

qidewenwhen added type: feature request and removed bug labels Apr 25, 2023

qidewenwhen removed the component: pipelines Relates to the SageMaker Pipeline Platform label May 1, 2023

martinRenou added the OS: Windows label Oct 9, 2023
