Pipeline Repack step exec fails when _RepackModelStep.py and _repack_script_launcher.sh built from Windows OS context #3762

Open
Mathonal opened this issue Mar 30, 2023 · 2 comments

@Mathonal

Describe the bug
RepackModel steps fail during pipeline execution when the pipeline is built and upserted from a Windows environment.

To reproduce

  • Start from a SageMaker tutorial training pipeline.
  • Add the entry_point parameter to the Model and ModelStep (as you would to prepare your model for inference):
from sagemaker.model import Model

model = Model(
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    entry_point="inference.py",
    sagemaker_session=pipeline_session,
    role=role,
)
  • Build and upsert the pipeline from a Windows environment (simulating a debug/test launch from a local IDE such as PyCharm).
  • The execution should fail with an error of this type:
    image

The CloudWatch logs of the failing step show:
image

My understanding is:

  • During step creation/build, sagemaker.workflow._utils._RepackModelStep is called, more specifically its _inject_repack_script_and_launcher method. It writes a bash script file (_repack_script_launcher.sh) from a Python string variable. If this write is executed on Windows, carriage return characters appear to be written into the bash file, which is then pushed to the cloud with the rest of the pipeline for execution.

  • Once in the SageMaker pipeline execution environment (Linux), _repack_script_launcher.sh generates several errors during the repack_model step. It still manages to launch the _repack_model.py script, but passes a model_archive path with an extra trailing character (#015, i.e. a carriage return), so the repack step fails because it cannot find the object model.tar.gz#015 (that is, model.tar.gz\r).

  • Note: the exact same code (build, upsert, run) launched from our CI/CD (Linux environment) does not cause this error, suggesting that writing the bash script from a Python variable causes no problem when executed on Linux.
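
The suspected mechanism above can be reproduced on any OS with a minimal sketch. It assumes the launcher is written in text mode with default newline handling, which has not been verified against the SDK source. The LAUNCHER string and the file names are hypothetical stand-ins, not the SDK's actual content. Passing newline="\r\n" to open() emulates what Python's default text mode does on Windows, while newline="\n" disables the translation.

```python
import os
import tempfile

# Hypothetical stand-in for the launcher script text (NOT the SDK's real script).
LAUNCHER = (
    "#!/bin/bash\n"
    "model_archive=$1\n"
    "python _repack_model.py --model_archive $model_archive\n"
)


def write_script(path, text, newline):
    # In text mode, every "\n" in `text` is translated to `newline` on write.
    with open(path, "w", newline=newline) as f:
        f.write(text)


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    windows_like = os.path.join(tmp, "launcher_crlf.sh")
    portable = os.path.join(tmp, "launcher_lf.sh")

    # Emulates the Windows default: "\n" becomes "\r\n" on disk.
    write_script(windows_like, LAUNCHER, newline="\r\n")
    # Forces Unix line endings regardless of the host OS.
    write_script(portable, LAUNCHER, newline="\n")

    crlf_bytes = read_bytes(windows_like)
    lf_bytes = read_bytes(portable)

print("CRLF present in Windows-like file:", b"\r\n" in crlf_bytes)
print("CR present in portable file:", b"\r" in lf_bytes)
```

With CRLF line endings, bash on Linux treats the trailing carriage return as part of each line, which would be consistent with the model.tar.gz\r path seen in the logs.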

Expected behavior
Be able to build, upsert, and run the pipeline from anywhere, especially from my local IDE environment.

Temporary Workaround
I did not succeed in altering the string written to the bash file, or the way it is written on Windows, so that the result is still readable without error once transferred to the Linux environment.

So I duplicated sagemaker\workflow\_repack_model.py into my project code (in an MLOps tools folder) and added a small string correction to make sure that ".gz" are the last 3 characters of the model_archive path. This does nothing if the bash script was already written from a Linux environment (CI/CD).

I then alter the SageMaker SDK installation and overwrite sagemaker\workflow\_repack_model.py with my corrected file right after, but this is obviously not a viable way to patch code.
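
For illustration, the string correction described above might look roughly like the following. This is only a sketch: normalize_model_archive is a hypothetical helper name, and the exact code used in the patched file is not shown in this issue. It cuts off anything that follows the final ".gz", so a clean path passes through unchanged.

```python
def normalize_model_archive(name: str) -> str:
    """Hypothetical helper: make sure the path ends with '.gz'.

    Drops any trailing characters after the last '.gz', such as a
    carriage return ('\r') or its '#015' log rendering.
    """
    suffix = ".gz"
    idx = name.rfind(suffix)
    if idx == -1:
        # No '.gz' found: leave the value untouched rather than guess.
        return name
    return name[: idx + len(suffix)]


print(repr(normalize_model_archive("model.tar.gz\r")))
print(repr(normalize_model_archive("model.tar.gz")))
```

This only treats the symptom in the repack script. The launcher file itself still contains carriage returns when built on Windows.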

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.140.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): basic random forest algo from Scikit
  • Framework version:
  • Python version: 3.9
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): Y

Additional context
Schneider Electric AI-HUB accounts

@Mathonal Mathonal added the bug label Mar 30, 2023
@svia3 svia3 added the component: pipelines Relates to the SageMaker Pipeline Platform label Apr 10, 2023
@qidewenwhen
Member

qidewenwhen commented Apr 25, 2023

Hi @Mathonal, thanks for reaching out!
I really appreciate your effort in providing all these details, doing the investigation, and presenting the workaround! The root cause you identified makes sense to me.
Currently the SageMaker Python SDK supports Unix/Linux and Mac OS only; see https://github.com/aws/sagemaker-python-sdk#supported-operating-systems.
However, this is a good callout for supporting Windows environments. I'll re-label this issue as a "feature request" and bring it up with my internal team to evaluate.

@qidewenwhen
Member

Synced up with the internal team. Given that the entire SageMaker PySDK does not support Windows OS, will remove the component: pipelines tag and leave this feature request in the general PySDK queue.
Will notify the SageMaker PySDK team offline on this as well.

@qidewenwhen qidewenwhen removed the component: pipelines Relates to the SageMaker Pipeline Platform label May 1, 2023