Prebuilt PyTorch image difference #139
Comments
Could you share how you run training and then deploy the trained model locally? Before we had one container ( From the error message you posted, it seems that the problem is caused by using the training image to run inference, though I would need more information about how you are training and hosting the model. |
There is no training; the model is pretrained. Pseudocode like the following:
Please let me know if you want more details |
What image ( |
This is a customized image built on top of a prebuilt AWS SageMaker image. For the prebuilt images, I tried:
Only |
2 is expected to fail. What error do you get when using |
It cannot find the
Some more observations:
|
|
I am closing the issue for now since you cannot reproduce it. I will do more experiments and may reopen it once I have more info. |
For now, I would like to give it another try. The following is the error message with
|
Thanks! When do you get this error: on startup or when trying to run predictions? |
When trying to run predictions. The container started successfully; please refer to the following logs from spinning up the container:
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Apologies for the late response. That specific error happens when attempting to import your entrypoint.py, as shown here: https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/transformer.py#L143
The entrypoint.py is expected to be in a specific directory, which is added to the PYTHONPATH: https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py#L103
The directory itself is defined by: https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/environment.py#L32
The Python SDK should place the entrypoint.py in that directory, depending on the framework version specified, as shown here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/pytorch/model.py#L148
Looking at how you are starting the inference jobs, it looks like the framework_version is being omitted, which may cause the conditional not to place the entrypoint.py into that directory. I apologize for the experience, as this is not ideal. However, is there any chance you can retry your job after specifying a framework version higher than 1.2? Thanks! |
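As a rough illustration of the suggestion above, here is a minimal sketch that always passes framework_version explicitly when building a PyTorchModel and deploying it in local mode. The helper names (parse_major_minor, build_pytorch_model_kwargs, deploy_locally) are hypothetical, the choice to accept 1.2 itself is an assumption, and py_version="py3" is assumed rather than taken from the thread. The SDK is imported lazily, so the argument check can be run without it.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.2.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def build_pytorch_model_kwargs(model_data, role, entry_point, framework_version):
    """Collect PyTorchModel arguments, refusing to omit framework_version.

    Hypothetical helper: it only guards against the omission discussed in
    this thread and is not part of the SageMaker SDK.
    """
    if not framework_version:
        raise ValueError("framework_version must be set explicitly")
    # Assumption: 1.2 itself is acceptable, not only versions above it.
    if parse_major_minor(framework_version) < (1, 2):
        raise ValueError("this sketch targets framework versions 1.2 and later")
    return {
        "model_data": model_data,
        "role": role,
        "entry_point": entry_point,
        "framework_version": framework_version,
        "py_version": "py3",  # assumed; not stated in the thread
    }


def deploy_locally(kwargs):
    """Deploy in local mode; needs the sagemaker SDK and Docker installed."""
    from sagemaker.pytorch import PyTorchModel  # lazy: optional dependency

    model = PyTorchModel(**kwargs)
    return model.deploy(initial_instance_count=1, instance_type="local")
```

Calling deploy_locally(build_pytorch_model_kwargs(...)) with your own model archive, role and entry point would then start the local container; whether this resolves the import error depends on the image and SDK version in use.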
Closing due to inactivity. |
Hi there,
I am bringing a PyTorch model from outside of SageMaker.
Here are my steps:
pytorch-training vs pytorch-inference vs sagemaker-pytorch (before 1.2.0)
model_fn, predict_fn, input_fn, output_fn.
Here are my observations:
sagemaker-pytorch, version 1.1.0, CPU: everything works.
pytorch-inference, version 1.2.0, CPU: the code is not copied to the container. I guess I should follow this documentation? https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html
pytorch-training, version 1.2.0, CPU: when I tried to deploy the model locally, it throws the following errors:
Then it waits for the container to run until it times out.
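For readers unfamiliar with the four handlers named in the steps, here is a minimal sketch of what such an entry point could look like. It is not the author's code: the file name model.pt, the use of TorchScript, and JSON as the only content type are all assumptions made for illustration. PyTorch is imported lazily so the JSON handlers can be exercised on their own.

```python
import json
import os


def model_fn(model_dir):
    """Load the model from the directory the container provides."""
    import torch  # lazy import so the JSON handlers work without PyTorch

    # Assumption: the archive contains a TorchScript file named model.pt.
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
    model.eval()
    return model


def input_fn(request_body, request_content_type):
    """Deserialize the request payload into nested Python lists."""
    if request_content_type != "application/json":
        raise ValueError("unsupported content type: " + str(request_content_type))
    return json.loads(request_body)


def predict_fn(input_data, model):
    """Run a forward pass on the deserialized input."""
    import torch

    with torch.no_grad():
        return model(torch.tensor(input_data, dtype=torch.float32))


def output_fn(prediction, accept):
    """Serialize the prediction for the response."""
    if accept != "application/json":
        raise ValueError("unsupported accept type: " + str(accept))
    return json.dumps(prediction.tolist())
```

The serving stack calls these in the order model_fn (once at load time), then input_fn, predict_fn and output_fn per request, which is why an entry point that cannot be imported only fails when predictions are requested.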
My questions are:
pytorch-training and pytorch-inference?
Dockerfile among those 3 versions, it seems there are a lot of changes for pytorch-<inference|training> from sagemaker-pytorch. If I am not missing something here, it is probably worth revisiting the image for pytorch-<inference|training>?