model_fn is not recognized. Sagemaker Studio template for model building, training, and deployment #229

Open
babarory opened this issue Apr 6, 2021 · 1 comment


babarory commented Apr 6, 2021

Hello everyone, I'm very new to SageMaker and I'm facing a strange issue that I can't solve.

My goal: I have created a CNN that I would like to train, build and deploy in an MLOps pipeline with SageMaker.

First of all, I created a notebook instance in SageMaker in which I created a wasteClassification.ipynb notebook and a train.py file.
The train.py file contains my neural network definition, some functions to train and save it, and several overridden functions: model_fn, predict_fn, input_fn. In wasteClassification.ipynb I was able to create a PyTorch estimator, train the model, deploy the endpoint and make predictions using the invoke_endpoint function without any issues.
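
For readers unfamiliar with these hooks, here is a minimal sketch of what such overrides might look like in an entry-point script. It is not the code from this issue. The file name model.pt, the use of TorchScript, and the "inputs" payload key are all assumptions made for illustration.

```python
import json
import os


def model_fn(model_dir):
    """Load the model that the training job saved into model_dir."""
    # torch is imported lazily so the JSON handling below can be
    # exercised on a machine without PyTorch installed.
    import torch

    # Assumption: training exported a TorchScript file named model.pt.
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
    model.eval()
    return model


def input_fn(request_body, request_content_type):
    """Deserialize the request payload into a nested list of numbers."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    # Assumption: the client sends {"inputs": [[...], ...]}.
    return json.loads(request_body)["inputs"]


def predict_fn(input_object, model):
    """Run a forward pass on the deserialized input."""
    import torch

    tensor = torch.tensor(input_object, dtype=torch.float32)
    with torch.no_grad():
        return model(tensor)
```

The serving container looks these functions up by name in the inference script, so they have to be defined at module level with exactly these names.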

After that, I decided to create a pipeline to automate training, building and deployment using the new SageMaker tooling for that.
I created a SageMaker Studio project based on the template "MLOps template for model building, training, and deployment". This template provides two CodeCommit repos: modelbuild and modeldeploy. I only modified the modelbuild repo: I put my train.py script in the folder "/pipelines/abalone/" and modified the file "pipelines/abalone/pipeline.py", in which I created a PyTorch estimator linked to my train.py script.
When the pipeline is launched, I can see in the training job logs that my model trains without any issue, and the final endpoint is created. But when I try to invoke the endpoint (invoke_endpoint), I get an error: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "
Please provide a model_fn implementation."

This is strange because I did provide a model_fn implementation in my train.py file...

Do you have any idea how to solve this issue?

@Soroush-aali-bagi

@babarory Did you find the answer?
