
Endpoint inference with trained HuggingFaceEstimator fails #13

Closed
la-cruche opened this issue Jul 1, 2021 · 2 comments

Comments

@la-cruche

Hi,

I call .deploy() on the model.tar.gz created by 2 sample notebooks, and both deployments fail. It seems that the dependencies are not the same between training and inference. Is this something that could be automated, or at least documented? I used to think that the config.json would be enough for inference; I don't understand why SM Hosting wants to use the training script (in theory it doesn't need to).

  • PyTorch sample: .deploy() works correctly on both CPU and GPU, but GPU inference fails with No module named sklearn
  • TF sample: .deploy() works correctly on both CPU and GPU, but GPU inference fails with No module named datasets
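
A possible workaround (my assumption, not something confirmed in this thread): I believe the Hugging Face inference toolkit installs a requirements.txt found under code/ inside the model archive, so the missing inference dependencies could be shipped explicitly there, along with an optional inference.py that replaces the training script as the entry point. A sketch of that layout:

```
model.tar.gz
├── config.json
├── pytorch_model.bin            # or TF weights for the TF sample
├── (tokenizer files, ...)
└── code/
    ├── inference.py             # optional custom inference handler
    └── requirements.txt         # e.g. scikit-learn, datasets
```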
@philschmid
Collaborator

@vdantu @ahsan-z-khan I thought that when running .deploy(), the train.py from training was not taken into account.
I get the same error as Olivier.

File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 207, in handle
self.initialize(context)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 75, in initialize
self.validate_and_initialize_user_module()
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 239, in validate_and_initialize_user_module
user_module = importlib.import_module(user_module_name)
File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/.sagemaker/mms/models/model/code/train.py", line 2, in <module>
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
ModuleNotFoundError: No module named 'sklearn'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/mms/service.py", line 108, in predict
ret = self._entry_point(input_batch, self.context)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 231, in handle
raise PredictionException(str(e), 400)
No module named 'sklearn' : 400

It is even more interesting since train.py is not included in the model.tar.gz.
[screenshot: contents of model.tar.gz]
Does this mean Estimator.deploy() also pulls train.py into the inference container?
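
To illustrate the failure mode the traceback above shows, here is a minimal sketch (not the toolkit's actual code; the names not_installed_pkg and load_user_module are hypothetical): because the user script is imported eagerly at startup, a missing top-level dependency aborts model loading before any request is handled, and the toolkit then reports it as a 400.

```python
import importlib
import os
import sys
import tempfile

def load_user_module(code_dir, module_name):
    """Import a user script from code_dir, roughly as the handler service does."""
    sys.path.insert(0, code_dir)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.pop(0)

with tempfile.TemporaryDirectory() as code_dir:
    # Stand-in for a train.py whose first lines import a package
    # (e.g. sklearn) that is absent from the inference image.
    with open(os.path.join(code_dir, "train.py"), "w") as f:
        f.write("from not_installed_pkg import something\n")
    try:
        load_user_module(code_dir, "train")
        status, message = 200, "ok"
    except ModuleNotFoundError as e:
        # The toolkit wraps this as PredictionException(str(e), 400).
        status, message = 400, str(e)

print(status, message)
```

The import fails before any handler function is called, which matches the traceback: the error surfaces in validate_and_initialize_user_module, not during prediction itself.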

@vdantu
Contributor

vdantu commented Jul 7, 2021

@la-cruche : Can this issue be closed?
