
Endpoint inference with trained HuggingFaceEstimator fails #13

Closed
la-cruche opened this issue Jul 1, 2021 · 2 comments

Comments

@la-cruche

Hi,

I call .deploy() on the model.tar.gz created by 2 sample notebooks, and both deployments fail. It seems that the dependencies are not the same between training and inference. Is this something that could be automated, or at least documented? I used to think that the config.json would be enough for inference; I don't understand why SM Hosting wants to use the training script (in theory it doesn't need to).

  • PyTorch sample: .deploy() works correctly on both CPU and GPU, but GPU inference fails with No module named sklearn
  • TF sample: .deploy() works correctly on both CPU and GPU, but GPU inference fails with No module named datasets
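
A possible workaround (my assumption, not something confirmed in this thread): I believe the Hugging Face inference toolkit installs a requirements.txt found under code/ inside the model archive, so the missing inference dependencies could be shipped explicitly there, along with an optional inference.py that replaces the training script as the entry point. A sketch of that layout:

```
model.tar.gz
├── config.json
├── pytorch_model.bin            # or TF weights for the TF sample
├── (tokenizer files, ...)
└── code/
    ├── inference.py             # optional custom inference handler
    └── requirements.txt         # e.g. scikit-learn, datasets
```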
@philschmid
Collaborator

@vdantu @ahsan-z-khan I thought that when running .deploy(), the train.py from training was not taken into account.
I get the same error as Olivier.

File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 207, in handle
self.initialize(context)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 75, in initialize
self.validate_and_initialize_user_module()
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 239, in validate_and_initialize_user_module
user_module = importlib.import_module(user_module_name)
File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/.sagemaker/mms/models/model/code/train.py", line 2, in <module>
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
ModuleNotFoundError: No module named 'sklearn'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/mms/service.py", line 108, in predict
ret = self._entry_point(input_batch, self.context)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 231, in handle
raise PredictionException(str(e), 400)
No module named 'sklearn' : 400

It is even more interesting since train.py is not included in the model.tar.gz.
[screenshot: contents of model.tar.gz]
Does this mean Estimator.deploy() also pulls train.py into the inference container?
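
To illustrate the failure mode the traceback above shows, here is a minimal sketch (not the toolkit's actual code; the names not_installed_pkg and load_user_module are hypothetical): because the user script is imported eagerly at startup, a missing top-level dependency aborts model loading before any request is handled, and the toolkit then reports it as a 400.

```python
import importlib
import os
import sys
import tempfile

def load_user_module(code_dir, module_name):
    """Import a user script from code_dir, roughly as the handler service does."""
    sys.path.insert(0, code_dir)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.pop(0)

with tempfile.TemporaryDirectory() as code_dir:
    # Stand-in for a train.py whose first lines import a package
    # (e.g. sklearn) that is absent from the inference image.
    with open(os.path.join(code_dir, "train.py"), "w") as f:
        f.write("from not_installed_pkg import something\n")
    try:
        load_user_module(code_dir, "train")
        status, message = 200, "ok"
    except ModuleNotFoundError as e:
        # The toolkit wraps this as PredictionException(str(e), 400).
        status, message = 400, str(e)

print(status, message)
```

The import fails before any handler function is called, which matches the traceback: the error surfaces in validate_and_initialize_user_module, not during prediction itself.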

@vdantu
Contributor

vdantu commented Jul 7, 2021

@la-cruche : Can this issue be closed?
