Unable to get tensorflow_distributed_mnist_neo_inf1.ipynb working in JupyterLab with SageMaker Python SDK 2.x (#2175)

**Description**
I was trying out the example at
https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker_neo_compilation_jobs/deploy_tensorflow_model_on_Inf1_instance

I was able to get it working with SageMaker Python SDK 1.x, but I am running into issues with 2.x.

**Steps**
1. With 2.x, the SDK was defaulting to script mode, so I used the following notebook as a guide to change the scripts:
https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_moving_from_framework_mode_to_script_mode/tensorflow_moving_from_framework_mode_to_script_mode.ipynb

2. Everything works fine up to the "Deploy the compiled model on a SageMaker endpoint" step.
3. In the "Invoke the endpoint" step, I see the following failure:
```
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
<ipython-input-13-a13c7ab7b16b> in <module>
     12     display.display(im)
     13     # Invoke endpoint with image
---> 14     predict_response = optimized_predictor.predict(data)
     15 
     16     print("========================================")

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/tensorflow/serving.py in predict(self, data, initial_args)
    116                 args["CustomAttributes"] = self._model_attributes
    117 
--> 118         return super(Predictor, self).predict(data, args)
    119 
    120 

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant)
    111 
    112         request_args = self._create_request_args(data, initial_args, target_model, target_variant)
--> 113         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    114         return self._handle_response(response)
    115 

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    674             error_code = parsed_response.get("Error", {}).get("Code")
    675             error_class = self.exceptions.from_code(error_code)
--> 676             raise error_class(parsed_response, operation_name)
    677         else:
    678             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/tensorflow-training-2021-02-23-23-14-00-380ml-inf1 in account 448570897954 for more information.
```
4. I looked into the CloudWatch logs but could not find anything that explains what is going on.
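
For anyone trying to reproduce step 4, here is a minimal sketch of pulling the endpoint's log events programmatically instead of through the console. It assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) and its `filter_log_events` call; the helper names `endpoint_log_group` and `fetch_log_messages` are hypothetical, and the log group pattern is taken from the error message above.

```python
from typing import List, Optional


def endpoint_log_group(endpoint_name: str) -> str:
    """Build the log group name, following the pattern in the error message."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def fetch_log_messages(
    logs_client,
    endpoint_name: str,
    start_time_ms: Optional[int] = None,
    limit: int = 50,
) -> List[str]:
    """Return up to `limit` log messages for the given endpoint.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client,
    i.e. the object returned by boto3.client("logs").
    """
    kwargs = {"logGroupName": endpoint_log_group(endpoint_name), "limit": limit}
    if start_time_ms is not None:
        # Only return events at or after this time (milliseconds since the epoch).
        kwargs["startTime"] = start_time_ms
    response = logs_client.filter_log_events(**kwargs)
    return [event["message"] for event in response.get("events", [])]


# Example usage (needs AWS credentials, so it is left commented out):
#
#   import boto3
#   logs = boto3.client("logs", region_name="us-east-1")
#   endpoint = "tensorflow-training-2021-02-23-23-14-00-380ml-inf1"
#   for message in fetch_log_messages(logs, endpoint):
#       print(message)
```

Passing `start_time_ms` set to a moment just before the failing `predict` call should narrow the output to the events around the timeout.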

**System information**
A description of your system. Please provide:
- **SageMaker Python SDK version**: 2.5
- **Framework name (eg. PyTorch) or algorithm (eg. KMeans)**: TensorFlow
- **Framework version**: 1.15.0
- **Python version**: 3.6 (conda environment `tensorflow_p36`)
- **CPU or GPU**: Inf1
- **Custom Docker image (Y/N)**: N
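
The exact versions above can be read from inside the notebook kernel. A minimal sketch, assuming each package exposes a `__version__` attribute (which `sagemaker`, `tensorflow`, and `boto3` do); the helper names `package_version` and `collect_versions` are hypothetical:

```python
import importlib
import sys


def package_version(module_name: str) -> str:
    """Return the module's __version__, or a placeholder if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "not installed"
    return str(getattr(module, "__version__", "unknown"))


def collect_versions(modules=("sagemaker", "tensorflow", "boto3")) -> dict:
    """Collect the Python version plus the version of each listed module."""
    report = {"python": sys.version.split()[0]}
    for name in modules:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    # Print one "name: version" line per entry, ready to paste into a report.
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```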



