In [1]:
import numpy as np

import requests
import json

# Model deployment with TensorFlow Serving

In this short notebook, we show how to query a model deployed with TensorFlow Serving through the server's REST API to predict the class of new instances.

More information is available here:
- https://github.com/tensorflow/serving
- https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/docker.md

### How to deploy the model

1) Save the trained model using `tf.saved_model.save`

2) Pull the TensorFlow Serving image with Docker

`docker pull tensorflow/serving`

3) Start the container with the pre-trained model

`docker run -p 8501:8501 --mount type=bind,source=/path/to/my_model/,target=/models/my_model -e MODEL_NAME=my_model -t tensorflow/serving`

NOTE: In the following, the model is served under the name *cnn_classifier* (so `cnn_classifier` takes the place of `my_model` in the command above).
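TensorFlow Serving loads models from numbered version subdirectories inside the mounted model directory, which is why the status response later in this notebook reports version "1". A minimal sketch for choosing the export path in step 1, assuming that layout; `next_version_dir` is a hypothetical helper, and `model` in the comment stands for your own trained model:

```python
import tempfile
from pathlib import Path


def next_version_dir(model_root):
    """Return model_root/<n>, where n is one more than the highest numeric subdirectory."""
    root = Path(model_root)
    versions = []
    if root.is_dir():
        versions = [int(p.name) for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return root / str(max(versions, default=0) + 1)


# Intended use (requires TensorFlow and a trained `model`):
#   tf.saved_model.save(model, str(next_version_dir("/path/to/my_model")))

# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_model"
    first = next_version_dir(root)   # nothing exported yet
    first.mkdir(parents=True)
    second = next_version_dir(root)  # version 1 now exists

print(first.name, second.name)  # 1 2
```

With this layout, the `source=` path in the `docker run` command points at the directory that contains the numbered subdirectories, not at a version directory itself.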

### Query the model

We can now query the model from the notebook:

In [2]:
# Check that the model is running
r = requests.get('http://localhost:8501/v1/models/cnn_classifier')

print(r.text)

{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}
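The same check can be done programmatically, for example to wait until the server is ready before sending predictions. A minimal sketch, assuming the response has the shape printed above; `is_model_available` is a hypothetical helper, not part of TensorFlow Serving:

```python
import json


def is_model_available(status_text):
    """Return True if any model version in the status response is AVAILABLE."""
    status = json.loads(status_text)
    versions = status.get("model_version_status", [])
    return any(v.get("state") == "AVAILABLE" for v in versions)


# Sample copied from the output above
sample = (
    '{"model_version_status": [{"version": "1", "state": "AVAILABLE", '
    '"status": {"error_code": "OK", "error_message": ""}}]}'
)
print(is_model_available(sample))  # True
```

In the notebook, this would be called as `is_model_available(r.text)` with the response `r` from the cell above.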



Load the dataset with the input values (used for getting predictions) and the expected outputs:

In [3]:
X_test = np.load('./datasets/vectorized/X_test.npy',allow_pickle=True)
y_test = np.load('./datasets/vectorized/y_test.npy',allow_pickle=True)

This function prepares each input for prediction:

In [4]:
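def to_input(X):
    # Convert to a plain list of Python floats (JSON-serializable)
    output = X.astype(np.float32).tolist()
    # Drop zero entries before sending the instance
    return [el for el in output if el != 0.]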
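To make the behaviour of `to_input` concrete, here is a small check on a made-up vector (the values are illustrative and not taken from the dataset). Note that instances with different numbers of zeros produce lists of different lengths.

```python
import numpy as np


def to_input(X):
    # Same function as in the cell above, repeated so the example is self-contained
    output = X.astype(np.float32).tolist()
    return [el for el in output if el != 0.]


example = np.array([3, 0, 7, 0, 0])  # illustrative values only
print(to_input(example))  # [3.0, 7.0]
```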

In the following, we prepare each input in JSON format and query the model with it. The response contains the predicted output, which we compare with the expected label to check whether the model is making good predictions (on this test set, it should reach 85% accuracy).

In [5]:
N,_ = y_test.shape
correct_pred = 0

for x_instance,y_instance in zip(X_test,y_test):
    
    # Prepare the input according to the TensorFlow Serving API
    input_data = {}
    input_data["instances"] = [to_input(x_instance)]
    
    # The request body must be in JSON format
    input_json = json.dumps(input_data)

    # Send the request to the TensorFlow Serving server
    r = requests.post('http://localhost:8501/v1/models/cnn_classifier:predict',
                      data=input_json
                     )

    # Fail early on HTTP errors, before parsing the response
    if r.status_code != 200:
        raise RuntimeError(r.text)

    # Get the output and process it
    output = json.loads(r.text)

    # Check if predictions are correct
    y_predict = np.argmax(output['predictions'])
    y_expected = np.argmax(y_instance)
    
    if y_predict == y_expected:
        correct_pred += 1
        
print('Model accuracy is {:.2f}'.format(correct_pred/N))

Model accuracy is 0.85


The accuracy obtained is consistent with the one we computed previously. The model seems to have been correctly deployed.
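The response handling in the loop above can also be factored into a small helper that works on several predictions at once. A minimal sketch, assuming the response body has a `predictions` list with one row of class scores per instance, or an `error` field on failure (the two keys used in the loop above); `predicted_classes` is a hypothetical helper:

```python
import json


def predicted_classes(response_text):
    """Return the index of the highest score for each row in 'predictions'."""
    output = json.loads(response_text)
    if "error" in output:
        raise RuntimeError(output["error"])
    return [max(range(len(row)), key=row.__getitem__) for row in output["predictions"]]


sample = '{"predictions": [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]]}'  # made-up scores
print(predicted_classes(sample))  # [1, 0]
```

This version uses only the standard library, so it can be reused outside the notebook without NumPy.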

NOTE: From here, it should not be too difficult to deploy the model to the cloud (for example, using Amazon ECR and Lambda).