# Deploy a model as a REST API using TensorFlow Serving

In this notebook we will set up a REST API where we can send a request containing an image, and receive a classification in return.

In practice we would not run an endpoint service from a notebook, but a notebook is still a convenient way to illustrate the procedure. To run the code we have to install TensorFlow Serving, and the installation commands below assume a Debian-type system. This is the case on Google Colab. If you are running on your own machine, the recommended approach is to use the TensorFlow Serving [Docker image](https://www.tensorflow.org/tfx/serving/setup) rather than installing it directly.

https://keras.io/examples/keras_recipes/tf_serving/

## Install TensorFlow Serving

In [None]:
!echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
wget --output-document - https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

!apt-get update && apt-get install -y tensorflow-model-server

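Before moving on, it can be useful to confirm that the installation put the server binary on the `PATH`. The following is a minimal sketch using only the standard library. The helper name `find_model_server` is made up for this illustration, and the binary name is the one used later in this notebook to start the server.

```python
import shutil


def find_model_server(binary_name="tensorflow_model_server"):
    """Return the full path of the server binary, or None if it is not on the PATH."""
    return shutil.which(binary_name)


server_path = find_model_server()
if server_path is None:
    print("tensorflow_model_server not found; check the installation step above")
else:
    print(f"Found TensorFlow Serving binary at {server_path}")
```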
## Load a pretrained image classification model

Let's load MobileNetV2, an updated version of [MobileNet](https://arxiv.org/abs/1704.04861), to serve as our example.

First, some imports:

In [None]:
import os
from pathlib import Path
import json
import requests
import numpy as np
import tensorflow as tf
import keras
import matplotlib.pyplot as plt

Then we can load the pretrained model from Keras [Applications](https://keras.io/api/applications/mobilenet/#mobilenetv2-function).

In [None]:
model = keras.applications.MobileNetV2()

To get the human-readable class name (and not just the class number), we have to use the `decode_predictions` function that belongs to the model we have chosen:

In [None]:
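# Convert the model's class scores to a human-readable label
def postprocess(prediction):
    # decode_predictions returns, for each image, a list of
    # (class_id, class_name, score) tuples. We keep the class name
    # of the top-scoring entry for the first image in the batch.
    label = keras.applications.mobilenet_v2.decode_predictions(prediction, top=1)[0][0][1]

    return label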

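The chain of indices `[0][0][1]` is easier to follow with the nested structure written out. The sketch below uses a hand-written stand-in for the return value of `decode_predictions(..., top=1)`; the score is made up for illustration and is not a real model output.

```python
# Stand-in for decode_predictions(prediction, top=1) with a batch of one image.
# Outer list: one entry per image. Inner list: the top-k guesses for that image.
# Each guess is a (class_id, class_name, score) tuple. The score is invented.
decoded = [
    [("n02504458", "African_elephant", 0.87)],
]

first_image = decoded[0]   # all guesses for the first image in the batch
top_guess = first_image[0]  # the highest-scoring guess
label = top_guess[1]        # the class name inside the tuple

print(label)
```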
## Choose a test image

Here is one of the images from the ImageNet dataset. You can choose any other image that you like.

In [None]:
!wget -O testimg.jpg https://raw.githubusercontent.com/larq/zoo/master/tests/fixtures/elephant.jpg

Have a look at the image:

In [None]:
sample_img = plt.imread("testimg.jpg")
print(f"Original image shape: {sample_img.shape}")
print(f"Original image pixel range: ({sample_img.min()}, {sample_img.max()})")
plt.imshow(sample_img)
plt.show()

Now we read in the image as a NumPy array. Its shape must match what the model expects, which for MobileNetV2 is 224 x 224 pixels with 3 color channels. The Keras utility functions make the resizing easy.

In [None]:
test_img = keras.utils.load_img('testimg.jpg', target_size=(224, 224))
test_img = keras.utils.img_to_array(test_img)

### Apply preprocessing

Remember that each pretrained image model expects its input to be preprocessed in its own way, so we have to use the `preprocess_input` function that belongs to the model we chose.

As part of our preprocessing we also add the batch dimension, since the model always expects batches of inputs.

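To make these two steps concrete, here is a small NumPy sketch of what happens to the pixel values and the shape. It assumes the documented behaviour of the MobileNetV2 `preprocess_input` function, which scales pixels from the range 0 to 255 into the range -1 to 1. The function `scale_to_unit_range` and the dummy image are stand-ins for illustration, not the Keras implementation.

```python
import numpy as np


def scale_to_unit_range(pixels):
    """Map pixel values from [0, 255] to [-1, 1] (illustrative stand-in)."""
    return pixels.astype(np.float32) / 127.5 - 1.0


# Dummy image: all black, except one white pixel so that both extremes occur
dummy_img = np.zeros((224, 224, 3), dtype=np.uint8)
dummy_img[0, 0, :] = 255

scaled = scale_to_unit_range(dummy_img)
batched = np.expand_dims(scaled, axis=0)  # add the batch dimension in front

print(f"Pixel range after scaling: ({scaled.min()}, {scaled.max()})")
print(f"Shape with batch dimension: {batched.shape}")
```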
In [None]:
preprocessed_img = keras.applications.mobilenet_v2.preprocess_input(test_img)

print(f"Preprocessed image shape: {preprocessed_img.shape}")
print(
    f"Preprocessed image pixel range: ({preprocessed_img.min()},",
    f"{preprocessed_img.max()})",
)

batched_img = tf.expand_dims(preprocessed_img, axis=0)
batched_img = tf.cast(batched_img, tf.float32)
print(f"Batched image shape: {batched_img.shape}")

model_outputs = model(batched_img)
print(f"Model output shape: {model_outputs.shape}")
print(f"Predicted class: {postprocess(model_outputs)}")


Seems to work when running interactively -- now, let's serve the model as a REST API.

## Serve the model

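Before we can start TensorFlow Serving, we need to export the model to disk in the SavedModel format. TensorFlow Serving expects each version of a model in its own numbered subdirectory of the model's base path, which is why the export path below ends in the version number.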

In [None]:
model_dir = Path("./model").resolve()
model_version = 1
model_export_path = model_dir / str(model_version)

model.export(model_export_path)

print(f"SavedModel files: {os.listdir(model_export_path)}")

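By default, TensorFlow Serving serves the highest-numbered version it finds under the base path. The sketch below mimics that selection on a throwaway directory, to show why the version subdirectory names must be numbers. The helper `latest_version_dir` is hypothetical and not part of TensorFlow.

```python
import tempfile
from pathlib import Path


def latest_version_dir(base_path):
    """Return the numerically highest version subdirectory, or None if there is none."""
    versions = [p for p in Path(base_path).iterdir() if p.is_dir() and p.name.isdigit()]
    return max(versions, key=lambda p: int(p.name), default=None)


# Temporary directory that mimics a base path containing versions 1, 2 and 10
with tempfile.TemporaryDirectory() as tmp:
    for version in ("1", "2", "10"):
        (Path(tmp) / version).mkdir()
    latest_name = latest_version_dir(tmp).name

# Versions are compared as numbers, so "10" is newer than "2"
print(f"Latest version: {latest_name}")
```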
We can check that the saved files have the expected inputs and outputs by running the following:

In [None]:
!saved_model_cli show --dir {model_export_path} --tag_set serve --signature_def serving_default

Then we store the model directory in an environment variable, so that the shell command that starts the server can pick it up.

In [None]:
os.environ["MODEL_DIR"] = f"{model_dir}"

Now, start the server instance in the background, and keep it running.

Some hacks are involved here (like `nohup` and the `--bg` flag of the `%%bash` magic). They are required to keep the server running in the notebook after we move on to the next cell.

In [None]:
%%bash --bg
nohup tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=model \
  --model_base_path=$MODEL_DIR >server.log 2>&1

We can have a look at the server logs to see what is going on.

You should see
```
[evhttp_server.cc : 250] NET_LOG: Entering the event loop ...
```
at the end -- if not, wait a second and try again.

In [None]:
!cat server.log

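Instead of re-reading the log by hand, we could also poll the server until the model is reported as available. The sketch below assumes the model status endpoint of the TensorFlow Serving REST API (`GET /v1/models/<model_name>`), which reports a `state` for each loaded version. The function name `wait_until_available` and its default timings are choices made for this example.

```python
import json
import time
import urllib.request


def wait_until_available(status_url, timeout_s=30.0, poll_interval_s=1.0):
    """Poll the model status URL until a version reports state AVAILABLE.

    Returns True once the model is available, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(status_url, timeout=poll_interval_s) as resp:
                status = json.load(resp)
            states = [v.get("state") for v in status.get("model_version_status", [])]
            if "AVAILABLE" in states:
                return True
        except (OSError, ValueError):
            # Server not reachable yet, or the reply was not valid JSON: try again
            pass
        time.sleep(poll_interval_s)
    return False


# In this notebook the call would look like this (uncomment to run it):
# wait_until_available("http://localhost:8501/v1/models/model")
```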
Also check that TensorFlow Serving is listening on the ports we configured (8500 for gRPC and 8501 for the REST API):

In [None]:
!sudo lsof -i -P -n | grep LISTEN

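If `lsof` is not available, the same check can be approximated from Python by trying to open a TCP connection to each port. This is a minimal sketch; the helper name `port_is_open` is made up for this example, and a successful connection only tells us that something is listening, not that it is TensorFlow Serving.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout_s=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


# The two ports passed to tensorflow_model_server above
for port in (8500, 8501):
    print(f"Port {port} open: {port_is_open(port)}")
```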
## Make a request to your model in TensorFlow Serving

Now for the exciting part: let's make a request to our service. Requests have to be in JSON format, and contain our data under the `"instances"` key.

The request can carry additional settings, such as the name of the signature to call. The model name is part of the URL, so if you are serving several models at the same time, you switch between them by changing the URL. For all the details, have a look at the [documentation](https://www.tensorflow.org/tfx/guide).

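As a rough sketch of how the pieces fit together, the helpers below build the prediction URL and the request body. They assume the URL scheme of the TensorFlow Serving REST API, where the model name and an optional version are part of the path. The function names `build_predict_url` and `build_request_body` and the tiny input are made up for this example.

```python
import json


def build_predict_url(model_name, version=None, host="localhost", port=8501):
    """Build the predict URL for a model, optionally pinned to one version."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    return f"http://{host}:{port}{path}:predict"


def build_request_body(instances, signature_name="serving_default"):
    """Serialize a batch of inputs (nested Python lists) into the JSON request body."""
    return json.dumps({"signature_name": signature_name, "instances": instances})


latest_url = build_predict_url("model")
pinned_url = build_predict_url("model", version=1)
body = build_request_body([[0.0, 0.5], [1.0, -1.0]])

print(latest_url)
print(pinned_url)
print(body)
```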
In [None]:
# Construct the request in JSON format
data = json.dumps(
    {
        "signature_name": "serving_default",
        "instances": batched_img.numpy().tolist(),  # The image data must be a native Python list
    }
    }
)
url = "http://localhost:8501/v1/models/model:predict"

# Print to see what we will send
print(data)

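Note that a single 224 x 224 image with 3 channels contains over 150,000 numbers, so printing the full request produces a lot of output. The sketch below shows one way to inspect only the size and the beginning of such a payload. It uses a dummy all-zero batch in place of the real image, so the size it reports is only indicative.

```python
import json

# Dummy stand-in for the image batch: 1 image, 224 x 224 pixels, 3 channels
dummy_batch = [[[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]]
payload = json.dumps({"signature_name": "serving_default", "instances": dummy_batch})

num_values = 224 * 224 * 3
print(f"Values in one image: {num_values}")
print(f"Payload size: {len(payload) / 1e6:.2f} MB")
print(f"Preview: {payload[:80]}...")
```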
Now define the function that sends the `POST` request to our server. The response will contain a field `"predictions"`, which is of course what we are interested in.

In [None]:
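def predict_rest(json_data, url):
    # Tell the server that the request body is JSON
    headers = {"content-type": "application/json"}

    # Send the request and parse the JSON reply
    json_response = requests.post(url, data=json_data, headers=headers)
    response = json.loads(json_response.text)

    # The class scores are stored under "predictions"
    rest_outputs = np.array(response["predictions"])

    return rest_outputs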

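When a request fails, the TensorFlow Serving REST API replies with a JSON object that has an `"error"` key instead of `"predictions"`, and the function above would then stop with a `KeyError`. A minimal sketch of how the parsed reply could be checked first is shown below. The helper name `extract_predictions` and the two example replies are made up for illustration.

```python
def extract_predictions(response):
    """Return the predictions from a parsed reply, or raise if the server reported an error."""
    if "error" in response:
        raise RuntimeError(f"TensorFlow Serving returned an error: {response['error']}")
    return response["predictions"]


# Hand-written example replies (not real server output)
good_reply = {"predictions": [[0.1, 0.9]]}
bad_reply = {"error": "example error message"}

print(extract_predictions(good_reply))
try:
    extract_predictions(bad_reply)
except RuntimeError as err:
    print(err)
```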
Try it out:

In [None]:
rest_outputs = predict_rest(data, url)

print(f"REST output shape: {rest_outputs.shape}")
print(f"Predicted class: {postprocess(rest_outputs)}")

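As a final sanity check, we can compare the scores from the REST API with the scores we got when calling the model directly. The sketch below is one possible way to do this. The helper name `outputs_agree` and the tolerance are assumptions, and the two small arrays are dummy values standing in for the real outputs.

```python
import numpy as np


def outputs_agree(local_outputs, served_outputs, atol=1e-5):
    """Check that two score arrays predict the same class and are numerically close."""
    local_outputs = np.asarray(local_outputs, dtype=np.float64)
    served_outputs = np.asarray(served_outputs, dtype=np.float64)
    same_class = np.array_equal(
        local_outputs.argmax(axis=-1), served_outputs.argmax(axis=-1)
    )
    close_scores = np.allclose(local_outputs, served_outputs, atol=atol)
    return bool(same_class and close_scores)


# Dummy scores for a batch of one image and three classes
local = np.array([[0.1, 0.7, 0.2]])
served = np.array([[0.1000001, 0.6999999, 0.2]])

print(f"Outputs agree: {outputs_agree(local, served)}")
```

In this notebook the call would be `outputs_agree(model_outputs.numpy(), rest_outputs)`, assuming Keras is running on the TensorFlow backend so that `model_outputs` has a `.numpy()` method.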