<a href="https://colab.research.google.com/github/Anjasfedo/Learning-TensorFlow/blob/main/eat_tensorflow2_in_30_days/Chapter6_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6-6 Model Deploying Using tensorflow-serving

There are multiple ways to deploy and run the trained models which saved with the original tensorflow format.

For example:
- Load and run the model in the web browser using javascript through `tensorflow-js`.
- Load and run the TensorFlow model on mobile and embeded devices through `tensorflow-lite`.
- Use `tensorflow-serving` to load the model that providing interface API service and to acquire the prediction results from the model through sending request in arbitary programming languages.
- Predict using the Tensorflow model in Java or spark (scala) through the `TensorFlow for Java` port.

This section introduces model deploying by `tensorflow serving` and using spark (scala) to implement the TensorFlow models.

## 0. Introduction to model deploying by tensorflow serving

The necessary steps of model deploying using tensorflow serving areL
- Prepare the protobuf model file.
- Install the tensorflow serving.
- Start the tensorflow serving service.
- Send the request to the API service to obtain the prediction.

In [1]:
%tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras import *

Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.
2.17.0


## 1. Prepare the protobuf Model File

Here we train a simple linear regression model with `tf.keras` and save it as protobuf file.

In [2]:
import tensorflow as tf
from tensorflow.keras import models, layers, optimizers

In [4]:
# Number of samples
n = 800

# Generating testing dataset
X = tf.random.uniform([n, 2], minval=-10, maxval=10)
w0 = tf.constant([[2.0], [-1.0]])
b0 = tf.constant(3.0)

Y = X @ w0 + b0 + tf.random.normal([n, 1], mean=0.0, stddev=2.0)

In [5]:
# Modeling
tf.keras.backend.clear_session()

inputs = layers.Input(shape=(2,), name='inputs')
outputs = layers.Dense(1, name='outputs')(inputs)
linear = models.Model(inputs=inputs, outputs=outputs)
linear.summary()

In [6]:
# Training with fit method
linear.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
linear.fit(X, Y, epochs=100, batch_size=8)

Epoch 1/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 295.5788 - mae: 14.6414
Epoch 2/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 298.9391 - mae: 14.7656  
Epoch 3/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 268.1929 - mae: 14.0361
Epoch 4/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 249.9100 - mae: 13.6357
Epoch 5/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 229.6005 - mae: 12.9643
Epoch 6/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 210.6743 - mae: 12.4261
Epoch 7/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 197.5682 - mae: 12.0297
Epoch 8/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 185.1680 - mae: 11.6811
Epoch 9/100
[1m100/100[0m [32m━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x78fdcada00d0>

In [9]:
tf.print(f'w = {linear.layers[1].kernel}')
tf.print(f'b = {linear.layers[1].bias}')

In [16]:
# Save the model as pb format
export_path = '/content/linear_model/'
version = '1'
tf.saved_model.save(linear, export_path + version)

In [17]:
# Check the saved model file
!ls {export_path + version}

assets	fingerprint.pb	saved_model.pb	variables


In [18]:
# Check the info of the model file
!saved_model_cli show --dir {export_path+str(version)} --all

2024-08-05 01:41:33.081529: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-05 01:41:33.118218: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-05 01:41:33.131053: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
       

## 2. Installing tensorflow serving

Two methods for installing tensorflow serving:
- Using Docker images
- Using apt

Docker image is the simplest way of installation and recommend.

Docker is a container that provides independent environment for various programs.

The companies that are using TensorFlow usually use Docker to install tensorflow serving by operation experts, so the algorithm engineers dont have to worry about the installation.

In [22]:
%%shell
pip install udocker
udocker --allow-root install

Collecting udocker
  Downloading udocker-1.3.16-py2.py3-none-any.whl.metadata (37 kB)
Downloading udocker-1.3.16-py2.py3-none-any.whl (119 kB)
[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/119.4 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m119.4/119.4 kB[0m [31m8.1 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: udocker
Successfully installed udocker-1.3.16
Info: creating repo: /root/.udocker
Info: udocker command line interface 1.3.16
Info: searching for udockertools >= 1.2.11
Info: installing udockertools 1.2.11
Info: installation of udockertools successful




In [24]:
!udocker --allow-root pull tensorflow/serving

Info: downloading layer sha256:5f3d3351762dbf926cfcd91224256abacbe17fdc2eda6feef31f9d9043019d26
Info: downloading layer sha256:d5ed72f49cfed1eba5204d5ef92c2928e88081969bf21d0f274ac277d3ebbbe7
Info: downloading layer sha256:8c75235903f7a607d8961c2f0a98408227392c26ab5dca5c0057c8e1f62a98fb
Info: downloading layer sha256:452f8dcbf8cf4df26c258fa030c18890fbe680653e0eb27f23ebf47f9375f4e5
Info: downloading layer sha256:5fd130a3e8f9a2afef19f26743b69ffde711786ba70be4b16d86446594c6a22c
Info: downloading layer sha256:9ea8908f47652b59b8055316d9c0e16b365e2b5cee15d3efcb79e2957e3e7cad


## 3. Starting tensorflow serving Service

In [None]:
!udocker --allow-root run -t --rm -p 8501:8501 -v {export_path}:/models/linear -e MODEL_NAME=linear tensorflow/serving & >server.log 2>&1

 
 ****************************************************************************** 
 *                                                                            * 
 *               STARTING 68aee9fd-c65b-33b9-8981-d2f2ef95628e                * 
 *                                                                            * 
 ****************************************************************************** 
 executing: tf_serving_entrypoint.sh
2024-08-05 02:01:08.895948: I tensorflow_serving/model_servers/server.cc:77] Building single TensorFlow model file config:  model_name: linear model_base_path: /models/linear
2024-08-05 02:01:08.896586: I tensorflow_serving/model_servers/server_core.cc:474] Adding/updating models.
2024-08-05 02:01:08.896716: I tensorflow_serving/model_servers/server_core.cc:603]  (Re-)adding model: linear
2024-08-05 02:01:09.245896: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: linear version: 1}
2024-08-05 02

## 4. Sending request to the API service

The request could be sent through http funciton in any kind of programming language. We demonstrate request sending using the `curl` command in Linux and the `requests` library in Python.

In [None]:
!curl -d '{"inputs":[[1.0, 2.0], [3.0, 4.0]]}' -X POST http://localhost:8501/v1/models/linear:predict

In [None]:
import json
import requests

data = json.dumps({"signature_name": "serving_default", "instances": [[1.0, 2.0], [3.0, 4.0]]})
headers = {"content-type": "application/json"}

json_response = requests.post('http://localhost:8501/v1/models/linear:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']
print(predictions)