# Installation:

Before getting start, we need to install all dependencies first. In order not to mess up the system, we'll use virtual environment here. Open your terminal and let's roll ! ! !
```shell

#Terminal
### create a virtual environment called serve_env ###
virtualenv -p python3 serve_env

### activate the virtual environment and install all dependencies ###
source serve_env/bin/activate

### we will use the serving_requirement.txt file here ###
pip install -r serving_requirement.txt

```

The above commands would install all the dependencies in serving_requirement.txt to the environment “serve_env”. 

![create_env](./pics/create_env.png)
![source_env](./pics/source_env.png)


&nbsp;

Of course, you can also use pip to install them one by one.

```shell

#Terminal
### Here only listed some must have dependencies for this hands-on ###
pip install numpy
pip install pandas
pip install tensorflow
pip install tensorflow-serving-api
pip install grpcio
pip install scikit-learn
pip install tqdm
```

# Environment Setup

Here, we will use jupyter notebook for this and the following couple of sections. As a result, we need to let our jupyter notebook to be able to use the virtual environment that we just create. (You can also refer to this [link](https://anbasile.github.io/programming/2017/06/25/jupyter-venv/))

```shell

#Terminal
### install ipykernel ###
pip install ipykernel

### adding serve_env to jupyter kernel ###
ipython kernel install --user --name=serve_env

### check available kernel of jupyter ###
ipython kernelspec list
```

![env_setup](./pics/env_setup.png)


Now our jupyter notebook can use the virtual environment that we just create.

# Loading Data,  Utils Define, and Training

In this example, we’ll use the super simple MNIST dataset and naive model structure because dataset and model are not the main point in this hands-on. 😘 

We won’t cover this part in detail. So, let's quickly go through this section. 😉 

In [174]:
#
# Import necessary dependencies
#

import os, sys
import tensorflow as tf
import tqdm
import numpy as np
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from tensorflow.keras.datasets import mnist

## Loading Dataset

We'll use the super easy MNIST dataset here

In [2]:
#
# Load dataset
#

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(-1, 28, 28, 1)
y_train = np.eye(10)[y_train]
x_test = x_test.reshape(-1, 28, 28, 1)
y_test = np.eye(10)[y_test]

## Define Utils

Define some useful fuctions here. (Don't spend too much time here. It's not the main point of this tutorial)

In [4]:
#
# ReduceLROnPlateau
#

class ReduceLROnPlateau():
    def __init__(self, lr, factor, patience, min_lr=1e-10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.min_loss = None
        self.epoch_count = 0

    def on_epoch_end(self, val_loss, *args, **kwargs):
        if self.min_loss is None or val_loss < self.min_loss:
            self.epoch_count = 0
            self.min_loss = val_loss
        else:
            self.epoch_count += 1

        if self.epoch_count == self.patience:
            self.lr *= self.factor
            self.epoch_count = 0

            if self.lr <= self.min_lr:
                self.lr = self.min_lr
                
#
# Define Generator
#

def trainData_generator(x, y, total_batch):
    '''
    Generator that generate training data in batch
    Args:
        x: input data
        h: input label
        total_batch: number of batches
    '''
    assert len(x)/total_batch == int(len(x)/total_batch)
    new_ind = shuffle(range(len(x)))
    x = x[new_ind]
    x = x/255.
    y = y[new_ind]
    x_batches = np.split(x, total_batch)
    y_batches = np.split(y, total_batch)
    for batch in range(len(x_batches)):
        yield x_batches[batch], y_batches[batch]


def validData_generator(x, y):
    '''
    Function to generate validation data pairs (data, label)
    Args:
        x: input data
        y: input label
    '''
    new_ind = shuffle(range(len(x)))
    x = x[new_ind]
    x = x/255.
    y = y[new_ind]
    return x, y

## Define Model Graph

The model structure is also really naive. It just contains one convolution-polling and on dense layer.

In [4]:
#
# Define model Graph
#

tf.reset_default_graph()
main_graph = tf.Graph()

with main_graph.as_default():

    with tf.name_scope('input'):
        inputs = tf.placeholder(dtype=tf.float32, shape=[
                                None, 28, 28, 1], name='x_input')
        y_true = tf.placeholder(dtype=tf.float32, shape=[
                                None, 10], name='y_true')
        lr = tf.placeholder(dtype=tf.float32, shape=None, name='learning_rate')

    with tf.variable_scope('hidden_layers'):
        model = tf.layers.conv2d(inputs=inputs,
                                 filters=16,
                                 kernel_size=(3, 3),
                                 strides=(1, 1),
                                 padding='same',
                                 activation=tf.nn.relu,
                                 )
        model = tf.layers.max_pooling2d(inputs=model,
                                        pool_size=(2, 2),
                                        strides=2,
                                        )

    with tf.variable_scope('output_layer'):
        model = tf.layers.Flatten()(model)
        output = tf.layers.dense(inputs=model, units=10)
        prediction = tf.nn.softmax(output)

    with tf.name_scope('loss'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=output,
                                                                         labels=y_true,
                                                                         name='cross_entropy_loss'))

        optim = tf.train.AdamOptimizer(learning_rate=lr)
        gra_and_var = optim.compute_gradients(loss)
        update = optim.apply_gradients(gra_and_var)

    with tf.name_scope('accuracy'):
        correct_predict = tf.equal(tf.arg_max(
            prediction, dimension=1), tf.arg_max(y_true, 1))
        accuracy = tf.reduce_mean(
            tf.cast(correct_predict, dtype=tf.float32), name='accuracy')

    init_main_graph = tf.global_variables_initializer()

W0627 23:47:03.716828 4515165632 deprecation.py:323] From <ipython-input-4-3a5df2585d13>:23: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
W0627 23:47:03.724770 4515165632 deprecation.py:506] From /Users/admin/Documents/Cinnamon/Bootcamp/Serving_Example/serve_env/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0627 23:47:03.981215 4515165632 deprecation.py:323] From <ipython-input-4-3a5df2585d13>:27: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
W0627 23:47:04.411

## Setting up Hyperparameters

We only train one epoch here with batch size equals to 10000

In [5]:
#
# Hyperparameters
#

epochs = 1
batch_size = 10000
total_batch = int(len(x_train) / batch_size)
reduceLR = ReduceLROnPlateau(lr=0.001, factor=0.5, patience=10)
train_loss, train_acc = [], []
valid_loss, valid_acc = [], []

## Start Training

In [6]:
#
# Training
#


sess = tf.Session(graph=main_graph)
sess.run(init_main_graph)

for epc in tqdm.tqdm(range(epochs)):
    tr_loss_tmp, tr_acc_tmp = 0, 0
    val_loss_tmp, val_acc_tmp = 0, 0
    train_gen = trainData_generator(x_train, y_train, total_batch=total_batch)

    ### training ###
    for x_batch, y_batch in train_gen:

        tr_acc_batch, tr_loss_batch, _ = sess.run([accuracy, loss, update], feed_dict={
            inputs: x_batch,
            y_true: y_batch,
            lr: reduceLR.lr
        })

        tr_loss_tmp += tr_loss_batch
        tr_acc_tmp += tr_acc_batch

    train_loss.append(tr_loss_tmp / total_batch)
    train_acc.append(tr_acc_tmp / total_batch)

    ### validation ###
    x_batch, y_batch = validData_generator(x_test, y_test)

    val_acc_batch, val_loss_batch = sess.run([accuracy, loss], feed_dict={
        inputs: x_batch,
        y_true: y_batch
    })

    val_loss_tmp += val_loss_batch
    val_acc_tmp += val_acc_batch

    valid_loss.append(val_loss_tmp)
    valid_acc.append(val_acc_tmp)
    reduceLR.on_epoch_end(valid_loss[-1])

    print(' Epoch {}: tr_loss {:.3f}, tr_acc {:.3f}; val_loss {:.3f}, val_acc {:.3f}'.format(epc+1,
                                                                                             train_loss[-1],
                                                                                             train_acc[-1],
                                                                                             valid_loss[-1],
                                                                                             valid_acc[-1]))

100%|██████████| 1/1 [00:11<00:00, 11.60s/it]

 Epoch 1: tr_loss 2.181, tr_acc 0.375; val_loss 1.998, val_acc 0.586





In [7]:
#
# Checking our validation loss and accuracy
#

valid_loss, valid_acc

([1.9976853132247925], [0.5863000154495239])

# Save and Load Trained Model:

In this section, we will see the most common two ways to save and restore the trained model in TensorFlow. (You can refer to Deployment Introduction’s [Exporting part](https://paper.dropbox.com/doc/Deployment-Introduction--AgMfk42UNCSVDYgtk2pHmqKdAg-TrYTDU739OGQAP6iyidcc))


## Use Saver: Checkpoint

In [8]:
#
# Save Model (.ckpt)
#

with main_graph.as_default():
    saver = tf.train.Saver()
    saver.save(sess, './excellentModel/myExcellentModel.ckpt')

After saving, we should be able to see the excellentModel folder containing our model’s checkpoint file.

![saver_ckpt](./pics/saver_ckpt.png)


Let’s see what each file means.
- **meta file**: describes the saved graph structure, including GraphDef, SaveDef, and so on


- **index file**: a string-string immutable table (tensor name: metadata of tensor)


- **data file**: saves the values of all variables

In [9]:
#
# Load saved Model (.ckpt)
#

""" create new graph """
tf.reset_default_graph()
ckpt_graph = tf.Graph()
ckpt_sess = tf.Session(graph=ckpt_graph)

with ckpt_graph.as_default():
    saver = tf.train.import_meta_graph(
        './excellentModel/myExcellentModel.ckpt.meta')
    saver.restore(ckpt_sess, './excellentModel/myExcellentModel.ckpt')    

for node in ckpt_graph.as_graph_def().node:
    print(node.name)

W0627 23:48:38.199552 4515165632 deprecation.py:323] From /Users/admin/Documents/Cinnamon/Bootcamp/Serving_Example/serve_env/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.


input/x_input
input/y_true
input/learning_rate
hidden_layers/conv2d/kernel/Initializer/random_uniform/shape
hidden_layers/conv2d/kernel/Initializer/random_uniform/min
hidden_layers/conv2d/kernel/Initializer/random_uniform/max
hidden_layers/conv2d/kernel/Initializer/random_uniform/RandomUniform
hidden_layers/conv2d/kernel/Initializer/random_uniform/sub
hidden_layers/conv2d/kernel/Initializer/random_uniform/mul
hidden_layers/conv2d/kernel/Initializer/random_uniform
hidden_layers/conv2d/kernel
hidden_layers/conv2d/kernel/Assign
hidden_layers/conv2d/kernel/read
hidden_layers/conv2d/bias/Initializer/zeros
hidden_layers/conv2d/bias
hidden_layers/conv2d/bias/Assign
hidden_layers/conv2d/bias/read
hidden_layers/conv2d/dilation_rate
hidden_layers/conv2d/Conv2D
hidden_layers/conv2d/BiasAdd
hidden_layers/conv2d/Relu
hidden_layers/max_pooling2d/MaxPool
output_layer/flatten/Shape
output_layer/flatten/strided_slice/stack
output_layer/flatten/strided_slice/stack_1
output_layer/flatten/strided_slice/st

In [10]:
#
# Get tensors and check accuracy and loss
#

# If you want to retrained this model, you need to get the tensors back by their names.

ckpt_acc = ckpt_sess.graph.get_tensor_by_name('accuracy/accuracy:0')
ckpt_loss = ckpt_sess.graph.get_tensor_by_name('loss/Mean:0')
ckpt_input = ckpt_sess.graph.get_tensor_by_name('input/x_input:0')
ckpt_true = ckpt_sess.graph.get_tensor_by_name('input/y_true:0')
ckpt_out = ckpt_sess.graph.get_tensor_by_name('output_layer/Softmax:0')

acc_ckpt, loss_ckpt, pred_ckpt = ckpt_sess.run([ckpt_acc, ckpt_loss, ckpt_out], feed_dict={ckpt_input: x_batch, 
                                                                                           ckpt_true: y_batch})

loss_ckpt, acc_ckpt

(1.9976853, 0.5863)

## Use [SaveModel](https://paper.dropbox.com/doc/Deployment-Introduction--AgMfk42UNCSVDYgtk2pHmqKdAg-TrYTDU739OGQAP6iyidcc): Protobuf

Using Saver, you have to know the name of the model’s output and input tensors. Sometimes it will be super tedious if there is no standard naming policy. As a result, here we will use another way to export the model to Protobuf, and this is also the **standard ways to export TensorFlow models for serving**.

&nbsp;

Here, we will go through three different scenarios. One is using **saved model builder without signature**, another one is using **saved model builder with signature**, and the other is using **simple save**.


### a. Export Model with Saved Model Builder without Signature:

In [11]:
#
# Exporting model to protobuf without defining SignatureDef
#

export_path = './Saved_Model_noSig/1'
with ckpt_graph.as_default():
    builder_noSig = tf.saved_model.builder.SavedModelBuilder(export_path)
    builder_noSig.add_meta_graph_and_variables(
        ckpt_sess, ['noSig_tag1'])
    builder_noSig.add_meta_graph(['noSig_tag2'])
    builder_noSig.save()

Now, we should be able to see the exported Protobuf Model.

![saved_model_nosig](./pics/saved_model_nosig.png)

We can use the command **saved_model_cli** to check the Protobuf Model information. You can install it from [source](https://www.tensorflow.org/guide/saved_model#install_the_savedmodel_cli). However, it should already be installed if we use pip install to install TensorFlow.

```shell
#Terminal
saved_model_cli show --dir ./Saved_Model_noSig/1 --all
```

![saved_model_nosig_cli](./pics/saved_model_nosig_cli.png)

There is not much information because we didn’t set signature here. We can only see tag-set.

Let's load the exported model. 
Because we save two graphs in the same Protobuf, we can load both of them with respective tag names.

In [12]:
#
# Load Model from .pb without Signature
#


tf.reset_default_graph()
noSig_graph1 = tf.Graph()
noSig_graph2 = tf.Graph()
noSig_sess1 = tf.Session(graph=noSig_graph1)
noSig_sess2 = tf.Session(graph=noSig_graph2)


""" Load noSig_tag1 """
with noSig_graph1.as_default():
    meta_graph_noSignature1 = tf.saved_model.loader.load(
        noSig_sess1, ['noSig_tag1'], export_path)

""" Load noSig_tag2 """
with noSig_graph2.as_default():
    meta_graph_noSignature2 = tf.saved_model.loader.load(
        noSig_sess2, ['noSig_tag2'], export_path)

### Get Accuracy and Loss ###
noSig_acc1 = noSig_sess1.graph.get_tensor_by_name('accuracy/accuracy:0')
noSig_loss1 = noSig_sess1.graph.get_tensor_by_name('loss/Mean:0')
noSig_input1 = noSig_sess1.graph.get_tensor_by_name('input/x_input:0')
noSig_true1 = noSig_sess1.graph.get_tensor_by_name('input/y_true:0')
noSig_out1 = noSig_sess1.graph.get_tensor_by_name('output_layer/Softmax:0')

acc_noSig1, loss_noSig1, noSig_pred1 = noSig_sess1.run([noSig_acc1, noSig_loss1, noSig_out1], feed_dict={
    noSig_input1: x_batch,
    noSig_true1: y_batch
})

loss_noSig1, acc_noSig1

W0627 23:48:43.701056 4515165632 deprecation.py:323] From <ipython-input-12-7e1cd9eabd79>:15: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.


(1.9976853, 0.5863)

As you can see, we still have to know the name of the model’s output and input. That’s because we didn’t define signature before exporting the model.

Now, let’s see what signature do.

### b. Export Model with Saved Model Builder with Signature:

Notice that before exporting, we need to transform the tensors to **TensorInfo protocol buffer** which will be used by signature definition.

In [31]:
#
# Export Model to .pb with defining SignatureDef
#

export_path = './Saved_Model_withSig/1'
builder_withSig = tf.saved_model.builder.SavedModelBuilder(export_path)

""" Get tensor info from input and output tensors: transform tensor to TensorInfo protocol buff """
input_withSig = tf.saved_model.utils.build_tensor_info(ckpt_input)
label_withSig = tf.saved_model.utils.build_tensor_info(ckpt_true)
acc_withSig = tf.saved_model.utils.build_tensor_info(ckpt_acc)
loss_withSig = tf.saved_model.utils.build_tensor_info(ckpt_loss)
pred_withSig = tf.saved_model.utils.build_tensor_info(ckpt_out)

""" Define signature_definition: define input and output proto """
# a signature is the set of inputs to and outputs from a graph
signature_definition = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={
        'input_x': input_withSig,
        'input_label': label_withSig
    },
    outputs={
        'accuracy': acc_withSig,
        'loss': loss_withSig,
        'softmaxOut': pred_withSig
    },
    method_name='withSig_method_name' #self-defined method_name
)

""" export """
with ckpt_graph.as_default():
    builder_withSig.add_meta_graph_and_variables(ckpt_sess,
                                                 ['withSig_tag'],
                                                 signature_def_map={'withSig_Key': signature_definition}) #self-defined signature key
    builder_withSig.save()

As you can see, we need to convert input and output tensors to TensorInfo object first. Then, we can define *signature_definition*. 

**The logical keys of inputs and outputs are self-defined.**


Also, we can specify our own **method_name**. (Be aware that when using Predict API, we usually use pre-defined predict method constant)


The tag and the key to signature_definition are both self-defined. Like above, we also usually **used pre-defined serving tag-constant and signature-constant for Serving.**


Now, we should be able to see the exported model.

![saved_model_withsig](./pics/saved_model_withsig.png)

Like above, let's check exported model with **saved_model_cli**

```shell
#terminal
saved_model_cli show --dir Saved_Model_withSig/1 --all
```

![saved_model_withsig_cli](./pics/saved_model_withsig_cli.png)

&nbsp;

Now we can see all the detailed information about the model that we just exported. We can see the **inputs keys**, **outputs keys**, **method name**, **signature key**, etc.

#### All these information will be used for client when making request to the server ! ! !


In [62]:
#
# Load Model from .pb with defining SignatureDef
#

tf.reset_default_graph()
withSig_graph = tf.Graph()
withSig_sess = tf.Session(graph=withSig_graph)


with withSig_graph.as_default():
    meta_graph_withSig = tf.saved_model.loader.load(
        withSig_sess, ['withSig_tag'], export_path)
    signature = meta_graph_withSig.signature_def
    
print('Signature: \n')
print(signature)

Signature: 

{'withSig_Key': inputs {
  key: "input_label"
  value {
    name: "input/y_true:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: 10
      }
    }
  }
}
inputs {
  key: "input_x"
  value {
    name: "input/x_input:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: 28
      }
      dim {
        size: 28
      }
      dim {
        size: 1
      }
    }
  }
}
outputs {
  key: "accuracy"
  value {
    name: "accuracy/accuracy:0"
    dtype: DT_FLOAT
    tensor_shape {
    }
  }
}
outputs {
  key: "loss"
  value {
    name: "loss/Mean:0"
    dtype: DT_FLOAT
    tensor_shape {
    }
  }
}
outputs {
  key: "softmaxOut"
  value {
    name: "output_layer/Softmax:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: 10
      }
    }
  }
}
method_name: "withSig_method_name"
}


With signature, we can easily see the names of inputs, outputs, and their corresponding keys.

Having all those information, we can retrieve the input and output tensors’ names by the logical keys stored in the signature.

(**No need to search through the whole tensors to look for the right names anymore ! ! ! !**🤗 )

In [64]:
### Getting names for desired tensors ###
x_name = signature['withSig_Key'].inputs['input_x'].name
label_name = signature['withSig_Key'].inputs['input_label'].name
acc_name = signature['withSig_Key'].outputs['accuracy'].name
loss_name = signature['withSig_Key'].outputs['loss'].name
softmaxOut_name = signature['withSig_Key'].outputs['softmaxOut'].name

### Getting tensors by names ###
withSig_input = withSig_sess.graph.get_tensor_by_name(x_name)
withSig_true = withSig_sess.graph.get_tensor_by_name(label_name)
withSig_acc = withSig_sess.graph.get_tensor_by_name(acc_name)
withSig_loss = withSig_sess.graph.get_tensor_by_name(loss_name)
withSig_pred = withSig_sess.graph.get_tensor_by_name(softmaxOut_name)

### Testing ###
acc_withSig, loss_withSig, pred_withSig = withSig_sess.run([withSig_acc, withSig_loss, withSig_pred], feed_dict={
    withSig_input: x_batch,
    withSig_true: y_batch
})

loss_withSig, acc_withSig

(1.9976853, 0.5863)

### c. Export Model with Simple Save:

The simple save method is pretty easy. But, there are a few things we need to be aware of.
If we use **simple_save** to export the model:

1. The default tag is **"serve"** (tf.saved_model.tag_constants.SERVING)

2. The signature key is **"serving_default"** (tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY)

3. The method_name for signature_def is **"tensorflow/serving/predict"** (tf.saved_model.signature_constants.PREDICT_METHOD_NAME)


In [66]:
#
# Export Model to .pb with simple_save
#
export_path = './Saved_Model_SimpleSave/1'

### export ###
with ckpt_graph.as_default():
    tf.saved_model.simple_save(
        ckpt_sess,
        export_path,
        inputs={'input_x': ckpt_input, 'input_label': ckpt_true},
        outputs={'accuracy': ckpt_acc, 'loss': ckpt_loss, 'softmaxOut': ckpt_out}
    )

Now, we should be able to see the exported model.

![saved_model_simplesave](./pics/saved_model_simplesave.png)


Let's check exported model by **saved_model_cli**.
```shell
#terminal
saved_model_cli show --dir Saved_Model_SimpleSave/1 --all
```

![saved_model_simplesave_cli](./pics/saved_model_simplesave_cli.png)


As mentioned above, the tag, signature key, and method_name will all use default serving constants. 


Here, we can get all the information we need via the meta graph’s signature.

In [67]:
#
# Load Model from .pb exported by simple_save
#
tf.reset_default_graph()
withSig_graph = tf.Graph()
withSig_sess = tf.Session(graph=withSig_graph)

tf.reset_default_graph()
simpleSave_graph = tf.Graph()
simpleSavfe_sess = tf.Session(graph=simpleSave_graph)

with simpleSave_graph.as_default():
    meta_graph_simpleSave = tf.saved_model.loader.load(simpleSavfe_sess,
                                                       [tf.saved_model.tag_constants.SERVING], #tag: serve
                                                       export_path)
    
    signature_simpleSave = meta_graph_simpleSave.signature_def
    
print('Signature:')
print(signature_simpleSave)

Signature:
{'serving_default': inputs {
  key: "input_label"
  value {
    name: "input/y_true:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: 10
      }
    }
  }
}
inputs {
  key: "input_x"
  value {
    name: "input/x_input:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: 28
      }
      dim {
        size: 28
      }
      dim {
        size: 1
      }
    }
  }
}
outputs {
  key: "accuracy"
  value {
    name: "accuracy/accuracy:0"
    dtype: DT_FLOAT
    tensor_shape {
    }
  }
}
outputs {
  key: "loss"
  value {
    name: "loss/Mean:0"
    dtype: DT_FLOAT
    tensor_shape {
    }
  }
}
outputs {
  key: "softmaxOut"
  value {
    name: "output_layer/Softmax:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: 10
      }
    }
  }
}
method_name: "tensorflow/serving/predict"
}


We can also retrieve input and output tensors’ names by the local keys that we assigned in the signature. 

Just remember the signature key here is **tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY**.

In [69]:
### Getting names for desired tensors ###
x_name = signature_simpleSave['serving_default'].inputs['input_x'].name
label_name = signature_simpleSave['serving_default'].inputs['input_label'].name
acc_name = signature_simpleSave['serving_default'].outputs['accuracy'].name
loss_name = signature_simpleSave['serving_default'].outputs['loss'].name
softmaxOut_name = signature_simpleSave['serving_default'].outputs['softmaxOut'].name

### Getting tensors by names ###
simpleSave_input = simpleSavfe_sess.graph.get_tensor_by_name(x_name)
simpleSave_true = simpleSavfe_sess.graph.get_tensor_by_name(label_name)
simpleSave_acc = simpleSavfe_sess.graph.get_tensor_by_name(acc_name)
simpleSave_loss = simpleSavfe_sess.graph.get_tensor_by_name(loss_name)
simpleSave_pred = simpleSavfe_sess.graph.get_tensor_by_name(softmaxOut_name)

### Testing ###
acc_simpleSave, loss_simpleSave, pred_simpleSave = simpleSavfe_sess.run(
    [simpleSave_acc, simpleSave_loss, simpleSave_pred],
    feed_dict={
                simpleSave_input: x_batch,
                simpleSave_true: y_batch
    })

loss_simpleSave, acc_simpleSave

(1.9976853, 0.5863)

# Export Model for Serving

So far, we know many ways to export or save our trained model. Now, let's really export our model for serving.

There are a few things that need to be aware of.

1. We used **tf.saved_model.signature_constants.PREDICT_METHOD_NAME ** method name here because we will use **predict API** to make request in client section.



2. The exported model will be used for serving, so we assign the tag to be **tf.saved_model.tag_constants.SERVING**



3. Like the above, we assign the key of signature definition to be **tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY**

In [100]:
#
# Export Model for serving
#
export_path = './Saved_Model/1'
serving_builder = tf.saved_model.builder.SavedModelBuilder(export_path)

""" Get tensor info from input and output tensors: transform tensor to TensorInfo protocol buff """
serving_input = tf.saved_model.utils.build_tensor_info(ckpt_input)
serving_label = tf.saved_model.utils.build_tensor_info(ckpt_true)
serving_acc = tf.saved_model.utils.build_tensor_info(ckpt_acc)
serving_loss = tf.saved_model.utils.build_tensor_info(ckpt_loss)
serving_pred = tf.saved_model.utils.build_tensor_info(ckpt_out)

""" Define signature_definition: define input and output proto """
# a signature is the set of inputs to and outputs from a graph
signature_definition = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={
        'input_x': serving_input,
    },
    outputs={
        'softmaxOut': serving_pred
    },
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME 
)

### export ###
with ckpt_graph.as_default():
    serving_builder.add_meta_graph_and_variables(ckpt_sess,
                                                 [tf.saved_model.tag_constants.SERVING],
                                                 signature_def_map={
                                                     tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature_definition
                                                 })
    serving_builder.save()

As usual, let's check the exported model by **saved_model_cli** to make sure everything is set. (Especially for **method_name**, **tag**, and **signature key**)

```shell
#Terminal
saved_model_cli show --dir Saved_Model/1 --all
```

![serve_model_cli](./pics/serve_model_cli.png)


The model looks just fine! We can go for the server part.

# TensorFlow Serving with Docker

Here, we’ll start to setup our server with Docker. (If you forget this part, you can check [previous tutorial](https://paper.dropbox.com/doc/Deployment-Introduction--AgMfk42UNCSVDYgtk2pHmqKdAg-TrYTDU739OGQAP6iyidcc) or the [official document](https://www.tensorflow.org/tfx/serving/docker) for recap)

Please run below command in your terminal.

### a. Pull tensorflow/serving image from Docker Hub:
```shell
#terminal
### pull tensorflow/serving image from Docker Hub ###
docker pull tensorflow/serving

### check images in your local registry ###
docker images
```

After pulling, we should see the tensorflow/serving image in our local registry now.

![pull_image](./pics/pull_image.png)

&nbsp;

### b. Run serving container via tensorflow/serving image:

```shell
#Terminal
### build docker container ###

docker run -p 8501:8501 -p 8500:8500 --mount type=bind,source="$(pwd)"/Saved_Model,target=/models/myExcellentModel -e MODEL_NAME=myExcellentModel -it -d tensorflow/serving

### check running container ###
docker ps

### show log of container in real-time ###
docker logs -f <YOUR_CONTAINER_ID>
```

![run_container](./pics/run_container.png)

# Client : RESTfull API

One of the major method for client to communicate with TensorFlow Serving server is through RESTfull API.


We can get the inference requests via RESTfull API. 

First, let’s check our serving model’s metadata and status.

### a. Checking model metadata and model status:

With model's metadata and status, we can know the information like input data shape, output shape, or the status of current serving model (AVAILABLE or END).

In [11]:
"""import necessary libraries"""
import requests
import json

"""checking model metadata"""
print('metadata:')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel/metadata').text)

"""checking model status"""
print('status:')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel').text)

metadata:
{
"model_spec":{
 "name": "myExcellentModel",
 "signature_name": "",
 "version": "1"
}
,
"metadata": {"signature_def": {
 "signature_def": {
  "serving_default": {
   "inputs": {
    "input_x": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "28",
        "name": ""
       },
       {
        "size": "28",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "input/x_input:0"
    }
   },
   "outputs": {
    "softmaxOut": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "10",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "output_layer/Softmax:0"
    }
   },
   "method_name": "tensorflow/serving/predict"
  }
 }
}
}
}

status:
{
 "model_version_sta

### b. Make predict request:

In [18]:
"""preparing inference data"""
payload = {'signature_name': 'serving_default', 'instances': x_batch[0:3].tolist()}
data = json.dumps(payload)

"""make request to the server"""
headers = {'content-type': 'application/json'} #you can omit the header
json_response = requests.post('http://localhost:8501/v1/models/myExcellentModel:predict', data=data, headers=headers)
print(json_response.text, '\n')

"""parse prediction from response"""
predictions = json.loads(json_response.text)['predictions']
print('predictions: {}'.format(np.argmax(predictions, axis=-1)))


{
    "predictions": [[0.124829, 0.0614052, 0.0995001, 0.157356, 0.0761166, 0.0929893, 0.0945179, 0.0863661, 0.113767, 0.0931527], [0.116204, 0.0558889, 0.107396, 0.144861, 0.0811449, 0.0963742, 0.0748477, 0.0818477, 0.130106, 0.111329], [0.132955, 0.0726666, 0.157384, 0.102203, 0.0766418, 0.0852655, 0.0950602, 0.0959828, 0.102379, 0.0794623]
    ]
} 

predictions: [3 3 2]


We can see that the output format is in accordance with the format mentioned in the [official document](https://www.tensorflow.org/tfx/serving/api_rest#response_format_4).

# Client: gRPC

Another method for client to communicate with TensorFlow Serving server is through gRPC API.

We can do almost the same thing that RESTfull API can do through gRPC API.

First, let’s check our serving model’s metadata and status.

### a. Checking model status:

In [118]:
""" import necessary libraries"""
import grpc
from tensorflow_serving.apis import model_service_pb2_grpc
from tensorflow_serving.apis import get_model_status_pb2

""" create stub(client) via hosted port"""
channel = grpc.insecure_channel('localhost:8500')
status_stub = model_service_pb2_grpc.ModelServiceStub(channel)

""" create and config request"""
request = get_model_status_pb2.GetModelStatusRequest()
request.model_spec.name = 'myExcellentModel'

""" send request and get response """
response = status_stub.GetModelStatus(request, 2)
print(response)

model_version_status {
  version: 1
  state: AVAILABLE
  status {
  }
}



### b. Checking model metadata:

In [30]:
""" import necessary libraries """
from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow_serving.apis import get_model_metadata_pb2

""" create stub (client) via hosted port """
meta_stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

""" create and config request """
request = get_model_metadata_pb2.GetModelMetadataRequest()
request.model_spec.name = 'myExcellentModel'
request.metadata_field.append('signature_def')

""" send request and get response """
response = meta_stub.GetModelMetadata(request, 2)
print(response)

model_spec {
  name: "myExcellentModel"
  version {
    value: 1
  }
}
metadata {
  key: "signature_def"
  value {
    type_url: "type.googleapis.com/tensorflow.serving.SignatureDefMap"
    value: "\n\250\001\n\017serving_default\022\224\001\n9\n\007input_x\022.\n\017input/x_input:0\020\001\032\031\022\013\010\377\377\377\377\377\377\377\377\377\001\022\002\010\034\022\002\010\034\022\002\010\001\022;\n\nsoftmaxOut\022-\n\026output_layer/Softmax:0\020\001\032\021\022\013\010\377\377\377\377\377\377\377\377\377\001\022\002\010\n\032\032tensorflow/serving/predict"
  }
}



### c. Make predict request:

In [117]:
""" import necessary libraries """
from tensorflow_serving.apis import predict_pb2

""" create stub (client) via hosted port """ 
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

""" create and config request """
request = predict_pb2.PredictRequest()
request.model_spec.name = 'myExcellentModel'
request.model_spec.signature_name = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY #serving_default
request.inputs['input_x'].CopyFrom(tf.make_tensor_proto(x_batch[0:3].astype(dtype=np.float32), shape=[3, 28, 28, 1])) #you can omit shape here

""" send request and get response """
response = stub.Predict(request, 2) # Synchronous request
# response = stub.Predict.future(request, 2) #Asynchronous request

""" get the output scores """
result = response.outputs['softmaxOut'].float_val
print([np.argmax(result[i*10:(i+1)*10]) for i in range(len(x_batch[0:3]))])

[3, 3, 2]


# New version in da house:

Let’s say our customers ask us to give them a service with higher accuracy. Now, we need to train a new model and deploy it without shutting down current service. 

Now, just keep your server running, and start training a new model and export new model for serving.

### a. Train a New Model

In [138]:
#
# Define model Graph
#

tf.reset_default_graph()
main_graph = tf.Graph()

with main_graph.as_default():

    with tf.name_scope('input'):
        inputs = tf.placeholder(dtype=tf.float32, shape=[
                                None, 28, 28, 1], name='x_input')
        y_true = tf.placeholder(dtype=tf.float32, shape=[
                                None, 10], name='y_true')
        lr = tf.placeholder(dtype=tf.float32, shape=None, name='learning_rate')

    with tf.variable_scope('hidden_layers'):
        model = tf.layers.conv2d(inputs=inputs,
                                 filters=16,
                                 kernel_size=(3, 3),
                                 strides=(1, 1),
                                 padding='same',
                                 activation=tf.nn.relu,
                                 )
        model = tf.layers.max_pooling2d(inputs=model,
                                        pool_size=(2, 2),
                                        strides=2,
                                        )
        model = tf.layers.conv2d(inputs=inputs,
                                 filters=16,
                                 kernel_size=(3, 3),
                                 strides=(1, 1),
                                 padding='same',
                                 activation=tf.nn.relu,
                                 )
        model = tf.layers.max_pooling2d(inputs=model,
                                        pool_size=(2, 2),
                                        strides=2,
                                        )

    with tf.variable_scope('output_layer'):
        model = tf.layers.Flatten()(model)
        output = tf.layers.dense(inputs=model, units=10)
        prediction = tf.nn.softmax(output)

    with tf.name_scope('loss'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=output,
                                                                         labels=y_true,
                                                                         name='cross_entropy_loss'))

        optim = tf.train.AdamOptimizer(learning_rate=lr)
        gra_and_var = optim.compute_gradients(loss)
        update = optim.apply_gradients(gra_and_var)

    with tf.name_scope('accuracy'):
        correct_predict = tf.equal(tf.arg_max(
            prediction, dimension=1), tf.arg_max(y_true, 1))
        accuracy = tf.reduce_mean(
            tf.cast(correct_predict, dtype=tf.float32), name='accuracy')

    init_main_graph = tf.global_variables_initializer()

W0702 11:51:02.296238 4541388224 deprecation.py:323] From <ipython-input-138-37e4ca855b92>:23: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
W0702 11:51:02.395655 4541388224 deprecation.py:506] From /Users/admin/Documents/Cinnamon/Bootcamp/Serving_Example/serve_env/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0702 11:51:02.938024 4541388224 deprecation.py:323] From <ipython-input-138-37e4ca855b92>:27: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
W0702 11:51:03

In [141]:
#
# Hyperparameters
#

epochs = 2
batch_size = 10000
total_batch = int(len(x_train) / batch_size)
reduceLR = ReduceLROnPlateau(lr=0.001, factor=0.5, patience=10)
train_loss, train_acc = [], []
valid_loss, valid_acc = [], []

In [142]:
#
# Training
#


sess = tf.Session(graph=main_graph)
sess.run(init_main_graph)

for epc in tqdm.tqdm(range(epochs)):
    tr_loss_tmp, tr_acc_tmp = 0, 0
    val_loss_tmp, val_acc_tmp = 0, 0
    train_gen = trainData_generator(x_train, y_train, total_batch=total_batch)

    ### training ###
    for x_batch, y_batch in train_gen:

        tr_acc_batch, tr_loss_batch, _ = sess.run([accuracy, loss, update], feed_dict={
            inputs: x_batch,
            y_true: y_batch,
            lr: reduceLR.lr
        })

        tr_loss_tmp += tr_loss_batch
        tr_acc_tmp += tr_acc_batch

    train_loss.append(tr_loss_tmp / total_batch)
    train_acc.append(tr_acc_tmp / total_batch)

    ### validation ###
    x_batch, y_batch = validData_generator(x_test, y_test)

    val_acc_batch, val_loss_batch = sess.run([accuracy, loss], feed_dict={
        inputs: x_batch,
        y_true: y_batch
    })

    val_loss_tmp += val_loss_batch
    val_acc_tmp += val_acc_batch

    valid_loss.append(val_loss_tmp)
    valid_acc.append(val_acc_tmp)
    reduceLR.on_epoch_end(valid_loss[-1])

    print(' Epoch {}: tr_loss {:.3f}, tr_acc {:.3f}; val_loss {:.3f}, val_acc {:.3f}'.format(epc+1,
                                                                                             train_loss[-1],
                                                                                             train_acc[-1],
                                                                                             valid_loss[-1],
                                                                                             valid_acc[-1]))

 50%|█████     | 1/2 [00:09<00:09,  9.17s/it]

 Epoch 1: tr_loss 2.164, tr_acc 0.319; val_loss 1.964, val_acc 0.549


100%|██████████| 2/2 [00:20<00:00,  9.71s/it]

 Epoch 2: tr_loss 1.829, tr_acc 0.647; val_loss 1.619, val_acc 0.743





### b. Export New Model for Serving

In [146]:
#
# Get tensors and check accuracy and loss
#

ckpt_acc = sess.graph.get_tensor_by_name('accuracy/accuracy:0')
ckpt_loss = sess.graph.get_tensor_by_name('loss/Mean:0')
ckpt_input = sess.graph.get_tensor_by_name('input/x_input:0')
ckpt_true = sess.graph.get_tensor_by_name('input/y_true:0')
ckpt_out = sess.graph.get_tensor_by_name('output_layer/Softmax:0')


#
# Export Model for serving
#
export_path = './Saved_Model/2'
serving_builder = tf.saved_model.builder.SavedModelBuilder(export_path)

### Get tensor info from input and output tensors: transform tensor to TensorInfo protocol buff ###
serving_input = tf.saved_model.utils.build_tensor_info(ckpt_input)
serving_label = tf.saved_model.utils.build_tensor_info(ckpt_true)
serving_acc = tf.saved_model.utils.build_tensor_info(ckpt_acc)
serving_loss = tf.saved_model.utils.build_tensor_info(ckpt_loss)
serving_pred = tf.saved_model.utils.build_tensor_info(ckpt_out)

### Define signature_definition: define input and output proto ###
# a signature is the set of inputs to and outputs from a graph
signature_definition = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={
        'input_x': serving_input,
    },
    outputs={
        'softmaxOut': serving_pred
    },
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
)

### export ###
with main_graph.as_default():
    serving_builder.add_meta_graph_and_variables(sess,
                                                 [tf.saved_model.tag_constants.SERVING],
                                                 signature_def_map={
                                                     tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature_definition
                                                 })
    serving_builder.save()

Now, we can see the server loading version 2 and unloading version 1.

![new_model_loading](./pics/new_model_loading.png)

# TensorFlow Serving with Model Config:

In some circumstances, we might want to serve multiple models with multiple serving policy.

We can achieve this by model config file.

&nbsp;

### a. Fake Some Models:

First of all, let’s fake some servable models just by copy-paste the exported model.

```shell

    #Terminal
    ### create a folder called Saved_Models ###
    mkdir Saved_Models
    cd Saved_Models
    
    ### make three folders for storing three servable models ###
    mkdir myExcellentModel myExcellentModel2 myExcellentModel3
    
    ### copy the exported model in Saved_Model to the three folders ###
    cp -r ../Saved_Model/1 myExcellentModel/1
    cp -r ../Saved_Model/2 myExcellentModel/2
    cp -r ../Saved_Model/1 myExcellentModel2/1
    cp -r ../Saved_Model/1 myExcellentModel2/2
    cp -r ../Saved_Model/1 myExcellentModel3/1
    cp -r ../Saved_Model/1 myExcellentModel3/2
```


Now, the Saved_Models folder should look like below.

![saved_models_folder](./pics/saved_models_folder.png)


&nbsp;

### b. Write Model Config File:

Here comes the fun part. Let’s write the model config file serving_model.conf.

```shell
#serving_model.conf
model_config_list: {
    config: {
        name: "myExcellentModel",
        base_path: "/models/myExcellentModel",
        model_platform: "tensorflow",
        model_version_policy: {
            all{}
        }
    }
    config: {
        name: "myExcellentModel2",
        base_path: "/models/myExcellentModel2",
        model_platform: "tensorflow",
        model_version_policy: {latest{
            num_versions: 1 
        }}
    }
    config: {
        name: "myExcellentModel3",
        base_path: "/models/myExcellentModel3",
        model_platform: "tensorflow",
        model_version_policy: {specific{
            versions: 1,
            versions: 2
        }}
    }
}
```

We assign three difference model_version_policy to these three different models. Now, let’s see the differences.


&nbsp;

### c. Run TensorFlow Serving with Model Config File:
```shell
#Terminal
### Run TensorFlow Serving ###
docker run -p 8501:8501 -p 8500:8500 \
    --mount type=bind,source="$(pwd)"/Saved_Models,target=/models/ \
    --mount type=bind,source="$(pwd)"/serving_model.conf,target=/models/model.conf \
    -it -d tensorflow/serving --model_config_file=/models/model.conf

### check container ID ###
docker ps

### show log of container in real-time ###
docker logs -f <YOUR_CONTAINER_ID>
```

&nbsp;

Here are couple of things we need to be aware of.


- We **bind Saved_Models folder to container’s models folder**. (Just like what we did in the previous example)


- We also **bind serving_model.conf file to container’s /models/model.conf**. (Here is where the TensorFlow Serving will look for config file)


- We **don’t need to specify the MODEL_NAME and MODEL_BASE_PATH here**. (We already specify them in the model config file)


After that, we will see a bunch of log messages.  Check the log messages, we should see something like below.


![model_config_serving_1](./pics/model_config_serving_1.png)
![model_config_serving_2](./pics/model_config_serving_2.png)
![model_config_serving_3](./pics/model_config_serving_3.png)


&nbsp;

The server loads all the models according to the version policy that we write in the model config file. (ex: It loads the version 1 and 2 of myExcellentModel3 because we specify the specific policy in the model config file)


### d. Check models' metadata: RESTfull API

In [244]:
"""import necessary libraries"""
import requests
import json

"""checking model metadata"""
print('myExcellentModel metadata:')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel/metadata').text)
print('myExcellentModel:version 1 metadata')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel/versions/1/metadata').text)

print('myExcellentModel2 metadata:')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel2/metadata').text)

print('myExcellentModel3 metadata:')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel3/metadata').text)
print('myExcellentModel3:version 1 metadata')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel3/versions/1/metadata').text)

myExcellentModel metadata:
{
"model_spec":{
 "name": "myExcellentModel",
 "signature_name": "",
 "version": "2"
}
,
"metadata": {"signature_def": {
 "signature_def": {
  "serving_default": {
   "inputs": {
    "input_x": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "28",
        "name": ""
       },
       {
        "size": "28",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "input/x_input:0"
    }
   },
   "outputs": {
    "softmaxOut": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "10",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "output_layer/Softmax:0"
    }
   },
   "method_name": "tensorflow/serving/predict"
  }
 }
}
}
}

myExcellentM

### e. Check models' status: RESTfull API

In [245]:
"""checking model status"""
print('myExcellentModel status:')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel').text)

print('myExcellentModel2 status:')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel2').text)

print('myExcellentModel3 status:')
print(requests.get('http://localhost:8501/v1/models/myExcellentModel3').text)

myExcellentModel status:
{
 "model_version_status": [
  {
   "version": "2",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  },
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}

myExcellentModel2 status:
{
 "model_version_status": [
  {
   "version": "3",
   "state": "END",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  },
  {
   "version": "2",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}

myExcellentModel3 status:
{
 "model_version_status": [
  {
   "version": "2",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  },
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}



### f. Make predict request: RESTfull API (Take myExcellentModel version 2 as an example)

In [246]:
"""preparing inference data"""
payload = {'signature_name': 'serving_default', 'instances': x_batch[0:3].tolist()}
data = json.dumps(payload)

"""make request to the server"""
headers = {'content-type': 'application/json'} #you can omit the header
json_response = requests.post('http://localhost:8501/v1/models/myExcellentModel/versions/2:predict', data=data, headers=headers)
print(json_response.text, '\n')

"""parse prediction from response"""
predictions = json.loads(json_response.text)['predictions']
print('predictions: {}'.format(np.argmax(predictions, axis=-1)))

{
    "predictions": [[0.103439, 0.0853091, 0.0982532, 0.118523, 0.0811818, 0.0819828, 0.120437, 0.0812559, 0.137039, 0.0925796], [0.0946726, 0.0570425, 0.133527, 0.105784, 0.114615, 0.0966139, 0.0889287, 0.106509, 0.0995103, 0.102796], [0.186527, 0.0494206, 0.0914837, 0.116658, 0.080768, 0.0927301, 0.100025, 0.0688501, 0.118372, 0.0951648]
    ]
} 

predictions: [8 2 0]


### g. Check model status: gRPC

In [247]:
""" import necessary libraries"""
import grpc
from tensorflow_serving.apis import model_service_pb2_grpc
from tensorflow_serving.apis import get_model_status_pb2

""" create stub(client) via hosted port"""
channel = grpc.insecure_channel('localhost:8500')
status_stub = model_service_pb2_grpc.ModelServiceStub(channel)

""" create and config request"""
request = get_model_status_pb2.GetModelStatusRequest()
request.model_spec.name = 'myExcellentModel'

request2 = get_model_status_pb2.GetModelStatusRequest()
request2.model_spec.name = 'myExcellentModel2'

request3 = get_model_status_pb2.GetModelStatusRequest()
request3.model_spec.name = 'myExcellentModel3'
request3.model_spec.version.value = 1 #assign version value

""" send request and get response """
response = status_stub.GetModelStatus(request, 2)
print('myExcellentModel:')
print(response)

response2 = status_stub.GetModelStatus(request2, 2)
print('myExcellentModel2:')
print(response2)

response3 = status_stub.GetModelStatus(request3, 2)
print('myExcellentModel3:')
print(response3)

myExcellentModel:
model_version_status {
  version: 2
  state: AVAILABLE
  status {
  }
}
model_version_status {
  version: 1
  state: AVAILABLE
  status {
  }
}

myExcellentModel2:
model_version_status {
  version: 3
  state: END
  status {
  }
}
model_version_status {
  version: 2
  state: AVAILABLE
  status {
  }
}

myExcellentModel3:
model_version_status {
  version: 1
  state: AVAILABLE
  status {
  }
}



### h. Check model metadata: gRPC

In [248]:
### Here we only take myExcellentModel version 1 as an example ###

""" import necessary libraries """
from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow_serving.apis import get_model_metadata_pb2

""" create stub (client) via hosted port """
meta_stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

""" create and config request """
request = get_model_metadata_pb2.GetModelMetadataRequest()
request.model_spec.name = 'myExcellentModel3'
request.model_spec.version.value = 1
request.metadata_field.append('signature_def')

""" send request and get response """
response = meta_stub.GetModelMetadata(request, 2)
print(response)

model_spec {
  name: "myExcellentModel3"
  version {
    value: 1
  }
}
metadata {
  key: "signature_def"
  value {
    type_url: "type.googleapis.com/tensorflow.serving.SignatureDefMap"
    value: "\n\250\001\n\017serving_default\022\224\001\n9\n\007input_x\022.\n\017input/x_input:0\020\001\032\031\022\013\010\377\377\377\377\377\377\377\377\377\001\022\002\010\034\022\002\010\034\022\002\010\001\022;\n\nsoftmaxOut\022-\n\026output_layer/Softmax:0\020\001\032\021\022\013\010\377\377\377\377\377\377\377\377\377\001\022\002\010\n\032\032tensorflow/serving/predict"
  }
}



### i. Make predict request: gRPC (Take myExcellentModel version 1 as an example)

In [249]:
""" import necessary libraries """
from tensorflow_serving.apis import predict_pb2

""" create stub (client) via hosted port """ 
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

""" create and config request """
request = predict_pb2.PredictRequest()
request.model_spec.name = 'myExcellentModel'
request.model_spec.version.value = 1
request.model_spec.signature_name = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY #serving_default
request.inputs['input_x'].CopyFrom(tf.make_tensor_proto(x_batch[0:3].astype(dtype=np.float32)))

""" send request and get response """
response = stub.Predict(request, 2) # Synchronous request
# response = stub.Predict.future(request, 2) #Asynchronous request

""" get the output scores """
result = response.outputs['softmaxOut'].float_val
print([np.argmax(result[i*10:(i+1)*10]) for i in range(len(x_batch[0:3]))])

[8, 2, 0]


# Update Model Config File at Runtime

Now, let’s say we have new strategy to server our fabulous models. We don’t want our customer to use all version of our models, and we don’t want them to know which version they are using. How can we do all the above changes with out shutting down our service? We can do that through gRPC.

&nbsp;


### a. Create a new model config file called serving_model_new.conf 

```shell
#terminal
vim serving_model_new.conf

#copy and paste below new config file
model_config_list: {
    config: {
        name: "myExcellentModel",
        base_path: "/models/myExcellentModel",
        model_platform: "tensorflow",
        model_version_policy: {latest{
            num_versions: 1
        }}
    }
    config: {
        name: "myExcellentModel2",
        base_path: "/models/myExcellentModel2",
        model_platform: "tensorflow",
        model_version_policy: {latest{
            num_versions: 1                                         
        }}
    }
    config: {
        name: "myExcellentModel3",
        base_path: "/models/myExcellentModel3",
        model_platform: "tensorflow",
        model_version_policy: {specific{
            versions: 2,
            versions: 1                                                    
        }}
        version_labels: {
            key: 'stable'
            value: 1 
        }
        version_labels: {
            key: 'canary'
            value: 2 
        }
    }
}
```

Here we change the version policy of myExcellentModel and myExcellentModel2. Also, we assign **version labels** to myExcellentModel3.

### b. Update server’s model config:

In [250]:
os.getcwd()

'/Users/admin/Documents/Cinnamon/Bootcamp/Serving_Example'

In [251]:
from tensorflow_serving.apis import model_service_pb2_grpc
from tensorflow_serving.apis import model_management_pb2
from tensorflow_serving.config import model_server_config_pb2
from google.protobuf import text_format

import grpc

def update_model_config(host, names, base_path, model_platform, config_file_path):
    '''
    Function to update model config list at serving runtime
    Args:
        host: str, hosting port
        names: list, list of models' names
        base_path: str, model's base path
        model_platform: str, model's platform, ex: tensorflow
        config_file_path: str, the path of the new model config file
    '''
    
    """ read the config file"""
    with open(config_file_path, 'r+') as f:
        config_ini = f.read()
    
    """ parse text and merge into model server config"""
    model_server_config = model_server_config_pb2.ModelServerConfig()
    model_server_config = text_format.Parse(text=config_ini, message=model_server_config)
    
    
    """ create stub(client) via hosted port"""
    channel = grpc.insecure_channel(host) 
    stub = model_service_pb2_grpc.ModelServiceStub(channel)
    
    """ create request """
    request = model_management_pb2.ReloadConfigRequest() 

    """ create a config to add to the list of served models"""
    config_list = model_server_config_pb2.ModelConfigList()
    
    """ config request"""
    request.config.CopyFrom(model_server_config)

    print(request.IsInitialized())
    print(request.ListFields())
    
    """ handle reload request"""
    response = stub.HandleReloadConfigRequest(request,10)
    
    if response.status.error_code == 0:
        print("Reload sucessfully")
    else:
        print("Reload failed!")
        print(response.status.error_code)
        print(response.status.error_message)


update_model_config(host="localhost:8500", 
                    names=["myExcellentModel", "myExcellentModel2", "myExcellentModel3"], 
                    base_path="/models", 
                    model_platform="tensorflow", 
                    config_file_path='./serving_model_new.conf')

True
[(<google.protobuf.pyext._message.FieldDescriptor object at 0x147ab3dd8>, model_config_list {
  config {
    name: "myExcellentModel"
    base_path: "/models/myExcellentModel"
    model_platform: "tensorflow"
    model_version_policy {
      latest {
        num_versions: 1
      }
    }
  }
  config {
    name: "myExcellentModel2"
    base_path: "/models/myExcellentModel2"
    model_platform: "tensorflow"
    model_version_policy {
      latest {
        num_versions: 1
      }
    }
  }
  config {
    name: "myExcellentModel3"
    base_path: "/models/myExcellentModel3"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 2
        versions: 1
      }
    }
    version_labels {
      key: "canary"
      value: 2
    }
    version_labels {
      key: "stable"
      value: 1
    }
  }
}
)]
Reload sucessfully


### c. Check models' status

In [252]:
""" import necessary libraries"""
import grpc
from tensorflow_serving.apis import model_service_pb2_grpc
from tensorflow_serving.apis import get_model_status_pb2

""" create stub(client) via hosted port"""
channel = grpc.insecure_channel('localhost:8500')
status_stub = model_service_pb2_grpc.ModelServiceStub(channel)

""" create and config request"""
request = get_model_status_pb2.GetModelStatusRequest()
request.model_spec.name = 'myExcellentModel'

request2 = get_model_status_pb2.GetModelStatusRequest()
request2.model_spec.name = 'myExcellentModel2'

request3 = get_model_status_pb2.GetModelStatusRequest()
request3.model_spec.name = 'myExcellentModel3'


""" send request and get response """
response = status_stub.GetModelStatus(request, 2)
print('myExcellentModel:')
print(response)

response2 = status_stub.GetModelStatus(request2, 2)
print('myExcellentModel2:')
print(response2)

response3 = status_stub.GetModelStatus(request3, 2)
print('myExcellentModel3:')
print(response3)

myExcellentModel:
model_version_status {
  version: 2
  state: AVAILABLE
  status {
  }
}
model_version_status {
  version: 1
  state: END
  status {
  }
}

myExcellentModel2:
model_version_status {
  version: 3
  state: END
  status {
  }
}
model_version_status {
  version: 2
  state: AVAILABLE
  status {
  }
}

myExcellentModel3:
model_version_status {
  version: 2
  state: AVAILABLE
  status {
  }
}
model_version_status {
  version: 1
  state: AVAILABLE
  status {
  }
}



### d. Make predict request via version label:

In [253]:
""" import necessary libraries """
from tensorflow_serving.apis import predict_pb2

""" create stub (client) via hosted port """ 
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

""" create and config request """
request = predict_pb2.PredictRequest()
request.model_spec.name = 'myExcellentModel3'
request.model_spec.version_label = 'stable' #assing version label
request.model_spec.signature_name = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY #serving_default
request.inputs['input_x'].CopyFrom(tf.make_tensor_proto(x_batch[0:3].astype(dtype=np.float32)))

""" send request and get response """
response = stub.Predict(request, 2) # Synchronous request
# response = stub.Predict.future(request, 2) #Asynchronous request

""" get the output scores """
result = response.outputs['softmaxOut'].float_val
print([np.argmax(result[i*10:(i+1)*10]) for i in range(len(x_batch[0:3]))])

[8, 2, 0]
