# Create a Servable ML Model with Tensorflow
In this notebook we will create, save, load, and employ models with Tensorflow. We will work through how to structure code to create models that can be saved and used for inference in the cloud or at the edge with applications such as dashboards, games, anomaly detection, and much more.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
from sklearn.model_selection import train_test_split
import os
print("Tensorflow version:", tf.version.VERSION)
print("Numpy version:", np.version.version)

Tensorflow version: 2.1.0
Numpy version: 1.18.1


In [2]:
MODEL_DIR = 'models' # We will save model to this directory
DATA_PATH = os.path.join('data', 'data.csv') 

Tensorflow 2 includes a [SavedModel](https://www.tensorflow.org/guide/saved_model "Tensorflow SavedModel docs") format that can be used for transfer learning and/or inference in applications. For a model to be saveable, it has to be of the type `tf.Module`. Keras models satisfy this criterion and are thus relatively simple to save. Before jumping into a specific example of a Keras model, however, we will address the components of a Tensorflow SavedModel with a general example using pure Tensorflow.

# General Example in Pure Tensorflow

## An Example Model
A Tensorflow SavedModel can be created from a `tf.Module`, so let us create a very simple example model using this object structure. The saved model will include any `tf.Module`s, any methods with the `@tf.function` decorator, and any `tf.Variable`s, but will not include any Python code or functionality.

In [3]:
class LinearScaler(tf.Module):
    '''
    LinearScaler is a very simple linear function that takes in a variable, multiplies it
     by a weight and adds a bias before returning the result.
    '''
    def __init__(self):
        super(LinearScaler, self).__init__()
        self.bias = tf.Variable(1.)
        self.weight = tf.Variable(2.)
    
    # Uncomment to set input signature
    @tf.function#(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):
        '''
        Linearly rescale y = x * weight + bias
        :param x: The variable to be linearly scaled
        :type x: tf.float32
        :output dict: "y" = x * weight + bias
        '''
        return {"y" : x * self.weight + self.bias}

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32), tf.TensorSpec([], tf.float32)])
    def calibrate(self, weight, bias):
        '''
        Set the parameters of the linear function.
        :param weight: Weight of the linear function
        :type weight: tf.float32
        :param bias: Bias of the linear function
        :type bias: tf.float32
        '''
        self.weight.assign(weight)
        self.bias.assign(bias)

Let us make sure that it works by evaluating it. Note that evaluating the model compiles it into a graph, a step that is needed before we save.

In [4]:
model = LinearScaler()
assert model(tf.constant(4.))["y"].numpy() == 9 #  2 * 4 + 1
model.calibrate(weight=5, bias=2)
assert model(tf.constant(4.))["y"].numpy() == 22 # 5 * 4 + 2

Fantastic! Now we have a working model. 

## Save and Load
Let us go right ahead and save our model using the `tf.saved_model.save` method:

In [6]:
no_signatures_path = os.path.join(MODEL_DIR, 'no_signatures')
tf.saved_model.save(model, no_signatures_path) # Save the function to a dir

INFO:tensorflow:Assets written to: models\no_signatures\assets


Loading the model also only takes a single line with the `tf.saved_model.load` method:

In [7]:
loaded_model = tf.saved_model.load(no_signatures_path)

What we have done so far will work perfectly well in many cases but, in some cases, the loaded model might behave differently from our expectations. Let us look at some characteristics of the model as it is now.<br><br>
Firstly, the parameters of the loaded model are the same as the original at the time we saved it:

In [8]:
assert model.weight.numpy() == loaded_model.weight.numpy()
print("Weight in loaded model:", loaded_model.weight.numpy())
print("Bias in loaded model:", loaded_model.bias.numpy())

Weight in loaded model: 5.0
Bias in loaded model: 2.0


Specifically, the weight and bias parameters do not take the values of a newly initialised model but the values that we set just before saving.<br><br>
The loaded model will also evaluate inputs like the original model:

In [9]:
assert model(tf.constant(4.))["y"].numpy() == loaded_model(tf.constant(4.))["y"].numpy()
assert model(tf.constant(22.))["y"].numpy() == loaded_model(tf.constant(22.))["y"].numpy()
print(loaded_model(tf.constant(22.)))

{'y': <tf.Tensor: shape=(), dtype=float32, numpy=112.0>}


However, this is where the similarity ends. Our original model will happily evaluate a different kind of input, for instance a Tensor of several floats:

In [10]:
print(model(tf.constant([1., 2., 3.]))) # Passing a Tensor of floats

{'y': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 7., 12., 17.], dtype=float32)>}


But when providing the same input to our loaded model, we are presented with a `ValueError`:

In [11]:
try:
    print(loaded_model(tf.constant([1., 2., 3.]))) # Passing a Tensor of floats
except ValueError as e:
    print("We got a ValueError:\n", e)

We got a ValueError:
 Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (1 total):
    * Tensor("x:0", shape=(3,), dtype=float32)
  Keyword arguments: {}

Expected these arguments to match one of the following 1 option(s):

Option 1:
  Positional arguments (1 total):
    * TensorSpec(shape=(), dtype=tf.float32, name='x')
  Keyword arguments: {}


It is not that we did anything wrong. It is just that we did not pay much attention to the *input* of our model. When we evaluated the model before saving it we caused the class we defined to be compiled into a `tf.Graph` object. The Graph object must assume a fixed structure of the input, i.e. an input signature. When we pass a new input to our original model, the graph can get recompiled to fit that input, but our saved and loaded model represents a fixed Graph that can only accept the expected input and will throw a ValueError otherwise. In other words, we should specify the input signature that our model should expect. We did specify the input signature implicitly when evaluating the model the first time, but we might want to do it explicitly.
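
To see the tracing behaviour described above in isolation, here is a minimal sketch. The function `linear` and the list `traced_shapes` are made up for illustration; the list is filled by a Python side effect, which only runs while Tensorflow traces the function into a graph.

```python
import tensorflow as tf

traced_shapes = []  # hypothetical log, appended to only while a graph is being traced


@tf.function
def linear(x):
    # Python side effect: runs during tracing, not on later calls that reuse the graph
    traced_shapes.append(x.shape.as_list())
    return x * 2.0 + 1.0


linear(tf.constant(4.0))              # first call: traces a graph for a float32 scalar
linear(tf.constant(7.0))              # same shape and dtype: reuses the existing graph
linear(tf.constant([1.0, 2.0, 3.0]))  # new shape: triggers a second trace

print(traced_shapes)
```

The second scalar reuses the first graph, while the array input forces a new one. A loaded SavedModel cannot retrace like this, which is why it rejects inputs it was never traced for.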

## Save with an Input Signature
There are two ways to specify an input signature for a Tensorflow method. One way is to use the `@tf.function` decorator. The decorator takes `input_signature` as a keyword argument, and we can pass it a list of `tf.TensorSpec`s to template the input. We actually already did this in the example model for the `.calibrate` method to show that we are expecting two scalar tensors as input, but for the `__call__` method, the signature was commented out.<br>
The second way is to explicitly compile the `tf.Graph` and, in the process, pass an input signature. We can do this by invoking the `.get_concrete_function` method on the `__call__` method:

In [12]:
model_with_signature = LinearScaler() # Get a fresh instance of the function
input_signature_array = tf.TensorSpec([None], tf.float32) # Note! Specifies a Tensor array of floats
call = model_with_signature.__call__.get_concrete_function(input_signature_array) # Compile Graph

When we save the model, we pass this compiled function as the signature:

In [13]:
with_signature_path = os.path.join(MODEL_DIR, 'with_signature')
tf.saved_model.save(model_with_signature, with_signature_path, signatures=call) # Save the function

INFO:tensorflow:Assets written to: models\with_signature\assets


Let us load up the model and check whether it can handle the input we specified:

In [14]:
loaded_model_with_signature = tf.saved_model.load(with_signature_path)
print(loaded_model_with_signature(tf.constant([3., 4., 5.])))

{'y': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 7.,  9., 11.], dtype=float32)>}


That is one challenge solved; we now know how to explicitly define the input signature. The model works with the specified input, but any other type of input will cause a `ValueError`:

In [15]:
try:
    print(loaded_model_with_signature(tf.constant(3.)))
except ValueError as e:
    print("We got a ValueError:", e)

We got a ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (1 total):
    * Tensor("x:0", shape=(), dtype=float32)
  Keyword arguments: {}

Expected these arguments to match one of the following 1 option(s):

Option 1:
  Positional arguments (1 total):
    * TensorSpec(shape=(None,), dtype=tf.float32, name='x')
  Keyword arguments: {}
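
The first approach mentioned in this section, setting `input_signature` directly in the decorator, might look like the sketch below. `Scaler` is a hypothetical stand-in for `LinearScaler`, and the model is written to a temporary directory rather than `MODEL_DIR` so that the example stays self-contained.

```python
import tempfile

import tensorflow as tf


class Scaler(tf.Module):
    """Hypothetical stand-in for LinearScaler with the signature set in the decorator."""

    def __init__(self):
        super().__init__()
        self.weight = tf.Variable(2.0)
        self.bias = tf.Variable(1.0)

    # [None] means a one-dimensional float Tensor of any length
    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return {"y": x * self.weight + self.bias}


with tempfile.TemporaryDirectory() as export_dir:
    tf.saved_model.save(Scaler(), export_dir)
    loaded = tf.saved_model.load(export_dir)
    # One signature covers arrays of different lengths
    short = loaded(tf.constant([3.0]))["y"].numpy().tolist()
    longer = loaded(tf.constant([3.0, 4.0, 5.0]))["y"].numpy().tolist()

print(short, longer)
```

With the signature in the decorator, the model does not need to be called before saving, and every instance of the class gets the same signature.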


## Save with Multiple Input Signatures
We might want to save and serve our model with multiple kinds of inputs. For instance, in our example model we would like to serve single inputs as well as lists of inputs. Everything else, like the weights and biases, should be the same, but the input should be flexible. One way to do this might be to save two or more separate models, but that would duplicate the same weights and, as a result, add operational overhead. Fortunately, there is a better way.<br>
A loaded model actually has a dictionary, `.signatures`, mapping the input signatures that can be evaluated in the model. We only specified a single input signature, so there should be only one entry in the dictionary:

In [16]:
print(list(loaded_model_with_signature.signatures.keys()))

['serving_default']


There is indeed only a single entry, the default `"serving_default"` key, which serving tools such as Tensorflow Serving fall back to when no signature name is requested. Note that calling the loaded model directly, as we did above, goes through the restored `__call__` method rather than this dictionary.<br>
We can also look up the key ourselves and get a callable inference object for that signature:

In [17]:
inference_object_array = loaded_model_with_signature.signatures["serving_default"]
print(inference_object_array(tf.constant(6.)))

{'y': <tf.Tensor: shape=(), dtype=float32, numpy=13.0>}


We can use this functionality to create one saved model that serves multiple kinds of inputs. We do this by supplying a dictionary that maps keys to compiled Graphs to the `tf.saved_model.save` method:

In [18]:
model_multiple_signatures = LinearScaler() # Get a fresh instance of the function

# Input signatures
input_signature_single = tf.TensorSpec(None, tf.float32) # Shape None is unconstrained, so a single float Tensor is accepted
input_signature_array = tf.TensorSpec([None], tf.float32) # Specifies a Tensor array of floats

# Compiled Graphs
call_single = model_multiple_signatures.__call__.get_concrete_function(input_signature_single)
call_array = model_multiple_signatures.__call__.get_concrete_function(input_signature_array)

# Input signature dictionary
signatures = {"serving_default": call_single,
              "array_input": call_array}

# Save the model
multiple_signatures_path = os.path.join(MODEL_DIR, "multiple_signatures")
tf.saved_model.save(model_multiple_signatures, multiple_signatures_path, signatures=signatures)

INFO:tensorflow:Assets written to: models\multiple_signatures\assets


We can load up the model and create two different inference objects:

In [19]:
loaded_model_multiple_signatures = tf.saved_model.load(multiple_signatures_path)
print("Signature dictionary:")
print(loaded_model_multiple_signatures.signatures)
inference_object_single = loaded_model_multiple_signatures.signatures["serving_default"]
inference_object_array = loaded_model_multiple_signatures.signatures["array_input"]

Signature dictionary:
_SignatureMap({'serving_default': <tensorflow.python.saved_model.load._WrapperFunction object at 0x0000018D5CE59E48>, 'array_input': <tensorflow.python.saved_model.load._WrapperFunction object at 0x0000018D5CE24FC8>})
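
The printed map only shows object addresses. To find out what a signature actually expects and returns, we can inspect the inference object itself. The sketch below assumes a hypothetical `Doubler` module saved to a temporary directory; `structured_input_signature` and `structured_outputs` are attributes of the concrete function stored under each key.

```python
import tempfile

import tensorflow as tf


class Doubler(tf.Module):
    """Hypothetical module used only to have something to save and inspect."""

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return {"y": x * 2.0}


module = Doubler()
with tempfile.TemporaryDirectory() as export_dir:
    concrete_call = module.__call__.get_concrete_function()
    tf.saved_model.save(module, export_dir, signatures={"serving_default": concrete_call})
    loaded = tf.saved_model.load(export_dir)

    serving_fn = loaded.signatures["serving_default"]
    input_spec = serving_fn.structured_input_signature  # expected arguments
    output_spec = serving_fn.structured_outputs         # names and specs of the outputs
    doubled = serving_fn(x=tf.constant([1.0, 2.0]))["y"].numpy().tolist()

print(input_spec)
print(output_spec)
print(doubled)
```

Passing the input by keyword (`x=`) mirrors the argument name recorded in the signature, which is also the name a client would use when the model is served.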


The inference objects still point to the same model, so if we change the parameters of the model, the change applies to both:

In [20]:
# Evaluate two types of inputs
print(inference_object_array(tf.constant([3., 4., 5.])))
print(inference_object_single(tf.constant(5.)))

# Change parameters of the model
loaded_model_multiple_signatures.calibrate(tf.constant(3.), tf.constant(4.))

# Evaluate the same input again - note the difference in both!
print(inference_object_array(tf.constant([3., 4., 5.])))
print(inference_object_single(tf.constant(5.)))

{'y': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 7.,  9., 11.], dtype=float32)>}
{'y': <tf.Tensor: shape=(), dtype=float32, numpy=11.0>}
{'y': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([13., 16., 19.], dtype=float32)>}
{'y': <tf.Tensor: shape=(), dtype=float32, numpy=19.0>}


Note that this also means that our model is mutable even during inference. In fact, the SavedModel format can be used for inference as well as for retraining or transfer learning.<br>
Now we know almost everything there is to know about the SavedModel format. Let us just take a quick look inside one of the directories created each time we save a model.

## Inside a Saved Model
Let us have a look at what is in one of the directories we created.

In [21]:
os.listdir(multiple_signatures_path)

['assets', 'saved_model.pb', 'variables']

The saved model consists of several elements.<br>
- `saved_model.pb` contains the model architecture that is used to rebuild the function
- The directory `variables` contains one or more data files holding the values of the parameters in the model at the time it was saved. For large models with billions of parameters, these data files can grow large. A `variables.index` file maps the stored parameters to their right spot in the function
- The directory `assets` holds additional artefacts needed to recreate the function, but should be empty in our case
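
Since `os.listdir` only shows the top level, a small recursive listing can help when exploring the layout described above. The helper `list_tree` below is a hypothetical convenience function, and the demonstration runs on a stand-in directory with empty placeholder files; in the notebook one would call `list_tree(multiple_signatures_path)` instead.

```python
import os
import tempfile


def list_tree(root):
    """Return every file below root as a sorted list of paths relative to root."""
    paths = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(directory, name)
            # Normalise separators so the listing looks the same on every platform
            paths.append(os.path.relpath(full_path, root).replace(os.sep, "/"))
    return sorted(paths)


# Stand-in layout with empty placeholder files, not a real SavedModel
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "assets"))
    os.makedirs(os.path.join(root, "variables"))
    for relative in ["saved_model.pb", os.path.join("variables", "variables.index")]:
        open(os.path.join(root, relative), "w").close()
    tree = list_tree(root)

print(tree)
```

Empty directories such as `assets` do not show up, because only files are listed.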

# A Data Science Workflow Example
Now that we are clear on the basics, let us take a look at a more realistic workflow: defining, saving, and serving a model built with Keras.<br>
Our model starts with a bit of data and something to be modelled. I have prepared a small dataset consisting of weather observations from a station that observes temperature, relative humidity, air pressure, and whether it is raining or not. Our task is to build a model that predicts whether it rains or not given the temperature, relative humidity, and air pressure. Our target is not to build an awesome or precise model, but to mimic a real data model lifecycle.
## Example Data
First order of business, let us have a look at the example data.

In [22]:
df = pd.read_csv(DATA_PATH)
feature_cols = ['pressure', 'temperature', 'humidity']
label_col = ['rain']
print("Example observations")
print(df.sample(10))
print()
print("Statistics")
feature_stats = df[feature_cols].describe()
print(feature_stats)

Example observations
      pressure  rain  temperature  humidity
2478    1007.6     0          2.7      36.0
1174     994.3     0          3.1      77.0
1474     984.2     0          5.0      85.0
1655     999.7     0          4.3      95.0
1393     997.7     0         -3.0      74.0
684      984.8     1          1.9      95.0
1449     974.8     1          0.7      95.0
1139     978.1     0          6.8      77.0
1904    1026.8     0          1.6      57.0
1762    1018.6     0          0.0      50.0

Statistics
          pressure  temperature     humidity
count  3151.000000  3151.000000  3151.000000
mean   1005.235989     4.737417    71.054268
std      15.113994     3.998234    20.464912
min     951.100000    -6.400000    22.000000
25%     996.150000     2.200000    55.000000
50%    1005.700000     5.000000    75.000000
75%    1016.500000     6.700000    89.000000
max    1036.400000    20.100000    97.000000


## Transform the Data
The data is of relatively high quality, so all we need to do is bring the features onto a common scale and align their variances. Specifically, we will standardise (z-normalise) the features. If we were just doing analytics, this would be straightforward and we could use all sorts of out-of-the-box functions. But later on we will be serving the model on new data, and any transformation we apply now must also be applied to production data later. Fortunately, we know how to build and save a Tensorflow model, so let us build standardisation as a `tf.Module` with the means and standard deviations of our features as `tf.Variable`s:

In [23]:
class ZNorm(tf.Module):
    '''
    Implements standardisation (Z normalisation) as a Tensorflow model.
    Set the means and standard deviations of the data using the calibrate method
     before calling the model on data. The model will not change the data if the
     means and standard deviations are not set.
    :param num_features: Expected number of features
    :type num_features: int
    '''
    def __init__(self, num_features):
        super(ZNorm, self).__init__()
        self.std_devs = tf.Variable(tf.ones([num_features]), dtype=tf.float32)
        self.means = tf.Variable(tf.zeros([num_features]), dtype=tf.float32)
    
    @tf.function
    def __call__(self, x):
        '''
        Compute the Z norm of input features
        :param x: Tensor of length num_features
        :type x: Tensor of floats
        :output: x' = (x - mean)/std_dev
        '''
        return {"x_prime" : tf.divide(tf.subtract(x, self.means), self.std_devs)}

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32), tf.TensorSpec([None], tf.float32)])
    def calibrate(self, means, standard_deviations):
        '''
        Sets the means and standard deviations of the standardiser.
        :param means: Means of the features in the same order as the features
        :type means: list
        :param standard_deviations: Standard deviations of the features in the same order as the features
        :type standard_deviations: list
        '''
        self.std_devs.assign(standard_deviations)
        self.means.assign(means)

Now we have a Tensorflow model that can take our weather data as input and output standardised features, provided we have set the proper means and standard deviations to do so. So let us create an instance of the model, set the parameters, and save it:

In [24]:
# Build model
num_features = 3
standardiser = ZNorm(num_features) # Get our transformer model
means = feature_stats.loc['mean',:].values
std_devs = feature_stats.loc['std',:].values
standardiser.calibrate(means=means, standard_deviations=std_devs) # Set the parameters of the transformer model
# Specify input signatures
input_signature_array = tf.TensorSpec([num_features], tf.float32) # Specifies a Tensor array of entries
input_signature_single = tf.TensorSpec(num_features, tf.float32) # Specifies a single Tensor entry
call_array = standardiser.__call__.get_concrete_function(input_signature_array) # Compile Graph
call_single = standardiser.__call__.get_concrete_function(input_signature_single) # Compile Graph
signatures = {"serving_default": call_array,
              "single_input": call_single}
# Save the model
standardiser_path = os.path.join(MODEL_DIR, 'standardiser')
tf.saved_model.save(standardiser, standardiser_path, signatures=signatures) # Save the model

INFO:tensorflow:Assets written to: models\standardiser\assets


We could continue to use the model we configured, but for the purpose of demonstration and to make sure that it works, let us load the model and use it.

In [25]:
loaded_standardiser = tf.saved_model.load(standardiser_path)
inference_standardiser = loaded_standardiser.signatures["serving_default"]

We use the transformer model to produce features from our raw data:

In [26]:
def standardise(pandas_row):
    '''
    Convenience function to apply our Tensorflow functions to a Dataframe
    '''
    x = tf.constant(pandas_row.values, dtype=tf.float32)
    xp = inference_standardiser(x)["x_prime"].numpy()
    return pd.Series(xp.tolist())
features = df[feature_cols].apply(standardise, axis=1)

Now to see whether we did everything right. Remember, we expect our features to be centred around 0 with unit standard deviation:

In [27]:
features.describe()

               0              1              2
count     3151.0         3151.0         3151.0
mean       2e-06  -1.540569e-08   2.586615e-08
std          1.0            1.0            1.0
min    -3.581845      -2.785584      -2.396994
25%    -0.601163     -0.6346343     -0.7844777
50%     0.030703      0.0656748      0.1928047
75%     0.745272      0.4908624      0.8769025
max     2.061934       3.842342       1.267815


Everything checks out, so we are ready to build and train a model. 
## Build and Train the Keras Model
Our model will be a simple dense neural network built as a Keras sequential model:

In [28]:
def mymodel(X,y):
    '''
    A simple Keras neural network with training and testing
    :param X: The features (observations X features)
    :type X: NumPy array
    :param y: Labels
    :type y: NumPy array
    :output: Keras model
    '''
    seed = 42
    
    tf.random.set_seed(seed=seed)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=seed, shuffle=True)
    
    # Create Keras model
    model = keras.Sequential(name="mymodel", layers=[
        keras.layers.InputLayer(input_shape=(X.shape[-1],), name="input"),  # Trailing comma makes this a tuple
        keras.layers.Dense(6, activation="sigmoid", name="dense"),
        keras.layers.Dense(1, activation="sigmoid", name="y")
    ])

    # Compile model with optimizer
    model.compile(optimizer=keras.optimizers.Adam(0.05),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Train model
    model.fit(x=[X_train], y=[y_train], batch_size=50, epochs=5)

    # Test model
    test_loss, test_acc = model.evaluate(x=[X_test], y=[y_test])
    print("Model test accuracy: ")
    print(test_acc)
    print("Baseline accuracy by random guessing:")
    print((np.random.randint(0,2,y_test.shape) == y_test).sum()/y_test.shape[0])
    print("Baseline accuracy by guessing all zeros:")
    print(1 - np.mean(y_test))
    
    return model

It is finally time to train the model:

In [29]:
X = features.values.astype('float32')
y = df[label_col].values
keras_model = mymodel(X, y) # Train and evaluate a model

Train on 2111 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model test accuracy: 
0.89903843
Baseline accuracy by random guessing:
0.49423076923076925
Baseline accuracy by guessing all zeros:
0.8692307692307693


We are not beating the baseline by much, but let us assume that this is the model we wish to deploy.
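
The "all zeros" baseline printed above is simply the share of the most common class. As a small sketch of that arithmetic, the hypothetical helper `majority_baseline` below computes it for binary labels; the labels in the example are made up and are not taken from the weather dataset.

```python
def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common class (0 or 1)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)


example_labels = [0, 0, 0, 1, 0, 1, 0, 0]  # made-up labels: 6 dry and 2 rainy observations
baseline = majority_baseline(example_labels)
print(baseline)
```

In our run the test accuracy of about 0.899 compares with a baseline of about 0.869, a gain of roughly three percentage points.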

## Save and Load the Keras Model
A Keras model has a `.save` method, but the low-level Tensorflow method we have been working with works just as well.<br>
Our Keras model differs from our pure Tensorflow models in that we actually did specify an input signature. We did so by using `keras.layers.InputLayer`, telling our model to expect inputs of shape num_observations x num_features. So let us go ahead and save the model and have a look at what we get:

In [30]:
keras_model_path = os.path.join(MODEL_DIR, 'keras_model')
tf.saved_model.save(keras_model, keras_model_path)

INFO:tensorflow:Assets written to: models\keras_model\assets


We load it up again:

In [31]:
loaded_keras_model = tf.saved_model.load(keras_model_path)
print("Signature dictionary:")
print(loaded_keras_model.signatures)

Signature dictionary:
_SignatureMap({'serving_default': <tensorflow.python.saved_model.load._WrapperFunction object at 0x0000018D5E8C7EC8>})


The saved model just has the default `serving_default` signature, which in this case specifies the signature we provided to the InputLayer:

In [32]:
inference_model = loaded_keras_model.signatures['serving_default']
print(inference_model(tf.constant([[0.55, -1.2, 1.4]]))['y']) # Example

tf.Tensor([[0.20116203]], shape=(1, 1), dtype=float32)


## Inference
Finally, let us have a look at how inference could look. We might have a piece of weather data:

In [33]:
# Here are different ways we might get our data
#inference_data = [[1005., 21., 78.],[1015., 10., 80.]]
inference_data = [[1005., 21., 78.]]
#inference_data = [1005., 21., 78.] # This will not work

We would convert that to a tensor and pass it through the standardiser to get our standardised features:

In [34]:
inference_data_tensor = tf.constant(inference_data)
inference_feature_tensor = inference_standardiser(inference_data_tensor)['x_prime']  # Already a tensor, no second tf.constant needed
print("Standardised features:", inference_feature_tensor)

Standardised features: tf.Tensor([[-0.01561215  4.067441    0.3393971 ]], shape=(1, 3), dtype=float32)
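
These numbers can be checked by hand with the formula x' = (x - mean) / std_dev. The sketch below uses plain Python and the means and standard deviations copied from the statistics table printed earlier; the variable names are chosen for illustration only.

```python
# Feature order: pressure, temperature, humidity
means = [1005.235989, 4.737417, 71.054268]     # copied from the statistics table
std_devs = [15.113994, 3.998234, 20.464912]    # copied from the statistics table
observation = [1005.0, 21.0, 78.0]             # the inference example

# x' = (x - mean) / std_dev, applied feature by feature
standardised = [(x - m) / s for x, m, s in zip(observation, means, std_devs)]
print([round(value, 4) for value in standardised])
```

The values agree with the standardiser output to within rounding, and the large temperature value shows that 21 degrees lies about four standard deviations above the mean of this dataset.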


We can pass that tensor right on to the model:

In [35]:
raw_prediction = inference_model(inference_feature_tensor)['y'].numpy()
print("Raw prediction:", raw_prediction)

Raw prediction: [[0.1579284]]


Now all that remains is to interpret the prediction and get it to the format we want:

In [36]:
classification_threshold = 0.5
prediction = (raw_prediction >= classification_threshold).astype(int).tolist()
print("Our final prediction:", prediction)

Our final prediction: [[0]]
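
The three inference steps (standardise, predict, threshold) could be wrapped in a single function so that an application only has to hand over raw observations. The sketch below is one possible shape for such a wrapper, not the only one. `predict_rain` is a hypothetical name, and the two callables passed to it are plain-Python stand-ins; in particular `toy_probability` is a made-up logistic score and not the trained Keras model.

```python
import math

MEANS = [1005.235989, 4.737417, 71.054268]    # pressure, temperature, humidity
STD_DEVS = [15.113994, 3.998234, 20.464912]


def predict_rain(rows, standardise, predict_probability, threshold=0.5):
    """Standardise raw rows, score each one, and threshold the scores to 0 or 1."""
    features = [standardise(row) for row in rows]
    probabilities = [predict_probability(row) for row in features]
    return [int(p >= threshold) for p in probabilities]


def standardise_row(row):
    # Stand-in for the saved standardiser model
    return [(x - m) / s for x, m, s in zip(row, MEANS, STD_DEVS)]


def toy_probability(features):
    # Stand-in for the saved Keras model: a made-up score, NOT the trained network
    score = features[2] - features[0] - 1.0
    return 1.0 / (1.0 + math.exp(-score))


predictions = predict_rain(
    [[1005.0, 21.0, 78.0], [975.0, 1.0, 95.0]],
    standardise_row,
    toy_probability,
)
print(predictions)
```

Passing the standardiser and the model in as arguments keeps the wrapper independent of how they were loaded, so the loaded SavedModel signatures could be swapped in for the stand-ins.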


# In Production
## Applications
Greengrass, [Tensorflow serving](https://www.tensorflow.org/tfx/serving/serving_basic)
## Managing Artefacts