# Tips and Tactics for Production

This notebook is more or less a collection of tricks and ideas that can help you build deep learning software for production systems. It is not an exhaustive collection of such tips by any stretch. 

## Sections:

* Saving models.
* Training with checkpoints.
* Persisting metrics for dashboards and reports.
* Collecting incoming data, both as potential future training data and to reproduce failures.

### Saving Models

We don't want to retrain a neural network every time we spin up a new server. Instead, we want to load a pretrained model from a file (which could live in Amazon S3, another cloud storage service, or as a blob in a database). The following code would be written in standard Python files, versioned with `git` or some other version control system, and deployed to a powerful machine with a good GPU, or to a GPU cluster.

In [1]:
## Simple neural network example.
## So far this should all look very familiar.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

num_classes = 10
image_size = 784  # each 28 x 28 MNIST image is flattened into a 784-element vector

(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_data = training_images.reshape(training_images.shape[0], image_size) 
test_data = test_images.reshape(test_images.shape[0], image_size)

training_labels = to_categorical(training_labels, num_classes)
test_labels = to_categorical(test_labels, num_classes)

model = Sequential()
model.add(Dense(units=256, activation='relu', input_shape=(image_size,)))
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=num_classes, activation='softmax'))

model.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(training_data, training_labels, batch_size=128, epochs=2, verbose=True, validation_split=.1)

model.save('save_files/trained-model-filename-keras-only.h5')

# If you use Keras through TensorFlow, you can also save in TensorFlow's more generic
# SavedModel format. A SavedModel can be loaded by other TensorFlow runtimes (for example
# TensorFlow Serving, or the C and Java APIs), which gives you some nice flexibility
# if you're already invested in a language other than Python.
# See: https://www.tensorflow.org/guide/keras/save_and_serialize
from tensorflow import keras as tf_keras

model_2 = tf_keras.models.Sequential()
model_2.add(tf_keras.layers.Dense(units=256, activation='relu', input_shape=(image_size,)))
model_2.add(tf_keras.layers.Dense(units=128, activation='relu'))
model_2.add(tf_keras.layers.Dense(units=64, activation='relu'))
model_2.add(tf_keras.layers.Dense(units=num_classes, activation='softmax'))

model_2.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
model_2.fit(training_data, training_labels, batch_size=128, epochs=2, verbose=True, validation_split=.1)

model_2.save('save_files/trained-model-tensorflow-generic', save_format='tf')

Train on 54000 samples, validate on 6000 samples
Epoch 1/2
Epoch 2/2
Train on 54000 samples, validate on 6000 samples
Epoch 1/2
Epoch 2/2
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: save_files/trained-model-tensorflow-generic/assets
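The opening paragraph mentions that the saved file might live as a blob in a database. Below is a minimal sketch of that idea using Python's built-in `sqlite3`, with a hypothetical `models` table keyed by model name and version. In production this would more likely be PostgreSQL or an object store, but the pattern is the same: store the bytes of the saved file, then write the newest version back to disk before calling `load_model`.

```python
import sqlite3


def init_model_store(conn):
    """Create a hypothetical table that holds serialized model files."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS models (
               name TEXT NOT NULL,
               version INTEGER NOT NULL,
               data BLOB NOT NULL,
               PRIMARY KEY (name, version)
           )"""
    )


def store_model_file(conn, name, version, path):
    """Read a saved single-file model (e.g. an .h5 file) and store its raw bytes."""
    with open(path, "rb") as f:
        data = f.read()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO models (name, version, data) VALUES (?, ?, ?)",
            (name, version, sqlite3.Binary(data)),
        )


def fetch_latest_model_file(conn, name, dest_path):
    """Write the newest stored version of `name` to dest_path and return its version."""
    row = conn.execute(
        "SELECT version, data FROM models WHERE name = ? ORDER BY version DESC LIMIT 1",
        (name,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no stored model named {name!r}")
    version, data = row
    with open(dest_path, "wb") as f:
        f.write(data)
    return version
```

For example, the training machine could call `store_model_file(conn, 'mnist-dense', 1, 'save_files/trained-model-filename-keras-only.h5')` and the server `fetch_latest_model_file(conn, 'mnist-dense', 'model.h5')`. This only covers single-file formats such as `.h5`; a SavedModel is a directory and would need to be archived first.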


In [2]:
from pprint import pprint
import pickle
import json

# You can also save JUST the configuration / network architecture.
# This will not persist the results of training: all your learned
# parameters will be randomly reinitialized when you rebuild the model.
# The config is a plain old Python dictionary, so it can be saved with
# something like pickle or serialized to JSON.
config = model.get_config()
pprint(config)
with open('save_files/pickled-config.p', 'wb') as pickle_file:
    pickle.dump(config, pickle_file)

# You can use the json module to write the config dict to a JSON file.
# (Tuples such as batch_input_shape come back as lists when you reload it.)
with open('save_files/json-config.json', 'w') as json_file:
    json.dump(config, json_file)

# You can also get the config AS JSON-FORMATTED TEXT from Keras.
json_string_config = model.to_json()
print(json_string_config)

# And write it directly to a file.
with open('save_files/json-from-keras-config.json', 'w') as json_file:
    json_file.write(json_string_config)

{'layers': [{'class_name': 'Dense',
             'config': {'activation': 'relu',
                        'activity_regularizer': None,
                        'batch_input_shape': (None, 784),
                        'bias_constraint': None,
                        'bias_initializer': {'class_name': 'Zeros',
                                             'config': {}},
                        'bias_regularizer': None,
                        'dtype': 'float32',
                        'kernel_constraint': None,
                        'kernel_initializer': {'class_name': 'GlorotUniform',
                                               'config': {'seed': None}},
                        'kernel_regularizer': None,
                        'name': 'dense',
                        'trainable': True,
                        'units': 256,
                        'use_bias': True}},
            {'class_name': 'Dense',
             'config': {'activation': 'relu',
                        'activit

### Loading Models

The result of your training on the GPU is a file. Part of your service deployment is now fetching the latest version of that file and putting it in the right place. Part of your server or application code now has to load the saved model into its memory and run it.

This **does require** a significant degree of integration: your server code now has to be in Python and must depend on Keras (unless you use the TensorFlow SavedModel format with a TensorFlow runtime for your language). In some cases this is not a problem; in others it might require standing up a standalone API server in Python and having your (say) Ruby on Rails webserver make web requests to the Python server, which runs the model and returns the predictions.
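A standalone API server can be quite small. The sketch below uses only Python's standard library (`http.server`) so it stays self-contained; `make_handler`, `serve`, the `/predict` route and the `{"instances": [...]}` request shape are all hypothetical choices, and a real deployment would more likely use a production web framework behind a proper application server. `predict_fn` is a stand-in for something like `lambda xs: trained_loaded_model.predict(np.array(xs)).tolist()`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(predict_fn):
    """Build a request handler class that exposes predict_fn at POST /predict.

    predict_fn (hypothetical) takes a list of feature vectors and returns a
    JSON-serializable list with one result per vector.
    """

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length))
                instances = payload["instances"]
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "expected a JSON body with an 'instances' list")
                return
            body = json.dumps({"predictions": predict_fn(instances)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence default stderr logging; hook real request logging in here

    return PredictHandler


def serve(predict_fn, host="127.0.0.1", port=8000):
    """Run a blocking, single-threaded prediction server (a sketch, not production-grade)."""
    HTTPServer((host, port), make_handler(predict_fn)).serve_forever()
```

Your Rails app (or anything else that speaks HTTP) would then POST JSON containing a list of 784-element pixel vectors under `"instances"` and read back `{"predictions": [...]}`. `HTTPServer` handles one request at a time, which is one reason to swap in a real server before going to production.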

In [3]:
# Loading models from save files is pretty easy. 
from tensorflow.keras.models import load_model
import numpy as np

trained_loaded_model = load_model('save_files/trained-model-filename-keras-only.h5')  # same path (and .h5 extension) passed to model.save()
tf_trained_loaded_model = tf_keras.models.load_model('save_files/trained-model-tensorflow-generic')

# Loss, Accuracy
a = trained_loaded_model.evaluate(test_data, test_labels, verbose=False)
print(a)

b = tf_trained_loaded_model.evaluate(test_data, test_labels, verbose=False)
print(b)

[0.29273637280445547, 0.9363]
[0.23916887150686233, 0.9377]
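The deployment step described earlier, fetching the newest model file and putting it in the right place, has one production pitfall: a server that loads the file while it is still being written will read a partial file. Below is a minimal sketch of a common fix, assuming a single-file format such as `.h5` and a POSIX filesystem: copy into a temporary file in the same directory, then swap it in with `os.replace`, which the Python docs describe as atomic on POSIX when source and destination are on the same filesystem. `install_model_file` is a hypothetical helper name; a SavedModel directory would need a different scheme (for example versioned directories).

```python
import os
import shutil
import tempfile


def install_model_file(downloaded_path, live_path):
    """Copy a freshly fetched model file into place without exposing a partial file.

    The bytes go to a temporary file next to live_path first; os.replace() then
    swaps it in, so a process loading live_path sees either the old file or the
    new one, never a half-written file.
    """
    live_dir = os.path.dirname(os.path.abspath(live_path))
    fd, tmp_path = tempfile.mkstemp(dir=live_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp, open(downloaded_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, live_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # don't leave temp files behind on failure
        raise
```

After something like `install_model_file('downloads/model.h5', 'save_files/trained-model-filename-keras-only.h5')` (the download path is hypothetical), the next `load_model` call picks up the new file.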


In [4]:
# Loading a fresh (untrained) version of the model from configuration files
# is straightforward:
with open('save_files/pickled-config.p', 'rb') as pickle_file:
    config_from_pickle = pickle.load(pickle_file)
with open('save_files/json-config.json', 'r') as json_file:
    config_from_json = json.load(json_file)
with open('save_files/json-from-keras-config.json', 'r') as json_file:
    alternate_json_config = json.load(json_file)

untrained_model_from_pickle = Sequential.from_config(config_from_pickle)
untrained_model_from_json = Sequential.from_config(config_from_json)
# NOTE THIS SUBTLE DIFFERENCE! model.to_json() wraps the config in an outer dict
# ({"class_name": "Sequential", "config": {...}, ...}), so we pass the inner 'config'.
# (tensorflow.keras.models.model_from_json(json_string_config) does this unwrapping for you.)
alt_untrained_from_json = Sequential.from_config(alternate_json_config['config'])

# Note: if you used Model rather than Sequential to create the model,
# you should use this instead: Model.from_config(config)

# Let's check that these have the same config.
# Note that even the names of the layers are the same.
untrained_model_from_pickle.summary()
untrained_model_from_json.summary()
alt_untrained_from_json.summary()

# But note, all of these are still untrained and uncompiled!
untrained_model_from_json.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
results = untrained_model_from_json.evaluate(test_data, test_labels, verbose=False)
print("\n Bad Results (loss, accuracy): ", results)

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dense_1 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 10)                650       
Total params: 242,762
Trainable params: 242,762
Non-trainable params: 0
_________________________________________________________________
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 256)               200960    
______________________________

### Weights Only

There are also ways to save and restore a model's weights separately from its architecture. For example, you could save the config file exactly once and then save the weights for different versions of the same model over time (e.g. as training goes on), or for versions trained on different subsets of the training data. This avoids duplicating the architecture in every file, at the cost of extra bookkeeping to keep each set of weights matched with its config. See the examples here: https://www.tensorflow.org/guide/keras/save_and_serialize

For this to be useful you'll need either code that can recreate a matching architecture, or a config file.
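One way to keep weights and configs matched is to tag each weights file with a fingerprint of the config it belongs to. Below is a minimal sketch, assuming a hypothetical naming scheme `<model_name>-<fingerprint>-v<version>`; the helper names are illustrative and not part of Keras. Layer names are part of the config, so two identically shaped models that Keras auto-named differently (`dense` vs. `dense_4`) get different fingerprints, which errs on the side of caution.

```python
import hashlib
import json
import os


def config_fingerprint(config):
    """Return a stable SHA-256 hex digest of a model config dict.

    sort_keys=True makes the digest independent of key order. json.dumps writes
    tuples and lists identically, so a config reloaded from pickle (tuples) and
    one reloaded from JSON (lists) get the same fingerprint.
    """
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def weights_filename(model_name, config, version):
    """Hypothetical naming scheme: <model_name>-<12-char fingerprint>-v<version>."""
    return f"{model_name}-{config_fingerprint(config)[:12]}-v{version}"


def check_weights_match(config, weights_path):
    """Raise ValueError unless weights_path was named for this exact config."""
    parts = os.path.basename(weights_path).rsplit("-", 2)
    if len(parts) != 3 or parts[1] != config_fingerprint(config)[:12]:
        raise ValueError(f"{weights_path} was not saved for this architecture")
```

With the notebook's objects this might look like `model.save_weights('save_files/' + weights_filename('mnist-dense', model.get_config(), 1))`, followed by a call to `check_weights_match` on the restored config before `load_weights`.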

In [5]:
# Fetching weights within python
weights = trained_loaded_model.get_weights()  # Retrieves the state of the model.
untrained_model_from_pickle.set_weights(weights)  # Sets the state of the model.
untrained_model_from_pickle.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
print(untrained_model_from_pickle.evaluate(test_data, test_labels))

# persisting/restoring weights
trained_loaded_model.save_weights('save_files/trained_weights')
untrained_model_from_json.load_weights('save_files/trained_weights')
untrained_model_from_json.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
print(untrained_model_from_json.evaluate(test_data, test_labels))

[0.29273637280445547, 0.9363]
[0.29273637280445547, 0.9363]


### Where Should We Load The Model?

With the ability to save and restore models (even restoring them in other languages using the TensorFlow format!), the question is... where **SHOULD** the model live? While it's not completely exhaustive, one of these three options will cover most use cases:

* In the existing webserver that supports the application.
* In a standalone webserver that exposes an API for using the model.
* In the application itself. 

All of these options have strengths and weaknesses. Here are some advantages to keeping the model on one of your servers (standalone or an existing server):

* If the model goes into your existing webserver, you don't have to configure and deploy a new set of servers.
* You might consider the trained model to be an important piece of intellectual property, so you wouldn't want to ship that critical secret inside every customer's copy of your application.
* Your model might be computationally expensive to run, and your app might be running on phones or other not-so-powerful computers.
* It's much easier for you to collect metrics about how the model is being used, whether it's running fast enough, what data is being sent to it, and so on.
    * But remember to get your customers' consent before collecting that data!

Some advantages to shipping the model in the application:

* You don't have to configure and deploy a new set of servers.
* Your customers' hardware executes the model, saving you the cost of computation.
* Your customers' data can stay on their devices, so you can use this strategy to protect consumer privacy.
    * But if you do in fact respect their privacy, you'll likely lose a lot of data: potential future training data as well as performance metrics.
* The application can still work without network connectivity. 

There are some reasons to consider a standalone API even when the application code is also in Python, for example:

* You might not want to pay for beefy webservers when most of the time they won't be using their full power just serving standard HTTPS requests. 
* It will probably be easier to administer the standalone web service than integrate the model into your existing web app.
* You can more easily deploy new versions of the model independently from new versions of the app.

## Create Checkpoints While Training

Your code or computer could crash for any number of reasons at any time. If you've been training for 10 hours and the server running that training goes down before you've persisted the results of your training to disk, you're going to be very sad. Instead of training with a single `.fit` call and `epochs=999999`, consider writing some custom code that wraps calls to `fit` to ensure that you're periodically saving the model.

In [7]:
# This is the simplest version of the idea. Again, you may wish to save the configuration
# once, and persist only the weights during the training process instead of saving the whole
# model every checkpoint.

EPOCHS_PER_CHECKPOINT = 5
CHECKPOINTS = 10

for i in range(CHECKPOINTS):
    model.fit(training_data, training_labels, batch_size=128, epochs=EPOCHS_PER_CHECKPOINT, verbose=False, validation_split=.1)
    cp_name = f'save_files/descriptive-model-name-e{(i+1)*EPOCHS_PER_CHECKPOINT}.h5'
    print("Persisting checkpoint: ", cp_name)
    model.save(cp_name)

Persisting checkpoint:  save_files/descriptive-model-name-e5.h5
Persisting checkpoint:  save_files/descriptive-model-name-e10.h5
Persisting checkpoint:  save_files/descriptive-model-name-e15.h5
Persisting checkpoint:  save_files/descriptive-model-name-e20.h5
Persisting checkpoint:  save_files/descriptive-model-name-e25.h5
Persisting checkpoint:  save_files/descriptive-model-name-e30.h5
Persisting checkpoint:  save_files/descriptive-model-name-e35.h5
Persisting checkpoint:  save_files/descriptive-model-name-e40.h5
Persisting checkpoint:  save_files/descriptive-model-name-e45.h5
Persisting checkpoint:  save_files/descriptive-model-name-e50.h5
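If training does crash, these checkpoints only help if you can find the newest one and resume from it. Below is a minimal sketch that scans a directory for the naming scheme used in the loop above and picks the highest epoch number. Comparing the parsed integers rather than the filenames avoids `e9` sorting after `e10`. The regular expression and function name are assumptions tied to this notebook's filenames.

```python
import os
import re

# Matches the checkpoint names written by the loop above, e.g. "descriptive-model-name-e15.h5".
CHECKPOINT_PATTERN = re.compile(r"^descriptive-model-name-e(\d+)\.h5$")


def latest_checkpoint(directory):
    """Return (path, epoch) of the newest checkpoint in directory, or (None, 0) if none exist."""
    best_path, best_epoch = None, 0
    for entry in os.listdir(directory):
        match = CHECKPOINT_PATTERN.match(entry)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = os.path.join(directory, entry), int(match.group(1))
    return best_path, best_epoch
```

You could then reload with `load_model(path)` and continue with `model.fit(..., epochs=TOTAL_EPOCHS, initial_epoch=epoch)`, where `TOTAL_EPOCHS` is a hypothetical target and `initial_epoch` tells Keras which epoch to resume counting from. A full-model `.h5` save includes the optimizer state by default, so training picks up where it left off.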


### The Checkpoint Callback

Keras also has a helpful callback class that can automatically persist the model during training based on the results. For example, this callback makes it easy to checkpoint the model every time validation accuracy improves, instead of after a fixed number of epochs. It can also be configured to save only the weights; see the [ModelCheckpoint Documentation](https://keras.io/callbacks/#modelcheckpoint).

In [8]:
from tensorflow.keras.callbacks import ModelCheckpoint

filename_format = 'save_files/model-checkpoint.{epoch:02d}-{val_loss:.2f}.h5'

model_checkpointer = ModelCheckpoint(
    filename_format,
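    monitor='val_accuracy',
    verbose=1,
    save_best_only=True,     # If True, only save when val_accuracy improves. The filename includes {epoch},
                             # so each improvement writes a new file; a fixed filename would be overwritten instead.
    save_weights_only=False, # If True the saved files will be the weights only, not the whole model.
    mode='auto',             # Infer from the monitored metric's name whether higher or lower is better.
    save_freq='epoch'        # Check at the end of every epoch. (Older code used the now-deprecated period=n.)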
)

fresh_model = Sequential()
fresh_model.add(Dense(units=256, activation='relu', input_shape=(image_size,)))
fresh_model.add(Dense(units=128, activation='relu'))
fresh_model.add(Dense(units=64, activation='relu'))
fresh_model.add(Dense(units=num_classes, activation='softmax'))

fresh_model.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
fresh_model.fit(
    training_data, 
    training_labels, 
    batch_size=128, 
    epochs=30, 
    verbose=False, 
    validation_split=.1,
    callbacks=[model_checkpointer]
)


Epoch 00001: val_accuracy improved from -inf to 0.93017, saving model to save_files/model-checkpoint.01-0.36.h5

Epoch 00002: val_accuracy improved from 0.93017 to 0.94933, saving model to save_files/model-checkpoint.02-0.22.h5

Epoch 00003: val_accuracy improved from 0.94933 to 0.95683, saving model to save_files/model-checkpoint.03-0.19.h5

Epoch 00004: val_accuracy improved from 0.95683 to 0.95900, saving model to save_files/model-checkpoint.04-0.16.h5

Epoch 00005: val_accuracy improved from 0.95900 to 0.96450, saving model to save_files/model-checkpoint.05-0.15.h5

Epoch 00006: val_accuracy improved from 0.96450 to 0.96633, saving model to save_files/model-checkpoint.06-0.15.h5

Epoch 00007: val_accuracy did not improve from 0.96633

Epoch 00008: val_accuracy improved from 0.96633 to 0.96750, saving model to save_files/model-checkpoint.08-0.15.h5

Epoch 00009: val_accuracy did not improve from 0.96750

Epoch 00010: val_accuracy improved from 0.96750 to 0.96850, saving model to sa

<tensorflow.python.keras.callbacks.History at 0x13f97da10>

## Persisting Metrics For Dashboards

You should save important success metrics both during the training process AND when your model is being used to make predictions in the wild. In addition to the metrics that Keras provides, you might consider adding some timing or performance-tracking code and persisting that information as well. Saving this data into a standard database (e.g. some kind of SQL database) makes it much easier to share the information with other teams, make it available to other programs, and compare model performance over time.

Consider establishing a set of metrics you'll collect early on and add code that persists that data into a database. 
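As a starting point, here is a minimal sketch that writes a Keras-style history dict (like the `history.history` shown later in this notebook, mapping each metric name to a list of per-epoch values) into SQLite using only the standard library. The table name and columns are hypothetical. Storing one row per (run, epoch, metric) means adding a new metric later doesn't require a schema change, and any dashboard tool that speaks SQL can query it.

```python
import sqlite3
from datetime import datetime, timezone


def init_metrics_db(conn):
    """Create a hypothetical long-format metrics table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS training_metrics (
               run_id TEXT NOT NULL,
               epoch INTEGER NOT NULL,
               metric TEXT NOT NULL,
               value REAL NOT NULL,
               recorded_at TEXT NOT NULL
           )"""
    )


def persist_history(conn, run_id, history_dict):
    """Store {metric_name: [value per epoch]} as one row per value; return the row count."""
    now = datetime.now(timezone.utc).isoformat()
    rows = [
        (run_id, epoch, metric, float(value), now)  # float() also converts numpy scalars
        for metric, values in history_dict.items()
        for epoch, value in enumerate(values, start=1)
    ]
    with conn:  # commit all rows together
        conn.executemany("INSERT INTO training_metrics VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)
```

A dashboard query such as `SELECT epoch, value FROM training_metrics WHERE run_id = ? AND metric = 'val_accuracy'` then gives you a learning curve for any run.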

Also consider creating custom metrics if they make sense for your use case. Examples: 

* Intersection over Union (IoU) for segmentation and object localization.
* Precision, Recall, F1 score.
* ...
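IoU is the area where a prediction and the ground truth overlap, divided by the total area they cover together. Below is a minimal plain-Python sketch for axis-aligned bounding boxes, assuming `(x_min, y_min, x_max, y_max)` coordinates. For segmentation masks the same ratio is computed over pixel counts, and `tf.keras.metrics.MeanIoU` provides a per-class version for integer label maps.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be "inverted" if the boxes don't overlap).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp negative widths/heights to zero so disjoint boxes have zero intersection.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

For example, two 2 x 2 boxes offset by one unit in each direction overlap in a 1 x 1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.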

In [9]:
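from tensorflow.keras import backend as K

# Custom metrics receive one batch of true labels and predictions at a time. Keras averages
# the per-batch values over the epoch, so these only approximate dataset-level scores.
# K.round thresholds each predicted probability at 0.5, and counts are pooled across all
# classes. (tf.keras.metrics.Precision and tf.keras.metrics.Recall are stateful built-ins
# that accumulate counts across batches instead.)

def recall_m(y_true, y_pred):
    # Fraction of actual positives that were predicted positive.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    # Fraction of predicted positives that were actually positive.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_m(y_true, y_pred):
    # Harmonic mean of precision and recall; K.epsilon() guards against division by zero.
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))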

fresh_model.compile(
    optimizer="adam", 
    loss='categorical_crossentropy', 
    metrics=['accuracy', recall_m, precision_m, f1_m]
)

history = fresh_model.fit(
    training_data, 
    training_labels, 
    batch_size=128, 
    epochs=5, 
    verbose=True, 
    validation_split=.1,
)

Train on 54000 samples, validate on 6000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [10]:
# The history object will have entries for every metric you provided in the compile function.
pprint(history.history)

{'accuracy': [0.99216664, 0.99314815, 0.99407405, 0.9936482, 0.9948518],
 'f1_m': [0.99218994, 0.9932336, 0.9940877, 0.99363106, 0.9949228],
 'loss': [0.02911178703129257,
          0.02431498431325545,
          0.02007762794816625,
          0.028210866998346452,
          0.018093164894014115],
 'precision_m': [0.9926425, 0.99369496, 0.9944767, 0.99413884, 0.995368],
 'recall_m': [0.9917432, 0.9927772, 0.9937029, 0.993129, 0.9944831],
 'val_accuracy': [0.97783333, 0.9766667, 0.9775, 0.97683334, 0.9773333],
 'val_f1_m': [0.97808516, 0.97668386, 0.9780868, 0.97685844, 0.9773288],
 'val_loss': [0.12683462197775952,
              0.12538980335365826,
              0.1530539061570307,
              0.14601827831566333,
              0.1528835011962801],
 'val_precision_m': [0.978834, 0.97774476, 0.97914827, 0.97891545, 0.9789691],
 'val_recall_m': [0.9773461, 0.9756364, 0.9770374, 0.974829, 0.9757076]}
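These values come from the batch-wise metrics, which average per-batch scores and pool counts across all ten classes. For a report or dashboard you may want exact, dataset-level numbers instead. Below is a minimal plain-Python sketch computing macro-averaged precision, recall and F1 (one score per class, then averaged). That is a different averaging from the metrics above, so the numbers are not directly comparable. scikit-learn's `precision_recall_fscore_support(..., average='macro')` does the same job if you already depend on it.

```python
def macro_precision_recall_f1(y_true, y_pred, num_classes):
    """Dataset-level, macro-averaged precision, recall and F1 from integer class labels."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        # Per-class confusion counts, treating class c as "positive".
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = float(num_classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

With the notebook's objects this might be called as `macro_precision_recall_f1(test_labels.argmax(axis=1), fresh_model.predict(test_data).argmax(axis=1), num_classes)`, and the three numbers persisted alongside the history.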


In [11]:
# Similar to making checkpoints, you can train in a loop and persist these metrics periodically.
# You could also implement a custom callback to stream these data to a CSV or other log file
# at the completion of every epoch, or every batch. See: https://keras.io/callbacks/#example-recording-loss-history
# (For per-epoch CSV logs, Keras also ships tensorflow.keras.callbacks.CSVLogger.)
from tensorflow.keras.callbacks import Callback

class LossHistory(Callback):
    def on_train_begin(self, logs=None):
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get('loss'))
        # Could also add code to write the loss to a file or DB here.

    # Also available, for example:
    # def on_epoch_end(self, epoch, logs=None):
    #     pass
        
model = Sequential()
model.add(Dense(10, input_dim=784, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

history = LossHistory()
model.fit(training_data, training_labels, batch_size=128, epochs=20, verbose=0, callbacks=[history])

print(history.losses)

[143.9068, 134.3935, 109.62331, 81.13052, 73.33801, 67.08039, 68.3964, 59.369225, 56.911037, 45.053886, 47.75571, 38.94652, 43.34039, 42.606865, 32.626797, 31.139101, 25.408745, 39.87616, 29.443756, 33.258076, 30.139965, 24.05362, 27.596989, 20.259838, 20.786697, 22.721388, 37.352623, 22.442198, 21.050108, 25.81966, 18.68213, 26.213871, 21.418213, 16.088514, 17.188904, 25.06511, 18.961231, 15.641539, 21.627922, 21.592358, 14.574046, 16.620975, 17.258175, 20.750778, 23.002686, 11.33976, 22.314734, 17.733006, 19.123583, 10.391592, 21.413288, 21.949339, 16.263052, 15.397123, 10.776805, 17.12458, 13.154018, 12.101742, 12.477562, 16.201181, 16.814018, 11.686541, 8.361408, 10.635086, 15.326824, 13.652922, 15.419547, 13.634888, 21.099415, 14.477655, 12.618079, 14.613989, 15.21596, 10.284824, 21.112732, 20.645874, 9.030853, 14.310209, 12.579844, 8.967873, 9.72604, 12.964922, 10.813301, 11.74852, 12.108445, 8.041573, 11.749802, 10.626911, 10.75108, 12.007942, 7.0112176, 10.59489, 10.849512, 12.

### Saving Input Data For Later

As your system gets used in the wild, you usually won't have access to the true labels. That is, you won't know whether your model is correct or incorrect for any given prediction, which makes it hard to tell how well your model is performing. Additionally, if you're in a field with an adversarial "arms race" such as spam or fraud detection, your model might start failing more often as your adversaries adapt to your AI. An important aspect of combating these problems is to save some (or all) of the data that is sent to your model.

Similar to the advice about persisting metrics, consider establishing a format for saving new inputs to your model along with your model's predictions. You can then monitor these for signs that something is going wrong, for example:

* Humans can periodically look through samples for obvious signs of fraud or adversarial input.
* Automatically monitor for low confidence predictions.
* Collect metadata such as IP address and monitor that metadata for patterns (did a single user just send 20,000 requests? They're probably trying to abuse your system somehow.)
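The monitoring ideas above can be sketched in a few lines of standard-library Python. Everything here is hypothetical: the record schema (one JSON object per prediction, e.g. appended to a JSON Lines file or database), the 0.6 confidence threshold, and the 20,000-requests-per-hour limit are placeholders you would tune for your own system.

```python
import json
import time
from collections import Counter, deque


def prediction_record(features, probabilities, model_version, client_ip):
    """Build one JSON-serializable log entry for a single prediction (hypothetical schema)."""
    probs = [float(p) for p in probabilities]
    predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return {
        "timestamp": time.time(),
        "model_version": model_version,
        "client_ip": client_ip,
        "features": [float(x) for x in features],
        "probabilities": probs,
        "predicted_class": predicted,
        "confidence": probs[predicted],
    }


def is_low_confidence(record, threshold=0.6):
    """Flag predictions whose top-class probability falls below an (assumed) threshold."""
    return record["confidence"] < threshold


class RequestRateMonitor:
    """Count requests per client IP within a sliding time window."""

    def __init__(self, window_seconds=3600, max_requests=20000):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.events = deque()  # (timestamp, ip) pairs, oldest first
        self.counts = Counter()

    def record(self, ip, now=None):
        """Register one request; return True if this IP is now over the limit."""
        now = time.time() if now is None else now
        self.events.append((now, ip))
        self.counts[ip] += 1
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:  # drop requests outside the window
            _, old_ip = self.events.popleft()
            self.counts[old_ip] -= 1
            if self.counts[old_ip] == 0:
                del self.counts[old_ip]
        return self.counts[ip] > self.max_requests
```

Each record can be written with `json.dumps(record)` as one line of a log file; flagged low-confidence records are good candidates for human review or for labeling as future training data.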

Additionally, if you're in a business where you can sometimes collect the true labels (e.g. Netflix users rating movies, a "Thumbs Up" on a Pandora song, a "Like" on a Twitter post, and so on), then you can collect these as well and compare them with the predictions of different versions of your model to evaluate their performance, and/or use the labels to train your model (either online, or by adding the newly labeled data to a batch of training data).
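Comparing model versions against labels that trickle in later can be as simple as joining the prediction log with the feedback. Below is a minimal sketch, assuming each logged prediction carries a hypothetical `request_id`, `model_version` and `predicted_class`, and that feedback arrives as a mapping from `request_id` to the true class. Keep in mind that users who leave feedback are a self-selected group, so this is a biased estimate of real-world accuracy.

```python
from collections import defaultdict


def accuracy_by_version(records, labels):
    """Compare logged predictions against labels collected later.

    records: iterable of dicts with "request_id", "model_version", "predicted_class".
    labels:  dict mapping request_id -> true class, for requests that received feedback.
    Returns {model_version: (accuracy, number_of_labeled_predictions)}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        true_label = labels.get(record["request_id"])
        if true_label is None:
            continue  # no feedback yet for this prediction
        version = record["model_version"]
        total[version] += 1
        correct[version] += int(record["predicted_class"] == true_label)
    return {version: (correct[version] / total[version], total[version]) for version in total}
```

Reporting the count next to the accuracy matters: a version with 2 labeled predictions and 100% accuracy tells you much less than one with 2,000.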

In [12]:
# Extract from HTTP... Save to DB... Nothing is actually that specific to keras here EXCEPT:

prediction = fresh_model.predict(np.array([training_data[0]]))

# You'll probably want to persist the raw prediction confidence :)
pprint(prediction[0])

array([2.3939674e-12, 1.6986448e-17, 1.2851826e-08, 1.8124734e-05,
       7.4689179e-16, 9.9998176e-01, 1.1526551e-07, 2.0227273e-13,
       4.4060511e-09, 9.1414681e-11], dtype=float32)
