# TF-TRT Keras Classification Examples:

In this notebook, we cover a variety of classification base networks from the `tensorflow.keras.applications` module!

This demonstrates TF-TRT working out of the box on a variety of model architectures, which shows how easy TF-TRT is to use. TF-TRT can still optimize the parts of your network that TensorRT supports even if the network contains layers that TensorRT itself does not support; those layers simply keep running in TensorFlow. This makes it easy to get a first pass at an optimized model, as we will demonstrate here.
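To make "optimizing parts of your network" concrete, here is a deliberately simplified sketch of how a graph can be split into TensorRT segments and TensorFlow leftovers. It assumes a plain linear chain of op names and a hypothetical `SUPPORTED_OPS` set; the real TF-TRT segmenter works on arbitrary graphs with its own op-support rules, but it likewise leaves segments smaller than a `minimum_segment_size` threshold in TensorFlow.

```python
from typing import List, Set, Tuple

# Hypothetical set of op types that an imaginary TensorRT backend supports.
SUPPORTED_OPS: Set[str] = {"Conv2D", "BiasAdd", "Relu", "MaxPool", "MatMul", "Softmax"}


def segment_chain(ops: List[str], supported: Set[str],
                  minimum_segment_size: int = 3) -> List[Tuple[str, List[str]]]:
    """Split a linear chain of ops into ("TRT", ops) and ("TF", ops) runs.

    Consecutive supported ops form a candidate TensorRT segment; candidates
    shorter than minimum_segment_size are handed back to TensorFlow.
    """
    segments: List[Tuple[str, List[str]]] = []

    def emit(kind: str, run: List[str]) -> None:
        # Merge with the previous segment when it has the same kind.
        if segments and segments[-1][0] == kind:
            segments[-1][1].extend(run)
        else:
            segments.append((kind, list(run)))

    run: List[str] = []
    for op in ops:
        if op in supported:
            run.append(op)
            continue
        # An unsupported op closes the current candidate run.
        if run:
            emit("TRT" if len(run) >= minimum_segment_size else "TF", run)
            run = []
        emit("TF", [op])
    if run:
        emit("TRT" if len(run) >= minimum_segment_size else "TF", run)
    return segments


graph = ["Conv2D", "BiasAdd", "Relu", "MyCustomOp", "MatMul", "Softmax"]
print(segment_chain(graph, SUPPORTED_OPS))
# [('TRT', ['Conv2D', 'BiasAdd', 'Relu']), ('TF', ['MyCustomOp', 'MatMul', 'Softmax'])]
```

The trailing `MatMul`/`Softmax` pair is supported but too short to be worth a separate engine, so in this toy model it stays in TensorFlow alongside the unsupported `MyCustomOp`.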

Let's make sure our GPUs are properly configured and visible:

In [1]:
!nvidia-smi

Fri Jan 29 22:55:18 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   43C    P0    62W / 300W |    125MiB / 16155MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   42C    P0    38W / 300W |      6MiB / 16158MiB |      0%      Default |
|       

Remember: to successfully deploy a TensorRT model, you have to make __five key decisions__:

1. __What format should I save my model in?__
2. __What batch size(s) am I running inference at?__
3. __What precision am I running inference at?__
4. __What TensorRT tool or integration am I using to convert my model?__
5. __What TensorRT runtime am I targeting?__
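One way to keep these five decisions explicit is to record them together and validate them up front. The sketch below uses a hypothetical `DeploymentPlan` class (not part of TF-TRT or this guide's helper code), filled in with the choices this notebook makes; the precision check reflects the FP32, FP16 and INT8 modes TF-TRT offers.

```python
from dataclasses import dataclass

# Precision modes offered by TF-TRT conversion.
VALID_PRECISIONS = {"FP32", "FP16", "INT8"}


@dataclass(frozen=True)
class DeploymentPlan:
    """Hypothetical record of the five decisions for one TensorRT deployment."""
    save_format: str      # 1. model format, e.g. "SavedModel"
    batch_size: int       # 2. inference batch size
    precision: str        # 3. "FP32", "FP16" or "INT8"
    conversion_path: str  # 4. tool or integration, e.g. "TF-TRT"
    runtime: str          # 5. target runtime, e.g. "TensorFlow/Python"

    def __post_init__(self) -> None:
        # Fail fast on obviously invalid choices.
        if self.batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        if self.precision not in VALID_PRECISIONS:
            raise ValueError(f"unsupported precision: {self.precision!r}")


# The choices made in this notebook.
plan = DeploymentPlan("SavedModel", 32, "FP32", "TF-TRT", "TensorFlow/Python")
print(plan)
```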

Let's get to it!

## 1. What format should I save my model in?

TF-TRT requires the SavedModel format in TensorFlow 2.x:

In [2]:
!mkdir -p tmp_savedmodels

In [3]:
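from tensorflow.keras.applications import ResNet50, VGG16, InceptionV3, Xception, MobileNetV2, DenseNet121, ResNet50V2

print("Downloading and initializing models...")
model_classes = [ResNet50, VGG16, InceptionV3, Xception, MobileNetV2, DenseNet121, ResNet50V2]
# Instantiate each architecture with its ImageNet classification head and pretrained weights
models = [model_class(include_top=True, weights='imagenet') for model_class in model_classes]

model_dirs = []
for idx, model in enumerate(models):
    print("Saving", model.name, "...")
    # One SavedModel directory per model: tmp_savedmodels/0, tmp_savedmodels/1, ...
    model_dir = 'tmp_savedmodels/%s' % idx
    model_dirs.append(model_dir)
    model.save(model_dir)  # no .h5 suffix, so tf.keras writes a SavedModel directory
    print("Finished saving!")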

downloading and initializing models...
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/xception/xception_weights_tf_dim_ordering_tf_kernels.h5
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/densenet/densenet121_weights_tf_dim_ordering_tf_kernels.h5
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50v2_weights_tf_dim_ordering_tf_kernels.h5
saving <tensorflow.python.keras.engine.functional.Functional object at 0x7f5cb859b0f0> ...
Instructions fo
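Before converting, it can help to sanity-check that each directory actually has the SavedModel layout: a serialized graph (`saved_model.pb`, or the text form `saved_model.pbtxt`) next to a `variables/` subdirectory. The helper below, `looks_like_savedmodel`, is a hypothetical stdlib-only heuristic that inspects the file layout without parsing anything; loading the directory with `tf.saved_model.load` remains the authoritative test.

```python
from pathlib import Path


def looks_like_savedmodel(model_dir: str) -> bool:
    """Heuristic check that model_dir has the on-disk layout of a SavedModel.

    Only the presence of the graph file and the variables/ directory is
    checked; the protobuf contents are not validated.
    """
    root = Path(model_dir)
    has_graph = (root / "saved_model.pb").is_file() or (root / "saved_model.pbtxt").is_file()
    return has_graph and (root / "variables").is_dir()
```

If the saves above succeeded, `[looks_like_savedmodel(d) for d in model_dirs]` should be all `True`.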

## 2. What batch size(s) am I running inference at?

We will use a batch size of 32 for all models:

In [4]:
BATCH_SIZE = 32

We create an all-zero "dummy" batch for each model, matching the input resolution that architecture expects, to test our models on:

In [5]:
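import numpy as np

def dummy_input_batch(size):
    # An all-zero NHWC float32 batch: (BATCH_SIZE, height, width, 3 color channels)
    return np.zeros((BATCH_SIZE, size, size, 3), dtype=np.float32)

# Input resolutions in the same order as `models`; InceptionV3 and Xception expect 299x299
input_sizes = [224, 224, 299, 299, 224, 224, 224]
dummy_inputs = [dummy_input_batch(size) for size in input_sizes]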
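The list of input sizes above only works because it stays in the same order as `models`. A sketch of a less fragile alternative keys the default ImageNet resolution by architecture name; the `INPUT_SIZES`, `batch_shape` and `batch_nbytes` names are hypothetical. It also shows how large each batch is: `np.zeros` defaults to float64, which doubles the footprint compared with float32.

```python
# Default ImageNet input resolution of each keras.applications model used here.
INPUT_SIZES = {
    "ResNet50": 224,
    "VGG16": 224,
    "InceptionV3": 299,
    "Xception": 299,
    "MobileNetV2": 224,
    "DenseNet121": 224,
    "ResNet50V2": 224,
}


def batch_shape(arch: str, batch_size: int = 32, channels: int = 3) -> tuple:
    """NHWC shape of one dummy batch for the given architecture."""
    size = INPUT_SIZES[arch]
    return (batch_size, size, size, channels)


def batch_nbytes(shape: tuple, bytes_per_element: int = 4) -> int:
    """Memory footprint of a dense batch: 4 bytes per element for float32, 8 for float64."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


print(batch_nbytes(batch_shape("ResNet50")))     # 19267584 bytes, about 18.4 MiB
print(batch_nbytes(batch_shape("InceptionV3")))  # 34329984 bytes, about 32.7 MiB
```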

Finally, we "warm up" all of our models so that their one-time start-up costs don't skew the Jupyter `%%time` measurements later on:

In [6]:
# Warm up:
for idx, model in enumerate(models):
    model.predict(dummy_inputs[idx])
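`%%time` reports a single run, which can be noisy. A sketch of a steadier measurement is a small, hypothetical `benchmark` helper that discards a few warm-up calls (absorbing one-time costs such as graph tracing, or TF-TRT building TensorRT engines the first time a converted model is called) and then reports the median of several timed repeats.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 1, repeats: int = 10) -> Dict[str, float]:
    """Call fn() `warmup` times untimed, then time `repeats` calls; results in ms."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
        "runs": repeats,
    }
```

In this notebook it could be called as `benchmark(lambda: models[0].predict(dummy_inputs[0]))`; the median is less sensitive to an occasional slow run than a single measurement.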



## 3. What precision am I running inference at?

We will leave it at the default, FP32:

In [7]:
PRECISION = "FP32"
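Choosing a lower precision trades numeric resolution and range for speed. The sketch below uses Python's `struct` module, whose `'e'` format code is IEEE 754 half precision and `'f'` is single precision, to show how much of a value survives each mode. The function names are hypothetical, and INT8 is not shown because TF-TRT's INT8 mode additionally needs calibration data.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Values beyond the FP16 range (about +/-65504) raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(round_to_fp32(0.1))  # 0.10000000149011612
print(round_to_fp16(0.1))  # 0.0999755859375
```

The FP16 error here is around 2.4e-5 versus about 1.5e-9 for FP32, which is why lower-precision modes are worth validating against the FP32 outputs.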

## 4. What TensorRT tool or integration am I using to convert my model?

We will be using TF-TRT through `ModelOptimizer`, the example wrapper used throughout this guide (imported from `helper`):

In [8]:
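from helper import ModelOptimizer

opt_models = []
for keras_model, model_dir, dummy in zip(models, model_dirs, dummy_inputs):
    print("Starting", keras_model.name, model_dir)
    # Convert the SavedModel with TF-TRT at the chosen precision;
    # the result is written next to it, e.g. tmp_savedmodels/0_FP32
    model_opt = ModelOptimizer(model_dir)
    opt_trt = model_opt.convert(model_dir + '_' + PRECISION, precision=PRECISION)

    # Sanity check: run the converted model once on its dummy batch
    print(opt_trt.predict(dummy))

    opt_models.append(opt_trt)

    print("Finished!\n")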

Starting resnet50 tmp_savedmodels/0
INFO:tensorflow:Linked TensorRT version: (7, 2, 1)
INFO:tensorflow:Loaded TensorRT version: (7, 2, 2)
INFO:tensorflow:Loaded TensorRT 7.2.2 and linked TensorFlow against TensorRT 7.2.1. This is supported because TensorRT  minor/patch upgrades are backward compatible
INFO:tensorflow:Could not find TRTEngineOp_0_0 in TF-TRT cache. This can happen if build() is not called, which means TensorRT engines will be built and cached at runtime.
INFO:tensorflow:Assets written to: tmp_savedmodels/0_FP32/assets
[[1.6964252e-04 3.3007402e-04 6.1350249e-05 ... 1.4622317e-05
  1.4449877e-04 6.6086568e-04]
 [1.6964252e-04 3.3007402e-04 6.1350249e-05 ... 1.4622317e-05
  1.4449877e-04 6.6086568e-04]
 [1.6964252e-04 3.3007402e-04 6.1350249e-05 ... 1.4622317e-05
  1.4449877e-04 6.6086568e-04]
 ...
 [1.6964252e-04 3.3007402e-04 6.1350249e-05 ... 1.4622317e-05
  1.4449877e-04 6.6086568e-04]
 [1.6964252e-04 3.3007402e-04 6.1350249e-05 ... 1.4622317e-05
  1.4449877e-04 6.608

## 5. What TensorRT runtime am I targeting?

We will stay inside our TensorFlow/Python runtime:

In [9]:
idx = -1  # ResNet50V2, the last model in the list
opt_models[idx].predict(dummy_inputs[idx])

array([[0.00082353, 0.00079469, 0.00060477, ..., 0.00036948, 0.00069747,
        0.00154858],
       [0.00082353, 0.00079469, 0.00060477, ..., 0.00036948, 0.00069747,
        0.00154858],
       [0.00082353, 0.00079469, 0.00060477, ..., 0.00036948, 0.00069747,
        0.00154858],
       ...,
       [0.00082353, 0.00079469, 0.00060477, ..., 0.00036948, 0.00069747,
        0.00154858],
       [0.00082353, 0.00079469, 0.00060477, ..., 0.00036948, 0.00069747,
        0.00154858],
       [0.00082353, 0.00079469, 0.00060477, ..., 0.00036948, 0.00069747,
        0.00154858]], dtype=float32)

## Performance Comparisons:

In [10]:
idx = 0  # ResNet50

In [11]:
%%time

models[idx].predict(dummy_inputs[idx])

CPU times: user 160 ms, sys: 5.52 ms, total: 166 ms
Wall time: 148 ms


array([[1.69642386e-04, 3.30075040e-04, 6.13506127e-05, ...,
        1.46224065e-05, 1.44499005e-04, 6.60870341e-04],
       [1.69642386e-04, 3.30075040e-04, 6.13506127e-05, ...,
        1.46224065e-05, 1.44499005e-04, 6.60870341e-04],
       [1.69642386e-04, 3.30075040e-04, 6.13506127e-05, ...,
        1.46224065e-05, 1.44499005e-04, 6.60870341e-04],
       ...,
       [1.69642386e-04, 3.30075040e-04, 6.13506127e-05, ...,
        1.46224065e-05, 1.44499005e-04, 6.60870341e-04],
       [1.69642386e-04, 3.30075040e-04, 6.13506127e-05, ...,
        1.46224065e-05, 1.44499005e-04, 6.60870341e-04],
       [1.69642386e-04, 3.30075040e-04, 6.13506127e-05, ...,
        1.46224065e-05, 1.44499005e-04, 6.60870341e-04]], dtype=float32)

In [12]:
%%time

opt_models[idx].predict(dummy_inputs[idx])

CPU times: user 30.2 ms, sys: 8.3 ms, total: 38.5 ms
Wall time: 36.6 ms


array([[1.6964252e-04, 3.3007402e-04, 6.1350249e-05, ..., 1.4622317e-05,
        1.4449877e-04, 6.6086568e-04],
       [1.6964252e-04, 3.3007402e-04, 6.1350249e-05, ..., 1.4622317e-05,
        1.4449877e-04, 6.6086568e-04],
       [1.6964252e-04, 3.3007402e-04, 6.1350249e-05, ..., 1.4622317e-05,
        1.4449877e-04, 6.6086568e-04],
       ...,
       [1.6964252e-04, 3.3007402e-04, 6.1350249e-05, ..., 1.4622317e-05,
        1.4449877e-04, 6.6086568e-04],
       [1.6964252e-04, 3.3007402e-04, 6.1350249e-05, ..., 1.4622317e-05,
        1.4449877e-04, 6.6086568e-04],
       [1.6964252e-04, 3.3007402e-04, 6.1350249e-05, ..., 1.4622317e-05,
        1.4449877e-04, 6.6086568e-04]], dtype=float32)
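Faster is only useful if the outputs still agree. The sketch below compares the visible values from the first row of the two ResNet50 outputs above (native Keras versus TF-TRT FP32; the elided `...` entries are not included) using hypothetical `max_abs_diff` and `outputs_match` helpers. On full arrays, `np.allclose` does the same job.

```python
import math
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a: Sequence[float], b: Sequence[float],
                  rel_tol: float = 1e-4, abs_tol: float = 1e-7) -> bool:
    """True if every pair of elements is close in the math.isclose sense."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


# Visible first-row values printed above.
native = [1.69642386e-04, 3.30075040e-04, 6.13506127e-05,
          1.46224065e-05, 1.44499005e-04, 6.60870341e-04]
trt = [1.6964252e-04, 3.3007402e-04, 6.1350249e-05,
       1.4622317e-05, 1.4449877e-04, 6.6086568e-04]

print(max_abs_diff(native, trt))   # roughly 4.7e-09
print(outputs_match(native, trt))  # True
```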

In [13]:
idx = -3  # MobileNetV2

In [14]:
%%time

models[idx].predict(dummy_inputs[idx])

CPU times: user 105 ms, sys: 14.4 ms, total: 120 ms
Wall time: 63.5 ms


array([[1.8110899e-04, 6.4530974e-04, 6.8695901e-04, ..., 7.9570033e-05,
        1.3486811e-04, 3.3462986e-03],
       [1.8110899e-04, 6.4530974e-04, 6.8695901e-04, ..., 7.9570033e-05,
        1.3486811e-04, 3.3462986e-03],
       [1.8110899e-04, 6.4530974e-04, 6.8695901e-04, ..., 7.9570033e-05,
        1.3486811e-04, 3.3462986e-03],
       ...,
       [1.8110899e-04, 6.4530974e-04, 6.8695901e-04, ..., 7.9570033e-05,
        1.3486811e-04, 3.3462986e-03],
       [1.8110899e-04, 6.4530974e-04, 6.8695901e-04, ..., 7.9570033e-05,
        1.3486811e-04, 3.3462986e-03],
       [1.8110899e-04, 6.4530974e-04, 6.8695901e-04, ..., 7.9570033e-05,
        1.3486811e-04, 3.3462986e-03]], dtype=float32)

In [15]:
%%time

opt_models[idx].predict(dummy_inputs[idx])

CPU times: user 19.9 ms, sys: 4.48 ms, total: 24.4 ms
Wall time: 22.4 ms


array([[1.8110585e-04, 6.4528472e-04, 6.8695762e-04, ..., 7.9570833e-05,
        1.3486181e-04, 3.3463116e-03],
       [1.8110585e-04, 6.4528472e-04, 6.8695762e-04, ..., 7.9570833e-05,
        1.3486181e-04, 3.3463116e-03],
       [1.8110585e-04, 6.4528472e-04, 6.8695762e-04, ..., 7.9570833e-05,
        1.3486181e-04, 3.3463116e-03],
       ...,
       [1.8110585e-04, 6.4528472e-04, 6.8695762e-04, ..., 7.9570833e-05,
        1.3486181e-04, 3.3463116e-03],
       [1.8110585e-04, 6.4528472e-04, 6.8695762e-04, ..., 7.9570833e-05,
        1.3486181e-04, 3.3463116e-03],
       [1.8110585e-04, 6.4528472e-04, 6.8695762e-04, ..., 7.9570833e-05,
        1.3486181e-04, 3.3463116e-03]], dtype=float32)
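The wall times above translate into speedups with a simple ratio. The sketch below plugs in the single-run `%%time` numbers reported in this notebook (batch size 32, FP32, one V100); because each is one measurement, treat the result as rough, and prefer medians over several repeats for a steadier figure. The `speedup` name is hypothetical.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    if optimized_ms <= 0:
        raise ValueError("optimized_ms must be positive")
    return baseline_ms / optimized_ms


# Single-run %%time wall times reported above, in milliseconds: (native, TF-TRT).
wall_times_ms = {
    "ResNet50": (148.0, 36.6),
    "MobileNetV2": (63.5, 22.4),
}

for name, (native_ms, trt_ms) in wall_times_ms.items():
    print(f"{name}: {speedup(native_ms, trt_ms):.2f}x")
# ResNet50: 4.04x
# MobileNetV2: 2.83x
```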