# TF-TRT Keras RetinaNet Detection Example

In this notebook, we are going to optimize a RetinaNet detection model from the official Keras examples!

You can find the implementation here: https://keras.io/examples/vision/retinanet/

In general, detection models can be tricky to optimize because they tend to require a lot of custom logic for sub-tasks such as region proposal, output decoding, or non-maximum suppression. This makes them a good demonstration of TF-TRT's capabilities: it optimizes a large part of the network with TensorRT while leaving the custom logic to run in native TensorFlow.
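To build some intuition for how a network can be split between TensorRT and TensorFlow, here is a deliberately simplified sketch. It treats the model as a linear chain of ops (real TF-TRT partitions the full graph), groups consecutive TensorRT-supported ops into candidate segments, and leaves segments shorter than a minimum size in TensorFlow, loosely mirroring TF-TRT's `minimum_segment_size` conversion parameter. The op names, the supported-op set, and `partition_ops` itself are hypothetical.

```python
def partition_ops(ops, supported, minimum_segment_size=3):
    """Toy partitioner: split a linear chain of ops into ("TRT", [...]) and ("TF", [...]) groups.

    Consecutive supported ops form a candidate TensorRT segment; candidates
    shorter than minimum_segment_size are left to run in TensorFlow.
    """
    groups = []

    def emit(kind, run):
        # Merge with the previous group when it has the same kind.
        if groups and groups[-1][0] == kind:
            groups[-1][1].extend(run)
        else:
            groups.append((kind, list(run)))

    def flush(run):
        if run:
            emit("TRT" if len(run) >= minimum_segment_size else "TF", run)

    run = []
    for op in ops:
        if op in supported:
            run.append(op)
        else:
            flush(run)  # close the current candidate segment
            run = []
            emit("TF", [op])  # unsupported ops always stay in TensorFlow
    flush(run)
    return groups


# Hypothetical backbone ops followed by hypothetical custom post-processing.
ops = ["Conv2D", "Relu", "Conv2D", "Add", "DecodeBoxes", "Reshape", "FilterBoxes"]
supported = {"Conv2D", "Relu", "Add", "Reshape"}
for kind, group in partition_ops(ops, supported):
    print(kind, group)
# TRT ['Conv2D', 'Relu', 'Conv2D', 'Add']
# TF ['DecodeBoxes', 'Reshape', 'FilterBoxes']
```

Note how the lone supported `Reshape` stays in TensorFlow: a one-op engine would not be worth the overhead, which is the intuition behind a minimum segment size.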

Let's make sure our GPUs are properly configured and visible:

In [1]:
!nvidia-smi

Fri Jan 29 23:17:01 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   42C    P0    37W / 300W |    125MiB / 16155MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   43C    P0    38W / 300W |      6MiB / 16158MiB |      0%      Default |
|       

We will also need matplotlib, which the RetinaNet example code imports. If you do not have it, run:

In [2]:
!pip install matplotlib

You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.


Remember: to successfully deploy a TensorRT model, you have to make __five key decisions__:

1. __What format should I save my model in?__
2. __What batch size(s) am I running inference at?__
3. __What precision am I running inference at?__
4. __What TensorRT path am I using to convert my model?__
5. __What runtime am I targeting?__

Let's give it a shot!

## 1. What format should I save my model in?

We will work with the RetinaNet implementation from the Keras examples. We download the exact version of the implementation code we need, pinned to a specific commit of the keras-io repository:

In [3]:
!wget -O retinanet.py https://raw.githubusercontent.com/keras-team/keras-io/cd6201c1bfa37625f503f51e8fd3c572666770e4/examples/vision/retinanet.py

--2021-01-29 23:17:05--  https://raw.githubusercontent.com/keras-team/keras-io/cd6201c1bfa37625f503f51e8fd3c572666770e4/examples/vision/retinanet.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.40.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.40.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35046 (34K) [text/plain]
Saving to: ‘retinanet.py’


2021-01-29 23:17:05 (20.1 MB/s) - ‘retinanet.py’ saved [35046/35046]



The script also contains setup steps we do not need, so we use sed to keep only lines 1-40 and 71-820, pulling out just the model implementation itself (you can check the end result in the [retinanet_model.py](./retinanet_model.py) file).

In [4]:
!sed -n '1,40 p; 71,820 p' retinanet.py > retinanet_model.py
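If `sed` is not available (for example on Windows), the same line selection can be sketched in plain Python. `extract_lines` is a hypothetical helper; it copies 1-indexed, inclusive line ranges in the order given, which matches the `sed` command above as long as the ranges are ascending and non-overlapping.

```python
def extract_lines(src, dst, ranges):
    """Copy the 1-indexed, inclusive line ranges of src into dst."""
    with open(src, encoding="utf-8") as f:
        lines = f.readlines()
    with open(dst, "w", encoding="utf-8") as f:
        for start, end in ranges:
            # Convert the inclusive 1-indexed range to a 0-indexed slice.
            f.writelines(lines[start - 1:end])
```

In this notebook, the call would be `extract_lines("retinanet.py", "retinanet_model.py", [(1, 40), (71, 820)])`.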

In [5]:
!mkdir -p tmp_savedmodels

We perform some imports and setup:

In [6]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [7]:
img_size = (224, 224)
num_classes = 10

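We import the `RetinaNet` model, the `DecodePredictions` post-processing layer, and the `get_backbone` helper from the example, then build our detection model. We do not load trained RetinaNet weights, which is fine for the speed benchmarks in this notebook: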

In [8]:
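from retinanet_model import RetinaNet, DecodePredictions, get_backbone

# ResNet50 feature extractor (its ImageNet weights are downloaded on first use)
resnet50_backbone = get_backbone()
# RetinaNet with untrained detection heads for `num_classes` classes
model = RetinaNet(num_classes, resnet50_backbone)

# Wrap the raw model with box decoding and non-maximum suppression for inference
image = tf.keras.Input(shape=[None, None, 3], name="image")
predictions = model(image, training=False)
detections = DecodePredictions(confidence_threshold=0.5)(image, predictions)
inference_model = tf.keras.Model(inputs=image, outputs=detections)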

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5


Finally, we save the raw RetinaNet `model` (without the `DecodePredictions` post-processing) in SavedModel format!

In [9]:
model_dir = "tmp_savedmodels/detect_model"
model.save(model_dir) 

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: tmp_savedmodels/detect_model/assets


## 2. What batch size(s) am I running inference at?

We will create a dummy batch of size 32:

In [10]:
import numpy as np

# Batch of 32 blank RGB images; the model expects float32 inputs
dummy_input = np.zeros((32, img_size[0], img_size[1], 3), dtype=np.float32)

In [11]:
inference_model.predict(dummy_input)

CombinedNonMaxSuppression(nmsed_boxes=array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        ...,
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        ...,
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        ...,
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       ...,

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        ...,
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        ...,
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        ...,
   

## 3. What precision am I running inference at?

We will stick with FP32, the default precision of the Keras model:

In [12]:
PRECISION = "FP32"

## 4. What TensorRT path am I using to convert my model?

We will use our example TF-TRT-based `ModelOptimizer` wrapper from the accompanying `helper` module:

In [13]:
from helper import ModelOptimizer

model_opt = ModelOptimizer(model_dir)

We convert the model to our target precision, saving the result as a new SavedModel in `tmp_savedmodels/detect_model_FP32`:

In [14]:
opt_trt = model_opt.convert(model_dir+'_'+PRECISION, precision=PRECISION)
print("conversion complete! prediction shape:", opt_trt.predict(dummy_input).shape)

INFO:tensorflow:Linked TensorRT version: (7, 2, 1)
INFO:tensorflow:Loaded TensorRT version: (7, 2, 2)
INFO:tensorflow:Loaded TensorRT 7.2.2 and linked TensorFlow against TensorRT 7.2.1. This is supported because TensorRT  minor/patch upgrades are backward compatible
INFO:tensorflow:Could not find TRTEngineOp_0_2 in TF-TRT cache. This can happen if build() is not called, which means TensorRT engines will be built and cached at runtime.
INFO:tensorflow:Could not find TRTEngineOp_0_0 in TF-TRT cache. This can happen if build() is not called, which means TensorRT engines will be built and cached at runtime.
INFO:tensorflow:Could not find TRTEngineOp_0_1 in TF-TRT cache. This can happen if build() is not called, which means TensorRT engines will be built and cached at runtime.
INFO:tensorflow:Could not find TRTEngineOp_0_3 in TF-TRT cache. This can happen if build() is not called, which means TensorRT engines will be built and cached at runtime.
INFO:tensorflow:Assets written to: tmp_savedm
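The `helper` module is not shown here, so for reference, this is a minimal sketch of an FP32 conversion using TF-TRT's `TrtGraphConverterV2` directly, written against the TF 2.4-era API (newer TensorFlow releases also accept the precision as a constructor argument). It is not necessarily how `ModelOptimizer` is implemented, and `convert_with_tftrt` is a hypothetical name. The optional `build()` step runs representative inputs through the converted model to create the TensorRT engines before saving; when it is skipped, the engines are built on the first inference call instead, which is what the `Could not find TRTEngineOp_0_x in TF-TRT cache` messages above report.

```python
VALID_PRECISIONS = ("FP32", "FP16", "INT8")


def convert_with_tftrt(input_dir, output_dir, precision="FP32", input_fn=None):
    """Hypothetical helper: convert a SavedModel with TF-TRT and save the result.

    input_fn, if given, is a generator function yielding tuples of input
    tensors; it is used to pre-build the TensorRT engines before saving.
    """
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"precision must be one of {VALID_PRECISIONS}, got {precision!r}")
    if precision == "INT8":
        # INT8 additionally needs a calibration input function, not covered here.
        raise NotImplementedError("INT8 calibration is not covered by this sketch")

    # Imported lazily so the checks above work without a TensorRT-enabled build.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=precision)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir, conversion_params=params
    )
    converter.convert()
    if input_fn is not None:
        # Build the engines now instead of on the first inference call.
        converter.build(input_fn=input_fn)
    converter.save(output_dir)


# Example usage (requires a TensorRT-enabled TensorFlow build and a GPU):
#   def input_fn():
#       yield (tf.constant(dummy_input, dtype=tf.float32),)
#   convert_with_tftrt(model_dir, model_dir + "_FP32_built", "FP32", input_fn)
```

Pre-building trades a longer conversion step for a faster first inference, which matters when the first request should not pay the engine-build cost.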

## 5. What TensorRT runtime am I targeting?

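We will stick to our TF-TRT/TensorFlow runtime. Before timing anything, we warm up both models, since the first calls can be slow while the model functions are traced and, for TF-TRT, any TensorRT engines not yet built are created: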

In [15]:
print("Warming up...")

print(model.predict(dummy_input).shape)
print(opt_trt.predict(dummy_input).shape)

print("Done warming up!")

Warming up...
(32, 9441, 14)
(32, 9441, 14)
Done warming up!
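The warm-up shape `(32, 9441, 14)` is the raw RetinaNet output: one row per anchor, holding 4 box-regression values plus one score per class (4 + 10 = 14). The anchor count can be checked with a short calculation, assuming the example's anchor setup as we understand it: pyramid levels P3 to P7 (strides 8 to 128), square feature maps of side ceil(224 / stride), and 9 anchors (3 aspect ratios x 3 scales) per location. `retinanet_output_shape` is a hypothetical helper.

```python
import math


def retinanet_output_shape(image_size, num_classes, levels=range(3, 8), anchors_per_location=9):
    """Return (num_anchors, values_per_anchor) for a square RetinaNet input."""
    num_anchors = 0
    for level in levels:
        side = math.ceil(image_size / 2 ** level)  # feature map side at this pyramid level
        num_anchors += side * side * anchors_per_location
    return num_anchors, 4 + num_classes  # 4 box offsets + one score per class


print(retinanet_output_shape(224, 10))  # (9441, 14)
```

The per-level sides are 28, 14, 7, 4, and 2, giving (784 + 196 + 49 + 16 + 4) x 9 = 9441 anchors.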


## Performance Comparisons

We time the native TensorFlow model and the TF-TRT FP32 model on the same dummy batch:

In [16]:
%%timeit

preds = model.predict(dummy_input)

109 ms ± 5.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [17]:
%%timeit

preds = opt_trt.predict(dummy_input)

45.1 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
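The mean times above correspond to roughly a 2.4x speedup for the TF-TRT FP32 model on this batch (109 ms / 45.1 ms ≈ 2.42). Since `%%timeit` is an IPython magic, here is a rough sketch of an equivalent measurement for plain Python scripts using the standard `timeit` module. `benchmark` is a hypothetical helper; it reports the mean and standard deviation of the per-call time over `repeat` runs of `number` calls, like `%%timeit`, but does not choose the loop count automatically.

```python
import statistics
import timeit


def benchmark(fn, number=10, repeat=7, warmup=1):
    """Time fn() over `repeat` runs of `number` calls each.

    Returns (mean, std) of the per-call time in seconds.
    """
    for _ in range(warmup):
        fn()  # untimed calls, e.g. to let TF-TRT build engines first
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)


# Example usage in this notebook (not run here):
#   mean, std = benchmark(lambda: opt_trt.predict(dummy_input))
#   print(f"{mean * 1e3:.1f} ms ± {std * 1e3:.2f} ms per loop")
```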
