<a href="https://colab.research.google.com/github/PratikhyaManas/TensorRT/blob/master/Tensorflow_to_TensorRT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## What is TensorRT?

TensorRT is NVIDIA's optimization toolkit for deep learning inference. It applies graph optimizations such as layer fusion and selects the fastest available implementation of each layer for the target GPU. As a result, the optimized model usually runs inference faster than the original (unoptimized) model; speed-ups such as 2x or 5x are possible, but the actual gain depends on the model, batch size, precision, and GPU. Larger models generally leave more room for these optimizations. TensorRT supports a wide range of NVIDIA GPUs, from desktop cards such as the GTX 1080 Ti and Titan Xp to embedded modules such as the Jetson TX1 and TX2.
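Speed-up figures like these are only meaningful when measured on your own model and GPU. Below is a minimal, framework-agnostic timing sketch; `median_latency_ms` and `speedup` are hypothetical helpers, and the two lambdas are stand-ins for "run one batch through the original / optimized model".

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 5, iters: int = 50) -> float:
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up runs absorb one-time costs (engine build, caching)
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized model is (2.0 means 2x)."""
    if optimized_ms <= 0:
        raise ValueError("optimized latency must be positive")
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    # Hypothetical stand-ins for the original and the optimized model.
    run_original = lambda: sum(i * i for i in range(100_000))
    run_optimized = lambda: sum(i * i for i in range(20_000))
    base = median_latency_ms(run_original)
    opt = median_latency_ms(run_optimized)
    print(f"original {base:.2f} ms, optimized {opt:.2f} ms, speed-up {speedup(base, opt):.1f}x")
```

With TensorFlow 1.x, `sess.run` returns only after the requested outputs are computed, so wall-clock timing around it is a reasonable measure; the median is used so that occasional slow iterations do not dominate.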

# Mount Google Drive

In [0]:
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive',force_remount=True)

Mounted at /content/drive


# Switch the current directory to the project folder on Google Drive

In [0]:
import os
project_path = '/content/drive/My Drive/ML_Datasets/nvidia'
os.chdir(project_path)
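Later cells read files relative to this folder (the TensorRT repository `.deb` and the model checkpoint), so a quick existence check can fail early with a clear message. A small sketch; `check_project_files` is a hypothetical helper, and the relative paths are the ones this notebook uses later.

```python
import os
from typing import Iterable, List


def check_project_files(root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist under root."""
    return [p for p in relative_paths if not os.path.exists(os.path.join(root, p))]


# Files this notebook reads later, relative to the project folder.
REQUIRED = [
    "nv-tensorrt-repo-ubuntu1804-cuda10.0-trt7.0.0.11-ga-20191216_1-1_amd64.deb",
    "model/tensorflow/small/model_small.meta",
]

if __name__ == "__main__":
    missing = check_project_files(os.getcwd(), REQUIRED)
    print("Missing:", missing if missing else "nothing, all required files found")
```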

# Check the CUDA and TensorFlow versions

In [0]:
!nvcc --version
%tensorflow_version 1.x
import tensorflow as tf
print(tf.__version__)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
TensorFlow is already loaded. Please restart the runtime to change versions.
1.15.2
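Note that `nvcc` reports CUDA 10.1, while the TensorRT packages installed later are built for CUDA 10.0 (`+cuda10.0`). A small sketch that extracts the CUDA release from `nvcc --version` output so this can be checked in code; `cuda_release` is a hypothetical helper, and the sample text is copied from the output above.

```python
import re
from typing import Optional, Tuple

NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243"""


def cuda_release(nvcc_text: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the 'release X.Y' field of nvcc --version output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


if __name__ == "__main__":
    print(cuda_release(NVCC_OUTPUT))  # (10, 1)
```

In the notebook itself the text could be captured with `subprocess.check_output(["nvcc", "--version"], universal_newlines=True)` instead of the hard-coded sample.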


In [0]:
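# Optional: this link downloads the TensorRT 5.1.2.2 (RC) repository package, whereas the next step installs nv-tensorrt-repo-ubuntu1804-cuda10.0-trt7.0.0.11-ga-20191216_1-1_amd64.deb, which must already be in the project folder.
# !wget -O nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.2.2-rc-20190227_1-1_amd64.deb https://www.dropbox.com/s/45pz13r4e8ip4bl/nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.2.2-rc-20190227_1-1_amd64.deb?dl=0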

--2020-04-06 21:44:05--  https://www.dropbox.com/s/45pz13r4e8ip4bl/nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.2.2-rc-20190227_1-1_amd64.deb?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.65.1, 2620:100:6021:1::a27d:4101
Connecting to www.dropbox.com (www.dropbox.com)|162.125.65.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/45pz13r4e8ip4bl/nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.2.2-rc-20190227_1-1_amd64.deb [following]
--2020-04-06 21:44:05--  https://www.dropbox.com/s/raw/45pz13r4e8ip4bl/nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.2.2-rc-20190227_1-1_amd64.deb
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uca0d5b1bbfd2b57f00039b11f88.dl.dropboxusercontent.com/cd/0/inline/A1W4kdj9LrMc2dUHSCO3TmtbwUeqhmI5hmJ3ks545H-zErTQBqnop53vvFkGTMhfq2xNmQVPFFKMhjLRqhDjuBdvuRYCBx8EbF8ZMVlYALTt7A/file# [following]
--2020-04-06 21:44:05--  https://uca0d5b1bbf

# Installing the additional TensorRT packages

In [0]:
!dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.0-trt7.0.0.11-ga-20191216_1-1_amd64.deb
!apt-key add /var/nv-tensorrt-repo-cuda10.0-trt7.0.0.11-ga-20191216/7fa2af80.pub
!apt-get update
!apt-get install -y --no-install-recommends libnvinfer7=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvinfer-plugin7=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvparsers7=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvonnxparsers7=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvinfer-bin=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvinfer-dev=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvinfer-plugin-dev=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvparsers-dev=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvonnxparsers-dev=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvinfer-samples=7.0.0-1+cuda10.0
!apt-get install -y --no-install-recommends libnvinfer-doc=7.0.0-1+cuda10.0
!apt-get install -y tensorrt
!apt-get install -y uff-converter-tf

(Reading database ... 144907 files and directories currently installed.)
Preparing to unpack nv-tensorrt-repo-ubuntu1804-cuda10.0-trt7.0.0.11-ga-20191216_1-1_amd64.deb ...
Unpacking nv-tensorrt-repo-ubuntu1804-cuda10.0-trt7.0.0.11-ga-20191216 (1-1) over (1-1) ...
Setting up nv-tensorrt-repo-ubuntu1804-cuda10.0-trt7.0.0.11-ga-20191216 (1-1) ...
OK
Get:1 file:/var/nv-tensorrt-repo-cuda10.0-trt5.1.2.2-rc-20190227  InRelease
Ign:1 file:/var/nv-tensorrt-repo
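The eleven pinned `apt-get install` lines above all use the same version, `7.0.0-1+cuda10.0`, and that pin has to match the CUDA version of the repository package. A sketch that reads the CUDA and TensorRT versions from the repository `.deb` filename and builds one equivalent pinned command; `repo_versions` and `pinned_install_command` are hypothetical helpers, and the package list is copied from the cell above.

```python
import re
from typing import List, Tuple

REPO_DEB = "nv-tensorrt-repo-ubuntu1804-cuda10.0-trt7.0.0.11-ga-20191216_1-1_amd64.deb"
PIN = "7.0.0-1+cuda10.0"
PACKAGES = [
    "libnvinfer7", "libnvinfer-plugin7", "libnvparsers7", "libnvonnxparsers7",
    "libnvinfer-bin", "libnvinfer-dev", "libnvinfer-plugin-dev", "libnvparsers-dev",
    "libnvonnxparsers-dev", "libnvinfer-samples", "libnvinfer-doc",
]


def repo_versions(deb_name: str) -> Tuple[str, str]:
    """Return (cuda_version, tensorrt_version) encoded in a repository .deb filename."""
    m = re.search(r"cuda(\d+\.\d+)-trt(\d+(?:\.\d+)*)", deb_name)
    if m is None:
        raise ValueError(f"unrecognized repository package name: {deb_name}")
    return m.group(1), m.group(2)


def pinned_install_command(packages: List[str], pin: str, repo_deb: str) -> str:
    """Build one apt-get command; refuse a pin whose CUDA suffix differs from the repo's."""
    cuda, _ = repo_versions(repo_deb)
    if not pin.endswith(f"+cuda{cuda}"):
        raise ValueError(f"pin {pin!r} does not target CUDA {cuda}")
    pinned = " ".join(f"{pkg}={pin}" for pkg in packages)
    return f"apt-get install -y --no-install-recommends {pinned}"


if __name__ == "__main__":
    print(repo_versions(REPO_DEB))  # ('10.0', '7.0.0.11')
    print(pinned_install_command(PACKAGES, PIN, REPO_DEB))
```

In a notebook cell the resulting string can be executed with `!{cmd}`, since IPython expands `{...}` expressions in shell commands.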

# Check the TensorRT installation

In [0]:
!dpkg -l | grep TensorRT

ii  graphsurgeon-tf                                              7.0.0-1+cuda10.0                                  amd64        GraphSurgeon for TensorRT package
ii  libnvinfer-bin                                               7.0.0-1+cuda10.0                                  amd64        TensorRT binaries
ii  libnvinfer-dev                                               7.0.0-1+cuda10.0                                  amd64        TensorRT development libraries and headers
ii  libnvinfer-doc                                               7.0.0-1+cuda10.0                                  all          TensorRT documentation
ii  libnvinfer-plugin-dev                                        7.0.0-1+cuda10.0                                  amd64        TensorRT plugin libraries
ii  libnvinfer-plugin7                                           7.0.0-1+cuda10.0                                  amd64        TensorRT plugin libraries
ii  libnvinfer-samples                                        
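Instead of reading the `dpkg -l` listing by eye, the rows can be parsed to confirm that every TensorRT package carries the expected version. A sketch; `parse_dpkg_rows` and `unexpected_versions` are hypothetical helpers, and the sample rows are taken from the output above (column padding shortened; the truncated last row is skipped).

```python
from typing import Dict

DPKG_OUTPUT = """\
ii  graphsurgeon-tf        7.0.0-1+cuda10.0  amd64  GraphSurgeon for TensorRT package
ii  libnvinfer-bin         7.0.0-1+cuda10.0  amd64  TensorRT binaries
ii  libnvinfer-dev         7.0.0-1+cuda10.0  amd64  TensorRT development libraries and headers
ii  libnvinfer-doc         7.0.0-1+cuda10.0  all    TensorRT documentation
ii  libnvinfer-plugin-dev  7.0.0-1+cuda10.0  amd64  TensorRT plugin libraries
ii  libnvinfer-plugin7     7.0.0-1+cuda10.0  amd64  TensorRT plugin libraries
ii  libnvinfer-samples
"""


def parse_dpkg_rows(text: str) -> Dict[str, str]:
    """Map package name -> version for installed ('ii') rows of `dpkg -l` output."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # A complete row has at least: status, name, version, architecture.
        if len(fields) >= 4 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


def unexpected_versions(versions: Dict[str, str], expected: str) -> Dict[str, str]:
    """Return only the packages whose version differs from `expected`."""
    return {name: v for name, v in versions.items() if v != expected}


if __name__ == "__main__":
    rows = parse_dpkg_rows(DPKG_OUTPUT)
    print(len(rows), unexpected_versions(rows, "7.0.0-1+cuda10.0"))  # 6 {}
```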

# Import the required libraries and packages

In [0]:
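# Import the needed libraries (tf itself was imported in an earlier cell)
# Note: tensorflow.contrib.tensorrt is deprecated in TF 1.14+; the same functions are
# also available via `from tensorflow.python.compiler.tensorrt import trt_convert as trt`
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile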

# Convert the TensorFlow model to a frozen model

In [0]:
# Limit TensorFlow to 50% of the GPU memory so that the rest stays available to TensorRT
with tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.50))) as sess:
    # Import the meta graph (graph structure) of the TensorFlow model
    saver = tf.train.import_meta_graph("/content/drive/My Drive/ML_Datasets/nvidia/model/tensorflow/small/model_small.meta")
    # Then restore the trained weights into that graph
    saver.restore(sess, "/content/drive/My Drive/ML_Datasets/nvidia/model/tensorflow/small/model_small")
    
    # Names of the output nodes to keep (the prediction result).
    # These are node names, without the ":0" tensor suffix.
    your_outputs = ["output_tensor/Softmax"]
    
    # Freeze: turn variables into constants and drop nodes not needed for the outputs
    frozen_graph = tf.graph_util.convert_variables_to_constants(
        sess,                                   # session holding the restored weights
        tf.get_default_graph().as_graph_def(),  # graph definition from the session
        output_node_names=your_outputs)
    # Write the frozen model to disk for later use
    with gfile.FastGFile("/content/drive/My Drive/ML_Datasets/nvidia/model/frozen_model.pb", 'wb') as f:
        f.write(frozen_graph.SerializeToString())
    print("Frozen model is successfully stored!")

INFO:tensorflow:Restoring parameters from /content/drive/My Drive/ML_Datasets/nvidia/model/tensorflow/small/model_small
INFO:tensorflow:Froze 10 variables.
INFO:tensorflow:Converted 10 variables to const ops.
Frozen model is successfully stored!
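A frozen graph should contain no variable nodes and should still contain the requested output node. A sketch of that check; `check_frozen_graph` is a hypothetical helper that only reads the `name` and `op` fields of each entry in `graph_def.node` (as on a TensorFlow `GraphDef`), so it can be exercised with stand-in objects. The set of variable op types is an assumption covering common TF 1.x variable ops.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple

# Assumed set of op types that hold mutable state in TF 1.x graphs.
VARIABLE_OPS = {"Variable", "VariableV2", "VarHandleOp"}


def check_frozen_graph(graph_def, output_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing_outputs, leftover_variable_nodes) for a GraphDef-like object."""
    names = {node.name for node in graph_def.node}
    missing = [name for name in output_names if name not in names]
    leftovers = [node.name for node in graph_def.node if node.op in VARIABLE_OPS]
    return missing, leftovers


if __name__ == "__main__":
    # Stand-in for a frozen GraphDef: a constant weight feeding a softmax.
    fake = SimpleNamespace(node=[
        SimpleNamespace(name="dense/kernel", op="Const"),
        SimpleNamespace(name="output_tensor/Softmax", op="Softmax"),
    ])
    print(check_frozen_graph(fake, ["output_tensor/Softmax"]))  # ([], [])
```

Applied to the real graph as `check_frozen_graph(frozen_graph, your_outputs)`, both lists would be expected to be empty after a successful freeze.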


# Optimize the frozen model into a TensorRT graph

In [0]:
# Convert (optimize) the frozen model into a TensorRT-optimized graph
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,        # frozen model
    outputs=your_outputs,                # output node names
    max_batch_size=2,                    # largest batch size the engines must support
    max_workspace_size_bytes=2*(10**9),  # max scratch GPU memory TensorRT may use (~2 GB)
    precision_mode="FP32")               # "FP32", "FP16", or "INT8" (INT8 needs calibration)

# Write the TensorRT-optimized graph to disk for later inference
with gfile.FastGFile("/content/drive/My Drive/ML_Datasets/nvidia/model/TensorRT_model.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")

INFO:tensorflow:Linked TensorRT version: (0, 0, 0)
INFO:tensorflow:Loaded TensorRT version: (0, 0, 0)
INFO:tensorflow:Running against TensorRT version 0.0.0
TensorRT model is successfully stored!
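The log above reports both the linked and the loaded TensorRT version as `(0, 0, 0)`. This suggests that the TensorFlow build in use was not compiled against TensorRT, in which case the conversion cannot create any TensorRT engines even though the `.pb` file is still written. A small sketch that flags this from the log text; `parse_trt_versions` and `tensorrt_usable` are hypothetical helpers, and the sample lines are copied from the output above.

```python
import re
from typing import Dict, Tuple

LOG = """INFO:tensorflow:Linked TensorRT version: (0, 0, 0)
INFO:tensorflow:Loaded TensorRT version: (0, 0, 0)
INFO:tensorflow:Running against TensorRT version 0.0.0"""


def parse_trt_versions(log_text: str) -> Dict[str, Tuple[int, ...]]:
    """Extract the 'Linked' and 'Loaded' TensorRT version tuples from TF-TRT log lines."""
    found = {}
    pattern = r"(Linked|Loaded) TensorRT version: \(([\d, ]+)\)"
    for kind, version in re.findall(pattern, log_text):
        found[kind] = tuple(int(part) for part in version.split(","))
    return found


def tensorrt_usable(versions: Dict[str, Tuple[int, ...]]) -> bool:
    """True only if both versions were found and neither is all zeros."""
    return all(k in versions and any(versions[k]) for k in ("Linked", "Loaded"))


if __name__ == "__main__":
    v = parse_trt_versions(LOG)
    print(v, tensorrt_usable(v))  # {'Linked': (0, 0, 0), 'Loaded': (0, 0, 0)} False
```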


# Count the nodes (operations) before and after optimization

In [0]:
# Count the ops in the original frozen model
all_nodes = len(frozen_graph.node)
print("numb. of all_nodes in frozen graph:", all_nodes)

# Count the ops that were converted into TensorRT engines
trt_engine_nodes = sum(1 for n in trt_graph.node if n.op == 'TRTEngineOp')
print("numb. of trt_engine_nodes in TensorRT graph:", trt_engine_nodes)
all_nodes = len(trt_graph.node)
print("numb. of all_nodes in TensorRT graph:", all_nodes)

numb. of all_nodes in frozen graph: 46
numb. of trt_engine_nodes in TensorRT graph: 0
numb. of all_nodes in TensorRT graph: 41
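These counts show the graph shrinking from 46 to 41 nodes while zero `TRTEngineOp` nodes were created, so the reduction most likely comes from the generic TensorFlow graph rewrites that run during conversion rather than from TensorRT itself (consistent with the `(0, 0, 0)` TensorRT version in the conversion log). To see exactly which op types changed, the nodes can be tallied per op type. A sketch; `op_histogram` and `op_count_diff` are hypothetical helpers that only read `node.op`, so they work on any `GraphDef`-like object.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Dict


def op_histogram(graph_def) -> Counter:
    """Count nodes per op type, e.g. {'Const': 12, 'MatMul': 3, ...}."""
    return Counter(node.op for node in graph_def.node)


def op_count_diff(before, after) -> Dict[str, int]:
    """Per-op change in node count (negative = removed, positive = added)."""
    b, a = op_histogram(before), op_histogram(after)
    return {op: a[op] - b[op] for op in sorted(set(b) | set(a)) if a[op] != b[op]}


if __name__ == "__main__":
    # Stand-ins: one Identity node removed by optimization.
    before = SimpleNamespace(node=[SimpleNamespace(op=o) for o in ["Const", "Identity", "MatMul", "Softmax"]])
    after = SimpleNamespace(node=[SimpleNamespace(op=o) for o in ["Const", "MatMul", "Softmax"]])
    print(op_count_diff(before, after))  # {'Identity': -1}
```

On the real graphs this would be called as `op_count_diff(frozen_graph, trt_graph)`; a successful TensorRT conversion would show a positive `TRTEngineOp` entry alongside negative counts for the ops it absorbed.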
