# Convert to TF-TRT Float32

In this notebook we will demonstrate how to convert a TensorFlow saved model into a TF-TRT optimized graph using Float32 precision. We will use the optimized graph to make predictions and will benchmark its performance. In the next notebook, you will be asked to make your first optimized TF-TRT graph using Float16 precision.

## Objectives

By the end of this notebook you will be able to:

- Convert a saved TensorFlow model into an optimized TF-TRT graph with Float32 precision

## Imports

As of TensorFlow 2.0, TensorRT (TRT) integration is available in TensorFlow under the `tensorflow.python.compiler` module:

In [None]:
from tensorflow.python.compiler.tensorrt import trt_convert as trt

As in the previous notebook, we will rely on several helper functions. If needed, use the file menu on the left-hand side of the JupyterLab environment to open `./lab_helpers.py` and inspect the helper functions in more detail.

In [None]:
from lab_helpers import (
    get_images, batch_input, load_tf_saved_model,
    predict_and_benchmark_throughput_from_saved, display_prediction_info
)

## Create Batched Input

As in the previous notebook, we will create a batched input of many images to send to the GPU for inference at once.

In [None]:
number_of_images = 32
images = get_images(number_of_images)

In [None]:
batched_input = batch_input(images)

In [None]:
batched_input.shape

## Make Conversion

`convert_to_trt_graph_and_save` expects the directory of a saved model, which it will convert to an optimized TF-TRT graph with Float32 precision, and then save. Please read the comments for this function.

In [None]:
def convert_to_trt_graph_and_save(input_saved_model_dir='resnet_v2_152_saved_model'):
    
    precision_mode = trt.TrtPrecisionMode.FP32
    converted_save_suffix = '_TFTRT_FP32'
        
    output_saved_model_dir = input_saved_model_dir + converted_save_suffix
    
    # Here we overwrite the default conversion parameters to suit our needs.
    conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=precision_mode, 
        max_workspace_size_bytes=8000000000
    )

    # trt.TrtGraphConverterV2 takes the saved model directory and conversion parameters, and returns a TF-TRT converter.
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_saved_model_dir,
        conversion_params=conversion_params
    )

    print('Converting {} to TF-TRT graph with precision mode {}...'.format(input_saved_model_dir, 'float32'))
    
    # converter.convert() performs the optimization.
    converter.convert()

    print('Saving converted model to {}...'.format(output_saved_model_dir))
    
    # converter.save will save the model as a TF (not Keras) saved-model at the specified directory.
    converter.save(output_saved_model_dir=output_saved_model_dir)
    
    print('Complete')

In [None]:
convert_to_trt_graph_and_save(input_saved_model_dir='resnet_v2_152_saved_model') # Takes about a minute
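
To confirm that the conversion wrote a model to disk, a quick look at the output directory can help. The sketch below is a convenience check and not part of `lab_helpers.py`: the name `describe_saved_model_dir` is hypothetical, and it relies only on the fact that a TensorFlow saved model directory contains a `saved_model.pb` file.

```python
import os


def describe_saved_model_dir(saved_model_dir):
    """Return (has_graph_file, total_bytes) for a saved model directory.

    Hypothetical helper for illustration; not part of lab_helpers.py.
    """
    # A TensorFlow saved model stores its serialized graph in saved_model.pb.
    graph_path = os.path.join(saved_model_dir, "saved_model.pb")
    has_graph_file = os.path.isfile(graph_path)

    # Sum the size of every file under the directory (graph, variables, assets).
    # os.walk yields nothing for a missing directory, so the total stays 0.
    total_bytes = 0
    for root, _dirs, files in os.walk(saved_model_dir):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))

    return has_graph_file, total_bytes
```

For example, `describe_saved_model_dir('resnet_v2_152_saved_model_TFTRT_FP32')` should report `True` for the first value once the conversion cell above has completed.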

## Benchmark TF-TRT Float32

Here we load the optimized TF model. Note that this is a TF saved model, as opposed to a Keras saved model. If you wish, refer to `lab_helpers.py` for details on the helper functions.

In [None]:
infer = load_tf_saved_model('resnet_v2_152_saved_model_TFTRT_FP32')
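
For orientation, loading a TF saved model for inference commonly looks like the sketch below. This is an assumption about what `load_tf_saved_model` does, not a copy of it: the name `load_serving_signature` is hypothetical, and the `'serving_default'` signature key is assumed to be present in the converted model.

```python
def load_serving_signature(saved_model_dir):
    """Load a TF saved model and return (loaded_model, inference_function).

    Hypothetical sketch; the actual helper in lab_helpers.py may differ.
    """
    # Imported lazily so that defining this function does not require TensorFlow.
    import tensorflow as tf

    # Load the graph tagged for serving rather than a Keras model object.
    loaded = tf.saved_model.load(saved_model_dir, tags=[tf.saved_model.SERVING])

    # Assumption: the model exposes the usual 'serving_default' signature.
    infer = loaded.signatures["serving_default"]

    # Return the loaded object as well, so the caller can keep it alive
    # for as long as the inference function is in use.
    return loaded, infer
```

The returned inference function takes tensors as input and returns a dictionary of output tensors, which is why it is used differently from a Keras model's `predict` method.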

Now we perform inference with the optimized graph. After a number of warmup runs, the helper times inference and calculates throughput.

In [None]:
all_preds = predict_and_benchmark_throughput_from_saved(batched_input, infer, N_warmup_run=50, N_run=150)
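
The warmup-then-time pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation in `lab_helpers.py`: the name `benchmark_throughput` and its parameters are hypothetical, and `predict_fn` stands in for any callable that runs inference on one batch.

```python
import time


def benchmark_throughput(predict_fn, batch, batch_size, n_warmup=50, n_run=150):
    """Return (images_per_second, last_output) for repeated calls to predict_fn.

    Hypothetical sketch; the lab's helper may measure things differently.
    """
    # Warmup: early calls can include one-time setup costs, so they are not timed.
    for _ in range(n_warmup):
        predict_fn(batch)

    # Timed runs: measure wall-clock time across all n_run calls.
    start = time.perf_counter()
    output = None
    for _ in range(n_run):
        output = predict_fn(batch)
    elapsed = time.perf_counter() - start

    # Throughput is the total number of images processed per second.
    images_per_second = (n_run * batch_size) / elapsed
    return images_per_second, output
```

When timing GPU inference, `predict_fn` should also bring the result back to the host (for example by converting the output to a NumPy array), so that the measured time covers the completed computation and not only the launch of the work.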

**Compare *Throughput* to the naive TF 2 inference performed earlier.**

Run this cell to view predictions, which you can compare to those from the naive TF 2 run. You should see very little difference in the accuracy of the predictions.

In [None]:
last_run_preds = all_preds[0]
display_prediction_info(last_run_preds, images)
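
To put a number on "very little difference", one option is to measure how often two runs agree on the top-scoring class. The sketch below assumes each run's predictions can be expressed as one sequence of class scores per image; that layout and the name `top1_agreement` are assumptions for illustration, not the format returned by the lab helpers.

```python
def top1_agreement(scores_a, scores_b):
    """Fraction of images whose highest-scoring class index matches in both runs.

    Hypothetical helper; assumes one sequence of class scores per image.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both runs must cover the same number of images")
    if len(scores_a) == 0:
        raise ValueError("need at least one image")

    def argmax(row):
        # Index of the largest score in a single image's score sequence.
        return max(range(len(row)), key=lambda i: row[i])

    # Count the images where both runs pick the same class.
    matches = sum(argmax(a) == argmax(b) for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)
```

A value of `1.0` means both runs chose the same top class for every image, while lower values indicate that the optimized graph changed some predictions.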

## Restart Kernel

Please execute the cell below to restart the kernel and clear GPU memory.

In [None]:
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

## Next

In the next notebook, you will be asked to make your first optimized TF-TRT graph using Float16 precision.