# Naive Inference with TensorFlow 2

In this notebook we will run inference with TensorFlow 2, without the help of TF-TRT. In doing so we will establish baselines for image throughput and prediction accuracy that we can compare against as we optimize with TF-TRT.

Additionally, we will spend some time getting familiar with several helper functions we will use throughout this workshop that will allow us to perform common tasks easily so we can focus on the impact of using TF-TRT.

## Objectives

By the time you complete this notebook you should be able to:

- Use provided helper functions to load images, batch input, make and benchmark predictions, and display prediction information
- Obtain a baseline for naive TensorFlow 2 inference

## Imports

In [None]:
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

Throughout the workshop we will make extensive use of helper functions defined in `./lab_helpers.py`. Comments will be provided throughout to give context to their use, but at any time, feel free to use the JupyterLab file viewer on the left-hand side of the screen to open `lab_helpers.py` and view the helper functions in their entirety.

In [None]:
from lab_helpers import (
    get_images, batch_input, predict_and_benchmark_throughput_from_saved, display_prediction_info
)

## Load and Save Model

Throughout much of this workshop we will be using ResNetV2. Here we import the `ResNet152V2` variant from Keras.

In [None]:
from tensorflow.keras.applications.resnet_v2 import ResNet152V2

In [None]:
model = ResNet152V2(weights='imagenet')

When we benchmark our optimized TF-TRT models, they will be saved TensorFlow (not Keras) models. In order to have a fair comparison, here we save our Keras model as a TensorFlow model.

In [None]:
tf.saved_model.save(model, 'resnet_v2_152_saved_model')

## Create Batched Input

Using **batch inference** to send many images to the GPU at once promotes parallel processing and improves throughput.
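To make the idea concrete, here is a minimal sketch in plain Python showing how batch size changes the number of model invocations needed for the same 32 inputs. It is not part of `lab_helpers.py`: `make_batches` is a hypothetical name, and the integers are stand-ins for loaded images.

```python
def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


image_ids = list(range(32))  # stand-ins for 32 loaded images

one_at_a_time = make_batches(image_ids, 1)   # one image per model call
single_batch = make_batches(image_ids, 32)   # all images in one model call

print(len(one_at_a_time))  # 32
print(len(single_batch))   # 1
```

With a batch size of 32, the model is invoked once instead of 32 times, which gives the GPU a larger unit of work to parallelize.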

### Get Images

The `get_images` helper function uses Keras to load the specified number of images. For each image it returns the image itself in PIL format along with its file path, which we will need later to load and view the images from within these notebooks.

In [None]:
number_of_images = 32
images = get_images(number_of_images)

In [None]:
images[:1]

### Batch Input

The `batch_input` helper function takes a list of images with their paths, as returned by `get_images`, and returns a tensor containing the preprocessed images.

In [None]:
batched_input = batch_input(images)

In [None]:
type(batched_input)

In [None]:
batched_input.shape

## Get Baseline for Prediction Throughput and Accuracy

The following will serve as a baseline for prediction throughput and accuracy.

### Load Model

Here we load the ResNetV2 model we saved earlier.

In [None]:
def load_tf_saved_model(input_saved_model_dir):

    print('Loading saved model {}...'.format(input_saved_model_dir))
    saved_model_loaded = tf.saved_model.load(input_saved_model_dir, tags=[tag_constants.SERVING])
    return saved_model_loaded

In [None]:
saved_model_loaded = load_tf_saved_model('resnet_v2_152_saved_model')

In [None]:
infer = saved_model_loaded.signatures['serving_default']

### Make Prediction and Get Throughput

Now we perform inference with the saved model and, after a warmup, time the runs and calculate throughput.

The helper function `predict_and_benchmark_throughput_from_saved` uses the passed-in model to perform predictions on the passed-in batched input over a number of runs. It measures and reports throughput, as well as the time taken for ranges of runs.

The first inference calls include GPU initialization work, so we do not want to include them in our measurements. To exclude them, we can set a number of warmup runs to perform prior to benchmarking.

`predict_and_benchmark_throughput_from_saved` returns the predictions for all images for all runs, after the warmup.
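The warmup-then-time pattern described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation in `lab_helpers.py`: `benchmark_throughput`, `infer_fn`, and `fake_infer` are hypothetical names, and the stand-in "model" simply doubles each input.

```python
import time


def benchmark_throughput(infer_fn, batch, batch_size, n_warmup=50, n_run=150):
    """Return (items_per_second, outputs) for repeated calls to infer_fn(batch).

    infer_fn is a hypothetical stand-in for any callable that runs inference.
    """
    # Warmup runs are executed but not timed, so one-time setup costs
    # do not distort the measurement.
    for _ in range(n_warmup):
        infer_fn(batch)

    # Timed runs: keep every output so predictions can be inspected later.
    outputs = []
    start = time.perf_counter()
    for _ in range(n_run):
        outputs.append(infer_fn(batch))
    elapsed = time.perf_counter() - start

    # Throughput is the total number of items processed per second.
    throughput = n_run * batch_size / elapsed
    return throughput, outputs


def fake_infer(batch):
    """Toy 'model' used only for this illustration."""
    return [x * 2 for x in batch]


throughput, outputs = benchmark_throughput(
    fake_infer, list(range(32)), batch_size=32, n_warmup=5, n_run=20
)
print(len(outputs))  # 20
```

When timing a real GPU model, the timed call should also wait for the result to be available on the host (for example, by converting the output to a NumPy array). Otherwise, work that is still in flight on the GPU may make the measured time look shorter than it is.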

In [None]:
all_preds = predict_and_benchmark_throughput_from_saved(batched_input, infer, N_warmup_run=50, N_run=150)

**Make note of the *Throughput* value for this naive TensorFlow 2 inference.**

### Observe Accuracy

The helper function `display_prediction_info` will display the images along with their top predictions from a single run. You can **Right Click** on the image output and then choose **Enable Scrolling** to prevent the many displayed images from taking up the whole screen.

**NOTE:** We are not so concerned in this workshop about the accuracy of our predictions per se, only that they remain consistent as we optimize our models.
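One simple way to think about "consistent" predictions is to compare the top-ranked class for each image between two sets of outputs. The sketch below is only an illustration: `top_k_indices` and `top1_agreement` are hypothetical helpers, not part of `lab_helpers.py`, and the scores are made-up toy values rather than real model output.

```python
def top_k_indices(scores, k=2):
    """Return the indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def top1_agreement(preds_a, preds_b):
    """Fraction of images whose top-1 class index matches between two runs."""
    if not preds_a or len(preds_a) != len(preds_b):
        raise ValueError("prediction sets must be non-empty and the same length")
    matches = sum(
        top_k_indices(a, 1) == top_k_indices(b, 1)
        for a, b in zip(preds_a, preds_b)
    )
    return matches / len(preds_a)


# Toy per-class scores for two images (three classes each).
baseline = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
candidate = [[0.15, 0.65, 0.2], [0.2, 0.5, 0.3]]

print(top_k_indices(baseline[0], k=2))      # [1, 2]
print(top1_agreement(baseline, candidate))  # 0.5
```

An agreement of 1.0 would mean every image kept the same top prediction; a lower value indicates that an optimization step changed some of the results.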

In [None]:
# Index 0 selects the first benchmarked run (after warmup).
first_run_preds = all_preds[0]
display_prediction_info(first_run_preds, images, top=2)

## Restart Kernel

Before going to the next notebook, please execute the cell below to restart the kernel and clear GPU memory.

In [None]:
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

## Next

In the next notebook you will learn how TF-TRT optimizes saved models.