# CNN inference with eGPU - part 2

Today we'll run neural network inference on an NVIDIA Jetson Nano eGPU.

You can find out more here: https://www.nvidia.com/pl-pl/autonomous-machines/embedded-systems/jetson-nano-developer-kit/

First, install and import the necessary libraries.

In [1]:
!pip install tensorflow-datasets
import tensorflow as tf



Before we start, check whether a GPU is available. In this part of the laboratory we don't need a GPU. If you use Colab, change the runtime type! If you use a local PC, look up how to disable the GPU for TensorFlow.

In [2]:
print(tf.config.list_physical_devices('GPU'))

[]
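For a local PC, one common way to hide the GPU is to set the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is imported. The sketch below assumes a CUDA-based setup and a fresh Python process (restart the kernel first); the helper name `hide_gpus_from_tensorflow` is made up for illustration.

```python
import os

def hide_gpus_from_tensorflow():
    """Hide all CUDA devices from libraries that are initialised after this call."""
    # "-1" is not a valid device index, so no CUDA device is exposed.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

hide_gpus_from_tensorflow()

# Import TensorFlow only after the variable is set, e.g.:
#   import tensorflow as tf
#   print(tf.config.list_physical_devices('GPU'))
# An alternative after import is tf.config.set_visible_devices([], 'GPU'),
# which must be called before TensorFlow initialises its devices.
```

If TensorFlow was already imported in the session, the variable has no effect until the kernel is restarted.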


Today, we are going to try a slightly bigger model and a harder dataset. We'll run ImageNet classification with a ResNet network.

More about the dataset: https://www.image-net.org/

More about the network: https://keras.io/api/applications/resnet/

It would take some time to train this model, so we'll just use a ready, pretrained neural network. Use the `tf.keras.applications.ResNet50()` function to create a `CNN` instance (just study the link above). Use `include_top=True`, `weights="imagenet"` and `classes=1000`. Based on the documentation, answer the question: what is the model's input size and what is its output size?



In [3]:
model = tf.keras.applications.ResNet50(include_top=True, weights="imagenet", classes=1000)

Now, let's download the dataset with the `tfds` module. Use the `tfds.load()` function with the `imagenet_v2` dataset name (ImageNetV2 is an independent test set for ImageNet classifiers), `split='test[70%:]'` (we need just 3000 samples), and the `shuffle_files=True` and `as_supervised=True` parameters.

In [4]:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_v2', split='test[70%:]', shuffle_files=True, as_supervised=True)

Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to C:\Users\krzys\tensorflow_datasets\imagenet_v2\matched-frequency\3.0.0...


Dl Completed...: 0 url [00:00, ? url/s]

Dl Size...: 0 MiB [00:00, ? MiB/s]

Extraction completed...: 0 file [00:00, ? file/s]

DownloadError: Failed to get url https://s3-us-west-2.amazonaws.com/imagenetv2public/imagenetv2-matched-frequency.tar.gz. HTTP code: 403.

Perfect. Now we have both a pretrained model and a test dataset ready, so we can benchmark the model. Implement the benchmark as in the previous lab, but:
- calculate not only throughput and TOP1 accuracy, but also TOP5 accuracy (is the correct label among the 5 classes with the highest predicted probability?). The expression `(-preds[0]).argsort()[:5]` may prove useful here.
- you can loop through the dataset with `for image, label in tfds.as_numpy(ds):`
- each image should be resized to the model input size and then reshaped to `(1, input_size, input_size, nr_of_channels)` before the `predict` function. Just before the model's input, use `preprocess_input()`
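To see what the `argsort` hint does, here is a minimal sketch on a toy prediction vector with 6 classes instead of 1000. The helper name `topk_hits` and the toy scores are made up for illustration; the only assumption about the model is that `predict` returns an array of shape `(1, num_classes)` and that labels are integer class indices.

```python
import numpy as np

def topk_hits(preds, label, k=5):
    """Return (top1_hit, topk_hit) for a single sample.

    preds: array of shape (1, num_classes), as returned by model.predict
    label: integer class index of the correct class
    """
    # Negating the scores makes argsort return indices in descending score order.
    topk = (-preds[0]).argsort()[:k]
    return bool(label == topk[0]), bool(label in topk)

# Toy scores: class 1 is the best guess, followed by classes 3 and 4.
toy_preds = np.array([[0.05, 0.40, 0.10, 0.25, 0.15, 0.05]])

print(topk_hits(toy_preds, label=1, k=3))  # (True, True)
print(topk_hits(toy_preds, label=4, k=3))  # (False, True)
print(topk_hits(toy_preds, label=0, k=3))  # (False, False)
```

Summing the two flags over all benchmarked samples and dividing by the number of samples gives TOP1 and TOP5 accuracy.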

In [None]:
from tensorflow.keras.applications.resnet50 import preprocess_input
import time
import numpy as np
import cv2
from tqdm import tqdm

input_size = 224
nr_of_channels = 3

N_warmup_run = 50
N_run = 500
N_total = N_warmup_run + N_run

# Initialize variables to keep track of results
correct_top1 = 0
correct_top5 = 0
start_time = None

# Use tqdm to add a progress bar; take() stops after the warm-up and timed runs
for i, (image, label) in enumerate(tqdm(tfds.as_numpy(ds.take(N_total)), total=N_total)):
    # Start the clock only once the warm-up runs are done
    if i == N_warmup_run:
        start_time = time.time()

    img = cv2.resize(image, (input_size, input_size))
    img = np.reshape(img, (1, input_size, input_size, nr_of_channels))
    img = preprocess_input(img.astype(np.float32))  # ResNet50-specific preprocessing
    preds = model.predict(img, verbose=0)
    top5_labels = (-preds[0]).argsort()[:5]  # class indices, best first

    # Score only the timed runs. This assumes the integer tfds labels follow
    # the same class-index order as the Keras model output.
    if i >= N_warmup_run:
        if label == top5_labels[0]:
            correct_top1 += 1
        if label in top5_labels:
            correct_top5 += 1

end_time = time.time()

# Calculate throughput (warm-up runs are excluded from the timing)
elapsed_time = end_time - start_time
throughput = N_run / elapsed_time

# Calculate TOP1 and TOP5 accuracy
top1_accuracy = correct_top1 / N_run
top5_accuracy = correct_top5 / N_run

print("Throughput: {:.2f} samples per second".format(throughput))
print("TOP1 Accuracy: {:.2f}".format(top1_accuracy))
print("TOP5 Accuracy: {:.2f}".format(top5_accuracy))
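The warm-up idea can also be isolated in a small helper, which makes it easy to reuse the same timing logic later for the GPU comparison. This is only a sketch: `measure_throughput` and `dummy_inference` are hypothetical names, and the dummy workload stands in for a call such as `model.predict(img, verbose=0)`.

```python
import time

def measure_throughput(run_once, n_warmup, n_run):
    """Return runs per second for run_once, excluding the first n_warmup calls."""
    for _ in range(n_warmup):
        run_once()  # warm-up: executed but not timed
    start = time.perf_counter()
    for _ in range(n_run):
        run_once()
    elapsed = time.perf_counter() - start
    return n_run / elapsed

# Stand-in workload (an assumption for illustration only).
def dummy_inference():
    return sum(i * i for i in range(10_000))

print("Throughput: {:.2f} runs per second".format(
    measure_throughput(dummy_inference, n_warmup=5, n_run=20)))
```

Keeping the first calls out of the measurement matters because they typically include one-off costs such as graph tracing and memory allocation.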


We are now familiar with the ResNet50 model and the ImageNet dataset.
Now we can carry on with the Jetson Nano. Show this part of the exercise to the teacher and ask for a Jetson Nano board.

**Prepare Jetson Nano**
- First, take a Jetson Nano board and connect it to a power source, internet, monitor, mouse and keyboard.
- Log in to the Jetson and finish the OS installation (the boards have not been used yet).
- Open a terminal and add CUDA to PATH: `export PATH=$PATH:/usr/local/cuda-10/bin`. Verify CUDA with `nvcc --version`.
- Connect USB camera to Jetson board.

Well, we have our Jetson Nano ready to go! We could load our ResNet50 model with Keras, convert and build it with TensorRT, and run inference (remember that `converter.build()` should always be run on the final device, so that the model is optimized for the particular hardware).

However, this eGPU board has only 2GB of RAM (shared between CPU and GPU), so this would take some time (it's possible, especially with more advanced Jetson boards).

Today, to save some time, we'll use the NVIDIA Jetson Inference tools for testing! We'll see how this little eGPU with limited memory compares with our CPU-based benchmark.

**Run camera DEMO**
- clone https://github.com/dusty-nv/jetson-inference repository on Jetson
- run Jetson-inference docker with `./docker/run.sh`
- inside the docker container, we should be able to test inference for supported models (ResNet50 is supported); change directory to `./build/aarch64/bin/`
- run `./imagenet-camera.py --help` and study the arguments. We want to perform inference with the ResNet50 model, with TOP5 predictions displayed for our USB camera (its source should be visible with `ls /dev/video*`)
- run `./imagenet-camera.py` with the correct parameters. Study the outputs. What precision does the converted TensorRT model use? Why? What is the FPS for this demo?
- try to direct camera at some objects that can be correctly classified with ImageNet-pretrained network

**Extension exercise**

Compare the Jetson Nano's performance with your GPU (or Colab's GPU). Enable the GPU in Colab (or locally), use TensorRT to convert the CNN to FP16, build the engine and benchmark the model. What is the difference in TOP1 and TOP5 accuracy after TensorRT optimization? What is the throughput for the local GPU?
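As a possible starting point for the conversion step, the sketch below uses TF-TRT's `TrtGraphConverterV2`. It rests on several assumptions: a GPU build of TensorFlow with TensorRT support, a TensorFlow version in which `precision_mode` can be passed directly to the converter (older releases expect it inside `conversion_params`), and a Keras model that has already been exported as a SavedModel. The function name `convert_to_trt_fp16` and both directory arguments are hypothetical.

```python
def convert_to_trt_fp16(saved_model_dir, output_dir, input_size=224):
    """Convert a SavedModel to FP16 with TF-TRT and build the engine on this device."""
    # Imported lazily: these imports need TensorFlow built with TensorRT support.
    import numpy as np
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        precision_mode=trt.TrtPrecisionMode.FP16,
    )
    converter.convert()

    # build() needs example inputs with the shape used at inference time.
    def input_fn():
        yield (np.zeros((1, input_size, input_size, 3), dtype=np.float32),)

    # Run build() on the device that will execute the model.
    converter.build(input_fn=input_fn)
    converter.save(output_dir)
```

The saved model can then be loaded with `tf.saved_model.load(output_dir)` and benchmarked with the same TOP1/TOP5 and throughput measurements as the CPU run.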