# Torch-ONNX-TensorRT Workflow example

This Jupyter notebook walks through the workflow used in the experiments of this work. The workflow is summarized in the following image:

![Workflow.](/outputs/img_readme/TensorRT_workflow.pdf)

Here we apply the workflow to a specific example: image classification on the ImageNet-1k dataset with the MobileNetV2 model.

## Base model

To obtain the base model, we use the pre-trained models on ImageNet-1k provided by PyTorch:

In [None]:
import torch

# Load MobileNetV2 with the default ImageNet-1k pre-trained weights via torch.hub
model = torch.hub.load('pytorch/vision:v0.15.2', 'mobilenet_v2', weights='MobileNet_V2_Weights.DEFAULT')

We define some constants that will be used by our model throughout the workflow:

In [None]:
import os

# CONSTANTS
BATCH_SIZE = 1
C = 3  # number of channels of the input image
H = 224  # height of the input image
W = 224  # width of the input image
NETWORK = 'mobilenet'  # MobileNetV2
current_directory = os.getcwd()
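
As a rough illustration of what these constants imply, the sketch below computes the number of bytes one input batch of shape `(BATCH_SIZE, C, H, W)` would occupy if it were stored at each of the three precisions built later (fp32, fp16, int8). The helper name `input_tensor_bytes` is hypothetical and not part of the repository; the byte widths are the standard sizes of those data types.

```python
from math import prod

# Standard storage width, in bytes, of each precision used later in the workflow.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}


def input_tensor_bytes(shape, precision):
    """Return the size in bytes of a dense tensor with the given shape and precision."""
    return prod(shape) * BYTES_PER_ELEMENT[precision]


shape = (1, 3, 224, 224)  # (BATCH_SIZE, C, H, W) as defined above
for precision in ("fp32", "fp16", "int8"):
    print(precision, input_tensor_bytes(shape, precision), "bytes")
```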

## ONNX: Model Conversion

To convert the model from `.pt` to `.onnx`, we use the code described in `onnx_transform.py` as follows:

In [None]:
%run onnx_transform.py --weights weights/best.pth --pretrained --network $NETWORK --input_shape $BATCH_SIZE $C $H $W
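
`onnx_transform.py` is not reproduced in this notebook. As a minimal sketch of what such a conversion typically involves, assuming the script relies on `torch.onnx.export`, the hypothetical helper below traces the model with a dummy input of shape `(BATCH_SIZE, C, H, W)`. The input name `images` and the opset version are assumptions; the output name `outputs` matches the name requested later through `set_desired(['outputs'])`.

```python
def export_to_onnx(model, onnx_path, input_shape, opset_version=13):
    """Export a PyTorch model to ONNX using a fixed-size dummy input.

    Hypothetical helper: the real onnx_transform.py may differ.
    """
    if len(input_shape) != 4:
        raise ValueError("input_shape must be (batch, channels, height, width)")

    # Imported here so the shape check above works even without PyTorch installed.
    import torch

    model.eval()  # export in inference mode (fixes dropout and batch-norm behavior)
    dummy_input = torch.randn(*input_shape)
    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        input_names=["images"],    # assumed input tensor name
        output_names=["outputs"],  # name used later by set_desired(['outputs'])
        opset_version=opset_version,
    )
    return onnx_path
```

With the constants defined earlier, a call would look like `export_to_onnx(model, 'weights/best.onnx', (BATCH_SIZE, C, H, W))`.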

This has created the ONNX model. What remains is to create the TensorRT application and optimize the model.

## TensorRT Application

To create the TensorRT application and optimize the ONNX model, we use the code in `build_trt.py`. It calls `utils/engine.py`, which defines the TensorRT application; further changes can be made there to experiment with the different types of optimization.
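
Neither `build_trt.py` nor `utils/engine.py` is shown here. The following is a minimal sketch of how an ONNX file is commonly turned into a serialized engine, assuming the TensorRT 8.x Python API; the function name `build_engine` and its arguments are hypothetical. INT8 is left out because it additionally requires a calibrator or a quantized ONNX model.

```python
def build_engine(onnx_path, engine_path, precision="fp32"):
    """Parse an ONNX model and write a serialized TensorRT engine to disk.

    Hypothetical helper, assuming the TensorRT 8.x Python API.
    """
    if precision not in ("fp32", "fp16"):
        raise ValueError("this sketch only supports 'fp32' and 'fp16'")

    # Imported here so the argument check above works without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # ONNX models require an explicit-batch network definition in TensorRT 8.x.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)

    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)  # allow half-precision kernels

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
    return engine_path
```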

### TRT fp32

In [None]:
%run ./build_trt.py --weights weights/best.onnx  --fp32 --input_shape $BATCH_SIZE $C $H $W --engine_name best_fp32.engine

### TRT fp16

In [None]:
%run build_trt.py --weights weights/best.onnx  --fp16 --input_shape $BATCH_SIZE $C $H $W --engine_name best_fp16.engine

### TRT int8

In [None]:
import shutil

# Delete outputs/cache (if it exists) before the INT8 build; rmtree returns only once the directory is gone
shutil.rmtree('outputs/cache', ignore_errors=True)
%run build_trt.py --weights weights/best.onnx  --int8 --input_shape $BATCH_SIZE $C $H $W --engine_name best_int8.engine

For more specific optimizations, see the wiki... (a wiki will be available soon)

The optimized models are saved in the `weights` folder.

## Running Phase: Inference using optimized model

First we make sure the system has a CUDA-capable GPU, then we load the optimized models and deserialize them:

In [None]:
import os
from utils.engine import TRTModule

gpu_available = torch.cuda.is_available()
if not gpu_available:
    print('CUDA is not available.')
else:
    print('CUDA is available.')

device = torch.device("cuda:0" if gpu_available else "cpu")

engine_path_1 = os.path.join(current_directory,"weights/best_fp32.engine")
engine_path_2 = os.path.join(current_directory,"weights/best_fp16.engine")
engine_path_3 = os.path.join(current_directory,"weights/best_int8.engine")
Engine_fp32 = TRTModule(engine_path_1,device)
Engine_fp16 = TRTModule(engine_path_2,device)
Engine_int8 = TRTModule(engine_path_3,device)
Engine_fp32.set_desired(['outputs'])
Engine_fp16.set_desired(['outputs'])
Engine_int8.set_desired(['outputs'])

We load the ImageNet-1k validation dataset, available at (download link to be added):

In [None]:
from utils.data_loader import val_data_loader
val_loader = val_data_loader(os.path.join(current_directory,'datasets/dataset_val/val'), batch_size=BATCH_SIZE, workers=4, pin_memory=False)
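
`val_data_loader` lives in `utils/data_loader.py`, which is not shown. A typical ImageNet validation loader, sketched here with torchvision, resizes to 256, center-crops to 224 and normalizes with the standard ImageNet mean and standard deviation. The function name `make_val_loader` is hypothetical, and the actual preprocessing in the repository may differ.

```python
import os


def make_val_loader(val_dir, batch_size=1, workers=4, pin_memory=False):
    """Build a DataLoader over an ImageNet-style folder (one subfolder per class).

    Hypothetical helper: the real val_data_loader may preprocess differently.
    """
    if not os.path.isdir(val_dir):
        raise FileNotFoundError(f"validation directory not found: {val_dir}")

    # Imported here so the path check above works without PyTorch installed.
    import torch
    from torchvision import datasets, transforms

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        # Standard ImageNet channel statistics.
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    dataset = datasets.ImageFolder(val_dir, transform=preprocess)
    # No shuffling: validation order does not affect accuracy.
    return torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=workers,
        pin_memory=pin_memory,
    )
```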

We validate the models using the `validate` function:

In [None]:
from utils.helper import AverageMeter, accuracy
import time
import torch.nn as nn

def validate(model_version, val_loader, model, criterion=nn.CrossEntropyLoss().to(device)):
    batch_time_all = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()
    model.to(device)
    model.eval()

    # Calculate 10% of total batches
    warmup_batches = int(0.1 * len(val_loader))
    
    # Initialize the maximum and minimum processing time after warm-up
    max_time_all = 0
    min_time_all = float('inf')

    num_batches_to_process = int(1 * len(val_loader))

    for i, (input, target) in enumerate(val_loader):
        if i >= num_batches_to_process:
            break

        target = target.to(device)
        start_all = time.time()  # start timing; the measurement includes the host-to-device transfer
        input = input.to(device)

        with torch.no_grad():
            output = model(input)
            output_cpu = output.cpu()  # copy the result back to the host so the timing also covers the device-to-host transfer
            all_time = (time.time() - start_all) * 1000  # elapsed time in milliseconds, taken once the result is back on the CPU
            loss = criterion(output, target)

        # measure accuracy and record loss
        prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
        losses.update(loss.item(), input.size(0))
        top1.update(prec1[0], input.size(0))
        top5.update(prec5[0], input.size(0))

        # measure elapsed time in milliseconds and ignore first 10% batches
        if i >= warmup_batches:
            batch_time_all.update(all_time)
            max_time_all = max(max_time_all, all_time)
            min_time_all = min(min_time_all, all_time)
       
    print("|  Model          | Latency avg (ms)| Latency max (ms) | accuracy (Prec@1) (%)|accuracy (Prec@5) (%)|")
    print("|-----------------|-----------------|------------------|----------------------|---------------------|")
    print("| {:<15} | {:<15.1f} | {:<15.1f} |  {:<20.2f} | {:<20.2f} | ".format(
        model_version,
        batch_time_all.avg, max_time_all,
        top1.avg, top5.avg))
    return
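
The latency bookkeeping in `validate` can be isolated as follows. This is a plain-Python sketch with a hypothetical function name, `summarize_latencies`, and made-up timings: it drops the first 10% of the measurements as warm-up and reports the average, maximum and minimum of the rest, weighting each batch equally as `validate` does.

```python
def summarize_latencies(latencies_ms, warmup_fraction=0.1):
    """Return (avg, max, min) latency after discarding the warm-up batches."""
    warmup = int(warmup_fraction * len(latencies_ms))
    measured = latencies_ms[warmup:]
    if not measured:
        raise ValueError("no measurements left after warm-up")
    return sum(measured) / len(measured), max(measured), min(measured)


# Ten made-up per-batch latencies; the first one is slow because of warm-up.
latencies = [50.0, 5.0, 6.0, 5.0, 4.0, 5.0, 6.0, 5.0, 4.0, 5.0]
avg_ms, max_ms, min_ms = summarize_latencies(latencies)
print(f"avg={avg_ms:.1f} ms, max={max_ms:.1f} ms, min={min_ms:.1f} ms")
```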

### Base Model Validation

In [None]:
validate('Base Model', val_loader, model)

### TRT fp32 Validation

In [None]:
validate('TRT fp32', val_loader, Engine_fp32)

### TRT fp16 Validation

In [None]:
validate('TRT fp16', val_loader, Engine_fp16)

### TRT int8 Validation

In [None]:
validate('TRT int8', val_loader, Engine_int8)
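
The Prec@1 and Prec@5 columns printed by `validate` are top-k accuracies: a prediction counts as correct when the true label is among the k highest-scoring classes. The `accuracy` helper from `utils.helper` is not shown, so the following is a plain-Python sketch of the metric with a hypothetical function name, not the repository's implementation.

```python
def topk_accuracy(scores, targets, topk=(1, 5)):
    """Return the top-k accuracy (in percent) for each k in topk.

    scores: list of per-class score lists, one per sample.
    targets: list of ground-truth class indices.
    """
    results = []
    for k in topk:
        correct = 0
        for sample_scores, target in zip(scores, targets):
            # Indices of the k highest-scoring classes for this sample.
            ranked = sorted(
                range(len(sample_scores)),
                key=lambda c: sample_scores[c],
                reverse=True,
            )[:k]
            correct += target in ranked
        results.append(100.0 * correct / len(targets))
    return results


scores = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0, second highest: class 1
]
targets = [1, 1]
top1, top2 = topk_accuracy(scores, targets, topk=(1, 2))
print(f"Prec@1={top1:.1f}%  Prec@2={top2:.1f}%")
```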