## Preparing the model for deployment

This is carried out after the development and validation phases.

In [None]:
# Necessary imports

import torch
from torchinfo import summary
from qai_hub_models.models.ffnet_40s import Model

In [2]:
# Summarize per-layer parameter counts and multiply-adds
model = Model.from_pretrained()
input_shape = (1, 3, 1024, 2048)
stats = summary(model,
  input_size=input_shape,
  col_names=["num_params", "mult_adds"]
)
print(stats)

Downloading data at https://github.com/quic/aimet-model-zoo/releases/download/torch_segmentation_ffnet/ffnet40S_dBBB_cityscapes_state_dict_quarts.pth to /root/.qaihm/models/ffnet/v1/ffnet40S/ffnet40S_dBBB_cityscapes_state_dict_quarts.pth... 

100%|██████████| 55.8M/55.8M [00:01<00:00, 50.1MB/s]


Done
cityscapes_segmentation requires repository https://github.com/Qualcomm-AI-research/FFNet.git . Ok to clone? [Y/n] y
Cloning https://github.com/Qualcomm-AI-research/FFNet.git to /root/.qaihm/models/cityscapes_segmentation/v2/Qualcomm-AI-research_FFNet_git...
Done




Loading pretrained model state dict from /root/.qaihm/models/ffnet/v1/ffnet40S/ffnet40S_dBBB_cityscapes_state_dict_quarts.pth
Initializing ffnnet40S_dBBB_mobile weights
Layer (type:depth-idx)                                       Param #                   Mult-Adds
FFNet40S                                                     --                        --
├─FFNet: 1-1                                                 --                        --
│    └─ResNetS: 2-1                                          --                        --
│    │    └─Conv2d: 3-1                                      864                       452,984,832
│    │    └─BatchNorm2d: 3-2                                 64                        64
│    │    └─ReLU: 3-3                                        --                        --
│    │    └─Conv2d: 3-4                                      18,432                    2,415,919,104
│    │    └─BatchNorm2d: 3-5                                 128                    

1. Capture the model graph

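  **PyTorch JIT** tracing runs the model once on an example input and records the executed operations as a static graph. This removes Python overhead at inference time and allows consecutive operations (for example, a ReLU and the layer before it) to be fused, producing a graph that can be computed efficiently.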

In [3]:
model.to('cpu')
example_inputs = torch.rand(input_shape)
traced_model = torch.jit.trace(model, example_inputs)
traced_model

FFNet40S(
  original_name=FFNet40S
  (model): FFNet(
    original_name=FFNet
    (backbone_model): ResNetS(
      original_name=ResNetS
      (conv0): Conv2d(original_name=Conv2d)
      (bn0): BatchNorm2d(original_name=BatchNorm2d)
      (relu0): ReLU(original_name=ReLU)
      (conv1): Conv2d(original_name=Conv2d)
      (bn1): BatchNorm2d(original_name=BatchNorm2d)
      (relu1): ReLU(original_name=ReLU)
      (layer1): Sequential(
        original_name=Sequential
        (0): BasicBlock(
          original_name=BasicBlock
          (conv1): Conv2d(original_name=Conv2d)
          (bn1): BatchNorm2d(original_name=BatchNorm2d)
          (conv2): Conv2d(original_name=Conv2d)
          (bn2): BatchNorm2d(original_name=BatchNorm2d)
          (relu): ReLU(original_name=ReLU)
          (downsample): Sequential(
            original_name=Sequential
            (0): Conv2d(original_name=Conv2d)
            (1): BatchNorm2d(original_name=BatchNorm2d)
          )
        )
        (1): BasicBlock
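
Before compiling, it can help to confirm that a traced graph reproduces the eager model's outputs. The sketch below is only an illustration: it uses a small hypothetical stand-in module (`TinyNet`) rather than FFNet40S so that it runs quickly without downloading weights, and the tolerance `atol=1e-5` is an assumed value, not one taken from this notebook.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical conv -> batch norm -> ReLU block standing in for FFNet40S."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


torch.manual_seed(0)
toy_model = TinyNet().eval()            # eval mode freezes batch-norm statistics
toy_inputs = torch.rand(1, 3, 32, 64)   # small input to keep the check fast

with torch.no_grad():
    toy_traced = torch.jit.trace(toy_model, toy_inputs)
    eager_out = toy_model(toy_inputs)
    traced_out = toy_traced(toy_inputs)

# The traced graph should reproduce the eager outputs up to float tolerance.
max_abs_diff = (eager_out - traced_out).abs().max().item()
outputs_match = torch.allclose(eager_out, traced_out, atol=1e-5)
print(f"max abs diff: {max_abs_diff:.2e}, match: {outputs_match}")

# The recorded operations can be inspected through the graph attribute.
print(toy_traced.graph)
```

The same pattern should carry over to `model` and `traced_model` from the cell above by swapping in those objects and `example_inputs`, though the full-resolution input makes the check slower.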

The follow-up steps are:

2. Compile `traced_model` for the target device and its runtime environment (e.g. **ONNX Runtime** for Windows, **TFLite** for Android, **Qualcomm AI Engine** for embedded devices).
3. Profile on-device performance (latency/FPS) on the target compute unit: the **CPU** for general-purpose computation, the **GPU** for parallel computation, or the **NPU**, which runs neural networks efficiently but is less flexible than the CPU and GPU.
4. Perform on-device validation. This involves running inference on test data on the target device and comparing the predictions with those from the development environment. The agreement between on-device and development-PC predictions is usually measured with the **Peak Signal-to-Noise Ratio (PSNR)**; a higher PSNR means a smaller difference.
5. Download the compiled model and deploy it on the target device.
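
The comparison in step 4 can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the exact formula used by any particular tool: it takes the peak signal to be the largest absolute value in the development-PC output, and the helper name `psnr` and the arrays `pc_logits` and `device_logits` are hypothetical, with synthetic noise standing in for real on-device results.

```python
import numpy as np


def psnr(reference, candidate, eps=1e-10):
    """PSNR in dB of `candidate` measured against `reference`.

    Assumption: the peak signal is the largest absolute reference value.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError("outputs must have the same shape")
    mse = np.mean((reference - candidate) ** 2)
    if mse == 0:
        return float("inf")  # identical outputs
    peak = max(float(np.max(np.abs(reference))), eps)
    return float(10.0 * np.log10(peak ** 2 / mse))


# Synthetic stand-ins: development-PC logits and a slightly perturbed copy
# playing the role of the on-device output.
rng = np.random.default_rng(0)
pc_logits = rng.normal(size=(1, 19, 8, 16))
device_logits = pc_logits + rng.normal(scale=1e-3, size=pc_logits.shape)

print(f"PSNR: {psnr(pc_logits, device_logits):.1f} dB")
```

A larger deviation between the two outputs lowers the PSNR, so a drop after compilation or quantization is a signal to inspect the on-device model more closely.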