# Fast-Depth Estimation - Quantization for IMX500

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/pytorch/pytorch_fastdepth_for_imx500.ipynb)

## Overview

In this tutorial, we illustrate a quick, basic process for preparing a pre-trained model for deployment using MCT. Specifically, we demonstrate how to download a pre-trained PyTorch Fast-Depth model, compress it, and make it deployment-ready using MCT's post-training quantization techniques.

We will use an existing pre-trained model based on [Fast-Depth](https://github.com/dwofk/fast-depth). We will quantize the model using MCT's post-training quantization and visualize some samples from the floating-point model and the quantized model.
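To give a feel for what quantization does to a tensor, here is a minimal sketch of symmetric uniform 8-bit quantization in plain Python. It is only an illustration under simple assumptions: the function name `fake_quantize` and the max-absolute-value scale are hypothetical choices made for this example, and it is not a description of how MCT selects its quantization parameters.

```python
from typing import List


def fake_quantize(values: List[float], n_bits: int = 8) -> List[float]:
    """Round values to signed n-bit integer levels, then map them back to floats."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        # Nothing to scale: every value is already exactly representable
        return [0.0 for _ in values]
    q_max = 2 ** (n_bits - 1) - 1   # 127 for 8 bits
    scale = max_abs / q_max         # float distance between adjacent integer levels
    quantized = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return [q * scale for q in quantized]


# Hypothetical weights, used only for illustration
weights = [0.5, -1.27, 0.003, 1.0]
approx = fake_quantize(weights)
max_error = max(abs(w - a) for w, a in zip(weights, approx))
print(approx, max_error)
```

With this scheme the rounding error of each value is at most half a quantization step, which is why small values (such as `0.003` here) can collapse to zero when the range is dominated by a large value.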


## Setup
### Install the relevant packages

In [1]:
!pip install -q torch torchvision
!pip install onnx
!pip install datasets
!pip install matplotlib
!pip install 'huggingface-hub>=0.21.0'



Install MCT (if it’s not already installed). To use the utility functions this tutorial needs, we also copy the [MCT tutorials folder](https://github.com/sony/model_optimization/tree/main/tutorials) and add it to the system path.

In [2]:
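import importlib.util
import os
import sys

if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit
# Clone only when the tutorials folder is missing, so re-running this cell does not fail on `mv`
if not os.path.isdir('tutorials'):
    !git clone https://github.com/sony/model_optimization.git temp_mct && mv temp_mct/tutorials . && \rm -rf temp_mct
sys.path.insert(0,"tutorials")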

Cloning into 'temp_mct'...
remote: Enumerating objects: 25277, done.
remote: Counting objects: 100% (4416/4416), done.
remote: Compressing objects: 100% (965/965), done.
remote: Total 25277 (delta 3791), reused 3716 (delta 3451), pack-reused 20861 (from 1)
Receiving objects: 100% (25277/25277), 11.00 MiB | 13.81 MiB/s, done.
Resolving deltas: 100% (19598/19598), done.
Updating files: 100% (1247/1247), done.
mv: cannot move 'temp_mct/tutorials' to './tutorials': Directory not empty


## Download a Pre-Trained Model 

We begin by downloading a pre-trained Fast-Depth model. This implementation is based on [PyTorch Fast-Depth](https://github.com/dwofk/fast-depth).

In [3]:
from tutorials.mct_model_garden.models_pytorch.fastdepth.fastdepth import FastDepth
from model_compression_toolkit.core.pytorch.utils import get_working_device
model = FastDepth.from_pretrained("SSI-DNN/pytorch_fastdepth_224x224")
model.eval()

# Move to device
device = get_working_device()
model.to(device)

2024-10-10 17:08:49.112701: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-10 17:08:49.112765: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-10 17:08:49.407355: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-10 17:08:49.958557: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-10 17:09:11.420429: I external/local_xla/xla/

FastDepth(
  (conv0): Sequential(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU6(inplace=True)
  )
  (conv1): Sequential(
    (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=16, bias=False)
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU6(inplace=True)
    (3): Conv2d(16, 56, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (4): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU6(inplace=True)
  )
  (conv2): Sequential(
    (0): Conv2d(56, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=56, bias=False)
    (1): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU6(inplace=True)
    (3): Conv2d(56, 88, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (4): BatchNo

## Quantization

### Post-training quantization (PTQ) using Model Compression Toolkit (MCT)

Now we are all set to use MCT's post-training quantization. To begin, we build a representative dataset from lsun-bedrooms and then quantize the model. We calibrate the model on 80 representative images, split into 20 iterations of `BATCH_SIZE` (4) images each.

### Representative Dataset

In [4]:
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from datasets import load_dataset
from typing import Iterator, Tuple, List

BATCH_SIZE = 4
n_iters = 20

class ValDataset(Dataset):
    def __init__(self, dataset):
        super(ValDataset, self).__init__()
        self.dataset = dataset
        self.val_transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor()])

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        img = self.dataset[index]['image']
        tensor = self.val_transform(img)
        return tensor

dataset = load_dataset("pcuenq/lsun-bedrooms",split="test")
val_dataset = ValDataset(dataset)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4)

# Define representative dataset generator
def get_representative_dataset(n_iter: int, dataset_loader: DataLoader):
    """
    This function creates a representative dataset generator. Each iteration yields a list
        holding one torch tensor batch of shape: [Batch, C, H, W].
    Args:
        n_iter: number of iterations for MCT to calibrate on
        dataset_loader: data loader to draw the calibration batches from
    Returns:
        A representative dataset generator
    """
    def representative_dataset() -> Iterator[List]:
        ds_iter = iter(dataset_loader)
        for _ in range(n_iter):
            yield [next(ds_iter)]

    return representative_dataset

# Get representative dataset generator
representative_dataset_gen = get_representative_dataset(n_iter=n_iters, dataset_loader=val_loader)
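The closure pattern used above can be checked without downloading any data. The sketch below mirrors it with a plain Python list standing in for the `DataLoader`; the names `make_representative_dataset` and `fake_loader` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Iterable, Iterator, List


def make_representative_dataset(n_iter: int, batches: Iterable) -> Callable[[], Iterator[List]]:
    """Return a zero-argument callable that yields n_iter single-input lists."""
    def representative_dataset() -> Iterator[List]:
        batch_iter = iter(batches)       # a fresh iterator on every call
        for _ in range(n_iter):
            yield [next(batch_iter)]     # one list entry per model input
    return representative_dataset


# Stand-in for a DataLoader: 25 "batches" of 4 sample ids each
fake_loader = [[4 * b + i for i in range(4)] for b in range(25)]
gen = make_representative_dataset(n_iter=20, batches=fake_loader)

first_pass = list(gen())
second_pass = list(gen())
num_samples = sum(len(inputs[0]) for inputs in first_pass)
print(len(first_pass), num_samples)  # 20 iterations, 80 samples
```

Because the iterator is created inside the inner function, every call to the generator starts again from the first batch. Note that if the loader holds fewer than `n_iter` batches, the `next` call fails inside the generator, which Python 3.7+ reports as a `RuntimeError`, so the dataset should contain at least `n_iter * BATCH_SIZE` images.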


### Post-Training Quantization (PTQ)

In [5]:
import model_compression_toolkit as mct

# Set IMX500 TPC
tpc = mct.get_target_platform_capabilities(fw_name="pytorch",
                                           target_platform_name='imx500',
                                           target_platform_version='v3')

# Perform post training quantization
quant_model, _ = mct.ptq.pytorch_post_training_quantization(in_module=model,
                                                            representative_data_gen=representative_dataset_gen,
                                                            target_platform_capabilities=tpc)


print('Quantized model is ready!')

Traceback (most recent call last):
  File "/data/projects/swat/envs/eladco/conda_mct/lib/python3.11/site-packages/torch/fx/passes/shape_prop.py", line 153, in run_node
    result = super().run_node(n)
             ^^^^^^^^^^^^^^^^^^^
  File "/data/projects/swat/envs/eladco/conda_mct/lib/python3.11/site-packages/torch/fx/interpreter.py", line 203, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/projects/swat/envs/eladco/conda_mct/lib/python3.11/site-packages/torch/fx/interpreter.py", line 320, in call_module
    return submod(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/projects/swat/envs/eladco/conda_mct/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/projects/swat/envs/eladco/conda_mct/lib/python3.11/site-packages/torch/nn/modules/modu

RuntimeError: ShapeProp error for: node=%conv11_1 : [num_users=1] = call_module[target=conv11.1](args = (%conv11_0,), kwargs = {}) with meta={'nn_module_stack': OrderedDict([('conv11', ('conv11', <class 'torch.nn.modules.container.Sequential'>)), ('conv11.1', ('conv11.1', <class 'torch.nn.modules.batchnorm.BatchNorm2d'>))])}

While executing %conv11_1 : [num_users=1] = call_module[target=conv11.1](args = (%conv11_0,), kwargs = {})
Original traceback:
None

### Export

Now we can export the quantized model, ready for deployment on IMX500, to an `.onnx` file. Please ensure that `save_model_path` is set correctly.

In [None]:
mct.exporter.pytorch_export_model(model=quant_model,
                                  save_model_path='./model.onnx',
                                  repr_dataset=representative_dataset_gen)

## Visualize samples from lsun-bedrooms
Next, we visualize a sample RGB image alongside the depth maps predicted by the floating-point and quantized models.

In [6]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

cmap = plt.cm.viridis

def colored_depthmap(depth: np.ndarray, d_min: float = None, d_max: float = None) -> np.ndarray:
    """
    Create a colored depth map for visualization.
    Args:
        depth: depth image
        d_min: minimum depth
        d_max: maximum depth
    Returns:
        A depth map
    """  
    if d_min is None:
        d_min = np.min(depth)
    if d_max is None:
        d_max = np.max(depth)
    depth_relative = (depth - d_min) / (d_max - d_min)
    return 255 * cmap(depth_relative)[:,:,:3] # H, W, C

def merge_into_row(img: torch.Tensor, depth_float: torch.Tensor, depth_quant: torch.Tensor) -> np.ndarray:
    """
    Merge the RGB image and the two depth estimation results into one row for visualization.
    Args:
        img: RGB image
        depth_float: Depth image of floating-point model
        depth_quant: Depth image of quantized model
    Returns:
        A merged image
    """  
    rgb = 255 * np.transpose(np.squeeze(img.detach().cpu().numpy()), (1,2,0)) # H, W, C
    depth_float = np.squeeze(depth_float.detach().cpu().numpy())
    depth_quant = np.squeeze(depth_quant.detach().cpu().numpy())

    d_min = min(np.min(depth_float), np.min(depth_quant))
    d_max = max(np.max(depth_float), np.max(depth_quant))
    depth_float_col = colored_depthmap(depth_float, d_min, d_max)
    depth_quant_col = colored_depthmap(depth_quant, d_min, d_max)
    img_merge = np.hstack([rgb, depth_float_col, depth_quant_col])
    
    return img_merge


# Take a sample
SAMPLE_IDX = 0
img = val_dataset[SAMPLE_IDX]
img = img.unsqueeze(0).to(device) # adding batch size

# Run inference with the floating-point and quantized models (no gradients needed)
with torch.no_grad():
    depth_float = model(img)
    depth_quant = quant_model(img)

# Create and save image for visualization
merge_img = merge_into_row(img, depth_float, depth_quant)
merge_img = Image.fromarray(merge_img.astype('uint8'))
merge_img.save("depth.png")
print('Depth image is saved!')

OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB. GPU 0 has a total capacity of 10.74 GiB of which 2.44 MiB is free. Process 3262902 has 8.81 GiB memory in use. Including non-PyTorch memory, this process has 1.92 GiB memory in use. Of the allocated memory 171.16 MiB is allocated by PyTorch, and 22.84 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
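The normalization step inside `colored_depthmap` divides by `d_max - d_min`, which is zero for a perfectly flat depth map. The sketch below shows the same min-max normalization on nested Python lists with a small guard added. The helper name `normalize_depth` and the `eps` parameter are assumptions introduced for this example and are not part of the tutorial code.

```python
from typing import List, Optional


def normalize_depth(depth: List[List[float]],
                    d_min: Optional[float] = None,
                    d_max: Optional[float] = None,
                    eps: float = 1e-8) -> List[List[float]]:
    """Map depth values to [0, 1]; fall back to the data range when no bounds are given."""
    flat = [v for row in depth for v in row]
    lo = min(flat) if d_min is None else d_min
    hi = max(flat) if d_max is None else d_max
    span = max(hi - lo, eps)  # guard against division by zero on flat maps
    return [[min(1.0, max(0.0, (v - lo) / span)) for v in row] for row in depth]


depth_map = [[1.0, 2.0], [3.0, 5.0]]
relative = normalize_depth(depth_map)                        # uses the map's own range
flat_map = normalize_depth([[2.0, 2.0], [2.0, 2.0]])         # constant depth
shared = normalize_depth(depth_map, d_min=0.0, d_max=10.0)   # shared range across images
print(relative, flat_map, shared)
```

Passing the same `d_min` and `d_max` to both depth maps, as `merge_into_row` does, keeps the colors of the floating-point and quantized outputs directly comparable.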


Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.