## Steps for generating object_detect.dlc

For this demo, a YoloNAS model is used. You can read more about this model in the VisionSolution1-YoloNasSSD README.

#### Note: Use Python 3.8 or above to generate the ONNX file, and Python 3.6 to generate the DLC
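
Because the ONNX and DLC steps need different interpreters, a cell can warn early when it runs under the wrong one. The following is a minimal sketch; `meets_minimum` is a hypothetical helper and not part of any SDK.

```python
import sys

def meets_minimum(version, minimum):
    """Return True if `version` (a tuple like sys.version_info) is at least `minimum`."""
    return tuple(version[:len(minimum)]) >= tuple(minimum)

# ONNX export step: Python 3.8 or above, as stated in the note
if not meets_minimum(sys.version_info, (3, 8)):
    print("Warning: the ONNX export step expects Python 3.8 or above")
```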

In [None]:
## Note- Use python3.8 or above for generating onnx

!pip install super-gradients==3.1.2


## Downloading Model from git repo
import torch
# Load model with pretrained weights
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco")

# Prepare model for conversion
# Input size is [Batch x Channels x Height x Width]; this demo uses 320x320 instead of the usual 640x640
model.eval()
model.prep_model_for_conversion(input_size=[1, 3, 320, 320])

# Create dummy_input
dummy_input = torch.randn([1, 3, 320, 320], device="cpu")

# Convert model to onnx
torch.onnx.export(model, dummy_input, "yolo_nas_s.onnx", opset_version=11)

#### Switch to the Python 3.6 environment to use the SNPE SDK, then convert the ONNX file to DLC

In [None]:
%%bash
snpe-onnx-to-dlc -i yolo_nas_s.onnx -o app/src/main/assets/yolo_nas_s.dlc

## Quantizing YoloNAS

In [None]:
## Steps to preprocess images
import cv2
import numpy as np
import os

def preprocess(original_image):
    # Resize to the 320x320 model input and scale pixel values to [0, 1]
    resized_image = cv2.resize(original_image, (320, 320))
    resized_image = resized_image / 255
    return resized_image

##Please download Coco2014 dataset and give the path here
dataset_path = "/workspace/val2014/"

!mkdir -p rawYoloNAS

filenames=[]
for path in os.listdir(dataset_path)[:5]:
    # check if current path is a file
    if os.path.isfile(os.path.join(dataset_path, path)):
        filenames.append(os.path.join(dataset_path, path))

for filename in filenames:
    original_image = cv2.imread(filename)
    img = preprocess(original_image)
    img = img.astype(np.float32)
    img.tofile("rawYoloNAS/"+filename.split("/")[-1].split(".")[0]+".raw")
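
A quick sanity check is to confirm that each raw file has the size the model input implies: 320 x 320 x 3 float32 values is 1,228,800 bytes. The following is a minimal sketch, assuming the `rawYoloNAS` directory written by the loop above; the helper names are hypothetical.

```python
import os

def expected_raw_bytes(shape, bytes_per_value=4):
    """Size in bytes of a flattened tensor of `shape` (float32 uses 4 bytes per value)."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total

def find_bad_raw_files(raw_dir, shape):
    """Return the .raw files in `raw_dir` whose size does not match `shape`."""
    expected = expected_raw_bytes(shape)
    bad = []
    for name in sorted(os.listdir(raw_dir)):
        path = os.path.join(raw_dir, name)
        if name.endswith(".raw") and os.path.getsize(path) != expected:
            bad.append(path)
    return bad

# Only runs when the directory from the preprocessing step exists
if os.path.isdir("rawYoloNAS"):
    print(find_bad_raw_files("rawYoloNAS", (320, 320, 3)))
```

An empty list means every file matches the expected input size.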

In [None]:
%%bash
find rawYoloNAS -name "*.raw" > YoloInputlist.txt

In [None]:
%%bash
cat YoloInputlist.txt
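
The input list can also be written from Python, which avoids shell globbing and gives a stable order. The following is a minimal sketch using only the standard library; `write_input_list` is a hypothetical helper, and it assumes one raw file path per line, which is what the `find` command produces.

```python
from pathlib import Path

def write_input_list(raw_dir, list_path):
    """Write one .raw path per line to `list_path` and return the sorted paths."""
    paths = sorted(str(p) for p in Path(raw_dir).rglob("*.raw"))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths

# Only runs when the directory from the preprocessing step exists
if Path("rawYoloNAS").is_dir():
    print(write_input_list("rawYoloNAS", "YoloInputlist.txt"))
```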

In [None]:
%%bash
snpe-dlc-quantize --input_dlc app/src/main/assets/yolo_nas_s.dlc --input_list YoloInputlist.txt --use_enhanced_quantizer --use_adjusted_weights_quantizer --axis_quant --output_dlc app/src/main/assets/Quant_yoloNas_s_320.dlc


## How to change the object-detect model?

Object detection models depend heavily on model architecture, and pre-processing requirements vary widely from model to model.
To use a different model, e.g. YoloV5, follow these steps:

- Ensure that the Qualcomm® Neural Processing SDK supports the operations in the selected model.
- Study the pre-processing and post-processing requirements of the selected model.
- Most object detection models operate in RGB space, so input camera YUV buffers need to be converted to RGB based on the model's requirements.
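
To illustrate the color conversion in the last step, here is a per-pixel sketch in Python. It assumes full-range BT.601 (JPEG-style) coefficients with chroma centered at 128; the actual camera buffers may use a different range or matrix, and a real app would convert whole buffers natively rather than pixel by pixel. `yuv_to_rgb` is a hypothetical helper.

```python
def _clamp(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))

def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV pixel (0-255 each) to an (R, G, B) tuple."""
    cb = u - 128  # blue-difference chroma, centered at zero
    cr = v - 128  # red-difference chroma, centered at zero
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return (_clamp(r), _clamp(g), _clamp(b))

# Neutral chroma leaves a grey pixel unchanged
print(yuv_to_rgb(128, 128, 128))  # -> (128, 128, 128)
```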


# Info about HRNET

The HRNET model is a state-of-the-art model for human pose estimation. It is accurate when the frame contains a single person, but less accurate when it contains several people. To compensate, an object-detection model first identifies a single person in the frame, and that person's data is then passed to HRNET to estimate the pose. In this solution, we use MobileNetSSD to detect the person and then pass the preprocessed data to HRNET to achieve better pose-estimation accuracy.

The HRNET DLC takes a flattened 256x192x3 array as input and returns an output of dimensions 17x64x48: one 64x48 heatmap for each of 17 human joints.
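
A common way to turn such heatmaps into keypoints is to take the peak of each heatmap and scale it back to input coordinates. The sketch below does this in plain Python. It assumes the output arrives as a flat, joint-major 17x64x48 buffer as described above (if the runtime delivers a different layout, the indexing must change), and `decode_keypoints` is a hypothetical helper, not the app's actual post-processing.

```python
def decode_keypoints(heatmaps, num_joints=17, hm_h=64, hm_w=48, in_h=256, in_w=192):
    """Return one (x, y, score) per joint from a flat joint-major heatmap buffer."""
    keypoints = []
    plane = hm_h * hm_w
    for joint in range(num_joints):
        start = joint * plane
        values = heatmaps[start:start + plane]
        # Index of the strongest response in this joint's heatmap
        best = max(range(plane), key=values.__getitem__)
        row, col = divmod(best, hm_w)
        # Scale heatmap coordinates up to the model input resolution
        x = col * in_w / hm_w
        y = row * in_h / hm_h
        keypoints.append((x, y, values[best]))
    return keypoints

# Synthetic example: joint 0 peaks at heatmap row 10, column 5
demo = [0.0] * (17 * 64 * 48)
demo[10 * 48 + 5] = 0.9
print(decode_keypoints(demo)[0])  # -> (20.0, 40.0, 0.9)
```

The returned score can be compared against a threshold to skip joints that were not confidently detected.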

In [3]:
#Get PTH file and generate onnx file from it
!pwd
!git clone https://github.com/HRNet/HRNet-Human-Pose-Estimation.git
%cd HRNet-Human-Pose-Estimation/
!git checkout 00d7bf72f56382165e504b10ff0dddb82dca6fd2
%cd ..

/local/mnt/workspace/shubgoya/native_doc/test_generateDLC_native
Cloning into 'HRNet-Human-Pose-Estimation'...
remote: Enumerating objects: 140, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 140 (delta 0), reused 1 (delta 0), pack-reused 137
Receiving objects: 100% (140/140), 1.69 MiB | 8.05 MiB/s, done.
Resolving deltas: 100% (64/64), done.
/local/mnt/workspace/shubgoya/native_doc/test_generateDLC_native/HRNet-Human-Pose-Estimation
Note: checking out '00d7bf72f56382165e504b10ff0dddb82dca6fd2'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 00d7

In [8]:
%%bash
cd HRNet-Human-Pose-Estimation/lib
make
cd ../..

cd nms; python setup_linux.py build_ext --inplace; rm -rf build; cd ../../
running build_ext
skipping 'cpu_nms.c' Cython extension (up-to-date)
skipping 'gpu_nms.cpp' Cython extension (up-to-date)


In [9]:
import os
import os.path as osp
import sys

import numpy as np
%matplotlib inline
from matplotlib import pyplot as plt

# Make the HRNet library importable for the config and dataset imports below
lib_path = osp.join(os.getcwd(), 'HRNet-Human-Pose-Estimation/lib')
sys.path.insert(0, lib_path)

import torch
import torch.utils.data
import torchvision.transforms as transforms
from config import cfg
from dataset.coco import COCODataset

In [11]:
!mkdir -p model_binaries

In [12]:
# %cd model_binaries
# !mv HRNet-Human-Pose-Estimation/lib/* . 
# !cp -R -T -f ./HRNet-Human-Pose-Estimation/lib/ . 


import os
import torch
import urllib.request

print(torch.__version__)  # show the installed PyTorch version

## Getting the .pth file
OPTIMIZED_CHECKPOINT_URL = (
    "https://github.com/quic/aimet-model-zoo/releases/download/hrnet-posenet/"
)

if not os.path.exists("./model_binaries/hrnet_posenet_FP32.pth"):
    # The base URL already ends with "/", so no extra separator is added
    urllib.request.urlretrieve(
        f"{OPTIMIZED_CHECKPOINT_URL}hrnet_posenet_FP32.pth",
        "model_binaries/hrnet_posenet_FP32.pth",
    )


# Input shape is [Batch x Channels x Height x Width]
input_shape = (1, 3, 256, 192)
dummy_input = torch.randn(input_shape)
model = torch.load("model_binaries/hrnet_posenet_FP32.pth")
model.to('cpu')
model.eval()  # export in inference mode

onnx_model_name = "model_binaries/AIMET_HRNET_posnet.onnx"

opset = 11

torch.onnx.export(
    model.cpu(),
    dummy_input,
    onnx_model_name,
    verbose=True,
    do_constant_folding=True,
    export_params=True,
    input_names=['input'],
    output_names=['output'],
    opset_version=opset
)


graph(%input : Float(1, 3, 256, 192, strides=[147456, 49152, 192, 1], requires_grad=0, device=cpu),
      %final_layer.weight : Float(17, 32, 1, 1, strides=[32, 1, 1, 1], requires_grad=1, device=cpu),
      %final_layer.bias : Float(17, strides=[1], requires_grad=1, device=cpu),
      %2903 : Float(64, 3, 3, 3, strides=[27, 9, 3, 1], requires_grad=0, device=cpu),
      %2904 : Float(64, strides=[1], requires_grad=0, device=cpu),
      %2906 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu),
      %2907 : Float(64, strides=[1], requires_grad=0, device=cpu),
      %2909 : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu),
      %2910 : Float(64, strides=[1], requires_grad=0, device=cpu),
      %2912 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu),
      %2913 : Float(64, strides=[1], requires_grad=0, device=cpu),
      %2915 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu),
      %2916 : F

In [13]:
!rm -r -f HRNet-Human-Pose-Estimation

## Steps for generating HRNET dlc for int8

In [23]:
%%bash
snpe-onnx-to-dlc -i model_binaries/AIMET_HRNET_posnet.onnx -o app/src/main/assets/hrnet.dlc

2023-05-12 12:26:44,332 - 214 - INFO - Successfully simplified the onnx model in child process
2023-05-12 12:26:44,859 - 214 - INFO - Successfully receive the simplified onnx model in main process
2023-05-12 12:26:47,290 - 214 - INFO - INFO_INITIALIZATION_SUCCESS: 
2023-05-12 12:26:47,662 - 214 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2023-05-12 12:26:47,792 - 214 - INFO - INFO_WRITE_SUCCESS: 


## Steps for Quantization

In [None]:
from PIL import Image
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
)

preproc = transforms.Compose(
        [
            transforms.ToTensor(),
            normalize,
        ]
    )
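
For reference, `ToTensor` scales 8-bit values to [0, 1] and `Normalize` then applies (x - mean) / std per channel. The following is a plain-Python sketch of the same arithmetic for a single RGB pixel, using the mean and std values above; `normalize_pixel` is a hypothetical helper.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Apply (value / 255 - mean) / std to each channel of an 8-bit RGB pixel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))

# A white pixel maps to the largest value each channel can take
print(normalize_pixel((255, 255, 255)))
```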

In [None]:
##Add your dataset path
dataset_path = "/local/mnt/workspace/shubgoya/val2014/"

##Make empty directory
!mkdir -p rawHRNET

import cv2
import os

filenames = []
for path in os.listdir(dataset_path)[:5]:
    # check if current path is a file
    if os.path.isfile(os.path.join(dataset_path, path)):
        filenames.append(os.path.join(dataset_path, path))
print(filenames)

for filename in filenames:
    orig_img = cv2.imread(filename)
    img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB)
    # cv2.resize takes (width, height); the model input is 256 high by 192 wide
    img = cv2.resize(img, (192, 256),
                     interpolation=cv2.INTER_LINEAR)
    model_input = preproc(img).unsqueeze(0)

    model_input = model_input.cpu().detach().numpy()
    # NCHW -> NHWC, the layout written to the raw file
    model_input = model_input.transpose(0, 2, 3, 1)
    raw_name = "rawHRNET/" + filename.split("/")[-1].split(".")[0] + ".raw"
    with open(raw_name, 'wb') as fid:
        model_input.tofile(fid)

In [27]:
%%bash

## Creating the input list from the raw files
find rawHRNET -name "*.raw" > HRNET_input_list.txt

## Quantization
snpe-dlc-quant --input_dlc app/src/main/assets/hrnet.dlc --input_list HRNET_input_list.txt --axis_quant --output_dlc app/src/main/assets/hrnet_axis_int8.dlc

[INFO] InitializeStderr: DebugLog initialized.
[INFO] Attempting to open dynimcally linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0x1695650
[INFO] Found Interface Provider (v2.3)
[USER_INFO] QnnDsp <I> Qnn log initialized
[USER_INFO] QnnDsp <I> addClient done (1). status 0x0
[USER_INFO] QnnDsp <I> addClient started.
[USER_INFO] QnnDsp <I> OpValidatorTest::registerOpValidator qti.aisw

[USER_INFO] QnnDsp <I> addClient done (1). status = 0x0
[USER_INFO] QnnDsp <I> QnnDevice_create started. device = 0x1
[USER_INFO] QnnDsp <I> QnnDevice_create done. status 0x0
[USER_INFO] QnnDsp <I> QnnContext_create started. context 0x1
[USER_INFO] QnnDsp <I> QnnContext_create done successfully
[USER_INFO] QnnDsp <I> QnnGraph_create started. context 0x1, graph name: QnnGraph
[USER_INFO] QnnDsp <I> QnnGraph_create done. graph QnnGraph, id 0x1
[USER_INFO] QnnDsp <I> QnnTensor_createGraphTensor graph 1 
[USER_INFO] QnnDsp <I> create Tensor Id 2816571830 for Tensor name Conv_0Conv