## Reducing precision to cut PyTorch inference time - TensorRT / torch2trt library

### Environment setup

- Sign up on the [NVIDIA developer portal](https://developer.nvidia.com/nvidia-tensorrt-7x-download)
- Download the "TensorRT 7.0.0.11 for Ubuntu 1804 and CUDA 10.0 DEB local repo package" and upload it to Colab

### How to install TensorRT 7.0 on Colab

In [None]:
!nvcc --version

### Remove Colab's CUDA 10.1

In [None]:
# remove CUDA 10.1

!sudo apt-get --purge remove cuda nvidia* libnvidia-*
!sudo dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 dpkg --purge
!sudo apt-get remove cuda-*
!sudo apt autoremove
!sudo apt-get update

### Install CUDA 10.0 - three interactive inputs required (Y, 31, 1)

In [None]:
# Installing CUDA 10.0

!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!sudo apt-get update
!wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt install -y ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt-get update

# Install NVIDIA driver
!sudo apt-get -y install nvidia-driver-418

# Install development and runtime libraries (~4GB)
!sudo apt-get install -y \
    cuda-10-0 \
    libcudnn7=7.6.2.24-1+cuda10.0  \
    libcudnn7-dev=7.6.2.24-1+cuda10.0 --allow-change-held-packages

In [None]:
!nvcc --version
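
After the reinstall, `nvcc --version` should report release 10.0 rather than 10.1. The check can also be done programmatically. The following is a minimal sketch: `parse_cuda_release` is a hypothetical helper, and the sample string is assumed to match the usual last line of `nvcc` output.

```python
import re


def parse_cuda_release(nvcc_output):
    """Return the CUDA release string (e.g. "10.0") found in nvcc output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Assumed sample of the final line printed by `nvcc --version`.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
release = parse_cuda_release(sample)
print("CUDA release:", release)
```

In a notebook cell, the text to parse could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` instead of the sample string.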

### Install TensorRT - check the path of the .deb file on Colab

In [None]:
# install tensorrt
!sudo dpkg -i "/content/drive/My Drive/capstone1/CAN/nv-tensorrt-repo-ubuntu1804-cuda10.0-trt7.0.0.11-ga-20191216_1-1_amd64.deb"
!sudo apt-key add /var/nv-tensorrt-repo-cuda10.0-trt7.0.0.11-ga-20191216/7fa2af80.pub

!sudo apt-get update

!sudo apt-get install libnvinfer7=7.0.0-1+cuda10.0 libnvonnxparsers7=7.0.0-1+cuda10.0 libnvparsers7=7.0.0-1+cuda10.0 libnvinfer-plugin7=7.0.0-1+cuda10.0 libnvinfer-dev=7.0.0-1+cuda10.0 libnvonnxparsers-dev=7.0.0-1+cuda10.0 libnvparsers-dev=7.0.0-1+cuda10.0 libnvinfer-plugin-dev=7.0.0-1+cuda10.0 python-libnvinfer=7.0.0-1+cuda10.0 python3-libnvinfer=7.0.0-1+cuda10.0

!sudo apt-mark hold libnvinfer7 libnvonnxparsers7 libnvparsers7 libnvinfer-plugin7 libnvinfer-dev libnvonnxparsers-dev libnvparsers-dev libnvinfer-plugin-dev python-libnvinfer python3-libnvinfer

!sudo apt-get install tensorrt

### Verify the TensorRT installation

In [None]:
!dpkg -l | grep TensorRT

### Restart the runtime
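
Once the runtime is back up, it can help to confirm which Python packages are visible before going further. This is a small sketch using only the standard library; `check_modules` is a hypothetical helper, and `torch2trt` is expected to show as missing until it is installed in the next step.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(["tensorrt", "torch2trt"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```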

### Install the [torch2trt](https://github.com/NVIDIA-AI-IOT/torch2trt) library

In [None]:
cd /content/drive/My\ Drive/capstone1/CAN

In [None]:
!pip3 install pycuda
!git clone https://github.com/NVIDIA-AI-IOT/torch2trt

In [None]:
cd /content/drive/My\ Drive/capstone1/CAN/torch2trt

In [None]:
!python setup.py install

In [None]:
import tensorrt
import torch2trt
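
With both imports working, a model can be converted at several precisions. The source does not show how `utils.run_benchmark` builds its variants, so the following is only a guess at the shape of that step. `build_trt_variants` is a hypothetical helper, and the keyword arguments `fp16_mode`, `int8_mode`, and `strict_type_constraints` are recalled from the torch2trt converter and should be checked against the installed version.

```python
def build_trt_variants(model, example_input):
    """Convert one eval-mode CUDA model into TensorRT variants at three precisions."""
    # Imported lazily so this cell can be defined before torch2trt is installed.
    from torch2trt import torch2trt

    inputs = [example_input]  # torch2trt takes a list of example input tensors
    return {
        "trt": torch2trt(model, inputs),
        "trt float 16": torch2trt(model, inputs, fp16_mode=True),
        # Assumption: the "strict" variant maps to strict_type_constraints=True.
        "trt int8 strict": torch2trt(
            model, inputs, int8_mode=True, strict_type_constraints=True
        ),
    }
```

A call would look like `build_trt_variants(model.eval().cuda(), x.cuda())`, where `x` is an example tensor with the model's input shape.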

### Compare inference times

In [None]:
cd /content/drive/My\ Drive/capstone1/CAN

In [None]:
import utils
import numpy as np
import importlib
importlib.reload(utils)

utils.run_benchmark('./weights/fed_avg_50_0.9688.pth')
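
The body of `utils.run_benchmark` is not included here. As a rough idea of how a per-call latency can be measured, here is a minimal sketch, not the author's implementation. `time_inference` and its parameters are hypothetical; the optional `sync` callback is where something like `torch.cuda.synchronize` would go so that queued GPU work is counted.

```python
import time


def time_inference(fn, warmup=3, repeats=10, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for asynchronous work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / repeats


# Stand-in workload; a real run would call the model on a fixed input.
elapsed_ms = time_inference(lambda: sum(range(10_000)))
print(f"{elapsed_ms:.3f} ms per call")
```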

- torch: 464 ms / 96.87 % / 0.05931
- trt: 354 ms / 96.87 % / 0.05931
- trt float 16: 323 ms / 96.87 % / 0.05931
- trt int8 strict: 401 ms / 96.86 % / 0.06405
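
To make the comparison easier to read, the latencies can be turned into speedups relative to the plain PyTorch run. This is simple arithmetic on the four numbers listed above, with no new measurements.

```python
# Latencies copied from the results list (all in the same unit).
latency = {
    "torch": 464,
    "trt": 354,
    "trt float 16": 323,
    "trt int8 strict": 401,
}

baseline = latency["torch"]
speedup = {name: round(baseline / value, 2) for name, value in latency.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor}x")
```

By this measure float 16 is the fastest variant at about 1.44x, while the int8 run reaches only about 1.16x and is slower than the default TensorRT conversion.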


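#### Guesses at why int8 quantization failed
- The input lies in the range 0.0 to 1.0, so all of the computation may run in floating point
- NVIDIA TensorRT may ignore quantization when accuracy would drop too far
- Inexperience with the tooling, or the model is too simple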
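
For context on the first guess, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not TensorRT's calibration procedure, and `quantize_int8` and `dequantize` are hypothetical helpers. It only illustrates that values in the 0.0 to 1.0 range can in principle be mapped to integers through a scale factor, which suggests the input range alone may not fully explain the result.

```python
def quantize_int8(values, max_abs):
    """Map floats in [-max_abs, max_abs] to integers in [-127, 127]."""
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


inputs = [0.0, 0.1, 0.25, 0.9, 1.0]
codes, scale = quantize_int8(inputs, max_abs=1.0)
restored = dequantize(codes, scale)

# Rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(inputs, restored))
print("codes:", codes)
print("max error:", max_error, "step:", scale)
```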
