# YOLO v5 Object Detection on Triton Inference Server
- What does this application do?
- Model Pipeline
- Prerequisites
- Server Installation (for Jetson)
- Server Installation (for x86_64)
- Run Server (for Jetson)
- Run Server (for x86_64)
- Install Client
- Run Client
## What does this application do?

This application demonstrates the following:
- How to prepare a TensorRT model for NVIDIA Triton Inference Server
- How to launch NVIDIA Triton Inference Server
- How to form a pipeline with the model ensemble
- How to implement client applications for Triton Inference Server
## Model Pipeline

The pipeline below is formed with the model ensemble.
| Order | Model Name | Backend | Input Type | Input Dimension | Output Type | Output Dimension | Description |
|---|---|---|---|---|---|---|---|
| 1 | preprocess | Python | UINT8 | [3, 384, 640] | FP32 | [3, 384, 640] | Type conversion, normalization |
| 2 | yolov5s_trt | TensorRT | FP32 | [3, 384, 640] | FP32 | [15120, 85] | Object detection |
| 3 | postprocess | Python | FP32 | [15120, 85] | FP32 | [1, -1, 6] | Bounding box generation, non-maximum suppression |
The pipeline output [1, -1, 6] consists of 1 * N * [x0, y0, x1, y1, score, class], where:

- N: the number of detected bounding boxes
- (x0, y0): the coordinates of the top-left corner of a detected bounding box
- (x1, y1): the coordinates of the bottom-right corner of a detected bounding box
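For reference, a client can unpack this output as in the sketch below. This is a minimal illustration assuming the result arrives as a NumPy array of shape [1, N, 6]; the COCO_LABELS list and the 0.5 score threshold are hypothetical, not part of this repository.

```python
import numpy as np

# Hypothetical class-name lookup (YOLO v5 is trained on the 80 COCO classes)
COCO_LABELS = ['person', 'bicycle', 'car']  # ... remaining classes elided

def decode_detections(output: np.ndarray, score_threshold: float = 0.5):
    """Yield (label, score, box) tuples from a [1, N, 6] detection array."""
    for x0, y0, x1, y1, score, cls in output[0]:
        if score < score_threshold:
            continue
        idx = int(cls)
        label = COCO_LABELS[idx] if idx < len(COCO_LABELS) else str(idx)
        yield label, float(score), (float(x0), float(y0), float(x1), float(y1))
```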
## Prerequisites

**Server**

- Jetson Xavier/Orin, or x86_64 Linux with an NVIDIA GPU
- For Jetson: JetPack 5.0.2 or later
- For x86_64: an NGC account

**Client**

- Linux (x86_64/ARM64) or Windows (x86_64)
- No GPU resources are needed for the client
## Server Installation (for Jetson)

1. Clone this repository.

```bash
git clone https://github.com/MACNICA-CLAVIS-NV/yolov5-triton
cd yolov5-triton/server
```
2. Launch the PyTorch container.

```bash
./torch_it.sh
```
3. Obtain the YOLO v5 ONNX model (see the export sketch after this step).

```bash
pip3 install -U \
    'protobuf<4,>=3.20.2' \
    numpy \
    onnx \
    pandas \
    PyYAML \
    tqdm \
    matplotlib \
    seaborn \
    psutil \
    gitpython \
    scipy \
    setuptools
python3 torch2onnx.py yolov5s
```
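For context, the core of an export script like torch2onnx.py typically looks like the sketch below. This is a minimal illustration, not the repository's actual script: the torch.hub entry point, the autoshape=False flag, and the tensor names 'images'/'output' are assumptions based on common YOLO v5 export practice, and the dummy input matches the [3, 384, 640] resolution from the pipeline table.

```python
import torch

# Minimal sketch of a YOLO v5 -> ONNX export; the actual torch2onnx.py
# in this repository may differ. autoshape=False fetches the raw
# DetectionModel, which torch.onnx.export can trace directly.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', autoshape=False)
model.eval()

# Dummy NCHW input matching the [3, 384, 640] pipeline resolution
dummy_input = torch.zeros(1, 3, 384, 640)

torch.onnx.export(
    model,
    dummy_input,
    'yolov5s.onnx',
    opset_version=12,
    input_names=['images'],    # illustrative tensor names
    output_names=['output'],
)
```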
4. Convert the ONNX model to a TensorRT engine (a verification sketch follows this step).

```bash
/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov5s.onnx \
    --saveEngine=model.plan \
    --workspace=4096 \
    --exportProfile=profile.json
```
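Before copying the engine into the model repository, it can be sanity-checked with the TensorRT Python API. A minimal sketch, assuming TensorRT 8.x (as shipped with JetPack 5.x), where the binding-based API is still available:

```python
import tensorrt as trt

# Deserialize the engine built by trtexec and list its I/O bindings.
logger = trt.Logger(trt.Logger.WARNING)
with open('model.plan', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    kind = 'input' if engine.binding_is_input(i) else 'output'
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))
```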
5. Copy the TensorRT engine to the model repository (the expected layout is sketched below).

```bash
cp model.plan ./model_repository/yolov5s_trt/1/
```
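After this step the model repository should follow Triton's standard layout, roughly as below. The three model names come from the pipeline table; the ensemble entry's name is illustrative, so check the config.pbtxt files in this repository for the actual one.

```
model_repository/
├── preprocess/            # Python backend: type conversion, normalization
│   ├── 1/model.py
│   └── config.pbtxt
├── yolov5s_trt/           # TensorRT backend: object detection
│   ├── 1/model.plan       # <- the engine copied above
│   └── config.pbtxt
├── postprocess/           # Python backend: box generation, NMS
│   ├── 1/model.py
│   └── config.pbtxt
└── <ensemble>/            # hypothetical name; chains the three models
    └── config.pbtxt
```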
6. Exit from the PyTorch container.

```bash
exit
```
7. Build a Docker image for Triton Inference Server.

```bash
./triton_build.sh
```
## Server Installation (for x86_64)

An NGC account is needed for this installation.
1. Clone this repository.

```bash
git clone https://github.com/MACNICA-CLAVIS-NV/yolov5-triton
cd yolov5-triton/server
```
2. Launch the PyTorch container.

```bash
./torch_it_x86.sh
```
3. Obtain the YOLO v5 ONNX model.

```bash
pip3 install \
    protobuf \
    pandas \
    PyYAML \
    tqdm \
    matplotlib \
    seaborn \
    gitpython
python3 torch2onnx.py yolov5s
```
4. Convert the ONNX model to a TensorRT engine.

```bash
/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov5s.onnx \
    --saveEngine=model.plan \
    --workspace=4096 \
    --exportProfile=profile.json
```
5. Copy the TensorRT engine to the model repository.

```bash
cp model.plan ./model_repository/yolov5s_trt/1/
```
6. Exit from the PyTorch container.

```bash
exit
```
## Run Server (for Jetson)

```bash
sudo jetson_clocks
./triton_start_grpc.sh
```
## Run Server (for x86_64)

```bash
./triton_start_grpc_x86.sh
```
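Once the server is up, its liveness and model status can be verified with the Triton client library. A minimal sketch over gRPC, assuming Triton's default gRPC port 8001 (adjust if the start scripts map ports differently):

```python
import tritonclient.grpc as grpcclient

# Query server liveness/readiness and check one model from the pipeline.
client = grpcclient.InferenceServerClient(url='localhost:8001')
print('server live: ', client.is_server_live())
print('server ready:', client.is_server_ready())
print('model ready: ', client.is_model_ready('yolov5s_trt'))
```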
## Install Client

The client application does not need GPU resources, so it can be deployed to Windows or Linux machines without a GPU card. A virtual Python environment such as conda or venv is recommended.
1. Clone this repository.

```bash
git clone https://github.com/MACNICA-CLAVIS-NV/yolov5-triton
cd yolov5-triton/client
```
2. Install the Python dependencies.

```bash
pip install tritonclient[all] Pillow opencv-python
```
## Run Client

**Image file inference**

```bash
python infer_image.py [-h] [--url SERVER_URL] IMAGE_FILE
```

Example:

```bash
python infer_image.py --url localhost:8000 test.jpg
```
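Under the hood, a request against the pipeline looks roughly like the sketch below, shown over HTTP to match the localhost:8000 example above. The model and tensor names ('pipeline', 'input', 'output') are assumptions, so check infer_image.py for the names this repository actually uses.

```python
import numpy as np
import tritonclient.http as httpclient

# Minimal sketch of one inference call against the ensemble over HTTP.
client = httpclient.InferenceServerClient(url='localhost:8000')

# The preprocess model expects UINT8 data of shape [3, 384, 640].
image = np.zeros((3, 384, 640), dtype=np.uint8)  # stand-in for a real image

inputs = [httpclient.InferInput('input', list(image.shape), 'UINT8')]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput('output')]

result = client.infer('pipeline', inputs, outputs=outputs)
detections = result.as_numpy('output')  # shape [1, N, 6], decoded as shown earlier
print(detections.shape)
```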
**Camera capture inference**

```bash
python infer_camera.py [-h] [--camera CAMERA_ID] [--width CAPTURE_WIDTH] [--height CAPTURE_HEIGHT] [--url SERVER_URL]
```

Example:

```bash
python infer_camera.py --camera 1 --width 640 --height 480 --url 192.168.XXX.XXX:8000
```