- Python 3.8
  - Should already be installed with Ubuntu 20.04
- Ubuntu 20.04
- CUDA 11.4 (Jetson)
- TensorRT 8+
- NVIDIA DeepStream SDK (x86 / dGPU):
  - Ubuntu 20.04
  - CUDA 11.8
  - TensorRT 8.5 GA Update 1 (8.5.2.2)
  - NVIDIA Driver 525.85.12 (Data center / Tesla series) / 525.105.17 (TITAN, GeForce RTX / GTX series and RTX / Quadro series)
  - NVIDIA DeepStream SDK 6.x
  - GStreamer 1.16.3
  - DeepStream-Yolo
- NVIDIA DeepStream SDK (Jetson):
  - JetPack 5.1.1 / 5.1
  - NVIDIA DeepStream SDK
    - Download and install from https://developer.nvidia.com/deepstream-download
  - DeepStream-Yolo
Installing GstRtspServer and introspection typelib
sudo apt update
sudo apt install python3-gi python3-dev python3-gst-1.0 -y
sudo apt-get install libgstrtspserver-1.0-0 gstreamer1.0-rtsp
For gst-rtsp-server (and other GStreamer components) to be accessible in Python through gi.require_version(), it must be built with gobject-introspection enabled (libgstrtspserver-1.0-0 already is). We still need to install the introspection typelib packages:
sudo apt-get install libgirepository1.0-dev
sudo apt-get install gobject-introspection gir1.2-gst-rtsp-server-1.0
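As a quick sanity check that the typelib is now visible from Python, a minimal sketch (the 8554 port matches the default RTSP output used later in this README):

```python
# Minimal check that gst-rtsp-server is importable through GObject introspection.
import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstRtspServer', '1.0')
from gi.repository import Gst, GstRtspServer

Gst.init(None)
server = GstRtspServer.RTSPServer.new()
server.props.service = "8554"   # default RTSP port used by the app below
print("GstRtspServer OK, listening port:", server.props.service)
```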
Source: YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss
- YOLOv7
  - Gwencong/yolov7-pose-tensorrt
  - nanmi/yolov7-pose
    - Supports single batch only
    - Some problems with /YoloLayer_TRT_v7.0/build/libyolo.so
      - The detection box is not synchronized with the screen on Jetson
- YOLOv8
Prepare YOLOv8 TensorRT Engine
- Choose yolov8-pose for better operator optimization of the ONNX model.
- The yolov8-pose model conversion route is: YOLOv8 PyTorch model -> ONNX -> TensorRT engine.
Notice !!!
⚠️ This repository does not support TensorRT API building !!!
https://github.com/ultralytics/ultralytics
Benchmark of YOLOv8-Pose
See Pose Docs for usage examples with these models.
Model | size (pixels) | mAP<sup>pose</sup> 50-95 | mAP<sup>pose</sup> 50 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | params (M) | FLOPs (B) |
---|---|---|---|---|---|---|---|
YOLOv8n-pose | 640 | 50.4 | 80.1 | 131.8 | 1.18 | 3.3 | 9.2 |
YOLOv8s-pose | 640 | 60.0 | 86.2 | 233.2 | 1.42 | 11.6 | 30.2 |
YOLOv8m-pose | 640 | 65.0 | 88.8 | 456.3 | 2.00 | 26.4 | 81.0 |
YOLOv8l-pose | 640 | 67.6 | 90.0 | 784.5 | 2.59 | 44.4 | 168.6 |
YOLOv8x-pose | 640 | 69.2 | 90.2 | 1607.1 | 3.73 | 69.4 | 263.2 |
YOLOv8x-pose-p6 | 1280 | 71.6 | 91.2 | 4088.7 | 10.04 | 99.1 | 1066.4 |
- mAP<sup>val</sup> values are for single-model single-scale on the COCO Keypoints val2017 dataset.
  Reproduce by `yolo val pose data=coco-pose.yaml device=0`
- Speed averaged over COCO val images using an Amazon EC2 P4d instance.
  Reproduce by `yolo val pose data=coco8-pose.yaml batch=1 device=0|cpu`
- Source: ultralytics
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-pose.pt
- Export the ONNX model with ultralytics. You can leave this repo and use the original ultralytics repo for ONNX export.
  - CLI tool (the `yolo` command from ultralytics) - recommended on your server to get faster speed ⚡
    - ref: ultralytics.com/modes/export
    - Usage (after `pip3 install ultralytics`):

      yolo export model=yolov8s-pose.pt format=onnx device=0 \
          imgsz=640 \
          dynamic=true \
          simplify=true

      After executing the above command, you will get an ONNX model named yolov8s-pose.onnx.
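Alternatively, the same export can be run from Python with the ultralytics API (a minimal sketch mirroring the CLI arguments above):

```python
# Assumes `pip3 install ultralytics` and yolov8s-pose.pt in the working directory.
from ultralytics import YOLO

model = YOLO("yolov8s-pose.pt")
# Same options as: yolo export model=yolov8s-pose.pt format=onnx device=0 imgsz=640 dynamic=true simplify=true
model.export(format="onnx", device=0, imgsz=640, dynamic=True, simplify=True)
```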
Move your ONNX model to the edge device at a specific path
- Put the model on your edge device:

  cd /opt/nvidia/deepstream/deepstream/samples/models   # work inside the DeepStream models directory
  sudo chmod u+rwx -R /opt/nvidia/deepstream/deepstream/samples/models   # add read/write/execute permissions
  sudo mkdir -p tao_pretrained_models/YOLOv8-TensorRT
  sudo chmod u+rwx -R tao_pretrained_models/YOLOv8-TensorRT
  mv -v <path_of_your_yolov8-pose_model> /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/YOLOv8-TensorRT/yolov8s-pose-dy-sim-640.onnx
- Check model outputs
  - Note that the number of output values per anchor for YOLOv8-Pose is 56 (split as illustrated in the sketch below):
    - bbox(4) + confidence(1) + keypoints(3 x 17) = 4 + 1 + 0 + 51 = 56
  - The number of output values per anchor for YOLOv7-Pose is 57:
    - bbox(4) + confidence(1) + cls(1) + keypoints(3 x 17) = 4 + 1 + 1 + 51 = 57
  - Model registration information of YOLOv8s-Pose:
    - INPUTS: (batch, channel, height, width)
    - OUTPUTS: (batch, anchors, max_outputs)
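For reference, a minimal NumPy sketch of how a raw (batch, 56, num_anchors) YOLOv8-Pose output could be split into boxes, confidences and keypoints (the 8400 anchor count assumes a 640x640 input; names are illustrative, not part of this repo):

```python
import numpy as np

# Dummy tensor standing in for the model output: (batch, 56, num_anchors)
output = np.random.rand(1, 56, 8400).astype(np.float32)

preds = output[0].T                          # (num_anchors, 56)
boxes = preds[:, 0:4]                        # cx, cy, w, h
conf = preds[:, 4]                           # person confidence (YOLOv8-Pose has no separate class score)
keypoints = preds[:, 5:].reshape(-1, 17, 3)  # 17 keypoints x (x, y, visibility)

keep = conf > 0.25                           # confidence threshold before NMS
print(boxes[keep].shape, keypoints[keep].shape)
```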
- ⚠️ The TensorRT engine must be bound to a hardware device, so build it on your edge device (it's a long wait ⌛).
- Specify parameters such as --minShapes, --optShapes and --maxShapes to enable dynamic batch processing.
cd /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/YOLOv8-TensorRT
sudo /usr/src/tensorrt/bin/trtexec --verbose \
--onnx=yolov8s-pose-dy-sim-640.onnx \
--fp16 \
--workspace=4096 \
--minShapes=images:1x3x640x640 \
--optShapes=images:12x3x640x640 \
--maxShapes=images:16x3x640x640 \
--saveEngine=yolov8s-pose-dy-sim-640.engine
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s-pose-dy-sim-640.engine
- Or test with multiple batches for the dynamic-shaped ONNX model
  - `--shapes=spec` : set input shapes for dynamic-shape inference inputs.

  /usr/src/tensorrt/bin/trtexec \
      --loadEngine=yolov8s-pose-dy-sim-640.engine \
      --shapes=images:12x3x640x640
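To double-check the binding names, shapes and dynamic batch dimension from Python, a small sketch using the TensorRT 8.x Python API (run it on the edge device where the engine was built):

```python
import tensorrt as trt

ENGINE_PATH = "yolov8s-pose-dy-sim-640.engine"

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Binding-based API from TensorRT 8.x (still available on DeepStream 6.x / JetPack 5.x images).
# A dynamic batch dimension is reported as -1.
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(engine.get_binding_name(i), kind, engine.get_binding_shape(i))
```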
- Performance on Jetson (AGX Xavier / AGX Orin) for the TensorRT engine
model | device | size (pixels) | batch | FPS | latency (ms) |
---|---|---|---|---|---|
yolov8s-pose.engine | AGX Xavier | 640 | 1 | 40.6 | 24.7 |
yolov8s-pose.engine | AGX Xavier | 640 | 12 | 12.1 | 86.4 |
yolov8s-pose.engine | AGX Orin | 640 | 1 | 258.8 | 4.2 |
yolov8s-pose.engine | AGX Orin | 640 | 12 | 34.8 | 33.2 |
yolov7w-pose.engine* | AGX Xavier | 960 | 1 | 19.0 | 52.1 |
yolov7w-pose.engine* | AGX Orin | 960 | 1 | 61.1 | 16.8 |
yolov7w-pose.pt | AGX Xavier | 960 | 1 | 14.4 | 59.8 |
yolov7w-pose.pt | AGX Xavier | 960 | 1 | 11.8 | 69.4 |
- \* yolov7w-pose with the YOLO layer TensorRT plugin from nanmi/yolov7-pose. NMS not included. Single batch and image size 960 only.
- Test the .engine (TensorRT) model with the `trtexec` command.
- Test the .pt model with PyTorch (on a 15 s video) as the baseline.
- NMS is not included in any of the tests.
git clone https://github.com/YunghuiHsu/deepstream-yolo-pose.git
- NVInfer with RTSP inputs

  python3 deepstream_YOLOv8-Pose_rtsp.py \
      -i rtsp://sample_1.mp4 \
         rtsp://sample_2.mp4 \
         rtsp://sample_N.mp4

- e.g. loop with local file inputs

  python3 deepstream_YOLOv8-Pose_rtsp.py \
      -i file:///home/ubuntu/video1.mp4 file:///home/ubuntu/video2.mp4 \
      -config dstest1_pgie_YOLOv8-Pose_config.txt \
      --file-loop
- Default RTSP streaming output location: rtsp://<server IP>:8554/ds-test
  - VLC Player on the client side is suggested (see Camera Streaming and Multimedia).
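Besides VLC, the output stream can be checked quickly from a client with OpenCV (a minimal sketch; replace <server IP> with the address of the edge device):

```python
import cv2

# Client-side check of the RTSP stream published by the app.
cap = cv2.VideoCapture("rtsp://<server IP>:8554/ds-test")
ok, frame = cap.read()
print("stream opened:", cap.isOpened(), "first frame:", None if not ok else frame.shape)
cap.release()
```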
Note:
- `-g/--pgie` : if not specified, uses nvinfer as default (choices: ['nvinfer', 'nvinferserver']).
- `-config/--config-file` : must be provided for custom models.
- `--file-loop` : can be used to loop input files after EOS.
- `--conf-thres` : object confidence threshold.
- `--iou-thres` : IoU threshold for NMS.
This sample app is derived from NVIDIA-AI-IOT/deepstream_python_apps/apps and adds customization features
- Includes the following:
  - Accepts multiple sources
  - Dynamic batch model (YOLO-Pose)
  - Accepts RTSP streams as input and serves the inference output as an RTSP stream
  - NVInfer GPU inference engine
  - NVInferserver GPU inference engine (not yet tested)
  - MultiObjectTracker (NVTracker)
  - Automatically adjusts the tensor shape of the loaded input and output (NvDsInferTensorMeta)
  - Extracts the stream metadata and image data from the batched buffer of Gst-nvinfer (see the probe sketch below)
    - Source: deepstream-imagedata-multistream
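A sketch of how such a probe typically reads NvDsInferTensorMeta in deepstream_python_apps-style code (assumes output-tensor-meta=1 in the nvinfer config; function and variable names are illustrative, not the exact ones used in this repo):

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import pyds

def pgie_src_pad_buffer_probe(pad, info, u_data):
    """Walk the batch meta attached by Gst-nvinfer and locate the raw output tensors."""
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.data)
                # One layer per model output; layer_info.inferDims holds the tensor shape.
                for i in range(tensor_meta.num_output_layers):
                    layer_info = pyds.get_nvds_LayerInfo(tensor_meta, i)
                    print(frame_meta.source_id, layer_info.layerName, layer_info.inferDims.numDims)
            l_user = l_user.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK
```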