
YOLOv9 with End2End (Efficient NMS) #130

Closed
levipereira opened this issue Feb 29, 2024 · 15 comments

@levipereira commented Feb 29, 2024

Thank you for your wonderful work!

YOLOv9 with End2End (Efficient NMS)
Note: The primary purpose of employing End2End is to utilize ONNX models on TensorRT. If you choose not to use TensorRT, you should proceed with the standard ONNX export process.

I've created a forked repository from the original, adding End-to-End support for ONNX export. The changes can be found in export.py and models/experimental.py. Both files remain fully compatible with all current export operations.
Check it out at https://github.com/levipereira/yolov9

  • Support for End-to-End ONNX Export: Added support for end-to-end ONNX export in export.py and models/experimental.py.

  • Model Compatibility: this functionality currently works with all DetectionModel models.

  • Configuration Variables: Use the following flags to configure the model:

    • --include onnx_end2end: Enable End2End export.
    • --simplify: ONNX/ONNX END2END: Simplify the model.
    • --topk-all: ONNX END2END/TF.js NMS: Top-k for all classes to keep (default: 100).
    • --iou-thres: ONNX END2END/TF.js NMS: IoU threshold (default: 0.45).
    • --conf-thres: ONNX END2END/TF.js NMS: Confidence threshold (default: 0.25).

Example:

$ python3 export.py --weights ./yolov9-c.pt --imgsz 640 --simplify --include onnx_end2end

export: data=data/coco.yaml, weights=['./yolov9-c.pt'], imgsz=[640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx_end2end']
YOLOv5 🚀 v0.1-27-g86b0667 Python-3.8.10 torch-1.14.0a0+44dac51 CPU

Fusing layers...
Model summary: 604 layers, 50880768 parameters, 0 gradients, 237.6 GFLOPs

PyTorch: starting from ./yolov9-c.pt with output shape (1, 84, 8400) (98.4 MB)

ONNX END2END: starting export with onnx 1.13.0...
/yolov9/models/experimental.py:102: FutureWarning: 'torch.onnx._patch_torch._graph_op' is deprecated in version 1.13 and will be removed in 1.14. Please note 'g.op()' is to be removed from torch.Graph. Please open a GitHub issue if you need this functionality..
  out = g.op("TRT::EfficientNMS_TRT",
[W shape_type_inference.cpp:1913] Warning: The shape inference of TRT::EfficientNMS_TRT type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1913] Warning: The shape inference of TRT::EfficientNMS_TRT type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1913] Warning: The shape inference of TRT::EfficientNMS_TRT type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1913] Warning: The shape inference of TRT::EfficientNMS_TRT type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
========== Diagnostic Run torch.onnx.export version 1.14.0a0+44dac51 ===========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 4 WARNING 0 ERROR ========================
4 WARNING were not printed due to the log level.


Starting to simplify ONNX...
ONNX export success, saved as ./yolov9-c_end2end.onnx
ONNX END2END: export success ✅ 11.5s, saved as ./yolov9-c_end2end.onnx (129.3 MB)

Export complete (13.6s)
Results saved to /yolov9/experiments/models
Visualize:       https://netron.app
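For context on the warnings above: the custom node is emitted from an ONNX symbolic function in models/experimental.py. Below is a minimal, hypothetical sketch of that pattern (the attribute names follow the EfficientNMS_TRT plugin documentation, and the placeholder forward exists only so tracing can succeed; this is not the literal code from the fork):

```python
import torch

class TRT_NMS(torch.autograd.Function):
    """Sketch: emits a TRT::EfficientNMS_TRT node during torch.onnx.export."""

    @staticmethod
    def forward(ctx, boxes, scores, iou_thres=0.45, conf_thres=0.25, topk_all=100):
        # Dummy outputs with plausible shapes/dtypes so tracing succeeds;
        # TensorRT substitutes the real plugin at engine build time.
        batch = boxes.shape[0]
        num_dets = torch.randint(0, topk_all, (batch, 1), dtype=torch.int32)
        det_boxes = torch.randn(batch, topk_all, 4)
        det_scores = torch.randn(batch, topk_all)
        det_classes = torch.randint(0, 80, (batch, topk_all), dtype=torch.int32)
        return num_dets, det_boxes, det_scores, det_classes

    @staticmethod
    def symbolic(g, boxes, scores, iou_thres=0.45, conf_thres=0.25, topk_all=100):
        # This is the g.op() call referenced by the deprecation warning; ONNX
        # has no shape inference for custom TRT ops, hence the four
        # UpdateReliable warnings in the log.
        return g.op("TRT::EfficientNMS_TRT", boxes, scores,
                    background_class_i=-1,
                    box_coding_i=0,
                    iou_threshold_f=iou_thres,
                    score_threshold_f=conf_thres,
                    max_output_boxes_i=topk_all,
                    score_activation_i=0,
                    plugin_version_s="1",
                    outputs=4)
```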
@WongKinYiu (Owner)

Added to readme.

@WongKinYiu (Owner)

https://github.com/levipereira/yolov9/blob/main/models/experimental.py#L140

There may be a bug here:
output[0] is the prediction of the aux branch, and output[1] is the prediction of the main branch.

@levipereira (Author)

Thanks. Fixed by setting output[1] as the prediction for the main branch instead of output[0].
levipereira@20f921f
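For readers following along, the gist of the change is selecting the main-branch prediction before it reaches the NMS head. A hypothetical excerpt, not the literal diff:

```python
def main_branch_prediction(outputs):
    """Dual-branch (aux + main) models return two predictions during export;
    per the discussion above, the NMS head must consume outputs[1] (main),
    not outputs[0] (aux). Single-branch models return a plain tensor."""
    return outputs[1] if isinstance(outputs, (list, tuple)) else outputs
```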

@laugh12321 commented Mar 3, 2024

Hello everyone!

I would like to introduce my open-source project, TensorRT-YOLO, a tool for deploying the YOLO series with Efficient NMS in TensorRT.

Key Features

  • Support for YOLOv3, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, PP-YOLOE and PP-YOLOE+
  • Support for ONNX static and dynamic export, as well as TensorRT inference
  • Integration of EfficientNMS TensorRT plugin for accelerated post-processing
  • Utilization of CUDA kernel functions for accelerated preprocessing
  • Support for inference in both C++ and Python
  • Command-line interface for quick export and inference
  • One-click Docker deployment

Performance test using an RTX 2080Ti 2GB GPU on an AMD Ryzen 7 5700X 8-core CPU with 128 GB RAM.

All models were converted to ONNX models with the EfficientNMS plugin. The conversion was done using the TensorRT-YOLO tool, with the trtyolo CLI installed via pip install tensorrt-yolo==3.0.1. The batch size is 1 and the image size is 640.

Model Export and Performance Testing

Use the following commands to export the model and perform performance testing with trtexec:

trtyolo export -v yolov9 -w yolov9-converted.pt --imgsz 640 -o ./
trtexec --onnx=yolov9-converted.onnx --saveEngine=yolov9-converted.engine --fp16
trtexec --fp16 --avgRuns=1000 --useSpinWait --loadEngine=yolov9-converted.engine

Performance testing was conducted using TensorRT-YOLO inference on the coco128 dataset.

YOLOv9 Series

All values below are mean latency in ms.

| Tool | YOLOv9-T-Converted | YOLOv9-S-Converted | YOLOv9-M-Converted | YOLOv9-C-Converted | YOLOv9-E-Converted |
| --- | --- | --- | --- | --- | --- |
| trtexec (infer) | 3.51857 | 3.67899 | 4.19460 | 4.25964 | 8.95429 |
| TensorRT-YOLO Python (infer) | 10.19576 | 10.15226 | 9.29918 | 9.60093 | 21.85042 |
| TensorRT-YOLO C++ (pre + infer) | 3.44162 | 3.66080 | 4.10519 | 4.12471 | 8.98964 |

| Tool | Gelan-S2 | Gelan-S | Gelan-M | Gelan-C | Gelan-E |
| --- | --- | --- | --- | --- | --- |
| trtexec (infer) | 3.42082 | 3.78578 | 4.16447 | 4.27485 | 8.91479 |
| TensorRT-YOLO Python (infer) | 9.96435 | 10.35934 | 9.14044 | 9.33843 | 21.42764 |
| TensorRT-YOLO C++ (pre + infer) | 3.60857 | 3.93528 | 4.25084 | 4.35533 | 9.23654 |

YOLOv8 Series

| Tool | YOLOv8n | YOLOv8s | YOLOv8m | YOLOv8l | YOLOv8x |
| --- | --- | --- | --- | --- | --- |
| trtexec (infer) | 1.90273 | 2.34166 | 3.58595 | 4.83306 | 7.12179 |
| TensorRT-YOLO Python (infer) | 7.03217 | 7.52751 | 8.75298 | 10.56914 | 12.45605 |
| TensorRT-YOLO C++ (pre + infer) | 2.02848 | 2.15021 | 3.57631 | 4.78318 | 6.96686 |

@radandreicristian

Hey @levipereira, first of all thanks for this work!

The warning in the ONNX export step results in a run-time error when trying to do the inference.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from gelan-c-end2end.onnx failed:Fatal error: TRT:EfficientNMS_TRT(-1) is not a registered function/op

Any idea what could cause this? I could provide more code/context if needed.

@levipereira (Author)

> The warning in the ONNX export step results in a run-time error when trying to do the inference.
> Any idea what could cause this? I could provide more code/context if needed.

Inference with an End2End model is not supported by the YOLOv9 source code, as it is not implemented there.
Instead, use the triton-server repository for this purpose.
A triton-client repository will be released, enabling users to run inference against the triton-server.
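For reference, a minimal sketch of the intended TensorRT path (assumptions: TensorRT 8.x binding API, pycuda installed, a static batch-1 engine built with trtexec from the End2End ONNX, and output names following the common num_dets/det_boxes/det_scores/det_classes convention):

```python
# Build the engine first, e.g.:
#   trtexec --onnx=yolov9-c_end2end.onnx --saveEngine=yolov9-c_end2end.engine --fp16
import numpy as np
import pycuda.autoinit  # noqa: F401 (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")  # makes EfficientNMS_TRT resolvable

with open("yolov9-c_end2end.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# One host/device buffer pair per binding (the input image plus the NMS outputs).
host, device, bindings = {}, {}, []
for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    shape = tuple(context.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host[name] = np.empty(shape, dtype=dtype)
    device[name] = cuda.mem_alloc(host[name].nbytes)
    bindings.append(int(device[name]))

image = np.zeros((1, 3, 640, 640), dtype=np.float32)  # your preprocessed input
inp = engine.get_binding_name(0)  # assumes binding 0 is the image input
host[inp][...] = image
cuda.memcpy_htod(device[inp], host[inp])

context.execute_v2(bindings)

for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        name = engine.get_binding_name(i)
        cuda.memcpy_dtoh(host[name], device[name])
print(host["num_dets"])  # assumed output name; check your export's bindings
```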

@berkgungor

Hi, do we have to reparameterize the fine-tuned .pt file before exporting to ONNX format? When I run the reparameterization Python code, it throws an error like "AttributeError: 'DetectionModel' object has no attribute 'nc'".

@laugh12321

@berkgungor Hi, you can try TensorRT-YOLO, which also supports exporting ONNX with Efficient NMS and does not require reparameterizing the fine-tuned .pt.

@mdciri commented Mar 6, 2024

@levipereira

I exported my YOLOv9 model to ONNX using your End2End class, but when I try to load it for inference as:

```python
import onnxruntime as ort

onnx_model = "./best-end2end.onnx"
session = ort.InferenceSession(onnx_model, None)
```

it returns this error:
Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ./yolov9/runs/train/yolov9/weights/best-end2end.onnx failed:Fatal error: TRT:EfficientNMS_TRT(-1) is not a registered function/op

These are my versions:
onnx 1.15.0
onnx-graphsurgeon 0.3.27
onnxruntime-gpu 1.16.1
onnxsim 0.4.35

@levipereira (Author) commented Mar 6, 2024

@mdciri @radandreicristian The primary purpose of employing End2End is to utilize ONNX models on TensorRT. If you choose not to use TensorRT, you should proceed with the standard ONNX export process.

Use case: https://github.com/levipereira/triton-server-yolo/tree/master
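For anyone hitting the same ONNX Runtime error: it is expected, because the exported graph contains a TensorRT-only node. A quick way to confirm this with plain onnx (the file name is a placeholder):

```python
import onnx

model = onnx.load("best-end2end.onnx")
print([node.op_type for node in model.graph.node if "EfficientNMS" in node.op_type])
# ['EfficientNMS_TRT'] -> the model targets TensorRT, not stock onnxruntime
```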

@gl94 commented Mar 7, 2024

> Hi, do we have to reparameterize the fine-tuned .pt file before exporting to ONNX format? When I run the reparameterization Python code, it throws an error like "AttributeError: 'DetectionModel' object has no attribute 'nc'".

When you encounter the "AttributeError: 'DetectionModel' object has no attribute 'nc'" error, you can manually change model.nc to your number of classes (in my case, model.nc = 4). Also, in gelan-c.yaml, change line 4 from nc: 80 to your nc, and change line 79 from [[15, 18, 21], 1, DDetect, [nc]] to [[15, 18, 21], 1, DDetect, [your nc]]. A sketch of the patched script follows below.

@PrinceP commented Mar 10, 2024

Hi all,
https://github.com/PrinceP/tensorrt-cpp-for-onnx

Here's the dynamic-batch version of YOLOv9 inference in TensorRT in C++, building on @levipereira's work for dynamic support.

Any batch size with any image size is supported. Reference code for batching data is also included.
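As a hedged illustration of building a dynamic-shape engine from such an export (the flags are standard trtexec options; the input binding name images, the file names, and the shape ranges are assumptions, not taken from that repository):

trtexec --onnx=yolov9-c-dynamic.onnx --saveEngine=yolov9-c-dynamic.engine --fp16 \
        --minShapes=images:1x3x640x640 \
        --optShapes=images:4x3x640x640 \
        --maxShapes=images:8x3x640x640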

@berkgungor

@gl94 I already changed the class numbers in the yaml; nothing changed, same error.

@levipereira I exported the model to ONNX using just --include onnx, without specifying end2end, and then converted it to a TensorRT engine. It works fine. I also did not reparameterize, since that threw the error.

@mdciri commented Mar 12, 2024

@levipereira yes, in your implementation End2End is meant for running the ONNX model on TensorRT. Still, I would like to convert my model to ONNX (with NMS) so that it works on ONNX Runtime.

At the moment, the ONNX export does not include NMS.
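Until NMS is part of the export, one workable path is doing the NMS yourself on the standard export's raw (1, 84, 8400) output. A hedged sketch using onnxruntime plus torchvision (file name and thresholds are placeholders; this is class-agnostic NMS, use torchvision.ops.batched_nms for per-class behavior):

```python
import numpy as np
import onnxruntime as ort
import torch
import torchvision

# Standard (non-End2End) export, e.g. from `--include onnx`
session = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
img = np.zeros((1, 3, 640, 640), dtype=np.float32)  # your preprocessed image
pred = session.run(None, {session.get_inputs()[0].name: img})[0]  # (1, 84, 8400)

pred = torch.from_numpy(pred)[0].T               # (8400, 84): cx,cy,w,h + 80 class scores
boxes_cxcywh, class_scores = pred[:, :4], pred[:, 4:]
scores, classes = class_scores.max(dim=1)
keep = scores > 0.25                             # cf. --conf-thres
boxes_cxcywh, scores, classes = boxes_cxcywh[keep], scores[keep], classes[keep]

# Convert cx,cy,w,h to x1,y1,x2,y2; torchvision.ops.nms expects corner format.
boxes = torch.cat([boxes_cxcywh[:, :2] - boxes_cxcywh[:, 2:] / 2,
                   boxes_cxcywh[:, :2] + boxes_cxcywh[:, 2:] / 2], dim=1)
idx = torchvision.ops.nms(boxes, scores, iou_threshold=0.45)[:100]  # cf. --iou-thres, --topk-all
print(boxes[idx], scores[idx], classes[idx])
```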

@gl94 commented Mar 26, 2024

> @gl94 I already changed the class numbers in the yaml; nothing changed, same error.
> @levipereira I exported the model to ONNX using just --include onnx, without specifying end2end, and then converted it to a TensorRT engine. It works fine. I also did not reparameterize, since that threw the error.

You should also make the same change in the reparameterization Python code: change model.nc = ckpt['model'].nc to model.nc = your nc.
