[Model] Yolov5/v5lite/v6/v7/v7end2end: CUDA preprocessing #370

Merged: 24 commits, Oct 19, 2022



@wang-xinyu wang-xinyu commented Oct 14, 2022

PR types

Performance optimization

PR changes

Others - preprocessing

Describe

  • Add a YOLO CUDA preprocessing utility
  • Yolov5/v5lite/v6/v7/v7end2end: integrate CUDA preprocessing, then test and compare latency
  • CMake changes to support compiling CUDA source files
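
For readers unfamiliar with what such a preprocessing utility usually does per pixel, here is a minimal CPU reference sketch. It assumes the common YOLO input convention (interleaved 8-bit BGR in, planar float RGB scaled to [0, 1] out); that convention and the name `BgrHwcToRgbChw` are assumptions for illustration and are not taken from this PR. A CUDA version would typically assign one thread per pixel instead of looping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not FastDeploy's API.
// Converts an interleaved 8-bit BGR image (HWC layout) into a planar
// float RGB tensor (CHW layout) with values scaled to [0, 1].
std::vector<float> BgrHwcToRgbChw(const std::vector<std::uint8_t>& bgr,
                                  int width, int height) {
  const std::size_t plane = static_cast<std::size_t>(width) * height;
  assert(bgr.size() == plane * 3);
  std::vector<float> out(plane * 3);
  for (std::size_t i = 0; i < plane; ++i) {
    out[0 * plane + i] = bgr[i * 3 + 2] / 255.0f;  // R plane
    out[1 * plane + i] = bgr[i * 3 + 1] / 255.0f;  // G plane
    out[2 * plane + i] = bgr[i * 3 + 0] / 255.0f;  // B plane
  }
  return out;
}
```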

@CLAassistant

CLAassistant commented Oct 14, 2022

CLA assistant check
All committers have signed the CLA.

@wang-xinyu changed the title from "Yolo cuda preprocessing util and yolov5 cuda preprocessing" to "[Model]Yolo cuda preprocessing" on Oct 18, 2022
@wang-xinyu
Collaborator Author

Latency is end-to-end (preprocessing, inference, and postprocessing), in milliseconds.
Tested on a P40 GPU with TensorRT 8.4.

| Model | Latency, CPU preprocessing (ms) | Latency, CUDA preprocessing (ms) | Optimization |
| --- | --- | --- | --- |
| yolov5s | 41 | 28 | 31.7% $\downarrow$ |
| yolov5lite | 40 | 22 | 45% $\downarrow$ |
| yolov6s | 25 | 11 | 56% $\downarrow$ |
| yolov7 | 47 | 32 | 31.9% $\downarrow$ |
| yolov7_e2e | 27 | 16 | 40.7% $\downarrow$ |
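
The percentages in the last column are consistent with the relative latency reduction, (CPU latency - CUDA latency) / CPU latency. A small sketch of that arithmetic, with the hypothetical helper name `LatencyReductionPercent`:

```cpp
#include <cassert>

// Hypothetical helper for illustration: relative latency reduction in percent
// when going from cpu_ms (CPU preprocessing) to cuda_ms (CUDA preprocessing).
double LatencyReductionPercent(double cpu_ms, double cuda_ms) {
  assert(cpu_ms > 0.0);
  return 100.0 * (cpu_ms - cuda_ms) / cpu_ms;
}
```

For example, yolov6s gives 100 * (25 - 11) / 25 = 56, matching the table.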

@wang-xinyu changed the title from "[Model]Yolo cuda preprocessing" to "[Model]Yolov5/v5lite/v6/v7/v7end2end: CUDA preprocessing" on Oct 18, 2022
Review thread on CMakeLists.txt (outdated, resolved)
Review thread on fastdeploy/vision/detection/contrib/yolov5.cc (outdated, resolved)
@wang-xinyu
Collaborator Author

wang-xinyu commented Oct 19, 2022

This CUDA preprocessing for YOLO resizes images with a warp-affine transform, which differs slightly from cv::resize().
As a result, the mAP differs slightly as well.
The mAP (IoU=0.50:0.95 | area=all) results below were measured on coco_val_2017 (5000 images) with TensorRT models.

| Model | mAP (CPU preprocessing) | mAP (CUDA preprocessing) |
| --- | --- | --- |
| yolov5s | 0.372 | 0.368 |
| yolov6s | 0.424 | 0.418 |
| yolov7 | 0.514 | 0.498 |
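
To illustrate how a warp-affine resize maps pixels, here is a minimal single-channel CPU sketch: each destination pixel is mapped back to a source coordinate through an inverse affine matrix and then sampled bilinearly. The centered letterbox layout, the constant padding value, the pixel-coordinate convention, and the names `Affine2x3`, `LetterboxDstToSrc`, `SampleBilinear`, and `WarpAffineResize` are all assumptions for illustration, not the code in this PR. Small differences in conventions like these are one plausible source of the pixel-level deviations from cv::resize().

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 2x3 affine matrix: x' = a*x + b*y + tx, y' = c*x + d*y + ty.
struct Affine2x3 { float a, b, tx, c, d, ty; };

// Destination-to-source matrix for an aspect-preserving, centered letterbox.
Affine2x3 LetterboxDstToSrc(int src_w, int src_h, int dst_w, int dst_h) {
  assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
  const float scale = std::min(dst_w / static_cast<float>(src_w),
                               dst_h / static_cast<float>(src_h));
  // Forward transform: scale, then shift so the image is centered.
  const float tx = 0.5f * (dst_w - scale * src_w);
  const float ty = 0.5f * (dst_h - scale * src_h);
  // Invert the forward transform (pure scale plus translation).
  const float inv = 1.0f / scale;
  return Affine2x3{inv, 0.0f, -tx * inv, 0.0f, inv, -ty * inv};
}

// Bilinear sample; neighbors outside the image contribute the pad value.
float SampleBilinear(const std::vector<float>& src, int w, int h,
                     float x, float y, float pad) {
  const int x0 = static_cast<int>(std::floor(x));
  const int y0 = static_cast<int>(std::floor(y));
  const float fx = x - x0, fy = y - y0;
  auto at = [&](int xi, int yi) {
    return (xi >= 0 && xi < w && yi >= 0 && yi < h) ? src[yi * w + xi] : pad;
  };
  const float top = at(x0, y0) * (1 - fx) + at(x0 + 1, y0) * fx;
  const float bottom = at(x0, y0 + 1) * (1 - fx) + at(x0 + 1, y0 + 1) * fx;
  return top * (1 - fy) + bottom * fy;
}

// Resize by walking destination pixels and sampling the mapped source point.
std::vector<float> WarpAffineResize(const std::vector<float>& src, int src_w,
                                    int src_h, int dst_w, int dst_h,
                                    float pad) {
  const Affine2x3 m = LetterboxDstToSrc(src_w, src_h, dst_w, dst_h);
  std::vector<float> dst(static_cast<std::size_t>(dst_w) * dst_h);
  for (int y = 0; y < dst_h; ++y) {
    for (int x = 0; x < dst_w; ++x) {
      const float sx = m.a * x + m.b * y + m.tx;
      const float sy = m.c * x + m.d * y + m.ty;
      dst[y * dst_w + x] = SampleBilinear(src, src_w, src_h, sx, sy, pad);
    }
  }
  return dst;
}
```

In a CUDA kernel the two loops would typically be replaced by one thread per destination pixel, with the matrix passed as a kernel argument.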


@jiangjiajun jiangjiajun left a comment


Windows compatibility will be added in the next PR.

@jiangjiajun changed the title from "[Model]Yolov5/v5lite/v6/v7/v7end2end: CUDA preprocessing" to "[Model] Yolov5/v5lite/v6/v7/v7end2end: CUDA preprocessing" on Oct 19, 2022
@jiangjiajun jiangjiajun merged commit c8d6c82 into PaddlePaddle:develop Oct 19, 2022