
YOLOv7 TensorRT converted model inference speed is equal to PyTorch model #100

Open
TheMadScientiist opened this issue May 12, 2023 · 3 comments


@TheMadScientiist

I converted my yolov7-tiny.pt model to TensorRT using the commands below:

YOLOv7 to ONNX:

python export.py --weights yolov7-tiny.pt --grid --include-nms --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640

ONNX to TensorRT:

python tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny.trt -p fp16 --iou_thresh 0.65

Once it's exported to yolov7-tiny.trt, I use the trt.py file to run inference.

python trt.py -e yolov7-tiny.trt -i path/to/images/ --end2end

It reports 278 FPS.

However, when I run inference with the TRT model on 100,000 images, it takes about 1,000 seconds, and my yolov7-tiny.pt model also runs inference on 100,000 images in about 1,000 seconds. At 278 FPS, 100,000 images should take roughly 360 seconds of pure inference, so shouldn't the TRT model be faster?

I'm using an EC2 instance with a Tesla T4 GPU. I also removed the step that saves images to a folder, so that isn't the cause of the inference speed matching the PyTorch model.

Any help or suggestions would be much appreciated!

Thank you for your contribution to the community.

@Linaom1214
Owner

Linaom1214 commented May 13, 2023

@TheMadScientiist
This situation may occur because loading data from numpy to the GPU with pycuda is less efficient than loading it directly with torch. The yolov5 repository also mentions using torch rather than pycuda for data loading, since TensorRT only accelerates the inference step itself. This project aims to minimize the use of third-party libraries and therefore does not use torch; as is well known, installing torch can be cumbersome, especially on edge devices.
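For illustration only, one way to narrow that gap without adding torch is to reuse a page-locked (pinned) host buffer with pycuda, so each image avoids a fresh pageable-memory copy. This is a minimal sketch of the general technique, not this repo's actual trt.py code; the buffer shape and the upload helper are assumptions made for the example.

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a default CUDA context)
import pycuda.driver as cuda

# Allocate a page-locked (pinned) host buffer once and reuse it for every
# image; async host-to-device copies from pinned memory are much cheaper
# than copies from a fresh pageable numpy array on each frame.
shape = (1, 3, 640, 640)  # matches the 640x640 export above
host_buf = cuda.pagelocked_empty(shape, dtype=np.float32)
dev_buf = cuda.mem_alloc(host_buf.nbytes)
stream = cuda.Stream()

def upload(image_chw: np.ndarray) -> None:
    # Copy one preprocessed CHW image to the GPU via the pinned buffer.
    np.copyto(host_buf, image_chw)                     # host -> pinned host
    cuda.memcpy_htod_async(dev_buf, host_buf, stream)  # pinned host -> device
    stream.synchronize()  # a real pipeline would overlap this with compute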

@TheMadScientiist
Author


Thank you for your response!

Is it possible to make inference on a large number of images faster by using a batch size greater than 1?

@Linaom1214
Owner


You are correct: CUDA is well suited to parallel computing, and batch processing is widely used in practical applications. However, this project encountered issues when introducing the NMS plugin through the API for multi-batch engines, so we have not provided a multi-batch implementation.
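For reference, if the engine were rebuilt with a fixed batch dimension (and without the end-to-end NMS plugin, given the issue mentioned above), the host-side loop could feed several images per execution. A hypothetical sketch follows; preprocess and infer are placeholders for this repo's preprocessing and TensorRT execution, not real APIs, and BATCH must match the rebuilt engine.

import numpy as np

BATCH = 8  # assumed fixed batch size of the rebuilt engine

def run_batched(image_paths, preprocess, infer):
    # Stack BATCH preprocessed CHW images into one (BATCH, 3, 640, 640)
    # input so a single engine execution covers BATCH images. A fixed-batch
    # engine would also need the final, short chunk padded up to BATCH.
    results = []
    for i in range(0, len(image_paths), BATCH):
        chunk = image_paths[i:i + BATCH]
        batch = np.stack([preprocess(p) for p in chunk], axis=0)
        results.extend(infer(batch))
    return results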

