Not able to get 30+ fps processing speed on Nvidia RTX 2080 GPU #35
Comments
Just wanted to report that running the benchmark on the COCO dataset per your instructions gets me much closer to the reported numbers. Now I wonder what the difference is between the `--benchmark` code and the actual per-image instance segmentation code.

With Resnet-101:

```
python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --max_images=1000
Config not specified. Parsed yolact_base_config from the file name.
loading annotations into memory...
Processing Images  ██████████████████████████████ 1000 / 1000 (100.00%)  29.87 fps
Average: 29.87 fps, 33.48 ms
```

With Resnet-50:

```
python3 eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --benchmark --max_images=1000
loading annotations into memory...
Processing Images  ██████████████████████████████ 1000 / 1000 (100.00%)  40.06 fps
Average: 40.06 fps, 24.96 ms
```

With Darknet-53:

```
python3 eval.py --trained_model=weights/yolact_darknet53_54_800000.pth --benchmark --max_images=1000
loading annotations into memory...
Processing Images  ██████████████████████████████ 1000 / 1000 (100.00%)  34.68 fps
Average: 34.68 fps, 28.84 ms
```
The FPS we report comes from that `--benchmark` command. Like other papers, our timing only reports the speed of the model itself: timing starts when the image is finished loading and stops when the network outputs masks. Note that this timing does not include 1) loading the image, 2) rendering the mask onto the image, or 3) displaying the image, all of which are included when you evaluate images normally. Right now, step 2 is particularly limiting for us, and it's the bottleneck giving you that lower-than-reported FPS.

I'm working on fixing this so that we can run the full model, from loading to displaying, at 30 fps (see #17), but that's difficult to do in Python (thanks to the GIL) and without direct access to the graphics card (i.e., without CUDA or a graphics library like OpenGL or Vulkan). A large amount of time right now is spent rendering the image on the GPU, copying the image to the CPU to draw boxes and text, and then passing the CPU image to OpenCV, which just copies it back to the GPU internally. A real production-ready version of this would likely have to be native C++ using a CUDA matrix as a texture in Vulkan or OpenGL to render directly to the screen, but I'd like to keep the project in native Pytorch for as long as possible (so that everyone can easily start using it / add to it).

Good news is, though, that I have updated rendering code in the works, and I think I'll be able to get close to that sweet, sweet 30 fps with it. It should be out soon, so I'll keep you posted.
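The model-only timing described in this comment can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the actual eval.py code; on a GPU you would pass `torch.cuda.synchronize` as `sync` so the clock waits for the kernels to finish.

```python
import time

def fps_model_only(forward, images, sync=lambda: None, warmup=2):
    """Model-only FPS: the clock starts after each image is ready and
    stops when `forward` returns.  `sync` should block until the GPU
    is idle (e.g. torch.cuda.synchronize); the first `warmup` frames
    are discarded because framework initialization inflates them."""
    times = []
    for i, img in enumerate(images):
        sync()                               # image transfer finished
        start = time.perf_counter()
        forward(img)                         # network produces masks/boxes
        sync()                               # wait for GPU kernels
        if i >= warmup:                      # skip warm-up frames
            times.append(time.perf_counter() - start)
    return len(times) / sum(times)
```

Note that image loading, mask rendering, and display all happen outside the timed region, which is why this number is higher than end-to-end throughput.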
@dbolya, thanks for the explanation and for taking the time to respond. Do you know how much time step 1 (i.e., loading the image) adds to the whole equation? Also, excited to hear about the updated rendering code.
I'm actually really glad you asked that! When I timed it, that step took a whopping 19 ms, which didn't seem right at all. I narrowed it down to a single line, and it turned out most of that time was coming from one operation, so I rearranged the code to avoid it. So on the current master branch, step 1 takes 19 ms, but with the fix it's down to 4. I'll push this along with my new rendering code and other speed improvements, probably later today.
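The patch itself isn't shown in this thread, but a common way to cut per-image load time of this kind is to defer the expensive float conversion and normalization to the GPU instead of doing them on the CPU before upload. A hypothetical sketch of that idea (not the actual change; `preprocess` is an illustrative name):

```python
import numpy as np
import torch

def preprocess(img_uint8, device="cpu"):
    """Hypothetical fast path for step 1: upload the small uint8
    buffer first, then do the float conversion and normalization
    on the device rather than on the CPU."""
    t = torch.from_numpy(img_uint8).to(device)   # cheap: uint8 transfer
    t = t.float() / 255.0                        # runs on the device
    return t.permute(2, 0, 1).unsqueeze(0)       # HWC -> NCHW batch
```

The uint8 transfer is a quarter the size of a float32 one, and the arithmetic then runs on hardware that is much faster at it.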
Pushed the patch. Pull the latest commit and run the video evaluation command to test the new video speeds. Also, let the video play for a little bit before reading off the FPS, because it goes up over time in my experience (the first couple frames seem to take longer than the rest, even after initialization).
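The "FPS climbs over time" behavior is easiest to see with a moving average over recent frame times; the reading settles as the video plays instead of being dragged down by the slow first frames. A minimal sketch (YOLACT's eval code has its own moving-average helper; this is just an illustration):

```python
import time
from collections import deque

class MovingAverageFPS:
    """Report FPS over the last `window` frames so early slow
    frames age out of the reading as the video plays."""
    def __init__(self, window=100):
        self.frame_times = deque(maxlen=window)  # recent frame durations
        self.last = None

    def tick(self):
        """Call once per displayed frame."""
        now = time.perf_counter()
        if self.last is not None:
            self.frame_times.append(now - self.last)
        self.last = now

    def fps(self):
        if not self.frame_times:
            return 0.0
        return len(self.frame_times) / sum(self.frame_times)
```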
@dbolya, thanks a lot for the patch. With the command you sent, my early tests show 22-23 fps with videos (when displaying the output) and 15-16 fps when writing to an output video. Definitely an improvement. My GPU is maxed out, so I'd likely need more GPU cores with this implementation.
That 45.87 fps comes from timing the 3 lines of the video loop that run just the model. I guess the bottleneck on my server is disk operations, but those should be done in a separate thread anyway; I haven't bothered to multithread the disk reads yet.

Also, when you're timing, make sure to discard the first ~2 frame times, because Pytorch initializes things on the first or second pass through the network, so the first call can take up to 4 seconds.
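Moving the disk reads to a separate thread, as suggested above, can be sketched with a producer thread feeding a bounded queue so I/O overlaps with GPU work. This is a hypothetical helper, not code from the repo:

```python
import queue
import threading

def threaded_frames(paths, load_fn, maxsize=8):
    """Read frames on a background thread; the main loop consumes
    ready frames from a bounded queue, so disk latency is hidden
    behind the GPU work happening on the consumer side."""
    q = queue.Queue(maxsize=maxsize)

    def worker():
        for p in paths:
            q.put(load_fn(p))   # blocks when the queue is full
        q.put(None)             # sentinel: no more frames

    threading.Thread(target=worker, daemon=True).start()
    while True:
        frame = q.get()
        if frame is None:
            return
        yield frame
```

The bounded queue keeps memory use flat: the reader stalls rather than loading the whole video ahead of the consumer.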
@dbolya, thank you. Also, there is only one slow frame at the very start for me; do you leave it out of your FPS numbers?
Yeah, I mean I omit it from the calculations. I also broke the timing down per stage to see where the time goes.
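A per-stage breakdown like this can be collected by accumulating wall-clock time under named stages. Here is a minimal sketch with hypothetical stage names; YOLACT ships its own timer utilities, so this is only an illustration of the technique:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageTimer:
    """Accumulate wall-clock time per pipeline stage to produce a
    breakdown like load / forward / postprocess / render."""
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # average milliseconds per call for each stage
        return {k: 1000.0 * self.totals[k] / self.counts[k]
                for k in self.totals}
```

Usage is just `with timer.stage("forward"): ...` around each phase of the loop, then `timer.report()` at the end.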
@dbolya, thank you. |
hi @dbolya
when I evaluate images with eval.py, the first image takes much longer than the rest to process. why is that?
@zimenglan-sysu-512 Pytorch uses the first image passed through the network to set itself up, meaning that the first iteration will take much longer than the rest. So the first image you evaluate will be slow (it still has some setting up to do), but every image after that will be fast. You need to evaluate multiple images (for instance with `--images` or `--max_images`) and ignore the first couple of timings.

To get the numbers in the paper, download COCO and run the benchmark, e.g. `python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --max_images=1000`.
thanks @dbolya |
Hey @dbolya, |
@Rm1n90 Idk, I haven't tested it myself. It'll probably be slightly faster, but not that much (maybe 10%?) |
Thanks for your amazing work! When I run eval.py on the coco2017 dataset with a Titan V, the following results appear:

```
python3 eval.py --trained_model=weights/yolact_plus_resnet50_54_800000.pth --score_threshold=0.15 --top_k=15 --max_images=400
loading annotations into memory...
Processing Images  ██████████████████████████████ 400 / 400 (100.00%)  19.93 fps
Process finished with exit code 0
```

First, maybe it's a stupid question, but I really don't understand the meaning of the parameter `top_k`. Could you explain it to me?
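The thread doesn't answer the `top_k` question directly; in eval.py, `--top_k` caps how many detections are kept and displayed per image. Conceptually it is just a score-sorted truncation, as in this illustrative sketch (not YOLACT's actual implementation):

```python
def keep_top_k(detections, top_k=15):
    """Keep only the `top_k` highest-scoring detections per image:
    sort by confidence score and truncate the list."""
    ranked = sorted(detections, key=lambda d: d["score"], reverse=True)
    return ranked[:top_k]
```

A low `top_k` keeps output readable on crowded images; raising it (with a score threshold) shows more of the low-confidence detections.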
Hello, first of all, @dbolya thank you for sharing this work. I have a question. |
Hello, first off, thank you for sharing this amazing work. Much appreciated.
I wanted to report that I also could not get 30+ fps on an Nvidia RTX 2080 GPU with 8 GB RAM. I am getting 8-10 fps with video; with images, I get ~16 fps (0.06 s/image) with the Resnet-101 model, ~20 fps (0.05 s/image) with the Resnet-50 model, and 17-18 fps (0.055 s/image) with the Darknet-53 model. This is quite impressive, but it's roughly half of what is reported in the paper. For images, I used the Python timeit module to wrap the evalimage function to get my numbers. Also, it is odd that the difference in speed between the models is not significant (especially between Resnet-101 and Resnet-50), which suggests to me that something is halving the processing speed for all the models.
The command I am using is as below (except I change the model name as needed):
```
python3 eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.4 --top_k=100 --images=./test_images:./test_output_images
```
I also tried using `--benchmark`, but there was no change in the numbers above.
I was wondering if I could get some help to figure this out.
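Wrapping a per-image call with timeit, as described above, can understate throughput if warm-up iterations are included in the average. A hedged sketch of the measurement (function names here are hypothetical, not from eval.py):

```python
import time
import timeit

def seconds_per_call(fn, *args, repeat=20, warmup=2):
    """Time a per-image function, skipping warm-up calls so
    framework initialization doesn't skew the mean."""
    for _ in range(warmup):
        fn(*args)                                   # untimed warm-up
    total = timeit.timeit(lambda: fn(*args), number=repeat)
    return total / repeat                           # seconds per image
```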