
Why is v4 detection time 10 times that of v3?! #6630

Open
12343954 opened this issue Sep 9, 2020 · 11 comments

12343954 commented Sep 9, 2020

YOLOv4, training avg loss = 3.0, CUDA 10.2 + cuDNN 7.6.5

D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64>darknet_images.cmd
Flag value false not forcing CPU mode
 Try to load cfg: ./training/11/voc_custom/yolov4_custom.cfg, weights: ./training/11/voc_custom/backup/yolov4-custom_best.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1                                      ->  304 x 304 x  64

YOLOv3, training avg loss = 0.2, CUDA 10.1 + cuDNN 7.6.4

darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
 CUDA-version: 10010 (10010), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
 OpenCV version: 4.2.0
 compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF

[screenshots]

Where am I going wrong? Why is cudnn_half = 0 for my v4?

12343954 (Author) commented Sep 9, 2020

I put v3 and v4 under the same CUDA 10.1 + cuDNN 7.6.4 and used the same code to detect the same image on the same computer.

Detection time v3 : v4 = 39 ms : 622 ms

[screenshot]

v3

D:\Tensorflow2\darknet\build\darknet\x64>detect.cmd
 Try to load cfg: training\11\voc_custom\yolov3_custom.cfg, weights: training\11\voc_custom\backup\yolov3_custom_63000.weights, clear = 0
 compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   3 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF

v4

D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64>detect.cmd
name 'DARKNET_FORCE_CPU' is not defined
 Try to load cfg: training\11\voc_custom\yolov4_custom.cfg, weights: training\11\voc_custom\backup\yolov4-custom_best.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1
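When timing v3 against v4 from Python, a fairer comparison times only the detection call, skips the first few calls (the first GPU inference typically includes one-off setup work), and reports a median over many runs; image loading and the rotated-bbox drawing can then be timed separately with the same helper. The sketch below is a generic helper written under those assumptions, not code from darknet_images.py; `fake_detect` is a hypothetical stand-in for whatever function wraps the network call.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_callable(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Call fn() `warmup` times untimed, then `runs` times timed; return stats in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not measured
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a detection call; replace it with your own wrapper
    # around the v3 or v4 network, keeping image loading and drawing outside it.
    def fake_detect() -> None:
        time.sleep(0.005)

    stats = time_callable(fake_detect, warmup=2, runs=10)
    print(f"median {stats['median_ms']:.1f} ms "
          f"(min {stats['min_ms']:.1f}, max {stats['max_ms']:.1f})")
```

Running the same helper on the v3 and v4 wrappers, and separately on the post-processing step, shows whether the extra time is in the network call itself or around it.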

AlexeyAB (Owner) commented Sep 9, 2020

1. Run the same command for both v3 and v4 on the same PC: ./darknet detector test coco.data yolov4.cfg yolov4.weights data/dog.jpg
2. Do you use an implementation for rotated bboxes? Which implementation do you use?

stephanecharette (Collaborator) commented

"put v3 and v4 under the same CUDA 10.1 + cuDNN 7.6.4, same code detect same image"

Is this on the same computer, same darknet directory, etc.?

I ask because I can confirm that when I run YOLO v3 or v4 detection on the same computer, I get approximately the same results: around 3-4 milliseconds for most of my neural networks. If I rebuild darknet without GPU support, I get numbers like 300-700 milliseconds, since it then runs on the CPU.

I suspect that whatever you are doing, your v4 is running on the CPU.

12343954 (Author) commented Sep 10, 2020

@AlexeyAB @stephanecharette Thank you very much for replying.

1. I tested v4 and v3 on the same PC via Python.
2. My rotated-bbox code runs on both the v4 and the v3 darknet framework on the GPU, but I didn't test the two yolo .cfg files with one build; you can see my two different folders.
3. I am sure that the above results were all measured on the GPU even though the times are very different, because when I tested on the CPU, the v4 time was more than 1000 ms!

I suspect that cuDNN is not turned on under v4. What does cudnn_half = 0 mean?

I used v4 and v3 each with its own cfg and weights. I trained each on its own dataset separately, using the same file names in different folders, so that my code changes are minimal.

AlexeyAB (Owner) commented Sep 11, 2020

1. Run the same command for both v3 and v4 on the same PC, and show a screenshot: ./darknet detector test coco.data yolov4.cfg yolov4.weights data/dog.jpg
2. Do you use an implementation for rotated bboxes? Which implementation do you use? Show a link.

12343954 (Author) commented Sep 11, 2020

All tests were run under CUDA 10.1 + cuDNN 7.6.4 on the same PC.

v4, darknet.exe detector test data/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg

 CUDA-version: 10010 (11000), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
 CUDNN_HALF=1
 OpenCV version: 4.2.0
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF

[screenshot]

v4, python darknet_images.py --input=data/dog.jpg --config_file=cfg/yolov4.cfg --weights=yolov4.weights --data_file=data/coco.data

name 'DARKNET_FORCE_CPU' is not defined
 Try to load cfg: cfg/yolov4.cfg, weights: yolov4.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF

[screenshot]

v4 (no GPU), python darknet_images.py --input=data/dog.jpg --config_file=cfg/yolov4.cfg --weights=yolov4.weights --data_file=data/coco.data

Environment variables indicated a CPU run, but we didn't find D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64\yolo_cpp_dll_nogpu.dll. Trying a GPU run anyway.
 Try to load cfg: cfg/yolov4.cfg, weights: yolov4.weights, clear = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF

[screenshot]
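The flag that differs between these runs can be pulled out of each startup log instead of being compared by eye: the darknet.exe run of yolov4.cfg/yolov4.weights reports cudnn_half = 1, the Python run of the same files reports cudnn_half = 0, and the CPU-fallback excerpt shows no compute_capability line at all. This is a minimal sketch that assumes only the log line formats shown in this thread; `parse_darknet_log` is a hypothetical helper, not part of darknet.

```python
import re
from typing import Dict, Optional

# Matches e.g. " 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060"
# as well as the older " compute_capability = 750, cudnn_half = 1" form.
_CC_RE = re.compile(r"compute_capability\s*=\s*(\d+),\s*cudnn_half\s*=\s*(\d+)")
# Matches the build banner printed by darknet.exe, e.g. "CUDNN_HALF=1"
# (this banner does not appear in the Python runs shown above).
_BUILD_RE = re.compile(r"CUDNN_HALF=(\d+)")


def parse_darknet_log(text: str) -> Dict[str, Optional[int]]:
    """Extract compute capability and cudnn_half flags from a darknet startup log.

    Fields are None when the corresponding line is missing from the log.
    """
    cc = _CC_RE.search(text)
    build = _BUILD_RE.search(text)
    return {
        "compute_capability": int(cc.group(1)) if cc else None,
        "cudnn_half": int(cc.group(2)) if cc else None,
        "build_cudnn_half": int(build.group(1)) if build else None,
    }


if __name__ == "__main__":
    exe_log = (" CUDA-version: 10010 (11000), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1\n"
               " 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2060")
    py_log = " 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060"
    print("darknet.exe:", parse_darknet_log(exe_log))
    print("python:     ", parse_darknet_log(py_log))
```

Feeding it the full logs from both runs makes the cudnn_half difference explicit, which is the question raised earlier in the thread.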

v3, darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg

CUDA-version: 10010 (11000), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
OpenCV version: 4.2.0
compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
  layer   filters  size/strd(dil)      input                output
  0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
  1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF

[screenshot]

AlexeyAB (Owner) commented

@12343954

So from your screenshots on GeForce RTX 2060:

  • YOLOv4-608 - 39ms
  • YOLOv3-416 - 20ms (2x faster)

This fully matches the chart from the article https://arxiv.org/abs/2004.10934 (GPU: Tesla V100):

  • YOLOv4-608 - 16ms (62 FPS)
  • YOLOv3-416 - 8ms (120 FPS) (2x faster)

[chart from the article]
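Writing the thread's timings as ratios, with t the per-image detection time, shows how far the Python result is from what the model difference alone predicts. All numbers come from the comments above; the Python pair is the author's custom v3-416 and v4-608 models on the RTX 2060.

$$
\begin{aligned}
\text{RTX 2060, darknet.exe:}\quad & \frac{t_{\text{v4-608}}}{t_{\text{v3-416}}} = \frac{39\ \text{ms}}{20\ \text{ms}} = 1.95 \\
\text{Tesla V100, paper:}\quad & \frac{t_{\text{v4-608}}}{t_{\text{v3-416}}} = \frac{16\ \text{ms}}{8\ \text{ms}} = 2 \\
\text{RTX 2060, Python:}\quad & \frac{t_{\text{v4-608}}}{t_{\text{v3-416}}} = \frac{622\ \text{ms}}{39\ \text{ms}} \approx 15.9
\end{aligned}
$$

On these numbers, about a factor of 2 is accounted for by YOLOv4-608 versus YOLOv3-416; the remaining factor of roughly 8 in the Python measurement is not explained by the model difference.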

12343954 (Author) commented

@AlexeyAB Thank you for the reply!

Through my tests, I see that darknet.exe performs differently for v4 and v3, but I don't understand why the same code running under Python gives such different results: 10 times!!
Is this because cudnn_half is unavailable for v4?
I didn't change your darknet_images.py code.

AlexeyAB (Owner) commented

Does https://github.com/AlexeyAB/darknet/blob/master/darknet_images.py support rotated bboxes?

12343954 (Author) commented Sep 11, 2020

😄
Sorry, I forgot to answer your second question. I implemented the rotated bboxes myself with OpenCV, combining some technical information from the Internet.

robotaiguy commented

Were you, by chance, using the default 416x416 network size for Yolo v3, and now using the default 608x608 network size for Yolo v4? That should be enough to make a notable difference in side-by-side comparisons.
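The size effect can be checked against the BF (billion floating-point operations) column in the layer tables above. For a convolution, the logged values are consistent with 2 · k² · C_in · C_out · H_out · W_out / 10⁹, i.e. one multiply and one add per weight per output pixel; this formula is inferred from the logged numbers here rather than quoted from darknet's source. Since H_out · W_out grows with the square of the input side (same strides), going from 416 × 416 to 608 × 608 multiplies the cost of every convolutional layer by (608/416)² ≈ 2.14. `conv_bflops` is a hypothetical helper for this check.

```python
def conv_bflops(k: int, c_in: int, c_out: int, h_out: int, w_out: int) -> float:
    """Billions of FLOPs for one conv layer: 2 ops (multiply + add) per weight per output pixel."""
    weights = k * k * c_in * c_out
    return 2.0 * weights * h_out * w_out / 1e9


if __name__ == "__main__":
    # First two layers of the logged YOLOv3-416 and YOLOv4-608 tables.
    v3_416 = [conv_bflops(3, 3, 32, 416, 416), conv_bflops(3, 32, 64, 208, 208)]
    v4_608 = [conv_bflops(3, 3, 32, 608, 608), conv_bflops(3, 32, 64, 304, 304)]
    for small, large in zip(v3_416, v4_608):
        print(f"416: {small:.3f} BF   608: {large:.3f} BF   ratio {large / small:.3f}")
    print(f"(608/416)^2 = {(608 / 416) ** 2:.3f}")
```

The per-layer ratio equals (608/416)² exactly, so the input size alone roughly doubles the convolution work before any difference in depth between the two architectures is counted.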
