Why is v4 detection time 10 times longer than v3? #6630

12343954 · 2020-09-09T03:54:45Z

YOLOv4, training avg-loss = 3.0, CUDA 10.2 + cuDNN 7.6.5

D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64>darknet_images.cmd
Flag value false not forcing CPU mode
 Try to load cfg: ./training/11/voc_custom/yolov4_custom.cfg, weights: ./training/11/voc_custom/backup/yolov4-custom_best.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1                                      ->  304 x 304 x  64

YOLOv3, training avg-loss = 0.2, CUDA 10.1 + cuDNN 7.6.4

darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
 CUDA-version: 10010 (10010), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
 OpenCV version: 4.2.0
 compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF

Where am I going wrong? Why is cudnn_half = 0 for my v4 run?

12343954 · 2020-09-09T07:59:40Z

I put v3 and v4 under the same CUDA 10.1 + cuDNN 7.6.4 setup.
The same code detects the same image on the same computer.

Detection time, v3 : v4 = 39 ms : 622 ms

v3

D:\Tensorflow2\darknet\build\darknet\x64>detect.cmd
 Try to load cfg: training\11\voc_custom\yolov3_custom.cfg, weights: training\11\voc_custom\backup\yolov3_custom_63000.weights, clear = 0
 compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   3 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF

v4

D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64>detect.cmd
name 'DARKNET_FORCE_CPU' is not defined
 Try to load cfg: training\11\voc_custom\yolov4_custom.cfg, weights: training\11\voc_custom\backup\yolov4-custom_best.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1

AlexeyAB · 2020-09-09T18:56:39Z

1. Run the same command for both v3 and v4 on the same PC: ./darknet detector test coco.data yolov4.cfg yolov4.weights data/dog.jpg

2. Do you use an implementation for rotated-bbox? Which implementation do you use?

stephanecharette · 2020-09-09T22:17:25Z

put v3 and v4 under the same CUDA 10.1 + cuDNN 7.6.4
same code detect same image

Is this on the same computer, same darknet directory, etc?

I ask because I can confirm that when I run YOLO v3 or v4 detection on the same computer, I get approximately the same results: around 3-4 milliseconds for most of my neural networks. If I then rebuild darknet without GPU support, I get numbers like 300-700 milliseconds, since it runs on the CPU.

I suspect that whatever you are doing, your v4 is running on the CPU.
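One way to compare the two setups on equal terms is to time the detection call itself, after a few warm-up runs. The following is a minimal sketch, not code from darknet: `time_inference` and `fake_detect` are hypothetical names, and `detect_fn` stands in for whatever zero-argument call runs a single detection in your own script.

```python
import statistics
import time


def time_inference(detect_fn, runs=20, warmup=3):
    """Return the median wall-clock latency of detect_fn() in milliseconds."""
    # Warm-up calls are discarded, because the first calls into a GPU library
    # often include one-off setup cost that would distort the measurement.
    for _ in range(warmup):
        detect_fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        detect_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace it with a call into your
    # own detection code for v3 and v4 in turn.
    def fake_detect():
        time.sleep(0.005)

    print(f"median latency: {time_inference(fake_detect):.1f} ms")
```

Running the same harness against both builds, with the same image, makes it easier to tell a CPU fallback (hundreds of milliseconds) from an ordinary difference between the two models.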

12343954 · 2020-09-10T03:33:25Z

@AlexeyAB @stephanecharette Thank you very much for replying.

1. I tested v4 and v3 on the same PC via Python.
2. My rotated-bbox code runs on both the v4 and the v3 darknet frameworks on the GPU, but I did not test both YOLO cfgs within one darknet build. You can see my two different folders.
3. I am sure that the results above were all measured on the GPU, yet the detection times are very different: when I tested on the CPU, the v4 detection time was more than 1000 ms.

I suspect that cuDNN is not turned on under v4. What does cudnn_half = 0 mean?

I used v4 and v3 each with its own cfg and weights.
I trained each on its own dataset separately, using the same names in different folders.
This keeps my code changes minimal.
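Since the logs in this thread differ mainly in the `compute_capability` / `cudnn_half` line, it can help to extract that line from each run and compare the values side by side. Below is a minimal sketch that assumes the log format shown in the pasted output above; `parse_device_line` is a hypothetical helper, not part of darknet.

```python
import re

# Matches lines such as:
#   " 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060"
#   " compute_capability = 750, cudnn_half = 1"
_DEVICE_LINE = re.compile(
    r"compute_capability\s*=\s*(\d+),\s*cudnn_half\s*=\s*(\d+)"
    r"(?:,\s*GPU:\s*(.+))?"
)


def parse_device_line(log_text):
    """Return the first device line in a darknet log as a dict, or None."""
    for line in log_text.splitlines():
        match = _DEVICE_LINE.search(line)
        if match:
            gpu = (match.group(3) or "").strip() or None
            return {
                "compute_capability": int(match.group(1)),
                "cudnn_half": int(match.group(2)),
                "gpu": gpu,
            }
    return None


if __name__ == "__main__":
    v4_log = " 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060"
    v3_log = " compute_capability = 750, cudnn_half = 1"
    print(parse_device_line(v4_log))
    print(parse_device_line(v3_log))
```

If the two runs report different `cudnn_half` values on the same GPU, that suggests the two binaries or DLLs were built or loaded differently, which is worth checking before comparing timings.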

AlexeyAB · 2020-09-11T11:57:11Z

1. Run the same command for both v3 and v4 on the same PC, and show a screenshot: ./darknet detector test coco.data yolov4.cfg yolov4.weights data/dog.jpg

2. Do you use an implementation for rotated-bbox? Which implementation do you use? Show a link.

12343954 · 2020-09-11T13:21:13Z

All tests were run under CUDA 10.1 + cuDNN 7.6.4 on the same PC.

v4: darknet.exe detector test data/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg

 CUDA-version: 10010 (11000), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
 CUDNN_HALF=1
 OpenCV version: 4.2.0
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF

v4 (Python): python darknet_images.py --input=data/dog.jpg --config_file=cfg/yolov4.cfg --weights=yolov4.weights --data_file=data/coco.data

name 'DARKNET_FORCE_CPU' is not defined
 Try to load cfg: cfg/yolov4.cfg, weights: yolov4.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF

v4 (Python, no GPU): python darknet_images.py --input=data/dog.jpg --config_file=cfg/yolov4.cfg --weights=yolov4.weights --data_file=data/coco.data

Environment variables indicated a CPU run, but we didn't find D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64\yolo_cpp_dll_nogpu.dll. Trying a GPU run anyway.
 Try to load cfg: cfg/yolov4.cfg, weights: yolov4.weights, clear = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF

v3: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg

CUDA-version: 10010 (11000), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
OpenCV version: 4.2.0
compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
  layer   filters  size/strd(dil)      input                output
  0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
  1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF

AlexeyAB · 2020-09-11T13:39:36Z

@12343954

So from your screenshots on GeForce RTX 2060:

YOLOv4-608 - 39ms
YOLOv3-416 - 20ms (2x faster)

This fully matches the chart from the article: https://arxiv.org/abs/2004.10934

On a Tesla V100 GPU:

YOLOv4-608 - 16ms (62 FPS)
YOLOv3-416 - 8ms (120 FPS) (2x faster)
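The relation between the latencies and frame rates quoted above is simply FPS = 1000 / latency in ms. A small sketch of that arithmetic, using only the numbers given in this thread (`fps_from_ms` is a hypothetical helper name):

```python
def fps_from_ms(latency_ms):
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Latencies quoted in the thread for the GeForce RTX 2060.
    rtx2060 = {"YOLOv4-608": 39.0, "YOLOv3-416": 20.0}
    for name, ms in rtx2060.items():
        print(f"{name}: {ms:.0f} ms is about {fps_from_ms(ms):.1f} FPS")

    # Slowdown of v4 relative to v3 on the same GPU.
    ratio = rtx2060["YOLOv4-608"] / rtx2060["YOLOv3-416"]
    print(f"v4 / v3 latency ratio: {ratio:.2f}")
```

Note that 1000 / 16 = 62.5 and 1000 / 8 = 125, so the millisecond and FPS figures quoted for the V100 are rounded rather than exact conversions of each other.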

12343954 · 2020-09-11T13:59:18Z

@AlexeyAB Thank you for the reply!

Through my tests, I can see that darknet.exe performs differently for v4 and v3.
But I don't understand why the same code running under Python gives such different results: a factor of 10!
Is this because cudnn_half is unavailable for v4?
I did not change your darknet_images.py code.

AlexeyAB · 2020-09-11T14:22:50Z

Does https://github.com/AlexeyAB/darknet/blob/master/darknet_images.py support rotated bboxes?

12343954 · 2020-09-11T14:24:22Z

😄
Sorry, I forgot to answer your 2nd question. I implemented the rotated-bbox handling myself with OpenCV, combining some technical information I found on the Internet.

robotaiguy · 2020-10-14T16:07:58Z

Were you, by chance, using the default 416x416 network size for Yolo v3, and now using the default 608x608 network size for Yolo v4? That should be enough to make a notable difference in side-by-side comparisons.
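The effect of the network size can be estimated from the layer tables pasted earlier in the thread. The sketch below assumes the per-layer BF figure for a convolution is 2 x output width x output height x filters x kernel size squared x input channels, in billions; that formula is an assumption here, chosen because it reproduces the values printed in the logs, and `conv_bflops` is a hypothetical helper name.

```python
def conv_bflops(out_w, out_h, filters, size, in_channels):
    """Approximate cost of one convolutional layer, in billions of operations.

    Assumed formula (not taken from the thread): two operations per
    multiply-accumulate, over every output position, filter and kernel weight.
    """
    return 2.0 * out_w * out_h * filters * size * size * in_channels / 1e9


if __name__ == "__main__":
    # Layer 0 from the logs: 3x3 conv, 3 input channels, 32 filters.
    v4_layer0 = conv_bflops(608, 608, 32, 3, 3)
    v3_layer0 = conv_bflops(416, 416, 32, 3, 3)
    print(f"layer 0 at 608x608: {v4_layer0:.3f} BF")
    print(f"layer 0 at 416x416: {v3_layer0:.3f} BF")

    # The same layer costs more purely because of the larger input.
    pixel_ratio = (608 * 608) / (416 * 416)
    print(f"pixel ratio 608^2 / 416^2: {pixel_ratio:.2f}")
```

By this estimate the larger input alone accounts for roughly a 2x difference in work per layer, which is in line with the 39 ms versus 20 ms measured with darknet.exe, but not with a 10x gap.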

