
YOLOv4-tiny released: 40.2% AP50, 371 FPS (GTX 1080 Ti), 1770 FPS tkDNN/TensorRT #6067

Open
AlexeyAB opened this issue Jun 25, 2020 · 132 comments

Owner

@AlexeyAB AlexeyAB commented Jun 25, 2020

Discussion: https://www.reddit.com/r/MachineLearning/comments/hu7lyt/p_yolov4tiny_speed_1770_fps_tensorrtbatch4/

Full structure: structure of yolov4-tiny.cfg model

YOLOv4-tiny released: 40.2% AP50, 371 FPS (GTX 1080 Ti) / 330 FPS (RTX 2070)

  • 1770 FPS - on GPU RTX 2080Ti - (416x416, fp16, batch=4) tkDNN/TensorRT ceccocats/tkDNN#59 (comment)

  • 1353 FPS - on GPU RTX 2080Ti - (416x416, fp16, batch=4) OpenCV 4.4.0 (including: transferring CPU->GPU and GPU->CPU) (excluding: nms, pre/post-processing) #6067 (comment)

  • 39 FPS - 25ms latency - on Jetson Nano - (416x416, fp16, batch=1) tkDNN/TensorRT ceccocats/tkDNN#59 (comment)

  • 290 FPS - 3.5ms latency - on Jetson AGX - (416x416, fp16, batch=1) tkDNN/TensorRT ceccocats/tkDNN#59 (comment)

  • 42 FPS - on CPU Core i7 7700HQ (4 Cores / 8 Logical Cores) - (416x416, fp16, batch=1) OpenCV 4.4.0 (compiled with OpenVINO backend) #6067 (comment)

  • 20 FPS on CPU ARM Kirin 990 - Smartphone Huawei P40 #6091 (comment) - Tencent/NCNN library https://github.com/Tencent/ncnn

  • 120 FPS on NVIDIA Jetson AGX Xavier - MAX_N - Darknet framework

  • 371 FPS on GPU GTX 1080 Ti - Darknet framework
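
The latency figures quoted for the batch=1 entries above are roughly the reciprocal of the throughput. A minimal sketch of that arithmetic, assuming frames are processed one batch at a time with no pipelining (the helper name `latency_ms` is only illustrative):

```python
def latency_ms(fps: float, batch: int = 1) -> float:
    """Approximate time in milliseconds to process one batch at a given throughput."""
    return 1000.0 * batch / fps

# batch=1 entries from the list above
print(f"Jetson Nano: {latency_ms(39):.1f} ms")
print(f"Jetson AGX:  {latency_ms(290):.1f} ms")

# batch=4 entry: this is the time per batch of 4 images, not per image
print(f"RTX 2080Ti:  {latency_ms(1770, batch=4):.1f} ms per batch")
```

For 39 FPS and 290 FPS this gives about 25.6 ms and 3.4 ms, close to the quoted 25 ms and 3.5 ms.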


[Image: cmp]

[Image comparison: source image (src) next to predictions from yolov3-tiny (800x800) and yolov4-tiny (800x800)]

@laclouis5 laclouis5 commented Jun 25, 2020

@AlexeyAB Really great work!

For training, should we use the same kind of partial weights as for the old tiny YOLO, i.e. the first 15 layers (yolov4-tiny.conv.15)?

Owner Author

@AlexeyAB AlexeyAB commented Jun 25, 2020

@laclouis5

Use this pre-trained file for training yolov4-tiny.cfg: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29

How to train yolov4-tiny.cfg: https://github.com/AlexeyAB/darknet#how-to-train-tiny-yolo-to-detect-your-custom-objects


@muhammad-maaz-confiz muhammad-maaz-confiz commented Jun 25, 2020

Hi @alexeab,

Does the OpenCV DNN module support YOLOv4-tiny? Thanks

Owner Author

@AlexeyAB AlexeyAB commented Jun 25, 2020

We are waiting for the implementation of the YOLOv4-tiny in libraries:


@ark- ark- commented Jun 25, 2020

Hi @alexeab,

Does the OpenCV DNN module support YOLOv4-tiny? Thanks

OpenCV added YOLOv4 support to their master branch within 6 days. YOLOv4-tiny looks like a more trivial change, so here's hoping it will be live in a few days.


@LukeAI LukeAI commented Jun 25, 2020

@AlexeyAB Very exciting. Are you planning to release a paper on it? I would love to read some details about how it works. Is it a novel backbone or one of the existing CSPs? Do you have any numbers on the performance of the backbone as a classifier?

Owner Author

@AlexeyAB AlexeyAB commented Jun 25, 2020

@LukeAI


@Bao150297 Bao150297 commented Jun 26, 2020

@AlexeyAB Can you explain why the last yolo layer uses masks starting from 1, not 0?


@CSTEZCAN CSTEZCAN commented Jun 26, 2020

confirmed. performance is WOW.
https://youtu.be/TWteusBINIw
offline test (without connecting to the stream)
Screenshot from 2020-06-26 14-57-03

Owner Author

@AlexeyAB AlexeyAB commented Jun 26, 2020

@CSTEZCAN Hi,
What FPS do you get with the following command, without -mjpeg_port?
darknet.exe detector demo cfg/coco.data cfg/yolov4-tiny.cfg yolov4-tiny.weights -i 0 -thresh 0.25 -ext_output test.mp4 -dont_show


On GPU RTX 2070, CPU Core i7 6700K

  • I get 230 FPS with this command:
    darknet.exe detector demo cfg/coco.data cfg/yolov4-tiny.cfg yolov4-tiny.weights -ext_output test.mp4 -dont_show
    in both cases: (1) NMS commented out and (2) NMS not commented out

  • I get 330 FPS with this command, which doesn't read the video file:
    darknet.exe detector demo cfg/coco.data cfg/yolov4-tiny.cfg yolov4-tiny.weights -ext_output test.mp4 -dont_show -benchmark
    So the bottleneck is OpenCV: (1) cv::VideoCapture (video capturing from file/camera) and (2) cv::imshow / cv::waitKey (showing video on the screen) or sending by TCP/IP with the -mjpeg_port 8090 flag

Owner Author

@AlexeyAB AlexeyAB commented Jun 26, 2020

@DoriHp Just to allow a comparison with Yolov3-tiny, where the same masks were used. It seems tiny models don't detect small objects well anyway.


@Bao150297 Bao150297 commented Jun 26, 2020

I saw yolov3-tiny_3l.cfg with 3 yolo layers. So, given what you said, is the last yolo layer of no use?

Owner Author

@AlexeyAB AlexeyAB commented Jun 26, 2020

To detect small objects you must also use 3-yolo-layers in yolov4-tiny.


@CSTEZCAN CSTEZCAN commented Jun 26, 2020

@AlexeyAB Hello Alexey! this is your "Piano Concerto No. 2 Op. 18". I have infinite respect for your work.

My results are as follows:

nvidia-smi -pl 125 watts titan x pascal
Screenshot from 2020-06-26 20-15-18

nvidia-smi -pl 250 watts titan x pascal
Screenshot from 2020-06-26 20-16-03

Will be running tests on Jetson Nano, TX2 and Xavier later..

Owner Author

@AlexeyAB AlexeyAB commented Jun 26, 2020

@CSTEZCAN Thanks! I think yolov4-tiny can run at 500 - 1000 FPS with OpenCV or tkDNN/TensorRT once it is implemented in these libraries.


@CSTEZCAN CSTEZCAN commented Jun 26, 2020

@AlexeyAB I have no doubt. The only odd thing I noticed is that it uses relatively more CPU during training than YOLOv4. The recommended system should start from a Ryzen 3500 or above for optimal performance (if you are creating such a recommended setup list) :)


@KyryloAntoshyn KyryloAntoshyn commented Aug 26, 2020

Hi,

  1. Yes, yolov4-tiny is a Top-1 lightweight object detector in terms of speed and accuracy.
  2. You can do transfer-learning by using Darknet, and then use these cfg/weights files in tkDNN or OpenCV to run inference on Jetson Nano with ~40 FPS

Thank you so much for your time and help!


@forestguan forestguan commented Sep 9, 2020

Hi @AlexeyAB, thanks for your great work.
I want to convert yolov4-tiny.weights to a caffemodel, but I get an error:
$ python darknet2caffe.py cfg/yolov4-tiny.cfg weights/yolov4-tiny.weights prototxt/yolov4-tiny.prototxt caffemodel/yolov4-tiny.caffemodel

Traceback (most recent call last):
  File "darknet2caffe.py", line 521, in <module>
    darknet2caffe(cfgfile, weightfile, protofile, caffemodel)
  File "darknet2caffe.py", line 63, in darknet2caffe
    start = load_conv_bn2caffe(buf, start, params[conv_layer_name], params[bn_layer_name], params[scale_layer_name])
  File "darknet2caffe.py", line 152, in load_conv_bn2caffe
    conv_param[0].data[...] = np.reshape(buf[start:start+conv_weight.size], conv_weight.shape); start = start + conv_weight.size
  File "<array_function internals>", line 6, in reshape
  File "/home/ling/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 301, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/home/ling/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
    return bound(*args, **kwds)
ValueError: cannot reshape array of size 756735 into shape (256,384,3,3)

When I change "buf = np.fromfile(fp, dtype = np.float32)" to "buf = np.fromfile(fp, dtype = np.float16)" in darknet2caffe.py, the error disappears. Is this correct?
Sincerely.
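
A note on the float16 change, as a hedged sketch rather than a diagnosis of darknet2caffe.py: as far as I know, Darknet .weights files store 32-bit floats, so reading the same bytes as float16 only doubles the element count. That can make a size check pass without producing valid weights. The standard-library example below shows the effect on a made-up four-value buffer (not a real weights file), and also computes how many values the failing reshape asked for.

```python
import struct

# Four made-up float32 values standing in for a slice of a weights buffer.
values = [0.5, -1.25, 3.0, 0.125]
raw = struct.pack("<4f", *values)

# Interpret the same bytes as float32 and as float16 ("e" is half precision).
as_f32 = struct.unpack(f"<{len(raw) // 4}f", raw)
as_f16 = struct.unpack(f"<{len(raw) // 2}e", raw)

print(len(as_f32), as_f32)  # same count and values as were written
print(len(as_f16), as_f16)  # twice as many elements, unrelated values

# Element count the failing reshape needed, and how far short the buffer was.
needed = 256 * 384 * 3 * 3
print(needed, needed - 756735)
```

If that assumption about the file format holds, the float16 version silences the error but loads garbage, so the mismatch between the layer shape and the remaining buffer still needs to be explained.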


@richardgohth richardgohth commented Sep 10, 2020

Tested on Xavier NX, with 720p video, fps was around 5 to 8.
Width and height in cfg file changed to 320.
Is this normal?
How to make it go faster?
thanks


@ajaykumaar ajaykumaar commented Sep 13, 2020

Hi, firstly, thanks for the wonderful implementation!
Can I use the weights and cfg of YOLOV4-tiny for any Darknet model (i.e. model for full-size YOLOV3)?
If not can you suggest the best model to chuck the weights in?

Thanks


@LukeAI LukeAI commented Sep 13, 2020

Hi, firstly, thanks for the wonderful implementation!
Can I use the weights and cfg of YOLOV4-tiny for any Darknet model (i.e. model for full-size YOLOV3)?
If not can you suggest the best model to chuck the weights in?

Thanks

You cannot. What is it that you want to do?


@ajaykumaar ajaykumaar commented Sep 13, 2020

Hi, firstly, thanks for the wonderful implementation!
Can I use the weights and cfg of YOLOV4-tiny for any Darknet model (i.e. model for full-size YOLOV3)?
If not can you suggest the best model to chuck the weights in?
Thanks

You cannot. What is it that you want to do?

Hi, thanks for the quick reply.
I want to build an object detector with YOLOV4-tiny and I used the weights and config file for the same from this repo.
To load the weights into a model, I used the Darknet that was built for YOLOV3 in another repo, and I got an input size mismatch error. So which model should I use to load the weights from this repo?


@LukeAI LukeAI commented Sep 13, 2020

Hi, firstly, thanks for the wonderful implementation!
Can I use the weights and cfg of YOLOV4-tiny for any Darknet model (i.e. model for full-size YOLOV3)?
If not can you suggest the best model to chuck the weights in?
Thanks

You cannot. What is it that you want to do?

Hi, thanks for the quick reply.
I want to build an object detector with YOLOV4-tiny and I used the weights and config file for the same from this repo.
To load the weights into a model, I used the Darknet that was built for YOLOV3 in another repo, and I got an input size mismatch error. So which model should I use to load the weights from this repo?

what do you mean? use the weights alexeyab has provided.


@ajaykumaar ajaykumaar commented Sep 13, 2020

The issue is that I used alexeyab weights and cfg files like below...
config_path='config/yolov4-tiny.cfg'
weights_path='yolov4-tiny.weights'
model=Darknet(config_path,img_size=416)
model.load_weights(weights_path)

But I get the runtime error: shape [256,384,3,3] is invalid input of size 756735.
and the Darknet() was from this repo

How do I resolve this?


@LukeAI LukeAI commented Sep 14, 2020

The issue is that I used alexeyab weights and cfg files like below...
config_path='config/yolov4-tiny.cfg'
weights_path='yolov4-tiny.weights'
model=Darknet(config_path,img_size=416)
model.load_weights(weights_path)

But I get the runtime error: shape [256,384,3,3] is invalid input of size 756735.
and the Darknet() was from this repo

How do I resolve this?

I don't know, why don't you ask on that repo? I don't think you'll get an answer about a different repo here.


@marvision-ai marvision-ai commented Sep 15, 2020

I have a few questions that I wrote up in issue #6548, which got no resolution. Please see below:

I am working on detecting 4 types of small objects. I have been going through all the extra steps to increase performance.

I calculated these custom anchors: anchors = 9, 11, 17, 17, 15, 65, 31, 34, 41, 61, 44,121, 88, 74, 99,123, 180,144

Custom anchors

Only if you are an expert in neural detection networks - recalculate anchors for your dataset for width and height from cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416 then set the same 9 anchors in each of 3 [yolo]-layers in your cfg-file. But you should change indexes of anchors masks= for each [yolo]-layer, so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30, 2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3. Also you should change the filters=(classes + 5)*<number of mask> before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers - then just try using all the default anchors.

I took what you said and applied it to my .cfg, but I am not getting much of a performance increase (1%) compared to the original anchors.

Here is my .cfg portion. I changed filters=(classes + 5)*<number of mask>, and I made sure to put the largest anchors in the first layer and the smallest anchors in the last.

[convolutional]
size=1
stride=1
pad=1
filters=36
activation=linear

[yolo]
mask = 5,6,7,8
anchors = 9, 11,  17, 17,  15, 65,  31, 34,  41, 61,  44,121,  88, 74,  99,123, 180,144
classes=4
num=9
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=1
resize=1.5
nms_kind=greedynms
beta_nms=0.6

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=27
activation=linear

[yolo]
mask = 2,3,4
anchors = 9, 11,  17, 17,  15, 65,  31, 34,  41, 61,  44,121,  88, 74,  99,123, 180,144
classes=4
num=9
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=1
resize=1.5
nms_kind=greedynms
beta_nms=0.6


[route]
layers = -3

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 15

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1
anchors = 9, 11,  17, 17,  15, 65,  31, 34,  41, 61,  44,121,  88, 74,  99,123, 180,144
classes=4
num=9
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=1
resize=1.5
nms_kind=greedynms
beta_nms=0.6
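
The mask and filters values in the cfg portion above can be cross-checked with a few lines of Python. This is only a sketch: it assumes "smaller than 30x30" and "smaller than 60x60" are read as area thresholds (width*height below 900 and 3600), which is one possible interpretation of the quoted text, and the helper names `split_masks` and `filters_for` are hypothetical.

```python
# Anchors and class count taken from the cfg portion above.
anchors = [(9, 11), (17, 17), (15, 65), (31, 34), (41, 61),
           (44, 121), (88, 74), (99, 123), (180, 144)]
classes = 4

def split_masks(anchors, limits=(30 * 30, 60 * 60)):
    """Group anchor indexes by area: small, medium, and everything else.

    Assumption: the size limits are compared against width * height.
    """
    small, medium, large = [], [], []
    for index, (w, h) in enumerate(anchors):
        area = w * h
        if area < limits[0]:
            small.append(index)
        elif area < limits[1]:
            medium.append(index)
        else:
            large.append(index)
    return small, medium, large

def filters_for(mask, classes):
    """filters=(classes + 5)*<number of mask> for the conv layer before a [yolo] layer."""
    return (classes + 5) * len(mask)

masks = split_masks(anchors)
for mask in reversed(masks):  # largest anchors first, as in the cfg above
    print("mask =", ",".join(map(str, mask)), "filters =", filters_for(mask, classes))
```

Under that reading, the grouping reproduces the masks 5,6,7,8 / 2,3,4 / 0,1 and the filters 36 / 27 / 18 used above, so the cfg arithmetic itself looks consistent.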

3 Questions:

1. The mAP barely improves. Is there something I did not implement correctly?

2. Is there a reason we are detecting the largest anchors first ( >60x60) --> (>30x30) --> (<30x30) ? I read somewhere that this order does not matter.

3. In the case of the ( 9, 11) anchor, should I just ignore that (too small) and just have the last layer show mask = 1 ?

I also want to implement the following suggestions:

for training for small objects (smaller than 16x16 after the image is resized to 416x416):

  • set layers = 23 instead of layers = 54
  • set stride=4 instead of stride=2
  • set stride=4 instead of stride=2

Is there a way to do this on the yolov4-tiny models? Or is this specific to yolo-v4 only?


@re-shubhamturai re-shubhamturai commented Oct 2, 2020

Hey @AlexeyAB !!
I would like to get a suggestion from you regarding the choice of Object Detector:
SSDMobileNetv2 and Yolov4-tiny in terms of speed and accuracy/mAP for a mobile application. Please help !! Thanks !!


@wwzh2015 wwzh2015 commented Oct 4, 2020

Hey @AlexeyAB !!
I would like to get a suggestion from you regarding the choice of Object Detector:
SSDMobileNetv2 and Yolov4-tiny in terms of speed and accuracy/mAP for a mobile application. Please help !! Thanks !!

SSDMobileNetv2 is lower


@SilverWaveGL SilverWaveGL commented Oct 4, 2020

@wwzh2015 : Can you please share this comparison?


@lisifann lisifann commented Nov 10, 2020

@AlexeyAB
Excuse me, could you help explain what groups = 2, group_id = 1 mean? And did you refer to any papers on this?


@lliming2006 lliming2006 commented Jan 20, 2021

Hi @AlexeyAB, thanks for your great work.
I converted yolov4-tiny.weights to a caffemodel and tested it (in the caffemodel, I use "slice" and "silence" layers to replace the route layers that include "groups" in yolov4-tiny), but there is a big difference between the results of the Caffe framework and the Darknet framework on the same image. E.g. for some target objects in the image there are two or three boxes (one box contains one or two smaller boxes after do_nms_sort; these boxes are of the same category and their scores are slightly different). Why?


@WENKONG01 WENKONG01 commented Jun 1, 2021

Discussion: https://www.reddit.com/r/MachineLearning/comments/hu7lyt/p_yolov4tiny_speed_1770_fps_tensorrtbatch4/

Full structure: https://lutzroeder.github.io/netron/?url=https%3A%2F%2Fraw.githubusercontent.com%2FAlexeyAB%2Fdarknet%2Fmaster%2Fcfg%2Fyolov4-tiny.cfg

YOLOv4-tiny released: 40.2% AP50, 371 FPS (GTX 1080 Ti) / 330 FPS (RTX 2070)

[Image: cmp]

[Image comparison: source image (src) next to predictions from yolov3-tiny (800x800) and yolov4-tiny (800x800)]

Hi AlexeyAB, thanks for your great work. Could you tell me which dataset the pictures are from? Thanks.

Collaborator

@WongKinYiu WongKinYiu commented Jun 1, 2021

I do not know which dataset the picture is from, but I got the source picture from https://github.com/google/automl/tree/master/efficientdet .


@lliming2006 lliming2006 commented Jun 2, 2021


@WENKONG01 WENKONG01 commented Jun 2, 2021

I do not know which dataset the picture is from, but I got the source picture from https://github.com/google/automl/tree/master/efficientdet .

Okay, thanks for your reply.


@WENKONG01 WENKONG01 commented Jun 2, 2021

If you only want to test the generalization ability of the model, you can get some test dataset on the overpass!


thanks!


@Fetulhak Fetulhak commented Sep 3, 2021

@DoriHp You asked: "Can you explain why the last yolo layer uses masks starting from 1, not 0?" in yolov4-tiny. Did you get an answer, or is it a mistake?

Collaborator

@WongKinYiu WongKinYiu commented Sep 3, 2021

We simply follow yolov3-tiny in using masks starting from 1, for a fair comparison.


@Fetulhak Fetulhak commented Sep 5, 2021

In that case you are not going to use the last anchor size. am I correct?

Collaborator

@WongKinYiu WongKinYiu commented Sep 5, 2021

yes, anchor with index 0 is not used in that case.
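
To see which anchors a set of [yolo] layers actually uses, one can collect the mask indexes and compare them with the full range. A small sketch, assuming six anchors and masks 3,4,5 and 1,2,3 (values I believe match yolov4-tiny.cfg, but check them against your own cfg); `unused_anchor_indexes` is a hypothetical helper:

```python
def unused_anchor_indexes(num, masks):
    """Return anchor indexes in range(num) that no [yolo] layer mask refers to."""
    used = {index for mask in masks for index in mask}
    return sorted(set(range(num)) - used)

# Assumed values for illustration; read them from your own cfg file.
num = 6
masks = [[3, 4, 5], [1, 2, 3]]
print(unused_anchor_indexes(num, masks))
```

With these assumed masks, only index 0 is left over, which matches the answer above.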


@Fetulhak Fetulhak commented Sep 23, 2021

@AlexeyAB
Excuse me, could you help explain what groups = 2, group_id = 1 mean? And did you refer to any papers on this?

Did anyone explain how the backbone of yolov4-tiny works? yolov4-custom.cfg is easy to follow, but yolov4-tiny.cfg is difficult for me to understand. In the route layer there is a groups parameter; does this indicate group convolution? If so, don't we need a shuffle parameter too? @AlexeyAB @WongKinYiu please give us a reading reference on how the backbone architecture is implemented. I have seen @AlexeyAB's response here: #6067 (comment)

Owner Author

@AlexeyAB AlexeyAB commented Sep 24, 2021

@Fetulhak

[route]
layers=-1
groups=2
group_id=1

It means,

  • layers=-1 - it gets output of previous layer (layer_id - 1) as input, then
  • groups=2 - it divides this input by channels into 2 groups
  • group_id=1 - output of this layer will be just the 2nd group

So in PyTorch it can be implemented as:

b, c, h, w = input.shape
out = input[:, c//2:c, :, :]
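
The same slicing generalizes to any groups/group_id pair. A dependency-free sketch of the channel range such a [route] layer would select, assuming the channel count divides evenly by groups (`route_channel_range` is a hypothetical helper, not a Darknet function):

```python
def route_channel_range(channels, groups, group_id):
    """Return (start, stop) of the channels kept by a grouped [route] layer."""
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    if not 0 <= group_id < groups:
        raise ValueError("group_id must be in range(groups)")
    part = channels // groups
    return group_id * part, (group_id + 1) * part

# groups=2, group_id=1 on a 64-channel input keeps the second half.
start, stop = route_channel_range(64, groups=2, group_id=1)
print(start, stop)

# Applied to a plain list standing in for the channel axis.
channel_ids = list(range(64))
print(channel_ids[start:stop][:4])
```

On a 4D tensor in NCHW layout, the returned pair would be used as `input[:, start:stop, :, :]`.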

Look at the Full structure of yolov4-tiny.cfg model

This is related to the CSP paper: https://openaccess.thecvf.com/content_CVPRW_2020/html/w28/Wang_CSPNet_A_New_Backbone_That_Can_Enhance_Learning_Capability_of_CVPRW_2020_paper.html

[Image: 1_HkKa3ATWuoMlSZGU-Hrp8g]


@Fetulhak Fetulhak commented Sep 25, 2021

@AlexeyAB thank you for your great illustration of the Yolov4-tiny backbone structure.
