
howto: calculating custom anchors for YOLOv4-tiny #7856

Closed
stephanecharette opened this issue Jul 5, 2021 · 26 comments

Comments

@stephanecharette
Collaborator

@AlexeyAB I know about this line in the readme:

Only if you are an expert in neural detection networks [...]

I've avoided the topic of re-calculating anchors for the past few years. But people ask on the Discord server, and truth is, I'd like to know how to do it as well! :) Every time I try to do it, the results are worse than the default anchors, so I assume that I'm doing it wrong and I'm not enough of an expert.

Say we use this license plate project as an example: https://github.com/stephanecharette/DarkPlate

The default anchors in YOLOv4-tiny are:

anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319

I know the anchor-calculating code has a bit of randomness in it, so every time I run it I get slightly different results. For my 416x416 YOLOv4-tiny config file, I run this command:

darknet detector calc_anchors DarkPlate.data -num_of_clusters 6 -width 416 -height 416

The results I get look like one of these lines:

anchors =  12, 19,  24, 38,  42, 76,  98, 39, 154, 91, 256,155
anchors =  12, 19,  24, 39,  42, 76,  94, 37, 152, 84, 240,152
anchors =  12, 19,  24, 37,  41, 75,  95, 38, 148, 85, 245,151
anchors =  11, 19,  26, 34,  31, 62,  53, 83, 128, 53, 214,132
anchors =  10, 18,  23, 32,  30, 59,  56, 72, 144, 61, 222,140
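The run-to-run variation comes from the random initialization in the clustering. As a rough sketch (not darknet's actual code, which uses k-means++ seeding), anchor calculation boils down to k-means over the labels' (width, height) pairs with an IoU-style distance; the box sizes below are made up for illustration:

```python
import random

def iou_wh(a, b):
    # IoU of two boxes that share a top-left corner (only w,h matter)
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=None):
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)        # random init -> run-to-run variation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # assign each box to the center it overlaps best
            i = max(range(k), key=lambda j: iou_wh(b, centers[j]))
            clusters[i].append(b)
        for i, c in enumerate(clusters):
            if c:                         # new center = mean w,h of its cluster
                centers[i] = (sum(b[0] for b in c) / len(c),
                              sum(b[1] for b in c) / len(c))
    return sorted(centers)                # smallest anchors first

boxes = [(12, 20), (11, 18), (25, 36), (40, 74), (95, 40), (150, 86), (245, 150)]
print(kmeans_anchors(boxes, 3, seed=0))
```

With a fixed seed the result is repeatable; without one, you get slightly different anchors each run, exactly as observed above.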

First thing I do is pick one of the anchor lines I list above. (They're all very similar, off by just a few pixels.)

Let's say we use this line for our example: anchors = 12, 19, 24, 37, 41, 75, 95, 38, 148, 85, 245, 151

Then I look for each [yolo] section in the .cfg file and replace the anchors = ... line with the one we selected above.

Lastly come these instructions:

But you should change indexes of anchors masks= for each [yolo]-layer,
so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30,
2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

Considering the default anchors are anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319, the default YOLOv4-tiny.cfg has 2 YOLO sections with these masks and anchors:

Lines 226-228:

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319

And lines 277-279:

[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319

Are the anchors zero-based, or one-based? I assume zero-based, so line 227 refers to:

  • 81,82
  • 135,169
  • 344,319

And line 278 refers to:

  • 23,27
  • 37,58
  • 81,82

Is it intentional that mask index #3 (81,82) is referenced in both YOLO sections, or is that a typo? Should the mask be 1,2,3 and 4,5,6 or 0,1,2 and 3,4,5?

And just as importantly, how do we reconcile this statement:

so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30,
2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

From what I can see, the 1st [YOLO] section has anchors 81,82, 135,169, and 344,319, all of which are larger than 30x30, not smaller.

And even in the 2nd [YOLO] section, only the very first anchor of 23,27 would be smaller than 30x30, so I'm very confused.

But even without understanding all of this, I went ahead and trained 2 networks, one with the default anchors and the other with some new custom anchors. I did not change the mask, only the anchors = ... line. (See attached .cfg file.)

This is the chart.png when I train with the default YOLOv4-tiny anchors:

(chart.png screenshot)

And this is the chart.png file when I use the custom anchors:

(chart.png screenshot)

Can you help clear up the confusion and various questions?

DarkPlate.cfg.txt

@AlexeyAB
Owner

AlexeyAB commented Jul 5, 2021

@stephanecharette Hi,

I know the anchor-calculating code has a bit of randomness in it, so every time I run it I get slightly different results.

Yes, there is random initialization in the k-means++ approach https://en.wikipedia.org/wiki/K-means%2B%2B

When you run command darknet detector calc_anchors DarkPlate.data -num_of_clusters 6 -width 416 -height 416

You see:

  • anchors
  • IoU

So you can run it several times and choose the anchors with the highest IoU.
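A sketch of that selection criterion, assuming each label is reduced to a (w, h) pair scaled to the network size: score every box by its best-matching anchor's IoU, then keep the candidate set with the highest mean. The candidate sets below reuse numbers from the question; the dataset boxes are made up:

```python
def iou_wh(a, b):
    # IoU of two boxes sharing a top-left corner (only w,h matter)
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def mean_best_iou(anchors, boxes):
    # average, over all labels, of the best anchor's IoU
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)

candidates = [
    [(12, 19), (24, 38), (42, 76), (98, 39), (154, 91), (256, 155)],
    [(11, 19), (26, 34), (31, 62), (53, 83), (128, 53), (214, 132)],
]
boxes = [(12, 20), (25, 36), (44, 70), (100, 42), (150, 90), (250, 150)]
best = max(candidates, key=lambda a: mean_best_iou(a, boxes))
print(best)
```

This mirrors the IoU number darknet prints after each calc_anchors run: rerun, compare, keep the highest.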


darknet detector calc_anchors DarkPlate.data -num_of_clusters 6 -width 416 -height 416

Try to use this command with the flag -show (you should do it on an OS with a GUI: Windows / Linux + GNOME/KDE/...)
darknet detector calc_anchors DarkPlate.data -num_of_clusters 6 -width 416 -height 416 -show

  • You will see a point cloud, where each point is the relative size of an object in the training dataset (the x,y coordinates of a point correspond to the w,h size of an object in the training dataset)
  • And you will see the anchors; they look like bounding boxes with their top-left corner at (0,0).

Check that the anchors cover most of the points evenly.
If not, then either

  • try to add some additional anchors manually
  • or try to use more anchors, e.g. 9: -num_of_clusters 9

Are the anchors zero-based, or one-based? I assume zero-based

Yes, masks of anchors are 0-based.

So try

[yolo]
mask = 0,1,2

instead of

[yolo]
mask = 1,2,3

It was a mistake in the yolov3-tiny and yolov4-tiny versions, which we shouldn't fix in the default models, because we have many pre-trained models with these masks.
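Since the mask values are 0-based offsets into the flattened anchors list, a quick illustration of which pairs each mask selects (plain Python, not darknet code):

```python
# Flattened anchors exactly as they appear in the cfg
anchors = [10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319]
pairs = list(zip(anchors[0::2], anchors[1::2]))   # group into (w, h) pairs

def anchors_for_mask(mask):
    # mask entries are 0-based indices into the pair list
    return [pairs[i] for i in mask]

print(anchors_for_mask([3, 4, 5]))   # -> [(81, 82), (135, 169), (344, 319)]
print(anchors_for_mask([0, 1, 2]))   # -> [(10, 14), (23, 27), (37, 58)]
```

With the corrected masks, no anchor pair is shared between the two [yolo] layers.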


And just as importantly, how do we reconcile this statement:

so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30,
2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

From what I can see, the 1st [YOLO] section has anchors 81,82, 135,169, and 344,319, all of which are larger than 30x30, not smaller.

And even in the 2nd [YOLO] section, only the very first anchor of 23,27 would be smaller than 30x30, so I'm very confused.

Actually you should use:

  • small anchors for [yolo] layer with high resolution
  • big anchors for [yolo] layer with low resolution

The order of the [yolo] layers is different in different models.
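A hedged sketch of that rule: sort the anchor pairs by area and hand the smallest group to the head with the smallest stride (highest resolution). The strides below assume a YOLOv4-tiny-like model with two heads:

```python
# Custom anchors from the example above
pairs = [(12, 19), (24, 37), (41, 75), (95, 38), (148, 85), (245, 151)]
strides = [16, 32]   # assumed: stride-16 head (high res), stride-32 head (low res)

# indices of pairs ordered by anchor area, smallest first
order = sorted(range(len(pairs)), key=lambda i: pairs[i][0] * pairs[i][1])
per_head = len(pairs) // len(strides)

for n, stride in enumerate(sorted(strides)):      # high-resolution head first
    mask = sorted(order[n * per_head:(n + 1) * per_head])
    print(f"stride {stride}: mask = {mask}")
```

For these anchors the split lands on mask = 0,1,2 for the high-resolution head and mask = 3,4,5 for the low-resolution one, matching the correction above.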

(screenshots showing the [yolo] layer order in different models)


@AlexeyAB
Owner

AlexeyAB commented Jul 5, 2021

It is not related to anchors, but it can slightly improve accuracy if you use pre-trained weights for training.
You can try to add the line stopbackward=800 to a suitable layer in the cfg.


So it will freeze all layers before this one for the first 800 iterations, so the randomly initialized layers will not produce random gradients and will not destroy information in the pre-trained weights.
After 800 iterations, those layers will already be trained and will no longer contain random weights.
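A toy illustration of the idea (not darknet's implementation): layers below the stopbackward layer simply skip their weight update until the iteration threshold passes:

```python
# Hypothetical sketch: freeze early layers for the first `stopbackward` iterations
stopbackward = 800

def update_layer(layer_idx, iteration, frozen_below, grad, weights, lr=0.01):
    # layers before the stopbackward layer receive no update early in training
    if layer_idx < frozen_below and iteration < stopbackward:
        return weights                     # frozen: weights unchanged
    return [w - lr * g for w, g in zip(weights, grad)]  # plain SGD step

w = [1.0, 2.0]
print(update_layer(0, 100, frozen_below=5, grad=[1.0, 1.0], weights=w))  # frozen
print(update_layer(0, 900, frozen_below=5, grad=[1.0, 1.0], weights=w))  # trains now
```

The frozen layers pass pre-trained activations forward untouched, so the randomly initialized head can settle first.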


Or for very large models you can try to train yolov4-p5-frozen.cfg with stopbackward=1

with pre-trained weights https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-p5.conv.232

In this case stopbackward=1 will freeze all previous layers throughout the whole training, so training will be 2x-3x faster and will consume less memory, so you can fit a larger mini-batch (lower subdivisions= in the cfg) or a higher resolution on the GPU.
More details about large models: #7838 (comment)

@2MinuteWarning

@AlexeyAB - thanks for posting these details. I too am trying to customize anchors for a custom yolov4-tiny model.

I'm still a little confused... what should @stephanecharette set the masks to for lines 227 and 278 above?

@arnaud-nt2i

arnaud-nt2i commented Jul 23, 2021

@AlexeyAB sorry, but you have not answered the most critical (and mysterious) part of Stephane's question:

And just as importantly, how do we reconcile this statement:

so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30,
2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

From what I can see, the 1st [YOLO] section has anchors 81,82, 135,169, and 344,319, all of which are larger than 30x30, not smaller.
And even in the 2nd [YOLO] section, only the very first anchor of 23,27 would be smaller than 30x30, so I'm very confused.

Doesn't YOLO respect its own rules?

Another question, does changing the network size affect the "theoretical" 30x30 and 60x60 limits?

@AlexeyAB
Owner

@arnaud-nt2i

Doesn't YOLO respect its own rules?

There is detailed answer for this question: #7856 (comment)

Actually you should use:

  • small anchors for [yolo] layer with high resolution
  • big anchors for [yolo] layer with low resolution

The order of the [yolo] layers is different in different models.

Another question, does changing the network size affect the "theoretical" 30x30 and 60x60 limits?

In general yes.
But you should also pay attention to the rewritten_box values during training: if it is higher than 5%, then try to move more anchors (actually, move masks) from the low-resolution [yolo] layer to the high-resolution [yolo] layer.
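For intuition, a "rewritten box" happens when two ground-truth objects resolve to the same grid cell and the same best-matching anchor in the same [yolo] layer, so one label overwrites the other. A minimal, made-up-numbers sketch of counting them:

```python
# Hypothetical sketch: each box is reduced to (center_x, center_y, best_anchor_idx)
def cell_anchor_key(box_cx, box_cy, anchor_idx, stride):
    # which grid cell + anchor slot this box would occupy in a [yolo] layer
    return (int(box_cx // stride), int(box_cy // stride), anchor_idx)

boxes = [(100.0, 100.0, 0), (105.0, 98.0, 0), (300.0, 60.0, 1)]  # made-up labels
stride = 32                       # low-resolution head: coarse grid, more collisions

seen, rewritten = set(), 0
for cx, cy, a in boxes:
    key = cell_anchor_key(cx, cy, a, stride)
    if key in seen:
        rewritten += 1            # this label collides and would be overwritten
    seen.add(key)
print(f"rewritten: {rewritten} / {len(boxes)} = {100 * rewritten / len(boxes):.1f}%")
```

A smaller stride (higher resolution) makes the grid finer, so fewer boxes collide, which is why moving masks to the high-resolution layer lowers rewritten_box.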

@AlexeyAB
Owner

Another question, does changing the network size affect the "theoretical" 30x30 and 60x60 limits?

Try to keep this rule.
If you change the network size, then recalculate anchors for the new network size.

@arnaud-nt2i

OK, thank you for those explanations. It's the first time I've read about the rewritten_box values in relation to anchors...
I have read all the anchor issues since 2018 but have never seen as clear an explanation.
I will try that and change the mask numbers and filters if needed.
One more thing: in all my tries, with small and big datasets (up to 300,000 pics):

  1. SGDR works (way) better than steps,
  2. batch_normalize=2 is (a little bit) better than 1 (for minibatch from 2 to 5),
  3. I never had problems with dynamic_minibatch=2 and a 0.9 factor instead of 0.8.

@AlexeyAB
Owner

AlexeyAB commented Jul 25, 2021

@arnaud-nt2i

So is this combination ([convolutional] batch_normalize=2 + [net] dynamic_minibatch=1 + policy=sgdr) the best for your dataset? Is your dataset indoor/outdoor/..., urban/agronomic/biology...?

Did you try [net] letter_box=1 and/or [net] ema_alpha=0.9998 ?

And new cfg-file/pre-trained weights: yolov4-csp-x-swish.cfg, yolov4-p5.cfg, yolov4-p6.cfg https://github.com/AlexeyAB/darknet#pre-trained-models


I never had problems with dynamic_minibatch=2 and 0.9 factor instead of 0.8.

Does it solve the out-of-memory issue, or does it increase accuracy?

int new_dim_b = (int)(dim_b * 0.9);
instead of
int new_dim_b = (int)(dim_b * 0.8);
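In other words, with the 0.9 factor the reduced dimension used when dynamic_minibatch enlarges the mini-batch shrinks less (assuming dim_b is derived from the network dimension, e.g. 416):

```python
# Effect of the 0.8 -> 0.9 factor on the reduced training dimension
dim_b = 416
print(int(dim_b * 0.8))  # 332
print(int(dim_b * 0.9))  # 374
```

A larger reduced dimension keeps more detail during the enlarged-batch phase, at the cost of less VRAM headroom.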

@arnaud-nt2i

arnaud-nt2i commented Jul 25, 2021

@AlexeyAB
So is this combination the best [convolutional] batch_normalize=2 + [net] dynamic_minibatch=1 policy=sgdr for your dataset?

Yes, for small agro/bio and big outdoor datasets, with swish and sgdr_cycle = number of iterations in 1 epoch and cycle factor = 2.
I haven't tried letter_box because I compute the mean aspect ratio of the pics and set the network size with the same ratio (e.g. 704x544).
I don't know about ema_alpha=0.9998... what is that?

int new_dim_b = (int)(dim_b * 0.9) allows a higher minibatch and faster/more accurate learning...
I never had out-of-memory with this (and I am always maximizing VRAM usage), playing with resolution and the random coef on my 3090 and 1660 Ti.
(I haven't tried the new optimized_memory either... because I am afraid of the memory-usage peak when launching the network, like it used to be the case with optimized_memory=1.)

I tried YOLOv4 csp/scaled in December/January but it was not yet OK...
Now I am desperately waiting for OpenCV DNN to support it...
But I am more interested in AP50 (number of detected objects) than AP (bbox coordinates), and I need good accuracy for small objects, so the new YOLOv4 (CSP, scaled) might do more harm than good for me...

@AlexeyAB
Owner

@arnaud-nt2i

Don't know about ema_alpha=0.9998 ... what is that ?

EMA is a custom version of SWA https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/

Regardless of the procedure you use to train your neural network, you can likely achieve significantly better generalization at virtually no additional cost with a simple new technique now natively supported in PyTorch 1.6, Stochastic Weight Averaging (SWA)
...
Averaged SGD is often used in conjunction with a decaying learning rate, and an exponential moving average (EMA), typically for convex optimization. In convex optimization, the focus has been on improved rates of convergence.
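A minimal sketch of the EMA idea behind ema_alpha=0.9998 (not darknet's code): keep a smoothed copy of the weights and nudge it a small step toward the current weights each iteration. alpha=0.5 below is only to make the effect visible in a few steps:

```python
# Exponential moving average over weights: inference uses the smoothed copy
def ema_update(ema_w, w, alpha=0.9998):
    # alpha close to 1 -> very slow, heavily smoothed tracking of w
    return [alpha * e + (1 - alpha) * x for e, x in zip(ema_w, w)]

ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.2, 1.8], [0.9, 2.1]):
    ema = ema_update(ema, step_weights, alpha=0.5)  # 0.5 just for a visible effect
print(ema)
```

The averaged weights change much more slowly than the raw SGD weights, which is what gives the generalization benefit the SWA post describes.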


You can try to train this model: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-csp-x-swish.cfg with this pre-trained weights https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp-x-swish.conv.192
from https://github.com/AlexeyAB/darknet#pre-trained-models

@arnaud-nt2i

OK, I will try ema_alpha=0.9998 in some of my next trainings...
But as for the optimizer, RAdam + Lookahead = Ranger seems to give a higher gain (and, as importantly, less sensitivity to the initial lr):
https://lessw.medium.com/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
But it is already on the todo list ^^

@AlexeyAB
Owner

@arnaud-nt2i
In most of my experiments and in other papers:

  • Adam, Ranger, ... are better for faster training and for non-optimal hyperparams
  • SGD is better for final accuracy, but requires longer training and hyperparam tuning

I will think about it more.

@arnaud-nt2i

OK, that seems fair; nothing replaces good old long trainings!

@cpsu00

cpsu00 commented Aug 4, 2021

Hi @arnaud-nt2i

  1. batch_normalize=2 better (little bit) than 1 (for minibatch from 2 to 5)

Shouldn't batch_normalize be either 0 or 1?

@arnaud-nt2i

@cpsu00
batch_normalize=1 is the default param for YOLOv4,
batch_normalize=0 means no normalization (not really good),
batch_normalize=2 is cross-iteration batch normalization, which allows the use of a lower minibatch_size (an upgrade over batch_normalize=1).

see batchnorm_layer.c

@cpsu00

cpsu00 commented Aug 4, 2021

Oh, I didn't notice that. Thanks!

@zxz-cc

zxz-cc commented Oct 29, 2021

(quoting the earlier reply)

Yes, masks of anchors are 0-based. So try mask = 0,1,2 instead of mask = 1,2,3. It was a mistake in the yolov3-tiny and yolov4-tiny versions, which we shouldn't fix in the default models, because we have many pre-trained models with these masks.

Actually you should use small anchors for the [yolo] layer with high resolution and big anchors for the [yolo] layer with low resolution. The order of the [yolo] layers is different in different models.

Does that mean the masks won't change if I use a pre-trained model? If I change them to [3,4,5] and [0,1,2], I can't use the pre-trained model, right?

@stephanecharette
Collaborator Author

Of course, do not change the masks or anchors on pre-trained models! This only works if you are training your own custom network.

@zxz-cc

zxz-cc commented Oct 29, 2021

Of course, do not change the masks or anchors on pre-trained models! This only works if you are training your own custom network.

OK, thank you!

@MrGolden1

How should I change the .cfg file to increase the number of anchor clusters? For YOLOv4-tiny default value is 6, and I want to try a higher number to test mAP. Any guidance will be appreciated.

@Fetulhak

Fetulhak commented Dec 16, 2021

@stephanecharette @AlexeyAB I have trained two YOLOv4 models, one at resolution 416x416 and the other at 512x512. However, the model at 512x512 has a lower mAP than the one at 416x416. It is confusing to me; shouldn't it be the opposite? The input images were all of equal size, 1008x1008. Any help will be appreciated.

anchors generated at 416: 10, 10, 18, 8, 8, 18, 12, 12, 14, 14, 16, 15, 18, 18, 21, 21, 25, 25 ............... 90.52% IoU

@Fetulhak

How should I change the .cfg file to increase the number of anchor clusters? For YOLOv4-tiny default value is 6, and I want to try a higher number to test mAP. Any guidance will be appreciated.

Hi @MrGolden1, if you want to increase the number of anchors you should also increase the number of YOLO detection heads; see yolov4-tiny-3l.cfg.

@Fetulhak

@stephanecharette @AlexeyAB the other question I have is: what if I forget the resolution I used for training but I still have the weights file, is there any way to determine the training-time resolution from the weights file?

@stephanecharette
Collaborator Author

The image size is irrelevant and ignored by Darknet. The only size that matters is the width and height in the cfg. Your images could be 999999x999999 and Darknet will still resize the images to match the network dimensions.
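For intuition, a sketch of the two resize policies (assuming letter_box=1 pads to preserve aspect ratio, while the default simply stretches to the cfg's width/height):

```python
# Compute the letterboxed size and padding for an image mapped to network dims
def letterbox_dims(img_w, img_h, net_w, net_h):
    scale = min(net_w / img_w, net_h / img_h)   # fit the longer side
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    pad_x, pad_y = (net_w - new_w) // 2, (net_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y

print(letterbox_dims(1008, 1008, 416, 416))  # square input: no padding needed
print(letterbox_dims(1920, 1080, 416, 416))  # wide input: vertical padding
```

Either way the network only ever sees net_w x net_h pixels, which is why the original image size does not matter.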

@Fetulhak

Fetulhak commented Dec 17, 2021

Thanks for your reply @stephanecharette. Okay, let's forget about image size. Whatever image size I give, shouldn't my model trained with network resolution (width and height in the cfg) 512x512 give better results than 416x416? In my case the 512x512 cfg gives a much lower mAP than the 416x416 one. I am very confused.

@darkxzk

darkxzk commented Nov 29, 2022

I think you're right. Recalculating the anchors resulted in a decrease in the overall accuracy of my model too. My experiment is consistent with yours, and the default anchors work best. Although I do not know why this is the case, my analysis suggests the k-means++ anchors do not cover all scales, most likely because the target boxes in the dataset are fixed at one scale.


9 participants