Can anybody explain the cfg file parameters of the [region] and [net] layers, and which of them need to be changed to train on a custom dataset? #933

Open
AvaniPitre opened this issue May 30, 2018 · 24 comments


@AvaniPitre

No description provided.

@AlexeyAB
Owner

@IlyaOvodov

By the way, be careful with the "random=1" parameter of the last layer if your network has fewer than 6 "stride by 2" layers and/or an input size smaller than 416x416 or not square. This parameter randomly changes the input size of the network in a hardcoded manner: ±160 px in 32 px steps. Until recently it also contained a bug that converted any input format to square. So "you should be aware of what you are doing".
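As a rough illustration of the scheme described in this comment (an assumption based only on the description above, not on the darknet source), the candidate input sizes for a given base size can be enumerated in C. The function name candidate_sizes is hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch of the scheme described above: the training input
 * size is picked from base-160 .. base+160 px in 32 px steps.
 * Illustration only, not the actual darknet code. */
#define STEP 32
#define RANGE 160

/* Writes every candidate size into `out` and returns how many were written. */
int candidate_sizes(int base, int *out, int max_out)
{
    assert(out != 0 && max_out > 0);
    int n = 0;
    for (int d = -RANGE; d <= RANGE && n < max_out; d += STEP) {
        out[n++] = base + d;
    }
    return n;
}
```

For a 416 px base this yields 11 sizes from 256 to 576. For a smaller base such as 224, the smallest candidate drops to 64 px, which shows why the comment warns about small inputs.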

@AlexeyAB
Owner

@IlyaOvodov

  1. This fork no longer changes the input size of the network in a hardcoded manner; instead it scales the initial size by a random factor:

    darknet/src/detector.c, lines 132 to 137 in ec766fc:

    float random_val = rand_scale(1.4); // *x or /x
    int dim_w = roundl(random_val*init_w / 32) * 32;
    int dim_h = roundl(random_val*init_h / 32) * 32;
    if (dim_w < 32) dim_w = 32;
    if (dim_h < 32) dim_h = 32;

    Now random=1 can be used successfully for non-square networks. For example, a network with width=640 height=352 was trained with random=1 and reached mAP = 89.82% (Tiny YOLO: Looking for suggestions to improve training on a custom dataset #406 (comment)).

  2. Also, random=1 requires no more than 5 layers with stride=2 (pow(2,5)=32). Having fewer than 5 such layers is not a problem. For example, with only 3 layers with stride=2 (pow(2,3)=8), any value that is a multiple of 32 is also a multiple of 8, i.e. a network resolution of 640x352 can be divided by 32 and by 8 without remainder.
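The resizing logic in the quoted detector.c lines, together with the divisibility rule in point 2, can be sketched as standalone C. Here rand_uniform and rand_scale are written from memory as assumed equivalents of darknet's helpers of the same name, and scaled_dim and divisible_by_total_stride are hypothetical names. Rounding is done by hand so the sketch does not need libm.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed equivalent of darknet's rand_uniform(): uniform value in [min, max]. */
float rand_uniform(float min, float max)
{
    if (max < min) { float t = min; min = max; max = t; }
    return ((float)rand() / RAND_MAX) * (max - min) + min;
}

/* Assumed equivalent of darknet's rand_scale(): a factor in [1, s]
 * or its reciprocal in [1/s, 1], each with probability 1/2. */
float rand_scale(float s)
{
    float scale = rand_uniform(1, s);
    if (rand() % 2) return scale;
    return 1.f / scale;
}

/* Rounds a non-negative value to the nearest integer (like roundl for x >= 0). */
static long round_nonneg(double x)
{
    return (long)(x + 0.5);
}

/* Same arithmetic as the quoted lines: scale, round to a multiple of 32,
 * and clamp to at least 32. */
int scaled_dim(int init, float random_val)
{
    int dim = (int)round_nonneg(random_val * init / 32.0) * 32;
    if (dim < 32) dim = 32;
    return dim;
}

/* A resolution suits a net with `stride2_layers` stride-2 layers
 * if it is divisible by 2^stride2_layers. */
int divisible_by_total_stride(int dim, int stride2_layers)
{
    assert(stride2_layers >= 0 && stride2_layers < 31);
    return dim % (1 << stride2_layers) == 0;
}
```

Under these assumptions, a 640x352 network gets widths between 448 and 896 and heights between 256 and 480. Every value is a multiple of 32, and therefore also of 8. Because the same random_val scales both sides, the aspect ratio is approximately preserved.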

@AvaniPitre
Author

Thanks a lot @AlexeyAB for the quick response and links. I have already started training on my custom data with this darknet fork, following the given guidelines. I trained on 1500 images for 2000 iterations using a copy of yolo-voc.2.0.cfg. I am getting results, but the predicted bounding boxes are not as accurate as the ones I marked for training, and I need to improve their accuracy.

From the link above I understood that I need to recalculate the anchors. Which other parameters do I have to change to improve the accuracy of the bounding boxes?
Below are the [net] and [region] layers from my cfg.

[net]

batch=64
subdivisions=8
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
 
learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1

[region]

anchors = 1.08,1.19,  3.42,4.41,  6.63,11.38,  9.42,5.11,  16.62,10.52
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1
 
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
 
absolute=1
thresh = .6
random=0
 
Thanks in advance

@AlexeyAB
Owner

Set width=608 height=608
Set random=1
Read: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
And train from the beginning for ~10 000 iterations.

@dfsaw

dfsaw commented Jun 6, 2018

Which parameter sets the total number of iterations?

@AlexeyAB
Copy link
Owner

AlexeyAB commented Jun 7, 2018

@dfsaw max_batches= in the cfg-file.

@dfsaw

dfsaw commented Jun 7, 2018

Is there a minimum number of images I should train on? My objects are not being detected correctly.

@AlexeyAB
Owner

AlexeyAB commented Jun 7, 2018

@dfsaw

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

It is desirable that your training dataset include images with objects at different scales, rotations, lightings, from different sides, and on different backgrounds; you should preferably have 2000 or more images for each class.

@dfsaw

dfsaw commented Jun 14, 2018

[Screenshot: recall]

Can anyone please explain these parameters, and why is recall equal to 0?
Also, I have 90 images in total with batch=8 and subdivisions=4, so why is it showing 128 images? Since I am new to deep learning, I am not able to understand this.

Thanks

@AlexeyAB
Owner

@dfsaw This is the total number of images over 16 iterations, i.e. batch x iterations = 8 x 16 = 128.
If you have fewer than 128 images, some of them will be used several times.
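The arithmetic in this answer can be written out as a small C sketch. The helper names are hypothetical. images_per_pass assumes the usual darknet behaviour in which each batch is split into `subdivisions` mini-batches that are processed one after another before the weights are updated.

```c
#include <assert.h>

/* Images consumed after `iterations` iterations: each iteration uses `batch` images. */
long images_seen(int batch, int iterations)
{
    assert(batch > 0 && iterations >= 0);
    return (long)batch * iterations;
}

/* Images loaded per forward/backward pass when a batch is split into
 * `subdivisions` mini-batches (assumed darknet behaviour). */
int images_per_pass(int batch, int subdivisions)
{
    assert(subdivisions > 0 && batch % subdivisions == 0);
    return batch / subdivisions;
}

/* Approximate number of passes over a dataset of `dataset_size` images. */
double epochs(int batch, int iterations, int dataset_size)
{
    assert(dataset_size > 0);
    return (double)images_seen(batch, iterations) / dataset_size;
}
```

For the numbers in the question (batch=8, subdivisions=4, 90 images), 16 iterations consume 128 images, 2 per pass. That is roughly 1.4 passes over the dataset, so some images are seen twice.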

@AlexeyAB
Owner

@dfsaw

  1. What GPU do you use?

  2. What mAP can you get?
    darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_1500.weights

  3. What params do you use in the Makefile?

@dfsaw

dfsaw commented Jun 19, 2018

  1. GPU Titan
  2. Parameters in the Makefile
    GPU=1
    CUDNN=1
    CUDNN_HALF=1

@AlexeyAB
Owner

@dfsaw Use CUDNN_HALF=0 and train from the beginning for about 2000-4000 iterations.

@dfsaw

dfsaw commented Jun 20, 2018

@AlexeyAB what should batch, subdivisions and steps be? With batch=64 and subdivisions=8, I am getting an out of memory error.

@AlexeyAB
Owner

@dfsaw

  • steps should be at 80% and 90% of max_batches
  • batch is always 64
  • subdivisions should be as small as possible, but if an out of memory (OOM) error occurs, set it to 64

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

If an Out of memory error occurs, increase subdivisions in the .cfg file to 16, 32 or 64.
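The rules of thumb above can be expressed as two hypothetical helpers: one places steps at 80% and 90% of max_batches (rounding down with integer division), and the other moves subdivisions to the next of 16, 32 or 64 after an out of memory error. This is an illustration, not code from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: steps at 80% and 90% of max_batches (rounded down). */
void recommended_steps(int max_batches, int *step1, int *step2)
{
    assert(max_batches > 0 && step1 != NULL && step2 != NULL);
    *step1 = max_batches * 8 / 10;
    *step2 = max_batches * 9 / 10;
}

/* Hypothetical helper: next suggested subdivisions value after an OOM error,
 * taken from {16, 32, 64}; stays at 64 once the maximum is reached. */
int next_subdivisions(int current)
{
    static const int options[] = {16, 32, 64};
    for (size_t i = 0; i < sizeof options / sizeof options[0]; i++) {
        if (options[i] > current) return options[i];
    }
    return 64; /* already at the largest suggested value */
}
```

For the cfg posted earlier in this thread (max_batches = 45000), the first helper gives steps=36000,40500. All three suggested subdivisions values divide batch=64 evenly.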

@AvaniPitre
Author

@AlexeyAB
I applied the settings you suggested ("Set width=608 height=608, Set random=1") and started training from scratch, but all values became nan after 40 iterations, and training is killed (the exe crashes) after 50 iterations while resizing.
Is this because of the sizes of the training images I used?

My training dataset includes images whose smallest size is 182x53 and biggest size is 704x576.
I have 800 images in total, classes=1, and I calculated the anchors as
anchors = 11.7425,9.5023, 5.8290,2.0653, 9.7451,5.8569, 3.7540,1.4357, 10.6144,7.2646
Should the input resolution be changed? Please guide.

Thanks

@AlexeyAB
Owner

@AvaniPitre

also training is getting killed and exe crashes after 50 iteration while resizing

With the error CUDA out of memory?

my training dataset includes images with smallest size is 182X 53 and biggest size is 704X576

In your case there is no point in using 608x608. You can use 416x416 and random=1.

anchors = 11.7425,9.5023, 5.8290,2.0653, 9.7451,5.8569, 3.7540,1.4357, 10.6144,7.2646
is input resolution should be changed ? Please guide.

What command did you use for calculating anchors?

@AvaniPitre
Author

With the error CUDA out of memory?
No such message, but an unhandled exception: "Access violation reading location".

What command did you use for calculating anchors?
./darknet detector calc_anchors x64/Release/data/obj.data yolo-obj.cfg -num_of_clusters 5 -width 13 -height 13
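For YOLO v2 [region] layers, anchors are expressed in units of final grid cells, which is why -width 13 -height 13 matches a 416x416 network with a total stride of 32 (416 / 32 = 13). Below is a minimal sketch, assuming that total stride of 32 and using hypothetical helper names.

```c
#include <assert.h>

#define TOTAL_STRIDE 32 /* 2^5 for the five stride-2 layers discussed above */

/* Grid size of the final [region] layer for a network width or height. */
int grid_cells(int net_dim)
{
    assert(net_dim > 0 && net_dim % TOTAL_STRIDE == 0);
    return net_dim / TOTAL_STRIDE;
}

/* Converts an anchor dimension from grid-cell units to network-input pixels. */
double anchor_to_pixels(double anchor_cells)
{
    return anchor_cells * TOTAL_STRIDE;
}
```

Under the same assumption, a 608x608 network has a 19x19 grid. The first computed anchor width of 11.7425 cells corresponds to roughly 376 pixels of network input.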

@AlexeyAB
Owner

@AvaniPitre

  • What params did you use in the Makefile, GPU=1?
  • Do you use Yolo v2 with [region] layer instead of Yolo v3 [yolo]-layer?
  • Do you use the latest code of my repository?

@AvaniPitre
Author

What params did you use in the Makefile, GPU=1?
For this particular training run I used the Windows version of darknet with GPU=0, since my machine has no hardware that supports CUDA drivers, so I built darknet_no_gpu.exe with the following
Makefile parameters:
GPU=0
CUDNN=0
CUDNN_HALF=0
OPENCV=0
AVX=0
OPENMP=0
LIBSO=0
I also built the Linux version with the following parameters:
GPU=0
CUDNN=0
OPENCV=1
DEBUG=0
OPENMP=1
LIBSO=1
But both versions crash during resizing when I set random=1.
random=0 works fine with no errors.

Do you use Yolo v2 with [region] layer instead of Yolo v3 [yolo]-layer?
Yes, I use Yolo v2 with the [region] layer.

Do you use the latest code of my repository?
Yes, I downloaded it around 20 days ago.

@AlexeyAB
Owner

@AvaniPitre

  • Try to update your code from this repo and recompile. There are some fixes.
  • Did you use Yolo_mark to create your dataset?

@AvaniPitre
Author

Thanks, OK, I will update my code and start training again.
Yes, I used Yolo_mark to create my dataset.
Thanks a lot for the quick response. I hope updating the code will resolve all the issues.

@AvaniPitre
Author

AvaniPitre commented Jun 25, 2018 via email
