
Tiny YOLO: Looking for suggestions to improve training on a custom dataset #406

Open · saihv opened this issue Feb 25, 2018 · 24 comments

saihv commented Feb 25, 2018

I am currently working on object detection on a custom dataset, where a close-to-real-time implementation on a Jetson TX2 is the final goal. Hence, I am trying to achieve ~30 fps (20-30 would be acceptable too, as long as accuracy is not too bad) as well as a decent IoU.

As of now, I am using Tiny YOLO as my framework through Darknet, compiled with GPU and CUDNN support. The images are 640x360 and I have about 100000 of them, with around 10 classes of objects in total. I've trained Tiny YOLO for about 80000 iterations, and on average this has given me IoUs of around 50% on the test dataset at around 18 fps on the Jetson TX2. I am now looking to improve these numbers without affecting the speed too much. I was hoping to get some suggestions regarding this:

  1. What steps can I take to 'customize' training to my dataset? I have multiple classes of objects, and some of them are very small (bounding boxes of roughly 50x50 pixels); Tiny YOLO is having a lot of trouble specifically with these small objects while performing decently on the bigger ones. Can I somehow retrain my network to focus more on these small objects? Or are there any modifications I can make in the cfg file to account for them?

(I see two points in the README relating to this: the parameters small_object=1 and random=1. Do these improve accuracy at the cost of speed?)

  2. Does YOLO get a performance boost when working on square images? i.e., is there any noticeable improvement from resizing the images to be square?

  3. Is IoU the best metric to check when trying to increase or decrease the network resolution (width and height)? I gather from the README that these values create an accuracy vs. speed trade-off; how should I pick the best values for my application?

  4. In my application, each image contains only one class of object during both training and inference. Can I somehow exploit this fact to improve performance a little (e.g., tell YOLO that the maximum number of objects it needs to detect is just one)?

Any other general comments aimed at improving accuracy or speed are very welcome too. Thanks!

AlexeyAB commented Feb 25, 2018

  • Did you get IoU using darknet map or darknet recall command?
  • What width= height= params do you use in the cfg-file?
  • What learning_rate, steps, scales and decay do you use?
  1. You can use small_object=1 and random=1; these params don't decrease detection speed (see the cfg sketch at the end of this comment):
  • random=1 increases mAP by about 1%; it does not affect detection speed, but it does slow down training
  • small_object=1 is required only for objects smaller than 1%x1% of the image, i.e. smaller than 5x5 pixels if you use width=416 height=416
  • you can also try training from the pre-trained tiny-yolo-voc.conv.13 instead of darknet19_448.conv.23; you can get it with the command: darknet.exe partial cfg/tiny-yolo-voc.cfg tiny-yolo-voc.weights tiny-yolo-voc.conv.13 13
  2. By default Yolo uses a square 416x416 network, and any image is automatically resized to 416x416, so you shouldn't do it yourself. But there are several approaches for keeping the aspect ratio, so you can pre-process the images as in the original darknet, or as in OpenCV-dnn-Yolo: Resizing : keeping aspect ratio, or not #232 (comment)
    There are positive and negative points to each approach.

  3. For the default networks (Yolo, Tiny-yolo) and the default threshold=0.24, IoU is the best accuracy metric. But if you use your own model (DenseNet-Yolo, ResNet-Yolo) that requires a different optimal threshold, then the best metric is mAP. Yes, the higher the network resolution, the slower it works, but the more accurately it detects (especially small objects).

    3.1. Also, if all of your images (training and detection) have the same size 640x360, then you can try to change your network size to width=640 height=352 and train with random=0

  4. You can try to implement it in the source code, in this function:

    darknet/src/region_layer.c

    Lines 333 to 384 in 3ff4797

    void get_region_boxes(layer l, int w, int h, float thresh, float **probs, box *boxes, int only_objectness, int *map)
    {
        int i,j,n;
        float *predictions = l.output;
        for (i = 0; i < l.w*l.h; ++i){
            int row = i / l.w;
            int col = i % l.w;
            for(n = 0; n < l.n; ++n){
                int index = i*l.n + n;
                int p_index = index * (l.classes + 5) + 4;
                float scale = predictions[p_index];
                if(l.classfix == -1 && scale < .5) scale = 0;
                int box_index = index * (l.classes + 5);
                boxes[index] = get_region_box(predictions, l.biases, n, box_index, col, row, l.w, l.h);
                boxes[index].x *= w;
                boxes[index].y *= h;
                boxes[index].w *= w;
                boxes[index].h *= h;
                int class_index = index * (l.classes + 5) + 5;
                if(l.softmax_tree){
                    hierarchy_predictions(predictions + class_index, l.classes, l.softmax_tree, 0);
                    int found = 0;
                    if(map){
                        for(j = 0; j < 200; ++j){
                            float prob = scale*predictions[class_index+map[j]];
                            probs[index][j] = (prob > thresh) ? prob : 0;
                        }
                    } else {
                        for(j = l.classes - 1; j >= 0; --j){
                            if(!found && predictions[class_index + j] > .5){
                                found = 1;
                            } else {
                                predictions[class_index + j] = 0;
                            }
                            float prob = predictions[class_index+j];
                            probs[index][j] = (scale > thresh) ? prob : 0;
                        }
                    }
                } else {
                    for(j = 0; j < l.classes; ++j){
                        float prob = scale*predictions[class_index+j];
                        probs[index][j] = (prob > thresh) ? prob : 0;
                    }
                }
                if(only_objectness){
                    probs[index][0] = scale;
                }
            }
        }
    }

For example add this code at the end of the function, before this line:

    // keep only the single best detection across all cells, anchors and classes
    // (i, j and n are already declared at the top of get_region_boxes())
    float max_prob = 0;
    int max_index = 0, max_j = 0;
    for (i = 0; i < l.w*l.h; ++i){
        for(n = 0; n < l.n; ++n){
            int index = i*l.n + n;
            for(j = 0; j < l.classes; ++j){
                if(probs[index][j] > max_prob) {
                    max_prob = probs[index][j];
                    max_index = index;
                    max_j = j;
                }
            }
        }
    }

    // zero out every probability except the single maximum found above
    for (i = 0; i < l.w*l.h; ++i){
        for(n = 0; n < l.n; ++n){
            int index = i*l.n + n;
            for(j = 0; j < l.classes; ++j){
                if(index != max_index || j != max_j) probs[index][j] = 0;
            }
        }
    }
  5. Also, you can re-generate anchors for your dataset:
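
Tying points 1 and 3.1 together, here is a minimal sketch of where these parameters typically sit in a tiny-yolo-voc style cfg (the anchor values below are the stock tiny-yolo-voc ones; treat the rest as illustrative defaults, not values tuned for this dataset):

    [net]
    # network input resolution: higher detects small objects better, but runs slower
    # (or width=640 height=352 to match 640x360 images, as in point 3.1)
    width=416
    height=416

    ...

    [region]
    # anchors are expressed in final-feature-map cells (network size / 32);
    # re-generate them for your own dataset (point 5)
    anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
    num=5
    # multi-scale training: roughly +1% mAP, same detection speed, slower training
    random=1
    # only needed when objects are smaller than ~1%x1% of the image (about 5x5 px at 416x416)
    small_object=1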

AlexeyAB reopened this Feb 25, 2018
MyVanitar commented Feb 25, 2018

Is it a good idea to pad images during preprocessing to make them compatible with 416x416?

I don't mean resizing them all to 416x416, but padding them so that they scale onto 416 cleanly, because if the network resizes them all to 416x416, many images whose dimensions do not map onto 416 exactly (such as 300x300) will lose their aspect ratio.

@AlexeyAB

@VanitarNordic There are positive and negative points for each approach: #232 (comment)

  • original Darknet: (+) keeps the aspect ratio, (-) objects end up at their smallest size - this further worsens the detection of small objects
  • OpenCV-dnn-Yolo: (+) keeps the aspect ratio, (-) crops away part of the image - you will not be able to detect objects at the edges of the image
  • this Darknet repo: (+) objects end up at their biggest size, (-) does not keep the aspect ratio - if the image sizes in the training and detection datasets are very different, accuracy will be reduced

Because I train my models on a training dataset with the same image size (1280x720 or 1920x1080) as the detection dataset, I don't need to keep the aspect ratio, so for me the best option is this Darknet repository with the maximum object size.
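
To make the trade-off concrete, here is a small illustrative Python/OpenCV sketch of the two strategies being compared (not code from any of these repos, just the idea): letterboxing keeps the aspect ratio but shrinks objects and adds padding, while direct resizing, as in this repo, keeps objects as large as possible but distorts non-square images.

    import cv2
    import numpy as np

    def letterbox(img, size=416, pad_value=128):
        """Resize keeping aspect ratio, pad the rest (original-Darknet style)."""
        h, w = img.shape[:2]
        scale = min(size / w, size / h)
        new_w, new_h = int(round(w * scale)), int(round(h * scale))
        resized = cv2.resize(img, (new_w, new_h))
        canvas = np.full((size, size, 3), pad_value, dtype=img.dtype)  # assumes 3-channel input
        top, left = (size - new_h) // 2, (size - new_w) // 2
        canvas[top:top + new_h, left:left + new_w] = resized
        return canvas

    def stretch(img, size=416):
        """Resize directly to size x size (this repo): objects stay as large as
        possible, but the aspect ratio changes if the source image is not square."""
        return cv2.resize(img, (size, size))

For a 640x360 source, letterboxing scales everything by 416/640 ≈ 0.65 and pads the top and bottom, while stretching scales the width by ≈0.65 but the height by 416/360 ≈ 1.16, so boxes become relatively taller.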

@MyVanitar

Okay, so this Darknet repo does not keep the aspect ratio; I think it is the same with SSD.
If I pad the images to an exact fraction or multiple of 416, is that good (for this repo)?

@AlexeyAB

@VanitarNordic

If I pad the images to an exact fraction or multiple of 416, is that good (for this repo)?

Do you mean that you will do the same as the original Darknet, but by yourself? It will keep the aspect ratio, but the objects will end up at their smallest size. If you have small objects, this is a bad idea. But if you have big objects and your images all have different sizes, then it is a good idea.

@MyVanitar

Do you mean that you will do the same as the original Darknet, but by yourself?

Yes, by doing it myself before starting the training. You said this repo does not keep the aspect ratio, so I want to pad all images beforehand to an exact fraction or multiple of 416. Then, even though the network does not keep the aspect ratio, the numbers divide evenly and the objects will not end up with unnatural shapes.

saihv commented Feb 26, 2018

@AlexeyAB

Thanks a lot for the detailed reply! I will note your suggestions. Replies:

Did you get IoU using darknet map or darknet recall command?

I used darknet recall. But the 50% IoU I mentioned was on the test dataset, not validation. Validation IoU (the last line in the output) was about 65% IIRC.

What width= height= params do you use in the cfg-file?

As of now, just the defaults: 416x416.

What learning_rate, steps, scales and decay do you use?

momentum=0.9
decay=0.0005

learning_rate=0.001
policy=steps
steps=-1,100,80000,100000
scales=.1,10,.1,.1
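
For context, with policy=steps Darknet keeps multiplying the base learning_rate by each scale once the iteration passes the corresponding step, so this schedule gives roughly 0.0001 during the first 100 iterations (warm-up), 0.001 up to 80k, 0.0001 up to 100k, and 0.00001 after that. A tiny sketch of that cumulative rule (my reading of the steps policy, for illustration only):

    def lr_at(iteration, base_lr=0.001,
              steps=(-1, 100, 80000, 100000), scales=(0.1, 10, 0.1, 0.1)):
        """Cumulative 'steps' policy: multiply by scales[i] once iteration >= steps[i]."""
        lr = base_lr
        for step, scale in zip(steps, scales):
            if iteration < step:
                break
            lr *= scale
        return lr

    for it in (0, 99, 1000, 90000, 120000):
        print(it, lr_at(it))   # approx: 0.0001, 0.0001, 0.001, 0.0001, 1e-05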

saihv commented May 8, 2018

@AlexeyAB

Thanks a lot for the tips! The one that made the biggest difference was using 640x352 with random=0. Strangely, regenerating the anchors actually reduced the IoU (and mAP). Is this possible?

Also, would you happen to have any tips for improving training only on certain classes? My training data is somewhat unbalanced: some classes have a lot more images than others, and the output of detector map looks like this:

detections_count = 35981, unique_truth_count = 16923  
class_id = 0, name = boat,      ap = 100.00 % 
class_id = 1, name = building,      ap = 79.95 % 
class_id = 2, name = car,      ap = 90.91 % 
class_id = 3, name = drone,      ap = 90.91 % 
class_id = 4, name = group,      ap = 80.07 % 
class_id = 5, name = horseride,      ap = 90.91 % 
class_id = 6, name = paraglider,      ap = 100.00 % 
class_id = 7, name = person,      ap = 90.91 % 
class_id = 8, name = riding,      ap = 90.91 % 
class_id = 9, name = truck,      ap = 72.41 %       // Slightly lower iou/precision on this class for example
class_id = 10, name = wakeboard,      ap = 83.83 % 
class_id = 11, name = whale,      ap = 100.00 % 

Although the mAP/IoU looks really good on validation, it is slightly lower on the test data, so I am curious whether I can improve training for only specific classes.

AlexeyAB commented May 8, 2018

@saihv A simple solution is to make many duplicates of the images+labels for the classes that have a small number of images, then re-generate train.txt using Yolo_mark.
Thanks to data augmentation, even plain duplicates of images+labels will increase accuracy.
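
A minimal Python sketch of that idea (a hypothetical helper, not part of Yolo_mark): since Darknet samples training images from the lines of train.txt, listing the images of rare classes several extra times has roughly the same effect as physically copying the files and labels. The paths and label-file layout below are assumptions based on the usual Darknet convention (a .txt label next to each image with lines of "class_id x y w h").

    from pathlib import Path

    RARE_CLASSES = {1, 4, 9}   # e.g. building, group, truck
    EXTRA_COPIES = 3           # how many extra times to list images of these classes

    paths = [l.strip() for l in Path("data/train.txt").read_text().splitlines() if l.strip()]

    balanced = []
    for p in paths:
        balanced.append(p)
        label_file = Path(p).with_suffix(".txt")
        if label_file.exists():
            ids = {int(line.split()[0])
                   for line in label_file.read_text().splitlines() if line.strip()}
            if ids & RARE_CLASSES:
                balanced.extend([p] * EXTRA_COPIES)

    Path("data/train_balanced.txt").write_text("\n".join(balanced) + "\n")

Point train= in the .data file at the new list; Darknet's built-in augmentation (jitter, hue, exposure, flips) then makes the repeated samples look different on each pass.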

saihv commented May 15, 2018

Got it, thank you! I will try that.

Just one last question in the custom dataset area, if you don't mind:

I am working on an object detection contest where I only have access to training data. I am supposed to train a model, which is then evaluated on a test set with the same classes (I don't have access to these test images), and the evaluation metric is the average IoU. I am splitting the given data into train and validation (as usual) and training my tiny YOLO model, but there is a noticeably large difference in IoU between validation and test (avg. 80% on validation vs 60% on test).

I guess this could be for multiple reasons: the test data might be more challenging, or perhaps it has a different distribution of images per class etc. But conceptually, this seems like a tricky problem because the model does perform well in validation, yet there still appears to be some overfitting when it comes to new data. So that makes me curious: are there any tips or tricks for making a model generalize better? Thanks!

@AlexeyAB

the test data might be more challenging, or perhaps it has a different distribution of images per class etc.

Yes.


So that makes me curious: are there any tips or tricks for making a model generalize better?

Increase the data augmentation params and train for about 10x more iterations:
random=1, jitter=0.4, and increase width and height to 608 or 832.
If you need to detect objects with different colors as the same class_id, also increase hue=0.2 saturation=1.8 exposure=1.8.

Also fix this mistake: change

mask = 1,2,3

to

mask = 0,1,2
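
Putting these suggestions together, the cfg changes would look roughly like the sketch below (hue/saturation/exposure live in the [net] section, jitter/random in the detection layer; the values are just the ones mentioned above, not a tested configuration):

    [net]
    width=608          # or 832: higher resolution helps small objects, costs speed
    height=608
    hue=.2             # only if color should not distinguish classes
    saturation=1.8
    exposure=1.8

    [yolo]             # or [region], depending on your cfg
    jitter=.4          # stronger crop/translation augmentation: needs many more iterations
    random=1           # multi-scale training
    mask = 0,1,2       # was 1,2,3 (only applies to layers that have a mask= line)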

saihv commented May 15, 2018 via email

@AlexeyAB

If you want to use random=1 with a non-square network (640x352), then you should download the latest version of Darknet from this GitHub repository.

Also, did you re-calculate the anchors? You can do that too, for -width 20 -height 11:
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

saihv commented May 15, 2018

I tried regenerating anchors in the past through this command:

gen_anchors.py -filelist data/train.txt -output_dir data/anchors -num_clusters 5

But using those anchors actually decreased the IoU. I now see that I should probably try with those width and height arguments (net.w/32 and net.h/32 I guess?)

@AlexeyAB

@saihv Set these values in gen_anchors.py. Change:

width_in_cfg_file = 416.
height_in_cfg_file = 416.

to:

width_in_cfg_file = 640.
height_in_cfg_file = 352.
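
The values follow from the fixed downsampling factor of 32 in (tiny) YOLO v2: anchors are expressed in cells of the final feature map, so a 640x352 network gives a 20x11 grid, which is where the -width 20 -height 11 mentioned above comes from. The arithmetic, for reference:

    net_w, net_h = 640, 352
    stride = 32                              # tiny-YOLO downsamples the input by 32
    print(net_w // stride, net_h // stride)  # 20 11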

saihv commented May 15, 2018

Oops, I should have looked at that! Thanks for pointing it out, will change it and try.

saihv commented May 20, 2018

I tried training with random=1, which produces a slightly lower validation IoU (avg. 75% vs 80% with random=0); most of the inaccuracy comes from classes with relatively smaller objects. (I did include small_object=1 in the cfg file, but the objects are not smaller than 1% of the image, so I don't know whether this parameter helps.)

Would it be helpful to train at a higher resolution (more than 640x360 but still non-square) with random=0, but do inference at 640x352?

@AlexeyAB

@saihv What mAP do you get with random=1 and with random=0?
Usually the training resolution should be about the same as the detection resolution, provided the images in the training and detection datasets have the same resolution.

saihv commented May 21, 2018

random=0:

detections_count = 35981, unique_truth_count = 13965 
class_id = 0, name = boat, 	 ap = 100.00 % 
class_id = 1, name = building, 	 ap = 79.95 % 
class_id = 2, name = car, 	 ap = 90.91 % 
class_id = 3, name = drone, 	 ap = 90.91 % 
class_id = 4, name = group, 	 ap = 80.07 % 
class_id = 5, name = horseride, 	 ap = 90.91 % 
class_id = 6, name = paraglider, 	 ap = 100.00 % 
class_id = 7, name = person, 	 ap = 90.91 % 
class_id = 8, name = riding, 	 ap = 90.91 % 
class_id = 9, name = truck, 	 ap = 72.41 % 
class_id = 10, name = wakeboard, 	 ap = 83.83 % 
class_id = 11, name = whale, 	 ap = 100.00 % 
 for thresh = 0.24, precision = 0.97, recall = 0.98, F1-score = 0.98 
 for thresh = 0.24, TP = 16597, FP = 453, FN = 326, average IoU = 81.34 % 

 mean average precision (mAP) = 0.892338, or 89.23 %

random=1:

detections_count = 38874, unique_truth_count = 13965  
class_id = 0, name = boat, 	 ap = 90.91 % 
class_id = 1, name = building, 	 ap = 66.27 % 
class_id = 2, name = car, 	 ap = 90.89 % 
class_id = 3, name = drone, 	 ap = 90.53 % 
class_id = 4, name = group, 	 ap = 59.67 % 
class_id = 5, name = horseride, 	 ap = 90.63 % 
class_id = 6, name = paraglider, 	 ap = 100.00 % 
class_id = 7, name = person, 	 ap = 90.89 % 
class_id = 8, name = riding, 	 ap = 90.84 % 
class_id = 9, name = truck, 	 ap = 69.11 % 
class_id = 10, name = wakeboard, 	 ap = 80.74 % 
class_id = 11, name = whale, 	 ap = 90.87 % 
 for thresh = 0.25, precision = 0.95, recall = 0.95, F1-score = 0.95 
 for thresh = 0.25, TP = 13223, FP = 658, FN = 742, average IoU = 75.12 % 

 mean average precision (mAP) = 0.842781, or 84.28 % 

Please note the difference in AP for classes 1, 4 and 9, which are the challenging ones with smaller object sizes. Both configurations were trained for about 120k iterations, after which the mAP settles and does not change much.

@AlexeyAB

@saihv Try to change these lines:

darknet/src/detector.c

Lines 132 to 134 in 4403e71

int random_val = rand() % 12;
int dim_w = (random_val + (init_w / 32 - 5)) * 32; // +-160
int dim_h = (random_val + (init_h / 32 - 5)) * 32; // +-160

to these:

float random_val = rand_scale(1.4);    // *x or /x
int dim_w = roundl(random_val*init_w / 32) * 32;
int dim_h = roundl(random_val*init_h / 32) * 32;

And train with random=1, what mAP will you get?
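
Assuming rand_scale(1.4) works as in Darknet's utils (it returns a factor between 1/1.4 and 1.4, randomly chosen as either x or 1/x), this change scales both dimensions by the same factor, so a 640x352 network roughly keeps its aspect ratio during multi-scale training, instead of having the same ±160 px added to each side. A quick check of the resulting sizes:

    init_w, init_h = 640, 352
    for factor in (1 / 1.4, 1.0, 1.4):       # extremes and midpoint of rand_scale(1.4)
        dim_w = round(factor * init_w / 32) * 32
        dim_h = round(factor * init_h / 32) * 32
        print(dim_w, dim_h)                  # 448 256, 640 352, 896 480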

saihv commented May 22, 2018

Trained it for 120k iterations with those changes, and now the mAP is pretty close to random=0:

detections_count = 30511, unique_truth_count = 13965  
class_id = 0, name = boat, 	 ap = 90.91 % 
class_id = 1, name = building, 	 ap = 82.86 % 
class_id = 2, name = car, 	 ap = 90.91 % 
class_id = 3, name = drone, 	 ap = 90.91 % 
class_id = 4, name = group, 	 ap = 78.91 % 
class_id = 5, name = horseride, 	 ap = 100.00 % 
class_id = 6, name = paraglider, 	 ap = 100.00 % 
class_id = 7, name = person, 	 ap = 90.90 % 
class_id = 8, name = riding, 	 ap = 90.91 % 
class_id = 9, name = truck, 	 ap = 72.62 % 
class_id = 10, name = wakeboard, 	 ap = 88.94 % 
class_id = 11, name = whale, 	 ap = 100.00 % 
 for thresh = 0.25, precision = 0.97, recall = 0.98, F1-score = 0.97 
 for thresh = 0.25, TP = 13632, FP = 416, FN = 333, average IoU = 80.81 % 

 mean average precision (mAP) = 0.898222, or 89.82 % 

But I guess because random=1 switches between low and high resolutions, it might be beneficial to train for more iterations.

@AlexeyAB

But I guess because random=1 switches between low and high resolutions, it might be beneficial to train for more iterations.

Yes. random=1 is almost the same as having 2x more images, so it requires 2x more iterations.

What jitter do you use in all these cases?

saihv commented May 23, 2018

I am still using jitter=0.2. I remember one of your suggestions was to move to 0.4, but I was just testing one thing at a time, so that's next on my list.

@AlexeyAB

Yes, it's better to test one thing at a time. Changing jitter from 0.2 to 0.4 requires about 5-10x more iterations.
