performance drop when training yolov3 on voc #1323

Noir97 · 2018-08-02T06:29:24Z

Hi, Alexey. I used the yolov3-voc.cfg file provided in the repo to train YOLOv3 on VOC. I prepared the dataset and the labels following the instructions on pjreddie's official site. After training with your version of Darknet, I got 78% mAP. However, when I trained with the same cfg on the official Darknet, I got about 82% mAP. I wonder what difference causes such a drop in performance.

I did find some differences, such as the strategy for multiscale training, but changing that didn't improve performance either.

I didn't change yolov3-voc.cfg except for uncommenting the training batch and subdivisions lines and changing batch to 96 (in order to fill my GPUs). By the way, we use our own code to compute mAP.

Thanks

AlexeyAB · 2018-08-02T12:39:07Z

@QBjun Hi,

Did you use the same batch= and subdivisions= in both cases?
Did you use the same params in Makefile (and CUDNN_HALF=0) in both cases?

There are two differences:

aspect ratio and object sizes: see "Resizing : keeping aspect ratio, or not" #232 (comment)

network resizing (multiscale):

this repo:

darknet/src/detector.c, lines 137 to 139 in 6682f0b:

    float random_val = rand_scale(1.4);    // *x or /x
    int dim_w = roundl(random_val*init_w / 32) * 32;
    int dim_h = roundl(random_val*init_h / 32) * 32;

original repo: https://github.com/pjreddie/darknet/blob/49ba88d9f73cf80ed657823a80fefb4b929414a5/examples/detector.c#L65-L66
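
To make the difference concrete, here is a minimal sketch of the two size samplers. `rand_scale_sketch` is a stand-in written for this example from the `// *x or /x` comment in the snippet (an assumption, not a copy of the repository's helper). The original-repo formula is the `(10 + rand() % 10) * 32` expression quoted elsewhere in this thread. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for rand_scale(s): draw a factor in [1, s] and
   either multiply or divide by it, so the result lies in [1/s, s]. */
float rand_scale_sketch(float s)
{
    float scale = 1.0f + (s - 1.0f) * ((float)rand() / (float)RAND_MAX);
    return (rand() % 2) ? scale : 1.0f / scale;
}

/* This repo (per the snippet): scale the cfg size by a random factor,
   then snap to a multiple of 32. (int)(v + 0.5f) rounds like roundl()
   for the positive values that occur here. */
int dim_this_repo(int init_dim)
{
    assert(init_dim > 0);
    float v = rand_scale_sketch(1.4f) * init_dim / 32;
    return (int)(v + 0.5f) * 32;
}

/* Original repo (as quoted in this thread): one of ten sizes, 320..608.
   The expression does not involve the cfg width/height. */
int dim_original_repo(void)
{
    return (10 + rand() % 10) * 32;
}
```

Under these assumptions, `dim_this_repo(416)` stays within 288..576, while `dim_original_repo()` stays within 320..608, so the two samplers cover different ranges and weight the sizes differently.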

Noir97 · 2018-08-03T02:31:52Z

@AlexeyAB glad to see your reply.

I've checked that I did use the same cfg file and the same Makefile (both with GPU=1, OPENCV=1, CUDNN=1, and 0 for the other options).
As for the image resizing, I noticed that the original repo uses letterbox in both validation and test, so I edited both parts and made sure I get the same result as I get when testing in your repo. However, I haven't checked the resizing procedure during training thoroughly. Is there any difference there?
The original repo picks the scale as (10 + rand() % 10) * 32, which means a range of 320~608,
so I modified your code to fit that range (with 416 * 416 as init_w * init_h),
e.g. int dim_w = (roundl(random_val * init_w / 32) + 1) * 32;
There may still be a problem, as the probability of a given scale being used is not the same as in the original repo, but I think that's trivial.
In the end, I didn't get much improvement with this change.
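
As a quick check of the ranges discussed here, the sketch below evaluates the snapping formula at the two extremes of random_val. It assumes rand_scale(1.4) yields values between 1/1.4 and 1.4, which is inferred from the `*x or /x` comment and not verified against the source. `snap32` and its `bump` parameter are names made up for this example.

```c
#include <assert.h>

/* Snap scale * init_dim to a multiple of 32.
   bump = 0 is the unmodified formula, bump = 1 is the "+ 1" variant.
   (int)(v + 0.5f) rounds like roundl() for the positive values used here. */
int snap32(float scale, int init_dim, int bump)
{
    assert(init_dim > 0 && scale > 0.0f);
    float v = scale * init_dim / 32;
    return ((int)(v + 0.5f) + bump) * 32;
}
```

With init_w = 416, `snap32(1.0f / 1.4f, 416, 0)` gives 288 and `snap32(1.4f, 416, 0)` gives 576. With `bump = 1` the same calls give 320 and 608, which matches the 320~608 range of the original formula.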

AlexeyAB · 2018-08-03T09:00:09Z

@QBjun

Yes, during training the original repository uses letterbox() for resizing, while my repository uses resize(): #232 (comment)
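
The practical difference is in how an image is fitted to the network input. The sketch below computes only the geometry, using the usual meaning of the two terms: a plain resize stretches the image to the full network size, while a letterbox scales it uniformly and pads the remainder. The struct and function names are made up for this example and are not taken from either repository.

```c
#include <assert.h>

/* Where the source image lands inside a net_w x net_h network input. */
typedef struct {
    int new_w, new_h;   /* size of the scaled image */
    int off_x, off_y;   /* top-left corner inside the network input */
} fit_t;

/* Plain resize: stretch to the full input; aspect ratio is not kept. */
fit_t fit_resize(int img_w, int img_h, int net_w, int net_h)
{
    assert(img_w > 0 && img_h > 0);
    fit_t f = { net_w, net_h, 0, 0 };
    return f;
}

/* Letterbox: scale by the smaller ratio so the whole image fits,
   center it, and leave the rest as padding. */
fit_t fit_letterbox(int img_w, int img_h, int net_w, int net_h)
{
    assert(img_w > 0 && img_h > 0);
    fit_t f;
    if ((float)net_w / img_w < (float)net_h / img_h) {
        f.new_w = net_w;
        f.new_h = img_h * net_w / img_w;
    } else {
        f.new_h = net_h;
        f.new_w = img_w * net_h / img_h;
    }
    f.off_x = (net_w - f.new_w) / 2;
    f.off_y = (net_h - f.new_h) / 2;
    return f;
}
```

For a 640x480 image and a 416x416 network, the letterbox case gives a 416x312 image with 52 rows of padding above and below. The plain resize stretches the same image to 416x416, so objects are seen with a different aspect ratio.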

Noir97 · 2018-08-03T10:37:52Z

@AlexeyAB Well, I guess I have to check the code again; I think I made some mistakes when reading it. I'll train on VOC again and post the result later. Thanks again for the reply~

kenrubiooo · 2018-10-17T00:19:34Z

Hi, @AlexeyAB!

I'm trying to find a thread on multi-scale training, and I am not sure how the original darknet does it. I understand that when you specify the height and width in the cfg file, for example 608x608, the images fed to the network are squeezed to that size, but the aspect ratio is retained by using letterbox. Does YOLO v3 just feed in images of different sizes through this process, and is that why it is called multi-scale training?

Moreover, will the accuracy suffer if I feed an image whose dimensions are smaller than the network height and width, or will letterbox take care of it? Thank you!

AlexeyAB · 2018-10-17T00:30:56Z

@kenrubiooo Hi,

There are 2 features for multi-scale training in Darknet Yolo:

random network resizing: random=1 in the cfg-file (a new random size every 10 iterations)
random image resizing: jitter=0.3 in the cfg-file (a random resize for each image)

> Moreover, will the accuracy suffer if I feed an image that has lower dimension than the height and width or will the letterbox take care of it? Thank you!

Letterbox takes care of it.
But it is better to use an image resolution >= the network resolution.
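
For orientation, a trimmed and purely illustrative cfg fragment showing the two settings. The 416 size is the one mentioned earlier in this thread. Placing jitter and random in the [yolo] section is how the stock yolov3 cfg files are laid out as far as I recall, so check it against your own file. All other keys are omitted.

```
[net]
# network input size
width=416
height=416

[yolo]
# random image resizing, applied per image
jitter=.3
# random network resizing, a new size every 10 iterations
random=1
```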

kenrubiooo · 2018-10-18T09:07:07Z

Thank you for the swift reply! I am actually a bit confused about the network resizing when 'random=1'. What resizing happens in that case if you set the height and width to fixed values? At first I thought that what I need to do for 'random=1' is to set 3 heights and widths, i.e.

height = 1024, 512, 320
width = 1024, 512, 320

But I am a bit confused. Can you clarify this? Thank you very much!


Noir97 closed this as completed Aug 3, 2018

Noir97 reopened this Aug 3, 2018
