performance drop when training yolov3 on voc #1323

Open
Noir97 opened this issue Aug 2, 2018 · 7 comments

Comments

@Noir97

Noir97 commented Aug 2, 2018

Hi, Alexey. I used the cfg file yolov3-voc.cfg provided in the repo and trained YOLOv3 on VOC. I prepared the dataset and the labels following the instructions on pjreddie's official site. After training with your version of Darknet, I got 78% mAP. However, when I trained with the same cfg on the official Darknet, I got about 82% mAP. I wonder what difference causes such a drop in performance.

I did find some differences, such as the strategy for multi-scale training, but changing that didn't bring the performance back either.

I didn't change yolov3-voc.cfg except for uncommenting the training batch and subdivisions lines and changing batch to 96 (to fill my GPUs). By the way, we use our own code to compute mAP.

Thanks

@AlexeyAB
Owner

AlexeyAB commented Aug 2, 2018

@QBjun Hi,

@Noir97
Author

Noir97 commented Aug 3, 2018

@AlexeyAB Glad to see your reply.

  • I've checked that I did use the same cfg file and the same Makefile as well (both with GPU=1, OPENCV=1, CUDNN=1, and 0 for the other options).

  • As for image resizing, I noticed that the original repo uses letterbox in both validation and testing, so I edited both parts and made sure I get the same result as I get by testing in your repo. However, I haven't checked the resizing procedure during training thoroughly. Is there any difference there?

  • The original repo picks the scale as `(10 + rand() % 10) * 32`, which gives a range of 320 to 608,
    so I modified your code to cover the same range (with 416 × 416 as init_w × init_h),
    e.g. `int dim_w = (roundl(random_val * init_w / 32) + 1) * 32;`.
    There may still be a difference, since the probability of each scale being used is not the same as in the original repo, but I think that's minor.
    In the end, I didn't get much improvement from this change.

@AlexeyAB
Owner

AlexeyAB commented Aug 3, 2018

@QBjun

Yes, during training the original repository uses letterbox() for resizing, while my repository uses resize(): #232 (comment)
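A minimal C sketch of the geometric difference between the two approaches: letterbox scales by the smaller of the two ratios and pads the rest, while a plain resize stretches the image to the network size. The integer arithmetic is modeled on how letterboxing is usually done in Darknet, but `Placement`, `letterbox_place` and `resize_place` are hypothetical names for illustration, not Darknet functions.

```c
#include <assert.h>

/* Hypothetical helper type, for illustration only. */
typedef struct {
    int new_w, new_h;   /* size of the scaled image */
    int pad_x, pad_y;   /* left/top offset of the image inside the network input */
} Placement;

/* Letterbox: scale by the smaller ratio so the whole image fits and the
   aspect ratio is kept; the leftover area is padding. */
Placement letterbox_place(int img_w, int img_h, int net_w, int net_h) {
    assert(img_w > 0 && img_h > 0 && net_w > 0 && net_h > 0);
    Placement p;
    /* net_w / img_w < net_h / img_h, compared without floating point. */
    if ((long)net_w * img_h < (long)net_h * img_w) {
        p.new_w = net_w;
        p.new_h = (int)((long)img_h * net_w / img_w);
    } else {
        p.new_h = net_h;
        p.new_w = (int)((long)img_w * net_h / img_h);
    }
    p.pad_x = (net_w - p.new_w) / 2;
    p.pad_y = (net_h - p.new_h) / 2;
    return p;
}

/* Plain resize: stretch to the network size with no padding, so objects are
   distorted whenever img_w / img_h differs from net_w / net_h. */
Placement resize_place(int img_w, int img_h, int net_w, int net_h) {
    (void)img_w;
    (void)img_h;
    Placement p = { net_w, net_h, 0, 0 };
    return p;
}
```

For a 500×375 image and a 416×416 network, letterbox gives a 416×312 image with 52 pixels of padding at the top and bottom, while resize stretches it to 416×416. If training and testing use different schemes, object aspect ratios seen at test time differ from those seen during training.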

@Noir97
Author

Noir97 commented Aug 3, 2018

@AlexeyAB Well, I guess I have to check the code again; I think I made some mistakes when reading it. I'll train on VOC again and post the result later. Thanks again for the reply!

Noir97 closed this as completed on Aug 3, 2018
Noir97 reopened this on Aug 3, 2018
@kenrubiooo

Hi, @AlexeyAB!

I'm trying to find a thread on multi-scale training, and I'm not sure how the original Darknet does it. I understand that when you specify the height and width in the cfg file, for example 608x608, the images fed to the network are resized to that size, but the aspect ratio is retained by using letterbox. Does YOLOv3 simply feed images of different sizes through this process, and is that why it is called multi-scale training?

Also, will accuracy suffer if I feed an image whose dimensions are smaller than the network height and width, or will letterbox take care of it? Thank you!

@AlexeyAB
Owner

@kenrubiooo Hi,

There are 2 features for multi-scale training in Darknet YOLO:

  • random network resizing: random=1 in the cfg file (a new random size every 10 iterations)
  • random image resizing: jitter=0.3 in the cfg file (random for each image)
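A rough C sketch of the two mechanisms. For random=1 it assumes the original repo's `(10 + rand() % 10) * 32` rule quoted earlier in this thread (AlexeyAB's fork draws the size differently). For jitter it assumes a crop window in which each side of the image moves inward or outward by up to `jitter` times the image size, which is roughly how Darknet's detection data loader applies it as far as I know. `ResizeState`, `network_size_for_iter`, `Crop`, `rand_int_in` and `jitter_crop` are hypothetical names, not Darknet functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical trainer state, for illustration only. */
typedef struct {
    int cur_size;   /* current square network input size */
} ResizeState;

/* random=1 sketch: every 10 iterations draw a new size from 320..608
   (multiples of 32) and keep it until the next draw. Only the input and
   feature-map sizes change; the weights are shared across sizes. */
int network_size_for_iter(ResizeState *s, int iter) {
    assert(iter >= 0);
    if (iter % 10 == 0)
        s->cur_size = (10 + rand() % 10) * 32;
    return s->cur_size;
}

/* Hypothetical crop window in original-image pixel coordinates. */
typedef struct { int left, top, width, height; } Crop;

/* Uniform integer in [lo, hi] (slightly biased, fine for a sketch). */
static int rand_int_in(int lo, int hi) {
    return lo + rand() % (hi - lo + 1);
}

/* jitter sketch: each side moves by up to jitter * size, positive means
   cropping inward, negative means padding outward. The window is later
   resized to the network input, so objects change scale and position. */
Crop jitter_crop(int img_w, int img_h, float jitter) {
    assert(img_w > 0 && img_h > 0 && jitter >= 0.0f && jitter < 0.5f);
    int dw = (int)(img_w * jitter);
    int dh = (int)(img_h * jitter);
    int pleft = rand_int_in(-dw, dw), pright = rand_int_in(-dw, dw);
    int ptop = rand_int_in(-dh, dh), pbot = rand_int_in(-dh, dh);
    Crop c = { pleft, ptop, img_w - pleft - pright, img_h - ptop - pbot };
    return c;
}
```

With jitter=0.3 and a 500×375 image, the crop width falls anywhere between 200 and 800 pixels, so per-image scale varies a lot even when the network size is fixed; random=1 then varies the network size on top of that.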

> Moreover, will the accuracy suffer if I feed an image that has lower dimension than the height and width or will the letterbox take care of it? Thank you!

Letterbox takes care of it.
But it is better to use an image resolution >= the network resolution.
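A quick way to see when letterbox has to enlarge an image: the letterbox scale factor is the smaller of `net_w / img_w` and `net_h / img_h`, and a value above 1 means the image is upsampled. One likely reason for preferring an image resolution >= the network resolution is that upsampling adds no new detail. `letterbox_scale` and `is_upsampled` are hypothetical helpers for illustration.

```c
#include <assert.h>

/* Letterbox scale factor: the smaller of the two per-axis ratios, so the
   whole image fits inside the network input. */
float letterbox_scale(int img_w, int img_h, int net_w, int net_h) {
    assert(img_w > 0 && img_h > 0 && net_w > 0 && net_h > 0);
    float sx = (float)net_w / img_w;
    float sy = (float)net_h / img_h;
    return sx < sy ? sx : sy;
}

/* Non-zero when letterboxing must enlarge the image (scale > 1). */
int is_upsampled(int img_w, int img_h, int net_w, int net_h) {
    return letterbox_scale(img_w, img_h, net_w, net_h) > 1.0f;
}
```

For a 608x608 network, a 1024×768 image is scaled down (factor 0.59375), whereas a 320×240 image is enlarged by a factor of 1.9.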

@kenrubiooo

Thank you for the swift reply! I'm actually a bit confused about the network resizing with 'random=1'. What does the resizing do in that case if you set the height and width to a specific size? At first, I thought that for 'random=1' I needed to set three heights and widths, i.e.

height = 1024, 512, 320
width = 1024, 512, 320

But I am a bit confused. Can you clarify this? Thank you very much!
