Resizing: keeping aspect ratio, or not #232

Open
iraadit opened this issue Oct 16, 2017 · 14 comments
Labels: Explanations, question

Comments


iraadit commented Oct 16, 2017

Hi,

In your implementation of Darknet, the aspect ratio of the image is not kept when resizing. The function get_image_from_stream_resize:

image get_image_from_stream_resize(CvCapture *cap, int w, int h, IplImage** in_img)

is used instead of the original resizing function letterbox_image_into from the pjreddie version:
void letterbox_image_into(image im, int w, int h, image boxed)

https://github.com/pjreddie/darknet/blob/532c6e1481e78ba0e27c56f4492a7e8d3cc36597/src/image.c#L913
(in the original repo)
which resizes the image while keeping the aspect ratio and puts it in a letterbox.

It is the same in the OpenCV implementation you pushed some days ago (the aspect ratio is not kept):
https://github.com/opencv/opencv/blob/73af899b7c737677f008b831c8e61eaeb2984342/samples/dnn/yolo_object_detection.cpp#L60

Why this difference?

It seems to me it is the old behavior of YOLO (v1): https://github.com/pjreddie/darknet/blame/179ed8ec76f329eb22360440c3836fdcb2560330/src/demo.c#L44
Why didn't you update this behavior in the same way?

It seems to me that not keeping the aspect ratio means you HAVE TO use the same aspect ratio for the training images and the test/application images/videos (and I saw you write that in several places).

Does this mean that a network trained with one version of YOLO isn't fully usable with the other one (i.e., giving the same results)?

In my case, I have several training images, but some of them are in portrait orientation, while the application video will be in landscape orientation. Does that mean I can't use them? (Only with your implementation that doesn't keep the aspect ratio, or in any case?)

Thank you

@AlexeyAB (Owner)

Hi,

In the OpenCV version of Yolo you can keep the aspect ratio right now; just replace this code:
https://github.com/opencv/opencv/blob/73af899b7c737677f008b831c8e61eaeb2984342/samples/dnn/yolo_object_detection.cpp#L58-L65

    //! [Resizing without keeping aspect ratio]
    cv::Mat resized;
    cv::resize(frame, resized, cv::Size(network_width, network_height));
    //! [Resizing without keeping aspect ratio]

    //! [Prepare blob]
    Mat inputBlob = blobFromImage(resized, 1 / 255.F);
    //! [Prepare blob]

to this:

    //! [Prepare blob]
    Mat inputBlob = blobFromImage(frame, 1 / 255.F, cv::Size(network_width, network_height)); //Convert Mat to batch of images
    //! [Prepare blob]

@AlexeyAB (Owner)

There are at least 3 versions of Yolo:

  1. Yolo v1 with fully connected layers: https://pjreddie.com/darknet/yolov1/
  2. Yolo v2, a fully convolutional network (yolo.2.0.cfg and yolo-voc.2.0.cfg) - the one used in my fork: https://arxiv.org/pdf/1612.08242.pdf
  3. Yolo v2.x with the aspect ratio kept - the current version (since 10 Apr 2017): https://pjreddie.com/darknet/yolo/

It seems to me it is the old behavior of YOLO (v1): https://github.com/pjreddie/darknet/blame/179ed8ec76f329eb22360440c3836fdcb2560330/src/demo.c#L44

No. Yolo v1 used fully connected layers and the file yolo_demo.c instead of demo.c, and had too low accuracy. You can find Yolo v1 here: https://github.com/AlexeyAB/yolo-windows

This fork fully corresponds to the Yolo v2 that uses yolo-voc.2.0.cfg or yolo.2.0.cfg, with accuracy 78.6 mAP (VOC 2007), 73.4 mAP (VOC 2012), 44.0 mAP (COCO - table 5): https://arxiv.org/abs/1612.08242

Now, with the aspect ratio kept, we can get about 48.1 mAP (COCO), so it adds about +4.1 mAP for COCO: https://pjreddie.com/darknet/yolo/


Why didn't you update this behavior in the same way?

Maybe I will update it later. Joseph may soon release a new version of Yolo with new improvements, and I'll add it all together.

This version of Yolo v2 works a bit worse when the training and detection datasets have different aspect ratios, but it works. Aspect-ratio invariance is achieved by using a crop that depends on the jitter parameter in the .cfg file:

image cropped = crop_image(orig, pleft, ptop, swidth, sheight);
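
For illustration, here is a rough OpenCV C++ sketch of that jitter-based crop (the actual Darknet code works on its own image struct and handles regions outside the image differently, so treat this only as a simplified approximation):

    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <random>

    // Simplified jitter crop: draw random offsets for each border, up to
    // +/- jitter * (image dimension), and crop that region. Because the four
    // offsets are drawn independently, the crop's aspect ratio varies from
    // sample to sample, which is what gives some aspect-ratio invariance
    // during training.
    cv::Mat jitter_crop(const cv::Mat &orig, float jitter, std::mt19937 &rng)
    {
        const int ow = orig.cols, oh = orig.rows;
        std::uniform_int_distribution<int> dx(-(int)(ow * jitter), (int)(ow * jitter));
        std::uniform_int_distribution<int> dy(-(int)(oh * jitter), (int)(oh * jitter));

        const int pleft = dx(rng), pright = dx(rng);
        const int ptop  = dy(rng), pbot   = dy(rng);
        const int swidth  = ow - pleft - pright;   // cropped width
        const int sheight = oh - ptop  - pbot;     // cropped height

        // Clamp the crop rectangle to the image for simplicity;
        // assumes jitter < 0.5 so the crop sizes stay positive.
        const int x = std::max(pleft, 0), y = std::max(ptop, 0);
        const int w = std::min(swidth,  ow - x);
        const int h = std::min(sheight, oh - y);
        return orig(cv::Rect(x, y, w, h)).clone();
    }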


iraadit commented Oct 16, 2017

Thank you once again for your complete answer.

If I modify your code to keep the aspect ratio, would you be interested in a Pull Request?

iraadit closed this as completed Oct 16, 2017

iraadit commented Oct 16, 2017

For OpenCV, looking at the definition of blobFromImage, it appears to me that the behaviour is different from the aspect-ratio-keeping resize in pjreddie's Darknet:

input image is resized so one side after resize is equal to corresponding dimension in size and another one is equal or larger. Then, crop from the center is performed.

It seems to resize and then cut off the parts of the image that don't fit in a square, instead of adding black margins (letterbox).
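
For comparison, a minimal OpenCV C++ sketch of letterbox-style resizing (not the actual Darknet letterbox_image(); as far as I can tell Darknet fills the border with mid-gray, black is used here only as an example):

    #include <opencv2/opencv.hpp>
    #include <algorithm>

    // Letterbox resize: scale the image so it fits inside w x h while keeping
    // the aspect ratio, then pad the remaining area with a constant color.
    cv::Mat letterbox(const cv::Mat &frame, int w, int h)
    {
        const float scale = std::min((float)w / frame.cols, (float)h / frame.rows);
        const int new_w = (int)(frame.cols * scale);
        const int new_h = (int)(frame.rows * scale);

        cv::Mat resized;
        cv::resize(frame, resized, cv::Size(new_w, new_h));

        const int top  = (h - new_h) / 2, bottom = h - new_h - top;
        const int left = (w - new_w) / 2, right  = w - new_w - left;

        cv::Mat boxed;
        cv::copyMakeBorder(resized, boxed, top, bottom, left, right,
                           cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
        return boxed;  // w x h, whole image visible, aspect ratio preserved
    }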

iraadit reopened this Oct 16, 2017

AlexeyAB commented Oct 16, 2017

Yes, you are right, blobFromImage does it in a different way than Darknet.

But there are trade-offs in all cases - below is an example of resizing an image in different ways:

  1. blobFromImage(): object size 71 x 43, keeps the aspect ratio, but part of the image is lost (cropped)
  2. letterbox_image(): the smallest object size, 48 x 28, keeps the aspect ratio and sees the whole image
  3. resize_image(): object size 48 x 43, sees the whole image but doesn't keep the aspect ratio

If I modify your code to keep the aspect ratio, would you be interested in a Pull Request?

A known Yolo problem is that small objects are difficult to detect, and letterbox_image() gives the smallest object size, 48 x 28.

So yes, I'll apply your pull request, but I think there should be an if-branch that depends on a command-line flag, which lets us use either the current resize_image() version (without keeping the aspect ratio) or the letterbox_image() version (keeping the aspect ratio); see the sketch after the examples below.


For example:

  1. Original image: [image: air]

  2. Resized (416x416) with keeping the aspect ratio - OpenCV blobFromImage(): [image: air_416x416_cropped]

  3. Resized (416x416) with keeping the aspect ratio - Darknet letterbox_image(): [image: air_416x416_letterbox]

  4. Resized (416x416) without keeping the aspect ratio - this fork of Darknet resize_image(): [image: air_416x416]
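
A hedged sketch of the flag-controlled branch described above, reusing the letterbox() helper sketched in the earlier comment (these names are illustrative, not Darknet's actual API):

    #include <opencv2/opencv.hpp>

    // Assumes the letterbox() helper sketched earlier in this thread.
    cv::Mat letterbox(const cv::Mat &frame, int w, int h);

    // Choose between stretching (current behavior) and letterboxing,
    // controlled by a flag such as a command-line option.
    cv::Mat prepare_input(const cv::Mat &frame, int net_w, int net_h, bool letter_box)
    {
        if (letter_box)
            return letterbox(frame, net_w, net_h);           // keep aspect ratio, pad the rest
        cv::Mat resized;
        cv::resize(frame, resized, cv::Size(net_w, net_h));  // stretch to net size, no padding
        return resized;
    }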

@2598Nitz

Hi AlexeyAB,
I found the letterbox_image function in the image.c file in your repo, but it seems that none of the files use this function. get_image_from_stream_resize is also in image.c and is used in demo.c. So if I replace get_image_from_stream_resize with letterbox_image in demo.c, will the aspect ratio be maintained by padding with black margins as described above?


AlexeyAB commented Jun 21, 2018

@2598Nitz

Update your code from GitHub.

To use letterbox:

  • for video - change this flag
    static int letter_box = 0;
    to
    static int letter_box = 1;


2598Nitz commented Jun 21, 2018

Thanks for your reply.
My purpose is to implement the letterbox function at both train and test time. I still have a few questions regarding the resizing of images.

  1. The above changes that you mentioned, are they for training or for testing? If they are for testing, what changes should I make to implement the letterbox function when training on a custom dataset?
    I assume demo.c is for test time and detector.c is for training.

  2. There's one more line in the detector.c file (line no. 495) where letterbox is defined. Is that for training purposes? Should I also set it to 1?

  3. Also, does the resizing implementation vary between different versions of Yolo?


shubham-shahh commented Oct 3, 2020

Should I use varied aspect ratios for training or not? In the end, I have to do inference at a constant aspect ratio. And in the .cfg file, what should the width and height be, if my training images range from 150 (width) x 80 (height) to 600 (width) x 200 (height)? I am using YOLOv4.
Thanks @AlexeyAB

@developer0hye

@AlexeyAB

I know it's an old issue, but I think it is very important.

You said the resize method that keeps the aspect ratio improved the model's performance.

#232 (comment)

But it seems that your preprocessing method is based on the resize method that does not keep the aspect ratio.

Is there any reason for that?

@AlexeyAB (Owner)

@developer0hye

On MS COCO, letter_box=1 works well (for training and detection) if you use a network resolution of width=608 height=608 or higher in the cfg file.

@developer0hye

@AlexeyAB
Thanks!
