
Greater IoU than Recall? #35

Open
PatricLee opened this issue Mar 4, 2017 · 13 comments

Comments

@PatricLee

Hi,
I'm training YOLO on VOC 2007 & 2012. To plot IoU and Recall curves, I validated every weights file in /backup, and immediately after validating yolo-voc_100.weights on 2007_test I noticed that the average IoU is 44.04% while Recall is 29.57%.
As I see it, IoU is Area of Overlap / Area of Union and Recall is Area of Overlap / Area of Object, and since the Area of Object is no larger than the Area of Union, Recall should always be greater than (or equal to) IoU, yet my data shows otherwise.
Please tell me which part I got wrong.
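
To spell out the inequality I'm assuming (writing P for the predicted box and G for the ground-truth object, and reading Recall per box as overlap over object area, as above):

    \mathrm{IoU} \;=\; \frac{|P \cap G|}{|P \cup G|}
    \;\le\; \frac{|P \cap G|}{|G|} \;=\; \text{Recall (per box)},
    \qquad \text{because } G \subseteq P \cup G .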

@PatricLee (Author)

BTW, I've also noticed that the learning rate becomes 10 times larger after 100 iterations. For example, when I set the learning rate to 0.0001 as in the example, it automatically changes to 0.001 after 100 iterations and the network diverges, so I had to set the learning rate to 0.00001 so that it would be 0.0001, and then the network worked just fine. Is it programmed this way?

@AlexeyAB (Owner)

AlexeyAB commented Mar 5, 2017

Yes, strictly speaking, Recall should always be greater than (or equal to) IoU. But Yolo calculates the average of the best IoUs instead of the average IoU, and it counts True Positives instead of Recall.
That's why I advise you to pay attention to IoU (the best IoU is closer to a true IoU than the True Positive count is to Recall): https://github.com/AlexeyAB/darknet#when-should-i-stop-training

https://en.wikipedia.org/wiki/Precision_and_recall
[image attachment]


Yolo calculates the average of the best IoUs instead of the average IoU, and it counts True Positives instead of Recall. This is the line that prints those values:

fprintf(stderr, "%5d %5d %5d\tRPs/Img: %.2f\tIOU: %.2f%%\tRecall:%.2f%%\n", i, correct, total, (float)proposals/(i+1), avg_iou*100/total, 100.*correct/total);
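
Roughly, those counters are accumulated like this (a simplified, self-contained sketch of the logic in validate_detector_recall, not the verbatim source; the type and function names are only illustrative):

    typedef struct { float x, y, w, h; } box;                  /* center x,y plus width,height */
    typedef struct { box bbox; float objectness; } det_t;      /* illustrative detection type */

    static float overlap_1d(float c1, float w1, float c2, float w2) {
        float left  = (c1 - w1 / 2 > c2 - w2 / 2) ? (c1 - w1 / 2) : (c2 - w2 / 2);
        float right = (c1 + w1 / 2 < c2 + w2 / 2) ? (c1 + w1 / 2) : (c2 + w2 / 2);
        return right - left;
    }

    /* IoU = intersection area / union area */
    static float box_iou(box a, box b) {
        float w = overlap_1d(a.x, a.w, b.x, b.w);
        float h = overlap_1d(a.y, a.h, b.y, b.h);
        if (w <= 0 || h <= 0) return 0;
        float inter = w * h;
        return inter / (a.w * a.h + b.w * b.h - inter);
    }

    /* Called once per validation image; thresh is the detection confidence threshold,
       iou_thresh decides what counts as a true positive. */
    static void accumulate_recall_stats(const det_t *dets, int ndets,
                                        const box *truth, int ntruth,
                                        float thresh, float iou_thresh,
                                        int *proposals, int *total,
                                        int *correct, float *avg_iou)
    {
        for (int k = 0; k < ndets; ++k)
            if (dets[k].objectness > thresh) ++*proposals;     /* counted in RPs/Img */

        for (int j = 0; j < ntruth; ++j) {
            ++*total;                                          /* one ground-truth object */
            float best_iou = 0;
            for (int k = 0; k < ndets; ++k) {                  /* best confident detection for this object */
                float iou = box_iou(dets[k].bbox, truth[j]);
                if (dets[k].objectness > thresh && iou > best_iou) best_iou = iou;
            }
            *avg_iou += best_iou;                              /* "IOU" column: mean of the best IoUs */
            if (best_iou > iou_thresh) ++*correct;             /* "Recall" column: correct / total */
        }
    }

So the printed "IOU" is avg_iou*100/total (the mean of the best per-object overlaps) and "Recall" is 100*correct/total (the fraction of objects matched above the IoU threshold); the two numbers are computed from different quantities, so the inequality you expect does not have to hold between them.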

@AlexeyAB (Owner)

AlexeyAB commented Mar 5, 2017

BTW, I've also noticed that the learning rate becomes 10 times larger after 100 iterations. For example, when I set the learning rate to 0.0001 as in the example, it automatically changes to 0.001 after 100 iterations and the network diverges, so I had to set the learning rate to 0.00001 so that it would be 0.0001, and then the network worked just fine. Is it programmed this way?

"the network worked just fine" - It depends on the number of classes and the number of images. For PascalVOC seems optimal values in the yolo-voc.cfg

How it is programmed - see paragraph 5: #30 (comment)

If learning_rate=0.0001, policy=steps, steps=100,25000,35000 and scales=10,.1,.1, then the actual learning_rate will be (computed as sketched after this list):

  • [0 - 100] iterations learning_rate will be 0.0001
  • [100 - 25000] iterations learning_rate will be 0.001
  • [25000 - 35000] iterations learning_rate will be 0.0001
  • [35000 - ...] iterations learning_rate will be 0.00001
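
Roughly, the steps policy computes the current rate like this (a simplified sketch of the logic, not the verbatim darknet source):

    /* Effective learning rate at iteration batch_num under policy=steps:
       each time batch_num passes steps[i], the base rate is multiplied by scales[i]. */
    float current_rate(float learning_rate, int batch_num,
                       const int *steps, const float *scales, int num_steps)
    {
        float rate = learning_rate;
        for (int i = 0; i < num_steps; ++i) {
            if (steps[i] > batch_num) return rate;
            rate *= scales[i];
        }
        return rate;
    }

With learning_rate=0.0001, steps=100,25000,35000 and scales=10,.1,.1 this reproduces the schedule above, so the jump to 0.001 after iteration 100 comes from the scales[0]=10 factor, not from a bug.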

@PatricLee (Author)

Well, that's why my Recall curve looked so much like a true-positive curve.

Thank you for your reply, and for your amazing work.

I've finished training on the VOC dataset, validated the network on the VOC test set, and compared my results to the yolo-voc.weights I downloaded. Although I'm getting about as many true positives and about the same average IoU as the downloaded network, my network produces noticeably more RPs/Img (about 160 vs 75), so I have some questions:

  • Does this mean that my region-proposal part has not yet converged and requires further training?
  • Will more region proposals per image cause a performance issue, such as more time spent detecting objects?

@AlexeyAB (Owner)

AlexeyAB commented Mar 7, 2017

my network produces noticeably more RPs/Img (about 160 vs 75), so I have some questions:

Does this mean that my region-proposal part has not yet converged and requires further training?

Hard to say. But it may also be an effect of a bug on Windows that I have just fixed: 4422399

Will more region proposals per image cause a performance issue, such as more time spent detecting objects?

No, this should not significantly affect performance.

@PatricLee (Author)

Thanks for the correction, Alexey, it seems to work... though I can't tell for sure yet.

One last question. Since I'm currently working on autonomous driving, my camera has a really wide angle and an unusual aspect ratio of about 3:1, so:
- Is it possible to modify the input of the network so that it also has a 3:1 aspect ratio (say, inputs of 600x200)? If it is, what do I have to modify besides 'height' and 'width' in the .cfg file?
- Will this lead to a performance improvement (or, more specifically, greater IoU) in my scenario, compared to a network with a 1:1 aspect ratio like 416x416?

For now I'm getting an average IoU of about 65% on my dataset, which is not good enough for detecting objects for autonomous driving. I wonder whether I could improve it somehow.

Again, thank you for your amazing work and amazing answers.

@AlexeyAB (Owner)

AlexeyAB commented Mar 8, 2017

You can try to set width=608 and height=224 instead of the current value:

height=416

  1. The resolution must always be a multiple of 32, so 608x224 rather than 600x200 (see the .cfg lines just below this list).
  2. I didn't test non-square resolutions, so I can't say whether there will be any bugs or undefined behavior.
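
So in the [net] section of the .cfg the change would be, for example:

    width=608
    height=224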

I used Yolo to detect objects in a wide image (stitched from 8 cameras with a wide angle of ~200°), but I divided it into many 416x416 square images and ran Yolo on each square image on a separate GPU (4 GPUs in total).

I think if your training dataset has the same 3:1 aspect ratio as your detection dataset, then you should just use the square 416x416 resolution.


To increase IoU:

  1. You can train Yolo with the flag random=1 instead of the current setting:

    random=0

  2. You can train Yolo with the steps multiplied by number_of_classes/20; for example, if you use 6 classes then steps=100,7500,10000 instead of:

    steps=100,25000,35000

  3. For detection (not for training) you can use a larger resolution, for example 832x832, using a weights file trained at 416x416 resolution.
    (Or, if you trained at 608x224, you can change the resolution to 1216x448 after training.)

  4. Also, maybe for detection (not for training) you should rescale the anchors from ~16:9 to 3:1, i.e. divide each second value by 1.7, so it would be anchors = 1.08,0.71, 3.42,2.59, 6.63,6.69, 9.42,3.00, 16.62,6.19 (see the sketch after this list), instead of:

    anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
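
As an illustration of point 4, here is a small stand-alone C sketch (not part of darknet) that rescales these anchors by dividing each height by 1.7 and prints the new anchors= line; tiny rounding differences from the values quoted above are expected:

    #include <stdio.h>

    int main(void)
    {
        /* Default yolo-voc anchors: (width, height) pairs in grid-cell units. */
        float anchors[][2] = {
            {1.08f, 1.19f}, {3.42f, 4.41f}, {6.63f, 11.38f},
            {9.42f, 5.11f}, {16.62f, 10.52f}
        };
        int n = (int)(sizeof(anchors) / sizeof(anchors[0]));

        printf("anchors = ");
        for (int i = 0; i < n; ++i) {
            /* Keep the width, squash the height by 1.7 to go from ~16:9 to ~3:1. */
            printf("%.2f,%.2f%s", anchors[i][0], anchors[i][1] / 1.7f,
                   (i + 1 < n) ? ", " : "\n");
        }
        return 0;
    }

The printed line can then be pasted into the anchors= entry of the [region] section in the .cfg for detection.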

@PatricLee (Author)

Thank you so much for your answers, I will try them out.

@iraadit

iraadit commented Mar 21, 2017

Hi @PatricLee
Have you tried training with a non-square size? Did it work?

@PatricLee (Author)

Hi @iraadit, sorry for the late reply.
I tried training a 960x320 network on my dataset the other day and it worked fine. It took fewer iterations to train (or at least it felt that way) and it has slightly higher accuracy than the 416x416 network I trained earlier, probably because the 960x320 resolution is larger than 416x416.

But if you are in the same scenario as I am, where all the data have the same aspect ratio, then maybe Alexey is right: there is no point in training a non-square network with the same aspect ratio as the data instead of a square one.

@MyVanitar

Also, maybe for detection (not for training) you should rescale the anchors from ~16:9 to 3:1, i.e. divide each second value by 1.7, so it would be anchors = 1.08,0.71, 3.42,2.59, 6.63,6.69, 9.42,3.00, 16.62,6.19:

Why should we not train the model with the newly calculated anchors?

I think if your training dataset has the same 3:1 aspect ratio as your detection dataset, then you should just use the square 416x416 resolution.

How can we calculate this when each image has its own width and height?
If you think it is a good idea, we could pad the images (add a black area around them) so that they all have the same size (for example 960x960) and then start annotating them.

@Brandy24

Brandy24 commented Feb 12, 2020

I am training 5 classes on a CPU (Intel Core i7-5500, 2.4 GHz) with 8 GB of RAM. How many pictures should I train per class to get a good result, and how long will it take to finish?

@stephanecharette (Collaborator)

I am training 5 classes on a CPU (Intel Core i7-5500, 2.4 GHz) with 8 GB of RAM. How many pictures should I train per class to get a good result, and how long will it take to finish?

Not sure why you chose this closed issue to post your question. But I would argue that you cannot realistically train a 5-class network on a CPU; it would take weeks, if not months. Get yourself a decent GPU, or rent one from Amazon AWS, Linode, Google, Azure, etc.

See this recent post I made about a 2-class network: it took 4 hours to train with a GPU, but it would have taken 16 days on my 16-core 3.2 GHz CPU: https://www.ccoderun.ca/programming/2020-01-04_neural_network_training/
