
Greater IoU than Recall? #35

Open
PatricLee opened this issue Mar 4, 2017 · 13 comments

Comments

@PatricLee

Hi,
I'm training YOLO on VOC 2007 & 2012. To plot IoU and Recall curves, I validated every weights file in /backup, and immediately after validating yolo-voc_100.weights on 2007_test I noticed that the average IoU is 44.04% while Recall is 29.57%.
As I see it, IoU is Area of Overlap / Area of Union and Recall is Area of Overlap / Area of Object, and since the Area of Object is no larger than the Area of Union, Recall should always be greater than (or equal to) IoU, yet my data shows otherwise.
Please tell me which part I got wrong.
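
To spell out the inequality I'm assuming (writing P for the predicted box and G for the ground-truth object, and reading Recall per box as overlap over object area, as above):

    \mathrm{IoU} \;=\; \frac{|P \cap G|}{|P \cup G|}
    \;\le\; \frac{|P \cap G|}{|G|} \;=\; \text{Recall (per box)},
    \qquad \text{because } G \subseteq P \cup G .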

@PatricLee (Author)

BTW, I've also noticed that the learning rate becomes 10 times larger after 100 iterations. For example, when I set the learning rate to 0.0001 as in the example, it automatically changes to 0.001 after 100 iterations and the network diverges, so I had to set the learning rate to 0.00001 so that it would be 0.0001, and then the network worked just fine. Is it programmed this way?

@AlexeyAB (Owner)

AlexeyAB commented Mar 5, 2017

Yes, strictly speaking, Recall should always be greater than (or equal to) IoU. But Yolo calculates the average of the best IoUs instead of the average IoU, and it counts True Positives instead of Recall.
That's why I advise you to pay attention to IoU (the best IoU is closer to a true IoU than the True Positive count is to Recall): https://github.com/AlexeyAB/darknet#when-should-i-stop-training

https://en.wikipedia.org/wiki/Precision_and_recall
[image attachment]


Yolo calculates the average of the best IoUs instead of the average IoU, and it counts True Positives instead of Recall. This is the line that prints those values:

fprintf(stderr, "%5d %5d %5d\tRPs/Img: %.2f\tIOU: %.2f%%\tRecall:%.2f%%\n", i, correct, total, (float)proposals/(i+1), avg_iou*100/total, 100.*correct/total);
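
Roughly, those counters are accumulated like this (a simplified, self-contained sketch of the logic in validate_detector_recall, not the verbatim source; the type and function names are only illustrative):

    typedef struct { float x, y, w, h; } box;                  /* center x,y plus width,height */
    typedef struct { box bbox; float objectness; } det_t;      /* illustrative detection type */

    static float overlap_1d(float c1, float w1, float c2, float w2) {
        float left  = (c1 - w1 / 2 > c2 - w2 / 2) ? (c1 - w1 / 2) : (c2 - w2 / 2);
        float right = (c1 + w1 / 2 < c2 + w2 / 2) ? (c1 + w1 / 2) : (c2 + w2 / 2);
        return right - left;
    }

    /* IoU = intersection area / union area */
    static float box_iou(box a, box b) {
        float w = overlap_1d(a.x, a.w, b.x, b.w);
        float h = overlap_1d(a.y, a.h, b.y, b.h);
        if (w <= 0 || h <= 0) return 0;
        float inter = w * h;
        return inter / (a.w * a.h + b.w * b.h - inter);
    }

    /* Called once per validation image; thresh is the detection confidence threshold,
       iou_thresh decides what counts as a true positive. */
    static void accumulate_recall_stats(const det_t *dets, int ndets,
                                        const box *truth, int ntruth,
                                        float thresh, float iou_thresh,
                                        int *proposals, int *total,
                                        int *correct, float *avg_iou)
    {
        for (int k = 0; k < ndets; ++k)
            if (dets[k].objectness > thresh) ++*proposals;     /* counted in RPs/Img */

        for (int j = 0; j < ntruth; ++j) {
            ++*total;                                          /* one ground-truth object */
            float best_iou = 0;
            for (int k = 0; k < ndets; ++k) {                  /* best confident detection for this object */
                float iou = box_iou(dets[k].bbox, truth[j]);
                if (dets[k].objectness > thresh && iou > best_iou) best_iou = iou;
            }
            *avg_iou += best_iou;                              /* "IOU" column: mean of the best IoUs */
            if (best_iou > iou_thresh) ++*correct;             /* "Recall" column: correct / total */
        }
    }

So the printed "IOU" is avg_iou*100/total (the mean of the best per-object overlaps) and "Recall" is 100*correct/total (the fraction of objects matched above the IoU threshold); the two numbers are computed from different quantities, so the inequality you expect does not have to hold between them.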

@AlexeyAB (Owner)

AlexeyAB commented Mar 5, 2017

BTW, I've also noticed that the learning rate becomes 10 times larger after 100 iterations. For example, when I set the learning rate to 0.0001 as in the example, it automatically changes to 0.001 after 100 iterations and the network diverges, so I had to set the learning rate to 0.00001 so that it would be 0.0001, and then the network worked just fine. Is it programmed this way?

"the network worked just fine" - It depends on the number of classes and the number of images. For PascalVOC seems optimal values in the yolo-voc.cfg

How it is programmed - see paragraph 5: #30 (comment)

If learning_rate=0.0001, policy=steps, steps=100,25000,35000 and scales=10,.1,.1, then the actual learning_rate will be (computed as sketched after this list):

  • [0 - 100] iterations learning_rate will be 0.0001
  • [100 - 25000] iterations learning_rate will be 0.001
  • [25000 - 35000] iterations learning_rate will be 0.0001
  • [35000 - ...] iterations learning_rate will be 0.00001
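
Roughly, the steps policy computes the current rate like this (a simplified sketch of the logic, not the verbatim darknet source):

    /* Effective learning rate at iteration batch_num under policy=steps:
       each time batch_num passes steps[i], the base rate is multiplied by scales[i]. */
    float current_rate(float learning_rate, int batch_num,
                       const int *steps, const float *scales, int num_steps)
    {
        float rate = learning_rate;
        for (int i = 0; i < num_steps; ++i) {
            if (steps[i] > batch_num) return rate;
            rate *= scales[i];
        }
        return rate;
    }

With learning_rate=0.0001, steps=100,25000,35000 and scales=10,.1,.1 this reproduces the schedule above, so the jump to 0.001 after iteration 100 comes from the scales[0]=10 factor, not from a bug.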

@PatricLee (Author)

Well, that's why my Recall curve looked so much like a true-positive curve.

Thank you for your reply, and for your amazing work.

I've finished training on the VOC dataset, validated the network on the VOC test set, and compared my results to the yolo-voc.weights I downloaded. Although I'm getting about as many true positives and about the same average IoU as the downloaded network, my network produces noticeably more RPs/Img (about 160 vs 75), so I have some questions:

  • Does this mean that my region-proposal part has not yet converged and requires further training?
  • Will more region proposals per image cause a performance issue, such as more time spent detecting objects?

@AlexeyAB (Owner)

AlexeyAB commented Mar 7, 2017

my network produces noticeably more RPs/Img (about 160 vs 75), so I have some questions:

Does this mean that my region-proposal part has not yet converged and requires further training?

Hard to say. But it may also be an effect of a bug on Windows that I have just fixed: 4422399

Will more region proposals per image cause a performance issue, such as more time spent detecting objects?

No, this should not significantly affect performance.

@PatricLee (Author)

Thanks for the correction, Alexey, it seems to work... though I can't tell for sure yet.

One last question. Since I'm currently working on autonomous driving, my camera has a really wide angle and an unusual aspect ratio of about 3:1, so:
- Is it possible to modify the input of the network so that it also has a 3:1 aspect ratio (say, inputs of 600x200)? If it is, what do I have to modify besides 'height' and 'width' in the .cfg file?
- Will this lead to a performance improvement (or, more specifically, greater IoU) in my scenario, compared to a network with a 1:1 aspect ratio like 416x416?

For now I'm getting an average IoU of about 65% on my dataset, which is not good enough for detecting objects for autonomous driving. I wonder whether I could improve it somehow.

Again, thank you for your amazing work and amazing answers.

@AlexeyAB (Owner)

AlexeyAB commented Mar 8, 2017

You can try to set width=608 and height=224 instead of the current value:

height=416

  1. The resolution must always be a multiple of 32, so 608x224 rather than 600x200 (see the .cfg lines just below this list).
  2. I didn't test non-square resolutions, so I can't say whether there will be any bugs or undefined behavior.
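
So in the [net] section of the .cfg the change would be, for example:

    width=608
    height=224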

I used Yolo to detect objects in a wide image (stitched from 8 cameras with a wide angle of ~200°), but I divided it into many 416x416 square images and ran Yolo on each square image on a separate GPU (4 GPUs in total).

I think if your training dataset has the same 3:1 aspect ratio as your detection dataset, then you should just use the square 416x416 resolution.


To increase IoU:

  1. You can train Yolo with the flag random=1 instead of the current setting:

    random=0

  2. You can train Yolo with the steps multiplied by number_of_classes/20; for example, if you use 6 classes then steps=100,7500,10000 instead of:

    steps=100,25000,35000

  3. For detection (not for training) you can use a larger resolution, for example 832x832, using a weights file trained at 416x416 resolution.
    (Or, if you trained at 608x224, you can change the resolution to 1216x448 after training.)

  4. Also, maybe for detection (not for training) you should rescale the anchors from ~16:9 to 3:1, i.e. divide each second value by 1.7, so it would be anchors = 1.08,0.71, 3.42,2.59, 6.63,6.69, 9.42,3.00, 16.62,6.19 (see the sketch after this list), instead of:

    anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
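
As an illustration of point 4, here is a small stand-alone C sketch (not part of darknet) that rescales these anchors by dividing each height by 1.7 and prints the new anchors= line; tiny rounding differences from the values quoted above are expected:

    #include <stdio.h>

    int main(void)
    {
        /* Default yolo-voc anchors: (width, height) pairs in grid-cell units. */
        float anchors[][2] = {
            {1.08f, 1.19f}, {3.42f, 4.41f}, {6.63f, 11.38f},
            {9.42f, 5.11f}, {16.62f, 10.52f}
        };
        int n = (int)(sizeof(anchors) / sizeof(anchors[0]));

        printf("anchors = ");
        for (int i = 0; i < n; ++i) {
            /* Keep the width, squash the height by 1.7 to go from ~16:9 to ~3:1. */
            printf("%.2f,%.2f%s", anchors[i][0], anchors[i][1] / 1.7f,
                   (i + 1 < n) ? ", " : "\n");
        }
        return 0;
    }

The printed line can then be pasted into the anchors= entry of the [region] section in the .cfg for detection.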

@PatricLee (Author)

Thank you so much for your answers, I will try them out.

@iraadit

iraadit commented Mar 21, 2017

Hi @PatricLee
Have you tried training with a non-square size? Did it work?

@PatricLee (Author)

Hi @iraadit, sorry for the late reply.
I tried training a 960x320 network on my dataset the other day and it worked fine. It took fewer iterations to train (or at least it felt that way) and it has slightly higher accuracy than the 416x416 network I trained earlier, probably because the 960x320 resolution is larger than 416x416.

But if you are in the same scenario as I am, where all the data have the same aspect ratio, then maybe Alexey is right: there is no point in training a non-square network with the same aspect ratio as the data instead of a square one.

@MyVanitar

Also, maybe for detection (not for training) you should rescale the anchors from ~16:9 to 3:1, i.e. divide each second value by 1.7, so it would be anchors = 1.08,0.71, 3.42,2.59, 6.63,6.69, 9.42,3.00, 16.62,6.19:

Why should we not train the model with the newly calculated anchors?

I think if your training dataset has the same 3:1 aspect ratio as your detection dataset, then you should just use the square 416x416 resolution.

How can we calculate this when each image has its own width and height?
If you think it is a good idea, we could pad the images (add a black area around them) so that they all have the same size (for example 960x960) and then start annotating them.

@Brandy24

Brandy24 commented Feb 12, 2020

I am training 5 classes on a CPU (Intel Core i7-5500, 2.4 GHz) with 8 GB of RAM. How many pictures should I train per class to get a good result, and how long will it take to finish?

@stephanecharette (Collaborator)

I am training 5 classes on a CPU (Intel Core i7-5500, 2.4 GHz) with 8 GB of RAM. How many pictures should I train per class to get a good result, and how long will it take to finish?

Not sure why you chose this closed issue to post your question. But I would argue that you cannot realistically train a 5-class network on a CPU; it would take weeks, if not months. Get yourself a decent GPU, or rent one from Amazon AWS, Linode, Google, Azure, etc.

See this recent post I made about a 2-class network: it took 4 hours to train with a GPU, but it would have taken 16 days on my 16-core 3.2 GHz CPU: https://www.ccoderun.ca/programming/2020-01-04_neural_network_training/
