Detection on many varied-size objects #333
@ilkarman Hi,
How to calculate the receptive field: receptive_field.xlsx
The receptive field means that one final activation can't see more than this window (566 x 566 for yolo-voc.cfg). But the neural network can learn that this is only a small part of a larger object, so the bounding box can be bigger than 566 x 566 and you can still get IoU ≈ 1.
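(A minimal sketch of the per-layer recurrence that the spreadsheet applies; the layer list here is illustrative, not the full yolo-voc.cfg:)

```c
#include <stdio.h>

/* Receptive-field recurrence, applied layer by layer:
 *   rf_out   = rf_in + (kernel - 1) * jump_in
 *   jump_out = jump_in * stride
 * The layer list is a truncated illustration, not the full cfg. */
typedef struct { int kernel, stride; } layer;

int main(void) {
    layer net[] = {
        {3, 1}, {2, 2},          /* conv 3x3 s1, maxpool 2x2 s2 */
        {3, 1}, {2, 2},
        {3, 1}, {1, 1}, {3, 1}, {2, 2},
        /* ... remaining conv/maxpool layers of the cfg ... */
    };
    int n = sizeof(net) / sizeof(net[0]);
    int rf = 1, jump = 1;
    for (int i = 0; i < n; i++) {
        rf += (net[i].kernel - 1) * jump;
        jump *= net[i].stride;
        printf("layer %2d: receptive field %3d x %3d\n", i, rf, rf);
    }
    return 0;
}
```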
@AlexeyAB thank you very much for such a comprehensive response. You have pretty much cleared up all my questions, and your response, along with the Excel sheet, is really useful for me!
I tested the model with modified anchors many times, both training and testing (i.e. training and testing with the new anchors), but none achieved better results than the original anchors. Do you know why?
@VanitarNordic Are the sizes of your custom objects very different from the Pascal VOC data? Did you generate 5 clusters, or did you also try a few more?
Could it be that, since the network learns to predict ground-truth boxes relative to the location of the anchor boxes, it doesn't matter too much what the initial starting values are, because it compensates for them during training? Or is the prediction of a bounding box somehow bounded by the original anchor specified (e.g. the logistic function bounds the output to a max of 1)? That would mean that supplying new anchors for the testing phase only will throw the network off, because the offsets it has learnt (relative to the old anchors) will now be completely different.
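(For context, the region layer's decoding as I understand it; a simplified sketch, not the exact src. Only the box centre is squashed by the logistic function; width and height scale the anchor by exp(t), so they are not hard-bounded by the anchor:)

```c
#include <math.h>

/* Simplified YOLOv2-style box decoding.  tx..th are raw outputs for
 * one box, (col,row) is the grid cell, (pw,ph) the anchor size in
 * feature-map units, feat_w/feat_h the feature-map size. */
typedef struct { float x, y, w, h; } box;

static float sigmoidf(float v) { return 1.f / (1.f + expf(-v)); }

box decode_box(float tx, float ty, float tw, float th,
               int col, int row, float pw, float ph,
               int feat_w, int feat_h) {
    box b;
    /* Centre: the logistic keeps it inside its own grid cell. */
    b.x = (col + sigmoidf(tx)) / feat_w;
    b.y = (row + sigmoidf(ty)) / feat_h;
    /* Size: exp(t) * anchor, so a large tw/th predicts a box far
     * bigger than the anchor; there is no hard upper bound. */
    b.w = pw * expf(tw) / feat_w;
    b.h = ph * expf(th) / feat_h;
    return b;
}
```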
I used 5 anchors. I have one class, apples, so you can imagine the shapes and ground truths. I used this script to generate anchors for my images (roughly the k-means procedure sketched below): https://github.com/Jumabek/darknet_scripts
No, I tested my own model both when I used the original anchors and when I used the new anchors.
No, for both experiments random was equal to 0.
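(For reference, roughly what such a script computes: k-means over the ground-truth (w, h) pairs with 1 - IoU as the distance. This is a minimal sketch with illustrative names, not the script's actual code:)

```c
#include <stdlib.h>

/* Anchor clustering sketch: k-means over ground-truth (w, h) pairs,
 * distance d = 1 - IoU of co-centred boxes.  Sizes are assumed to be
 * normalized to the final feature map (e.g. 13 x 13). */
typedef struct { float w, h; } wh;

static float iou_wh(wh a, wh b) {
    float iw = a.w < b.w ? a.w : b.w;   /* overlap when co-centred */
    float ih = a.h < b.h ? a.h : b.h;
    float inter = iw * ih;
    return inter / (a.w * a.h + b.w * b.h - inter);
}

void kmeans_anchors(const wh *boxes, int n, wh *centroids, int k, int iters) {
    int *assign = malloc(n * sizeof(int));
    for (int it = 0; it < iters; it++) {
        /* Assign each box to the nearest centroid (highest IoU). */
        for (int i = 0; i < n; i++) {
            int best = 0;
            float best_d = 1.f - iou_wh(boxes[i], centroids[0]);
            for (int c = 1; c < k; c++) {
                float d = 1.f - iou_wh(boxes[i], centroids[c]);
                if (d < best_d) { best_d = d; best = c; }
            }
            assign[i] = best;
        }
        /* Move each centroid to the mean (w, h) of its cluster. */
        for (int c = 0; c < k; c++) {
            float sw = 0.f, sh = 0.f; int cnt = 0;
            for (int i = 0; i < n; i++)
                if (assign[i] == c) { sw += boxes[i].w; sh += boxes[i].h; cnt++; }
            if (cnt) { centroids[c].w = sw / cnt; centroids[c].h = sh / cnt; }
        }
    }
    free(assign);
}
```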
Is there a limit on the predicted offsets of the ground truth relative to the anchor boxes, or can they be any number? E.g. if your anchor is way off, will the network just predict a much bigger offset? Since you mention random=1: I want to add multi-scale training to my training example. My base network size is 1248 (for some reason, increasing this even higher makes performance worse, which I don't understand; also, if it's lower, my small objects can't be detected). I want to implement multi-scale training and was thinking of editing the src along these lines:
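(A sketch adapted from the stock multi-scale block in src/detector.c, which picks dim = (rand() % 10 + 10) * 32, i.e. 320 to 608; the 1056 to 1440 range below, centred on my 1248 base, is just a guess:)

```c
#include <stdlib.h>

/* Multi-scale dimension picker, re-centred on a 1248 base network
 * size.  rand() % 13 -> 0..12, +33 -> 33..45, *32 -> 1056..1440,
 * so the midpoint is 39 * 32 = 1248. */
static int pick_training_dim(void) {
    return (rand() % 13 + 33) * 32;
}

/* In train_detector this would replace the stock computation:
 *   if (l.random && count++ % 10 == 0) {
 *       int dim = pick_training_dim();
 *       ...
 *       resize_network(nets + i, dim, dim);
 *   }
 */
```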
@VanitarNordic What is your network size? Also, I'm curious how you judge performance ... by visually inspecting a few images, or by mAP?
@VanitarNordic No idea why this happens.
@ilkarman Yes, you can use this line: …
If you use a cfg-file based on …
On the input of the neural network, for best detection, object sizes should be …
If we want to know the MIN and MAX object size on the image (because images have different resolutions), then we use the coefficient …
If you use yolo-voc.2.0.cfg with resolution 1248x1248, then final_feature_map = 39x39 (the network downsamples by a factor of 32, and 1248 / 32 = 39).
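(A small sketch of this arithmetic, assuming the coefficient is image_resolution / network_resolution; the 4000 px photo width is just an example:)

```c
#include <stdio.h>

/* Feature-map size and image-scale coefficient, assuming a
 * yolo-voc.2.0.cfg-style network that downsamples by 32. */
int main(void) {
    int net_size = 1248;
    int feat = net_size / 32;               /* 1248 / 32 = 39 */
    printf("final feature map: %d x %d\n", feat, feat);

    int img_w = 4000;                       /* example photo width */
    float coeff = (float)img_w / net_size;  /* image / network scale */
    printf("one 32 px cell maps to ~%.0f px on the photo\n", 32 * coeff);
    return 0;
}
```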
@AlexeyAB Thanks once again for your detailed response. @VanitarNordic
I thought the input to the network was padded when resized, to preserve the aspect ratio. I noticed this because my visualised anchor boxes are rectangles but my ground-truth boxes are all squares. However, this would be consistent if the network resizes without preserving aspect ratio.
No, I did not face any error, but in my case, the same as you, I have square boxes for the objects (apples) while the anchors are rectangles. I had a discussion about this; you may find it in that repo's issues.
@VanitarNordic Right, interesting discussion. I guess the answer depends on whether this implementation of YOLO preserves aspect ratio when resizing. If it does, then maybe that explains why performance deteriorates when using custom rectangular anchors for square ground-truth bounding boxes. It seems from AlexeyAB's post that this version does not keep aspect ratio but pjreddie's version does, in which case the rectangular boxes may be correct. I think?
Yes, this fork doesn't keep aspect ratio, because I use it for training and detection on images with the same resolution, so the distortions are identical. For example, I train the network on frames from a video stream.
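(To make the two conventions concrete, a sketch with illustrative names, contrasting pjreddie-style letterboxing with the plain stretch this fork uses:)

```c
/* Resize conventions for fitting an image onto the network input. */
typedef struct { int w, h; } dims;

/* Letterbox (pjreddie): scale by the limiting side and pad the rest,
 * so the aspect ratio is kept. */
dims letterbox_dims(int img_w, int img_h, int net_w, int net_h) {
    dims d;
    if ((float)net_w / img_w < (float)net_h / img_h) {
        d.w = net_w;
        d.h = img_h * net_w / img_w;
    } else {
        d.h = net_h;
        d.w = img_w * net_h / img_h;
    }
    return d;  /* the result is then centred on a net_w x net_h canvas */
}

/* Plain stretch (this fork): both axes resized independently, so the
 * aspect ratio is lost and square boxes can become rectangles. */
dims stretch_dims(int net_w, int net_h) {
    dims d = { net_w, net_h };
    return d;
}
```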
:-) In the end, I did not understand which one is at fault :-( Is the problem with the repo or with the generated anchors?
@VanitarNordic I think the problem would only arise if you used that script to generate anchors for pjreddie's version?
The goal is to make the anchors match our custom images, but when I generate them using that script, the results get worse.
And I am sure the results are getting worse, because I test on the same images and videos for both experiments, with the original anchors or with the new anchors generated using the above-mentioned script.
Alexey, thank you very much for this repo. I have also browsed the closed issues and found your responses very helpful (more so than Google).
I wanted to ask some specific questions about detecting objects in high (but also varying) counts, with varying sizes and potentially closely overlapping bounding boxes. I have around 200 images that contain logs. The number of logs per image may vary from perhaps 5 to 1000. This also means that the size of the logs varies – usually if there is a big stack of 1000 then they are small (zoomed-out) and if there are only a couple the photo is more zoomed in.
Thanks very much, Ilia