
YOLOv3-tiny in Darknet vs OpenCV DNN: large objects are missed #8146

Closed
stephanecharette opened this issue Oct 7, 2021 · 19 comments

@stephanecharette
Collaborator

I trained a YOLOv3-tiny network using several dash-cam datasets. I then run this network in the following 4 scenarios:

  • Darknet (CPU)
  • Darknet (CUDA)
  • OpenCV DNN (CPU)
  • OpenCV DNN (CUDA)

Other than the obvious timing differences, the results are nearly 100% identical, with one exception: when using OpenCV DNN -- both CPU and CUDA -- large objects are missed.

Here is an example frame grab. The two on the left are Darknet, the two on the right are OpenCV DNN:

image

Source: https://www.youtube.com/watch?v=fFYV2uPt-XI

You can see even small objects like the traffic lights are detected correctly. But the large vehicles in the foreground are missed. Anyone know why large objects might be missed when using OpenCV DNN?

Network was trained using these options:

image

@WongKinYiu
Collaborator

Maybe the same issue as opencv/opencv#17205?

@byte-6174

I think this has to do with how NMS is applied by default in OpenCV. For a darknet [yolo] layer, if no nms_threshold parameter is specified, OpenCV defaults it to 0.0. I am not sure it should default like this. I tried mimicking this behavior explicitly by setting the NMS threshold to 0.0 (indices = cv.dnn.NMSBoxes(boxes, confidences, conf, 0.0)) and I do see that large objects are eliminated.
See a demo here. I am varying the confidence threshold while holding the NMS threshold constant at 0.0, and as you can see the large airplane is never "detected".
See here: the nms threshold is defaulted to 0.0.

@stephanecharette are you setting nms_threshold in your cfg file? If not, can you try setting it (start with a low value, then vary it to see if the large cars get detected) and run your video again to confirm whether this is the source of the issue?
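The effect of an NMS threshold of 0.0 can be illustrated with a toy greedy NMS (a pure-Python sketch, not OpenCV's actual implementation): any overlap at all then counts as a duplicate, so a large box that overlaps a smaller, higher-scoring detection is discarded entirely.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep the highest-scoring box, then drop any box whose
    IoU with an already-kept box exceeds iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# A large truck box overlapping a smaller, higher-scoring car box:
boxes  = [(100, 100, 600, 500), (150, 150, 300, 300)]
scores = [0.80, 0.90]
print(nms(boxes, scores, 0.45))  # [1, 0] -- both survive
print(nms(boxes, scores, 0.0))   # [1]    -- any overlap kills the large box
```

With a typical threshold like 0.45 the two boxes coexist (their IoU is only about 0.11), but at 0.0 the large box is suppressed, which matches the "large objects missing" symptom.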

@stephanecharette
Collaborator Author

I tried to explicitly set nms_threshold in both of the [yolo] sections. Not knowing what value to use, I tried 0.2, 0.4, and 0.6. While it did make a slight difference, most "large" objects are still being missed when using OpenCV vs Darknet.

For example, see the car at the very left side of this frame:

image

@stephanecharette
Collaborator Author

@AlexeyAB Do you have any insight into why the same network would fail to detect large objects when running via OpenCV DNN vs Darknet?

@AlexeyAB
Owner

@stephanecharette Did you try to use cv.resize() to resize src_img to the network_size and then apply OpenCV-dnn?

About different resizing approaches: #232 (comment)

  • OpenCV-dnn by default uses letterbox resizing: the aspect ratio is kept and the part of the image that does not fit is discarded.
  • Darknet by default uses a plain resize (without keeping the aspect ratio); if [net] letter_box=1 is set in the cfg, or the -letter_box flag is used, it uses letterbox resizing, keeping the aspect ratio and padding the leftover area.
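The geometry behind those two approaches can be sketched in plain Python, using the 1280x720 image and 640x352 network sizes from this issue. This is only an illustration of the math, not Darknet's or OpenCV's actual code:

```python
def stretch(src_w, src_h, net_w, net_h):
    """Darknet default: resize without keeping the aspect ratio.
    Returns the independent x and y scale factors."""
    return net_w / src_w, net_h / src_h

def letterbox(src_w, src_h, net_w, net_h):
    """Letterbox with padding: scale uniformly to fit inside the network
    dimensions, then pad the leftover area (split evenly on both sides)."""
    scale = min(net_w / src_w, net_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (net_w - new_w) // 2, (net_h - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)

print(stretch(1280, 720, 640, 352))    # x scaled by 0.5, y by ~0.489
print(letterbox(1280, 720, 640, 352))  # uniform ~0.489, 7px pad left/right
```

The crop variant AlexeyAB describes for OpenCV-dnn instead scales to *fill* the network dimensions (max instead of min) and discards the overhang, but the key point is the same: box coordinates must be mapped back with whichever geometry was actually used.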

@stephanecharette
Collaborator Author

I used resize() to make sure the input image matches the exact network dimensions, stretching the image and ignoring the aspect ratio just like Darknet does.

@stephanecharette
Collaborator Author

See the 1st image at the top of this issue. The images are 1280x720, while the network measures 640x352. So the images have an aspect ratio of 1.78 and the network is 1.81. Not exactly the same, but as close as I could come. Also note the 1st image at the top of this issue shows the car in the very center of the image is "missed" by OpenCV. It isn't at the edge of the image.

@AlexeyAB
Owner

It would be great if you could check which of these models produce different results in Darknet and OpenCV: yolov4.weights, yolov4-csp-x-swish.weights, and yolov4-tiny.weights https://github.com/AlexeyAB/darknet#pre-trained-models

Initially, when I added YOLOv2 to OpenCV, I added tests to check that it produces identical results in both OpenCV and Darknet: opencv/opencv#9705

Now OpenCV>=4.5.4 supports Scaled-YOLOv4 ( opencv/opencv#18975 , opencv/opencv#20671 , opencv/opencv#20818 ) and all of these models: https://github.com/AlexeyAB/darknet#pre-trained-models
And as far as I can see, tests for YOLOv4 and Scaled-YOLOv4 are in place as well.

@YashasSamaga

@stephanecharette
Collaborator Author

No, unfortunately this is not the solution for me. I added thresh=0.01 to both of the [yolo] sections, but when I compare Darknet and OpenCV DNN output I can still see the OpenCV one misses a lot of objects. In this screenshot, Darknet is on the left and OpenCV 4.5.3 is on the right. Note the red vehicle in the middle, and the pedestrians:

image

@AxelWS

AxelWS commented Aug 30, 2022

I was trying to use YOLO in OpenCV 4.5.0 but ran into similar problems. Looking deeper into the darknet code, starting from yolo_v2_class.cpp, it looks like detection results are taken from all YOLO layers, not only from the last layer, and then put through a single NMS. That could explain some missing detections. The code in yolo_v2_class.cpp is also slightly different from the code darknet seems to use during training, e.g. an NMS threshold of 0.4 instead of 0.45, and it does not consider the nms_kind layer parameter. It seems to be difficult to evaluate a YOLO net exactly right in an application.

@stephanecharette
Collaborator Author

> I was trying to use YOLO in OpenCV 4.5.0 but ran into similar problems.

Glad to see I'm not the only person running into this problem. It has also been discussed several times on the Darknet/YOLO discord, but so far I don't know of anyone who knows how to solve the problem.

@AxelWS

AxelWS commented Sep 1, 2022

@stephanecharette If you mean the general problem (correct default parameters, which way of image resizing to use, which kind of NMS, etc.), I also don't know how to establish a reference implementation that can be used by darknet, DarkHelp, and everyone else.

If you just ask how to get all yolo layer outputs from OpenCV, this did work for me:

```cpp
std::vector<cv::String> outputLayerNames;
for (const cv::String& layerName : network->getLayerNames())
    if (layerName.rfind("yolo_", 0) == 0) // layerName starts with "yolo_"
        outputLayerNames.push_back(layerName);

std::vector<std::vector<cv::Mat>> outMats;
network->forward(outMats, outputLayerNames); // compute the output of every YOLO layer
```

And then collect the results:

```cpp
for (int outIndex = 0; outIndex < int(outputLayerNames.size()); outIndex++)
{
    auto& outMat = outMats[outIndex][0];
    [...]
}
```
Finally, do NMS over all collected detections. Preserve class probabilities and "objectness" if desired. The way I understand it, 1 - objectness is a kind of rejection-class probability. (By the way, getting correct probabilities out of darknet is tricky, because (a) probabilities are already multiplied by objectness, (b) anything lower than a thresh of 0.2 is set to 0.0, and (c) NMS in darknet sets some non-maximal probabilities to 0.0.)
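For anyone following along, the same idea can be sketched in Python: collect candidate boxes from *every* [yolo] output before a single NMS pass. The nested lists below stand in for the Mats returned by forward(); the row layout [cx, cy, w, h, objectness, class scores...] with normalized coordinates matches the usual darknet YOLO output, but treat the details as an assumption, not a reference implementation.

```python
def collect(out_mats, img_w, img_h, conf_threshold=0.25):
    """Gather boxes, confidences, and class ids from ALL YOLO layer outputs."""
    boxes, confidences, class_ids = [], [], []
    for out_mat in out_mats:            # one entry per [yolo] layer
        for row in out_mat:             # one row per candidate detection
            objectness = row[4]
            scores = row[5:]
            class_id = max(range(len(scores)), key=lambda i: scores[i])
            confidence = objectness * scores[class_id]
            if confidence < conf_threshold:
                continue
            # convert normalized center/size to pixel top-left/size
            cx, cy = row[0] * img_w, row[1] * img_h
            w, h = row[2] * img_w, row[3] * img_h
            boxes.append((int(cx - w / 2), int(cy - h / 2), int(w), int(h)))
            confidences.append(confidence)
            class_ids.append(class_id)
    return boxes, confidences, class_ids

# Two fake YOLO layer outputs: a large vehicle from the coarse layer and a
# small object from the fine layer -- both must be collected.
coarse = [[0.5, 0.5, 0.6, 0.5, 0.9, 0.8, 0.1]]
fine   = [[0.2, 0.3, 0.05, 0.04, 0.7, 0.1, 0.85]]
boxes, confs, ids = collect([coarse, fine], 1280, 720)
print(len(boxes))  # 2 -- reading only the last layer would have found 1
```

A single NMS pass (e.g. cv.dnn.NMSBoxes with a sensible nms_threshold such as 0.45) then runs over all of these at once, rather than per layer.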

@stephanecharette
Collaborator Author

No, that's not what I mean. See the top of this post for the problem. This has been discussed on the discord server many times. Everyone who tries to use YOLOv4-tiny when using OpenCV's DNN module is stuck with the exact same issue where larger objects are not detected.

@AxelWS

AxelWS commented Sep 1, 2022

Stéphane, I carefully read what this issue is about. Sorry for digressing into probabilities and other problems. My main point is: all OpenCV examples I find read output only from the last network layer, which is a YOLO layer. The tiny network I use has another YOLO layer in the middle. Darknet reads output from both YOLO layers. So when using OpenCV's DNN you have to do that too.

@stephanecharette
Collaborator Author

Oh... I'm using cv::dnn::Net::forward() to get the output mat. But you're saying that is just the very last YOLO layer, and I need to do the same thing with all the other YOLO layers? Hmmm. Maybe use the forward() that takes an array of mats?

Let me look into that. Would be nice to finally understand what is going on and have this fixed!

@AxelWS

AxelWS commented Sep 1, 2022

Did you read the code I posted above? It did improve results for me.

@stephanecharette
Collaborator Author

I didn't understand it. Am attempting to figure it out now. This is what I'm working with: https://github.com/stephanecharette/DarkHelp/blob/master/src-lib/DarkHelpNN.cpp#L1044

@stephanecharette
Collaborator Author

@AxelWS Thank you so much! Got it working. That was the key point, to take the output from all the yolo layers instead of just the last one. I'll update DarkHelp with these changes later tonight.
