Delta and binary cross-entropy loss #1695
@doobidoob Hi. In general, there are two types of classification: multi-class classification (Softmax activation with categorical cross-entropy, as in YOLOv2) and multi-label classification (Logistic activation with binary cross-entropy, as in YOLOv3).
@AlexeyAB Thanks for your reply! [1]. When using binary cross-entropy, why should "(1-logistic_activation(x))" or "(-logistic_activation(x))" be applied to the delta? I want to change the loss function, but it is not easy to apply it in the code... Thanks in advance for your advice.
In YOLOv3, Binary cross-entropy with Logistic activation (sigmoid) is used for multi-label classification, so each bounding box (each anchor) can have several classes; for example, one bounding box can carry two labels at the same time. We do it here: Line 143 in 5275787

For one class prediction with logit x, ground truth t in {0, 1}, and p = logistic_activation(x), the binary cross-entropy loss is

loss = -( t*log(p) + (1 - t)*log(1 - p) )

Its derivative with respect to the logit is d(loss)/dx = p - t, and Darknet stores the negative gradient in delta:

delta = t - p, i.e. (1 - logistic_activation(x)) when the class is present and (-logistic_activation(x)) when it is not.

As said in the MXNET doc: https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html
See also: https://peterroelants.github.io/posts/cross-entropy-logistic/

This is very similar to YOLOv2, which uses Categorical cross-entropy with Softmax activation for multi-class classification: https://peterroelants.github.io/posts/cross-entropy-softmax/ We do it here: Line 154 in 5275787
Hi @AlexeyAB Thanks for your detailed explanation. I wonder, for binary cross-entropy, why no regularization is included in the loss function, although the smooth L1 loss is added for the bounding box.
What do you mean?
Sorry @AlexeyAB, maybe I didn't say it clearly. What I mean is: is there any regularization, like an l1-norm or l2-norm, in the loss function for bounding-box regression or object classification? (The overall loss function for YOLO is the sum of squares of the deltas.) By smooth L1 loss, I mean that in the following links they mention that SSD and Fast/Faster R-CNN use it for box regression, while R-CNN and SPPNet used an L2 loss. So I wonder why regularization is not added for classification. As a side note, I am not sure whether YOLO uses any regularization loss for bounding-box regression. Hope it's clear to you. Thanks
Question for @AlexeyAB
I'm interested: could you please upload a picture of this kind, with "two labels for the same box"?
@i-chaochen |
I see your point. I don't think I've ever met this kind of problem; I always have one label per box. You might have a look at how to use the API for Yolo as DLL and SO libraries. If you're using softmax at the last cost-function layer, there will be only one class label, since softmax chooses the maximum one. The threshold is for NMS, which removes redundant and overlapping bounding boxes rather than class labels.
@LucWuytens You can implement this in your application code: reject the detection with the lower confidence_score if the bboxes are equal.
@AlexeyAB @i-chaochen Suppose an image contains multiple objects of different classes, and YOLOv3 detects one of them with two classes (e.g., person and machine), i.e. one object gets two labels. If I set a higher confidence-score threshold to filter out the label with the lower score, this may also filter out other correctly detected objects in the same image, so this approach degrades recall. I still haven't found a proper way to resolve this contradiction.
@AlexeyAB Thanks for your support.
I have a few questions about yolo_layer.c.
[1]. In yolo_layer.c, I know that "delta" means the gradient.
However, as shown below, delta is expressed as a difference value.
Isn't this the gradient of the MSE loss?
In the YOLOv3 paper, the author mentions the following:
"During training we use binary cross-entropy loss for the class predictions."
Why does the class loss function above correspond to binary cross-entropy?
[2]. Is the following "l.cost" used for back-propagation, or is it simply a printed value?
[3]. I want YOLOv3 to output additional information, so I am trying to modify the loss function. In this case, should I fill the "delta" in yolo_layer.c with the gradient of the desired loss function, such as log-likelihood or binary cross-entropy?
Besides this, is there anything else to consider?
I'm sorry to ask a question not directly related to the code, but I'm a beginner and would like to hear your advice.
Thank you very much.