
Delta and binary cross-entropy loss #1695

doobidoob opened this issue Oct 1, 2018 · 14 comments
Labels: Explanations - Explanations of the source code, algorithms or method of use

@doobidoob

@AlexeyAB Thanks for your support.
I have a few questions about yolo_layer.c

[1]. In yolo_layer.c, I understand that "delta" means the gradient.
However, as shown below, delta is expressed as a simple difference.
Isn't this the gradient of the MSE loss?

delta[index + 0*stride] = scale * (tx - x[index + 0*stride]);
delta[index + 1*stride] = scale * (ty - x[index + 1*stride]);
delta[index + 2*stride] = scale * (tw - x[index + 2*stride]);
delta[index + 3*stride] = scale * (th - x[index + 3*stride]);
l.delta[obj_index] = - l.output[obj_index];
l.delta[obj_index] = 1 - l.output[obj_index];
delta[index + stride*n] = (((n == class_id) ? 1 : 0) - output[index + stride*n]);

In the YOLOv3 paper, the author mentions the following:
"During training we use binary cross-entropy loss for the class predictions."
Why does the class delta above correspond to binary cross-entropy?

[2]. Is the following "l.cost" used for back-propagation, or is it simply a printed value?

*(l.cost) = pow(mag_array(l.delta, l.outputs * l.batch), 2);

[3]. I want to change YOLOv3 to output additional information, so I am trying to modify the loss function. In this case, should I fill the "delta" of yolo_layer.c with the gradient of the desired loss function, such as log-likelihood or binary cross-entropy?
Besides this, is there anything else to consider?
I'm sorry to ask a question not directly related to the code, but I'm a beginner and would like your advice.
Thank you very much.


AlexeyAB commented Oct 1, 2018

@doobidoob Hi,

In general, there are two types of classification:

  • multi-label classification - each bounding box (each anchor) can have several classes, and the model as a whole has >= 1 classes. Binary cross-entropy with logistic activation (sigmoid) is used. This is used in Yolo v3.

  • multi-class classification - each bounding box (each anchor) can have only one class, and the model as a whole has >= 1 classes. Categorical cross-entropy with softmax activation is used. This is used in Yolo v2.


  1. For independent outputs (x, y, w, h, t0, and multi-label classification as in Yolo v3) it is better to use binary cross-entropy, since each bounding box can predict several objects at a time: https://stats.stackexchange.com/a/288456/111998 So we use the logistic activation (sigmoid), as in the logistic-regression algorithm for binary classification: yes/no car, yes/no person, yes/no dog, ... So a single bounding box can be person (yes), car (yes), dog (no) - for example, if a single bounding box contains both a person and a car: https://www.reddit.com/r/learnmachinelearning/comments/88g8zf/difference_between_binary_cross_entropy_and/
  • for multi-label classification, binary cross-entropy is used (see the C sketch after this list):
    delta = (n == class_id) ? (1 - logistic_activation(x)) : (-logistic_activation(x));

  • for multi-class classification, categorical cross-entropy is used:
    delta = (n == class_id) ? (1 - softmax(x, x_array)) : (-softmax(x, x_array));


  2. This *(l.cost) = pow(mag_array(l.delta, l.outputs * l.batch), 2); is used only for printing the avg loss value. Since mag_array computes the L2 norm (the square root of the sum of squares), squaring it gives the sum of squared deltas - the summary loss over (x, y, w, h, t0, probabilities, ...) for all anchors and all final activations.

  3. If you want to change the loss function to get a different result during training, then you should change l.delta = ...
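
A minimal C sketch of the multi-label delta above (my own simplified version of the class-delta loop in yolo_layer.c; variable names follow the snippets quoted in this thread):

/* output[] already holds logistic_activation(x), i.e. probabilities in [0, 1]. */
/* Binary cross-entropy through the sigmoid gives delta = target - output,      */
/* where target is 1 for the ground-truth class and 0 for every other class.    */
void delta_multilabel_class(float *output, float *delta, int index,
                            int class_id, int classes, int stride)
{
    int n;
    for (n = 0; n < classes; ++n) {
        delta[index + stride*n] =
            ((n == class_id) ? 1.0f : 0.0f) - output[index + stride*n];
    }
}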

@doobidoob

@AlexeyAB Thanks for your reply!
I have a few questions about your explanation.

[1]. When using binary cross-entropy, why is "(1 - logistic_activation(x))" or "(-logistic_activation(x))" applied to the delta?
[2]. Why is 1 subtracted when n is class_id?
[3]. Why is it "(-logistic_activation(x))" when n is not class_id?
[4]. And why not use "logistic_gradient(x)"?
I know that "delta" means the gradient...
Am I misunderstanding?

I want to change the loss function, but it is not easy to apply it to the code...
Probably because I have not fully understood it.
[5]. I want to use a negative log-likelihood for an additional output besides the original YOLO outputs. What should I do with "delta"?

Thanks in advance for your advice.


AlexeyAB commented Jan 2, 2019

There is the binary cross-entropy loss = −(t*ln(y) + (1−t)*ln(1−y)) - we should minimize it.
Taking its derivative through the logistic activation, with y = sigmoid(x), gives d(loss)/dx = loss_derivative = y − t; the sigmoid's own derivative cancels out, which is also why logistic_gradient(x) does not appear explicitly here: https://peterroelants.github.io/posts/cross-entropy-logistic/

  • y - the predicted probability [0 - 1]
  • t - 1 if the class is correct, 0 otherwise

We do it here:

delta[index + stride*n] = ((n == class_id) ? 1 : 0) - output[index + stride*n];

The same:

  • t==1: i.e. if (detected_class == truth_class): delta = -loss_derivative = -(y-t) = 1-y
  • t==0: i.e. if (detected_class != truth_class): delta = -loss_derivative = -(y-t) = -y
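
As a numerical sanity check (my own sketch, not from the repo), one can verify with finite differences that the derivative of the binary cross-entropy taken through the sigmoid really is y − t:

#include <stdio.h>
#include <math.h>

static float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }

/* BCE loss as a function of the raw (pre-activation) output x */
static float bce(float x, float t)
{
    float y = sigmoid(x);
    return -(t * logf(y) + (1.0f - t) * logf(1.0f - y));
}

int main(void)
{
    float x = 0.7f, t = 1.0f, eps = 1e-3f;
    float numeric  = (bce(x + eps, t) - bce(x - eps, t)) / (2.0f * eps);
    float analytic = sigmoid(x) - t;   /* y - t */
    printf("numeric: %f  analytic: %f\n", numeric, analytic);
    return 0;
}

Both values come out to about −0.332 for these inputs; the delta applied in the code is the negative of this, 1 − y ≈ 0.332.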



Free-form reasoning - in general, in Yolo v3:

  • we use binary cross-entropy for multi-label classification: loss = −(y*ln(p) + (1−y)*ln(1−p)), and we should minimize it

    • p - the predicted probability [0 - 1]
    • y - 1 if the class is correct, 0 otherwise
  • so we should minimize the cost: loss = −ln(p) if (y==1), or loss = −ln(1−p) if (y==0), thus

    • if (y==1) then we want −ln(p) → 0, i.e. p → 1
    • if (y==0) then we want −ln(1−p) → 0, i.e. p → 0
  • we can achieve this by maximizing p if (y==1), or maximizing 1−p if (y==0),

    • if (y==1) then we should maximize logistic_activation(x + delta), so delta > 0
    • if (y==0) then we should minimize logistic_activation(x + delta), so delta < 0
  • we do it here: https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/yolo_layer.c#L120
    The same:

    • if (detected_class == truth_class): delta = 1−p > 0
    • if (detected_class != truth_class): delta = −p < 0

where p = logistic_activation(x) = output[index + stride*n]




Binary cross-entropy with logistic activation (sigmoid) is used for multi-label classification in Yolo v3, so each bounding box (each anchor) can have several classes. For example, one bounding box can be Animal, Cat, or Truck, Car. Or even Cat, Dog if they are close to each other.

So:

  1. The logistic activation (sigmoid) = 1./(1. + exp(-x)) is used because:

    For neural networks, our result states that the neuron activation function must be nonlinear - and nothing else. Whatever this nonlinearity is, the network of connections can be constructed, and the coefficients of the linear connections between the neurons can be adjusted, in such a way that the neural network computes any continuous function of its input signals with any given accuracy.

    • its derivative is very simple: in code, (1-x)*x, where x already holds the activated output (i.e. sigmoid'(z) = sigmoid(z)*(1 - sigmoid(z)))
  2. Binary classification is used - binary means that we look at each class separately and treat it as 2 classes (present or not present). So we use this formula: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
    loss = −(y*log(p) + (1−y)*log(1−p))

    • log - the natural log (ln)
    • y - binary indicator (0 or 1): whether class label c is the correct classification for observation o
    • p - predicted probability that observation o is of class c

So:

  • if (detected_class == truth_class): loss = −log(p)
  • if (detected_class != truth_class): loss = −log(1−p)

where p = logistic_activation(x); this is output[index + stride*n] in the yolo_layer.c source code.
And we should minimize the cost: loss = −log(p) or loss = −log(1−p).
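
A quick worked example with made-up numbers: if the true class has y = 1 and the network predicts p = 0.9, then loss = −ln(0.9) ≈ 0.105 and delta = 1 − p = 0.1, a small positive correction; if instead p = 0.1, then loss = −ln(0.1) ≈ 2.303 and delta = 0.9, a much stronger push in the same direction. The worse the prediction, the larger both the loss and the correction.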

As said in the MXNET doc: https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html

  • if (detected_class == truth_class) we should maximize log(p), i.e. maximize p
  • if (detected_class != truth_class) we should maximize log(1−p), i.e. maximize (1−p)


https://peterroelants.github.io/posts/cross-entropy-logistic/
https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_error_function_and_logistic_regression
https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html
https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html
https://en.wikipedia.org/wiki/Logistic_regression


AlexeyAB commented Jan 2, 2019

This is very similar to Yolo v2, with categorical cross-entropy and softmax activation for multi-class classification: https://peterroelants.github.io/posts/cross-entropy-softmax/

We do it here:

delta[index + n] = scale * (((n == class_id) ? 1 : 0) - output[index + n]);

The same:

  • t==1: i.e. if (detected_class == truth_class): delta = -loss_derivative = -(y-t) = 1-y
  • t==0: i.e. if (detected_class != truth_class): delta = -loss_derivative = -(y-t) = -y
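
A minimal C sketch of this Yolo v2-style delta (my own simplified version; the real region-layer code takes more parameters, and darknet's softmax likewise subtracts the max for numerical stability):

#include <math.h>

/* numerically stable softmax over n raw outputs */
void softmax(const float *input, int n, float *output)
{
    int i;
    float max = input[0], sum = 0.0f;
    for (i = 1; i < n; ++i) if (input[i] > max) max = input[i];
    for (i = 0; i < n; ++i) { output[i] = expf(input[i] - max); sum += output[i]; }
    for (i = 0; i < n; ++i) output[i] /= sum;
}

/* Categorical cross-entropy through softmax: delta = scale * (target - output), */
/* where output[] already holds the softmax probabilities.                       */
void delta_multiclass(const float *output, float *delta, int index,
                      int class_id, int classes, float scale)
{
    int n;
    for (n = 0; n < classes; ++n) {
        delta[index + n] = scale * (((n == class_id) ? 1.0f : 0.0f) - output[index + n]);
    }
}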


@i-chaochen

Hi @AlexeyAB

Thanks for your detailed explanation. I wonder why, for binary cross-entropy, no regularization is included in the loss function, although a smooth L1 loss is added for the bounding box.

@AlexeyAB

@i-chaochen

> although a smooth L1 loss is added for the bounding box.

What do you mean?


i-chaochen commented Nov 20, 2019

> > although a smooth L1 loss is added for the bounding box.
>
> What do you mean?

Sorry @AlexeyAB, maybe I didn't say it clearly.

What I mean is: is there any regularization, like an L1-norm or L2-norm, in the loss function for the bounding-box regression or the object classification? (The overall loss for YOLO is the sum of squares of the deltas.)

By smooth L1 loss I mean that, in the following links, SSD and Fast/Faster R-CNN are said to use it for box regression, while R-CNN and SPPnet use an L2 loss. So I wonder why regularization is not added to the classification.
https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html
https://github.com/rbgirshick/py-faster-rcnn/files/764206/SmoothL1Loss.1.pdf

As a side note, I am not sure whether YOLO uses any regularization loss for the bounding-box regression?

Hope it's clear to you. Thanks

@LucWuytens

question for @AlexeyAB
You explained that in Yolov3 one anchor can detect two classes. This is indeed happening in my dataset: two labels for the same box. This may be desired behavior in many cases, but in my application I would like to visualize and keep only the label with the highest probability (without using -thresh). After all, the second label is usually incorrect and results in FPs, reducing the overall metrics. Is there some 'setting' that can accomplish this?
thanks,
Luc


i-chaochen commented Jun 9, 2020

> You explained that in Yolov3 one anchor can detect two classes. [...] Is there some 'setting' that can accomplish this?

Interesting - could you upload one such picture, showing "two labels for the same box", please?


LucWuytens commented Jun 9, 2020

@i-chaochen
I can't really upload my pictures, but you can see it for yourself in the YouTube video that AlexeyAB also shared somewhere:
https://www.youtube.com/watch?v=69Ii3HjUiTM
You can see objects with one box and two labels, for example: car, taxi.
This is actually something I would not like to see in my output - I want only the highest-probability label for each anchor box. The lower-probability alternative labels also result in false positives, potentially impacting the mAP calculation? Hence my question to @AlexeyAB

@i-chaochen

> I can't really upload my pictures, but you can see it for yourself using the youtube video: https://www.youtube.com/watch?v=69Ii3HjUiTM [...] Hence my question to @AlexeyAB

I see your point. I don't think I have ever met this kind of problem; I always have one label per box. You might have a look at how to use the API for Yolo as DLL and SO libraries:
https://github.com/AlexeyAB/darknet#how-to-use-yolo-as-dll-and-so-libraries

If you use softmax in the last cost-function layer, there will be only one class label per box, since softmax-based decoding picks the maximum class.

The threshold is for NMS, which removes redundant, overlapping bounding boxes rather than filtering class labels.


AlexeyAB commented Jun 9, 2020

@LucWuytens You can implement this in your application code - reject the detection with the lower confidence_score if the bboxes are equal.
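
A rough post-processing sketch of this suggestion (my own code, with a hypothetical detection struct - not part of darknet's API): for each box, zero out every class probability except the strongest one, so only a single label survives drawing and evaluation.

#define MAX_CLASSES 80

typedef struct {                 /* hypothetical detection record */
    float x, y, w, h;            /* box coordinates */
    float prob[MAX_CLASSES];     /* per-class probabilities */
    int classes;                 /* number of classes actually used */
} detection_t;

/* Keep only the highest-probability label for one detection. */
void keep_best_label(detection_t *d)
{
    int n, best = 0;
    for (n = 1; n < d->classes; ++n)
        if (d->prob[n] > d->prob[best]) best = n;
    for (n = 0; n < d->classes; ++n)
        if (n != best) d->prob[n] = 0.0f;
}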


arnoldfychen commented Jan 13, 2021

@AlexeyAB @i-chaochen
I totally understand the trouble that @LucWuytens ran into, as I have met the same issue.

Given multiple objects of different classes in an image, suppose one of them is detected with two classes (e.g., person, machine) by YOLOv3, i.e. one object has two labels. If you set a higher confidence-score threshold to try to filter out the lower-scoring of the two labels, you may also filter out other objects detected in the same image, so this approach degrades recall. I still haven't found a proper way to resolve this contradiction.
