A question about the biased models' negative gradient of their loss #10

Closed

Murphyzc opened this issue Jul 20, 2023 · 4 comments

Comments

@Murphyzc commented Jul 20, 2023

The negative gradient of the biased models' loss, $-\nabla \mathcal{L}(\mathcal{H}_m)$, is used as the pseudo label for the base model. Since $-\nabla \mathcal{L}(\mathcal{H}_m) = y_i - \sigma(\mathcal{H}_m)$ can be relatively small when a sample is easy for the biased models to fit, how can the base model $f(X;\theta)$ pay more attention to samples that are hard for the biased classifiers $\mathcal{H}_m$ to solve?

In my view, if the negative gradient of the biased model tends to zero for every class $i$, the pseudo supervision for the base model becomes a zero vector, which forces the model to classify the sample into an "empty" class.
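To make the concern concrete, here is a minimal numeric sketch (NumPy, made-up logits, assuming a per-class sigmoid output; not code from this repo) of the pseudo label $-\nabla \mathcal{L}(\mathcal{H}_m) = y - \sigma(\mathcal{H}_m)$ for an easy versus a hard sample:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = np.array([0.0, 1.0, 0.0])        # one-hot ground truth over 3 classes

# Easy sample: the biased models H_m already fit it well.
H_easy = np.array([-4.0, 4.0, -4.0])
print(y - sigmoid(H_easy))           # ~[-0.02, 0.02, -0.02] -> near-zero pseudo label

# Hard sample: the biased models are confidently wrong.
H_hard = np.array([4.0, -4.0, -4.0])
print(y - sigmoid(H_hard))           # ~[-0.98, 0.98, -0.02] -> large pseudo label
```

The easy sample produces an almost-zero pseudo label, which is exactly the case I am asking about.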

@Murphyzc reopened this Jul 20, 2023
@GeraldHan (Owner)

Following the expression of the CE or BCE loss, the loss tends to zero if the pseudo label is zero for all classes. This means that such a sample will simply be ignored by the base model rather than used to further optimize it.

To solve the bias over-estimation problem, we present a simple solution at https://ieeexplore.ieee.org/abstract/document/10027464/

@Murphyzc (Author)

Yes, I understand that for the CE loss $\mathcal{L}\big(f(X), -\nabla \mathcal{L}(\mathcal{H}_m)\big) = -\sum_j \big[-\nabla \mathcal{L}(\mathcal{H}_m)\big]_j \log\big(\sigma(f(X)_j)\big)$, the loss will also be zero as the negative gradient tends to zero.

But for the BCE loss $\mathcal{L}\big(f(X), -\nabla \mathcal{L}(\mathcal{H}_m)\big) = -\sum_j \Big( \big[-\nabla \mathcal{L}(\mathcal{H}_m)\big]_j \log\big(\sigma(f(X)_j)\big) + \big(1 - \big[-\nabla \mathcal{L}(\mathcal{H}_m)\big]_j\big) \log\big(1 - \sigma(f(X)_j)\big) \Big)$, the first term tends to zero while the second term does not.
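As a quick numeric check (again NumPy with made-up predictions, not code from this repo): when the pseudo label is the zero vector, the CE expression above vanishes, but the second BCE term stays positive and keeps pushing $\sigma(f(X))$ towards zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

g = np.zeros(3)                             # near-zero pseudo label -grad L(H_m) of an easy sample
p = sigmoid(np.array([1.0, -0.5, 0.3]))     # base-model predictions sigma(f(X))

ce  = -np.sum(g * np.log(p))                               # CE: exactly 0 -> the sample is ignored
bce = -np.sum(g * np.log(p) + (1 - g) * np.log(1 - p))     # BCE: second term stays positive
print(ce, bce)                                             # 0.0 vs. a positive value
```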

@GeraldHan (Owner)

Under the BCE loss, the remaining second term can still encourage the model to lower the biased estimation, but it introduces extra biases (bias over-estimation).
That is why the relaxed form (issue #5) works better than the actual gradient under the BCE loss.

@Murphyzc (Author)

OK, thanks for your reply! I didn't understand how the negative gradients worked at the time, which is why I read the GGE paper. Now that I've figured it out, it's a really neat way of addressing the bias problem.

