A question about the biased models' negative gradient of their loss #10

Closed

Murphyzc opened this issue Jul 20, 2023 · 4 comments

Comments

@Murphyzc commented Jul 20, 2023

The negative gradient of the biased models' loss, $-\nabla \mathcal{L}(\mathcal{H}_m)$, is used as the pseudo label for the base model. Since $-\nabla \mathcal{L}(\mathcal{H}_m) = y_i - \sigma(\mathcal{H}_m)$ can be relatively small when a sample is easy for the biased models to fit, how can the base model $f(X;\theta)$ pay more attention to samples that are hard for the biased classifiers $\mathcal{H}_m$ to solve?

In my view, if the negative gradient of the biased model tends to zero for every class $i$, the pseudo supervision for the base model becomes a zero vector, which forces the model to classify the sample into an "empty" class.
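To make the concern concrete, here is a minimal numeric sketch (NumPy, made-up logits, assuming a per-class sigmoid output; not code from this repo) of the pseudo label $-\nabla \mathcal{L}(\mathcal{H}_m) = y - \sigma(\mathcal{H}_m)$ for an easy versus a hard sample:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = np.array([0.0, 1.0, 0.0])        # one-hot ground truth over 3 classes

# Easy sample: the biased models H_m already fit it well.
H_easy = np.array([-4.0, 4.0, -4.0])
print(y - sigmoid(H_easy))           # ~[-0.02, 0.02, -0.02] -> near-zero pseudo label

# Hard sample: the biased models are confidently wrong.
H_hard = np.array([4.0, -4.0, -4.0])
print(y - sigmoid(H_hard))           # ~[-0.98, 0.98, -0.02] -> large pseudo label
```

The easy sample produces an almost-zero pseudo label, which is exactly the case I am asking about.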

@Murphyzc reopened this Jul 20, 2023
@GeraldHan (Owner)

Following the expression of the CE or BCE loss, the loss tends to zero if the pseudo label is zero for all classes. This means that such a sample will simply be ignored by the base model rather than used to further optimize it.

To solve the bias over-estimation problem, we present a simple solution at https://ieeexplore.ieee.org/abstract/document/10027464/

@Murphyzc (Author)

Yes, I understand that for the CE loss $\mathcal{L}\big(f(X), -\nabla \mathcal{L}(\mathcal{H}_m)\big) = -\sum_j \big[-\nabla \mathcal{L}(\mathcal{H}_m)\big]_j \log\big(\sigma(f(X)_j)\big)$, the loss will also be zero as the negative gradient tends to zero.

But for the BCE loss $\mathcal{L}\big(f(X), -\nabla \mathcal{L}(\mathcal{H}_m)\big) = -\sum_j \Big( \big[-\nabla \mathcal{L}(\mathcal{H}_m)\big]_j \log\big(\sigma(f(X)_j)\big) + \big(1 - \big[-\nabla \mathcal{L}(\mathcal{H}_m)\big]_j\big) \log\big(1 - \sigma(f(X)_j)\big) \Big)$, the first term tends to zero while the second term does not.
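As a quick numeric check (again NumPy with made-up predictions, not code from this repo): when the pseudo label is the zero vector, the CE expression above vanishes, but the second BCE term stays positive and keeps pushing $\sigma(f(X))$ towards zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

g = np.zeros(3)                             # near-zero pseudo label -grad L(H_m) of an easy sample
p = sigmoid(np.array([1.0, -0.5, 0.3]))     # base-model predictions sigma(f(X))

ce  = -np.sum(g * np.log(p))                               # CE: exactly 0 -> the sample is ignored
bce = -np.sum(g * np.log(p) + (1 - g) * np.log(1 - p))     # BCE: second term stays positive
print(ce, bce)                                             # 0.0 vs. a positive value
```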

@GeraldHan (Owner)

Under the BCE loss, the remaining second term can still encourage the model to lower the biased estimation, but it introduces extra biases (bias over-estimation).
That is why the relaxed form (issue #5) works better than the actual gradient under the BCE loss.

@Murphyzc (Author)

OK, thanks for your reply! I didn't understand how the negative gradients worked at the time, which is why I read the GGE paper. Now that I've figured it out, it's a really neat way of addressing the bias problem.

