A question about the biased model's negative gradient of its loss #10
Following the expression of the CE or BCE loss, the loss tends to zero if the pseudo label is zero for all classes. This means that such a sample is ignored by the base model rather than used to further optimize it. To solve this bias over-estimation problem, we present a simple solution at https://ieeexplore.ieee.org/abstract/document/10027464/
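A minimal numeric sketch of this point, using illustrative values rather than the repository's code: under a soft CE loss $-\sum_c t_c \log p_c$, a pseudo-label vector $t$ that is zero for every class makes both the loss and its gradient vanish, so the sample contributes nothing to the base model's update.

```python
import torch

# Base-model logits for a single 3-class sample (illustrative values).
logits = torch.randn(3, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=0)

# Soft CE loss with a pseudo label that is zero for all classes.
t_zero = torch.zeros(3)
loss = -(t_zero * log_probs).sum()
loss.backward()

print(loss.item())   # 0.0
print(logits.grad)   # tensor([0., 0., 0.]): the sample no longer updates the model
```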
Yes, I understand this for the CE loss, but what about the BCE loss?
Under the BCE loss, a zero pseudo label can also encourage the model to lower the biased estimation, but it will introduce extra biases (bias over-estimation).
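For contrast, a sketch of the BCE case under the same illustrative setup: with an all-zero pseudo label, the BCE loss and its gradient do not vanish; instead every sigmoid output is pushed toward zero, including the true class, which is the extra bias mentioned above.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, requires_grad=True)
t_zero = torch.zeros(3)

# BCE with an all-zero target still yields a nonzero loss and gradient.
loss = F.binary_cross_entropy_with_logits(logits, t_zero)
loss.backward()

print(loss.item())   # > 0
print(logits.grad)   # sigmoid(logits) / 3: every class score is pushed down
```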
OK, thanks for your reply! I didn't understand how negative gradients worked at the time, which is why I read the GGE paper; now that I've figured it out, it's a really neat way of addressing the bias problem.
The negative gradient of the biased models' loss, $-\nabla \mathcal{L}(\mathcal{H}_m)$, is taken as the pseudo label for the base model. Since $-\nabla \mathcal{L}(\mathcal{H}_m) = y_i - \sigma(\mathcal{H}_m)$ can be relatively small when a sample is easy for the biased models to fit, how can the base model $f(X;\theta)$ pay more attention to samples that are hard for the biased classifiers $\mathcal{H}_m$ to solve?
In my view, if the negative gradient of the biased model tends to zero for every class $i$, the pseudo supervision for the base model becomes a zero vector, which forces the model to classify the sample into an empty class.
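A minimal sketch, assuming the BCE form above so that the pseudo label is $y - \sigma(\mathcal{H}_m)$ (illustrative logit values, hypothetical variable names): the pseudo label is near zero when the biased model already fits the sample, and large when it does not.

```python
import torch

y = torch.tensor([1.0, 0.0, 0.0])          # one-hot ground truth

H_easy = torch.tensor([6.0, -6.0, -6.0])   # biased model already fits this sample
H_hard = torch.tensor([-4.0, 3.0, 1.0])    # biased model fails on this sample

for name, H in [("easy", H_easy), ("hard", H_hard)]:
    pseudo = y - torch.sigmoid(H)          # -grad L(H_m) under BCE
    print(name, pseudo)

# easy -> pseudo ~ [ 0.0025, -0.0025, -0.0025]: near-zero supervision
# hard -> pseudo ~ [ 0.982,  -0.953,  -0.731 ]: strong supervision signal
```

This is where the concern above arises: on easy samples the supervision approaches the all-zero vector discussed in the replies.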