What is the reasoning behind this statement: "In theory, as long as the pseudo label has a negative correlation with the bias model prediction, it is able to mine the hard examples."?
Sorry for the wrong derivation of the negative gradient for Sigmoid+BCE loss.
The correct negative gradient is
In theory, as long as the pseudo label has a negative correlation with the bias model prediction, it is able to mine the hard examples.
The wrong gradient in the paper is actually an approximation of $\nabla \mathcal{H}_i$. That's why it still works well.
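
To make the negative-correlation argument concrete, here is a minimal sketch, not the authors' code. It assumes the textbook Sigmoid+BCE setup with a scalar bias-model logit $z$ and a label $y \in \{0, 1\}$, for which the negative gradient with respect to the logit is $y - \sigma(z)$. All function and variable names are hypothetical. The sketch checks that formula against a finite-difference estimate, then shows that for a positive example the pseudo label grows as the bias model becomes more confidently wrong.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logit(z, y):
    """Binary cross-entropy of label y against the logit z (hypothetical helper)."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def negative_gradient(z, y):
    """Analytic negative gradient of Sigmoid+BCE w.r.t. the logit: y - sigmoid(z)."""
    return y - sigmoid(z)


def numeric_negative_gradient(z, y, eps=1e-6):
    """Central finite-difference estimate of the same quantity."""
    return -(bce_with_logit(z + eps, y) - bce_with_logit(z - eps, y)) / (2.0 * eps)


# Assumed toy data: one positive example (y = 1) scored by a bias model whose
# logit ranges from confidently correct (3.0) to confidently wrong (-3.0).
bias_logits = [3.0, 1.0, 0.0, -1.0, -3.0]
pseudo_labels = [negative_gradient(z, 1.0) for z in bias_logits]

for z, t in zip(bias_logits, pseudo_labels):
    print(f"bias logit {z:+.1f} -> pseudo label {t:.3f}")
```

The pseudo label is small where the bias model already predicts the answer and large where it fails, so a model fitted to it is pushed toward the examples the bias model gets wrong. This is the sense in which a negatively correlated pseudo label mines hard examples.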