Added NewClassesCrossEntropy criterion and automatic criterion plugin #1514
Conversation
I agree with the first two points. Can I suggest a different solution for masking? Unit masking is quite important for most losses in CL, so it should be a generic argument like the
Yes, I think this is a common pain point. I'm not sure the proposal really solves the issue. For example, here are some subtle issues:
IMO this is an evaluation issue that should be fixed by simplifying the metrics/evaluation system. There is not much that we can do to improve on the strategy side.
I don't really get what you mean by the masking. In classical cross entropy there is no masking of units, and most strategies use this criterion; the masking is usually handled in the IncrementalClassifier. Could you clarify this point? Or did you want me to create a more general criterion (rather than the new-classes one) that masks some units depending on a given parameter? For this criterion I guess the masking type would be "old", but I don't see any case where you would want to mask all units, for instance.
What you are doing here is basically this:
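A minimal sketch of that idea (the tensor names here are illustrative, not Avalanche API): set the logits of inactive units to -inf so they receive zero probability, then apply plain cross-entropy.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # model outputs over all 10 units
targets = torch.tensor([4, 5, 6, 7])     # labels must lie inside the active units
seen_units = torch.tensor([4, 5, 6, 7])  # illustrative: classes seen so far

# additive mask: inactive units get -inf, so softmax assigns them zero probability
mask = torch.full((10,), float("-inf"))
mask[seen_units] = 0.0
loss = F.cross_entropy(logits + mask, targets)
```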
If you want to use all the units it's the same code:
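With every unit active, the same code reduces to plain cross-entropy (same illustrative names as above):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)
targets = torch.tensor([4, 5, 6, 7])

# "all" mode: every unit is active, so the additive mask is all zeros
mask = torch.zeros(10)
loss_all = F.cross_entropy(logits + mask, targets)
```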
and this masking is independent from the loss. Let's say that you want to compute the kl-div on old units (useful for LwF):
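A sketch of that case (variable names illustrative): the masking step stays the same, only the loss changes, here a KL divergence between the frozen old model and the current model restricted to the old units.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
new_logits = torch.randn(4, 10)         # current model outputs
old_logits = torch.randn(4, 10)         # frozen previous-model outputs
old_units = torch.tensor([0, 1, 2, 3])  # illustrative: units of past tasks

# same masking idea, different loss: KL divergence on old units only
log_p_new = F.log_softmax(new_logits[:, old_units], dim=1)
p_old = F.softmax(old_logits[:, old_units], dim=1)
kl = F.kl_div(log_p_new, p_old, reduction="batchmean")
```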
IncrementalClassifier is different: there, you only remove unseen units, but when you compute the loss you may want to mask more units.
avalanche/training/losses.py (outdated diff):

```python
@property
def current_mask(self):
    if self.mask == "all":
```
This is incorrect: you are selecting only "seen" units, which excludes future units (e.g. DER regularizes them).
I think it's better to have both "seen" and "all" options (not necessarily for all losses).
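A hedged sketch of how both options could be exposed (the class and method names here are stand-ins, not the actual PR code): "seen" returns only the units observed so far, while "all" also keeps future units.

```python
import torch

class MaskedLossSketch:
    """Hypothetical sketch: 'seen' restricts the loss to units observed so far,
    'all' keeps every unit (so e.g. DER can still regularize future units)."""

    def __init__(self, mask="seen"):
        assert mask in ("seen", "all")
        self.mask = mask
        self.seen = torch.empty(0, dtype=torch.long)

    def adaptation(self, new_classes):
        # record the classes of the new experience
        self.seen = torch.unique(torch.cat([self.seen, new_classes]))

    def current_mask(self, n_units):
        if self.mask == "all":
            return torch.arange(n_units)  # include future units too
        return self.seen                  # only units seen so far
```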
avalanche/training/regularization.py (outdated diff):

```diff
@@ -9,12 +9,17 @@
 from avalanche.models import MultiTaskModule, avalanche_forward

-def cross_entropy_with_oh_targets(outputs, targets, eps=1e-5):
+def cross_entropy_with_oh_targets(outputs, targets, eps=1e-5, reduction="mean"):
```
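A sketch of what the added `reduction` argument could look like; the body below is a plausible reconstruction from the signature, not the repo's exact code.

```python
import torch

def cross_entropy_with_oh_targets(outputs, targets, eps=1e-5, reduction="mean"):
    # plausible sketch: cross-entropy with one-hot targets; eps guards the log
    probs = torch.softmax(outputs, dim=1)
    per_sample = -(targets * torch.log(probs + eps)).sum(dim=1)
    if reduction == "mean":
        return per_sample.mean()
    return per_sample  # reduction="none" keeps the per-sample losses
```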
Can you use PyTorch's cross-entropy? It is more numerically stable than this implementation.
The problem is that I need to pass one-hot targets; this makes the masking much easier and more natural to implement. I believe PyTorch's cross entropy does not allow that.
@AntonioCarta I'm not sure about changing it to MaskedKLDiv, because for now I only accept integer targets, not soft targets, so I think it's more like a cross entropy. It could be made a masked KL-div, but that would make the criterion more complex to use, since the user would have to build the one-hot targets themselves. What I can do, however, is use F.kl_div under the hood for numerical stability while keeping the name MaskedCrossEntropy and the constraint that targets must be integers, just like for cross entropy.
It's fine, but in this case you should implement a numerically stable softmax: https://stackoverflow.com/questions/42599498/numerically-stable-softmax You can't call it CE and use KLDiv, because they compute different values.
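One stable way to do this (a sketch, with an illustrative function name): compute log-probabilities via `logsumexp` instead of `log(softmax + eps)`, which subtracts the max internally and needs no epsilon.

```python
import torch

def stable_ce_with_oh_targets(outputs, targets_oh, reduction="mean"):
    # log-softmax via logsumexp: numerically stable, no eps fudge factor
    log_probs = outputs - outputs.logsumexp(dim=1, keepdim=True)
    per_sample = -(targets_oh * log_probs).sum(dim=1)
    return per_sample.mean() if reduction == "mean" else per_sample
```

For integer targets this matches PyTorch's built-in cross-entropy, while still accepting one-hot (or soft) targets for masking.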
I added a criterion that is very important in exemplar-free scenarios, where we only consider the learning of the new task. It's like a multi-task learning criterion, except that this time we don't have task labels, so the model outputs the full logits; when not using exemplars, we want to update only the last head with the cross-entropy loss in order to avoid task-recency bias.
I personally needed this criterion to use LwF in the class-incremental learning scenario, but I think it is useful in general whenever using functional regularization in exemplar-free CIL.
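The idea above can be sketched like this (the function name and signature are illustrative, not the PR's actual API); targets are assumed to belong to the current experience's classes.

```python
import torch
import torch.nn.functional as F

def new_classes_cross_entropy(logits, targets, current_classes):
    """Sketch: cross-entropy restricted to the current experience's classes,
    so old heads receive no gradient (exemplar-free, avoids recency bias)."""
    mask = torch.full((logits.shape[1],), float("-inf"))
    mask[current_classes] = 0.0
    return F.cross_entropy(logits + mask, targets)
```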
Another change I introduced is more of a personal choice, and I need your input on it. I noticed that many criteria were implemented as plugins because they need to access some information about the training at some point (e.g. save a model, register the current classes, etc.).
To reduce the burden on the user, I now automatically add criteria that are also plugins to the list of plugins, first checking that they are not already there so that the same plugin is not registered twice. Let me know if you think this is a good idea; otherwise I can get rid of it.
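The registration check could look roughly like this (class names are stand-ins, not Avalanche's actual classes):

```python
class BasePluginStub:
    """Stand-in for the strategy-plugin base class."""

class CriterionWithCallbacks(BasePluginStub):
    """A criterion that also needs training callbacks (e.g. to track classes)."""
    def __call__(self, outputs, targets):
        return 0.0

def auto_register(criterion, plugins):
    # append the criterion to the plugin list only if it really is a plugin
    # and is not already registered (avoids running the same plugin twice)
    if isinstance(criterion, BasePluginStub) and criterion not in plugins:
        plugins.append(criterion)
    return plugins
```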
Another thing I noticed is that the training criterion is also used at evaluation time. I don't know whether separate training and evaluation criteria would be interesting, or if that is too much. For instance, when doing exemplar-free learning you want to learn only on the new task, but you still want the full cross entropy as the criterion when evaluating on previous experiences. The current behavior is that it gives 0 loss on all previous experiences and a loss value only on the new experience.