
Added NewClassesCrossEntropy criterion and automatic criterion plugin #1514

Merged (6 commits, Oct 20, 2023)

Conversation

@AlbinSou (Collaborator) commented Oct 9, 2023

I added a criterion that is very important when considering exemplar-free scenarios: it only considers the learning of the new task. It is like a multitask learning criterion, except that this time we don't have task labels, so the model outputs the full logits; when not using exemplars, we want to update only the last head with the cross-entropy loss, in order to avoid task-recency bias.
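To make the idea concrete, here is a minimal sketch of such a criterion (the function name, the current_classes argument and the target remapping are only illustrative, not the exact code in this PR):

import torch
import torch.nn.functional as F

def new_classes_cross_entropy(logits, targets, current_classes):
    # current_classes: LongTensor with the class ids of the current experience.
    # Only those logit columns are kept, so only the newest head gets a gradient
    # and task-recency bias is avoided. Targets are assumed to contain only
    # classes of the current experience.
    masked_logits = logits[:, current_classes]
    # remap the original class ids to positions inside the masked logits
    remapped = torch.zeros_like(targets)
    for i, c in enumerate(current_classes.tolist()):
        remapped[targets == c] = i
    return F.cross_entropy(masked_logits, remapped)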

I personally needed this criterion to use LwF in the class-incremental learning scenario, but I think it is generally useful whenever functional regularization is used in CIL in the exemplar-free setting.

Another change I introduced is more of a personal choice, and I need your input on it. I noticed that many criteria were implemented as plugins because at some point they need to access some information about the training (e.g., save a model, register the current classes, etc.).

To reduce the burden on the user, I now automatically add criteria that are also plugins to the list of plugins, first checking that they are not already there so the same plugin is not added twice. Let me know if you think this is a good idea; otherwise I can get rid of it.
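For reference, the automatic registration boils down to something like this (a sketch only; the helper name and the exact base-class check are assumptions, not the code in this PR):

from avalanche.core import SupervisedPlugin

def add_criterion_as_plugin(criterion, plugins):
    # if the criterion also implements the plugin callbacks, register it,
    # unless the very same object is already in the plugin list
    if isinstance(criterion, SupervisedPlugin) and criterion not in plugins:
        plugins.append(criterion)
    return plugins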

Another thing I noticed is that the training criterion is also used at evaluation. I don't know whether it would be interesting to have both a training criterion and an evaluation criterion, or if that is too much. For instance, when doing exemplar-free learning you want to learn only on the new task, but you still want the full cross-entropy as the criterion when evaluating on the previous experiences. The current behavior is that it gives 0 loss on all previous experiences and some loss value only on the new experience.

@AntonioCarta (Collaborator) commented:
I agree with the first two points.

Can I suggest a different solution for masking? Unit masking is quite important for most losses in CL, so it should be a generic argument, like reduction='mean' in the PyTorch losses. We could have masking={'curr','old','seen','all'}, which defaults to 'all'.
It's ok if this PR only supports cross-entropy with 'curr' and 'all', but I would like something that we can easily adapt to other methods. Otherwise we will need a different object for each combination of loss and masking method.
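Something like this, roughly (a rough sketch only; the unit bookkeeping and attribute names are illustrative, and targets are assumed to already index into the masked columns):

import torch
import torch.nn.functional as F

class MaskedCrossEntropy:
    def __init__(self, masking="all"):
        assert masking in {"curr", "old", "seen", "all"}
        self.masking = masking
        # index tensors updated by the strategy callbacks (hypothetical)
        self.curr_units = None
        self.old_units = None

    def __call__(self, logits, targets):
        if self.masking == "all":
            # no masking: every unit stays active
            return F.cross_entropy(logits, targets)
        if self.masking == "curr":
            mask = self.curr_units
        elif self.masking == "old":
            mask = self.old_units
        else:  # "seen" = old + current units
            mask = torch.cat([self.old_units, self.curr_units])
        return F.cross_entropy(logits[:, mask], targets)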

> ... could be interesting to use a training criterion and evaluation criterion

Yes, I think this is a common pain point. I'm not sure the proposal really solves the issue. For example, here are some subtle issues:

  • the training loss is the criterion plus any additional regularization terms summed in by the plugins/strategy. Even if we had an explicit training_criterion, it may not represent everything we use for the training loss.
  • even if we have an explicit train_criterion and eval_criterion, we may want to evaluate multiple criteria during eval.

IMO this is an evaluation issue that should be fixed by simplifying the metrics/evaluation system. There is not much that we can do to improve on the strategy side.

@AlbinSou (Collaborator, Author) commented:
> Can I suggest a different solution for masking? Unit masking is quite important for most losses in CL, so it should be a generic argument [...] We could have masking={'curr','old','seen','all'}, which defaults to 'all'.

I don't really get what you mean by the masking. In classical cross-entropy there is no masking of the units, and most strategies use this criterion. The masking is usually handled in the IncrementalClassifier. Could you clarify this point? Or do you want me to create a more general criterion (rather than the new-classes one) that masks some units depending on the given parameter? In the case of this criterion I guess the masking type would be "old", but I don't see any case where you would want to mask all units, for instance.

@AntonioCarta (Collaborator) commented Oct 10, 2023

> I don't really get what you mean by the masking. [...] Could you clarify this point?

What you are doing here is basically this:

mask = new_units
masked_output = output[:, mask]
loss = cross_entropy(masked_output, target)

If you want to use all the units it's the same code:

mask = all_units  # there is no masking here. All units are active
masked_output = output[:, mask]
loss = cross_entropy(masked_output, target)

and this masking is independent of the loss. Let's say that you want to compute the KL divergence on the old units (useful for LwF):

mask = old_units
masked_output = output[:, mask]
loss = kl_div(masked_output, target)

IncrementalClassifier is different: there, you only remove unseen units, but when you compute the loss you may want to mask more units.

@coveralls commented Oct 10, 2023

Pull Request Test Coverage Report for Build 6469068428

  • 51 of 59 (86.44%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.03%) to 72.531%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| avalanche/training/templates/base_sgd.py | 1 | 2 | 50.0% |
| avalanche/training/regularization.py | 3 | 6 | 50.0% |
| avalanche/training/losses.py | 30 | 34 | 88.24% |

Totals (Coverage Status):
  • Change from base Build 6299610004: 0.03%
  • Covered Lines: 17612
  • Relevant Lines: 24282

💛 - Coveralls


@property
def current_mask(self):
    if self.mask == "all":

Collaborator:
this is incorrect. You are selecting only "seen" units, which excludes future units (e.g. DER regularizes them).

Collaborator:
I think it's better to have both "seen" and "all" options (not necessarily for all losses).
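A toy illustration of the suggested semantics (the class and attribute names below are made up for the example, not the code under review):

import torch

class _MaskDemo:
    def __init__(self, mask, n_total_units, seen_units, new_units):
        self.mask = mask
        self.n_total_units = n_total_units
        self.seen_units = seen_units
        self.new_units = new_units

    @property
    def current_mask(self):
        if self.mask == "all":
            # every unit, including future/unseen ones (e.g. DER regularizes them)
            return torch.arange(self.n_total_units)
        if self.mask == "seen":
            # only units observed in past and current experiences
            return torch.tensor(sorted(self.seen_units))
        if self.mask == "new":
            # only the units of the current experience
            return torch.tensor(sorted(self.new_units))
        raise ValueError(self.mask)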

@@ -9,12 +9,17 @@
 from avalanche.models import MultiTaskModule, avalanche_forward


-def cross_entropy_with_oh_targets(outputs, targets, eps=1e-5):
+def cross_entropy_with_oh_targets(outputs, targets, eps=1e-5, reduction="mean"):

Collaborator:
Can you use the PyTorch cross-entropy? It is more numerically stable than this implementation.

Collaborator (Author):
The problem is that I need to give one-hot targets; this makes the masking much easier and more natural to implement. The cross-entropy from PyTorch does not allow that, I believe.

@AlbinSou (Collaborator, Author) commented:
@AntonioCarta I'm not sure about changing it to MaskedKLDiv, because for now I only give the option to pass integer targets, not soft targets, so I think it's more like a cross-entropy. It could be made a masked KL divergence, but then the criterion would be more complex to use, since the user would have to create the one-hot targets themselves. What I can do, however, is still use F.kl_div under the hood for numerical stability, while keeping the name MaskedCrossEntropy and the constraint that you should give integer targets, just like for cross-entropy.

@AntonioCarta (Collaborator) commented Oct 11, 2023

It's fine, but in this case you should implement a numerically stable softmax: https://stackoverflow.com/questions/42599498/numerically-stable-softmax

You can't call it CE and use KLDiv because they compute different values.
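For reference, a numerically stable one-hot cross-entropy can be written with log_softmax, which applies the log-sum-exp trick internally, so no eps is needed (an illustration of the direction discussed here, not the merged code):

import torch.nn.functional as F

def cross_entropy_with_oh_targets(outputs, targets_oh, reduction="mean"):
    # outputs: raw logits; targets_oh: one-hot (or soft) target distribution
    log_probs = F.log_softmax(outputs, dim=1)
    loss = -(targets_oh * log_probs).sum(dim=1)
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    return loss  # reduction == "none"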

@AlbinSou AlbinSou merged commit 8803c85 into ContinualAI:master Oct 20, 2023
14 of 18 checks passed