
remove OnlineSupervisedTemplate #1362

Merged
13 commits merged into ContinualAI:master on May 29, 2023

Conversation

@AlbinSou (Collaborator) commented May 5, 2023

Avalanche already handles online streams well, as streams of small experiences that are already compatible with the normal strategies (SupervisedTemplate). I think having two types of templates is confusing, also because a CL agent is not supposed to know what is going to come (an online stream or not, maybe experiences of various sizes).

This is why I attempt to remove the online templates.

  • I removed the existing OnlineSupervisedTemplate and OnlineMeta...Template, and merged OnlineObservation into BatchObservation (in the case of an online stream, we have to handle access_task_boundaries).

  • I removed the Online ER ACE, which was the only method implementing both online and non-online behavior (aside from the plugins, which are agnostic to that).

  • There is one thing, however, that I do not understand in the old online observation: the distinction between make_optimizer and update_optimizer. I don't understand why it's necessary, and I don't know whether I should keep this behavior or drop it.

  • I also kept the OnlineNaive strategy for now, just turning it into a SupervisedTemplate and rerouting train_passes to train_epochs.

Maybe keep it as a draft for now, until we are sure the currently implemented online strategies exhibit the same behavior when turned into their non-online versions.

@coveralls commented May 5, 2023

Pull Request Test Coverage Report for Build 5092658528

  • 164 of 172 (95.35%) changed or added relevant lines in 9 files are covered.
  • 4 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.5%) to 72.586%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| avalanche/training/templates/observation_type/batch_observation.py | 29 | 30 | 96.67% |
| avalanche/models/dynamic_optimizers.py | 46 | 49 | 93.88% |
| tests/test_models.py | 80 | 84 | 95.24% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| avalanche/benchmarks/scenarios/new_classes/nc_scenario.py | 1 | 89.02% |
| avalanche/models/dynamic_optimizers.py | 1 | 93.75% |
| avalanche/benchmarks/scenarios/online_scenario.py | 2 | 95.96% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 5001854992 | 0.5% |
| Covered Lines | 15828 |
| Relevant Lines | 21806 |

💛 - Coveralls

@HamedHemati (Collaborator) commented May 5, 2023

Thanks @AlbinSou for the PR. I think at this point it makes sense to merge them to avoid additional work and confusion when implementing new strategies. Here are my comments:

> I removed the existing OnlineSupervisedTemplate and OnlineMeta...Template, and merged OnlineObservation into BatchObservation (in the case of an online stream, we have to handle access_task_boundaries).

I think it makes sense to remove observation_type and add the check_model_and_optimizer, make_optimizer, and model_adaptation functions to BaseSGDTemplate. I don't think we'll have new observation types (?).

> I removed the Online ER ACE, which was the only method implementing both online and non-online behavior (aside from the plugins, which are agnostic to that).

Minor: another strategy that uses OnlineSupervisedMetaLearningTemplate is MER.

> There is one thing, however, that I do not understand in the old online observation: the distinction between make_optimizer and update_optimizer. I don't understand why it's necessary, and I don't know whether I should keep this behavior or drop it.

The reason for updating instead of resetting in online CL streams with task boundaries is that we don't want to reset the optimizer state after each sub-experience of the same experience, as this would be problematic for some online strategies when using the Adam optimizer.

> I also kept the OnlineNaive strategy for now, just turning it into a SupervisedTemplate and rerouting train_passes to train_epochs.

I think it makes sense to remove OnlineNaive and just use Naive. With the definition of experience in Avalanche, using the term epoch is fine, since the only difference between online and batch CL is the size of the experience.

> Maybe keep it as a draft for now, until we are sure the currently implemented online strategies exhibit the same behavior when turned into their non-online versions.

Since the only difference is in model adaptation and optimizer resetting, if merged correctly, it shouldn't affect anything.

@AlbinSou (Collaborator, Author) commented May 8, 2023

Yes, you are right that we should not reset the optimizer momentum after every experience. However, I think this is a more general question: maybe we also don't want to reset it for normal task shifts after all. I think we should add this as part of the strategy arguments (like a reset-optimizer-at-task-shift flag). What do you think? Otherwise, in online learning without task boundaries, we cannot really know when we should reset the optimizer. And I think waiting for new parameters to come in is a bit weird, since it assumes that a distribution change will be accompanied by new parameters (a bias we have from class-incremental learning, where it is indeed the case).

I will put back this update_optimizer line, but for me it should be clear when the optimizer is reset or not; maybe a strategy argument would be better than an arbitrary decision.

@AntonioCarta (Collaborator) commented:

In general I'm not against adding an argument to control the optimizer reset, but I have to point out one thing. The current reset_optimizer is doing two things:
1. adding new parameters;
2. resetting the optimizer's state (e.g. momentum).

We have to do (1) before each experience because all the strategies need to work with DynamicModules, but (2) is optional.

The current solution behaves in the same way whether you have an expanding layer or not. Instead, keeping the state is tricky when a layer is expanded, because we don't know what's in the state.

Notice that a user can always override the method to change this behavior easily.

> And I think waiting for new parameters to come in is a bit weird, since it assumes that a distribution change will be accompanied by new parameters (a bias we have from class-incremental learning, where it is indeed the case).

I'm not sure I understand this point, but in general a distribution shift does not always result in parameter expansion (e.g. domain-incremental).
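To make the two operations above concrete, here is a minimal sketch in plain PyTorch of what "resetting" versus "updating" an optimizer could look like. This is not Avalanche's implementation; the helper names are made up for illustration.

```python
import torch
from torch import nn


def reset_optimizer_sketch(optimizer, model):
    """(1) + (2): point the optimizer at the current parameters and drop all of
    its state (e.g. Adam/SGD momentum buffers)."""
    optimizer.state.clear()
    optimizer.param_groups[0]["params"] = list(model.parameters())


def update_optimizer_sketch(optimizer, old_params, new_params):
    """(1) only: swap expanded/re-created parameters into the existing groups,
    keeping the state of every parameter that did not change."""
    for old_p, new_p in zip(old_params, new_params):
        if old_p is new_p:
            continue  # unchanged parameter: its state is kept untouched
        for group in optimizer.param_groups:
            group["params"] = [new_p if p is old_p else p for p in group["params"]]
        # the old buffers no longer match the new shape, so drop only this entry
        optimizer.state.pop(old_p, None)


# Example: the classifier head is re-created with more outputs (standing in for
# a DynamicModule expansion) and the optimizer is updated without a full reset.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
old_params = list(model.parameters())
model = nn.Linear(4, 3)
update_optimizer_sketch(optimizer, old_params, list(model.parameters()))
```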

@HamedHemati (Collaborator) commented May 10, 2023

Regarding optimizer resetting:

I think we can treat all experience types in the same way and:

1- Do not reset the optimizer after each experience. This can be done as a part of the strategy in case it's needed.

2- Mark the strategy's model changes with an attribute assigned to dynamic modules after their adaptation. For example, for all DynamicModules we can setattr(self, 'is_expanded', True) if new parameters are introduced, otherwise set it to False by default. Then, after model_adaptation is over, we can check whether the is_expanded attribute has changed for any of those modules and run update_optimizer in case of any changes, and of course reset is_expanded to False afterwards. This should be very fast and resolve the confusion about task boundaries.
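A rough sketch of this idea (the is_expanded attribute is hypothetical and not part of the current DynamicModule API):

```python
from torch import nn


def any_module_expanded(model: nn.Module) -> bool:
    """Check, then clear, a hypothetical `is_expanded` flag that each dynamic
    module would set when its adaptation step introduces new parameters."""
    expanded = False
    for module in model.modules():
        if getattr(module, "is_expanded", False):
            expanded = True
            module.is_expanded = False  # reset the flag for the next experience
    return expanded


# Hypothetical use after model adaptation:
#     if any_module_expanded(strategy.model):
#         update_optimizer(...)  # only touch the optimizer when something grew
```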

@AntonioCarta (Collaborator) commented:

> For example, for all DynamicModules we can setattr(self, 'is_expanded', True) if new parameters are introduced, otherwise set it to False by default. Then, after model_adaptation is over, we can check whether the is_expanded attribute has changed for any of those modules and run update_optimizer in case of any changes.

It's already possible to check expanded layers by looking at their dimension.

One problem I see is that in this way models with incremental classifiers or static ones behave differently.

@HamedHemati (Collaborator) commented May 11, 2023

> It's already possible to check expanded layers by looking at their dimension.

Yea, that's also possible. The idea was to avoid multiple parameter-wise comparisons.

> One problem I see is that in this way models with incremental classifiers or static ones behave differently.

There is always the option to reset the optimizer. At least to me it makes more sense to always update and expand the optimizer state rather than resetting it, and let the user decide whether it should be reset or not.

@AlbinSou (Collaborator, Author) commented:

@AntonioCarta I don't understand which one is the failing test; it just says unittest 3.7 "Job has failed" on GitHub. Apart from that, I did not understand what the conclusion of the above discussion was. Should we use update_optimizer() by default? For me it makes sense to reset the optimizer at task boundaries when they are given, since most people do that.

Inline review comments on the make_optimizer diff:

        else:
            avalanche_model_adaptation(model, self.experience)

        return model.to(self.device)

-    def make_optimizer(self):
+    def make_optimizer(self, reset_opt=True, reset_state=False):
Collaborator:
Since these are global arguments of the train method, I would be more explicit: reset_optimizer, reset_optimizer_state.

Collaborator:
Also, add a docstring for the arguments.

-        reset_optimizer(self.optimizer, self.model)
+        reset_opt = detect_param_change(
+            self.optimizer, list(self.model.named_parameters()),
+        ) or reset_opt
Collaborator:
This is different from what we discussed: only the dynamic modules should be reset unless reset_opt is set. The state of the others should be kept.

@AlbinSou (Collaborator, Author) commented May 17, 2023:
OK, then I don't know how to reset only those, because depending on the optimizer the state could have a different structure. In other words, I know how to reset the whole state, but not only one part of it. One way I see would be to have one parameter group per parameter, but this would be incompatible with user-defined parameter groups.

Collaborator:
You can check the shape of the parameters before and after adaptation. If they changed, you should reset the state for those parameters.

Collaborator:

> In other words, I know how to reset the whole state, but not only one part of it.

I honestly don't know how to fix this. This is why we just used to reset everything.

> One way I see would be to have one parameter group per parameter, but this would be incompatible with user-defined parameter groups.

I would not mess with parameter groups, since they may be defined by the user.

@AlbinSou (Collaborator, Author) commented May 18, 2023:

> You can check the shape of the parameters before and after adaptation. If they changed, you should reset the state for those parameters.

This is already what I am doing: the id of the parameter changes whenever the shape changes. I could also check the shape, but I think that would be redundant. I should make sure that this (id change = shape change) always holds, but I think it does. By the way, I might have found a way to update the state; I'll push something today.
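A minimal sketch of this kind of change detection (hypothetical helper names, not the code in this PR): record the parameters before adaptation, then after adaptation clear the optimizer state only for the ones that were replaced.

```python
from torch import nn


def snapshot_params(model: nn.Module) -> dict:
    """Parameter name -> tensor object, recorded before model adaptation."""
    return dict(model.named_parameters())


def params_changed_by_adaptation(before: dict, model: nn.Module) -> list:
    """Old tensors that were replaced during adaptation; for dynamic modules a
    new tensor object implies the shape changed."""
    changed = []
    for name, new_p in model.named_parameters():
        old_p = before.get(name)
        if old_p is not None and old_p is not new_p:
            changed.append(old_p)
    return changed


# Hypothetical use around adaptation:
#     before = snapshot_params(strategy.model)
#     ... model adaptation ...
#     for old_p in params_changed_by_adaptation(before, strategy.model):
#         strategy.optimizer.state.pop(old_p, None)  # reset state only for these
```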

self.model.parameters(),
reset_state=False)
else:
if reset_state:
Collaborator:
reset_state and reset_opt both reset the state. What's the difference?

Collaborator (Author):
reset_state resets only the state, without assigning new parameters to the optimizer, whereas reset_opt does both (it assigns new parameters to the optimizer and resets the state).

Collaborator:

> reset_state resets only the state, without assigning new parameters to the optimizer.

We never want to do this. I think you can remove one of the two options, since they are redundant.

@AlbinSou AlbinSou marked this pull request as ready for review May 22, 2023 12:43
@AlbinSou (Collaborator, Author) commented:

I attempted to fix the checkpointing tests blindly, since I cannot manage to run them locally. One thing that I foresee could be a problem in the checkpointing case is that I'm not sure the loaded pointers (ids) of the loaded model parameters are necessarily the same as the ones I store in optimized_params_id. Also, it's possible that they are not even the same between the model parameters and the optimized parameters.

For checkpointing, I cannot remove optimized_params_id and call reset_optimizer (which is the default behavior when optimized_params_id is None), since there might be a saved optimizer state that we want to keep for the remaining part of the optimization process.

I don't know yet how to fix this problem; if you know exactly how checkpointing of the optimizer and model is done, I'd welcome your help. But I think the problem I described is happening, and this must be why we get an error in that case.

In one part I commented out some code that I thought would handle this case, but it does not, and now I understand why: the ids in the loaded optimizer are not necessarily the same as the ids that I save in optimized_params_id.

Anyway, in this last commit I also add options to manage the freezing of parameters: so far I was adding all model parameters; now I check that p.requires_grad is True before adding a parameter to the optimizer. I also link optimized_params_id more strongly to the functions that manipulate the optimizer; this attribute is now returned by those functions. I guess both of these functions (update_optimizer and reset_optimizer) could later be moved to the appropriate file (dynamic_optimizers), but for now I keep them here.

@AntonioCarta (Collaborator) commented:

You should add a test that checks that optimizers with state (e.g. Adam) are serialized properly.

Also, we could try to save the state_dict instead of the object. PyTorch converts the ids to positional indices that can be updated with load_state_dict (code).

> I check that p.requires_grad is True before adding a parameter to the optimizer

You should not do this. The user passes the parameters to the optimizer, and you don't know whether frozen parameters that are passed to the optimizer will stay frozen.
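For reference, the standard PyTorch state_dict-based checkpointing pattern mentioned above (where the optimizer state is keyed by positional indices rather than parameter ids) looks roughly like this; it is a generic sketch, not the Avalanche checkpointing code:

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... training ...

# Save the state_dicts rather than the live objects.
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    "checkpoint.pt",
)

# Later: rebuild the model and optimizer, then restore their states
# (Adam moments, step counts, ...) against the new parameter objects.
checkpoint = torch.load("checkpoint.pt")
model = nn.Linear(4, 2)
model.load_state_dict(checkpoint["model"])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer.load_state_dict(checkpoint["optimizer"])
```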

@AlbinSou (Collaborator, Author) commented:

OK, I will add this test; I have already run some tests with checkpointing but did not check the state. From visual debugging it looked like the states were correctly serialized. For the requires_grad check, it's more that I'm thinking of a use case that I encountered. I agree with what you said, but then I don't know how a user could decide not to optimize some parameters. I understand that if a parameter is in the optimizer with requires_grad = False it will not get a gradient, so that's fine; however, if a state exists for this parameter, it will still get updated by the momentum, so it won't be frozen. I wanted to avoid these cases.

@AntonioCarta (Collaborator) commented:

> But then I don't know how a user could decide not to optimize some parameters. I understand that if a parameter is in the optimizer with requires_grad = False it will not get a gradient, so that's fine; however, if a state exists for this parameter, it will still get updated by the momentum, so it won't be frozen. I wanted to avoid these cases.

This is probably a bug on the user side, but we can't do much about it. We should have the same behavior as PyTorch here.

@AlbinSou (Collaborator, Author) commented:

But in PyTorch the user can decide to set requires_grad = False and remove the parameter from the optimizer, whereas here we are giving all the parameters to the optimizer all the time, leaving no option for the user to freeze any parameters. Maybe we should at least give the user the choice of which parameters to put into the optimizer.
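On the user side, the usual PyTorch way to express this choice is to freeze the parameters and build the optimizer over only the trainable ones, so frozen parameters receive neither gradients nor momentum updates. A generic sketch, independent of Avalanche's templates:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer.
for p in model[0].parameters():
    p.requires_grad = False

# Hand only the trainable parameters to the optimizer, so the frozen ones get
# neither gradients nor momentum updates.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)
```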

@AlbinSou (Collaborator, Author) commented:

@AntonioCarta It's ready to merge on my side

@AntonioCarta (Collaborator) commented:

ok, thank you for all the work

@AntonioCarta AntonioCarta merged commit 9a61d5b into ContinualAI:master May 29, 2023
18 checks passed
@AlbinSou AlbinSou deleted the online_strategies branch June 26, 2023 15:00