remove OnlineSupervisedTemplate #1362
Thanks @AlbinSou for the PR. I think at this point it makes sense to merge them to avoid additional work and confusion when implementing new strategies. Here are my comments:
I think it makes sense to remove
Minor: another strategy that uses
The reason for updating instead of resetting in online CL streams with task boundaries is that we don't want to reset the optimizer state after each sub-experience of the same experience, as this would be problematic for some online strategies when using the Adam optimizer.
I think it makes sense to remove
Since the only difference is in model adaptation and optimizer resetting, if merged correctly, it shouldn't affect anything.
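The Adam concern above can be illustrated with a minimal plain-PyTorch sketch (not Avalanche code): re-creating the optimizer between sub-experiences silently discards the per-parameter moment estimates that Adam accumulates.

```python
import torch
from torch import nn, optim

model = nn.Linear(4, 2)
opt = optim.Adam(model.parameters(), lr=0.1)

# One training step populates Adam's per-parameter state
# (the exp_avg / exp_avg_sq moment estimates).
model(torch.randn(8, 4)).sum().backward()
opt.step()
had_state = "exp_avg" in opt.state[model.weight]

# Re-creating the optimizer (a "reset") between sub-experiences
# throws those statistics away:
opt = optim.Adam(model.parameters(), lr=0.1)
state_lost = len(opt.state) == 0
```

Here `had_state` is `True` after the first step and `state_lost` is `True` after the re-creation, which is why resetting after every small online experience hurts Adam-based strategies.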
Yes, you are right that we should not reset the optimizer momentum after every experience. However, I think this is a more general question: maybe we also don't want to reset it for normal task shifts after all. I think we should add this as part of the strategy arguments (something like a reset-optimizer-at-task-shift flag). What do you think? Otherwise, in online learning without task boundaries, we cannot really know when we should reset the optimizer or not. And waiting for new parameters to come in is a bit weird, since it assumes that a distribution change will be accompanied by new parameters (a bias we have from class-incremental learning, where it is indeed the case). I will put back this update-optimizer line, but for me it should be clear when the optimizer is reset or not; a strategy argument would be better than an arbitrary decision.
In general I'm not against adding an argument to control the optimizer reset, but I have to point out one thing about the current behavior. We have to do (1) before each experience because all the strategies need to work with DynamicModules, but (2) is optional. The current solution behaves in the same way whether you have an expanding layer or not. Instead, keeping the state is tricky when a layer is expanded, because we don't know what's in the state. Notice that a user can always override the method to change this behavior easily.
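As a concrete illustration of why keeping state across an expansion is tricky (a plain-PyTorch sketch, not Avalanche's DynamicModule machinery): after a layer is expanded, the optimizer state is still keyed on the old tensor object, so the expanded parameter has no usable state even if nothing is reset.

```python
import torch
from torch import nn, optim

clf = nn.Linear(4, 2)
opt = optim.Adam(clf.parameters(), lr=0.1)
clf(torch.randn(1, 4)).sum().backward()
opt.step()  # state now exists for clf.weight and clf.bias

# Expand the classifier from 2 to 3 output units, copying the old
# rows (roughly what a dynamic module does on adaptation):
old_weight = clf.weight
new_weight = nn.Parameter(torch.zeros(3, 4))
new_weight.data[:2] = old_weight.data
clf.weight = new_weight

# The optimizer still holds state for the *old* tensor only, so the
# expanded parameter starts without momentum even if state is "kept":
stale = old_weight in opt.state and new_weight not in opt.state
```

This is the situation where "we don't know what's in the state": the stored moments have the old shape and belong to a tensor that is no longer part of the model.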
I'm not sure I understand this point, but in general a distribution shift does not always result in parameter expansion (e.g., domain-incremental).
Regarding optimizer resetting: I think we can treat all experience types in the same way and:

1. Not reset the optimizer after each experience. This can be done as part of the strategy in case it's needed.
2. Mark the strategy's model changes with an attribute assigned to dynamic modules after their
It's already possible to check for expanded layers by looking at their dimensions. One problem I see is that, in this way, models with incremental classifiers and static ones behave differently.
Yea, that's also possible. The idea was to avoid multiple parameter-wise comparisons.
There is always the option to reset the optimizer. At least to me, it makes more sense to always update and expand the optimizer state rather than resetting it, and let the user decide whether it should be reset or not.
@AntonioCarta I don't understand which one is the failing test; it just says unittest 3.7 "Job has failed" on GitHub. Apart from that, I did not understand the conclusion of the above discussion. Should we use `update_optimizer()` by default? For me it makes sense to reset the optimizer at task boundaries when they are given, since most people do that.
```diff
         else:
             avalanche_model_adaptation(model, self.experience)

         return model.to(self.device)

-    def make_optimizer(self):
+    def make_optimizer(self, reset_opt=True, reset_state=False):
```
Since these are global arguments of the train method, I would be more explicit: `reset_optimizer`, `reset_optimizer_state`.
also, add a docstring for the arguments
```diff
-        reset_optimizer(self.optimizer, self.model)
+        reset_opt = detect_param_change(
+            self.optimizer, list(self.model.named_parameters()),
+        ) or reset_opt
```
This is different from what we discussed. Only the dynamic modules should be reset unless `reset_opt` is set; the state of the others should be kept.
Ok, then I don't know how to reset only these, because depending on the optimizer the state can have a different structure. In other words, I know how to reset the whole state, but not one part of the state only. One way I see would be to have one parameter group per parameter, but this would be incompatible with user-defined parameter groups.
you can check the shape of parameters before/after adaptation. If they changed, you should reset the state for those parameters.
> In other words, I know how to reset the whole state, but not one part of the state only.

I don't know how to fix this, honestly. This is why we just used to reset everything.

> One way I see would be to have one parameter group per parameter but this would be incompatible with user-defined parameter groups.

I would not mess with parameter groups, since they may be defined by the user.
> you can check the shape of parameters before/after adaptation. If they changed, you should reset the state for those parameters.

This is already what I am doing: the id of the parameter changes whenever the shape changes. I could also check the shape, but I think that would be redundant. Though I should make sure that this assumption (id change = shape change) is always true, but I think it is. By the way, I might have found a way to update the state; I'll push something today.
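One way the partial reset discussed above could look (a hypothetical sketch keyed on parameter names, not the code in this PR): carry the optimizer state over for shape-stable parameters and start fresh for expanded ones.

```python
import torch
from torch import nn, optim

def remap_optimizer_state(optimizer, old_params, new_params):
    # Hypothetical helper: move state from old parameter objects to
    # the new ones after adaptation. Parameters whose shape changed
    # (expanded layers) are skipped and thus start with empty state.
    for name, new_p in new_params.items():
        old_p = old_params.get(name)
        if old_p is None or old_p.shape != new_p.shape:
            continue  # new or expanded parameter: fresh state
        if old_p in optimizer.state:
            optimizer.state[new_p] = optimizer.state.pop(old_p)

model = nn.Linear(4, 2)
opt = optim.Adam(model.parameters(), lr=0.1)
model(torch.randn(8, 4)).sum().backward()
opt.step()  # populates state for weight and bias

old_params = dict(model.named_parameters())
# Simulate adaptation: bias keeps its shape, weight is "expanded".
new_bias = nn.Parameter(model.bias.detach().clone())
new_weight = nn.Parameter(torch.zeros(3, 4))  # 2 -> 3 output units
new_params = {"weight": new_weight, "bias": new_bias}

remap_optimizer_state(opt, old_params, new_params)
kept = new_bias in opt.state         # unchanged shape: state kept
reset = new_weight not in opt.state  # expanded: state reset
```

A full implementation would also have to swap the parameter objects inside `optimizer.param_groups`; this sketch only shows the state bookkeeping.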
```python
    self.model.parameters(),
    reset_state=False)
else:
    if reset_state:
```
`reset_state` and `reset_opt` both reset the state. What's the difference?
`reset_state` resets only the state, without assigning new parameters to the optimizer, whereas `reset_opt` does both (assigns new parameters to the optimizer and resets the state).
> reset state is resetting only the state, not attributing new parameters to the optimizer

We never want to do this. I think you can remove one of the two options, since they are redundant.
I attempted to fix the checkpointing tests blind, since I can't manage to run them locally. One thing that I foresee could be a problem in the checkpointing case is that the loaded pointers (ids) of the restored model parameters are not necessarily the same as the ones I store in `optimized_param_id`; it is even possible that they differ between model parameters and optimized parameters. For checkpointing, I cannot just drop `optimized_params_id` and call `reset_optimizer` (the default behavior when `optimized_params_id` is `None`), since there might be a saved optimizer state that we want to keep for the remaining part of the optimization process. I don't know yet how to fix this problem; if you know exactly how checkpointing of the optimizer and model is done, I'll welcome your help. But I think the problem I described is happening, and this must be why we have an error in that case. I had commented out some code that I thought would handle this case, but it does not, and now I understand why: the ids in the loaded optimizer are not necessarily the same as the ids that I save in `optimized_params_id`.

In this last commit I also add options to manage freezing of parameters: so far I was adding all model parameters, now I check that `p.requires_grad` is `True` before adding a parameter to the optimizer. I also strongly link `optimized_params_id` to the functions that manipulate the optimizer; this attribute is now returned by those functions. I guess both of these functions (`update_optimizer` and `reset_optimizer`) could later be moved to the appropriate file (`dynamic_optimizers`), but for now I keep them here.
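A hedged sketch of one way around the id instability after checkpoint loading (a hypothetical helper, not this PR's code): key the bookkeeping by parameter *name* and rebuild the id map from `named_parameters()` after loading, since names survive a reload while raw `id()` pointers do not.

```python
import torch
from torch import nn

def build_param_id_map(model):
    # Rebuild the id bookkeeping from parameter names; names survive
    # a checkpoint reload while raw id() pointers do not.
    return {name: id(p) for name, p in model.named_parameters()}

saved = nn.Linear(2, 2)
restored = nn.Linear(2, 2)  # fresh objects, new ids
restored.load_state_dict(saved.state_dict())

map_saved = build_param_id_map(saved)
map_restored = build_param_id_map(restored)
same_names = set(map_saved) == set(map_restored)  # names match
ids_changed = map_saved != map_restored           # raw ids differ
```

Rebuilding the map this way after `load_state_dict` would let the strategy reconnect the saved optimizer state to the restored parameters without assuming the pointers are stable.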
You should add a test that checks that optimizers with state (e.g. Adam) are serialized properly. Also, we could try to save the
You should not do this. The user passes the parameters to the optimizer, and you don't know whether frozen parameters that are passed to the optimizer will stay frozen.
Ok, I will add this test. I have already made some tests with checkpointing, but did not check the state; from a visual debugging check, it looked like it was correctly serialized. For the `requires_grad` check, it's more that I'm thinking of a use case I encountered. I agree with what you said, but then I don't know how a user could decide not to optimize some parameters. I understand that if a parameter is in the optimizer with `requires_grad = False` it will not get a gradient, so it's fine; however, if a state exists for this parameter, it will still get updated by the momentum, so it won't be frozen. I wanted to avoid these cases.
This is probably a bug on the user side, but we can't do much about it. We should have the same behavior as PyTorch here.
But in PyTorch the user can decide to set `requires_grad = False` and remove the parameter from the optimizer. Here we are giving all the parameters to the optimizer all the time, leaving no option for the user to freeze any parameters. Maybe we should at least give the user the choice of which parameters to put into the optimizer.
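A minimal sketch of the standard PyTorch pattern being argued for here: pass only the trainable parameters to the optimizer, so frozen weights never acquire optimizer state and can never be moved by stale momentum.

```python
import torch
from torch import nn, optim

model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# Freeze the first layer the usual PyTorch way:
for p in model[0].parameters():
    p.requires_grad = False

# Hand the optimizer only the trainable parameters; frozen ones never
# get optimizer state, so no momentum can ever update them.
opt = optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
# Only the second layer's weight and bias are optimized:
n_optimized = sum(len(g["params"]) for g in opt.param_groups)
```

Here `n_optimized` is 2 (weight and bias of the second `Linear`), which is the freezing behavior a user would expect from plain PyTorch.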
@AntonioCarta It's ready to merge on my side
ok, thank you for all the work |
Avalanche already handles online streams well, as streams of small experiences that are already compatible with the normal strategies (SupervisedTemplate). I think it is confusing to have two types of templates, also because a CL agent is not supposed to know what is going to come (an online or non-online stream, maybe experiences of various sizes).
This is why I attempt to remove the online templates.
I removed the existing OnlineSupervisedTemplate and OnlineMeta...Template, and merged the OnlineObservation into the BatchObservation (in the case of an online stream, we have to handle `access_task_boundaries`).
I removed the Online ER ACE, which was the only method implementing both online and non-online behavior (aside from the plugins, which are agnostic to that).
There is one thing, however, that I do not understand in the old online observation: the distinction between make_optimizer and update_optimizer. I don't understand why it's necessary, and I don't know if I should keep this behavior or drop it.
I also kept the OnlineNaive strategy for now, just turning it into a SupervisedTemplate and rerouting train_passes to train_epochs.
Maybe keep it as a draft for now, until we are sure the currently implemented online strategies exhibit the same behavior when turned into their non-online versions.