Conversation
Thanks for this! It's hard to review the PR as the diff is very large, but it's mostly moving things around. Are there any changes to the existing schedulers or tests other than moving locations? We could also merge this before adding the momentum schedulers to break up the PRs.
@joelgrus should also take a look at this, as it touches the trainer (although minimally).
```diff
@@ -274,7 +274,7 @@ def test_trainer_can_resume_with_lr_scheduler(self):
                               num_epochs=4, serialization_dir=self.TEST_DIR)
         epoch, _ = new_trainer._restore_checkpoint()
         assert epoch == 2
-        assert new_trainer._learning_rate_scheduler.lr_scheduler.last_epoch == 1
+        assert new_trainer._learning_rate_scheduler.lr_scheduler.last_epoch == 2
```
Why does this value change?
```python
if self._initial_param_group_field not in group:
    raise KeyError(f"{self._initial_param_group_field} missing from param_groups[{i}]")
self.base_values = [group[self._initial_param_group_field] for group in self.optimizer.param_groups]
self.step(epoch=last_epoch)
```
This might account for the difference in the trainer test -- in the pytorch base `torch.optim.lr_scheduler._LRScheduler`, this line is `self.step(last_epoch + 1)`.
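For concreteness, here is a minimal sketch (toy classes, not the actual AllenNLP or PyTorch code) contrasting the two initialization conventions:

```python
from typing import Optional


class PyTorchStyleScheduler:
    """Mimics torch.optim.lr_scheduler._LRScheduler, whose __init__ ends with step(last_epoch + 1)."""

    def __init__(self, last_epoch: int = -1) -> None:
        self.last_epoch = last_epoch
        self.step(last_epoch + 1)  # advances the counter during construction

    def step(self, epoch: Optional[int] = None) -> None:
        self.last_epoch = self.last_epoch + 1 if epoch is None else epoch


class NewBaseStyleScheduler:
    """Mimics the new base class above, whose __init__ ends with step(epoch=last_epoch)."""

    def __init__(self, last_epoch: int = -1) -> None:
        self.last_epoch = last_epoch
        self.step(epoch=last_epoch)  # re-applies the schedule without advancing

    def step(self, epoch: Optional[int] = None) -> None:
        self.last_epoch = self.last_epoch + 1 if epoch is None else epoch


# Resuming from a checkpoint saved at epoch 2:
print(PyTorchStyleScheduler(last_epoch=2).last_epoch)  # 3
print(NewBaseStyleScheduler(last_epoch=2).last_epoch)  # 2
```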
@matt-peters yeah, I think the code/test was wrong before due to the way PyTorch LR schedulers are implemented. I left a note in the comments right above explaining it. I think GitHub is having some issues right now though, so my latest commits are not showing up.
The commits are showing up now, it was just slow. Sorry, I didn't see your comment about holding off on the momentum schedulers until now. But I still haven't added anything to the trainer class.
@matt-peters so far there have only been slight changes to the existing LR schedulers to account for a slightly different base class. Their behavior is unchanged (I split up the tests, but didn't change anything significant), other than fixing (I believe) how the PyTorch schedulers behave: previously they were updating the learning rate an epoch early.
Got it -- even though the pytorch base class prematurely updates the learning rate, does changing the behavior break backward compatibility with any of the existing schedulers? Or were they already written to work around the issue? My concern is a given scheduler behaving differently before and after this change.
I don't think this breaks backwards compatibility, insofar as the schedules from the PyTorch LR schedulers will be the same, just shifted by one epoch. And IMHO the new shifted schedule is actually what is expected. I'm not really a fan of how PyTorch's LR schedulers are implemented. Just my opinion though -- it's an easy fix to change it back if you don't agree.
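To make the one-epoch shift concrete, here is a quick check with a stock PyTorch scheduler (the counter behavior shown matches PyTorch at the time of this discussion; later releases changed the recommended `step()` ordering, but construction still advances `last_epoch` from -1 to 0):

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

param = torch.nn.Parameter(torch.zeros(1))
optimizer = SGD([param], lr=0.1)
scheduler = StepLR(optimizer, step_size=1, gamma=0.5)

# _LRScheduler.__init__ ends with self.step(last_epoch + 1), so merely
# constructing the scheduler advances last_epoch from -1 to 0.
print(scheduler.last_epoch)  # 0, before any training epoch has run

# Under the old convention of calling step() at the start of each epoch,
# the decay intended for epoch 1 is applied while epoch 0 is still being
# trained -- i.e. one epoch early.
scheduler.step()
print(optimizer.param_groups[0]["lr"])  # 0.05
```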
@matt-peters would it be easier if I broke this PR up? It might make sense to split this into 3 sequential PRs as follows:
I don't think it's necessary to break it up into pieces. But I'm worried about breaking backward compatibility, even if the pytorch behavior calling …
Sounds good, just did!
Thanks! Looks good, I'll merge this next week.
This is the follow-up PR to feature request #2334.

So far I've implemented a base `Scheduler` and `LearningRateScheduler` class, where `LearningRateScheduler` inherits from `Scheduler`. I also refactored all existing LR schedulers to inherit from `LearningRateScheduler`, and cleaned up the wrappers for PyTorch LR schedulers. Still to do: implement a momentum scheduler and integrate into the `Trainer` class.
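For reference, a rough sketch of the hierarchy described above (simplified; beyond the `Scheduler`/`LearningRateScheduler` split and the bookkeeping visible in the diff excerpt earlier, the details and the `ConstantLearningRateScheduler` example are assumptions rather than the exact AllenNLP API):

```python
from typing import List, Optional

import torch


class Scheduler:
    """Schedules an arbitrary per-parameter-group field, e.g. "lr" or "momentum"."""

    def __init__(self, optimizer: torch.optim.Optimizer,
                 param_group_field: str, last_epoch: int = -1) -> None:
        self.optimizer = optimizer
        self.param_group_field = param_group_field
        self._initial_param_group_field = f"initial_{param_group_field}"
        if last_epoch == -1:
            # Fresh run: stash each group's initial value of the field.
            for i, group in enumerate(optimizer.param_groups):
                if param_group_field not in group:
                    raise KeyError(f"{param_group_field} missing from param_groups[{i}]")
                group.setdefault(self._initial_param_group_field, group[param_group_field])
        else:
            # Resuming: the stashed initial values must already be present.
            for i, group in enumerate(optimizer.param_groups):
                if self._initial_param_group_field not in group:
                    raise KeyError(f"{self._initial_param_group_field} missing from param_groups[{i}]")
        self.base_values = [group[self._initial_param_group_field]
                            for group in optimizer.param_groups]
        self.last_epoch = last_epoch
        self.step(epoch=last_epoch)  # apply the schedule for last_epoch without advancing it

    def get_values(self) -> List[float]:
        raise NotImplementedError  # concrete schedulers compute the new values here

    def step(self, metric: Optional[float] = None, epoch: Optional[int] = None) -> None:
        self.last_epoch = self.last_epoch + 1 if epoch is None else epoch
        for group, value in zip(self.optimizer.param_groups, self.get_values()):
            group[self.param_group_field] = value


class LearningRateScheduler(Scheduler):
    """Specializes the base class to the "lr" field of each parameter group."""

    def __init__(self, optimizer: torch.optim.Optimizer, last_epoch: int = -1) -> None:
        super().__init__(optimizer, "lr", last_epoch)


class ConstantLearningRateScheduler(LearningRateScheduler):
    """A trivial concrete scheduler (hypothetical) that keeps the initial lr."""

    def get_values(self) -> List[float]:
        return self.base_values
```

A momentum scheduler would then be another thin subclass that passes `"momentum"` as the `param_group_field`, which is presumably why the base class schedules a generic field rather than hard-coding `"lr"`.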