
fix AdamW and improve decays docs #1612

Merged · 6 commits · Jun 10, 2021

Conversation

@CarloLucibello (Member) commented Jun 10, 2021

There is great disorder under the sky with optimizers. Since in chaining optimizers

opt = Optimizer(opt1, opt2)

the order generally matters (a lot!), we have to be very careful in documenting how to use decays. In fact, we were giving completely wrong indications for InvDecay and ExpDecay. The correct ordering for standard use is

Optimizer(WeightDecay(), ADAM())   # equivalent to L2 regularization
Optimizer(ADAM(), InvDecay())   # learning rate scheduling
Optimizer(ADAM(), ExpDecay())   # learning rate scheduling
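To see concretely why the order matters, here is a minimal sketch of the gradient-transformation pipeline that chained optimizers implement. The types and the `chain` helper below are hypothetical stand-ins, not Flux's actual implementation; the point is only that each stage receives the output of the previous one, so a decay placed before the step-size rule gets scaled by it, while one placed after does not.

```julia
# Hypothetical, simplified stand-ins for chained optimizer stages.
struct Descent; eta::Float64; end        # scales the step by the learning rate
struct WeightDecay; wd::Float64; end     # adds a decay term proportional to x

apply!(o::Descent, x, dx)     = o.eta .* dx
apply!(o::WeightDecay, x, dx) = dx .+ o.wd .* x

# Each stage transforms the gradient produced by the previous stage.
chain(os, x, dx) = foldl((g, o) -> apply!(o, x, g), os; init = dx)

x, dx = [1.0], [0.5]

# WeightDecay first: the decay term is scaled by eta (L2 regularization).
step1 = chain((WeightDecay(0.1), Descent(0.2)), x, dx)  # 0.2 * (0.5 + 0.1*1.0) = 0.12

# Descent first: the decay term is added at full strength (decoupled decay).
step2 = chain((Descent(0.2), WeightDecay(0.1)), x, dx)  # 0.2*0.5 + 0.1*1.0 = 0.2
```

The same reasoning applies to InvDecay and ExpDecay: since they rescale whatever step reaches them, they must come after the main optimizer to act as learning-rate schedules.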

Different orderings should typically be considered bugs in user code.

This PR fixes the examples and clarifies the documentation in this regard.

Also fixes AdamW, which was doing something totally wrong due to the aforementioned confusion.
(see https://towardsdatascience.com/why-adamw-matters-736223f31b5d for how AdamW works).
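For reference, a sketch of a single decoupled-weight-decay (AdamW-style) update, in the spirit of the article linked above. The function name and default hyperparameters here are illustrative assumptions, not Flux's API; the key point is that the decay term `wd .* x` is subtracted from the parameter directly rather than being folded into the gradient that feeds the Adam moment estimates.

```julia
# Illustrative one-step AdamW update (hypothetical helper, not Flux's API).
# m, v: first and second moment estimates; t: step counter (1-based).
function adamw_step(x, g, m, v, t; eta = 0.001, beta1 = 0.9, beta2 = 0.999,
                    eps = 1e-8, wd = 0.01)
    m = beta1 .* m .+ (1 - beta1) .* g          # first moment (mean of grads)
    v = beta2 .* v .+ (1 - beta2) .* g .^ 2     # second moment (uncentered var)
    mhat = m ./ (1 - beta1^t)                   # bias correction
    vhat = v ./ (1 - beta2^t)
    # Adam step plus a *decoupled* decay applied directly to the parameter:
    x = x .- eta .* mhat ./ (sqrt.(vhat) .+ eps) .- wd .* x
    return x, m, v
end

x, m, v = adamw_step([1.0], [1.0], [0.0], [0.0], 1)
```

In the coupled (L2) variant, `wd .* x` would instead be added to `g` before the moment updates, which rescales the decay by the adaptive denominator; that is exactly the behavior the decay ordering above distinguishes.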

Related in model-zoo: FluxML/model-zoo#303 and FluxML/model-zoo#304

@ToucheSir (Member) previously approved these changes and left a comment Jun 10, 2021:


LGTM. I think this is further evidence that scheduling should be distinct from the optimizer interface ;)

@DhairyaLGandhi previously approved these changes Jun 10, 2021

Review comments on src/optimise/optimisers.jl (outdated, resolved)
CarloLucibello and others added 2 commits June 10, 2021 19:30
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
@CarloLucibello (Member, Author) commented:

@DhairyaLGandhi fixed the typos, need another approval

@DhairyaLGandhi (Member) left a comment:

bors r+

@bors bors bot (Contributor) commented Jun 10, 2021:

Build succeeded

@bors bors bot merged commit 108cbc8 into master Jun 10, 2021
@darsnack (Member) commented:

Sorry for the missed review request (was traveling). Just want to agree with @ToucheSir that this is more evidence that scheduling policies and optimizers should not be forced to share the same interface even if it is technically possible.
