fix AdamW and improve decays docs #1612

CarloLucibello · 2021-06-10T06:40:03Z

There is great disorder under the sky with optimizers. When chaining optimizers

opt = Optimiser(opt1, opt2)

the order generally matters (a lot!), so we have to be very careful in documenting how to use decays. In fact, we were giving completely wrong directions for InvDecay and ExpDecay. The correct ordering for standard use is

Optimiser(WeightDecay(), ADAM())   # equivalent to L2 regularization
Optimiser(ADAM(), InvDecay())   # learning rate scheduling
Optimiser(ADAM(), ExpDecay())   # learning rate scheduling

Other orderings should typically be considered bugs in user code.
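To make the ordering point concrete, here is a minimal sketch in plain Julia that does not use Flux at all. The helpers `weight_decay`, `sign_step`, `rescale`, and `chain` are hypothetical stand-ins invented for illustration, and `sign_step` is only a crude proxy for an adaptive rule such as ADAM. The sketch assumes that a chain feeds the output of each rule into the next, left to right.

```julia
# Toy stand-ins for gradient transformations; these are NOT Flux's types.
# Each rule maps (parameter x, incoming gradient Δ) to a transformed gradient.

# Adds wd * x to the gradient, as L2 regularization would.
weight_decay(wd) = (x, Δ) -> Δ + wd * x

# Sign-based step: a crude proxy for an adaptive rule such as ADAM,
# which normalizes away the magnitude of the incoming gradient.
sign_step(η) = (x, Δ) -> η * sign(Δ)

# Multiplies whatever it receives by a fixed factor, like a decayed learning rate.
rescale(s) = (x, Δ) -> s * Δ

# Apply the rules left to right, feeding each output into the next rule.
chain(rules...) = (x, Δ) -> foldl((acc, rule) -> rule(x, acc), rules; init = Δ)

x, Δ = 10.0, -0.5

# Weight decay first: the penalty enters the gradient seen by the adaptive rule.
l2_then_adaptive = chain(weight_decay(0.1), sign_step(0.01))(x, Δ)

# Weight decay last: the penalty bypasses the adaptive rule and dominates the update.
adaptive_then_l2 = chain(sign_step(0.01), weight_decay(0.1))(x, Δ)

# Scaling after the adaptive rule shrinks the step, as a schedule should.
adaptive_then_scale = chain(sign_step(0.01), rescale(0.5))(x, Δ)

# Scaling before the adaptive rule is normalized away and has no effect.
scale_then_adaptive = chain(rescale(0.5), sign_step(0.01))(x, Δ)

println((l2_then_adaptive, adaptive_then_l2, adaptive_then_scale, scale_then_adaptive))
```

In this toy setting the two weight-decay orderings produce updates that differ by roughly two orders of magnitude, and the decay factor placed before the adaptive rule is silently ignored, which is the kind of mistake the documentation fix is meant to prevent.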

This PR fixes examples and tries to clarify documentation in this regard.

Also fixes AdamW, which was doing something totally wrong due to the aforementioned confusion.
(see https://towardsdatascience.com/why-adamw-matters-736223f31b5d for how AdamW works).
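As a rough illustration of the distinction the linked article describes, the following plain-Julia sketch compares L2 regularization (decay added to the gradient before the adaptive step) with decoupled weight decay (decay applied to the parameter directly). The helper `adam_step` is hypothetical, written from the standard Adam update equations for a single first step on a scalar; it is not Flux's implementation, and scaling the decay by the learning rate is an assumption of this sketch.

```julia
# One Adam-style step on a scalar, starting from zero moment estimates (t = 1).
# Hypothetical helper for illustration only, not Flux's ADAM.
function adam_step(g; η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    m = (1 - β1) * g              # first-moment estimate
    v = (1 - β2) * g^2            # second-moment estimate
    m_hat = m / (1 - β1)          # bias correction at t = 1
    v_hat = v / (1 - β2)
    return η * m_hat / (sqrt(v_hat) + ϵ)
end

x, g = 2.0, 0.3        # parameter and loss gradient
η, wd = 0.001, 0.1     # learning rate and weight-decay coefficient

# L2 regularization: the decay term is folded into the gradient,
# so it passes through Adam's normalization.
x_l2 = x - adam_step(g + wd * x; η = η)

# Decoupled weight decay: Adam sees only the loss gradient,
# and the decay is applied to the parameter separately.
x_decoupled = x - (adam_step(g; η = η) + η * wd * x)

println((x_l2, x_decoupled))
```

With L2 regularization the normalized step has magnitude close to `η` no matter how large the penalty is, whereas the decoupled variant shrinks the parameter in proportion to its size.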

Related in model-zoo: FluxML/model-zoo#303 and FluxML/model-zoo#304

ToucheSir

LGTM. I think this is further evidence that scheduling should be distinct from the optimizer interface ;)

src/optimise/optimisers.jl

Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>

CarloLucibello · 2021-06-10T17:30:58Z

@DhairyaLGandhi I fixed the typos; this needs another approval.

DhairyaLGandhi

bors r+

bors · 2021-06-10T18:47:53Z

Build succeeded:

buildkite/flux-dot-jl

darsnack · 2021-06-11T00:05:10Z

Sorry for the missed review request (I was traveling). I just want to agree with @ToucheSir that this is more evidence that scheduling policies and optimizers should not be forced to share the same interface, even if it is technically possible.

CarloLucibello added 3 commits June 10, 2021 08:08

fix AdamW; improve WeightDecay docstring

e1a4bbc

improve InvDecays docstring

5f314ac

improve ExpDecays docstring

9f1966f

CarloLucibello requested a review from darsnack June 10, 2021 06:42

CarloLucibello force-pushed the cl/opt branch from b7450be to 9f1966f Compare June 10, 2021 07:44

ToucheSir previously approved these changes Jun 10, 2021


DhairyaLGandhi previously approved these changes Jun 10, 2021


src/optimise/optimisers.jl (3 outdated review threads, resolved)

Update src/optimise/optimisers.jl

564ab29

Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>

CarloLucibello dismissed stale reviews from DhairyaLGandhi and ToucheSir via 564ab29 June 10, 2021 17:30

CarloLucibello and others added 2 commits June 10, 2021 19:30

Update src/optimise/optimisers.jl

b9c94f5

Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>

Update src/optimise/optimisers.jl

380ca76

Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>

DhairyaLGandhi approved these changes Jun 10, 2021


bors bot merged commit 108cbc8 into master Jun 10, 2021

darsnack mentioned this pull request Jun 23, 2021

Unclear wording in "Composing Optimizers" section of docs #1627

Open

ToucheSir mentioned this pull request Jan 28, 2022

Port over rule changes from Flux FluxML/Optimisers.jl#38

Closed

3 tasks

mcabbott mentioned this pull request Jan 30, 2022

Fix ADAMW, and track the loss FluxML/Optimisers.jl#46

Closed

CarloLucibello mentioned this pull request Feb 10, 2022

fix adamw #1868

Merged

CarloLucibello deleted the cl/opt branch April 7, 2022 07:01

ToucheSir mentioned this pull request Sep 12, 2023

write >> as infix notation for OptimiserChain FluxML/Optimisers.jl#139

Open

CarloLucibello mentioned this pull request May 3, 2024

Implementation of AdamW differs from PyTorch #2433

Open
