fix AdamW and improve decays docs #1612
LGTM. I think this is further evidence that scheduling should be distinct from the optimizer interface ;)
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
564ab29
@DhairyaLGandhi fixed the typos, need another approval
bors r+
Build succeeded.
Sorry for missing the review request (I was traveling). I just want to agree with @ToucheSir that this is more evidence that scheduling policies and optimizers should not be forced to share the same interface, even if it is technically possible.
There is great disorder under the sky with optimizers. Since the order generally matters (a lot!) when chaining optimizers, we have to be very careful in documenting how to use decays. In fact, we were giving completely wrong directions for `InvDecay` and `ExpDecay`. The correct ordering for standard use is to apply them last in the chain, after the base optimizer. Different orderings should typically be considered bugs in user code.
This PR fixes examples and tries to clarify documentation in this regard.
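
To illustrate why the ordering matters, here is a minimal, self-contained sketch that does not use Flux at all. The helpers `normalize_step`, `rescale`, and `chain` are hypothetical stand-ins: `normalize_step` mimics the roughly sign-like first step of an Adam-style rule, `rescale` stands in for a decay factor at one fixed step, and `chain` applies transformations left to right, as a chained optimiser is assumed to do. Putting the decay before the normalising step mostly cancels it, while putting it last scales the update as intended.

```julia
# Toy gradient transformations; each maps (x, Δ) to a new Δ.
# Illustrative stand-ins only, not Flux's implementations.

# Sign-like normalisation, mimicking the first step of an Adam-style rule.
normalize_step(η; ϵ = 1e-8) = (x, Δ) -> η .* Δ ./ (abs.(Δ) .+ ϵ)

# Constant rescaling, standing in for a decay factor at one fixed step.
rescale(s) = (x, Δ) -> s .* Δ

# Apply the transformations left to right.
function chain(transforms...)
    return function (x, Δ)
        for t in transforms
            Δ = t(x, Δ)
        end
        return Δ
    end
end

x = [1.0, -2.0]   # parameters (unused by these toy transformations)
g = [0.5, -4.0]   # gradient

no_decay    = chain(normalize_step(0.1))(x, g)
decay_last  = chain(normalize_step(0.1), rescale(0.5))(x, g)  # decay halves the step
decay_first = chain(rescale(0.5), normalize_step(0.1))(x, g)  # decay is normalised away
```

Here `decay_last` is half of `no_decay`, whereas `decay_first` is essentially identical to `no_decay`, which is the kind of silent bug a wrong ordering can cause.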
Also fixes AdamW, which was doing something totally wrong due to the aforementioned confusion.
(see https://towardsdatascience.com/why-adamw-matters-736223f31b5d for how AdamW works).
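
For reference, a minimal sketch of one decoupled weight decay step in the AdamW style, written without Flux. The names `AdamWState` and `adamw_step!` and the default hyperparameters are assumptions for illustration, not Flux's API or the code changed in this PR. The point it shows is that the decay term `λ * x` is added after the Adam normalisation and scaled by the learning rate, rather than being folded into the gradient beforehand as with L2 regularization.

```julia
# One AdamW-style update on a parameter vector.
# Hypothetical names; a sketch, not Flux's implementation.
mutable struct AdamWState
    m::Vector{Float64}  # first-moment estimate
    v::Vector{Float64}  # second-moment estimate
    t::Int              # step counter
end

AdamWState(n::Integer) = AdamWState(zeros(n), zeros(n), 0)

function adamw_step!(x, g, st::AdamWState;
                     η = 1e-3, β1 = 0.9, β2 = 0.999, λ = 1e-2, ϵ = 1e-8)
    st.t += 1
    @. st.m = β1 * st.m + (1 - β1) * g
    @. st.v = β2 * st.v + (1 - β2) * g^2
    # Bias-corrected moment estimates.
    mhat = st.m ./ (1 - β1^st.t)
    vhat = st.v ./ (1 - β2^st.t)
    # Weight decay is added after normalisation; both terms are scaled by η.
    @. x -= η * (mhat / (sqrt(vhat) + ϵ) + λ * x)
    return x
end

x = [1.0, -2.0]
st = AdamWState(length(x))
adamw_step!(x, [0.5, -4.0], st)
```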
Related in model-zoo: FluxML/model-zoo#303 and FluxML/model-zoo#304