Hypergradient Calculation Different from the Paper? #2

Open
JohnnyC08 opened this issue Jan 19, 2022 · 1 comment

Comments

@JohnnyC08

On page 3 of the paper, Algorithm 1 states that if (x_i, y_i) is not in the current batch, we add (N/B)*g_t_i to the sum of the previous step's momentum gradient scaled by the momentum and the previous step's hypergradient scaled by the regularization coefficient.

However, when I look at the HydraHook class, I see that the instance gradient (N/B)*g_t_i is included if the index is part of the current batch and left out if it is not. This seems to be the opposite of what Algorithm 1 suggests, and I hope you can help me figure out what is going on.

Thanks!

@cyyever
Owner

cyyever commented Nov 27, 2023

@JohnnyC08 There is an error in the pseudocode. The condition should be "if (x_i, y_i) is in the current batch". I just noticed this issue.
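For anyone comparing the two readings, here is a minimal sketch of the corrected condition in plain Python. It covers only the per-sample term described in this thread: the previous momentum term scaled by the momentum, plus the previous hypergradient scaled by the regularization coefficient, plus (N/B)*g_t_i when the sample is in the current batch. The function name and its parameters are hypothetical and are not taken from HydraHook or the paper.

```python
from typing import List


def update_momentum_term(
    prev_momentum_term: List[float],
    prev_hypergradient: List[float],
    instance_gradient: List[float],
    in_batch: bool,
    momentum: float,
    weight_decay: float,
    dataset_size: int,
    batch_size: int,
) -> List[float]:
    """Hypothetical helper illustrating the corrected batch condition.

    All names here are illustrative assumptions, not the repository's API.
    """
    # Terms applied to every sample, whether or not it is in the batch:
    # momentum * previous momentum term + weight_decay * previous hypergradient.
    new_term = [
        momentum * m + weight_decay * h
        for m, h in zip(prev_momentum_term, prev_hypergradient)
    ]
    # Corrected condition: add the scaled instance gradient (N/B) * g_t_i
    # only when (x_i, y_i) IS in the current batch.
    if in_batch:
        scale = dataset_size / batch_size
        new_term = [v + scale * g for v, g in zip(new_term, instance_gradient)]
    return new_term
```

With this reading, a sample outside the batch only has its previous terms decayed and carried forward, which matches the behavior described for HydraHook above.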
