Implement the extrapolation from the past algorithm (Popov, 1980). A good and modern source is Gidel et al. (2019).
This is an algorithm for computing parameter updates, similar to extragradient: it computes the update direction based on a "lookahead step". It is computationally cheaper than extragradient and enjoys similar convergence guarantees for some classes of problems (Gidel et al., 2019).
Motivation
Whereas extragradient requires two gradient computations per parameter update, extrapolation from the past stores the gradient from the previous extrapolation step and re-uses it in the current one, so only one new gradient is computed per update. This reduced gradient cost may be helpful in some settings.
However, storing the previous gradient still incurs a storage overhead compared to plain gradient descent-ascent.
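The update rule above can be sketched as follows. This is a minimal NumPy sketch, not a proposed implementation for this repository; it assumes the min-max problem is represented by its joint vector field `v(w)` (the gradient for the min player stacked with the negated gradient for the max player), and all function and parameter names are illustrative:

```python
import numpy as np

def extrapolation_from_the_past(v, w0, lr=0.1, steps=500):
    """Popov (1980) / Gidel et al. (2019), one vector-field evaluation per step:

        w_{t+1/2} = w_t - lr * g_{t-1/2}   # lookahead using the STORED gradient
        g_{t+1/2} = v(w_{t+1/2})           # the only new evaluation this step
        w_{t+1}   = w_t - lr * g_{t+1/2}
    """
    w = np.asarray(w0, dtype=float)
    g_prev = v(w)  # initialize the stored gradient
    for _ in range(steps):
        w_lookahead = w - lr * g_prev      # extrapolation step re-using g_prev
        g_prev = v(w_lookahead)            # stored for the next extrapolation
        w = w - lr * g_prev                # actual parameter update
    return w

# Illustrative usage: the bilinear game min_x max_y x*y, whose vector field is
# v(x, y) = (y, -x). Plain gradient descent-ascent diverges here, while
# extrapolation from the past converges to the saddle point (0, 0).
w_final = extrapolation_from_the_past(
    lambda w: np.array([w[1], -w[0]]), w0=[1.0, 1.0], lr=0.1, steps=500
)
```

Note that, compared with extragradient, the lookahead point is computed from the stored `g_prev` rather than from a fresh evaluation at `w`, which is exactly where the per-step saving comes from.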
References
G. Gidel, H. Berard, G. Vignoud, P. Vincent, S. Lacoste-Julien. A Variational Inequality Perspective on Generative Adversarial Networks. In ICLR, 2019.
L. D. Popov. A Modification of the Arrow-Hurwicz Method for Search of Saddle Points. Mathematical Notes of the Academy of Sciences of the USSR, 1980.