FROG Optimizer

FROG (Fisher ROw-wise preconditioninG) is a second-order optimizer that approximates natural-gradient updates using row-wise Fisher preconditioning and batched Conjugate Gradient (CG) solves. It is inspired by K-FAC, but avoids Kronecker factorization and instead approximates the Fisher matrix as row-wise block-diagonal, using a small subset of samples to obtain stable, scale-free updates with low computational overhead.

A detailed technical description of the method is provided in technical_overview.pdf.

Motivation

FROG is motivated by the following observations:

Scale-invariance is a desirable optimizer property, as in Adam or RMSProp.
Classical Newton and natural-gradient methods are not scale-invariant and are sensitive to gradient magnitude.
At later stages of training, decoupling input and output factors (X and Y), as done in K-FAC, often degrades performance.
The empirical Fisher term ( Y^\top Y ) is typically strongly diagonally dominant, suggesting that cross-row correlations are weak and that row-wise preconditioning is a reasonable approximation.
Exact empirical Fisher construction is prohibitively expensive; therefore subsampling and iterative solvers are required in practice.

Method Overview

FROG approximates normalized natural-gradient updates by solving a row-wise Fisher system for each output row (or convolutional channel).
Each row’s Fisher matrix is estimated from a small subset of activations and inverted approximately using a small number of batched CG iterations.

To stabilize learning and remove sensitivity to loss scaling, the Fisher matrix is normalized by its average diagonal value tr(F) / D and damped.
Momentum is applied before Fisher preconditioning, and weight decay is handled in a decoupled manner.

Key Modifications (relative to K-FAC)

No Kronecker factorization of the Fisher matrix.
Row-wise Fisher separation: each output row (or channel) is preconditioned independently.
Trace-normalized Fisher F / (tr(F) / D) for scale-free updates.
Iterative Conjugate Gradient instead of explicit matrix inversion.
Subsampled activations for Fisher estimation, significantly reducing computational cost.

For full details, algorithmic description, and practical guidelines, see technical_overview.pdf.

Wall-clock Efficiency (CIFAR-10)

All experiments were conducted on NVIDIA A100 (80 GB) using the CIFAR-10 dataset and the ResNet-18 model.
Reported values correspond to wall-clock time (in minutes) required to first reach a given test accuracy. A reference reproduction notebook is provided at experiments/exp-cifar10.ipynb.

The learning-rate schedule follows the setup from
https://github.com/hirotomusiker/cifar10_pytorch/:
a constant learning rate with 10× drops at epochs 80 and 120.

Time to Reach Target Test Accuracy

Optimizer	Batch Size (bs)	Sample Size (s)	88% (min)	90% (min)	92% (min)	94% (min)
FROG	512	64	1.29	1.86	5.37	11.88
SGD	128	–	3.82	5.01	9.63	11.44
SGD	512	–	3.91	7.80	8.84	–

Training Dynamics (CIFAR-10)

Interpretation

FROG reaches 88–92% test accuracy between ~1.6× and ~4.2× faster in wall-clock time compared to SGD baselines.
Final accuracy is comparable across methods; the main benefit is faster time-to-solution, especially in early and mid-training.

Legacy version: K-FROG (Kronecker FROG)

I kept the previous optimizer, which performs kronecker-based factoring, same as in K-FAC and uses CG on XX^T, while performing inverse sqrt on YY^T.

TODO

Add support for linear layers and biases.
Add mixed precision support
Compare FROG with other optimizers on convolutional architectures.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
experiments		experiments
figures		figures
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
frog.py		frog.py
kfrog.py		kfrog.py
requirements.txt		requirements.txt
technical_overview.pdf		technical_overview.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FROG Optimizer

Motivation

Method Overview

Key Modifications (relative to K-FAC)

Wall-clock Efficiency (CIFAR-10)

Time to Reach Target Test Accuracy

Training Dynamics (CIFAR-10)

Interpretation

Legacy version: K-FROG (Kronecker FROG)

TODO

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

FROG Optimizer

Motivation

Method Overview

Key Modifications (relative to K-FAC)

Wall-clock Efficiency (CIFAR-10)

Time to Reach Target Test Accuracy

Training Dynamics (CIFAR-10)

Interpretation

Legacy version: K-FROG (Kronecker FROG)

TODO

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages