
Reproducing Super Convergence

This is an attempt to reproduce a subset of the results found in Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates.

Super-Convergence is described as "a phenomenon... where residual networks can be trained using an order of magnitude fewer iterations than is used with standard training methods".

Figure 1A demonstrates the phenomenon below:

Cyclical Learning Rate (CLR) allows for competitive training in just 10,000 training steps.

Reproduction

Weaker evidence of super-convergence is demonstrated below:

Left: Test accuracy after 10,000 steps with CLR. Right: Test accuracy after 80,000 steps with a multistep schedule.

In the above images:

  • A Cyclical Learning Rate allows for a test accuracy of ~85% after 10,000 training steps (a sketch of the schedule follows this list).
  • A multistep learning rate allows for a test accuracy of ~80% after 20,000 training steps. No further progress is made between steps 60,000 and 80,000.
  • Accuracies above 90% could not be achieved. This may be related to the small mini-batch size used here (125) compared to the authors' (1,000).
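
The cyclical learning rate used for the 10,000-step run can be implemented as a triangular schedule that rises linearly from a minimum to a maximum rate and back again over one cycle. The sketch below is a minimal, framework-agnostic version; the endpoints (0.1 and 3.0) and the step size (5,000, i.e. one full cycle over 10,000 steps) are illustrative assumptions rather than the exact values used in this repository.

```python
def triangular_clr(step, min_lr=0.1, max_lr=3.0, step_size=5000):
    """Triangular cyclical learning rate (Smith-style CLR).

    The rate rises linearly from min_lr to max_lr over step_size steps,
    then falls back to min_lr over the next step_size steps.
    All default values here are illustrative assumptions.
    """
    position = (step % (2 * step_size)) / step_size   # in [0, 2)
    scale = position if position <= 1.0 else 2.0 - position
    return min_lr + (max_lr - min_lr) * scale
```

The returned value can be fed to the optimizer at each step, for example through a learning-rate placeholder in TensorFlow 1.x.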

Architecture

The TensorFlow implementation is based on the ResNet-56 architecture described in Appendix A of Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates, with the following changes:

Corrections

  • The 3x3 Conv Layer at the start of the network has stride=1, not stride=2 as mentioned in the paper (see the sketch after this item).
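
For concreteness, a minimal sketch of the network stem with this stride=1 correction is shown below, written against the TensorFlow 1.x tf.layers API. The filter count (16) and the use of batch normalization follow the standard CIFAR-style ResNet described in the paper's appendix, but the exact padding and initialization choices here are assumptions, not necessarily what this implementation uses.

```python
import tensorflow as tf

def stem(inputs):
    """First layers of the ResNet-56-style network.

    The initial 3x3 convolution uses stride=1 (not stride=2 as written in
    the paper's appendix), so the spatial resolution of the 32x32 input
    is preserved going into the first residual block.
    """
    x = tf.layers.conv2d(inputs, filters=16, kernel_size=3, strides=1,
                         padding='same', use_bias=False)
    x = tf.layers.batch_normalization(x)
    return tf.nn.relu(x)
```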

Undocumented Elements

Appendix

The learning rate, train accuracy and train loss for 10,000 training steps with a cyclical learning rate are shown below:

The learning rate, train accuracy and train loss for 80,000 training steps with a multistep learning rate are shown below:
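
For comparison, the multistep baseline is a piecewise-constant schedule that drops the learning rate by a fixed factor at chosen boundaries. A minimal sketch is shown below; the base rate, boundaries, and decay factor are assumptions chosen to illustrate the shape of the schedule over an 80,000-step run, not necessarily the values used in these experiments.

```python
def multistep_lr(step, base_lr=0.35, boundaries=(40000, 60000), decay=0.1):
    """Piecewise-constant (multistep) learning rate.

    The rate starts at base_lr and is multiplied by `decay` each time the
    step count passes one of `boundaries`. All default values are
    illustrative assumptions, not the exact hyperparameters of these runs.
    """
    lr = base_lr
    for boundary in boundaries:
        if step >= boundary:
            lr *= decay
    return lr
```

In TensorFlow 1.x the same schedule can be expressed with tf.train.piecewise_constant.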
