Reproducing Super Convergence

This is an attempt to reproduce a subset of the results found in Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates.

Super-Convergence is described as "a phenomenon... where residual networks can be trained using an order of magnitude fewer iterations than is used with standard training methods".

Figure 1a of the paper, shown below, demonstrates the phenomenon:

Cyclical Learning Rate (CLR) allows for competitive training in just 10,000 training steps.
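A triangular cyclical schedule ramps the learning rate linearly up to a large maximum and back down over one cycle. The sketch below is a minimal, framework-free version; the `base_lr=0.1`, `max_lr=3.0`, and `step_size=5000` values in the usage comment are illustrative assumptions, not values confirmed from this repository's code.

```python
def triangular_clr(step, step_size, base_lr, max_lr):
    """Triangular cyclical learning rate (Smith-style CLR).

    The rate climbs linearly from base_lr to max_lr over step_size
    steps, then descends back to base_lr, and the cycle repeats.
    """
    cycle = step // (2 * step_size)
    # x is the normalized distance from the cycle's peak, in [0, 1]
    x = abs(step / step_size - 2 * cycle - 1)
    return base_lr + (max_lr - base_lr) * (1 - x)

# Illustrative use (assumed values): one 10,000-step cycle peaking mid-way.
# triangular_clr(0, 5000, 0.1, 3.0)     -> 0.1  (start of cycle)
# triangular_clr(5000, 5000, 0.1, 3.0)  -> 3.0  (peak)
# triangular_clr(10000, 5000, 0.1, 3.0) -> 0.1  (end of cycle)
```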

Reproduction

Weaker evidence of super-convergence is demonstrated below:

Left: Test accuracy after 10,000 steps with CLR      Right: Test accuracy after 80,000 steps with multistep.

In the above images:

  • A Cyclical Learning Rate allows for a test accuracy of ~85% after 10,000 training steps.
  • A multistep learning rate allows for a test accuracy of ~80% after 20,000 training steps. No progress is made between steps 60,000 and 80,000.
  • Accuracies above 90% could not be achieved. This may be related to the small mini-batch size used here (125) compared to the authors' (1,000).
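For contrast with CLR, a multistep (piecewise-constant) schedule holds the learning rate fixed and multiplies it by a decay factor at preset boundaries. The sketch below shows the general shape; the `base_lr`, `boundaries`, and `gamma` defaults are illustrative assumptions rather than the settings used in this repository.

```python
def multistep_lr(step, base_lr=0.35, boundaries=(25000, 50000), gamma=0.1):
    """Piecewise-constant ("multistep") learning rate schedule.

    The rate starts at base_lr and is multiplied by gamma each time
    the step count passes a boundary. All default values here are
    assumptions for illustration only.
    """
    lr = base_lr
    for boundary in boundaries:
        if step >= boundary:
            lr *= gamma
    return lr
```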

Architecture

The TensorFlow implementation is based on the ResNet-56 architecture described in Appendix A of Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates, with the following changes:

Corrections

  • The 3x3 Conv Layer at the start of the network has stride=1, not stride=2 as mentioned in the paper.
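The stride choice matters because it determines the spatial resolution entering the residual stack. A small arithmetic check (plain Python, assuming 32x32 CIFAR-10 inputs and "same"-style padding of 1 for a 3x3 kernel) shows why stride=1 is the sensible reading:

```python
def conv_output_size(n, kernel=3, stride=1, pad=1):
    """Spatial output size of a square conv layer (standard formula)."""
    return (n + 2 * pad - kernel) // stride + 1

# With the correction (stride=1), a 32x32 input keeps its 32x32
# resolution after the initial 3x3 conv:
#   conv_output_size(32, stride=1) -> 32
# The stride=2 stated in the paper would halve it to 16x16:
#   conv_output_size(32, stride=2) -> 16
```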

Undocumented Elements

Appendix

The learning rate, train accuracy and train loss for 10,000 training steps with a cyclical learning rate are shown below:

The learning rate, train accuracy and train loss for 80,000 training steps with a multistep learning rate are shown below: