
Cyclical Learning Rate Scheduler With Decay in PyTorch

Adapted from:

Reach multiple minima to build a powerful ensemble, or just to find the best one, using cyclical learning rates with decay. Ideally, decay milestones should coincide with cyclical milestones for a smooth transition, as shown below. Works with any optimizer, such as Adam.
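The advice about lining up decay milestones with cyclical milestones can be checked before training. The helper below is a minimal sketch and is not part of this repository; the name unaligned_decay_milestones is hypothetical. It uses the milestone values from the usage sample in this README.

```python
def unaligned_decay_milestones(milestones, decay_milestones):
    """Return the decay milestones that do not fall on a cyclical milestone."""
    restart_points = set(milestones)
    return [m for m in decay_milestones if m not in restart_points]


# Values taken from the usage sample in this README.
milestones = [10, 25, 60, 80, 120, 180, 240, 320, 400, 480]
decay_milestones = [60, 120, 240, 480, 960]

print(unaligned_decay_milestones(milestones, decay_milestones))  # [960]
```

Here only 960 is reported, because it lies past the last cyclical milestone (480) and so never coincides with a restart.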

Cyclic learning rate schedulers - PyTorch


Cyclic learning rate schedules -

  • cyclic cosine annealing - CyclicCosAnnealingLR()
  • cyclic linear decay - CyclicLinearLR()


Requirements -

  • numpy
  • python >= 2.7
  • PyTorch >= 0.4.0


SGDR: Stochastic Gradient Descent with Warm Restarts



Sample - (CyclicLinearLR is used the same way.) milestones specifies when the learning rate should shoot back up, and decay_milestones specifies when it should be decayed.

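from cyclicLR import CyclicCosAnnealingLR
import torch

# `model` is any torch.nn.Module defined elsewhere.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = CyclicCosAnnealingLR(optimizer, milestones=[10, 25, 60, 80, 120, 180, 240, 320, 400, 480], decay_milestones=[60, 120, 240, 480, 960], eta_min=1e-6)
for epoch in range(500):
    scheduler.step()  # advance the schedule once per epoch
    # train and validate for this epoch here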

Note: In the sample, scheduler.step() is called once per epoch. It can also be called once per batch; in that case, remember to specify milestones in number of batches, not number of epochs. For a purely cyclical learning rate with no decay, do not pass a decay list. eta_min is the minimum learning rate; it defaults to 1e-6, and the learning rate stays at that value once the cyclical schedule is over.
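If the scheduler is stepped per batch, milestones written in epochs need to be rescaled. A minimal sketch of that conversion follows; to_batch_milestones and the batches_per_epoch value are hypothetical and not part of this repository.

```python
def to_batch_milestones(epoch_milestones, batches_per_epoch):
    """Convert milestones counted in epochs to milestones counted in batches."""
    return [m * batches_per_epoch for m in epoch_milestones]


# Hypothetical value; in practice this would usually be len(train_loader).
batches_per_epoch = 100

print(to_batch_milestones([10, 25, 60], batches_per_epoch))  # [1000, 2500, 6000]
```

The same conversion would apply to the decay milestones, so that both lists stay in the same unit.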


Cyclic Cosine Annealing Learning Rate Schedule

Cosine LR
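As a rough illustration of the shape of this schedule, the sketch below computes a learning rate for a given step with the cosine annealing formula from the SGDR paper, restarting at each milestone. It is not the repository's implementation. The function name, the multiplicative decay rule, and the decay factor gamma are assumptions, since this README does not say by how much the rate is decayed.

```python
import math
from bisect import bisect_right


def cyclic_cosine_lr(step, milestones, base_lr, eta_min=1e-6,
                     decay_milestones=None, gamma=0.5):
    """Sketch of cosine annealing with restarts at `milestones`.

    `gamma` is an assumed decay factor applied to the peak learning rate
    once for every decay milestone that has been reached.
    """
    # Once the cyclical schedule is over, stay at the minimum learning rate.
    if step >= milestones[-1]:
        return eta_min

    # Peak of the current cycle (assumed rule: multiply by gamma per decay).
    passed = bisect_right(sorted(decay_milestones), step) if decay_milestones else 0
    peak = base_lr * gamma ** passed

    # Locate the cycle that contains `step`.
    idx = bisect_right(milestones, step)
    start = 0 if idx == 0 else milestones[idx - 1]
    end = milestones[idx]
    progress = (step - start) / float(end - start)

    # Cosine annealing from `peak` toward `eta_min` within the cycle.
    return eta_min + 0.5 * (peak - eta_min) * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 9, 10, 60):
    print(step, cyclic_cosine_lr(step, [10, 25, 60, 80], 1e-3,
                                 decay_milestones=[60]))
```

With these inputs the rate starts at the base value, falls toward eta_min just before step 10, jumps back up at step 10, and restarts from half the base value at step 60, where the assumed decay is applied.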

Cyclic Linear Annealing Learning Rate Schedule

Linear LR
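The linear variant can be pictured the same way. The sketch below is an assumption about the general shape (a straight-line drop from the base rate to eta_min inside each cycle), not the repository's CyclicLinearLR code; the function name is hypothetical and decay is left out to keep it short.

```python
from bisect import bisect_right


def cyclic_linear_lr(step, milestones, base_lr, eta_min=1e-6):
    """Sketch of a linear decay that restarts at each milestone."""
    # Once the cyclical schedule is over, stay at the minimum learning rate.
    if step >= milestones[-1]:
        return eta_min

    # Locate the cycle that contains `step`.
    idx = bisect_right(milestones, step)
    start = 0 if idx == 0 else milestones[idx - 1]
    end = milestones[idx]
    progress = (step - start) / float(end - start)

    # Straight line from base_lr (cycle start) toward eta_min (cycle end).
    return base_lr - (base_lr - eta_min) * progress


for step in (0, 5, 9, 10, 25):
    print(step, cyclic_linear_lr(step, [10, 25], 1e-3))
```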
