# Exponential learning rates

Tuning a constant learning rate is difficult, so using an adaptive learning rate is often better. A small learning rate converges only after a very long training process, while a large learning rate reduces the loss quickly at first but can bounce around the optimum without settling. An adaptive learning rate changes over the course of training, typically starting large and shrinking as training proceeds.
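To make the trade-off concrete, here is a minimal sketch of plain gradient descent on the toy function $f(x) = x^2$. The function name `gradient_descent`, the starting point and the two learning rates are illustrative assumptions, not taken from any library or from the paper cited below.

```python
def gradient_descent(learning_rate, steps=20, x0=1.0):
    """Minimise f(x) = x**2 (gradient 2*x) and return every visited x."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x - learning_rate * 2 * x  # standard gradient descent update
        trajectory.append(x)
    return trajectory


small = gradient_descent(learning_rate=0.001)  # barely moves towards x = 0
large = gradient_descent(learning_rate=0.95)   # overshoots x = 0 on every step

print("small rate, final x:", small[-1])
print("large rate, first few x:", large[:4])
```

With the small rate, `x` is still close to its starting value after 20 steps. With the large rate, `x` flips sign at every step, which is the one-dimensional version of bouncing around the optimum.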

An exponential learning rate is a function of the iteration number $t$:

$ \eta(t) = \eta_0 \cdot 10^{-t/r} $

This schedule requires tuning two hyperparameters: the initial learning rate $\eta_0$ and the decay constant $r$. The learning rate drops by a factor of 10 every $r$ steps.
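The schedule can be checked numerically with a few lines of plain Python. This is only a sketch of the formula: the function name `exponential_lr` and the values $\eta_0 = 0.1$ and $r = 10000$ are assumptions chosen for illustration.

```python
def exponential_lr(t, eta0=0.1, r=10000):
    """Learning rate at iteration t: eta0 * 10 ** (-t / r)."""
    return eta0 * 10 ** (-t / r)


# Print the learning rate at the start, halfway to r, at r, and at 2r
for t in (0, 5000, 10000, 20000):
    print(t, exponential_lr(t))
```

At $t = 0$ the rate equals $\eta_0$, at $t = r$ it is $\eta_0 / 10$, and at $t = 2r$ it is $\eta_0 / 100$. Between those points the rate decays smoothly rather than in jumps.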

In this paper: http://ieeexplore.ieee.org/document/6638963/, the authors compare the performance of several learning schedules (ways of setting the learning rate over the course of training). They find that exponential scheduling is simple to implement and converges quickly compared to the alternatives.

In TensorFlow:

In [None]:
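import tensorflow as tf  # TensorFlow 1.x API (tf.train)

initial_learning_rate = 0.1  # eta_0 in the formula above
decay_steps = 10000  # r: the learning rate drops by decay_rate every 10000 steps
decay_rate = 0.1  # drop the learning rate by a factor of 10 (1/10 would be 0 in Python 2)
# global_step counts training iterations; minimize() increments it on every step
global_step = tf.Variable(0, trainable=False, name="global_step")
learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step, decay_steps, decay_rate)
optimiser = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
# loss is assumed to be defined earlier in the graph
training_op = optimiser.minimize(loss, global_step=global_step)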

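The `tf.train.exponential_decay` function returns the decayed learning rate as:

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

Here `^` denotes exponentiation. With `decay_rate = 0.1` and `decay_steps` set to $r$, this is the same as the formula for $\eta(t)$ given earlier.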

https://www.tensorflow.org/api_docs/python/tf/train/exponential_decay
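The decay formula can be reproduced in plain Python to see what values the schedule produces. This is a sketch of the documented formula, not TensorFlow's actual implementation, and the function name `decayed_lr` is hypothetical. The `staircase` flag mirrors the optional `staircase` argument of `tf.train.exponential_decay`, which uses integer division so that the rate drops in discrete steps.

```python
def decayed_lr(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """learning_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        # Integer division: the exponent only changes every decay_steps steps
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent


# Compare smooth and staircase decay at a few step counts
for step in (0, 5000, 9999, 10000):
    smooth = decayed_lr(0.1, step, 10000, 0.1)
    stepped = decayed_lr(0.1, step, 10000, 0.1, staircase=True)
    print(step, smooth, stepped)
```

With the default smooth decay, the rate shrinks a little at every step. With `staircase=True` it stays at the initial value until step 10000 and then drops by the full factor of 10.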