

Very adaptive optimizers in TensorFlow, based on this paper.

Three optimizers that provably achieve optimal convergence rates with no prior information about the data.

FreeRexDiag is a coordinate-wise optimizer (this is probably the best default algorithm).

FreeRexSphere uses an L2 update for dimension-independence (good for high-dimensional problems).

FreeRexLayerWise is intermediate between the two above, and may be computationally faster than FreeRexSphere.
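
The three variants can be thought of as accumulating gradient statistics at different granularities: per coordinate, per layer, or over the whole parameter vector. The plain-Python sketch below illustrates only that granularity; the scale formulas here are illustrative stand-ins, not the actual FreeRex updates.

```python
import math

# Per-layer gradients for a hypothetical two-layer model.
grads = [[3.0, 4.0], [0.0, 5.0]]

# Coordinate-wise (FreeRexDiag granularity): one scale per parameter.
coord_scales = [[abs(g) for g in layer] for layer in grads]

# Layer-wise (FreeRexLayerWise granularity): one scale per layer, here its L2 norm.
layer_scales = [math.sqrt(sum(g * g for g in layer)) for layer in grads]

# Global L2 (FreeRexSphere granularity): a single scale for the whole gradient
# vector, which is what makes the update dimension-independent.
global_scale = math.sqrt(sum(g * g for layer in grads for g in layer))

print(coord_scales)   # [[3.0, 4.0], [0.0, 5.0]]
print(layer_scales)   # [5.0, 5.0]
print(global_scale)   # ~7.071
```

The layer-wise variant sits between the other two in both granularity and bookkeeping cost: it tracks one statistic per layer instead of one per parameter or one overall.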

These are all implemented as subclasses of TensorFlow's optimizer class. You should be able to use them as drop-in replacements for other optimizers. For example:

optimizer = tf.train.AdamOptimizer(1e-4)
train_step = optimizer.minimize(loss)

Can be replaced with

optimizer = FreeRex() # FreeRex is an alias for FreeRexDiag
train_step = optimizer.minimize(loss)
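
The reason the swap is drop-in is that both optimizers expose the same minimize interface through a shared base class (tf.train.Optimizer in TensorFlow), so the surrounding training code never changes. The plain-Python sketch below illustrates that property with hypothetical stand-in classes; the AdaGrad-style scaling here merely stands in for "adaptive" and is not the FreeRex update.

```python
import math

class Optimizer:
    """Stand-in for a shared optimizer base class (the tf.train.Optimizer role)."""
    def minimize(self, grad_fn, w, steps):
        for _ in range(steps):
            w = self.step(w, grad_fn(w))
        return w

class FixedLR(Optimizer):
    """Stand-in for a fixed-learning-rate optimizer like AdamOptimizer(1e-4)."""
    def __init__(self, lr):
        self.lr = lr
    def step(self, w, g):
        return w - self.lr * g

class AdaptiveStep(Optimizer):
    """Stand-in for an adaptive optimizer like FreeRex: the step size is derived
    from observed gradients instead of being chosen up front.
    (AdaGrad-style scaling for illustration only -- NOT the FreeRex update.)"""
    def __init__(self):
        self.s = 0.0
    def step(self, w, g):
        self.s += g * g
        return w - g / math.sqrt(self.s + 1e-12)

grad = lambda w: 2.0 * (w - 3.0)   # gradient of the loss (w - 3)^2

# Swapping optimizers leaves the training call itself untouched:
w_fixed = FixedLR(0.1).minimize(grad, 0.0, 100)
w_adapt = AdaptiveStep().minimize(grad, 0.0, 100)
```

Both runs drive w toward the minimizer at 3.0; only the constructor on the last two lines differs, mirroring the one-line swap shown above.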

Each algorithm takes as input the parameter k_inv (e.g. optimizer = FreeRex(0.5)). This parameter is analogous to a learning rate, but provably requires less tuning. The default is k_inv=1.0, which has worked well in my limited experiments.
