
TensorFlow implementation of COCOB from the paper

Backprop without Learning Rates Through Coin Betting
Francesco Orabona and Tatiana Tommasi


COntinuous COin Betting (COCOB) is a novel algorithm for stochastic subgradient descent (SGD) that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and obtain a learning-rate-free procedure for deep networks.

How do we reduce SGD to coin betting?
Betting on a coin works as follows: start with $1 and, in each round t, bet some money w_t on the outcome of a coin toss g_t, which can be +1 or -1. Similarly, in the optimization world, we want to minimize a 1D convex function f(w), and the only gradients g_t we can receive are +1 and -1. Thus, we can treat the gradients g_t as the outcomes of the coin toss, and the value of the parameter w_t as the money bet in each round.
If we make a lot of money from the betting, it means that we are very good at predicting the gradients; in optimization terms, it means that we converge quickly to the minimum of our function. In more detail, the average of our bets converges to the minimizer at a rate that depends on the dual of the growth rate of our money.
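
The reduction can be tried out on a toy problem. The sketch below is not the COCOB rule from the paper: it uses the Krichevsky-Trofimov (KT) betting fraction, a standard strategy from the coin-betting literature, and it assumes the coin outcome is the negative subgradient, so that winning money corresponds to moving toward the minimizer. The objective f(w) = |w - 10|, the function names, and the number of rounds are illustrative assumptions.

```python
def kt_coin_betting(subgradient, num_rounds, initial_wealth=1.0):
    """Minimize a 1D convex function with subgradients in [-1, 1] by betting.

    Illustrative sketch with the KT betting fraction, not the
    data-dependent COCOB update.
    """
    wealth = initial_wealth  # money currently owned by the bettor
    coin_sum = 0.0           # sum of the past coin outcomes
    bet_sum = 0.0            # sum of the past bets, for the final average
    for t in range(1, num_rounds + 1):
        # KT rule: the signed fraction of wealth to bet is the sum of the
        # past coin outcomes divided by t, so it stays strictly in (-1, 1).
        fraction = coin_sum / t
        w = fraction * wealth  # the bet is also the current parameter value
        # Assumption: coin outcome = negative subgradient at the bet.
        coin = -subgradient(w)
        wealth += coin * w     # win or lose the amount that was bet
        coin_sum += coin
        bet_sum += w
    # The average of the bets is the candidate minimizer.
    return bet_sum / num_rounds


def abs_subgradient(w, target=10.0):
    """A subgradient of the toy objective f(w) = |w - target|."""
    return 1.0 if w > target else -1.0


w_avg = kt_coin_betting(abs_subgradient, num_rounds=10000)
print("average bet:", w_avg)
```

No learning rate appears anywhere in the loop: the size of each step is set by the current wealth, which grows quickly while the gradients keep the same sign and shrinks after an overshoot.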

Why use this reduction?
Because algorithms for betting on coins are already known, and they are optimal, parameter-free, and very intuitive. So, through the reduction, we get an optimal parameter-free algorithm almost for free. Our paper extends a known betting strategy to a data-dependent one.

Is this magical?
No, you could get the same results by running copies of SGD in parallel with different learning rates and then combining them with an algorithm on top. This would lead to exactly the same convergence rate we get with COCOB, but the advantage here is that you need to run only one algorithm!
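
For comparison, here is a minimal sketch of that alternative on a toy problem. The text does not say which algorithm is run on top, so the combiner here is an assumption: exponential weights (Hedge) over the copies' cumulative losses. The learning-rate grid, the 1/sqrt(t) step decay, the value of eta, and the objective f(w) = |w - 10| are also illustrative choices, and the sketch makes no claim about matching COCOB's rate.

```python
import math


def abs_subgradient(w, target=10.0):
    """A subgradient of the toy objective f(w) = |w - target|."""
    return 1.0 if w > target else -1.0


def sgd_grid_with_hedge(learning_rates, num_rounds, target=10.0, eta=0.1):
    """Run one SGD copy per learning rate and combine them with Hedge.

    Hypothetical illustration: the combiner and all constants are
    assumptions, not taken from the paper.
    """
    iterates = [0.0] * len(learning_rates)         # one parameter per copy
    cumulative_loss = [0.0] * len(learning_rates)  # loss suffered so far
    combined = 0.0
    for t in range(1, num_rounds + 1):
        # Exponential weights; subtracting the smallest loss keeps the
        # largest weight equal to 1 and avoids dividing by zero.
        best = min(cumulative_loss)
        weights = [math.exp(-eta * (loss - best)) for loss in cumulative_loss]
        total = sum(weights)
        combined = sum(p * w for p, w in zip(weights, iterates)) / total
        for i, lr in enumerate(learning_rates):
            # Record the loss of this copy, then take its SGD step.
            cumulative_loss[i] += abs(iterates[i] - target)
            step = lr / math.sqrt(t)
            iterates[i] -= step * abs_subgradient(iterates[i], target)
    return combined


combined = sgd_grid_with_hedge([0.001, 0.01, 0.1, 1.0], num_rounds=2000)
print("combined prediction:", combined)
```

Every round costs one gradient evaluation per learning rate in the grid, which is the overhead that a single learning-rate-free algorithm avoids.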

We refer the interested reader to the paper for many more details.

Code & Usage

Here you will find the scripts needed to reproduce the experiment on MNIST data with a fully connected network with two hidden layers (1000 hidden units each, ReLU activations, mini-batch size of 100), as reported in Figure 2 (top row) of our paper.

To run the code simply cd to the mnist directory and use


It will create a directory data where the MNIST data are downloaded and saved. In case of any problem with the data download it can also be done manually with


While running, the code prints the training cost and test error for each epoch:

epoch 0, training cost 7.38284, test error 0.885 
epoch 1, training cost 0.0914842, test error 0.0413998 
epoch 2, training cost 0.0321911, test error 0.0226998
epoch 37, training cost 5.72368e-05, test error 0.0164999 
epoch 38, training cost 5.53296e-05, test error 0.0163999 
epoch 39, training cost 5.34391e-05, test error 0.0164999


  • 05/31/2017 COCOB version 1.0 - Initial release

