Justin-Tan/entropy-sgd-tf

entropy-sgd-tf

TensorFlow implementation of Entropy-SGD: Biasing gradient descent into wide valleys. Entropy-SGD uses local geometric information about the energy landscape to bias optimization toward flat regions of the loss function, which may aid generalization.
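To make the update rule concrete, here is a minimal sketch of one Entropy-SGD step as described in the original paper: an inner SGLD loop explores the neighbourhood of the current weights, and the outer step moves the weights toward the running average of that exploration. It is written in dependency-free Python on a toy quadratic loss, not in TensorFlow, and every function name, parameter name, and default value below is an illustrative assumption rather than this repository's code or settings.

```python
import math
import random


def entropy_sgd_step(x, grad_fn, L=5, gamma=1.0, lr=0.5,
                     sgld_lr=0.1, eps=1e-4, alpha=0.75, rng=random):
    """One outer Entropy-SGD update on a list of floats (illustrative sketch).

    x        current weights
    grad_fn  function returning the loss gradient at a point
    L        number of inner Langevin (SGLD) iterations
    gamma    "scope": strength of the coupling between x and the SGLD iterate
    lr       outer-loop step size
    sgld_lr  inner-loop (SGLD) step size
    eps      thermal noise scale
    alpha    weight of the exponential average of the SGLD iterates
    """
    x_prime = list(x)  # SGLD iterate, initialised at the current weights
    mu = list(x)       # running average of the SGLD iterates
    for _ in range(L):
        g = grad_fn(x_prime)
        for i in range(len(x)):
            # Gradient of f(x') + (gamma / 2) * ||x - x'||^2 with respect to x'.
            dx = g[i] - gamma * (x[i] - x_prime[i])
            noise = math.sqrt(sgld_lr) * eps * rng.gauss(0.0, 1.0)
            x_prime[i] = x_prime[i] - sgld_lr * dx + noise
            mu[i] = (1.0 - alpha) * mu[i] + alpha * x_prime[i]
    # Outer update: the local-entropy gradient estimate is gamma * (x - mu).
    return [xi - lr * gamma * (xi - mi) for xi, mi in zip(x, mu)]


def quadratic_grad(x):
    """Gradient of the toy loss f(x) = 0.5 * ||x||^2."""
    return list(x)


rng = random.Random(0)
x = [2.0, -3.0]
for _ in range(50):
    x = entropy_sgd_step(x, quadratic_grad, rng=rng)
```

On this toy problem the weights contract toward the minimum at the origin. In a real training run, grad_fn would be a minibatch gradient and each outer step would cost L gradient evaluations.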


Instructions

The CIFAR-10 dataset is downloaded automatically and converted to tfrecord format on the first run. By default, training runs on CIFAR-10 with the entropy-SGD optimizer, 20 Langevin iterations, and a wide residual network (WRN-28-10). Run python3 train.py -h for a complete list of options. For example, to train on CIFAR-10 using the entropy-sgd optimizer with 5 Langevin iterations:

# Check command line arguments
$ python3 train.py -h
# Run
$ python3 train.py -opt entropy-sgd -L 5
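For readers unfamiliar with how flags such as -opt and -L are typically wired up, the following is a hypothetical argparse sketch. Only the two flag names and their stated defaults (entropy-sgd, 20 Langevin iterations) come from this README; the destination names, help strings, and the build_parser helper are assumptions, and the actual train.py may be organised differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the two flags shown above."""
    parser = argparse.ArgumentParser(description="Training options (illustrative sketch)")
    # Optimizer name; the README states entropy-sgd is the default.
    parser.add_argument("-opt", dest="optimizer", default="entropy-sgd",
                        help="optimizer to train with")
    # Number of inner Langevin iterations; the README states 20 is the default.
    parser.add_argument("-L", dest="langevin_iterations", type=int, default=20,
                        help="number of Langevin iterations per update")
    return parser


# Equivalent to: python3 train.py -opt entropy-sgd -L 5
args = build_parser().parse_args(["-opt", "entropy-sgd", "-L", "5"])
```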

The default hyperparameters (used to report all results) can be viewed and changed in the config.py file under config_train. Most should be self-explanatory. For parameters labelled 'entropy-sgd specific', you may need to refer to the original paper. Checkpoints and TensorBoard scalars are saved beneath their respective directories.

Multi-GPU

Coming soon...


Results

Both the CIFAR-10 and CIFAR-100 models are trained with the hyperparameters and learning rate schedule specified in the original paper. The data is preprocessed with mean/std normalization and augmented with random rotations and reflections. Convergence on both datasets is compared against vanilla SGD and SGD with Nesterov momentum. The reported accuracy is the average of 5 runs with random weight initialization.
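The preprocessing steps can be illustrated with a small dependency-free sketch. It operates on a single-channel image stored as a list of rows, normalizes with that image's own mean and standard deviation, and rotates only by multiples of 90 degrees. All of these are simplifying assumptions for illustration: the repository's pipeline may use per-channel dataset statistics and different rotation angles, and the function names here are hypothetical.

```python
import random
import statistics


def standardize(image):
    """Subtract the mean and divide by the std of all pixels in a 2-D image."""
    pixels = [p for row in image for p in row]
    mean = statistics.mean(pixels)
    std = statistics.pstdev(pixels) or 1.0  # avoid dividing by zero on constant images
    return [[(p - mean) / std for p in row] for row in image]


def random_reflect_rotate(image, rng):
    """Randomly mirror left-right, then rotate by a random multiple of 90 degrees."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]  # horizontal reflection
    for _ in range(rng.randrange(4)):
        # Reversing the rows and transposing rotates 90 degrees clockwise.
        image = [list(row) for row in zip(*image[::-1])]
    return image


rng = random.Random(0)
raw = [[0.0, 1.0], [2.0, 3.0]]
augmented = random_reflect_rotate(standardize(raw), rng)
```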

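Models trained without entropy-SGD are run for 200 epochs. Models trained with entropy-SGD are run with L=20 for 10 epochs, using the hyperparameters of the CIFAR-10 run in the original paper. Because each entropy-SGD update costs L gradient evaluations, this corresponds to 20 × 10 = 200 effective epochs, matching the budget of the baselines.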

CIFAR-10

So far, entropy-SGD appears to be outperformed by SGD with momentum. The models are being retrained with momentum applied to both the SGLD (inner-loop) and outer-loop updates.

CIFAR-100

# Plots showing convergence of entropy-sgd v. sgd here.

Dependencies

Related work
