MNIST learning rate #35

Closed
glangford opened this issue Dec 8, 2017 · 1 comment

glangford commented Dec 8, 2017

The paper says in Section 4:

Our implementation is in TensorFlow (Abadi et al. [2016]) and we use the Adam optimizer (Kingma and Ba [2014]) with its TensorFlow default parameters, including the exponentially decaying learning rate

The TensorFlow defaults for Adam are described here:
https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer

The current capsulenet.py uses lr_decay as a callback to modify the learning rate, but there isn't any evidence that the paper follows this method. Should the lr_decay callback be removed since Adam already decays the learning rate?
(update: the TensorFlow and Keras defaults for Adam appear to be the same)
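
For anyone reading along, here is a minimal sketch (assuming Keras 2.x; this is not the repository's actual code) of the two setups being compared: Adam with its defaults only, versus the same optimizer with an extra per-epoch exponential decay applied through a `LearningRateScheduler` callback. The toy model, the random data, and the 0.9 decay factor are placeholders.

```python
import numpy as np
from keras import callbacks
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import Adam

# Toy stand-in for the real network; shapes and data are random placeholders.
model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])

# Option 1: rely on the Keras Adam defaults only (lr=0.001, beta_1=0.9, beta_2=0.999).
model.compile(optimizer=Adam(), loss='categorical_crossentropy')

# Option 2: additionally decay the learning rate once per epoch via a callback:
# lr(epoch) = base_lr * decay_factor ** epoch   (0.9 here is a placeholder factor)
lr_decay = callbacks.LearningRateScheduler(
    schedule=lambda epoch: 0.001 * (0.9 ** epoch))

x = np.random.rand(256, 784).astype('float32')
y = np.eye(10)[np.random.randint(0, 10, 256)]

# Training with the callback corresponds to Option 2; drop it from the
# callbacks list to train with Option 1.
model.fit(x, y, batch_size=32, epochs=3, callbacks=[lr_decay], verbose=0)
```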

XifengGuo (Owner) commented

@glangford As I said in README.md, I'm not sure whether the paper used this learning rate decay method. I found that adopting lr_decay can lead to faster convergence. You can remove it and train for more epochs; it's your choice.
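
If it helps when deciding between the two, below is a rough sketch (again assuming Keras 2.x; the `LrLogger` callback, the model, and the data are made up for illustration) of how one could log the optimizer's `lr` variable each epoch and compare runs with and without the lr_decay callback.

```python
import numpy as np
from keras import backend as K
from keras.callbacks import Callback
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import Adam

class LrLogger(Callback):
    """Print the value of the optimizer's `lr` variable at the start of each epoch."""
    def on_epoch_begin(self, epoch, logs=None):
        print('epoch %d: lr = %.6f' % (epoch, K.get_value(self.model.optimizer.lr)))

# Toy model and random data, purely for illustration.
model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
model.compile(optimizer=Adam(), loss='categorical_crossentropy')

x = np.random.rand(128, 784).astype('float32')
y = np.eye(10)[np.random.randint(0, 10, 128)]

# Add an lr-decay scheduler to the callbacks list (or leave it out) to see
# what learning-rate value each setup sets for Adam at every epoch.
model.fit(x, y, batch_size=32, epochs=3, callbacks=[LrLogger()], verbose=0)
```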
