
AMSGrad-Tensorflow

Simple TensorFlow implementation of "On the Convergence of Adam and Beyond" (ICLR 2018)

Hyperparameter

  • The default hyperparameters are set to the values that performed best in our experiments (see the update sketch after this list)
  • learning_rate = 0.01
  • beta1 = 0.9
  • beta2 = 0.99
  • Depending on the network you are using, beta2 = 0.99 (the default) may perform best
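
To make the roles of beta1, beta2, and epsilon concrete, here is a minimal NumPy sketch of a single AMSGrad update, following Algorithm 2 of the paper (no bias correction; the repository's implementation may differ in such details):

  import numpy as np

  def amsgrad_step(param, grad, m, v, v_hat,
                   lr=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
      # beta1 controls the decay of the running gradient mean (first moment)
      m = beta1 * m + (1 - beta1) * grad
      # beta2 controls the decay of the running squared-gradient average (second moment)
      v = beta2 * v + (1 - beta2) * grad ** 2
      # AMSGrad's key change vs. Adam: keep the running maximum of v,
      # so the effective step size can never grow back
      v_hat = np.maximum(v_hat, v)
      param = param - lr * m / (np.sqrt(v_hat) + eps)
      return param, m, v, v_hat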

Usage

  from AMSGrad import AMSGrad

  # `loss` is the scalar loss tensor of your graph
  train_op = AMSGrad(learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8).minimize(loss)
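
A minimal sketch of driving the returned train op, assuming TensorFlow 1.x graph mode and that `loss` and the placeholders from the next section are already defined (next_batch is a hypothetical helper, not part of this repository):

  import tensorflow as tf

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      for step in range(30000):  # 30K iterations, as in the MNIST results below
          batch_x, batch_y = next_batch(64)  # hypothetical helper returning one mini-batch
          _, loss_val = sess.run([train_op, loss],
                                 feed_dict={images: batch_x, labels: batch_y})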

Network Architecture

  # Two-layer MLP used in the MNIST experiment
  x = fully_connected(inputs=images, units=100)   # hidden layer, 100 units
  x = relu(x)                                     # ReLU activation
  logits = fully_connected(inputs=x, units=10)    # output layer, one unit per digit class
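
The fully_connected and relu helpers above are thin wrappers from this repository. A minimal sketch of the same network in stock TensorFlow 1.x ops (placeholder shapes and the loss are assumptions for illustration):

  import tensorflow as tf

  images = tf.placeholder(tf.float32, [None, 784])  # flattened 28x28 MNIST images
  labels = tf.placeholder(tf.int64, [None])         # digit class indices 0-9

  x = tf.layers.dense(inputs=images, units=100)     # hidden layer, 100 units
  x = tf.nn.relu(x)
  logits = tf.layers.dense(inputs=x, units=10)      # output layer, 10 classes

  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)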

MNIST Results (30K iterations)

[Plot: lr=0.1, beta1=0.9, beta2 varied]

[Plot: lr=0.01, beta1=0.9, beta2 varied]

Reference

  • On the Convergence of Adam and Beyond (Sashank J. Reddi, Satyen Kale, Sanjiv Kumar), ICLR 2018: https://openreview.net/forum?id=ryQu7f-RZ

Author

Junho Kim
