# Optimizers
This is a snapshot of the optimizers implemented in SciPy, TensorFlow, TensorFlow Probability and PyTorch, taken at the library versions printed in each section.

## scipy.optimize

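SciPy has the most extensive set of methods of the libraries covered here. The local methods are selected through the `method` argument of `scipy.optimize.minimize`; the global methods are standalone functions.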

**Local optimization:**
* Nelder-Mead
* Powell
* CG
* BFGS
* Newton-CG
* L-BFGS-B
* TNC
* COBYLA
* SLSQP
* trust-constr
* dogleg
* trust-ncg
* trust-krylov
* trust-exact
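
As a minimal sketch of how one of the local methods listed above is selected, the snippet below runs BFGS on the Rosenbrock test function that ships with SciPy (`rosen`, with its analytic gradient `rosen_der`). The starting point `x0` is an arbitrary choice made for illustration.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Arbitrary starting point (an assumption for this illustration).
x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])

# The local method is chosen by name through the `method` argument.
# BFGS uses the gradient, supplied here through `jac`.
res = minimize(rosen, x0, method="BFGS", jac=rosen_der)

# The Rosenbrock function has its minimum where every coordinate equals 1.
print(res.x)
print(res.fun)
```

Swapping `method="BFGS"` for another name from the list changes the algorithm, although some methods need extra inputs (for example a Hessian for `dogleg` and `trust-exact`).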

**Heuristic global optimization:**
* basinhopping
* brute
* differential_evolution
* shgo
* dual_annealing
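
The global methods are called as functions rather than through `minimize`. The sketch below applies `differential_evolution` to a one-dimensional function with two minima; the objective `f` and the bounds are assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import differential_evolution

def f(x):
    # x arrives as a 1-D array even for a one-dimensional problem.
    # This function has a local minimum near x = 3.84 and a lower,
    # global minimum near x = -1.31.
    return x[0] ** 2 + 10.0 * np.sin(x[0])

# Global methods search a bounded region instead of starting from one point.
bounds = [(-10.0, 10.0)]
result = differential_evolution(f, bounds)

print(result.x, result.fun)
```

Because the search is stochastic, repeated runs can differ in the last digits; a gradient-based local method started near `x = 3.84` would instead settle in the local minimum.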

Constraints are supported by 'trust-constr', 'SLSQP' and 'COBYLA'. Of these, 'COBYLA' handles inequality constraints only.
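
A small sketch of a constrained problem with 'SLSQP' follows. The objective, the constraint and the starting point are hypothetical and chosen so the answer can be checked by hand: the unconstrained minimum (1, 2.5) violates `x + y <= 3`, so the solution is its projection onto the line `x + y = 3`, which is (0.75, 2.25).

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    # Squared distance from the point (1, 2.5).
    return (p[0] - 1.0) ** 2 + (p[1] - 2.5) ** 2

# Dictionary-style constraints: an "ineq" entry means fun(p) >= 0,
# so this encodes x + y <= 3.
constraints = [{"type": "ineq", "fun": lambda p: 3.0 - p[0] - p[1]}]

res_c = minimize(objective, x0=np.array([0.0, 0.0]),
                 method="SLSQP", constraints=constraints)

print(res_c.x)
```

'trust-constr' accepts the same dictionaries as well as `LinearConstraint` and `NonlinearConstraint` objects.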

In [1]:
import scipy
import scipy.optimize
scipy.__version__
# [f for f in dir(scipy.optimize) if not f.startswith('_')]

'1.4.1'

## tf.optimizers

TensorFlow provides a number of gradient-based optimizers, mostly variants of stochastic gradient descent.
* Adadelta
* Adagrad
* Adam
* Adamax
* Nadam
* RMSprop
* SGD
* Ftrl

In [3]:
import tensorflow as tf
tf.__version__
# [f for f in dir(tf.optimizers) if not f.startswith('_')]

'2.1.0'

## tfp.optimizer

TensorFlow Probability implements some quasi-Newton (quasi-second-order) methods and gradient-free methods.
* bfgs_minimize
* lbfgs_minimize
* differential_evolution_minimize
* nelder_mead_minimize
* linesearch (submodule)
* proximal_hessian_sparse
* sgld
* variational_sgd

In [5]:
import tensorflow_probability as tfp
tfp.__version__
# [f for f in dir(tfp.optimizer) if not f.startswith('_')]

'0.9.0'

In [12]:
# Example
x = tf.Variable(0.)
loss_fn = lambda: (x - 5.)**2
losses = tfp.math.minimize(
    loss_fn,
    num_steps=100,
    optimizer=tf.optimizers.Adam(learning_rate=0.1)
)
print(x)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=5.0390043>


## torch.optim

* ASGD
* Adadelta
* Adagrad
* Adam
* AdamW
* Adamax
* LBFGS
* Optimizer (base class for the others, not an algorithm itself)
* RMSprop
* Rprop
* SGD
* SparseAdam

In [8]:
import torch
torch.__version__
# [f for f in dir(torch.optim) if not f.startswith('_')]

'1.3.1'
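
For comparison with the `tfp.math.minimize` example above, here is a sketch of the same toy problem (minimizing `(x - 5)**2` from `x = 0`) written as an explicit PyTorch training loop. The learning rate and step count mirror the TensorFlow example; the loop structure is the usual PyTorch pattern, not something taken from the original notebook.

```python
import torch

# Parameter to optimize; requires_grad=True lets autograd track it.
x = torch.tensor(0.0, requires_grad=True)

# Optimizers take an iterable of parameters to update.
optimizer = torch.optim.Adam([x], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()      # clear the gradient from the previous step
    loss = (x - 5.0) ** 2      # same quadratic loss as the TensorFlow example
    loss.backward()            # compute d(loss)/dx
    optimizer.step()           # apply the Adam update to x

print(x)
```

Unlike `tfp.math.minimize`, PyTorch leaves the loop to the user. Note that `LBFGS` additionally requires a closure that re-evaluates the loss to be passed to `step`.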