# Optimizer
In gradient-based optimization algorithms, we update the parameters (or weights) using the gradients in each iteration. We call this update function an optimizer.

The main method of an optimizer is `update(weight, grad)`, which updates an NDArray weight using an NDArray gradient. Because a multi-layer neural network usually has more than one weight, we assign each weight a unique integer index. In addition, an optimizer may need space to store auxiliary state, such as momentum, so we also allow a user-defined state for updating. In summary, an optimizer has two major methods:

- `createState(index, weight)`: creates auxiliary state for the index-th weight.
- `update(index, weight, grad, state)`: updates the index-th weight given the gradient and auxiliary state. The state can also be updated.
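
To make the two methods concrete, here is a minimal sketch of the same interface on plain `Array[Float]` values, so that it runs without MXNet. The names `SketchOptimizer`, `SketchSGD`, and `sketchOpt` are hypothetical and are not part of the MXNet API; the real optimizer operates on NDArray.

```scala
// Hypothetical, simplified stand-in for the optimizer interface described above.
// Plain Array[Float] replaces NDArray so the sketch runs without MXNet.
trait SketchOptimizer {
  // Create auxiliary state for the index-th weight (null if none is needed).
  def createState(index: Int, weight: Array[Float]): AnyRef
  // Update the index-th weight in place; the state may be updated as well.
  def update(index: Int, weight: Array[Float], grad: Array[Float], state: AnyRef): Unit
}

// Stateless SGD: weight = weight - learningRate * grad
class SketchSGD(learningRate: Float) extends SketchOptimizer {
  def createState(index: Int, weight: Array[Float]): AnyRef = null

  def update(index: Int, weight: Array[Float], grad: Array[Float], state: AnyRef): Unit = {
    var i = 0
    while (i < weight.length) {
      weight(i) -= learningRate * grad(i)
      i += 1
    }
  }
}

// Two weights of a toy network, each identified by its integer index.
val weights = Array(Array(1f, 1f), Array(2f, 2f, 2f))
val grads = Array(Array(1f, 1f), Array(0.5f, 0.5f, 0.5f))

val sketchOpt = new SketchSGD(learningRate = 0.1f)
// One state slot per weight, created once before training starts.
val states = weights.zipWithIndex.map { case (w, i) => sketchOpt.createState(i, w) }

// One optimization step over all weights.
for (i <- weights.indices) {
  sketchOpt.update(i, weights(i), grads(i), states(i))
}
```

The index lets the caller keep one state slot per weight, so a stateful optimizer can keep the history of different layers separate.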


## Jupyter Scala kernel
Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as shown below.

**Note**: The process for adding this jar to your Scala kernel classpath can differ depending on the Scala kernel you are using.

We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook.

```
classpath.addPath(<path_to_jar>)

// For example:
classpath.addPath("mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar")
```

## Basic Usage
### Create and Update
MXNet already implements several popular optimizers in [Optimizer.scala](https://github.com/dmlc/mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/Optimizer.scala). A convenient way to create one is `new SGD(args)`. The following code creates a standard SGD updater, which performs

```scala
weight = weight - learning_rate * grad
```

Import the optimizer you want to use as follows:

```scala
import ml.dmlc.mxnet._
import ml.dmlc.mxnet.optimizer.SGD

// Plain SGD with a learning rate of 0.1
val opt = new SGD(learningRate = 0.1f)
```

Output:

```
import ml.dmlc.mxnet._
import ml.dmlc.mxnet.optimizer.SGD
opt: ml.dmlc.mxnet.optimizer.SGD = ml.dmlc.mxnet.optimizer.SGD@4b5a5245
```

Then we can use the update function.


```scala
val grad = NDArray.ones(2,3)
val weight = NDArray.ones(2,3)
val index = 0
// This optimizer has no momentum, so the state argument is only a placeholder
opt.update(index, weight, grad, NDArray.empty(2,3))
weight.toArray
```

Output:

```
log4j:WARN No appenders could be found for logger (MXNetJVM).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

grad: NDArray = ml.dmlc.mxnet.NDArray@a5d8d68a
weight: NDArray = ml.dmlc.mxnet.NDArray@ffb131d2
index: Int = 0
res2_4: Array[Float] = Array(0.89999F, 0.89999F, 0.89999F, 0.89999F, 0.89999F, 0.89999F)
```

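As a quick check on the result above, assume the plain update rule `weight = weight - learning_rate * grad` with every entry of the weight and the gradient equal to 1. The symbols $w$, $\eta$, and $g$ are introduced here only as shorthand for the weight, the learning rate, and the gradient:

$$
w_{\text{new}} = w - \eta \, g = 1 - 0.1 \times 1 = 0.9
$$

The printed value `0.89999F` matches this to within single-precision accuracy.
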
When momentum is non-zero, the SGD optimizer needs extra state. The state returned by `createState` has type `AnyRef`, so we cast it to `NDArray` before printing its value.


```scala
// SGD with momentum keeps one auxiliary state per weight
val momOpt = new SGD(learningRate = 0.1f, momentum = 0.01f)
val index = 0
val grad = NDArray.ones(2,3)
val weight = NDArray.ones(2,3)
val state = momOpt.createState(index, weight)
// Update with the same optimizer that created the state
momOpt.update(index, weight, grad, state)
// createState returns AnyRef, so cast to NDArray to inspect the values
state.asInstanceOf[NDArray].toArray
```

Output:

```
momOpt: SGD = ml.dmlc.mxnet.optimizer.SGD@143d6181
index: Int = 0
grad: NDArray = ml.dmlc.mxnet.NDArray@e409d01c
weight: NDArray = ml.dmlc.mxnet.NDArray@ac5011ac
state: AnyRef = ml.dmlc.mxnet.NDArray@fbcb7abb
res3_6: Array[Float] = Array(-0.10001F, -0.10001F, -0.10001F, -0.10001F, -0.10001F, -0.10001F)
```

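To see why the state ends up at about -0.1, the following sketch applies a common formulation of SGD with momentum to plain arrays. It is an illustration under stated assumptions rather than MXNet's implementation: `momentumStep` and the `m`-prefixed names are hypothetical, and any further options the real optimizer may apply, such as weight decay, are left out.

```scala
// Hypothetical helper, not MXNet's code: one momentum-SGD step on plain arrays.
//   state  = momentum * state - learningRate * grad
//   weight = weight + state
def momentumStep(weight: Array[Float], grad: Array[Float], state: Array[Float],
                 learningRate: Float, momentum: Float): Unit = {
  for (i <- weight.indices) {
    state(i) = momentum * state(i) - learningRate * grad(i)
    weight(i) += state(i)
  }
}

// Same setup as the notebook cell: six entries, all ones, state starting at zero.
val mWeight = Array.fill(6)(1f)
val mGrad = Array.fill(6)(1f)
val mState = Array.fill(6)(0f)

// First step: the state starts at zero, so it becomes -learningRate * grad.
momentumStep(mWeight, mGrad, mState, learningRate = 0.1f, momentum = 0.01f)
val stateAfterFirstStep = mState(0)

// Second step: the previous state now contributes through the momentum term.
momentumStep(mWeight, mGrad, mState, learningRate = 0.1f, momentum = 0.01f)
val stateAfterSecondStep = mState(0)
```

Under this formulation the state is roughly -0.1 after the first step, consistent with the output above, and roughly -0.101 after the second, because 1% of the previous step is carried over.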
## Types of Optimizers supported
- [AdaDelta](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.AdaDelta)
- [AdaGrad](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.AdaGrad)
- [Adam](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.Adam)
- [SGD](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.SGD)
- [SGLD](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.SGLD)
- [DCASGD](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.DCASGD)
- [NAG](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.NAG)
- [RMSProp](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.RMSProp)

You can use any of these optimizers when building a FeedForward network by passing it to the `.setOptimizer(...)` method, for example `.setOptimizer(new SGD(...))`.

## Further Reading
[Optimizer](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.AdaGrad)