# Optimizer
In gradient-based optimization algorithms, we update the parameters (or weights) using the gradients in each iteration. We call this update function an optimizer.

The main method of an optimizer is `update(weight, grad)`, which updates an NDArray weight using an NDArray gradient. Because a multi-layer neural network usually has more than one weight, we assign each weight a unique integer index. An optimizer may also need space to store auxiliary state, such as momentum, so we allow a user-defined state to be passed to the update. In summary, an optimizer has two major methods:

- `createState(index, weight)`: creates the auxiliary state for the index-th weight.
- `update(index, weight, grad, state)`: updates the index-th weight given the gradient and auxiliary state. The state can also be updated.
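
The two-method contract above can be sketched without MXNet at all. The following is only an illustration under stated assumptions: `SimpleOptimizer`, `PlainSgd`, and `step` are hypothetical names, and a plain `Array[Float]` stands in for an NDArray. It is not the MXNet implementation.

```scala
// Hypothetical, MXNet-free sketch: Array[Float] stands in for NDArray.
object OptimizerSketch {
  // The two-method interface described above; S is the auxiliary state type.
  trait SimpleOptimizer[S] {
    def createState(index: Int, weight: Array[Float]): S
    def update(index: Int, weight: Array[Float], grad: Array[Float], state: S): Unit
  }

  // Plain SGD needs no auxiliary state, so its state type is Unit.
  final class PlainSgd(learningRate: Float) extends SimpleOptimizer[Unit] {
    def createState(index: Int, weight: Array[Float]): Unit = ()

    // In-place update: weight = weight - learningRate * grad
    def update(index: Int, weight: Array[Float], grad: Array[Float], state: Unit): Unit = {
      var i = 0
      while (i < weight.length) {
        weight(i) -= learningRate * grad(i)
        i += 1
      }
    }
  }

  // One weight array per layer; the position in the outer array is the integer index.
  def step[S](opt: SimpleOptimizer[S],
              weights: Array[Array[Float]],
              grads: Array[Array[Float]],
              states: Seq[S]): Unit =
    for (i <- weights.indices) opt.update(i, weights(i), grads(i), states(i))
}
```

A caller would create one state per weight with `createState` and then pass the same states to every `step` call, which is what lets a stateful optimizer carry information from one iteration to the next.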


## Jupyter Scala kernel
Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:

**Note**: The process for adding this jar to your Scala kernel's classpath may differ depending on the Scala kernel you are using.

We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook.

```scala
// General form: classpath.addPath(<path_to_jar>)
// For example:
classpath.addPath("mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar")
```

## Basic Usage
### Create and Update
MXNet already implements several popular optimizers; the Python versions live in `python/mxnet/optimizer.py`, where `optimizer.create(name, args...)` is a convenient way to create one. In the Scala examples here, the optimizer class is instantiated directly. The following code creates a standard SGD updater, which computes

```scala
weight = weight - learning_rate * grad
```

```scala
import ml.dmlc.mxnet._
import ml.dmlc.mxnet.optimizer.SGD
//val opt = new Optimizer("sgd", learningRate=0.1f)
val opt = new SGD(learningRate=0.1f)
```

```
import ml.dmlc.mxnet._
import ml.dmlc.mxnet.optimizer.SGD
opt: ml.dmlc.mxnet.optimizer.SGD = ml.dmlc.mxnet.optimizer.SGD@18d0acf8
```

Then we can use the update function.


```scala
val grad = NDArray.ones(2,3)
val weight = NDArray.ones(2,3)
val index = 0
// No momentum is configured, so a placeholder NDArray is passed as the state.
opt.update(index, weight, grad, NDArray.empty(2,3))
// The weight is updated in place; inspect its values.
weight.toArray
```

```
grad: NDArray = ml.dmlc.mxnet.NDArray@96420bdf
weight: NDArray = ml.dmlc.mxnet.NDArray@f4b4d3bf
index: Int = 0
res9_4: Array[Float] = Array(0.89999F, 0.89999F, 0.89999F, 0.89999F, 0.89999F, 0.89999F)
```

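The recorded result is 0.89999 rather than the 0.9 that `weight - learning_rate * grad` alone would give. One plausible explanation, stated here as an assumption rather than a confirmed fact about the library, is that the optimizer also applies a small default weight-decay term of about 1e-4. The arithmetic can be checked with a plain-Scala sketch; `sgdStep` and the `wd` value are hypothetical and do not call MXNet.

```scala
// Hypothetical, MXNet-free check of the arithmetic for a single scalar weight.
object WeightDecayCheck {
  // One SGD step with an L2 weight-decay term folded into the gradient.
  def sgdStep(weight: Float, grad: Float, learningRate: Float, wd: Float): Float =
    weight - learningRate * (grad + wd * weight)

  // learningRate matches the example above; wd = 1e-4f is an ASSUMED default.
  val withDecay: Float = sgdStep(weight = 1.0f, grad = 1.0f, learningRate = 0.1f, wd = 1e-4f)

  // With wd = 0 the step reduces to weight - learningRate * grad.
  val withoutDecay: Float = sgdStep(weight = 1.0f, grad = 1.0f, learningRate = 0.1f, wd = 0.0f)
}
```
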
When momentum is non-zero, the SGD optimizer needs extra state.
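
To see why momentum requires state, here is a minimal MXNet-free sketch of the classic momentum rule. It is an illustration only: `MomentumSgd` is a hypothetical class, `Array[Float]` stands in for NDArray, and MXNet's own SGD may differ in details such as weight decay or gradient clipping.

```scala
// Hypothetical, MXNet-free sketch: Array[Float] stands in for NDArray.
object MomentumSketch {
  final class MomentumSgd(learningRate: Float, momentum: Float) {
    // The auxiliary state is a velocity buffer with the same length as the weight.
    def createState(index: Int, weight: Array[Float]): Array[Float] =
      new Array[Float](weight.length) // zero-initialised

    // velocity = momentum * velocity - learningRate * grad; weight += velocity
    def update(index: Int, weight: Array[Float], grad: Array[Float],
               state: Array[Float]): Unit = {
      var i = 0
      while (i < weight.length) {
        state(i) = momentum * state(i) - learningRate * grad(i)
        weight(i) += state(i)
        i += 1
      }
    }
  }
}
```

Calling `update` twice with the same gradient moves the weight by a different amount each time, because the velocity stored in the state carries over between calls. That carried-over buffer is the extra state the optimizer has to keep per weight.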


```scala
val momOpt = new SGD(learningRate = 0.1f, momentum = 0.01f)
val index = 0
val grad = NDArray.ones(2,3)
val weight = NDArray.ones(2,3)
// Create the auxiliary (momentum) state for this weight.
val state = momOpt.createState(index, weight)
// Update with the momentum optimizer, passing the state it created.
momOpt.update(index, weight, grad, state)
state
```

```
momOpt: SGD = ml.dmlc.mxnet.optimizer.SGD@17af5997
index: Int = 0
grad: NDArray = ml.dmlc.mxnet.NDArray@f7a94dad
weight: NDArray = ml.dmlc.mxnet.NDArray@c02afa4c
state: AnyRef = ml.dmlc.mxnet.NDArray@e31b5efa
res11_6: AnyRef = ml.dmlc.mxnet.NDArray@bf9df5e6
```

## Further Reading
[Optimizer API](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.Optimizer)