# MXNet Basics - Linear Regression using MXNet

## Jupyter Scala kernel
Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:

**Note**: The process for adding this jar to your Scala kernel's classpath can differ depending on the Scala kernel you are using.

We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook.

```
classpath.addPath(<path_to_jar>)

// For example:
classpath.addPath("mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar")
```

Import necessary packages as follows:

In [2]:
import ml.dmlc.mxnet._
import ml.dmlc.mxnet.io.NDArrayIter
import ml.dmlc.mxnet.module.{FitParams, Module}
import ml.dmlc.mxnet.optimizer.SGD
import ml.dmlc.mxnet.Callback.Speedometer

## Prepare Data

MXNet consumes data through **Data Iterators**. The code below shows how to wrap a dataset in an iterator that MXNet can use. The data in this example consists of 2D data points with corresponding integer labels. The function we are trying to learn is:

 y = x<sub>1</sub> + 2x<sub>2</sub>,

where (x<sub>1</sub>, x<sub>2</sub>) is one training data point and y is the corresponding label.

For example, the first label, 5, is generated as follows:

5 = 1 + 2*2 (where x<sub>1</sub> = 1 and x<sub>2</sub> = 2)
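
The same rule can be applied to every point to check the whole label set. The following is a minimal plain-Scala sketch that needs no MXNet; the names `points`, `target` and `labels` are illustrative and not part of the tutorial's code. It recomputes the labels for the six training points used in this tutorial.

```scala
// Plain Scala, no MXNet required: regenerate the labels from the 2D points.
// Each pair is one training point (x1, x2) used in this tutorial.
val points = Vector((1, 2), (3, 4), (5, 6), (3, 2), (7, 1), (6, 9))

// The target function y = x1 + 2 * x2.
def target(x1: Int, x2: Int): Int = x1 + 2 * x2

val labels = points.map { case (x1, x2) => target(x1, x2) }
println(labels.mkString(", "))  // 5, 11, 17, 7, 9, 24
```

These values match the training labels passed to MXNet in the next cell.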

In [3]:
// Training data: six 2D points (x1, x2), flattened, with shape (6, 1, 2)
val trainData = IndexedSeq(NDArray.array(Array(1, 2, 3, 4, 5, 6, 3, 2, 7, 1, 6, 9), shape = Shape(6, 1, 2)))
// One label per training point: y = x1 + 2 * x2
val trainLabel = IndexedSeq(NDArray.array(Array(5, 11, 17, 7, 9, 24), shape = Shape(6)))
val batchSize = 1

// Evaluation data: three 2D points and their labels
val evalData = IndexedSeq(NDArray.array(Array(7, 2, 6, 10, 12, 2), shape = Shape(3, 1, 2)))
val evalLabel = IndexedSeq(NDArray.array(Array(11, 26, 16), shape = Shape(3)))

log4j:WARN No appenders could be found for logger (MXNetJVM).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

trainData: IndexedSeq[NDArray] = Vector(ml.dmlc.mxnet.NDArray@57e47ffa)
trainLabel: IndexedSeq[NDArray] = Vector(ml.dmlc.mxnet.NDArray@a040dbc2)
batchSize: Int = 1
evalData: IndexedSeq[NDArray] = Vector(ml.dmlc.mxnet.NDArray@ed4b006d)
evalLabel: IndexedSeq[NDArray] = Vector(ml.dmlc.mxnet.NDArray@f8bc2cd5)

Once the data is ready, we put it into an iterator and specify parameters such as the batch size and the shuffle flag, which determine how much data the iterator feeds during each pass and whether the data is shuffled, respectively. In the code below, these are the third and fourth constructor arguments (`batchSize` and `false`).

In [4]:
val trainIter = new NDArrayIter(trainData, trainLabel, batchSize, false, "pad")
val evalIter = new NDArrayIter(evalData, evalLabel, batchSize, false, "pad")

trainIter: NDArrayIter = non-empty iterator
evalIter: NDArrayIter = non-empty iterator

In the example above, we used NDArrayIter, which iterates over in-memory NDArrays. MXNet provides several other iterator types, depending on the kind of data you are using. Their complete documentation can be found in the [Scala API](http://mxnet.io/api/scala/docs/index.html#package).
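
To make the roles of the batch size and the shuffle flag concrete, here is a minimal plain-Scala sketch. The `batches` helper and its `seed` parameter are hypothetical, written only for illustration; this is not how NDArrayIter is implemented. It splits a sequence of samples into batches and optionally shuffles them first. A real iterator also has to decide what to do with a final short batch; this sketch simply leaves it short.

```scala
import scala.util.Random

// Hypothetical stand-in for a data iterator: split samples into batches,
// optionally shuffling first. Plain Scala, not NDArrayIter itself.
def batches[A](samples: Vector[A], batchSize: Int, shuffle: Boolean, seed: Long = 0L): Vector[Vector[A]] = {
  val ordered = if (shuffle) new Random(seed).shuffle(samples) else samples
  ordered.grouped(batchSize).toVector
}

val labels = Vector(5, 11, 17, 7, 9, 24)

// Batch size 1, as in this tutorial: one sample per batch.
val one = batches(labels, batchSize = 1, shuffle = false)
println(one.length)  // 6

// A larger batch size yields fewer batches; the last one here is short.
val four = batches(labels, batchSize = 4, shuffle = false)
println(four)  // Vector(Vector(5, 11, 17, 7), Vector(9, 24))
```

With shuffling enabled, the batches contain the same samples in a different order.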

## MXNet Classes

1. [Model Class](http://mxnet.io/api/scala/model.html): The model class in MXNet defines the overall entity of the model. It contains the variable we want to minimize and the training data and labels. Additional parameters, such as the learning rate and the optimization algorithm, are also defined at the model level.

2. [Module Class](http://mxnet.io/api/scala/module.html): The module class provides an intermediate and high-level interface for performing computation with neural networks in MXNet.

3. [Symbols](http://mxnet.io/api/scala/symbol.html): The actual MXNet network is defined using symbols. MXNet has different types of symbols, including data placeholders, neural network layers, and loss function symbols, which we choose according to our requirements.

4. [IO](http://mxnet.io/api/scala/io.html): The IO class, as we already saw, works on the data and carries out operations such as breaking the data into batches and shuffling it.

## Defining the Model

MXNet uses **Symbols** for defining a model. [Symbols](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.Symbol) are the building blocks of the model and make up its various components. Some of the parts that symbols are used to define are:
1. Variables: A variable is a placeholder for future data. This symbol defines a spot that will be filled with training data or labels later, when we train the model.
2. Neural Network Layers: The layers of a network, or of any other type of model, are also defined by symbols. Such a *symbol* takes one or more previous symbols as its input, applies some transformation to them, and creates an output. One example is the "Fully Connected" symbol, which specifies a fully connected layer of a network.
3. Output Symbols: Output symbols are MXNet's way of defining a loss. They are suffixed with the word "Output" (e.g. the SoftmaxOutput layer). You can also create your [own loss](https://github.com/dmlc/mxnet/blob/5b6a0eeee174f28ff0272d17748513ecd52a9ebe/docs/tutorials/r/CustomLossFunction.md#how-to-use-your-own-loss-function). Some examples of existing losses are LinearRegressionOutput, which computes the l2 loss between its input symbol and the actual labels provided to it, and SoftmaxOutput, which computes the categorical cross-entropy.

The symbols described above, along with others, are chained one after the other, each serving as input to the next, to create the network topology. More information about the different types of symbols can be found [here](http://mxnet.io/api/scala/symbol.html).

In [5]:
// Placeholders for the input data and the labels
val data = Symbol.Variable("data")
val label = Symbol.Variable("label")
// Fully connected layer with a single output unit
val fc1  = Symbol.FullyConnected("fc1")()(Map("data" -> data, "num_hidden" -> 1))
// Output layer applying an l2 loss between fc1 and the labels
val softmax = Symbol.LinearRegressionOutput()()(Map("data" -> fc1, "label" -> label))
softmax.listArguments()

data: Symbol = ml.dmlc.mxnet.Symbol@78525c12
label: Symbol = ml.dmlc.mxnet.Symbol@4e85112e
fc1: Symbol = ml.dmlc.mxnet.Symbol@3c61948a
softmax: Symbol = ml.dmlc.mxnet.Symbol@5ecd7ee4
res4_4: IndexedSeq[String] = ArrayBuffer("data", "fc1_weight", "fc1_bias", "label")

The above network uses the following layers:

1. FullyConnected: The fully connected symbol represents a fully connected layer of a neural network (without any activation applied), which in essence is just a linear regression on the input attributes. It takes the following parameters:
    - data: Input to the layer (specify the symbol whose output should be fed here)
    - num_hidden: Number of hidden units, which specifies the size of the layer's output

2. Linear Regression Output: Output layers in MXNet implement a loss. In our example, we use the Linear Regression Output layer, which specifies that an l2 loss is applied between its input and the actual labels provided to this layer. The parameters of this layer are:
    - data: Input to this layer (specify the symbol whose output should be fed here)
    - label: The training label against which the input to the layer is compared when calculating the l2 loss
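
The two layers above can be sketched for a single sample in plain Scala. The functions `fullyConnected` and `squaredError` are illustrative names, not MXNet API, and the sketch ignores any constant scaling MXNet may apply to the loss. It shows that a fully connected layer with one output unit is a weighted sum plus a bias, and that the l2 loss is the squared difference between that output and the label.

```scala
// Plain-Scala sketch of what the two layers compute for one sample.
// fullyConnected and squaredError are illustrative names, not MXNet API.

// FullyConnected with num_hidden = 1: a weighted sum of the inputs plus a bias.
def fullyConnected(x: Vector[Double], weight: Vector[Double], bias: Double): Double =
  x.zip(weight).map { case (xi, wi) => xi * wi }.sum + bias

// l2 (squared-error) loss between a prediction and its label.
def squaredError(prediction: Double, label: Double): Double = {
  val diff = prediction - label
  diff * diff
}

// With weight = (1, 2) and bias = 0 the layer reproduces y = x1 + 2 * x2.
val prediction = fullyConnected(Vector(1.0, 2.0), Vector(1.0, 2.0), 0.0)
println(prediction)                     // 5.0
println(squaredError(prediction, 5.0))  // 0.0
```

Training amounts to finding the weight and bias values that make this loss small across all training points.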

**Note - *Naming Convention*: the label variable's name should be the same as the label name supplied by your training data iterator (the `label_name` parameter). The default value of this is "softmax_label", but this tutorial uses "label" instead, as you can see in `val label = Symbol.Variable("label")`.**

Finally, the network is wrapped in a *Module* (or, alternatively, a *Model*), where you specify the symbol whose value is to be minimised (in our case, `softmax`), the learning rate to use during optimization, and the number of epochs to train for.

We can plot the network we have created in order to visualize it, and save the plot by specifying `path` in `dot.render()`.

In [7]:
val dot = Visualization.plotNetwork(symbol = softmax, nodeAttrs = Map("shape" -> "oval", "fixedsize" -> "false"))
dot.render(engine = "dot", fileName = "linearRegression", path = ".")

dot: Visualization.Dot = ml.dmlc.mxnet.Visualization$Dot@72a23b8

## Training the model

Once we have defined the model structure, the next step is to train the parameters of the model to fit the training data. This is done by using the **fit()** function of the **Module** class.

In [8]:
val mod = new Module(softmax, labelNames = IndexedSeq("label"))

mod.fit(
  trainData = trainIter,
  evalData = scala.Option(evalIter),
  numEpoch = 1000,
  fitParams = new FitParams()
    .setOptimizer(new SGD(learningRate = 0.01f, momentum = 0.9f, wd = 0.0001f)))

mod: Module = ml.dmlc.mxnet.module.Module@76f4d37f

Alternatively, you can use a [FeedForward network](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.FeedForward) and the [Model API](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.Model) of MXNet to build the model instead of using Module. This can be done as follows:

```scala
val model = new FeedForward(
  symbol = softmax,
  ctx = Context.cpu(0),
  numEpoch = 1000,
  optimizer = new SGD(learningRate = 0.01f, momentum = 0.9f, wd = 0.0001f))
```
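
To give a feel for what fitting does for this particular model, the following plain-Scala sketch runs gradient descent with momentum on the same six training points. It is not MXNet's implementation and differs from the training cell in several assumed simplifications: it averages the gradient over the full dataset instead of using batches of size 1, starts from zero parameters, and omits weight decay. The learning rate and momentum values are taken from the tutorial; all names are illustrative.

```scala
// Plain-Scala sketch: fit y = w1 * x1 + w2 * x2 + b by gradient descent with momentum.
// Assumptions: full-batch averaged gradients, zero initial parameters, no weight decay.
val xs = Vector((1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (3.0, 2.0), (7.0, 1.0), (6.0, 9.0))
val ys = Vector(5.0, 11.0, 17.0, 7.0, 9.0, 24.0)

val learningRate = 0.01
val momentum = 0.9

// Parameters (w1, w2, b) and their velocities.
var params = Vector(0.0, 0.0, 0.0)
var velocity = Vector(0.0, 0.0, 0.0)

def predict(p: Vector[Double], x: (Double, Double)): Double =
  p(0) * x._1 + p(1) * x._2 + p(2)

for (_ <- 1 to 3000) {
  // Gradient of the mean of 0.5 * (prediction - label)^2 over all samples.
  val grad = xs.zip(ys).map { case (x, y) =>
    val err = predict(params, x) - y
    Vector(err * x._1, err * x._2, err)
  }.reduce((a, b) => a.zip(b).map { case (u, v) => u + v }).map(_ / xs.length)

  // Momentum update: accumulate a velocity, then move the parameters along it.
  velocity = velocity.zip(grad).map { case (v, g) => momentum * v - learningRate * g }
  params = params.zip(velocity).map { case (p, v) => p + v }
}

println(params)  // close to Vector(1.0, 2.0, 0.0)

val evalPredictions = Vector((7.0, 2.0), (6.0, 10.0), (12.0, 2.0)).map(predict(params, _))
println(evalPredictions)  // close to Vector(11.0, 26.0, 16.0)
```

The learned parameters approach w1 = 1, w2 = 2 and b = 0, which is exactly the function y = x1 + 2x2 that generated the labels.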

## Using a trained model (Testing and Inference)

Once we have a trained model, we can use it for inference and evaluate it on test data, as shown below.

In [9]:
// Predictions for the three evaluation points, one NDArray per batch
val probArrays  = mod.predict(evalIter)

val prob1 = probArrays(0).toArray
val prob2 = probArrays(1).toArray
val prob3 = probArrays(2).toArray

// Mean squared error on the evaluation data
val (name, value) = mod.score(evalIter, new MSE()).get

probArrays: IndexedSeq[NDArray] = ArrayBuffer(
  ml.dmlc.mxnet.NDArray@85887f13,
  ml.dmlc.mxnet.NDArray@c68f7cef,
  ml.dmlc.mxnet.NDArray@d30a2eee
)
prob1: Array[Float] = Array(11.000008F)
prob2: Array[Float] = Array(25.999908F)
prob3: Array[Float] = Array(15.999969F)
name: String = "mse"
value: Float = 3.1435168E-9F

We can also evaluate our model against a metric. In this example, we are evaluating our model's mean squared error (MSE) on the evaluation data.

Let us add some noise to the evaluation labels and see how the MSE changes.


In [10]:
// Evaluation data
val evalData = IndexedSeq(NDArray.array(Array(7, 2, 6, 10, 12, 2), shape = Shape(3, 1, 2)))
// Evaluation labels with 0.1 added to each value
val evalLabel = IndexedSeq(NDArray.array(Array(11.1f, 26.1f, 16.1f), shape = Shape(3)))

val evalIter = new NDArrayIter(evalData, evalLabel, batchSize, false, "pad")

evalData: IndexedSeq[NDArray] = Vector(ml.dmlc.mxnet.NDArray@d5cb35ca)
evalLabel: IndexedSeq[NDArray] = Vector(ml.dmlc.mxnet.NDArray@1319931b)
evalIter: NDArrayIter = non-empty iterator

In [11]:
val (name, value) = mod.score(evalIter, new MSE()).get

name: String = "mse"
value: Float = 0.010007773F

Finally, you can create your own metrics and use them to evaluate your model. More information on metrics can be found [here](http://mxnet-test.readthedocs.io/en/latest/api/metric.html).
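
The two MSE values reported in this notebook can be checked by hand. The sketch below is plain Scala and does not use MXNet's `MSE` class; `meanSquaredError` is an illustrative helper. It takes the three predictions printed by the notebook and compares them against the original labels and against the labels shifted by 0.1.

```scala
// Plain-Scala mean squared error; an illustrative helper, not MXNet's MSE class.
def meanSquaredError(predictions: Seq[Double], labels: Seq[Double]): Double = {
  require(predictions.length == labels.length && predictions.nonEmpty)
  predictions.zip(labels).map { case (p, l) => (p - l) * (p - l) }.sum / predictions.length
}

// Predictions reported by the notebook for the three evaluation points.
val predictions = Seq(11.000008, 25.999908, 15.999969)
val cleanLabels = Seq(11.0, 26.0, 16.0)
val noisyLabels = Seq(11.1, 26.1, 16.1)

println(meanSquaredError(predictions, cleanLabels))  // on the order of 1e-9
println(meanSquaredError(predictions, noisyLabels))  // roughly 0.01
```

Shifting every label by 0.1 adds an error of about 0.1 per point, so the mean of the squared errors rises to about 0.1 * 0.1 = 0.01, in line with the value reported above.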