### Note

Please view the [README](https://github.com/deeplearning4j/deeplearning4j/tree/master/dl4j-examples/tutorials/README.md) to learn about installing, setting up dependencies, and importing notebooks in Zeppelin.

### Background

#### What are hyperparameters?
Hyperparameters are the settings that are external to the model and cannot be estimated from the data, unlike the weights, which the network learns during training. In other words, they are the tunable knobs our network depends on, such as the batch size, the type of optimizer, the weight initialization algorithm, the regularization factor and the learning rate. To get the most out of our networks, we need to tune the hyperparameters for the problem at hand, because they have a large effect on how the network trains. For example, if the weights are not initialized correctly, the network may take a very long time to train, and in some cases it won't converge to a good solution at all and will make poor predictions.
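
One way to keep the hyperparameters separate from the learned weights is to gather them into a single immutable value. The sketch below is plain Scala and purely illustrative: the `Hyperparameters` case class and its field names are hypothetical and not part of DL4J. Its `copy` method makes it easy to derive a variant that changes exactly one setting, which is how the experiments later in this notebook are structured.

```scala
// Hypothetical container for tunable settings; not a DL4J type.
case class Hyperparameters(
  batchSize: Int,
  learningRate: Double,
  l2: Double,
  hiddenLayerNodes: Int,
  updater: String,   // e.g. "ADAM" or "NESTEROVS"; a String keeps the sketch self-contained
  weightInit: String // e.g. "XAVIER" or "ZERO"
)

val baseline = Hyperparameters(
  batchSize = 32,
  learningRate = 0.0006,
  l2 = 1e-4,
  hiddenLayerNodes = 1000,
  updater = "ADAM",
  weightInit = "XAVIER"
)

// Change one hyperparameter at a time so its effect can be isolated.
val lowLr = baseline.copy(learningRate = 1e-10)
val zeroInit = baseline.copy(weightInit = "ZERO")
```

Changing a single field per variant keeps comparisons fair: any difference in results can be attributed to that one setting.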

--- 

#### Goals
- Some useful tips for tuning hyperparameters
- How to tune a network's hyperparameters with DL4J

## 1. Tips for tuning hyperparameters

### Things to consider
There is no fixed recipe for tuning hyperparameters, but the following tips help a lot:

- Shuffle the samples before feeding them into the network, because stochastic gradient descent (SGD) assumes that each minibatch is a random sample of the training data
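
Shuffling must keep each sample paired with its label. A minimal plain-Scala sketch, assuming the data sits in two parallel sequences, applies one random permutation to both. The `shuffleTogether` helper is hypothetical; DL4J's iterators and dataset objects handle shuffling in their own way.

```scala
import scala.util.Random

// Shuffle features and labels with the same random permutation so that
// each feature vector stays aligned with its label.
def shuffleTogether[F, L](features: Seq[F], labels: Seq[L], seed: Long): (Seq[F], Seq[L]) = {
  require(features.length == labels.length, "features and labels must have the same length")
  val rng = new Random(seed)
  val order = rng.shuffle(features.indices.toVector) // a random permutation of the indices
  (order.map(features), order.map(labels))
}

val xs = Seq("img0", "img1", "img2", "img3", "img4")
val ys = Seq(0, 1, 2, 3, 4)
val (shuffledX, shuffledY) = shuffleTogether(xs, ys, seed = 123L)
```

Fixing the seed makes the shuffle reproducible, which helps when comparing runs with different hyperparameters.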

---

### Possible solutions to common issues
|Problem|Possible solution|
|---|---|
|Slow training?|Increase the learning rate|
|Fluctuating loss?|Reduce the learning rate|
|Network not converging?|See if the weights are initialized correctly|
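
The table can be turned into a rough automated check on a recorded score curve. The sketch below is a hypothetical heuristic in plain Scala, not a DL4J feature; the thresholds `minRelativeDrop` and `maxRiseFraction` are arbitrary assumptions that would need adjusting for a real problem.

```scala
// Hypothetical diagnosis of a loss curve, mirroring the table above.
// Thresholds are illustrative assumptions, not recommended values.
def diagnose(scores: Seq[Double],
             minRelativeDrop: Double = 0.05,
             maxRiseFraction: Double = 0.3): String = {
  require(scores.length >= 2, "need at least two recorded scores")
  require(scores.forall(s => !s.isNaN && !s.isInfinite), "scores must be finite")
  val steps = scores.zip(scores.tail)
  // Fraction of consecutive steps in which the score went up instead of down
  val riseFraction = steps.count { case (prev, next) => next > prev }.toDouble / steps.length
  // Overall improvement relative to the first recorded score
  val relativeDrop = (scores.head - scores.last) / math.max(math.abs(scores.head), 1e-12)
  if (riseFraction > maxRiseFraction) "Fluctuating loss: reduce the learning rate"
  else if (relativeDrop < minRelativeDrop) "Slow training: increase the learning rate, and check the weight initialization"
  else "No obvious problem"
}
```

For example, a steadily falling curve such as `Seq(2.3, 1.5, 1.0, 0.7, 0.5)` passes, while a curve that barely moves is flagged as slow.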

## 2. Tuning hyperparameters in DL4J
Let's see how all of this works in DL4J.

When configuring a network, most of the settings made before the `list()` call are hyperparameters. For example:
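```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .seed(123)                                   // random seed, for reproducibility
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .iterations(1)                               // parameter updates per minibatch
            .learningRate(0.006)
            .updater(Updater.NESTEROVS).momentum(0.9)    // SGD with Nesterov momentum
            .regularization(true).l2(1e-4)               // L2 weight penalty
            .list() // All of the settings before this line are hyperparameters
            // ... layer definitions and .build() follow here
```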
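
To make the learning rate, momentum and L2 settings above less abstract, here is a one-dimensional sketch in plain Scala of gradient descent with Nesterov momentum and an L2 penalty, minimizing the toy loss (w - 3)^2 with the same values as the snippet (0.006, 0.9, 1e-4). It uses one common textbook formulation of the Nesterov update; DL4J's internal implementation may differ in detail (for example in how it scales the L2 term), so treat it as an illustration rather than a description of the library.

```scala
// Gradient of the toy loss (w - 3)^2 plus an L2 penalty (l2 / 2) * w^2.
def gradient(w: Double, l2: Double): Double = 2.0 * (w - 3.0) + l2 * w

// Nesterov momentum: evaluate the gradient at the look-ahead point w + momentum * v.
def nesterovDescent(learningRate: Double, momentum: Double, l2: Double, steps: Int): Double = {
  var w = 0.0 // parameter being learned
  var v = 0.0 // velocity (accumulated update)
  for (_ <- 1 to steps) {
    val g = gradient(w + momentum * v, l2)
    v = momentum * v - learningRate * g
    w += v
  }
  w
}

val learned = nesterovDescent(learningRate = 0.006, momentum = 0.9, l2 = 1e-4, steps = 5000)
// The L2 term pulls the optimum slightly toward zero: 6 / (2 + l2) instead of 3.
println(f"learned w = $learned%.5f, expected ${6.0 / (2.0 + 1e-4)}%.5f")
```

With `learningRate = 1e-10` instead, `w` barely moves away from 0 in the same number of steps, which is the "slow training" symptom from the table above.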

---

### Imports

```scala
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
import org.deeplearning4j.nn.api.{Model, OptimizationAlgorithm}
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.conf.{MultiLayerConfiguration, NeuralNetConfiguration, Updater}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.optimize.api.IterationListener
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.learning.config.Nesterovs
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction
```

### Setting up the network inputs
We'll use the __MNIST__ dataset for this tutorial.

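```scala
// MnistDataSetIterator(batchSize, train, seed): 32 images per batch, training or test split, fixed random seed
val mnistTrain = new MnistDataSetIterator(32, true, 123)
val mnistTest = new MnistDataSetIterator(32, false, 123)
```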
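
The numbers behind these iterators are worth spelling out. Each MNIST image is 28 × 28 grayscale pixels, flattened to 784 inputs (hence `nIn(784)` in the network below), and the dataset has 60,000 training and 10,000 test images. With a batch size of 32 and `iterations(1)`, one pass over the training set is 1,875 parameter updates, so printing the score every 100 iterations gives a manageable amount of output. A quick check in plain Scala (the helper name is ours):

```scala
// Flattened MNIST image size: 28 x 28 pixels.
val inputSize = 28 * 28

// Number of minibatches (and, with iterations(1), parameter updates) per epoch;
// the last batch is smaller when the dataset size is not a multiple of the batch size.
def batchesPerEpoch(numExamples: Int, batchSize: Int): Int =
  (numExamples + batchSize - 1) / batchSize

val trainBatches = batchesPerEpoch(60000, 32)
val testBatches = batchesPerEpoch(10000, 32)
println(s"inputs: $inputSize, train batches: $trainBatches, test batches: $testBatches")
```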

### Creating a simple feed-forward network
We'll define a function for configuring a simple feed-forward network and see how it performs with different hyperparameters.
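
The hidden layer size is a hyperparameter, but it directly determines how many ordinary parameters (weights and biases) the network has to learn. A fully connected layer has nIn × nOut weights plus nOut biases. A plain-Scala count for the 784 → hidden → 10 network used here (helper names are hypothetical):

```scala
// Weights plus biases of one fully connected layer.
def denseLayerParams(nIn: Int, nOut: Int): Long = nIn.toLong * nOut + nOut

// Total learnable parameters for a 784 -> hidden -> 10 network.
def totalParams(hiddenLayerNodes: Int): Long =
  denseLayerParams(784, hiddenLayerNodes) + denseLayerParams(hiddenLayerNodes, 10)

println(totalParams(1000)) // 784*1000 + 1000 + 1000*10 + 10
```

With 1000 hidden nodes this gives 795,010 parameters; halving the layer to 500 nodes gives 397,510. DL4J's `numParams()` on an initialized `MultiLayerNetwork` should report the same total.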

```scala
// The function parameters are the hyperparameters we are going to experiment with.
// We won't vary the batch size much, because a usable value mostly depends on how much GPU or CPU memory is available.

def buildNetwork(
                  learningRate: Double,
                  regularization: Double,
                  hiddenLayerNodes: Int,
                  weightInit: WeightInit,
                  activation: Activation,
                  updater: Updater,
                  lossFunction: LossFunction,
                  optimizationAlgorithm: OptimizationAlgorithm
                ): MultiLayerConfiguration = {
    new NeuralNetConfiguration.Builder()
      .seed(123)
      .optimizationAlgo(optimizationAlgorithm)
      .iterations(1)
      .learningRate(learningRate)
      .updater(updater)
      .activation(activation)
      .weightInit(weightInit)
      .regularization(true).l2(regularization)
      .list()
      .layer(0, new DenseLayer.Builder()              // hidden layer: 784 pixel inputs -> hiddenLayerNodes units
        .nIn(784)
        .nOut(hiddenLayerNodes)
        .build())
      .layer(1, new OutputLayer.Builder(lossFunction) // output layer: one unit per digit class
        .nIn(hiddenLayerNodes)
        .nOut(10)
        .activation(Activation.SOFTMAX)
        .build())
      .pretrain(false).backprop(true)
      .build()
}

// Trains a model with the given configuration for one epoch and prints a rough summary of its performance
def trainModelConfiguration(configuration: MultiLayerConfiguration): Unit = {
    println("---------------------------------------")
    val model = new MultiLayerNetwork(configuration)
    model.init()
    // Print the score in the notebook every 100 iterations
    model.setListeners(new IterationListener {
        private var wasInvoked = false

        override def invoked(): Boolean = wasInvoked

        override def invoke(): Unit = { wasInvoked = true }

        override def iterationDone(model: Model, iteration: Int): Unit = {
          invoke()
          if (iteration % 100 == 0) {
            println("Score at iteration " + iteration + " is " + model.score())
          }
        }
      })

    // The iterators are shared between calls, so rewind them before each run
    mnistTrain.reset()
    model.fit(mnistTrain)
    mnistTest.reset()
    val evaluation = model.evaluate(mnistTest)

    // Print basic statistics about the trained classifier
    println("Accuracy: " + evaluation.accuracy())
    println("Precision: " + evaluation.precision())
    println("Recall: " + evaluation.recall())
    println("---------------------------------------")
}
```
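
Accuracy, precision and recall are all derived from the confusion matrix of actual versus predicted classes. For a multi-class problem like MNIST, precision and recall are usually averaged over the classes (macro-averaging); our understanding is that DL4J's `Evaluation` reports macro-averaged values by default, but the sketch below is plain Scala with a hypothetical `metrics` helper and a made-up 3-class matrix, not DL4J's implementation.

```scala
// confusion(actual)(predicted) = number of examples of class `actual` predicted as `predicted`
def metrics(confusion: Array[Array[Int]]): (Double, Double, Double) = {
  val k = confusion.length
  val total = confusion.map(_.sum).sum.toDouble
  val correct = (0 until k).map(i => confusion(i)(i)).sum.toDouble
  val accuracy = correct / total
  // Precision of class c: correct predictions of c / all predictions of c (column sum)
  val precisions = (0 until k).map { c =>
    val predicted = (0 until k).map(a => confusion(a)(c)).sum
    if (predicted == 0) 0.0 else confusion(c)(c).toDouble / predicted
  }
  // Recall of class c: correct predictions of c / all actual examples of c (row sum)
  val recalls = (0 until k).map { c =>
    val actual = confusion(c).sum
    if (actual == 0) 0.0 else confusion(c)(c).toDouble / actual
  }
  (accuracy, precisions.sum / k, recalls.sum / k)
}

val confusion = Array(
  Array(8, 2, 0),
  Array(1, 9, 0),
  Array(3, 1, 6)
)
val (accuracy, macroPrecision, macroRecall) = metrics(confusion)
```

Here class 2 is never wrongly predicted (precision 1.0) but is often missed (recall 0.6), which is why precision and recall are worth printing alongside accuracy.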

### Creating configurations with different hyperparameters
Now we'll create several configurations with different hyperparameters and see how each one performs.

```scala
val lowLearningRate = buildNetwork(1e-10,  // Learning rate (too low)
        1e-4, // Regularization
        1000, // Hidden nodes
        WeightInit.XAVIER, // Weight initialization type
        Activation.RELU, // Activation function
        Updater.ADAGRAD, // Updater
        LossFunction.NEGATIVELOGLIKELIHOOD, // Loss function
        OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) // Optimization algorithm

val highLearningRate = buildNetwork(1e10,  // Learning rate (too high)
        1e-4, // Regularization
        1000, // Hidden nodes
        WeightInit.XAVIER, // Weight initialization type
        Activation.RELU, // Activation function
        Updater.ADAGRAD, // Updater
        LossFunction.NEGATIVELOGLIKELIHOOD, // Loss function
        OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) // Optimization algorithm

val zeroWeightsInit = buildNetwork(0.0006,  // Learning rate
        1e-4, // Regularization
        1000, // Hidden nodes
        WeightInit.ZERO, // Weight initialization type (all weights zero)
        Activation.RELU, // Activation function
        Updater.NONE, // Updater (none)
        LossFunction.NEGATIVELOGLIKELIHOOD, // Loss function
        OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) // Optimization algorithm

val betterUpdater = buildNetwork(0.0006,  // Learning rate
        1e-4, // Regularization
        1000, // Hidden nodes
        WeightInit.XAVIER, // Weight initialization type
        Activation.RELU, // Activation function
        Updater.ADAM, // Updater (Adam)
        LossFunction.NEGATIVELOGLIKELIHOOD, // Loss function
        OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) // Optimization algorithm
```
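
The zeroWeightsInit configuration swaps `WeightInit.XAVIER` for `WeightInit.ZERO`. As a rough illustration of the difference, the sketch below initializes a 784 × 1000 weight matrix in plain Scala, assuming the Xavier (Glorot) normal scheme: zero-mean Gaussian weights with variance 2 / (nIn + nOut). That is our understanding of DL4J's `WeightInit.XAVIER`, so check the DL4J documentation for your version. Unlike all-zero weights, every entry differs, so the hidden units start out computing different functions.

```scala
import scala.util.Random

// Xavier/Glorot normal initialization (assumed scheme): zero mean, variance 2 / (nIn + nOut).
def xavierInit(nIn: Int, nOut: Int, seed: Long): Array[Array[Double]] = {
  val rng = new Random(seed)
  val std = math.sqrt(2.0 / (nIn + nOut))
  Array.fill(nIn, nOut)(rng.nextGaussian() * std)
}

val weights = xavierInit(784, 1000, seed = 123L)
val flat = weights.flatten
val mean = flat.sum / flat.length
val variance = flat.map(w => (w - mean) * (w - mean)).sum / flat.length
println(f"sample variance = $variance%.6f, target = ${2.0 / (784 + 1000)}%.6f")
```

Scaling the variance by the layer's fan-in and fan-out keeps activations from shrinking or exploding as they pass through the layers.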

```scala
trainModelConfiguration(lowLearningRate)
trainModelConfiguration(highLearningRate)
trainModelConfiguration(zeroWeightsInit)
trainModelConfiguration(betterUpdater)
```
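
Calling trainModelConfiguration by hand works for four configurations, but the same idea extends to a systematic grid search: try every combination of candidate values and keep the best-scoring one. The sketch below is generic plain Scala; the `gridSearch` helper is hypothetical, and the toy `toyScore` function stands in for "build the network, train it and return its test accuracy", which in this notebook would mean adapting trainModelConfiguration to return the evaluation's accuracy instead of printing it.

```scala
// Hypothetical generic grid search: evaluate every candidate and keep the best one.
// `score` is assumed to return a value where higher is better (e.g. test accuracy).
def gridSearch[A](candidates: Seq[A])(score: A => Double): (A, Double) = {
  require(candidates.nonEmpty, "need at least one candidate")
  candidates.map(c => (c, score(c))).maxBy(_._2)
}

// All combinations of two hyperparameters: (learningRate, hiddenLayerNodes).
val learningRates = Seq(1e-4, 6e-4, 1e-2)
val hiddenSizes = Seq(100, 500, 1000)
val grid = for (lr <- learningRates; h <- hiddenSizes) yield (lr, h)

// Toy stand-in for "train and evaluate"; it peaks at lr = 6e-4, h = 500.
def toyScore(params: (Double, Int)): Double = {
  val (lr, h) = params
  -math.pow(math.log10(lr) - math.log10(6e-4), 2) - math.pow((h - 500) / 500.0, 2)
}

val (best, bestScore) = gridSearch(grid)(toyScore)
println(s"best (learningRate, hiddenLayerNodes) = $best, score = $bestScore")
```

The number of combinations grows multiplicatively with each hyperparameter added (3 × 3 = 9 training runs here), so grids are usually kept small.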

### What happened here?

- When the learning rate was too low, the network couldn't learn fast enough: the score decreased, but extremely slowly.
- When the learning rate was too high, the network reached a lower score at first but couldn't get below it. At that stage, it's better to save a checkpoint of the network and continue training it with a lower learning rate.
- When the weights were initialized to zero, the network had no good starting point for training. With identical (zero) weights, every hidden unit computes the same output and receives the same gradient, so the units cannot learn different features, and training struggles to lower the score.
- The last configuration (betterUpdater: learning rate 0.0006, Xavier initialization and the Adam updater) worked well with our network, giving better accuracy along with faster training.
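
The advice for the too-high learning rate (checkpoint, then keep training with a lower rate) can be automated with a learning-rate schedule. A minimal plain-Scala sketch of step decay follows; the `stepDecay` function, the decay factor and the step interval are illustrative assumptions, not values taken from DL4J, which has its own learning-rate schedule settings.

```scala
// Step decay: multiply the learning rate by `decayFactor` every `epochsPerStep` epochs.
def stepDecay(initialRate: Double, decayFactor: Double, epochsPerStep: Int)(epoch: Int): Double =
  initialRate * math.pow(decayFactor, (epoch / epochsPerStep).toDouble)

// Start at 0.01 and halve the rate every 2 epochs.
val schedule: Int => Double = stepDecay(0.01, 0.5, 2)
val rates = (0 until 6).map(schedule)
println(rates.mkString(", "))
```

Large steps early on make fast progress, and the smaller steps later let the network settle into a lower score instead of bouncing around it.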




### What's next?

- Check out all of our tutorials available [on GitHub](https://github.com/deeplearning4j/deeplearning4j/tree/master/dl4j-examples/tutorials). Notebooks are numbered so they are easy to follow in order.