## Background

When training on large-scale data, a training set can contain millions of samples. If every parameter update requires a computation over the whole training set, updates become too slow. A commonly used remedy is [Mini-Batch Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent), which computes each update from a small batch of samples drawn from the training set, so the parameters of a neural network are trained faster.
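
To make the idea concrete, here is a minimal sketch of mini-batch gradient descent in plain Scala, independent of DeepLearning.scala. It fits a one-parameter model `y = w * x` to toy data. The names `fitSlope`, `batchSize` and `learningRate`, as well as the toy data, are assumptions made for illustration and are not part of this tutorial's code.

```scala
import scala.util.Random

// Hypothetical toy example: fit y = w * x by mini-batch gradient descent.
// Each update uses the mean gradient of one mini-batch, not of the whole data set.
def fitSlope(xs: Vector[Double],
             ys: Vector[Double],
             batchSize: Int,
             epochs: Int,
             learningRate: Double,
             random: Random): Double = {
  var w = 0.0
  for (_ <- 0 until epochs) {
    // Visit the samples in a new random order in every epoch
    val shuffled = random.shuffle(xs.indices.toVector)
    for (batch <- shuffled.grouped(batchSize)) {
      // Gradient of the mean squared error (w * x - y)^2 with respect to w
      val gradient =
        batch.map(i => 2.0 * (w * xs(i) - ys(i)) * xs(i)).sum / batch.size
      w -= learningRate * gradient
    }
  }
  w
}

// Toy data generated from y = 3 * x, so the fitted slope should approach 3
val toyXs: Vector[Double] = Vector.tabulate(100)(i => i / 100.0)
val toyYs: Vector[Double] = toyXs.map(_ * 3.0)
val fittedSlope: Double =
  fitSlope(toyXs, toyYs, batchSize = 10, epochs = 200, learningRate = 0.1, random = new Random(0))
```

With 100 samples and a batch size of 10, every epoch performs 10 parameter updates instead of the single update that full-batch gradient descent would make.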

In this article, we will first define a softmax classifier, then train it on the [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) training set, and finally verify its accuracy on the test set. The difference from the previous article is that we will use Mini-Batch Gradient Descent, with which the accuracy of the neural network can reach about 40%.

## Import dependencies & build your own neural network

As in the [previous course](http://deeplearning.thoughtworks.school/demo/2.0.0-Preview/SoftmaxLinearClassifier.html), we first need to import the DeepLearning.scala classes that we will use.

In [1]:
import $plugin.$ivy.`org.scalamacros:paradise_2.11.11:2.1.0`
import $ivy.`com.thoughtworks.deeplearning::jupyter-differentiable:2.0.0-M1`
import $ivy.`org.nd4j:nd4j-native-platform:0.7.2`
import $ivy.`org.rauschig:jarchivelib:0.5.0`
import $ivy.`org.plotly-scala::plotly-jupyter-scala:0.3.2`
import $url.{`https://raw.githubusercontent.com/ThoughtWorksInc/DeepLearning.scala-website/master/ipynbs/ReadCIFAR10ToNDArray.sc` => ReadCIFAR10ToNDArray}
import $url.{`https://raw.githubusercontent.com/ThoughtWorksInc/DeepLearning.scala-website/master/ipynbs/Utils.sc` => Utils}


import java.io.{FileInputStream, InputStream}

import com.thoughtworks.deeplearning.math._
import com.thoughtworks.deeplearning.jupyter.differentiable.Any._
import com.thoughtworks.deeplearning.jupyter.differentiable.INDArray.{
  Optimizer => INDArrayOptimizer
}
import INDArrayOptimizer.LearningRate
import com.thoughtworks.deeplearning.jupyter.differentiable.INDArray.implicits._
import com.thoughtworks.each.Monadic._
import com.thoughtworks.raii.asynchronous.Do
import com.thoughtworks.deeplearning.jupyter.differentiable.Double._
import com.thoughtworks.deeplearning.jupyter.differentiable.Double.implicits._
import com.thoughtworks.deeplearning.Tape
import com.thoughtworks.deeplearning.jupyter.differentiable
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j
import org.nd4s.Implicits._
import scala.concurrent.ExecutionContext.Implicits.global
import scalaz.concurrent.Task
import scalaz.{-\/, \/, \/-}
import scalaz.std.vector._
import shapeless._
import plotly._
import plotly.element._
import plotly.layout._
import plotly.JupyterScala._

import scala.collection.immutable.IndexedSeq

pprintConfig() = pprintConfig().copy(height = 2)

implicit def optimizer: INDArrayOptimizer = new LearningRate {
  def currentLearningRate() = 0.00001
}

def softmax(scores: differentiable.INDArray): differentiable.INDArray = {
  val expScores = exp(scores)
  expScores / sum(expScores, 1)
}

// The 10 labels of CIFAR10 images (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
val NumberOfClasses: Int = 10
// Each image is 32 x 32 pixels with 3 color channels
val NumberOfPixels: Int = 3072

val weight: differentiable.INDArray =
  (Nd4j.randn(NumberOfPixels, NumberOfClasses) * 0.001).toWeight

def myNeuralNetwork(input: INDArray): differentiable.INDArray = {
  softmax(dot(input, weight))
}

def lossFunction(input: INDArray,
                 expectOutput: INDArray): differentiable.Double = {
  val probabilities = myNeuralNetwork(input)
  -mean(log(probabilities) * expectOutput)
}
             
plotly.JupyterScala.init()
def polyLoss(lossSeq: IndexedSeq[Double]): Unit = {
  plotly.JupyterScala.init()

  val plot = Seq(
    Scatter(lossSeq.indices, lossSeq)
  )

  plot.plot(
    title = "loss by time"
  )
}


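The `softmax` and `lossFunction` definitions in the cell above can be illustrated for a single sample with plain Scala collections. This is only a sketch: `softmaxRow` and `crossEntropyMean` are hypothetical names, and subtracting the maximum score before `exp` is a numerical-stability step added here that the notebook's `softmax` does not perform.

```scala
// Plain-Scala sketch of softmax for one row of class scores.
// `softmaxRow` is a hypothetical name used only for this illustration.
def softmaxRow(scores: Vector[Double]): Vector[Double] = {
  // Subtracting the maximum score leaves the result unchanged but avoids overflow in exp
  val maxScore = scores.max
  val expScores = scores.map(s => math.exp(s - maxScore))
  val total = expScores.sum
  expScores.map(_ / total)
}

// Mirrors -mean(log(probabilities) * expectOutput): the mean runs over every class entry
def crossEntropyMean(probabilities: Vector[Double], oneHot: Vector[Double]): Double = {
  val terms = probabilities.zip(oneHot).map { case (p, t) => math.log(p) * t }
  -terms.sum / terms.size
}

// All-zero scores give a uniform prediction over the 10 classes
val uniformProbabilities: Vector[Double] = softmaxRow(Vector.fill(10)(0.0))
val labelThree: Vector[Double] = Vector.tabulate(10)(i => if (i == 3) 1.0 else 0.0)
val initialLoss: Double = crossEntropyMean(uniformProbabilities, labelThree)
```

Because the mean is taken over all 10 class entries, a uniform prediction gives a loss of ln(10) / 10, roughly 0.23. With the small initial weights used here the first predictions should be close to uniform, which is consistent in scale with the loss values around 0.2 printed during training.
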
### Shuffle the sample order once per [epoch](http://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks) and build random mini-batches

In [2]:
@monadic[Task]
val trainTask: Task[Unit] = {

  val random = new scala.util.Random

  val MiniBatchSize = 256

  val lossSeq =
    (
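      // `0 to 50` is inclusive, so this runs 51 epochs; each one is a pass over the 10000 sample indices
      for (iteration <- (0 to 50).toVector) yield {
        // Reshuffle the indices at the start of every epoch
        val randomIndex = random
          .shuffle[Int, IndexedSeq](0 until 10000) //https://issues.scala-lang.org/browse/SI-6948
          .toArray
        // Integer division yields 39 full mini-batches; the last 16 shuffled indices are skipped
        for (times <- (0 until 10000 / MiniBatchSize).toVector) yield {
          // Indices of the samples that form the current mini-batch
          val randomIndexArray =
            randomIndex.slice(times * MiniBatchSize,
                              (times + 1) * MiniBatchSize)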
          val trainNDArray :: expectLabel :: shapeless.HNil =
            ReadCIFAR10ToNDArray.getSGDTrainNDArray(randomIndexArray)
          val input =
            trainNDArray.reshape(MiniBatchSize, 3072)

          val expectLabelVectorized =
            Utils.makeVectorized(expectLabel, NumberOfClasses)
          val loss = train(lossFunction(input, expectLabelVectorized)).each
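          // Report the loss of the fourth mini-batch in every fifth epoch
          // (the printed counter advances once per 5 epochs)
          if (times == 3 && iteration % 5 == 4) {
            println("at epoch " + (iteration / 5 + 1) + " loss is :" + loss)
          }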
          loss
        }
      }
    ).flatten

  polyLoss(lossSeq)
}

trainTask: Task[Unit] = scalaz.concurrent.Task@396c2d6b

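The index handling in the training cell above can be looked at in isolation. The following sketch uses only the Scala standard library; `miniBatchIndices` is a hypothetical helper and not part of DeepLearning.scala or of the notebook's `Utils`.

```scala
import scala.util.Random

// Sketch of the per-epoch index handling: shuffle once, then cut into full mini-batches.
// `miniBatchIndices` is a hypothetical helper written for this illustration.
def miniBatchIndices(sampleCount: Int, batchSize: Int, random: Random): Vector[Vector[Int]] = {
  val shuffled = random.shuffle((0 until sampleCount).toVector)
  // Integer division drops the incomplete batch at the end, as the training loop does
  val batchCount = sampleCount / batchSize
  Vector.tabulate(batchCount) { b =>
    shuffled.slice(b * batchSize, (b + 1) * batchSize)
  }
}

// Same sizes as in the training cell: 10000 indices, mini-batches of 256
val epochBatches: Vector[Vector[Int]] = miniBatchIndices(10000, 256, new Random(42))
val usedSamples: Int = epochBatches.map(_.size).sum
```

With 10000 indices and a batch size of 256 this yields 39 mini-batches covering 9984 samples. The remaining 16 indices are skipped in that epoch, and because the order is reshuffled every epoch, different samples are skipped each time.
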
## Prepare and process the test set

As in [the previous article](http://deeplearning.thoughtworks.school/demo/2.0.0-Preview/SoftmaxLinearClassifier.html), we read the test images and their labels from the CIFAR10 dataset and preprocess them. This time, however, only the test set is loaded up front; the training data is read in random mini-batches during training.

In [3]:
val testNDArray =
   ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/test_batch.bin", 100)

val testData = testNDArray.head

val testExpectResult = testNDArray.tail.head

val vectorizedTestExpectResult = Utils.makeVectorized(testExpectResult, NumberOfClasses)

testNDArray: INDArray :: INDArray :: HNil = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testData: INDArray = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testExpectResult: INDArray = [3.00, 8.00, 8.00, 0.00, 6.00, 6.00, 1.00, 6.00, 3.00, 1.00, 0.00, 9.00, 5.00, 7.00, 9.00, 8.00, 5.00, 7.00, 8.00, 6.00, 7.00, 0.00, 4.00, 9.00, 5.00, 2.00, 4.0...
vectorizedTestExpectResult: INDArray = [[0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
 [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],
...

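Judging from the output above, where the labels 3 and 8 become rows with a single 1.00 in column 3 and column 8, `Utils.makeVectorized` appears to one-hot encode the labels. A minimal sketch of such an encoding in plain Scala follows; `oneHot` is a hypothetical stand-in and may differ from what `Utils.makeVectorized` does internally.

```scala
// Sketch of one-hot encoding: each label becomes a row with a single 1.0
// at the label's index. `oneHot` is a hypothetical name for this illustration.
def oneHot(labels: Vector[Int], numberOfClasses: Int): Vector[Vector[Double]] =
  labels.map { label =>
    Vector.tabulate(numberOfClasses)(i => if (i == label) 1.0 else 0.0)
  }

// The first two test labels in the output above are 3 and 8
val encoded: Vector[Vector[Double]] = oneHot(Vector(3, 8), 10)
```

Encoding the labels this way gives them the same shape as the softmax output, so the loss can multiply the two element by element.
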
## Train & Predict your Neural Network

In [4]:
val predictResult = throwableMonadic[Task] {
  trainTask.each
  predict(myNeuralNetwork(testData)).each
}

predictResult: Task[Tape.<refinement>.this.type.Data] = scalaz.concurrent.Task@4c5663bd

## Verify the accuracy

Just as in the previous article, we use the test data to check the network's predictions and compute the accuracy. This time the accuracy may rise to around 40%; the run recorded below reached 37.0%.

In [5]:
predictResult.unsafePerformSyncAttempt match {
  case -\/(e) => {
    throw e
  }
  case \/-(result) =>
    println("The accuracy is " + Utils.getAccuracy(result,testExpectResult) + "%")
}

at epoch 1 loss is :0.2140218734741211
at epoch 2 loss is :0.19828615188598633
at epoch 3 loss is :0.1983615279197693
at epoch 4 loss is :0.19120612144470214
at epoch 5 loss is :0.19014278650283814
at epoch 6 loss is :0.19041664600372316
at epoch 7 loss is :0.17840851545333863
at epoch 8 loss is :0.18262848854064942
at epoch 9 loss is :0.18263672590255736
at epoch 10 loss is :0.189388644695282


The accuracy is 37.0%
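
The accuracy figure can be understood as the share of test samples whose most probable predicted class equals the true label. The sketch below shows that computation on plain Scala collections; `accuracyPercent` is a hypothetical helper and may differ from what `Utils.getAccuracy` does internally.

```scala
// Sketch of an accuracy computation over softmax outputs.
// `accuracyPercent` is a hypothetical name for this illustration.
def accuracyPercent(probabilities: Vector[Vector[Double]], labels: Vector[Int]): Double = {
  require(probabilities.size == labels.size, "one label per prediction row is required")
  // The predicted class is the index of the largest probability in each row
  val predicted = probabilities.map(row => row.indexOf(row.max))
  val correct = predicted.zip(labels).count { case (p, l) => p == l }
  100.0 * correct / labels.size
}

// Four made-up predictions over three classes; rows 1 and 2 match their labels
val sampleAccuracy: Double = accuracyPercent(
  Vector(
    Vector(0.1, 0.7, 0.2),
    Vector(0.5, 0.3, 0.2),
    Vector(0.2, 0.2, 0.6),
    Vector(0.6, 0.3, 0.1)
  ),
  Vector(1, 0, 1, 2)
)
```

In this made-up example two of the four predictions are correct, so the helper returns 50.0.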


## Summary

In this article, we have learned about:

* Mini-Batch Gradient Descent
* Epochs

[Source code](https://github.com/izhangzhihao/deeplearning-tutorial/blob/2.0.x/src/main/scala/com/github/izhangzhihao/MiniBatchGradientDescent.scala)