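## Background

Large-scale training sets can contain millions of samples. Computing over the entire training set to obtain a single parameter update is too slow. A common remedy is to compute the gradient on a small batch (mini-batch) of the training set and update the network parameters after each batch. In this section we use [Mini-Batch Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to update the network parameters quickly on randomly chosen mini-batches, which brings the network's accuracy to around 40%.

Reference:

[Mini-Batch Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent): each parameter update is computed from a small batch of the training set rather than from the whole set, which speeds up parameter updates.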
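
To make the idea concrete, here is a minimal sketch of mini-batch gradient descent in plain Scala, independent of DeepLearning.scala. The toy problem (recovering the slope of `y = 2x`), the helper name `fitSlope`, and all hyperparameters are assumptions chosen for illustration; none of them come from this tutorial's code.

```scala
import scala.util.Random

// Hypothetical helper: fits w in y = w * x by mini-batch gradient descent.
def fitSlope(xs: Vector[Double],
             ys: Vector[Double],
             batchSize: Int,
             learningRate: Double,
             epochs: Int,
             rng: Random): Double = {
  var w = 0.0
  for (_ <- 0 until epochs) {
    // Shuffle once per epoch, then visit the data one mini-batch at a time.
    val shuffled = rng.shuffle(xs.indices.toVector)
    for (batch <- shuffled.grouped(batchSize)) {
      // Gradient of the mean squared error over this mini-batch only.
      val gradient =
        batch.map(i => 2.0 * (w * xs(i) - ys(i)) * xs(i)).sum / batch.size
      w -= learningRate * gradient // one parameter update per mini-batch
    }
  }
  w
}

// Toy data (assumed for illustration): 100 points on the line y = 2x.
val toyXs: Vector[Double] = Vector.tabulate(100)(i => i / 100.0)
val toyYs: Vector[Double] = toyXs.map(_ * 2.0)

val fittedSlope =
  fitSlope(toyXs, toyYs, batchSize = 10, learningRate = 0.5, epochs = 20, rng = new Random(42))
println(fittedSlope) // should end up very close to 2.0
```

Each update sees only 10 of the 100 points, so one pass over the data produces 10 parameter updates instead of one.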

## Importing dependencies

In [1]:
import $plugin.$ivy.`com.thoughtworks.implicit-dependent-type::implicit-dependent-type:2.0.0`

import $ivy.`com.thoughtworks.deeplearning::differentiableany:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablenothing:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableseq:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiabledouble:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablefloat:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablehlist:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablecoproduct:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableindarray:1.0.0-RC7`
import $ivy.`org.rauschig:jarchivelib:0.5.0`

import $ivy.`org.plotly-scala::plotly-jupyter-scala:0.3.0`

import java.io.{FileInputStream, InputStream}


import com.thoughtworks.deeplearning
import org.nd4j.linalg.api.ndarray.INDArray
import com.thoughtworks.deeplearning.DifferentiableHList._
import com.thoughtworks.deeplearning.DifferentiableDouble._
import com.thoughtworks.deeplearning.DifferentiableINDArray._
import com.thoughtworks.deeplearning.DifferentiableAny._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Optimizers._
import com.thoughtworks.deeplearning.Layer.Batch
import com.thoughtworks.deeplearning.Symbolic.Layers.Identity
import com.thoughtworks.deeplearning.Symbolic._
import com.thoughtworks.deeplearning.{
  DifferentiableHList,
  DifferentiableINDArray,
  Layer,
  Symbolic
}
import com.thoughtworks.deeplearning.Poly.MathFunctions._
import com.thoughtworks.deeplearning.Poly.MathMethods./
import com.thoughtworks.deeplearning.Poly.MathOps
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.cpu.nativecpu.NDArray
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.indexing.{INDArrayIndex, NDArrayIndex}
import org.nd4j.linalg.ops.transforms.Transforms
import org.nd4s.Implicits._
import shapeless._

import plotly._
import plotly.element._
import plotly.layout._
import plotly.JupyterScala._

import scala.collection.immutable.IndexedSeq
import scala.util.Random

pprintConfig() = pprintConfig().copy(height = 2) // show fewer output lines so the page does not get too long

import $file.ReadCIFAR10ToNDArray
import $file.Utils

[32mimport [39m[36m$plugin.$                                                                             

[39m
[32mimport [39m[36m$ivy.$                                                           
[39m
[32mimport [39m[36m$ivy.$                                                               
[39m
[32mimport [39m[36m$ivy.$                                                           
[39m
[32mimport [39m[36m$ivy.$                                                              
[39m
[32mimport [39m[36m$ivy.$                                                             
[39m
[32mimport [39m[36m$ivy.$                                                             
[39m
[32mimport [39m[36m$ivy.$                                                                 
[39m
[32mimport [39m[36m$ivy.$                                                                
[39m
[32mimport [39m[36m$ivy.$                               

[39m
[32mimport [39m[36m$ivy.$               

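## Preparing and processing the data

As in the [previous section](https://thoughtworksinc.github.io/DeepLearning.scala/demo/SoftmaxLinearClassifier.html), we read the images and their labels from the CIFAR10 database and process them. This time, however, we only load the test data up front; the training data is read in random order during training.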

In [2]:
// CIFAR10 images fall into 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
val NumberOfClasses: Int = 10

// Load the test data; we read 100 records to use as the test set
val testNDArray =
   ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/test_batch.bin", 100)

val testData = testNDArray.head

val testExpectResult = testNDArray.tail.head

val vectorizedTestExpectResult = Utils.makeVectorized(testExpectResult, NumberOfClasses)

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.


NumberOfClasses: Int = 10
testNDArray: INDArray :: INDArray :: HNil = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testData: INDArray = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testExpectResult: INDArray = [3.00, 8.00, 8.00, 0.00, 6.00, 6.00, 1.00, 6.00, 3.00, 1.00, 0.00, 9.00, 5.00, 7.00, 9.00, 8.00, 5.00, 7.00, 8.00, 6.00, 7.00, 0.00, 4.00, 9.00, 5.00, 2.00, 4.0...
vectorizedTestExpectResult: INDArray = [[0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
 [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],
...

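## Building the neural network

As in the previous section, we write the softmax function, set the learning rate, initialize the weight, and write the loss function.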

In [3]:
def softmax(implicit scores: INDArray @Symbolic): INDArray @Symbolic = {
  val expScores = exp(scores)
  expScores / expScores.sum(1)
}

implicit def optimizer: Optimizer = new LearningRate {
  def currentLearningRate() = 0.00001
}

def createMyNeuralNetwork(implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  val initialValueOfWeight = Nd4j.randn(3072, NumberOfClasses) * 0.001
  val weight: INDArray @Symbolic = initialValueOfWeight.toWeight
  val result: INDArray @Symbolic = input dot weight
  softmax.compose(result)
}
val myNeuralNetwork = createMyNeuralNetwork

def lossFunction(implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val input = pair.head
  val expectedOutput = pair.tail.head
  val probabilities = myNeuralNetwork.compose(input)

  -(expectedOutput * log(probabilities)).mean // corresponds to the cross-entropy loss in the preparation section
}

defined [32mfunction[39m [36msoftmax[39m
defined [32mfunction[39m [36moptimizer[39m
defined [32mfunction[39m [36mcreateMyNeuralNetwork[39m
[36mmyNeuralNetwork[39m: ([32mSymbolic[39m.[32mTo[39m[[32mINDArray[39m]{type OutputData = org.nd4j.linalg.api.ndarray.INDArray;type OutputDelta = org.nd4j.linalg.api.ndarray.INDArray;type InputData = org.nd4j.linalg.api.ndarray.INDArray;type InputDelta = org.nd4j.linalg.api.ndarray.INDArray})#[32m@[39m = Compose(MultiplyINDArray(Exp(Identity()),Reciprocal(Sum(Exp(Identity()),WrappedArray(1)))),Dot(Identity(),Weight([[-0.00, -0.00, -0.00, 0.00, -0.00, 0.00, 0.00,[33m...[39m
defined [32mfunction[39m [36mlossFunction[39m
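
To see what `softmax` and `lossFunction` compute numerically, the following plain-Scala sketch works on ordinary vectors instead of `INDArray @Symbolic` values. The helper names `softmaxRow` and `crossEntropy` and the sample scores are assumptions made for illustration; they mirror the formulas above but are not the library's implementation.

```scala
// Softmax for one row of scores: exponentiate, then divide by the row sum.
def softmaxRow(scores: Vector[Double]): Vector[Double] = {
  val expScores = scores.map(math.exp)
  val total = expScores.sum
  expScores.map(_ / total)
}

// Cross-entropy for one sample with a one-hot target: -sum(target * log(p)).
def crossEntropy(target: Vector[Double], probabilities: Vector[Double]): Double =
  -target.zip(probabilities).map { case (t, p) => t * math.log(p) }.sum

// Illustrative scores for a 3-class problem where class 2 is the correct one.
val exampleProbabilities = softmaxRow(Vector(1.0, 2.0, 3.0))
val exampleLoss = crossEntropy(Vector(0.0, 0.0, 1.0), exampleProbabilities)

// An untrained 10-class model predicts roughly 0.1 for every class.
// Dividing by 10 imitates taking the mean over all class entries.
val uniformTarget = Vector.tabulate(10)(i => if (i == 3) 1.0 else 0.0)
val uniformLoss = crossEntropy(uniformTarget, Vector.fill(10)(0.1)) / 10
println(uniformLoss) // ln(10) / 10
```

Because `lossFunction` applies `.mean` to the whole batch-by-class matrix rather than summing per sample, a network with near-uniform predictions should give a loss near ln(10) / 10 ≈ 0.230, which appears consistent with the first value recorded in `lossSeq` later in this notebook.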

## Mini-Batch Gradient Descent

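As in the previous section, we need to train the neural network. The difference is that this time the training data is read in random order, whereas the previous section trained repeatedly on the same batch of data. We train the network and watch how the loss changes after each training step: the overall trend is downward, but not every step lowers it (the future is bright, but the road is winding).

### Reading and processing data from a random index array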

In [4]:
val MiniBatchSize = 256

def trainData(randomIndexArray: Array[Int]): Double = {
  val trainNDArray :: expectLabel :: shapeless.HNil =
    ReadCIFAR10ToNDArray.getSGDTrainNDArray(randomIndexArray)

  val input =
    trainNDArray.reshape(MiniBatchSize, 3072)

  val expectLabelVectorized =
    Utils.makeVectorized(expectLabel, NumberOfClasses)

  lossFunction.train(input :: expectLabelVectorized :: HNil)
}

MiniBatchSize: Int = 256
defined function trainData
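
`Utils.makeVectorized` lives in a separate file and is not shown in this section. Judging from the output printed earlier, where the labels 3 and 8 become rows with a single 1.00 at index 3 and index 8, it turns each label into a one-hot row. A minimal plain-Scala sketch of that encoding, using a hypothetical `oneHot` helper rather than the tutorial's actual implementation, might look like this:

```scala
// Hypothetical stand-in for a one-hot encoder; not the tutorial's Utils.makeVectorized.
def oneHot(labels: Seq[Int], numberOfClasses: Int): Vector[Vector[Double]] =
  labels.toVector.map { label =>
    // A row of zeros with a single 1.0 at the position of the label.
    Vector.tabulate(numberOfClasses)(i => if (i == label) 1.0 else 0.0)
  }

// The first two test labels shown earlier are 3 and 8.
val encodedLabels = oneHot(Seq(3, 8), 10)
encodedLabels.foreach(row => println(row.mkString("[", ", ", "]")))
```

The one-hot form is what lets the loss multiply the expected output element-wise with the log-probabilities: only the entry of the correct class contributes.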

### Shuffle the index array once per [epoch](http://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks) and train the network on the shuffled indices

In [5]:
val random = new Random

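val lossSeq =
  (
    // 51 passes over the data (0 to 50 is inclusive)
    for (iteration <- 0 to 50) yield {
      // Reshuffle the 10000 sample indices at the start of every epoch
      val randomIndex = random
        .shuffle[Int, IndexedSeq](0 until 10000) //https://issues.scala-lang.org/browse/SI-6948
        .toArray
      // 10000 / 256 = 39 full mini-batches; the last 16 shuffled indices are skipped
      for (times <- 0 until 10000 / MiniBatchSize) yield {
        val randomIndexArray =
          randomIndex.slice(times * MiniBatchSize,
                            (times + 1) * MiniBatchSize)
        val loss = trainData(randomIndexArray)
        // Log the loss of the 4th mini-batch once every 5 epochs
        if (times == 3 && iteration % 5 == 4) {
          println("at epoch " + (iteration / 5 + 1) + " loss is :" + loss)
        }
        loss
      }
    }
  ).flatten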

val plot = Seq(
  Scatter(lossSeq.indices, lossSeq)
)

plot.plot(
  title = "loss by time"
)

at epoch 1 loss is :0.21111876964569093
at epoch 2 loss is :0.2084895133972168
at epoch 3 loss is :0.19478811025619508
at epoch 4 loss is :0.1909475326538086
at epoch 5 loss is :0.1919918179512024
at epoch 6 loss is :0.18776063919067382
at epoch 7 loss is :0.18520112037658693
at epoch 8 loss is :0.184558641910553
at epoch 9 loss is :0.1930071473121643
at epoch 10 loss is :0.19292012453079224


random: Random = scala.util.Random@6f3e5371
lossSeq: IndexedSeq[Double] = Vector(
  0.22984604835510253,
...
plot: Seq[Scatter] = List(
  Scatter(
...
res4_3: String = "plot-870570541"
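
The index bookkeeping in the loop above can be checked in isolation. The sketch below uses a hypothetical helper, `miniBatchIndices`, and a fixed seed (both assumptions for illustration) to shuffle 10000 indices and cut them into mini-batches of 256 in the same way, without touching the network.

```scala
import scala.util.Random

// Hypothetical helper: shuffle the sample indices, then cut them into full mini-batches.
def miniBatchIndices(datasetSize: Int, batchSize: Int, rng: Random): Vector[Vector[Int]] = {
  val shuffled = rng.shuffle((0 until datasetSize).toVector)
  // Integer division keeps only the full batches, as the training loop does.
  Vector.tabulate(datasetSize / batchSize) { k =>
    shuffled.slice(k * batchSize, (k + 1) * batchSize)
  }
}

val epochBatches = miniBatchIndices(10000, 256, new Random(0))
val skippedSamples = 10000 - epochBatches.map(_.size).sum
println(s"${epochBatches.size} batches per epoch, $skippedSamples samples skipped")
```

With these sizes every epoch performs 39 parameter updates and leaves 16 samples unused; since the indices are reshuffled each epoch, a different set of 16 is skipped each time. Over the 51 epochs of the loop this gives 51 × 39 = 1989 loss values in `lossSeq`.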

## Testing the neural network

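As in the previous section, we use the test data to check the network's predictions and compute the accuracy. The accuracy should be higher this time, ending up at around 40%.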

In [6]:
val right = Utils.getAccuracy(myNeuralNetwork.predict(testData), testExpectResult)
println(s"the result is $right %")

the result is 38.0 %


right: Double = 38.0
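
`Utils.getAccuracy` is defined elsewhere and is not shown here. A common way to compute such a figure, sketched below under that assumption with a hypothetical `accuracyPercent` helper, is to take the class with the highest predicted probability for each sample and count how often it matches the label.

```scala
// Hypothetical stand-in for an accuracy helper; not the tutorial's Utils.getAccuracy.
def accuracyPercent(probabilities: Seq[Seq[Double]], labels: Seq[Int]): Double = {
  // Predicted class = index of the largest probability in each row.
  val predicted = probabilities.map(row => row.indexOf(row.max))
  val correct = predicted.zip(labels).count { case (p, l) => p == l }
  100.0 * correct / labels.size
}

// Four made-up predictions over three classes, with their true labels.
val sampleProbabilities = Seq(
  Seq(0.1, 0.7, 0.2), // predicts class 1
  Seq(0.8, 0.1, 0.1), // predicts class 0
  Seq(0.3, 0.3, 0.4), // predicts class 2
  Seq(0.2, 0.5, 0.3)  // predicts class 1
)
val sampleAccuracy = accuracyPercent(sampleProbabilities, Seq(1, 0, 1, 1))
println(sampleAccuracy) // 75.0: three of the four predictions match
```

Since the test set here holds 100 records, a result of 38.0 % corresponds to 38 correct predictions.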

## Summary

In this section we learned:

* Mini-Batch Gradient Descent


[Complete code](https://github.com/izhangzhihao/deeplearning-tutorial/blob/master/src/main/scala/com/thoughtworks/deeplearning/tutorial/MiniBatchGradientDescent.scala)