# Mini-Batch Gradient Descent

## Background

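When training on large-scale data, the dataset can contain millions of examples. Computing over the entire training set just to obtain a single parameter update is too slow. A common approach is to compute the gradient on small batches (mini-batches) of the training set, so that the network parameters are updated far more often. In this section we use [Mini-Batch Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to update the network parameters quickly from randomly drawn mini-batches. With this approach the accuracy of the neural network can reach about 40%.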

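Reference:

[Mini-Batch Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent): When training on large-scale data, the dataset can contain millions of examples. Computing over the entire training set to obtain just one parameter update is too slow. A common approach is to compute on small batches of the training set to speed up parameter updates.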
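
To make the idea concrete, here is a minimal sketch of mini-batch gradient descent on a toy problem, written in plain Scala without DeepLearning.scala. Everything in it is an illustrative assumption rather than part of this tutorial's code: the object name `MiniBatchSketch`, the toy data `y = 2 * x`, and the helpers `batchGradient` and `fit`. Each update uses the gradient of only one small batch instead of the whole dataset.

```scala
import scala.util.Random

// Hypothetical toy example, not part of the tutorial's code.
object MiniBatchSketch {
  // Toy data: y = 2 * x for x = 0.00, 0.01, ..., 0.99
  val xs: Vector[Double] = Vector.tabulate(100)(i => i / 100.0)
  val ys: Vector[Double] = xs.map(_ * 2.0)

  /** Gradient of the mean squared error of the model y = w * x on one batch of indices. */
  def batchGradient(w: Double, batch: Seq[Int]): Double =
    batch.map(i => 2.0 * (w * xs(i) - ys(i)) * xs(i)).sum / batch.size

  /** Runs `epochs` passes; each pass shuffles the indices and updates w once per mini-batch. */
  def fit(epochs: Int, batchSize: Int, learningRate: Double, seed: Long): Double = {
    val random = new Random(seed)
    var w = 0.0
    for (_ <- 0 until epochs) {
      val shuffled = random.shuffle(xs.indices.toVector)
      for (batch <- shuffled.grouped(batchSize)) {
        w -= learningRate * batchGradient(w, batch)
      }
    }
    w
  }
}
```

Calling `MiniBatchSketch.fit(epochs = 50, batchSize = 10, learningRate = 0.5, seed = 42L)` performs 10 updates per pass over the data, and the returned `w` should end up very close to 2.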

## Building the neural network

In [15]:
import $plugin.$ivy.`com.thoughtworks.implicit-dependent-type::implicit-dependent-type:2.0.0`

import $ivy.`com.thoughtworks.deeplearning::differentiableany:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablenothing:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableseq:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiabledouble:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablefloat:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablehlist:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablecoproduct:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableindarray:1.0.0-RC7`
import $ivy.`org.rauschig:jarchivelib:0.5.0`

import $ivy.`org.plotly-scala::plotly-jupyter-scala:0.3.0`

import java.io.{FileInputStream, InputStream}


import com.thoughtworks.deeplearning
import org.nd4j.linalg.api.ndarray.INDArray
import com.thoughtworks.deeplearning.DifferentiableHList._
import com.thoughtworks.deeplearning.DifferentiableDouble._
import com.thoughtworks.deeplearning.DifferentiableINDArray._
import com.thoughtworks.deeplearning.DifferentiableAny._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Optimizers._
import com.thoughtworks.deeplearning.Layer.Batch
import com.thoughtworks.deeplearning.Symbolic.Layers.Identity
import com.thoughtworks.deeplearning.Symbolic._
import com.thoughtworks.deeplearning.{
  DifferentiableHList,
  DifferentiableINDArray,
  Layer,
  Symbolic
}
import com.thoughtworks.deeplearning.Poly.MathFunctions._
import com.thoughtworks.deeplearning.Poly.MathMethods./
import com.thoughtworks.deeplearning.Poly.MathOps
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.cpu.nativecpu.NDArray
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.indexing.{INDArrayIndex, NDArrayIndex}
import org.nd4j.linalg.ops.transforms.Transforms
import org.nd4s.Implicits._
import shapeless._

import plotly._
import plotly.element._
import plotly.layout._
import plotly.JupyterScala._

import scala.collection.immutable.IndexedSeq
import scala.util.Random

pprintConfig() = pprintConfig().copy(height = 2) // reduce the number of output lines so the page output does not get too long

import $file.ReadCIFAR10ToNDArray
import $file.Utils

Compiling ReadCIFAR10ToNDArray.sc
Compiling Utils.sc



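As in the [previous section](https://thoughtworksinc.github.io/DeepLearning.scala/demo/SoftmaxLinearClassifier.html), we read and process the test images and their labels from the CIFAR10 database. This time, however, we only read the test data here; the training data is read randomly during training.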

In [16]:
// The images in CIFAR10 fall into 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
val NumberOfClasses: Int = 10

// Load the test data; we read 100 records as test data
val testNDArray =
   ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/test_batch.bin", 100)

val testData = testNDArray.head

val testExpectResult = testNDArray.tail.head

val vectorizedTestExpectResult = Utils.makeVectorized(testExpectResult, NumberOfClasses)

NumberOfClasses: Int = 10
testNDArray: INDArray :: INDArray :: HNil = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testData: INDArray = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testExpectResult: INDArray = [3.00, 8.00, 8.00, 0.00, 6.00, 6.00, 1.00, 6.00, 3.00, 1.00, 0.00, 9.00, 5.00, 7.00, 9.00, 8.00, 5.00, 7.00, 8.00, 6.00, 7.00, 0.00, 4.00, 9.00, 5.00, 2.00, 4.0...
vectorizedTestExpectResult: INDArray = [[0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
 [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],
...

As in the previous section, we need to write the softmax function, set the learning rate, initialize the weight, and write the loss function.

In [17]:
def softmax(implicit scores: INDArray @Symbolic): INDArray @Symbolic = {
  val expScores = exp(scores)
  expScores / expScores.sum(1)
}

implicit def optimizer: Optimizer = new LearningRate {
  def currentLearningRate() = 0.00001
}

def createMyNeuralNetwork(implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  val initialValueOfWeight = Nd4j.randn(3072, NumberOfClasses) * 0.001
  val weight: INDArray @Symbolic = initialValueOfWeight.toWeight
  val result: INDArray @Symbolic = input dot weight
  softmax.compose(result)
}
val myNeuralNetwork = createMyNeuralNetwork

def lossFunction(implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val input = pair.head
  val expectedOutput = pair.tail.head
  val probabilities = myNeuralNetwork.compose(input)

  -(expectedOutput * log(probabilities)).mean // corresponds to the cross-entropy loss in the preparation section
}

defined [32mfunction[39m [36msoftmax[39m
defined [32mfunction[39m [36moptimizer[39m
defined [32mfunction[39m [36mcreateMyNeuralNetwork[39m
[36mmyNeuralNetwork[39m: ([32mSymbolic[39m.[32mTo[39m[[32mINDArray[39m]{type OutputData = org.nd4j.linalg.api.ndarray.INDArray;type OutputDelta = org.nd4j.linalg.api.ndarray.INDArray;type InputData = org.nd4j.linalg.api.ndarray.INDArray;type InputDelta = org.nd4j.linalg.api.ndarray.INDArray})#[32m@[39m = Compose(MultiplyINDArray(Exp(Identity()),Reciprocal(Sum(Exp(Identity()),WrappedArray(1)))),Dot(Identity(),Weight([[0.00, -0.00, -0.00, 0.00, 0.00, 0.00, -0.00, [33m...[39m
defined [32mfunction[39m [36mlossFunction[39m

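As in the previous section, we need to train the neural network. The difference is that this time the training data is read in random order, whereas the previous section trained repeatedly on the same batch of data. Train the network and watch how the loss changes from one round to the next: the overall trend is downward, but it does not decrease every time (the prospects are bright, but the road has twists and turns).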

In [18]:
val MiniBatchSize = 256

val random = new Random

val lossSeq =
  (
    for (_ <- 0 to 50) yield {
      val randomIndex = random
        .shuffle[Int, IndexedSeq](0 until 10000) //https://issues.scala-lang.org/browse/SI-6948
        .toArray
      for (times <- 0 until 10000 / MiniBatchSize) yield {
        val randomIndexArray =
          randomIndex.slice(times * MiniBatchSize,
                            (times + 1) * MiniBatchSize)
        val trainNDArray :: expectLabel :: shapeless.HNil =
          ReadCIFAR10ToNDArray.getSGDTrainNDArray(randomIndexArray)

        val input =
          trainNDArray.reshape(MiniBatchSize, 3072)

        val expectLabelVectorized =
          Utils.makeVectorized(expectLabel, NumberOfClasses)
        val loss = lossFunction.train(input :: expectLabelVectorized :: HNil)
        if (times == 0) {
          println(loss)
        }
        loss
      }
    }
  ).flatten

val plot = Seq(
  Scatter(lossSeq.indices, lossSeq)
)

plot.plot(
  title = "loss by time"
)

0.23026542663574218
0.2240816354751587
0.2184659481048584
0.21519021987915038
0.20921459197998046
0.20835773944854735
0.20596656799316407
0.20402472019195556
0.19586119651794434
0.20250749588012695
0.19943023920059205
0.19915710687637328
0.19605278968811035
0.19943703413009645
0.19809952974319459
0.2017662286758423
0.1932112455368042
0.19416286945343017
0.19546313285827638
0.19444431066513063
0.20109293460845948
0.20299816131591797
0.19243981838226318
0.19276008605957032
0.1847397804260254
0.18774752616882323
0.18981533050537108
0.19522281885147094
0.18960124254226685
0.18378946781158448
0.1919804334640503
0.19413912296295166
0.18835822343826295
0.19397008419036865
0.1892086982727051
0.18813304901123046
0.18986120223999023
0.190701687335968
0.19284905195236207
0.1895303726196289
0.18327500820159912
0.1910465955734253
0.18699681758880615
0.18618117570877074
0.18339040279388427
0.18660414218902588
0.18499038219451905
0.19101003408432007
0.18396372795104982
0.18879698514938353
0.185236930

MiniBatchSize: Int = 256
random: Random = scala.util.Random@236f974c
lossSeq: IndexedSeq[Symbolic.To.<refinement>.this.type.OutputData] = Vector(
  0.23026542663574218,
...
plot: Seq[Scatter] = List(
  Scatter(
...
res17_4: String = "plot-1408521785"
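
The index handling in the training loop above can be looked at in isolation. The following sketch uses only the Scala standard library. `BatchIndexSketch.epochBatches` is a hypothetical helper that is not part of the tutorial's code, and it mirrors the shuffle-then-slice pattern of the loop.

```scala
import scala.util.Random

// Hypothetical helper mirroring the shuffle-and-slice pattern of the training loop.
object BatchIndexSketch {
  /** Shuffles 0 until datasetSize and cuts it into full batches, dropping the remainder. */
  def epochBatches(datasetSize: Int, batchSize: Int, random: Random): Vector[Vector[Int]] = {
    val shuffled = random.shuffle((0 until datasetSize).toVector)
    (0 until datasetSize / batchSize).toVector.map { times =>
      shuffled.slice(times * batchSize, (times + 1) * batchSize)
    }
  }
}
```

With 10000 indices and a batch size of 256, integer division gives 39 full batches per pass, so 9984 samples are used and 16 are left out of each pass; because the order is reshuffled every time, a different set of 16 is usually skipped. The loop `0 to 50` runs 51 passes, which matches the 51 losses printed above (one per pass, when `times == 0`).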

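As in the previous section, we use the test data to inspect the network's predictions and compute the accuracy. This time the accuracy should improve; the final result is around 40%.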

In [19]:
val result = myNeuralNetwork.predict(testData)
println(s"result: $result") // print the predictions

val right = Utils.getAccuracy(result, testExpectResult)
println(s"the result is $right %")

result: [[0.08, 0.10, 0.14, 0.17, 0.06, 0.16, 0.13, 0.03, 0.10, 0.02],
 [0.06, 0.21, 0.02, 0.01, 0.00, 0.01, 0.01, 0.01, 0.26, 0.42],
 [0.13, 0.07, 0.03, 0.02, 0.01, 0.02, 0.01, 0.01, 0.55, 0.16],
 [0.25, 0.08, 0.09, 0.04, 0.02, 0.03, 0.01, 0.04, 0.38, 0.05],
 [0.04, 0.04, 0.15, 0.14, 0.20, 0.16, 0.13, 0.09, 0.03, 0.02],
 [0.01, 0.11, 0.06, 0.15, 0.08, 0.13, 0.32, 0.06, 0.01, 0.06],
 [0.01, 0.05, 0.05, 0.31, 0.03, 0.33, 0.15, 0.03, 0.02, 0.02],
 [0.03, 0.03, 0.19, 0.09, 0.19, 0.11, 0.19, 0.12, 0.03, 0.02],
 [0.06, 0.06, 0.18, 0.16, 0.13, 0.20, 0.08, 0.08, 0.03, 0.01],
 [0.18, 0.26, 0.05, 0.03, 0.01, 0.01, 0.01, 0.03, 0.24, 0.17],
 [0.25, 0.09, 0.06, 0.08, 0.04, 0.08, 0.03, 0.03, 0.29, 0.05],
 [0.02, 0.26, 0.02, 0.01, 0.01, 0.01, 0.02, 0.04, 0.06, 0.53],
 [0.03, 0.11, 0.10, 0.17, 0.12, 0.20, 0.16, 0.06, 0.04, 0.02],
 [0.07, 0.28, 0.05, 0.07, 0.05, 0.07, 0.11, 0.02, 0.18, 0.08],
 [0.12, 0.14, 0.08, 0.06, 0.04, 0.04, 0.04, 0.13, 0.17, 0.18],
 [0.15, 0.04, 0.14, 0.06, 0.08, 0.09, 0.05, 0.0

result: Symbolic.To.<refinement>.this.type.OutputData = [[0.08, 0.10, 0.14, 0.17, 0.06, 0.16, 0.13, 0.03, 0.10, 0.02],
 [0.06, 0.21, 0.02, 0.01, 0.00, 0.01, 0.01, 0.01, 0.26, 0.42],
...
right: Double = 38.0
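
One plausible way to turn these probabilities into an accuracy figure is to take the most probable class of each row and compare it with the expected label. The sketch below does that on plain Scala collections. `AccuracySketch.accuracyPercent` is a hypothetical name, and whether `Utils.getAccuracy` works exactly this way is an assumption.

```scala
// Hypothetical helper; assumed, not confirmed, to match what Utils.getAccuracy computes.
object AccuracySketch {
  /** Percentage of rows whose highest-probability class equals the expected label. */
  def accuracyPercent(probabilities: Vector[Vector[Double]], labels: Vector[Int]): Double = {
    val hits = probabilities.zip(labels).count { case (row, label) =>
      row.indexOf(row.max) == label
    }
    100.0 * hits / labels.length
  }
}
```

Applied to the first three rows printed above with their expected labels 3, 8 and 8, the most probable classes are 3, 9 and 8, so two of the three predictions are correct.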


[Full code](https://github.com/izhangzhihao/deeplearning-tutorial/blob/master/src/main/scala/com/thoughtworks/deeplearning/tutorial/MiniBatchGradientDescent.scala)