# Mini-Batch Gradient Descent

## Background

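When training on large-scale data, a dataset can contain millions of examples. Computing the gradient over the entire training set just to perform a single parameter update is far too slow. A common approach is to compute the gradient on a small batch (mini-batch) of training examples and use it to update the network parameters quickly with stochastic gradient descent. In this section we use [Mini-Batch Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to update the network parameters quickly from mini-batches of data.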
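The idea can be made concrete with a minimal plain-Scala sketch that uses neither DeepLearning.scala nor ND4J. It shows one common way to form mini-batches: shuffle the example indices once per pass over the data, then cut them into groups of `batchSize`. The object `MiniBatches` is a hypothetical name. The tutorial's own helper `getSGDTrainNDArray` may sample its batches differently.

```scala
import scala.util.Random

// Hypothetical helper: split one pass (epoch) over the data into shuffled mini-batches.
// Assumption: examples are addressed by their index 0 until numExamples.
object MiniBatches {
  /** Shuffles the indices 0 until numExamples and groups them into batches of batchSize.
    * The last batch is smaller when numExamples is not a multiple of batchSize. */
  def epochBatches(numExamples: Int, batchSize: Int, rng: Random): Seq[Seq[Int]] = {
    require(batchSize > 0, "batchSize must be positive")
    rng.shuffle((0 until numExamples).toVector).grouped(batchSize).toVector
  }
}
```

Take 1000 examples with `batchSize = 256`. One pass yields four batches of 256, 256, 256 and 232 indices. The parameters are therefore updated four times per pass instead of once.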

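Reference:

[Mini-Batch Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent): when the training set contains millions of examples, processing all of it for every single parameter update is too slow. A common remedy is to compute each update on a small batch of the training data.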

## Preparation

See the Preparation section of [SoftmaxLinearClassifier](https://thoughtworksinc.github.io/DeepLearning.scala/demo/SoftmaxLinearClassifier.html#准备工作).

## Building the Neural Network
Import the required dependencies

In [1]:
import $plugin.$ivy.`com.thoughtworks.implicit-dependent-type::implicit-dependent-type:2.0.0`

import $ivy.`com.thoughtworks.deeplearning::differentiableany:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablenothing:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableseq:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiabledouble:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablefloat:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablehlist:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablecoproduct:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableindarray:1.0.0-RC7`
import $ivy.`org.rauschig:jarchivelib:0.5.0`

import $ivy.`org.plotly-scala::plotly-jupyter-scala:0.3.0`

import java.io.{FileInputStream, InputStream}


import com.thoughtworks.deeplearning
import org.nd4j.linalg.api.ndarray.INDArray
import com.thoughtworks.deeplearning.DifferentiableHList._
import com.thoughtworks.deeplearning.DifferentiableDouble._
import com.thoughtworks.deeplearning.DifferentiableINDArray._
import com.thoughtworks.deeplearning.DifferentiableAny._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Optimizers._
import com.thoughtworks.deeplearning.Layer.Batch
import com.thoughtworks.deeplearning.Symbolic.Layers.Identity
import com.thoughtworks.deeplearning.Symbolic._
import com.thoughtworks.deeplearning.{
  DifferentiableHList,
  DifferentiableINDArray,
  Layer,
  Symbolic
}
import com.thoughtworks.deeplearning.Poly.MathFunctions._
import com.thoughtworks.deeplearning.Poly.MathMethods./
import com.thoughtworks.deeplearning.Poly.MathOps
import org.nd4j.linalg.cpu.nativecpu.NDArray
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.indexing.{INDArrayIndex, NDArrayIndex}
import org.nd4j.linalg.ops.transforms.Transforms
import org.nd4s.Implicits._
import shapeless._

import plotly._
import plotly.element._
import plotly.layout._
import plotly.JupyterScala._

import scala.collection.immutable.IndexedSeq

pprintConfig() = pprintConfig().copy(height = 5) // limit the number of output lines so the page does not get too long

import $file.ReadCIFAR10ToNDArray,ReadCIFAR10ToNDArray._
import $file.Utils,Utils._

[32mimport [39m[36m$plugin.$                                                                             

[39m
[32mimport [39m[36m$ivy.$                                                           
[39m
[32mimport [39m[36m$ivy.$                                                               
[39m
[32mimport [39m[36m$ivy.$                                                           
[39m
[32mimport [39m[36m$ivy.$                                                              
[39m
[32mimport [39m[36m$ivy.$                                                             
[39m
[32mimport [39m[36m$ivy.$                                                             
[39m
[32mimport [39m[36m$ivy.$                                                                 
[39m
[32mimport [39m[36m$ivy.$                                                                
[39m
[32mimport [39m[36m$ivy.$                               

[39m
[32mimport [39m[36m$ivy.$               

Read the images and labels of the test data from the CIFAR-10 database

In [2]:
// CIFAR-10 images fall into 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
val NumberOfClasses: Int = 10

// Load the test data: read 100 records from the test batch
val testNDArray =
   ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/test_batch.bin", 100)

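The CIFAR-10 binary files store each image as one 3073-byte record. The record starts with a label byte (0 to 9), followed by 3072 pixel bytes: 1024 each for the red, green and blue channels, each in row-major 32 x 32 order. The plain-Scala sketch below decodes one such record. Two things in it are assumptions. Scaling pixels to [0, 1] by dividing by 255 may not match the normalization `ReadCIFAR10ToNDArray` actually applies. The name `Cifar10Record` is hypothetical.

```scala
// Hypothetical decoder for one record of the CIFAR-10 binary format.
// Assumption: pixels are scaled to [0, 1] by dividing by 255.
object Cifar10Record {
  val PixelsPerImage: Int = 32 * 32 * 3   // 3072 bytes: R plane, then G plane, then B plane
  val RecordSize: Int = 1 + PixelsPerImage // 3073 bytes including the label byte

  /** Decodes the record starting at `offset` into (label, scaled pixels). */
  def decode(bytes: Array[Byte], offset: Int): (Int, Array[Double]) = {
    require(offset >= 0 && offset + RecordSize <= bytes.length, "incomplete record")
    val label = bytes(offset) & 0xff // bytes in the file are unsigned
    val pixels = Array.tabulate(PixelsPerImage) { i =>
      (bytes(offset + 1 + i) & 0xff) / 255.0
    }
    (label, pixels)
  }
}
```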

Separate the image data from the labels, and convert the labels into one-hot vectors

In [3]:
val testData = testNDArray.head

val testExpectResult = testNDArray.tail.head

val vectorizedTestExpectResult = Utils.makeVectorized(testExpectResult, NumberOfClasses)

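`Utils.makeVectorized` turns the integer labels into one-hot rows. The first test label, 3 (cat), therefore becomes a row with a 1 in column 3 and 0 everywhere else. Below is a minimal plain-Scala sketch of that encoding. It assumes the helper does nothing more than this; `OneHot` is a hypothetical name.

```scala
// Hypothetical stand-in for what Utils.makeVectorized is assumed to do:
// each class label becomes a one-hot row of length numberOfClasses.
object OneHot {
  def encode(labels: Seq[Int], numberOfClasses: Int): Array[Array[Double]] =
    labels.map { label =>
      require(label >= 0 && label < numberOfClasses, s"label $label out of range")
      Array.tabulate(numberOfClasses)(i => if (i == label) 1.0 else 0.0)
    }.toArray
}
```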

Write the softmax function

In [4]:
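def softmax(implicit scores: INDArray @Symbolic): INDArray @Symbolic = {
  val expScores = exp(scores) // element-wise exponential of the class scores
  expScores / expScores.sum(1) // divide each row by its sum (dimension 1) so it sums to 1
}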

defined [32mfunction[39m [36msoftmax[39m

Set the learning rate

In [5]:
implicit def optimizer: Optimizer = new LearningRate {
  def currentLearningRate() = 0.00001
}

defined [32mfunction[39m [36moptimizer[39m

Define a neural network and initialize its weight

In [6]:
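def createMyNeuralNetwork(implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  // 3072 = 32 x 32 pixels x 3 color channels per CIFAR-10 image; start from small random weights
  val initialValueOfWeight = Nd4j.randn(3072, NumberOfClasses) * 0.001
  val weight: INDArray @Symbolic = initialValueOfWeight.toWeight // trainable weight, updated by the optimizer
  val result: INDArray @Symbolic = input dot weight // class scores: (batch size) x NumberOfClasses
  softmax.compose(result) // turn the scores into class probabilities
}
val myNeuralNetwork = createMyNeuralNetwork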

defined [32mfunction[39m [36mcreateMyNeuralNetwork[39m
[36mmyNeuralNetwork[39m: ([32mSymbolic[39m.[32mTo[39m[[32mINDArray[39m]{type OutputData = org.nd4j.linalg.api.ndarray.INDArray;type OutputDelta = org.nd4j.linalg.api.ndarray.INDArray;type InputData = org.nd4j.linalg.api.ndarray.INDArray;type InputDelta = org.nd4j.linalg.api.ndarray.INDArray})#[32m@[39m = Compose(MultiplyINDArray(Exp(Identity()),Reciprocal(Sum(Exp(Identity()),WrappedArray(1)))),Dot(Identity(),Weight([[0.00, -0.00, -0.00, -0.00, 0.00, 0.00, 0.00, 0.00, -0.00, 0.00],
 [-0.00, 0.00, -0.00, 0.00, -0.00, -0.00, -0.00, 0.00, 0.00, -0.00],
 [-0.00, 0.00, -0.00, -0.00, 0.00, 0.00, -0.00, -0.00, -0.00, -0.00],
[33m...[39m

Write the loss function

In [7]:
def lossFunction(implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val input = pair.head
  val expectedOutput = pair.tail.head
  val probabilities = myNeuralNetwork.compose(input)

  -(expectedOutput * log(probabilities)).mean // cross-entropy loss, as described in the Preparation section
}

defined [32mfunction[39m [36mlossFunction[39m

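Train the neural network and watch how the loss changes at each iteration. The loss trends downward, but it does not drop on every iteration, because each step is computed on a different mini-batch (the road is winding, but the future is bright).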

In [8]:
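val lossSeq = for (_ <- 0 until 2000) yield {
  // Each iteration fetches a mini-batch of 256 training images and their labels
  val trainNDArray = ReadCIFAR10ToNDArray.getSGDTrainNDArray(256)
  // One training step on this mini-batch: compute the loss and update the weight
  val loss = lossFunction.train(
    trainNDArray.head :: Utils.makeVectorized(trainNDArray.tail.head, NumberOfClasses) :: HNil)
  println(loss)
  loss
}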
val plot = Seq(
  Scatter(lossSeq.indices, lossSeq)
)

plot.plot(
  title = "loss by time"
)

0.2300957679748535
0.23005924224853516
0.22974896430969238
0.22964296340942383
0.22886085510253906
0.2283884048461914
0.2285029411315918
0.22891342639923096
0.2288301706314087
0.22834606170654298
0.22858500480651855
0.2281005859375
0.22857122421264647
0.22757086753845215
0.22658271789550782
0.22661395072937013
0.22814931869506835
0.22676370143890381
0.22599825859069825
0.22574474811553955
0.22649664878845216
0.22543058395385743
0.22714791297912598
0.22596948146820067
0.22534642219543458
0.22555179595947267
0.225640869140625
0.22564802169799805
0.2249744176864624
0.22441458702087402
0.2247509241104126
0.22422323226928711
0.2229968786239624
0.2242142677307129
0.22479887008666993
0.2242260456085205
0.22224252223968505
0.22386808395385743
0.2247616767883301
0.22345881462097167
0.22232704162597655
0.2248483657836914
0.2232210159301758
0.22340781688690187
0.22385895252227783
0.2202709674835205
0.22251501083374023
0.22217953205108643
0.22218530178070067
0.22254250049591065
0.22087087631225585
...
0.19296979904174805


[Plot: "loss by time", the training loss at each iteration]
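The whole procedure can also be sketched without DeepLearning.scala. The plain-Scala example below trains a softmax classifier with mini-batch gradient descent on synthetic data. Several things differ from the tutorial:

- The data, the 3 classes, the batch size of 16, the learning rate of 0.5 and the 20 epochs are assumptions chosen for the demo, not values from the tutorial.
- The loss is the mean per-example cross-entropy (not divided by the number of classes). Its gradient with respect to the weight is X^T (P - Y) / batchSize.
- `MiniBatchTraining` and its methods are hypothetical names.

```scala
import scala.util.Random

// Sketch of mini-batch gradient descent for a softmax classifier on plain arrays.
// Loss: mean per-example cross-entropy; gradient w.r.t. W: X^T (P - Y) / batchSize.
object MiniBatchTraining {
  type Matrix = Array[Array[Double]]

  /** Softmax of one row, with max-subtraction for numerical stability. */
  def softmaxRow(row: Array[Double]): Array[Double] = {
    val m = row.max
    val exps = row.map(s => math.exp(s - m))
    val total = exps.sum
    exps.map(_ / total)
  }

  /** Class probabilities for each input row: softmax(x dot w). */
  def probabilities(x: Matrix, w: Matrix): Matrix =
    x.map { xi =>
      val scores = Array.tabulate(w.head.length)(k => xi.indices.map(d => xi(d) * w(d)(k)).sum)
      softmaxRow(scores)
    }

  /** Mean per-example cross-entropy over the whole data set. */
  def meanCrossEntropy(x: Matrix, labels: Array[Int], w: Matrix): Double = {
    val p = probabilities(x, w)
    val total = labels.indices.map(i => -math.log(p(i)(labels(i)))).sum
    total / labels.length
  }

  /** Each epoch shuffles the examples and performs one weight update per mini-batch. */
  def train(x: Matrix, labels: Array[Int], numClasses: Int, batchSize: Int,
            learningRate: Double, epochs: Int, rng: Random): Matrix = {
    val dim = x.head.length
    val w: Matrix = Array.fill(dim, numClasses)(rng.nextGaussian() * 0.001)
    for (_ <- 0 until epochs) {
      for (batch <- rng.shuffle(x.indices.toVector).grouped(batchSize)) {
        val p = probabilities(batch.map(i => x(i)).toArray, w)
        val grad = Array.ofDim[Double](dim, numClasses)
        for ((i, row) <- batch.zipWithIndex; k <- 0 until numClasses) {
          // Gradient of the cross-entropy w.r.t. score k is p_k - y_k.
          val delta = p(row)(k) - (if (labels(i) == k) 1.0 else 0.0)
          for (d <- 0 until dim) grad(d)(k) += x(i)(d) * delta / batch.size
        }
        for (d <- 0 until dim; k <- 0 until numClasses) w(d)(k) -= learningRate * grad(d)(k)
      }
    }
    w
  }

  /** Demo on three well-separated 2-D clusters; feature 0 is a constant 1 acting as a bias.
    * Returns (loss of an all-zero weight, loss after training). */
  def demo(seed: Long): (Double, Double) = {
    val rng = new Random(seed)
    val centers = Array((3.0, 0.0), (-1.5, 2.6), (-1.5, -2.6))
    val labels = Array.tabulate(150)(i => i % 3)
    val x: Matrix = labels.map { c =>
      Array(1.0, centers(c)._1 + 0.5 * rng.nextGaussian(), centers(c)._2 + 0.5 * rng.nextGaussian())
    }
    val before = meanCrossEntropy(x, labels, Array.fill(3, 3)(0.0))
    val w = train(x, labels, numClasses = 3, batchSize = 16, learningRate = 0.5, epochs = 20, rng = rng)
    (before, meanCrossEntropy(x, labels, w))
  }
}
```

`MiniBatchTraining.demo(7)` returns two losses. The first is for an all-zero weight, which predicts 1/3 for every class and so gives ln 3 ≈ 1.0986. The second is the loss after training, and it should be much smaller.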

Use the trained neural network to predict the labels of the test data

In [9]:
val result = myNeuralNetwork.predict(testData)
println(s"result: $result") // print the predicted class probabilities for each test image

result: [[0.07, 0.11, 0.15, 0.16, 0.07, 0.16, 0.13, 0.03, 0.10, 0.02],
 [0.06, 0.22, 0.02, 0.01, 0.00, 0.01, 0.01, 0.01, 0.26, 0.41],
 [0.12, 0.07, 0.03, 0.02, 0.01, 0.02, 0.00, 0.01, 0.56, 0.16],
 [0.23, 0.09, 0.10, 0.03, 0.03, 0.03, 0.01, 0.04, 0.38, 0.05],
 [0.04, 0.04, 0.16, 0.13, 0.21, 0.15, 0.13, 0.09, 0.03, 0.02],
 [0.01, 0.12, 0.06, 0.14, 0.09, 0.13, 0.31, 0.06, 0.01, 0.06],
 [0.01, 0.05, 0.06, 0.30, 0.03, 0.32, 0.15, 0.03, 0.02, 0.02],
 [0.03, 0.03, 0.20, 0.09, 0.21, 0.11, 0.18, 0.12, 0.03, 0.02],
 [0.05, 0.07, 0.20, 0.14, 0.14, 0.19, 0.08, 0.08, 0.03, 0.01],
 [0.16, 0.28, 0.05, 0.03, 0.01, 0.01, 0.01, 0.02, 0.23, 0.19],
 [0.23, 0.09, 0.07, 0.07, 0.04, 0.08, 0.02, 0.04, 0.29, 0.05],
 [0.02, 0.28, 0.02, 0.01, 0.01, 0.01, 0.02, 0.04, 0.07, 0.52],
 [0.02, 0.12, 0.10, 0.16, 0.13, 0.19, 0.16, 0.06, 0.04, 0.02],
 [0.08, 0.30, 0.05, 0.07, 0.06, 0.07, 0.11, 0.02, 0.17, 0.08],
 [0.11, 0.15, 0.08, 0.06, 0.04, 0.04, 0.03, 0.13, 0.17, 0.18],
 [0.13, 0.04, 0.16, 0.06, 0.09, 0.08, 0.05, 0.0


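Compute the classification accuracy of the neural network on the test data. It should be around 38%; the exact figure varies from run to run because the initial weights are random.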

In [10]:
val right = Utils.getAccuracy(result, testExpectResult)
println(s"the result is $right %")

the result is 37.0 %


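The sketch below assumes `Utils.getAccuracy` works as follows: it compares the argmax of each probability row with the expected label and reports the share of matches as a percentage. Its actual code is not shown here, so this is an assumption. With 100 test images, each correct prediction is worth exactly one percentage point, which is why the result is a whole number such as 37.0 %. `AccuracySketch` is a hypothetical name.

```scala
// Hypothetical accuracy metric: percentage of rows whose argmax equals the expected label.
object AccuracySketch {
  def accuracyPercent(probabilities: Array[Array[Double]], labels: Array[Int]): Double = {
    require(probabilities.length == labels.length && labels.nonEmpty, "need one label per row")
    val correct = probabilities.indices.count { i =>
      val row = probabilities(i)
      row.indices.maxBy(k => row(k)) == labels(i) // predicted class vs. expected class
    }
    100.0 * correct / labels.length
  }
}
```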


[Complete code](https://github.com/izhangzhihao/deeplearning-tutorial/blob/master/src/main/scala/com/thoughtworks/deeplearning/tutorial/MiniBatchGradientDescent.scala)