# Softmax Classifier

## Background

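In this article we use a softmax classifier to build a simple neural network for image classification that reaches about 32% accuracy. We train it on the images and labels of the [cifar10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset. The softmax classifier generalizes binary logistic regression to multiple classes. Instead of raw scores, it outputs a probability for each class. Let's get started.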

## Building the Neural Network

As in the previous tutorial, [GetStarted](https://thoughtworksinc.github.io/DeepLearning.scala/demo/GettingStarted.html), we first need to import the DeepLearning.scala classes.

In [6]:
import $plugin.$ivy.`com.thoughtworks.implicit-dependent-type::implicit-dependent-type:2.0.0`

import $ivy.`com.thoughtworks.deeplearning::differentiableany:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablenothing:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableseq:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiabledouble:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablefloat:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablehlist:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablecoproduct:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableindarray:1.0.0-RC7`
import $ivy.`org.rauschig:jarchivelib:0.5.0`

import $ivy.`org.plotly-scala::plotly-jupyter-scala:0.3.0`

import java.io.{FileInputStream, InputStream}


import com.thoughtworks.deeplearning
import org.nd4j.linalg.api.ndarray.INDArray
import com.thoughtworks.deeplearning.DifferentiableHList._
import com.thoughtworks.deeplearning.DifferentiableDouble._
import com.thoughtworks.deeplearning.DifferentiableINDArray._
import com.thoughtworks.deeplearning.DifferentiableAny._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Optimizers._
import com.thoughtworks.deeplearning.{
  DifferentiableHList,
  DifferentiableINDArray,
  Layer,
  Symbolic
}
import com.thoughtworks.deeplearning.Layer.Batch
import com.thoughtworks.deeplearning.Symbolic.Layers.Identity
import com.thoughtworks.deeplearning.Symbolic._
import com.thoughtworks.deeplearning.Poly.MathFunctions._
import com.thoughtworks.deeplearning.Poly.MathMethods./
import com.thoughtworks.deeplearning.Poly.MathOps
import org.nd4j.linalg.cpu.nativecpu.NDArray
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.indexing.{INDArrayIndex, NDArrayIndex}
import org.nd4j.linalg.ops.transforms.Transforms
import org.nd4s.Implicits._
import shapeless._

import plotly._
import plotly.element._
import plotly.layout._
import plotly.JupyterScala._

import scala.collection.immutable.IndexedSeq

To reduce the number of lines that `jupyter-scala` prints and keep the page from growing too long, we set `pprintConfig`.

In [7]:
pprintConfig() = pprintConfig().copy(height = 2)

To read the images and labels of the training and test data from the CIFAR10 database, we need to `import $file.ReadCIFAR10ToNDArray`.

In [8]:
import $file.ReadCIFAR10ToNDArray

//Load the training data; we read 1000 samples as training data
val trainNDArray =
  ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/data_batch_1.bin", 1000)

//Load the test data; we read 100 samples as test data
val testNDArray =
  ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/test_batch.bin", 100)

Compiling ReadCIFAR10ToNDArray.sc


trainNDArray: INDArray :: INDArray :: HNil = [[0.23, 0.17, 0.20, 0.27, 0.38, 0.46, 0.54, 0.57, 0.58, 0.58, 0.51, 0.49, 0.55, 0.56, 0.54, 0.50, 0.54, 0.52, 0.48, 0.54, 0.54, 0.52, 0.53, 0.54, 0.59, 0.64, 0....
testNDArray: INDArray :: INDArray :: HNil = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....

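To make the data easier for the softmax classifier to process, we first convert the labels with [one hot encoding](https://en.wikipedia.org/wiki/One-hot): the NDArray with N rows and one column becomes an NDArray with N rows and NumberOfClasses columns, where each row holds 1 in the column of the correct class and 0 in every other column. We keep the training set and the test set separate so that we can tell whether the network has been trained too much and is [overfitting](https://en.wikipedia.org/wiki/Overfitting).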
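
The one-hot transformation described above can be sketched with plain Scala arrays. This is only an illustration under stated assumptions: `oneHot` is a hypothetical helper, it works on `Seq[Int]` rather than `INDArray`, and it is not the implementation of `Utils.makeVectorized`.

```scala
// Hypothetical helper for illustration; not the implementation of Utils.makeVectorized.
def oneHot(labels: Seq[Int], numberOfClasses: Int): Array[Array[Double]] =
  labels.map { label =>
    require(label >= 0 && label < numberOfClasses, s"label $label out of range")
    // One row per sample: 1.0 in the column of the correct class, 0.0 elsewhere.
    Array.tabulate(numberOfClasses)(column => if (column == label) 1.0 else 0.0)
  }.toArray

// Labels 6, 9, 9 are the first three values of `trainExpectResult` in this tutorial.
val encoded = oneHot(Seq(6, 9, 9), 10)
encoded.foreach(row => println(row.mkString("[", ", ", "]")))
```

Each printed row contains a single 1.0, in column 6 for the first sample and in column 9 for the next two.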

In [9]:
val trainData = trainNDArray.head
val testData = testNDArray.head

val trainExpectResult = trainNDArray.tail.head
val testExpectResult = testNDArray.tail.head

import $file.Utils

//The images in CIFAR10 fall into 10 classes (airplane,automobile,bird,cat,deer,dog,frog,horse,ship,truck)
val NumberOfClasses: Int = 10

val vectorizedTrainExpectResult = Utils.makeVectorized(trainExpectResult, NumberOfClasses)
val vectorizedTestExpectResult = Utils.makeVectorized(testExpectResult, NumberOfClasses)

Compiling Utils.sc


trainData: INDArray = [[0.23, 0.17, 0.20, 0.27, 0.38, 0.46, 0.54, 0.57, 0.58, 0.58, 0.51, 0.49, 0.55, 0.56, 0.54, 0.50, 0.54, 0.52, 0.48, 0.54, 0.54, 0.52, 0.53, 0.54, 0.59, 0.64, 0....
testData: INDArray = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
trainExpectResult: INDArray = [6.00, 9.00, 9.00, 4.00, 1.00, 1.00, 2.00, 7.00, 8.00, 3.00, 4.00, 7.00, 7.00, 2.00, 9.00, 9.00, 9.00, 3.00, 2.00, 6.00, 4.00, 3.00, 6.00, 6.00, 2.00, 6.00, 3.0...
testExpectResult: INDArray = [3.00, 8.00, 8.00, 0.00, 6.00, 6.00, 1.00, 6.00, 3.00, 1.00, 0.00, 9.00, 5.00, 7.00, 9.00, 8.00, 5.00, 7.00, 8.00, 6.00, 7.00, 0.00, 4.00, 9.00, 5.00, 2.00, 4.0...

A softmax classifier is a neural network that combines a fully connected layer with softmax, so we first need to write the softmax function. The formula is: ![](https://www.zhihu.com/equation?tex=f_j%28z%29%3D%5Cfrac%7Be%5E%7Bz_j%7D%7D%7B%5Csum_ke%5E%7Bz_k%7D%7D)

In [10]:
def softmax(implicit scores: INDArray @Symbolic): INDArray @Symbolic = {
  val expScores = exp(scores)
  expScores / expScores.sum(1)
}

defined function softmax
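
To make the formula concrete, here is a minimal sketch of softmax for a single row of scores using plain Scala arrays. The name `softmaxRow` is hypothetical, and subtracting the row maximum before exponentiating is an added numerical-stability step that the tutorial's `softmax` does not perform. It does not change the mathematical result.

```scala
// Row-wise softmax on a plain array: f_j(z) = exp(z_j) / sum_k exp(z_k).
// Subtracting the maximum is an assumption added here for numerical stability;
// it cancels out in the ratio, so the probabilities are the same.
def softmaxRow(scores: Array[Double]): Array[Double] = {
  val maxScore = scores.max
  val expScores = scores.map(score => math.exp(score - maxScore))
  val total = expScores.sum
  expScores.map(_ / total)
}

val probabilities = softmaxRow(Array(1.0, 2.0, 3.0))
println(probabilities.mkString(", ")) // larger scores receive larger probabilities
println(probabilities.sum)            // the probabilities add up to 1 (up to rounding)
```

When all scores are equal, every class receives the same probability, which is the situation at the start of training when the weights are close to zero.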

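The fully connected layer needs a learning rate, which intuitively controls how fast the Weight changes. If the learning rate is too small, the loss falls slowly and training takes longer. If it is too large, the loss falls quickly at first but then hovers around the minimum, where it decreases very slowly.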

In [11]:
implicit def optimizer: Optimizer = new LearningRate {
  def currentLearningRate() = 0.00001
}

defined function optimizer
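
The effect of the learning rate can be seen on a toy problem. The sketch below is an assumption made for illustration, not the tutorial's model: it minimizes f(w) = w * w with plain gradient descent, and `descend` is a hypothetical helper. The three learning rates are arbitrary values chosen to show a slow run, a well-behaved run, and a run that overshoots.

```scala
// Toy illustration (not the tutorial's network): minimise f(w) = w * w,
// whose gradient is 2 * w, starting from w = 1.0.
def descend(learningRate: Double, steps: Int, start: Double = 1.0): Vector[Double] =
  Iterator.iterate(start)(w => w - learningRate * 2 * w).take(steps + 1).toVector

for (learningRate <- Seq(0.001, 0.4, 0.95)) {
  val path = descend(learningRate, steps = 20)
  println(f"learningRate=$learningRate%.3f  final w=${path.last}%.6f")
}
```

With 0.001 the value barely moves in 20 steps, with 0.4 it reaches the minimum almost exactly, and with 0.95 each step jumps to the other side of the minimum, so the value bounces back and forth and approaches zero only slowly.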

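Next we define a fully connected layer and initialize its Weight. The Weight should be an N*NumberOfClasses INDArray, where N is the number of input values per image (3072 in the code below, i.e. 32 × 32 pixels × 3 color channels), so that every image receives a score for every class. [What is Weight](https://github.com/ThoughtWorksInc/DeepLearning.scala/wiki/Getting-Started#231--weight-intialization)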
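
The shapes involved can be checked with a small plain-Scala sketch. It is a stand-in under stated assumptions: `dot` here is a hypothetical helper on nested arrays, not ND4J's `dot`, and the batch of two random "images" is made up for illustration.

```scala
// Plain-Scala stand-in for `input dot weight`; the tutorial itself uses INDArray.
def dot(input: Array[Array[Double]], weight: Array[Array[Double]]): Array[Array[Double]] = {
  require(input.forall(_.length == weight.length), "inner dimensions must match")
  val columns = weight.head.length
  input.map { row =>
    // Score for class j = sum over all input values of value * weight(i)(j).
    Array.tabulate(columns)(j => row.indices.map(i => row(i) * weight(i)(j)).sum)
  }
}

val features = 3072          // 32 * 32 pixels * 3 colour channels per CIFAR-10 image
val numberOfClasses = 10
val random = new scala.util.Random(0)
val batch = Array.fill(2, features)(random.nextDouble())                          // 2 made-up images
val weight = Array.fill(features, numberOfClasses)(random.nextGaussian() * 0.001) // small random weights
val scores = dot(batch, weight)
println(s"${scores.length} x ${scores.head.length}") // 2 x 10
```

Each row of `scores` holds one score per class for one image, which is the input that softmax then turns into probabilities.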

In [12]:
def createMyNeuralNetwork(implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  val initialValueOfWeight = Nd4j.randn(3072, NumberOfClasses) * 0.001
  val weight: INDArray @Symbolic = initialValueOfWeight.toWeight
  val result: INDArray @Symbolic = input dot weight
  softmax.compose(result) //Apply softmax to the result so that the values are squashed between 0 and 1, which makes them easier to work with
}
val myNeuralNetwork = createMyNeuralNetwork

defined function createMyNeuralNetwork
myNeuralNetwork: (Symbolic.To[INDArray]{type OutputData = org.nd4j.linalg.api.ndarray.INDArray;type OutputDelta = org.nd4j.linalg.api.ndarray.INDArray;type InputData = org.nd4j.linalg.api.ndarray.INDArray;type InputDelta = org.nd4j.linalg.api.ndarray.INDArray})#@ = Compose(MultiplyINDArray(Exp(Identity()),Reciprocal(Sum(Exp(Identity()),WrappedArray(1)))),Dot(Identity(),Weight([[-0.00, 0.00, -0.00, 0.00, 0.00, -0.00, 0.00, ...

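To judge how good the network's predictions are, we need a loss function. Here we use the cross-entropy loss, which compares the predicted result with the true result and returns a score. The formula is: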
![](https://zhihu.com/equation?tex=%5Cdisplaystyle+H%28p%2Cq%29%3D-%5Csum_xp%28x%29+logq%28x%29)

In [13]:
def lossFunction(implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val input = pair.head
  val expectedOutput = pair.tail.head
  val probabilities = myNeuralNetwork.compose(input)

  -(expectedOutput * log(probabilities)).mean //this corresponds to the cross-entropy formula above
}

defined function lossFunction
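
As a sanity check on the loss, the sketch below evaluates the same expression on plain arrays. `crossEntropyMean` is a hypothetical helper, and the single sample with uniform predictions is an assumption chosen to mimic an untrained network. Because `.mean` averages over every element (samples and classes), the value is the usual cross-entropy divided by the number of classes.

```scala
// Mean of -(expected * log(predicted)) over every element, mirroring the `.mean`
// in lossFunction. Plain arrays stand in for INDArray; predictions are assumed
// to be strictly positive so that log is finite.
def crossEntropyMean(expected: Array[Array[Double]], predicted: Array[Array[Double]]): Double = {
  val terms = expected.zip(predicted).flatMap { case (expectedRow, predictedRow) =>
    expectedRow.zip(predictedRow).map { case (p, q) => -p * math.log(q) }
  }
  terms.sum / terms.length
}

// One sample whose true class is 6, and an untrained network that predicts 0.1 for every class.
val expected = Array(Array.tabulate(10)(c => if (c == 6) 1.0 else 0.0))
val uniform = Array(Array.fill(10)(0.1))
val loss = crossEntropyMean(expected, uniform)
println(loss) // equals ln(10) / 10, roughly 0.2303
```

This is close to the first value printed by the training loop (0.2303...), which is what one would expect when the initial weights are near zero and the predictions are almost uniform.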

To observe the training process we print the `loss`. While the network is training, the loss should trend downward.

In [14]:
val lossSeq = for (i <- 0 until 2000) yield {
  val loss = lossFunction.train(trainData :: vectorizedTrainExpectResult :: HNil)
  if(i % 100 == 0){
    println("loss is :" + loss)
  }
  loss
}

val plot = Seq(
  Scatter(lossSeq.indices, lossSeq)
)

plot.plot(
  title = "loss by time"
)

loss is :0.2303436767578125
loss is :0.1908893310546875
loss is :0.17819090576171875
loss is :0.17023267822265625
loss is :0.1642774658203125
loss is :0.15943809814453125
loss is :0.1553156982421875
loss is :0.1516980224609375
loss is :0.14845870361328126
loss is :0.14551583251953126
loss is :0.1428130615234375
loss is :0.1403093017578125
loss is :0.13797386474609374
loss is :0.13578282470703126
loss is :0.13371734619140624
loss is :0.13176209716796874
loss is :0.1299045654296875
loss is :0.12813433837890625
loss is :0.126442578125
loss is :0.124821923828125


lossSeq: IndexedSeq[Symbolic.To.<refinement>.this.type.OutputData] = Vector(
  0.2303436767578125,
...
plot: Seq[Scatter] = List(
  Scatter(
...
res13_2: String = "plot-1291879632"

We use the prepared test data to verify the accuracy of the neural network.

In [15]:
val result = myNeuralNetwork.predict(testData)
println(s"result: $result") //print the predictions

result: [[0.03, 0.05, 0.17, 0.13, 0.01, 0.13, 0.43, 0.00, 0.04, 0.00],
 [0.03, 0.17, 0.00, 0.05, 0.00, 0.01, 0.00, 0.00, 0.18, 0.55],
 [0.08, 0.09, 0.01, 0.03, 0.01, 0.00, 0.00, 0.01, 0.71, 0.05],
 [0.50, 0.04, 0.07, 0.04, 0.01, 0.02, 0.00, 0.03, 0.21, 0.08],
 [0.02, 0.03, 0.09, 0.05, 0.36, 0.10, 0.25, 0.08, 0.01, 0.00],
 [0.01, 0.07, 0.02, 0.63, 0.02, 0.04, 0.13, 0.05, 0.00, 0.03],
 [0.00, 0.00, 0.00, 0.68, 0.02, 0.05, 0.24, 0.00, 0.00, 0.00],
 [0.04, 0.01, 0.24, 0.04, 0.26, 0.10, 0.17, 0.11, 0.01, 0.01],
 [0.05, 0.17, 0.22, 0.09, 0.17, 0.16, 0.09, 0.03, 0.01, 0.01],
 [0.08, 0.44, 0.02, 0.03, 0.00, 0.06, 0.01, 0.01, 0.11, 0.24],
 [0.33, 0.09, 0.04, 0.09, 0.02, 0.07, 0.04, 0.02, 0.28, 0.03],
 [0.02, 0.49, 0.02, 0.05, 0.01, 0.01, 0.01, 0.03, 0.04, 0.33],
 [0.02, 0.31, 0.02, 0.24, 0.12, 0.07, 0.15, 0.07, 0.01, 0.01],
 [0.04, 0.21, 0.03, 0.07, 0.05, 0.01, 0.15, 0.01, 0.29, 0.15],
 [0.24, 0.18, 0.05, 0.17, 0.02, 0.04, 0.03, 0.13, 0.08, 0.06],
 [0.19, 0.03, 0.19, 0.02, 0.09, 0.06, 0.03, 0.0

result: Symbolic.To.<refinement>.this.type.OutputData = [[0.03, 0.05, 0.17, 0.13, 0.01, 0.13, 0.43, 0.00, 0.04, 0.00],
 [0.03, 0.17, 0.00, 0.05, 0.00, 0.01, 0.00, 0.00, 0.18, 0.55],
...

Compute the accuracy of the network's classification of the test data. It should be around 32%.
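
One plausible way to compute such an accuracy is to take, for each row, the class with the highest probability and compare it with the label. The sketch below assumes that approach: `accuracyPercent` is a hypothetical helper on plain Scala collections, not the implementation of `Utils.getAccuracy`. It is applied to the first three rows of `result` printed above and the first three values of `testExpectResult` (3, 8, 8).

```scala
// Hypothetical helper for illustration; not the implementation of Utils.getAccuracy.
def accuracyPercent(probabilities: Seq[Seq[Double]], labels: Seq[Int]): Double = {
  require(probabilities.length == labels.length, "one label per row is required")
  val correct = probabilities.zip(labels).count { case (row, label) =>
    row.indexOf(row.max) == label // predicted class = column with the highest probability
  }
  100.0 * correct / labels.length
}

// First three rows of `result` from this tutorial's output.
val rows = Seq(
  Seq(0.03, 0.05, 0.17, 0.13, 0.01, 0.13, 0.43, 0.00, 0.04, 0.00), // predicts class 6
  Seq(0.03, 0.17, 0.00, 0.05, 0.00, 0.01, 0.00, 0.00, 0.18, 0.55), // predicts class 9
  Seq(0.08, 0.09, 0.01, 0.03, 0.01, 0.00, 0.00, 0.01, 0.71, 0.05)  // predicts class 8
)
val sampleAccuracy = accuracyPercent(rows, Seq(3, 8, 8))
println(f"$sampleAccuracy%.1f %%") // only the third prediction matches its label
```

On these three samples one prediction out of three is correct, which happens to be near the overall figure reported for the 100 test images.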

In [16]:
val right = Utils.getAccuracy(result, testExpectResult)
println(s"the result is $right %")

the result is 32.0 %


right: Double = 32.0


[Complete code](https://github.com/izhangzhihao/deeplearning-tutorial/blob/master/src/main/scala/com/thoughtworks/deeplearning/tutorial/SoftmaxLinearClassifier.scala)