# Softmax Classifier

## Background

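The logistic function is one of the most widely used functions in statistics and machine learning. Logistic regression (LR), a log-linear model, is widely used for classification. The logistic function is also one of the most common activation functions in neural networks, where it is known as the sigmoid function. The logistic function handles binary classification, while softmax handles multi-class classification; in other words, softmax is a generalization of the logistic function. Softmax is often used in the last (output) layer of a neural network for multi-class classification, where it turns raw scores into probabilities. In this section we use softmax to build a neural network that performs multi-class classification.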

References:

   [softmax](https://en.wikipedia.org/wiki/Softmax_function): softmax generalizes the [logistic](https://en.wikipedia.org/wiki/Logistic_function) function to multiple classes. Formula: $f_j(z)=\frac{e^{z_j}}{\sum_k e^{z_k}}$
   
   [Cross-entropy loss](https://en.wikipedia.org/wiki/Cross-entropy): p is the distribution of the true labels and q is the label distribution predicted by the trained model; the cross-entropy loss measures how close q is to p (the smaller the loss, the closer they are). Formula: $H(p,q)=-\sum_x p(x)\log q(x)$
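To make the two formulas concrete, here is a minimal plain-Scala sketch on ordinary arrays (no DeepLearning.scala involved; the object and method names are illustrative only). It computes softmax for three scores and the cross-entropy against a one-hot true label, using the usual convention that terms with p(x) = 0 contribute 0.

```scala
// Plain-Scala illustration of the softmax and cross-entropy formulas above.
object SoftmaxFormulas {
  // softmax: f_j(z) = exp(z_j) / sum_k exp(z_k)
  def softmax(z: Array[Double]): Array[Double] = {
    val exps = z.map(math.exp)
    val total = exps.sum
    exps.map(_ / total)
  }

  // cross-entropy: H(p, q) = -sum_x p(x) * log q(x), with 0 * log(q) taken as 0
  def crossEntropy(p: Array[Double], q: Array[Double]): Double =
    -p.zip(q).map { case (pi, qi) => if (pi == 0.0) 0.0 else pi * math.log(qi) }.sum

  def main(args: Array[String]): Unit = {
    val q = softmax(Array(2.0, 1.0, 0.1)) // roughly 0.659, 0.242, 0.099
    val p = Array(1.0, 0.0, 0.0)          // one-hot true label: class 0
    println(q.mkString(", "))
    println(crossEntropy(p, q))           // equals -log(q(0)), roughly 0.417
  }
}
```

With a one-hot p, the cross-entropy reduces to minus the log of the probability assigned to the correct class, so raising that probability lowers the loss.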

## Preparation

1. Create an SBT project and add the dependencies (see [Getting Started](https://github.com/ThoughtWorksInc/DeepLearning.scala/wiki/Getting-Started), or add the following dependencies to build.sbt; note that DeepLearning.scala does not support Scala 2.12.x yet):
```
libraryDependencies += "com.thoughtworks.deeplearning" %% "differentiableany" % "latest.release"

libraryDependencies += "com.thoughtworks.deeplearning" %% "differentiablenothing" % "latest.release"

libraryDependencies += "com.thoughtworks.deeplearning" %% "differentiableseq" % "latest.release"

libraryDependencies += "com.thoughtworks.deeplearning" %% "differentiabledouble" % "latest.release"

libraryDependencies += "com.thoughtworks.deeplearning" %% "differentiablefloat" % "latest.release"

libraryDependencies += "com.thoughtworks.deeplearning" %% "differentiablehlist" % "latest.release"

libraryDependencies += "com.thoughtworks.deeplearning" %% "differentiablecoproduct" % "latest.release"

libraryDependencies += "com.thoughtworks.deeplearning" %% "differentiableindarray" % "latest.release"

addCompilerPlugin("com.thoughtworks.implicit-dependent-type" %% "implicit-dependent-type" % "latest.release")

addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)

fork := true
```
2. [Download the CIFAR-10 binary version (suitable for C programs)](https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz). The file is 162 MB; md5sum: c32a1d4ab5d03f1284b67883e8d87530

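3. Extract the downloaded file into the src/main/resources directory. Alternatively, just run the program: it downloads and extracts the CIFAR-10 data automatically.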

4. If you downloaded the notebook file on its own, make sure you also have the [ReadCIFAR10ToNDArray.sc](https://github.com/ThoughtWorksInc/DeepLearning.scala-website/blob/master/ipynbs/ReadCIFAR10ToNDArray.sc) and [Utils.sc](https://github.com/ThoughtWorksInc/DeepLearning.scala-website/blob/master/ipynbs/Utils.sc) files. ReadCIFAR10ToNDArray.sc reads the images and their labels from the CIFAR-10 files and normalizes them ([more information](https://www.cs.toronto.edu/~kriz/cifar.html)); Utils.sc contains some utility methods.

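5. Note: if you use IntelliJ, Eclipse, or another IDE, code completion may stop working and some of the code may be highlighted in red. This is an IDE issue; the code itself is correct.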
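If you download the archive manually (step 2), it is worth checking it against the published md5sum before extracting it. The sketch below is a hypothetical helper, not part of the tutorial: it streams the file through `java.security.MessageDigest` and compares the result. The default file name `cifar-10-binary.tar.gz` in the working directory is an assumption.

```scala
import java.nio.file.{Files, Path, Paths}
import java.security.MessageDigest

// Hypothetical helper: stream a file through MD5 so the downloaded archive
// can be compared with the published checksum.
object Md5Check {
  def md5Hex(path: Path): String = {
    val digest = MessageDigest.getInstance("MD5")
    val in = Files.newInputStream(path)
    try {
      val buffer = new Array[Byte](8192)
      var n = in.read(buffer)
      while (n != -1) {
        digest.update(buffer, 0, n)
        n = in.read(buffer)
      }
    } finally in.close()
    // Format each byte as two lowercase hex digits.
    digest.digest().map(b => "%02x".format(b & 0xff)).mkString
  }

  def main(args: Array[String]): Unit = {
    // Assumed location: the archive as downloaded into the working directory.
    val path = Paths.get(args.headOption.getOrElse("cifar-10-binary.tar.gz"))
    val expected = "c32a1d4ab5d03f1284b67883e8d87530"
    val actual = md5Hex(path)
    println(if (actual == expected) "md5 OK" else s"md5 mismatch: $actual")
  }
}
```

Streaming in 8 KB chunks avoids loading the whole 162 MB archive into memory at once.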
   

## Building the Neural Network

Import the required dependencies:

In [1]:
```scala
import $plugin.$ivy.`com.thoughtworks.implicit-dependent-type::implicit-dependent-type:2.0.0`

import $ivy.`com.thoughtworks.deeplearning::differentiableany:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablenothing:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableseq:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiabledouble:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablefloat:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablehlist:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiablecoproduct:1.0.0-RC7`
import $ivy.`com.thoughtworks.deeplearning::differentiableindarray:1.0.0-RC7`
import $ivy.`org.rauschig:jarchivelib:0.5.0`

import $ivy.`org.plotly-scala::plotly-jupyter-scala:0.3.0`

import java.io.{FileInputStream, InputStream}

import com.thoughtworks.deeplearning
import org.nd4j.linalg.api.ndarray.INDArray
import com.thoughtworks.deeplearning.DifferentiableHList._
import com.thoughtworks.deeplearning.DifferentiableDouble._
import com.thoughtworks.deeplearning.DifferentiableINDArray._
import com.thoughtworks.deeplearning.DifferentiableAny._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Optimizers._
import com.thoughtworks.deeplearning.{
  DifferentiableHList,
  DifferentiableINDArray,
  Layer,
  Symbolic
}
import com.thoughtworks.deeplearning.Layer.Batch
import com.thoughtworks.deeplearning.Symbolic.Layers.Identity
import com.thoughtworks.deeplearning.Symbolic._
import com.thoughtworks.deeplearning.Poly.MathFunctions._
import com.thoughtworks.deeplearning.Poly.MathMethods./
import com.thoughtworks.deeplearning.Poly.MathOps
import org.nd4j.linalg.cpu.nativecpu.NDArray
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.indexing.{INDArrayIndex, NDArrayIndex}
import org.nd4j.linalg.ops.transforms.Transforms
import org.nd4s.Implicits._
import shapeless._

import plotly._
import plotly.element._
import plotly.layout._
import plotly.JupyterScala._

import scala.collection.immutable.IndexedSeq

pprintConfig() = pprintConfig().copy(height = 5) // limit the number of output lines so the page does not get too long

import $file.ReadCIFAR10ToNDArray,ReadCIFAR10ToNDArray._
import $file.Utils,Utils._
```


Read the images and labels of the training data and the test data from the CIFAR-10 database:

In [2]:
```scala
// CIFAR-10 images fall into 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
val NumberOfClasses: Int = 10

// Load the training data; we read 1000 samples as training data
val trainNDArray =
  ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/data_batch_1.bin", 1000)

// Load the test data; we read 100 samples as test data
val testNDArray =
  ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/test_batch.bin", 100)
```

```
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

NumberOfClasses: Int = 10
trainNDArray: INDArray :: INDArray :: HNil = [[0.23, 0.17, 0.20, 0.27, 0.38, 0.46, 0.54, 0.57, 0.58, 0.58, 0.51, 0.49, 0.55, 0.56, 0.54, 0.50, 0.54, 0.52, 0.48, 0.54, 0.54, 0.52, 0.53, 0.54, 0.59, 0.64, 0.66, 0.62, 0.62, 0.62, 0.59, 0.58, 0.06, 0.00, 0.07, 0.20, 0.34, 0.47, 0.50, 0.50, 0.49, 0.45, 0.41, 0.39, 0.41, 0.44, 0.43, 0.44, 0.46, 0.43, 0.41, 0.49, 0.50, 0.48, 0.51, 0.48, 0.47, 0.51, 0.52, 0.52, 0.52, 0.48, 0.46, 0.48, 0.10, 0.06, 0....
testNDArray: INDArray :: INDArray :: HNil = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0.55, 0.55, 0.56, 0.54, 0.49, 0.45, 0.59, 0.59, 0.62, 0.65, 0.63, 0.62, 0.64, 0.63, 0.64, 0.61, 0.61, 0.62, 0.64, 0.66, 0.67, 0.67, 0.66, 0.62, 0.60, 0.59, 0.57, 0
```

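Split the data into images and labels, and convert the labels into (one-hot) vectors of length NumberOfClasses: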

In [3]:
```scala
val trainData = trainNDArray.head
val testData = testNDArray.head

val trainExpectResult = trainNDArray.tail.head
val testExpectResult = testNDArray.tail.head

val vectorizedTrainExpectResult = Utils.makeVectorized(trainExpectResult, NumberOfClasses)
val vectorizedTestExpectResult = Utils.makeVectorized(testExpectResult, NumberOfClasses)
```

[36mtrainData[39m: [32mINDArray[39m = [[0.23, 0.17, 0.20, 0.27, 0.38, 0.46, 0.54, 0.57, 0.58, 0.58, 0.51, 0.49, 0.55, 0.56, 0.54, 0.50, 0.54, 0.52, 0.48, 0.54, 0.54, 0.52, 0.53, 0.54, 0.59, 0.64, 0.66, 0.62, 0.62, 0.62, 0.59, 0.58, 0.06, 0.00, 0.07, 0.20, 0.34, 0.47, 0.50, 0.50, 0.49, 0.45, 0.41, 0.39, 0.41, 0.44, 0.43, 0.44, 0.46, 0.43, 0.41, 0.49, 0.50, 0.48, 0.51, 0.48, 0.47, 0.51, 0.52, 0.52, 0.52, 0.48, 0.46, 0.48, 0.10, 0.06, 0.[33m...[39m
[36mtestData[39m: [32mINDArray[39m = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0.55, 0.55, 0.56, 0.54, 0.49, 0.45, 0.59, 0.59, 0.62, 0.65, 0.63, 0.62, 0.64, 0.63, 0.64, 0.61, 0.61, 0.62, 0.64, 0.66, 0.67, 0.67, 0.66, 0.62, 0.60, 0.59, 0.57, 0.54, 0.55, 0.55, 0.58, 0.57, 0.57, 0.55, 0.56, 0.53, 0.49, 0.46, 0.59, 0.59, 0.[33m...[39m
[36mtrainExpectResult[39m: [32mINDArray[39m = [6.00, 9.00, 9.00, 4.00, 1.00, 1.00, 2.

Write the softmax function; it corresponds to the softmax formula in the Background section:

In [4]:
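```scala
def softmax(implicit scores: INDArray @Symbolic): INDArray @Symbolic = {
  val expScores = exp(scores)
  // sum(1) sums along dimension 1, i.e. over the classes in each row,
  // so every row of the result sums to 1
  expScores / expScores.sum(1)
}
```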

defined [32mfunction[39m [36msoftmax[39m

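Set the learning rate. The learning rate controls how quickly the Weight changes. If it is too small, the loss decreases slowly and training takes longer. If it is too large, the loss drops quickly at first but then bounces around near the minimum, and from then on decreases very slowly.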

In [5]:
```scala
implicit def optimizer: Optimizer = new LearningRate {
  def currentLearningRate() = 0.00001
}
```

defined [32mfunction[39m [36moptimizer[39m

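Define the neural network the same way you would define a method, and initialize the Weight ([what is Weight](https://github.com/ThoughtWorksInc/DeepLearning.scala/wiki/Getting-Started#231--weight-intialization)). Each CIFAR-10 image is flattened into 3072 = 32 × 32 × 3 values, so the Weight is a 3072 × NumberOfClasses INDArray; multiplying the N × 3072 input by it yields an N × NumberOfClasses matrix containing one score per image per class.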

In [6]:
```scala
def createMyNeuralNetwork(implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  val initialValueOfWeight = Nd4j.randn(3072, NumberOfClasses) * 0.001
  val weight: INDArray @Symbolic = initialValueOfWeight.toWeight
  val result: INDArray @Symbolic = input dot weight
  softmax.compose(result) // apply softmax to the scores, turning each row into probabilities between 0 and 1
}
val myNeuralNetwork = createMyNeuralNetwork
```

defined [32mfunction[39m [36mcreateMyNeuralNetwork[39m
[36mmyNeuralNetwork[39m: ([32mSymbolic[39m.[32mTo[39m[[32mINDArray[39m]{type OutputData = org.nd4j.linalg.api.ndarray.INDArray;type OutputDelta = org.nd4j.linalg.api.ndarray.INDArray;type InputData = org.nd4j.linalg.api.ndarray.INDArray;type InputDelta = org.nd4j.linalg.api.ndarray.INDArray})#[32m@[39m = Compose(MultiplyINDArray(Exp(Identity()),Reciprocal(Sum(Exp(Identity()),WrappedArray(1)))),Dot(Identity(),Weight([[-0.00, -0.00, -0.00, 0.00, 0.00, 0.00, -0.00, -0.00, -0.00, 0.00],
 [0.00, -0.00, 0.00, 0.00, -0.00, 0.00, -0.00, -0.00, 0.00, 0.00],
 [-0.00, 0.00, -0.00, -0.00, -0.00, -0.00, -0.00, -0.00, 0.00, -0.00],
[33m...[39m

Write the loss function: it compares the network's predictions with the true labels and returns the cross-entropy loss.

In [7]:
```scala
def lossFunction(implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val input = pair.head
  val expectedOutput = pair.tail.head
  val probabilities = myNeuralNetwork.compose(input)

  // cross-entropy loss from the Background section; .mean averages over all N * NumberOfClasses entries
  -(expectedOutput * log(probabilities)).mean
}
```

defined [32mfunction[39m [36mlossFunction[39m

Train the neural network and watch how the loss changes at each iteration; it should keep decreasing.

In [8]:
```scala
val lossSeq = for (_ <- 0 until 2000) yield {
  val loss = lossFunction.train(trainData :: vectorizedTrainExpectResult :: HNil)
  println(loss)
  loss
}

val plot = Seq(
  Scatter(lossSeq.indices, lossSeq)
)

plot.plot(
  title = "loss by time"
)
```

```
0.230025244140625
0.228420166015625
0.2274596923828125
0.2265696044921875
0.22570986328125
0.22487568359375
0.22406591796875
0.223279248046875
0.222514794921875
0.22177158203125
0.2210488525390625
0.220345703125
0.219661279296875
0.21899482421875
0.218345654296875
0.2177130615234375
0.2170963623046875
0.216494921875
0.215908203125
0.215335546875
0.2147764892578125
0.214230419921875
0.213696826171875
0.2131753662109375
0.2126654541015625
0.212166796875
0.2116788330078125
0.21120126953125
0.2107336669921875
0.21027578125
0.2098271484375
0.209387548828125
0.208956591796875
0.208533984375
0.20811953125
0.2077128662109375
0.207313818359375
0.20692197265625
0.2065372802734375
0.206159423828125
0.20578828125
0.205423486328125
0.205064990234375
0.2047124755859375
0.20436583251953125
0.20402491455078126
0.203689501953125
0.20335938720703126
0.203034521484375
0.20271468505859375
0.202399755859375
0.202089599609375
0.2017841064453125
0.201483056640625
0.2011864501953125
0.200894091796875
0.200605

0.12328685302734375
```


[36mlossSeq[39m: [32mIndexedSeq[39m[[32mSymbolic[39m.[32mTo[39m.[32m<refinement>[39m.this.type.[32mOutputData[39m] = [33mVector[39m(
  [32m0.230025244140625[39m,
  [32m0.228420166015625[39m,
  [32m0.2274596923828125[39m,
  [32m0.2265696044921875[39m,
[33m...[39m
[36mplot[39m: [32mSeq[39m[[32mScatter[39m] = [33mList[39m(
  [33mScatter[39m(
    [33mSome[39m(
      [33mDoubles[39m(
        [33mVector[39m(
[33m...[39m
[36mres7_2[39m: [32mString[39m = [32m"plot-36238955"[39m

Use the trained neural network to predict the labels of the test data:

In [9]:
```scala
val result = myNeuralNetwork.predict(testData)
println(s"result: $result") // print the predicted class probabilities
```

```
result: [[0.03, 0.05, 0.17, 0.13, 0.01, 0.13, 0.43, 0.00, 0.04, 0.00],
 [0.03, 0.17, 0.00, 0.05, 0.00, 0.01, 0.00, 0.00, 0.18, 0.55],
 [0.08, 0.09, 0.01, 0.03, 0.01, 0.00, 0.00, 0.01, 0.71, 0.05],
 [0.50, 0.04, 0.07, 0.04, 0.01, 0.02, 0.00, 0.03, 0.21, 0.08],
 [0.02, 0.03, 0.09, 0.05, 0.35, 0.10, 0.25, 0.08, 0.01, 0.00],
 [0.01, 0.07, 0.02, 0.62, 0.02, 0.04, 0.14, 0.05, 0.00, 0.03],
 [0.00, 0.00, 0.00, 0.68, 0.02, 0.05, 0.24, 0.00, 0.00, 0.00],
 [0.04, 0.01, 0.24, 0.04, 0.26, 0.10, 0.17, 0.11, 0.01, 0.01],
 [0.05, 0.17, 0.22, 0.09, 0.17, 0.16, 0.09, 0.03, 0.01, 0.01],
 [0.09, 0.45, 0.02, 0.03, 0.00, 0.06, 0.01, 0.01, 0.11, 0.24],
 [0.33, 0.09, 0.04, 0.09, 0.02, 0.07, 0.04, 0.02, 0.28, 0.03],
 [0.02, 0.49, 0.02, 0.05, 0.01, 0.01, 0.01, 0.03, 0.04, 0.33],
 [0.02, 0.31, 0.02, 0.24, 0.12, 0.07, 0.15, 0.07, 0.01, 0.01],
 [0.04, 0.21, 0.03, 0.07, 0.05, 0.01, 0.15, 0.02, 0.29, 0.15],
 [0.24, 0.18, 0.05, 0.17, 0.02, 0.04, 0.03, 0.13, 0.08, 0.06],
 [0.20, 0.03, 0.19, 0.02, 0.09, 0.06, 0.03, 0.0
```

[36mresult[39m: [32mSymbolic[39m.[32mTo[39m.[32m<refinement>[39m.this.type.[32mOutputData[39m = [[0.03, 0.05, 0.17, 0.13, 0.01, 0.13, 0.43, 0.00, 0.04, 0.00],
 [0.03, 0.17, 0.00, 0.05, 0.00, 0.01, 0.00, 0.00, 0.18, 0.55],
 [0.08, 0.09, 0.01, 0.03, 0.01, 0.00, 0.00, 0.01, 0.71, 0.05],
 [0.50, 0.04, 0.07, 0.04, 0.01, 0.02, 0.00, 0.03, 0.21, 0.08],
 [0.02, 0.03, 0.09, 0.05, 0.35, 0.10, 0.25, 0.08, 0.01, 0.00],
[33m...[39m

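Compute the accuracy of the network's classification on the test data. It should come out at roughly 32%; the exact figure varies with the random weight initialization.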

In [10]:
```scala
val right = Utils.getAccuracy(result, testExpectResult)
println(s"the result is $right %")
```

```
the result is 32.0 %

right: Double = 32.0
```


[Full code](https://github.com/izhangzhihao/deeplearning-tutorial/blob/master/src/main/scala/com/thoughtworks/deeplearning/tutorial/SoftmaxLinearClassifier.scala)