## Background

To improve prediction accuracy, we need a multi-layer neural network. Generally, the more layers a network has, the greater its capacity: more layers mean more parameters, more parameters can represent more state information, and so the network can express more complex functions. A multi-layer neural network can therefore tackle more complex problems.
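As a concrete instance of this parameter growth, the two-layer network built later in this article has a fully connected layer from the 3072 pixel values (32 × 32 × 3) of a CIFAR10 image to 500 hidden units, followed by a fully connected layer from 500 units to 10 classes. Each such layer has `inputSize × outputSize` weights plus `outputSize` biases. The snippet below only does that arithmetic; `parameterCount` is a hypothetical helper, not part of the tutorial code.

```scala
// Hypothetical helper: trainable values in a fully connected layer (weights plus biases).
def parameterCount(inputSize: Int, outputSize: Int): Int =
  inputSize * outputSize + outputSize

val firstLayerParams = parameterCount(3072, 500) // like fullyConnectedThenRelu(NumberOfPixels, 500)
val secondLayerParams = parameterCount(500, 10)  // like fullyConnectedThenSoftmax(500, 10)
val totalParams = firstLayerParams + secondLayerParams
```

Nearly all of the roughly 1.54 million parameters sit in the first layer.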

In this article, we will first define a simple two-layer neural network and then train it on the training set of [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html). Finally, we will use a test set to verify the accuracy of the network; the final accuracy reaches about 51%.

## Import dependencies

```scala
// Ammonite / jupyter-scala dependency imports
import $plugin.$ivy.`com.thoughtworks.implicit-dependent-type::implicit-dependent-type:2.0.0`

import $ivy.`com.thoughtworks.deeplearning::differentiableany:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiablenothing:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiableseq:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiabledouble:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiablefloat:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiablehlist:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiablecoproduct:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiableindarray:1.0.0`
import $ivy.`org.nd4j:nd4j-native-platform:0.7.2`
import $ivy.`org.rauschig:jarchivelib:0.5.0`

import $ivy.`org.plotly-scala::plotly-jupyter-scala:0.3.0`

import java.io.{FileInputStream, InputStream}

import com.thoughtworks.deeplearning
import org.nd4j.linalg.api.ndarray.INDArray
import com.thoughtworks.deeplearning.DifferentiableHList._
import com.thoughtworks.deeplearning.DifferentiableDouble._
import com.thoughtworks.deeplearning.DifferentiableINDArray._
import com.thoughtworks.deeplearning.DifferentiableAny._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Optimizers._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Layers.Weight
import com.thoughtworks.deeplearning.{
  DifferentiableHList,
  DifferentiableINDArray,
  Layer,
  Symbolic
}
import com.thoughtworks.deeplearning.Layer.Tape
import com.thoughtworks.deeplearning.Symbolic.Layers.Identity
import com.thoughtworks.deeplearning.Symbolic._
import com.thoughtworks.deeplearning.Poly.MathFunctions._
import com.thoughtworks.deeplearning.Poly.MathMethods./
import com.thoughtworks.deeplearning.Poly.MathOps
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.indexing.{INDArrayIndex, NDArrayIndex}
import org.nd4j.linalg.ops.transforms.Transforms
import org.nd4s.Implicits._
import shapeless._

import plotly._
import plotly.element._
import plotly.layout._
import plotly.JupyterScala._

import scala.collection.immutable.IndexedSeq
import scala.util.Random

// Limit printed REPL values to two lines
pprintConfig() = pprintConfig().copy(height = 2)

// Helper scripts from the tutorial: CIFAR10 loading and utility functions
import $file.ReadCIFAR10ToNDArray
import $file.Utils
```


## Build a two-layer neural network

### Parameter tuning

Unlike the previous article, this article adopts some parameter-tuning techniques: we set a learning rate and use [L2 regularization](http://neuralnetworksanddeeplearning.com/chap3.html), which helps avoid [overfitting](https://en.wikipedia.org/wiki/Overfitting). We also address a problem that appears in long training runs: with a relatively high `learningRate`, the `loss` decreases too slowly or stops decreasing. To fix this, we multiply `learningRate` by 0.9995 on every iteration, so it keeps decaying to 0.9995 of its previous value.
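The update rule described above can be sketched in plain Scala. The sketch assumes the common form of L2-regularized gradient descent, w ← w − η(g + λw), with η multiplied by 0.9995 every time it is requested, mirroring `currentLearningRate()` in the next cell. It is not DeepLearning.scala's `Optimizer` code: `DecayingL2Sgd` and `learningRateAfter` are hypothetical names, weights are a plain `Array[Double]` instead of an `INDArray`, and the library may combine the terms differently.

```scala
// Hypothetical sketch: SGD with a multiplicatively decaying learning rate
// plus L2 regularization. Not DeepLearning.scala's implementation.
final class DecayingL2Sgd(initialLearningRate: Double,
                          decay: Double,
                          l2Regularization: Double) {
  private var learningRate = initialLearningRate

  // Like currentLearningRate() in the article: decay first, then return the new rate.
  private def currentLearningRate(): Double = {
    learningRate *= decay
    learningRate
  }

  // One update, element-wise: w <- w - eta * (g + lambda * w)
  def step(weight: Array[Double], gradient: Array[Double]): Array[Double] = {
    require(weight.length == gradient.length, "shape mismatch")
    val eta = currentLearningRate()
    weight.zip(gradient).map { case (w, g) => w - eta * (g + l2Regularization * w) }
  }
}

val optimizer = new DecayingL2Sgd(initialLearningRate = 0.001, decay = 0.9995, l2Regularization = 0.03)
val updated = optimizer.step(Array(1.0, -2.0), Array(0.5, 0.5))

// Learning rate after n decays: 0.001 * 0.9995^n
def learningRateAfter(n: Int): Double = 0.001 * math.pow(0.9995, n)
```

If `currentLearningRate()` is called once per weight update, then over the 51 passes × 39 mini-batches of the training loop (1,989 updates) the rate shrinks to about 37% of its initial value, so late updates are much smaller than early ones.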

```scala
// Called for each Weight: returns an optimizer combining a learning rate with L2 regularization
implicit val optimizerFactory = new DifferentiableINDArray.OptimizerFactory {
  override def ndArrayOptimizer(weight: Weight): Optimizer = {
    new LearningRate with L2Regularization {

      // Initial learning rate; mutated on every call below
      var learningRate = 0.001

      // Decay the learning rate to 0.9995 of its previous value each time it is requested
      override protected def currentLearningRate(): Double = {
        learningRate *= 0.9995
        learningRate
      }

      // L2 regularization coefficient
      override protected def l2Regularization: Double = 0.03
    }
  }
}
```

### Write the first layer of the neural network

The first layer consists of a fully connected layer followed by a [ReLU](http://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-network) activation.

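```scala
// Fully connected layer followed by ReLU: max(row . w + b, 0)
def fullyConnectedThenRelu(inputSize: Int, outputSize: Int)(
    implicit row: INDArray @Symbolic): INDArray @Symbolic = {
  // Random normal initial weights, scaled; `* 0.1` comes after `toWeight`,
  // so it is part of the computation graph rather than the stored initial value
  val w = (Nd4j.randn(inputSize, outputSize) / math.sqrt(outputSize / 2.0)).toWeight * 0.1
  // Biases start at zero
  val b = Nd4j.zeros(outputSize).toWeight
  max((row dot w) + b, 0.0)
}
```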
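To see what `fullyConnectedThenRelu` computes numerically, the following plain-Scala sketch evaluates the same formula, relu(x · W + b), on small hand-written arrays. The helper names `dense` and `relu` are hypothetical, and row-major `Array[Array[Double]]` stands in for `INDArray`; it shows only the forward pass, not weight initialization or training.

```scala
// Hypothetical forward pass of a fully connected layer.
// x: batch x inputSize, w: inputSize x outputSize, b: outputSize
def dense(x: Array[Array[Double]], w: Array[Array[Double]], b: Array[Double]): Array[Array[Double]] =
  x.map { row =>
    Array.tabulate(b.length) { j =>
      // dot product of this input row with column j of w, plus bias j
      row.indices.map(k => row(k) * w(k)(j)).sum + b(j)
    }
  }

// ReLU: max(z, 0) element-wise, like max(..., 0.0) in the article
def relu(z: Array[Array[Double]]): Array[Array[Double]] =
  z.map(_.map(v => math.max(v, 0.0)))

val x = Array(Array(1.0, 2.0))                     // one sample, two features
val w = Array(Array(1.0, -1.0), Array(0.5, -0.5))  // 2 inputs -> 2 outputs
val b = Array(0.0, 0.5)
val out = relu(dense(x, w, b))
```

The first output unit receives 1·1 + 2·0.5 = 2.0 and passes through unchanged; the second receives −1.5 and ReLU clips it to 0.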

### Write the second layer of the neural network

As in the previous article, we use `softmax` as the classifier.

```scala
// Row-wise softmax: exponentiate the scores and divide each row by its sum
def softmax(implicit scores: INDArray @Symbolic): INDArray @Symbolic = {
  val expScores = exp(scores)
  expScores / expScores.sum(1) // sum along dimension 1, i.e. over the classes of each row
}
```
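The `softmax` above divides each row of `exp(scores)` by that row's sum, so every row becomes a probability distribution over the classes. Below is a plain-Scala sketch of the same row-wise computation. It additionally subtracts each row's maximum before exponentiating, a common trick that gives the same result mathematically but avoids overflow in `exp`; the article's version does not do this, and the name `softmaxRows` is hypothetical.

```scala
// Hypothetical row-wise softmax on a batch of score vectors.
def softmaxRows(scores: Array[Array[Double]]): Array[Array[Double]] =
  scores.map { row =>
    val m = row.max                                // subtract the row max for numerical stability
    val expScores = row.map(s => math.exp(s - m))
    val total = expScores.sum                      // analogous to expScores.sum(1)
    expScores.map(_ / total)
  }

val probs = softmaxRows(Array(Array(1.0, 2.0, 3.0), Array(1000.0, 1000.0, 1000.0)))
```

Without the max subtraction, the second row would compute `math.exp(1000.0)`, which is `Infinity` as a `Double`, and the division would produce `NaN`.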

Next, write the second of the two layers: a fully connected layer followed by `softmax`.

```scala
// Fully connected layer followed by softmax, producing class probabilities
def fullyConnectedThenSoftmax(inputSize: Int, outputSize: Int)(
    implicit row: INDArray @Symbolic): INDArray @Symbolic = {
  val w = (Nd4j.randn(inputSize, outputSize) / math.sqrt(outputSize)).toWeight
  val b = Nd4j.zeros(outputSize).toWeight
  // Feed the affine output (row . w + b) into softmax
  softmax.compose((row dot w) + b)
}
```

## Combine the two layers of the neural network

To build the two-layer neural network, we use `compose` to combine the two layers above into one network. `a.compose(b)` feeds the output of `b` into `a` as its input, which chains the two layers together.
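Plain Scala functions offer an analogous `compose` on `Function1`: `(f compose g)(x)` equals `f(g(x))`, so the function in parentheses runs first. The toy sketch below chains two hypothetical stages, `firstLayer` and `secondLayer`, in the same order that `hiddenLayer` chains its two layers. It only illustrates evaluation order; unlike DeepLearning.scala's `compose`, `Function1.compose` builds no differentiable layer and nothing is trained.

```scala
// Two toy "layers" as ordinary functions on a vector (hypothetical).
val firstLayer: Vector[Double] => Vector[Double] =
  v => v.map(x => math.max(x * 2.0, 0.0))  // scale, then ReLU
val secondLayer: Vector[Double] => Vector[Double] =
  v => { val s = v.sum; v.map(_ / s) }     // normalize to sum 1 (assumes s != 0)

// secondLayer.compose(firstLayer) feeds firstLayer's output into secondLayer,
// in the same order as fullyConnectedThenSoftmax(500, 10).compose(layer0).
val twoLayers: Vector[Double] => Vector[Double] = secondLayer.compose(firstLayer)

val result = twoLayers(Vector(1.0, -1.0, 3.0))
```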

```scala
// 32 x 32 pixels x 3 color channels per CIFAR10 image
val NumberOfPixels: Int = 3072

// Despite its name, hiddenLayer returns the whole two-layer predictor:
// 3072 -> 500 (ReLU), then 500 -> 10 (softmax)
def hiddenLayer(implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  val layer0 = fullyConnectedThenRelu(NumberOfPixels, 500).compose(input)
  fullyConnectedThenSoftmax(500, 10).compose(layer0)
}

val predictor = hiddenLayer
```

### Write `network` and combine the input layer and [hidden layer](http://stats.stackexchange.com/questions/63152/what-does-the-hidden-layer-in-a-neural-network-compute)

```scala
// Loss: element-wise binary cross-entropy; the score is mapped into [0.1, 1.0]
// inside each log so that log(0) can never occur
def crossEntropy(
    implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val score = pair.head
  val label = pair.tail.head
  -(label * log(score * 0.9 + 0.1) + (1.0 - label) * log(1.0 - score * 0.9)).mean
}

// Training network: (input :: label) -> predictor -> crossEntropy loss
def network(
    implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val input = pair.head
  val label = pair.tail.head
  val score: INDArray @Symbolic = predictor.compose(input)
  val hnilLayer: HNil @Symbolic = HNil
  crossEntropy.compose(score :: label :: hnilLayer)
}

val trainer = network
```
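The `crossEntropy` above is an element-wise binary cross-entropy with the predicted score squashed into a safe range before taking the logarithm: for a score in [0, 1], both `score * 0.9 + 0.1` and `1.0 - score * 0.9` lie in [0.1, 1.0], so `log` never sees 0. The plain-Scala sketch below reproduces that formula on arrays so values can be checked by hand. It assumes `.mean` averages over every element of the batch × class matrix, and the name `smoothedCrossEntropy` is hypothetical.

```scala
// Hypothetical plain-array version of the article's crossEntropy formula:
// -mean(label * log(0.9 * score + 0.1) + (1 - label) * log(1 - 0.9 * score))
// Assumption: the mean is taken over every element of the batch x class matrix.
def smoothedCrossEntropy(scores: Array[Array[Double]], labels: Array[Array[Double]]): Double = {
  require(scores.length == labels.length, "one label row per score row")
  val terms = for {
    i <- scores.indices
    j <- scores(i).indices
  } yield {
    val s = scores(i)(j)
    val y = labels(i)(j)
    // both log arguments stay within [0.1, 1.0] when s is in [0, 1]
    y * math.log(s * 0.9 + 0.1) + (1.0 - y) * math.log(1.0 - s * 0.9)
  }
  -terms.sum / terms.length
}

val oneHotLabel = Array(Array(0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
val perfectLoss = smoothedCrossEntropy(oneHotLabel, oneHotLabel)                 // score equals label
val allZeroLoss = smoothedCrossEntropy(Array(Array.fill(10)(0.0)), oneHotLabel) // every score is 0
```

A perfect prediction gives a loss of 0, while predicting 0 for every class against a one-hot label over 10 classes gives ln(10)/10 ≈ 0.230, a useful reference point for the loss values printed during training.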

## Train the neural network

As in the previous article, we train the neural network and observe how the `loss` changes during training.
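The training loop in the next cell shuffles the indices `0 until 10000` on every pass and cuts them into consecutive slices of `MiniBatchSize`. Since 10000 / 256 is 39 under integer division, each pass uses 39 mini-batches (9,984 images) and the last 16 shuffled indices are skipped. The plain-Scala sketch below reproduces only that indexing logic, without reading any data; `miniBatches` is a hypothetical helper and the fixed seed is arbitrary.

```scala
import scala.util.Random

// Hypothetical helper: shuffle 0 until size and cut it into full mini-batches,
// dropping the remainder just like `0 until size / batchSize` does in the article.
def miniBatches(size: Int, batchSize: Int, random: Random): IndexedSeq[Array[Int]] = {
  val shuffled = random.shuffle((0 until size).toVector).toArray
  for (times <- 0 until size / batchSize)
    yield shuffled.slice(times * batchSize, (times + 1) * batchSize)
}

val batches = miniBatches(size = 10000, batchSize = 256, random = new Random(42))
val used = batches.flatMap(_.toSeq)
```

Because a fresh shuffle happens on every pass, a different set of 16 images is skipped each time.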

```scala
val random = new Random

val MiniBatchSize = 256

// The 10 classes of CIFAR10 images (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
val NumberOfClasses: Int = 10

// Train on the mini-batch selected by randomIndexArray and return its loss
def trainData(randomIndexArray: Array[Int]): Double = {
  val trainNDArray :: expectLabel :: shapeless.HNil =
    ReadCIFAR10ToNDArray.getSGDTrainNDArray(randomIndexArray)
  val input =
    trainNDArray.reshape(MiniBatchSize, NumberOfPixels)

  val expectLabelVectorized =
    Utils.makeVectorized(expectLabel, NumberOfClasses)
  trainer.train(input :: expectLabelVectorized :: HNil)
}

val lossSeq =
  (
    for (iteration <- 0 to 50) yield {
      // Reshuffle the indices 0 until 10000 on every pass
      val randomIndex = random
        .shuffle[Int, IndexedSeq](0 until 10000) //https://issues.scala-lang.org/browse/SI-6948
        .toArray
      // 10000 / 256 = 39 full mini-batches per pass; the remaining 16 indices are skipped
      for (times <- 0 until 10000 / MiniBatchSize) yield {
        val randomIndexArray =
          randomIndex.slice(times * MiniBatchSize,
                            (times + 1) * MiniBatchSize)
        val loss = trainData(randomIndexArray)
        // Print one mini-batch loss every 5 passes; the printed "epoch" counts groups of 5 passes
        if (times == 3 && iteration % 5 == 4) {
          println("at epoch " + (iteration / 5 + 1) + " loss is :" + loss)
        }
        loss
      }
    }
  ).flatten

plotly.JupyterScala.init()

val plot = Seq(
  Scatter(lossSeq.indices, lossSeq)
)

plot.plot(
  title = "loss by time"
)
```

```
at epoch 1 loss is :0.2202770948410034
at epoch 2 loss is :0.21146070957183838
at epoch 3 loss is :0.20522694587707518
at epoch 4 loss is :0.1946331739425659
at epoch 5 loss is :0.18703371286392212
at epoch 6 loss is :0.19631543159484863
at epoch 7 loss is :0.18934404850006104
at epoch 8 loss is :0.18849481344223024
at epoch 9 loss is :0.18192483186721803
at epoch 10 loss is :0.17548918724060059
```

[Plot: loss by time]

## Read and process the test set

As in [the previous article](https://thoughtworksinc.github.io/DeepLearning.scala/demo/MiniBatchGradientDescent.html), we read the images and labels of the test set from the CIFAR10 database and process them.
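The cell below converts the integer test labels with `Utils.makeVectorized`, which is defined in the tutorial's `Utils.sc` and not shown here. Judging from the printed `vectorizedTestExpectResult` (the first test label, 3, becomes a row with 1.00 in column 3), it performs one-hot encoding. The sketch below shows one-hot encoding in plain Scala under that assumption; `oneHot` is a hypothetical name, not the helper's actual code.

```scala
// Hypothetical one-hot encoder: label k (0-based) -> row with 1.0 at column k, 0.0 elsewhere.
def oneHot(labels: Array[Int], numberOfClasses: Int): Array[Array[Double]] =
  labels.map { label =>
    require(label >= 0 && label < numberOfClasses, s"label $label out of range")
    Array.tabulate(numberOfClasses)(j => if (j == label) 1.0 else 0.0)
  }

// The first two test labels in the output are 3 and 8.
val encoded = oneHot(Array(3, 8), numberOfClasses = 10)
```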

```scala
val testNDArray =
  ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/test_batch.bin", 100)

val testData = testNDArray.head

val testExpectResult = testNDArray.tail.head

val vectorizedTestExpectResult = Utils.makeVectorized(testExpectResult, NumberOfClasses)
```

```
testNDArray: INDArray :: INDArray :: HNil = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testData: INDArray = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testExpectResult: INDArray = [3.00, 8.00, 8.00, 0.00, 6.00, 6.00, 1.00, 6.00, 3.00, 1.00, 0.00, 9.00, 5.00, 7.00, 9.00, 8.00, 5.00, 7.00, 8.00, 6.00, 7.00, 0.00, 4.00, 9.00, 5.00, 2.00, 4.0...
vectorizedTestExpectResult: INDArray = [[0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
 [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],
...
```

## Verify the prediction accuracy of the neural network

As in the previous article, we use the test data to check the network's predictions and compute the accuracy, which should rise to about 51%.
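`Utils.getAccuracy` is also defined in the tutorial's `Utils.sc`. A typical way to compute classification accuracy, and presumably what it does, is to take the arg-max of each predicted row and report the percentage of rows whose arg-max equals the expected label. The sketch below implements that assumption in plain Scala; `accuracyPercent` is a hypothetical name.

```scala
// Hypothetical accuracy: percentage of rows whose highest-scoring class equals the label.
def accuracyPercent(predictions: Array[Array[Double]], labels: Array[Int]): Double = {
  require(predictions.length == labels.length, "one label per prediction row")
  val correct = predictions.zip(labels).count { case (row, label) =>
    row.indices.maxBy(i => row(i)) == label // arg-max of the softmax output
  }
  100.0 * correct / predictions.length
}

val predictions = Array(
  Array(0.1, 0.7, 0.2), // predicts class 1
  Array(0.6, 0.3, 0.1), // predicts class 0
  Array(0.2, 0.2, 0.6), // predicts class 2
  Array(0.5, 0.4, 0.1)  // predicts class 0
)
val acc = accuracyPercent(predictions, Array(1, 0, 1, 0))
```

If the second argument of `readFromResource` (`100`) is the number of test images loaded, each correctly classified image is worth exactly one percentage point, consistent with the round `51.0 %` result.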

```scala
val right = Utils.getAccuracy(predictor.predict(testData), testExpectResult)
println(s"the result is $right %")
```

```
the result is 51.0 %
```

## Summary

In this article, we have learned the following:

* Parameter tuning
* L2 regularization
* ReLU
* Building a two-layer neural network

[Complete code](https://github.com/izhangzhihao/deeplearning-tutorial/blob/master/src/main/scala/com/thoughtworks/deeplearning/tutorial/TwoLayerNet.scala)