## Background

As the number of layers grows, the parameter count of a fully connected multi-layer neural network increases rapidly and computation becomes very slow, which makes training much harder. We will introduce [convolution](https://zh.wikipedia.org/zh/convolution), which uses parameter sharing to avoid the explosive growth of parameters that comes with stacking more fully connected layers. As a result, it reduces computational complexity and lets us build deeper neural networks.
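To make the effect of parameter sharing concrete, here is a rough count for a single layer. This is a sketch only: `ParameterCount` and its functions are hypothetical helpers, not part of DeepLearning.scala, and the sizes assume CIFAR10 images (32 x 32 pixels, 3 channels) and the 3 x 3 kernels used later in this article.

```scala
object ParameterCount {
  // Fully connected layer: one weight per input/output pair, plus one bias per output.
  def fullyConnected(inputs: Long, outputs: Long): Long =
    inputs * outputs + outputs

  // Convolutional layer: one kernel per (output channel, input channel) pair,
  // plus one bias per output channel. The count does not depend on the image size.
  def convolutional(inChannels: Long, outChannels: Long, kernelSize: Long): Long =
    outChannels * inChannels * kernelSize * kernelSize + outChannels

  // A 32 x 32 RGB image flattened into a vector has 32 * 32 * 3 = 3072 values.
  val pixels: Long = 32L * 32L * 3L

  // Fully connected layer that keeps the size at 3072 values.
  val denseParams: Long = fullyConnected(pixels, pixels)

  // Convolution from 3 channels to 3 channels with a 3 x 3 kernel.
  val convParams: Long = convolutional(3, 3, 3)
}
```

Under these assumptions the fully connected layer needs 9,440,256 parameters while the convolutional layer needs 84, because the same small kernel is reused at every image position.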

In this article, we will first define a multi-layer neural network with 16 convolutional layers, then train it on the [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) training set. Finally, we will use a test set to verify the accuracy of the neural network. Note: depending on your hardware, this training takes several hours, and the hyper-parameters need to be tuned according to how `loss` decreases and how the accuracy changes. The final accuracy can exceed 55%.

## Import dependencies 

In [1]:
import $plugin.$ivy.`com.thoughtworks.implicit-dependent-type::implicit-dependent-type:2.0.0`

import $ivy.`com.thoughtworks.deeplearning::differentiableany:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiablenothing:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiableseq:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiabledouble:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiablefloat:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiablehlist:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiablecoproduct:1.0.0`
import $ivy.`com.thoughtworks.deeplearning::differentiableindarray:1.0.0`
import $ivy.`org.nd4j:nd4j-native-platform:0.7.2`
import $ivy.`org.rauschig:jarchivelib:0.5.0`

import $ivy.`org.plotly-scala::plotly-jupyter-scala:0.3.0`

import java.io.{FileInputStream, InputStream}


import com.thoughtworks.deeplearning
import org.nd4j.linalg.api.ndarray.INDArray
import com.thoughtworks.deeplearning.DifferentiableHList._
import com.thoughtworks.deeplearning.DifferentiableDouble._
import com.thoughtworks.deeplearning.DifferentiableINDArray._
import com.thoughtworks.deeplearning.DifferentiableAny._
import com.thoughtworks.deeplearning.DifferentiableInt._
import com.thoughtworks.deeplearning.DifferentiableSeq._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Optimizers._
import com.thoughtworks.deeplearning.Layer.Tape.Aux
import com.thoughtworks.deeplearning._
import com.thoughtworks.deeplearning.Layer.{Aux, Tape}
import com.thoughtworks.deeplearning.Symbolic.Layers.Identity
import com.thoughtworks.deeplearning.Symbolic._
import com.thoughtworks.deeplearning.Poly.MathFunctions._
import com.thoughtworks.deeplearning.Poly.MathMethods.{*, /}
import com.thoughtworks.deeplearning.Poly.MathOps
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.factory.Nd4j.PadMode
import org.nd4j.linalg.factory.Nd4j.PadMode.EDGE
import org.nd4j.linalg.indexing.{INDArrayIndex, NDArrayIndex}
import org.nd4j.linalg.ops.transforms.Transforms
import org.nd4s.Implicits._
import plotly.Scatter
import shapeless._
import plotly.Plotly._
import plotly._
import plotly.JupyterScala._
import shapeless.OpticDefns.compose

import scala.annotation.tailrec
import scala.collection.immutable.IndexedSeq
import Utils._
import com.thoughtworks.deeplearning.DifferentiableINDArray.Layers.Weight
import scala.util.Random

pprintConfig() = pprintConfig().copy(height = 2)

import $file.ReadCIFAR10ToNDArray
import $file.Utils


## Build the neural network 

### Set the learning rate 

As in the previous article, the optimizer here combines [Adam](https://en.wikipedia.org/wiki/Adam) with `L2Regularization`. A learning rate that stays relatively high late in training slows the decrease of `loss`, so we add a flag `isAEpochDone`: whenever the training loop sets it at the end of an epoch, the learning rate is reduced to 0.75 times its current value.

In [2]:
var isAEpochDone = false

implicit val optimizerFactory = new DifferentiableINDArray.OptimizerFactory {
  override def ndArrayOptimizer(weight: Weight): Optimizer = {

    new LearningRate with L2Regularization with Adam {

      var learningRate = 0.00003

      override protected def l2Regularization: Double = 0.00003

      override protected def currentLearningRate(): Double = {
        learningRate = if (isAEpochDone) {
          isAEpochDone = false
          learningRate * 0.75
        } else {
          learningRate
        }
        learningRate
      }
    }
  }
}

isAEpochDone: Boolean = false
optimizerFactory: AnyRef with OptimizerFactory = $sess.cmd1Wrapper$Helper$$anon$2@58db475a
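The flag-driven decay can be tried in isolation. The sketch below is plain Scala and does not use the DeepLearning.scala optimizer traits; `DecayingRate` and `decayRequested` are hypothetical names that mirror `learningRate` and `isAEpochDone` above.

```scala
// A learning rate that is multiplied by `factor` once each time a decay is requested.
final class DecayingRate(initial: Double, factor: Double) {
  private var rate: Double = initial

  // Set to true from outside to request one decay step.
  var decayRequested: Boolean = false

  // Returns the rate to use now, consuming a pending decay request first.
  def current(): Double = {
    if (decayRequested) {
      decayRequested = false
      rate = rate * factor
    }
    rate
  }
}
```

With the values used above (`new DecayingRate(0.00003, 0.75)`), the first decay request lowers the rate to 0.0000225, and later calls keep returning that value until the flag is set again.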

### Write the convolution layer 

This layer applies ReLU after a convolution. Here we set `KernelSize` to 3×3, and both `Padding` and `Stride` to 1 in both directions. With these settings the image keeps its original dimensions when it passes through a convolution layer, so the dimensions stay unchanged however many layers are stacked. Otherwise the image would shrink quickly and most of the data would be lost, leaving the neural network less to learn from and undermining its predictions.
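The size argument can be checked with the usual output-size formula for a convolution, `(input + 2 * padding - kernel) / stride + 1`. The following sketch is plain arithmetic; `ConvShape` and `outputSize` are hypothetical names, not library functions.

```scala
object ConvShape {
  // Output width (or height) of a convolution applied to an input of width `input`.
  def outputSize(input: Int, kernel: Int, padding: Int, stride: Int): Int =
    (input + 2 * padding - kernel) / stride + 1

  // Settings from this article: 3 x 3 kernel, padding 1, stride 1.
  val padded: Int = outputSize(32, 3, 1, 1)

  // The same kernel without padding loses one pixel on each side.
  val unpadded: Int = outputSize(32, 3, 0, 1)

  // Width left after stacking 16 unpadded layers on a 32-pixel-wide image.
  val after16Unpadded: Int =
    (1 to 16).foldLeft(32)((size, _) => outputSize(size, 3, 0, 1))
}
```

With padding 1 the width stays at 32. Without padding each layer removes 2 pixels, so a 32-pixel-wide image would be reduced to nothing after 16 layers.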

In [3]:
val Stride = 1

val Padding = 1

val KernelSize = 3

def convolutionThenRelu(numberOfInputKernels: Int,
                        numberOfOutputKernels: Int)(
    implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  val weight: INDArray @Symbolic =
    (Nd4j.randn(
      Array(numberOfOutputKernels,
            numberOfInputKernels,
            KernelSize,
            KernelSize)) *
      math.sqrt(2.0 / numberOfInputKernels / KernelSize / KernelSize)).toWeight
  // When using ReLUs, initialise the biases with small *positive* values, for example 0.1
  val bias = Nd4j.ones(numberOfOutputKernels).toWeight * 0.1

  val convResult =
    conv2d(input,
           weight,
           bias,
           (KernelSize, KernelSize),
           (Stride, Stride),
           (Padding, Padding))
  max(convResult, 0.0)
}

Stride: Int = 1
Padding: Int = 1
KernelSize: Int = 3
defined function convolutionThenRelu
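The weight scale in `convolutionThenRelu`, `sqrt(2 / (numberOfInputKernels * KernelSize * KernelSize))`, matches the scheme commonly called He initialization, which is often paired with ReLU. The sketch below reproduces only the scaling in plain Scala; `HeInit`, `stdDev` and `sample` are hypothetical names, and `scala.util.Random` stands in for `Nd4j.randn`.

```scala
import scala.util.Random

object HeInit {
  // Standard deviation sqrt(2 / fanIn), where fanIn = inputChannels * kernelSize * kernelSize.
  def stdDev(inputChannels: Int, kernelSize: Int): Double =
    math.sqrt(2.0 / (inputChannels * kernelSize * kernelSize))

  // Draw `count` weights from a zero-mean Gaussian scaled by that standard deviation.
  def sample(count: Int, inputChannels: Int, kernelSize: Int, rng: Random): Vector[Double] = {
    val scale = stdDev(inputChannels, kernelSize)
    Vector.fill(count)(rng.nextGaussian() * scale)
  }
}
```

For the layers in this article (3 input channels, 3 x 3 kernels) the fan-in is 27, so the standard deviation is `sqrt(2.0 / 27)`, roughly 0.27.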

### Write softmax classifier 

As in the previous article, we write a `softmax` layer and a fully connected layer.

In [4]:
def softmax(implicit scores: INDArray @Symbolic): INDArray @Symbolic = {
  val expScores = exp(scores)
  expScores / expScores.sum(1)
}

def fullyConnectedThenSoftmax(inputSize: Int, outputSize: Int)(
    implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  val imageCount = input.shape(0)

  val weight =
    (Nd4j.randn(inputSize, outputSize) / math.sqrt(outputSize)).toWeight
  val bias = Nd4j.zeros(outputSize).toWeight

  softmax.compose(
    (input.reshape(imageCount, inputSize.toLayer) dot weight) + bias)
}

defined function softmax
defined function fullyConnectedThenSoftmax
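For readers who want to see softmax outside the symbolic API, here is a minimal plain-Scala sketch for a single row of scores. It is a variation rather than a copy of the definition above: it subtracts the maximum score before exponentiating, a common trick that leaves the result unchanged but avoids overflow. `SoftmaxSketch` is a hypothetical name.

```scala
object SoftmaxSketch {
  // Softmax of one non-empty row of scores.
  // Subtracting the maximum first keeps math.exp from overflowing on large scores.
  def softmax(scores: Vector[Double]): Vector[Double] = {
    val maxScore = scores.max
    val exps = scores.map(s => math.exp(s - maxScore))
    val total = exps.sum
    exps.map(_ / total)
  }
}
```

The outputs are positive, sum to 1, and preserve the ordering of the input scores, which is what lets them be read as class probabilities.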

### Build a neural network including 16 convolutional layers

We now combine the layers defined above into a network made up of 16 convolutional layers, a fully connected layer and a `softmax` layer.

In [5]:
val MiniBatchSize = 64

val Depth = List.fill(17)(3)

val InputSize = 32

val KernelNumber = List.fill(17)(3)

val random = new Random

def hiddenLayer(implicit input: INDArray @Symbolic): INDArray @Symbolic = {
  @tailrec
  def convFunction(timesToRun: Int,
                   timesNow: Int,
                   input2: INDArray @Symbolic): INDArray @Symbolic = {
    if (timesToRun <= 0) {
      input2
    } else {
      convFunction(
        timesToRun - 1,
        timesNow + 1,
        convolutionThenRelu(Depth(timesNow * 2 + 1),
                            KernelNumber(timesNow * 2 + 1)).compose(
          convolutionThenRelu(Depth(timesNow * 2),
                              KernelNumber(timesNow * 2)).compose(input2)
        )
      )
    }
  }

  val recLayer = convFunction(8, 0, input)

  fullyConnectedThenSoftmax(32 * 32 * 3, 10).compose(recLayer)
}

val predictor = hiddenLayer

MiniBatchSize: Int = 64
Depth: List[Int] = List(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
InputSize: Int = 32
KernelNumber: List[Int] = List(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
random: Random = scala.util.Random@3c6fd99a
defined function hiddenLayer
predictor: (Symbolic.To[INDArray]{type OutputData = org.nd4j.linalg.api.ndarray.INDArray;type OutputDelta = org.nd4j.linalg.api.ndarray.INDArray;type InputData = org.
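The tail-recursive `convFunction` above stacks two convolution layers per step for eight steps. The same stacking pattern can be sketched with ordinary Scala functions. In this sketch a "layer" is just a function on a vector; `StackSketch`, `Layer`, `shiftThenRelu` and `stack` are hypothetical stand-ins, not the `@Symbolic` layers of DeepLearning.scala.

```scala
object StackSketch {
  // Stand-in for a layer: a plain function from a vector to a vector.
  type Layer = Vector[Double] => Vector[Double]

  // Toy layer: add a constant shift, then apply ReLU.
  def shiftThenRelu(shift: Double): Layer =
    input => input.map(x => math.max(x + shift, 0.0))

  // Compose the layers so that the first element of `layers` is applied first.
  def stack(layers: Seq[Layer]): Layer = {
    val id: Layer = x => x
    layers.foldLeft(id)((acc, layer) => acc.andThen(layer))
  }

  // Sixteen identical toy layers, mirroring the depth used in this article.
  val sixteen: Layer = stack(Seq.fill(16)(shiftThenRelu(-0.5)))
}
```

Applying `StackSketch.sixteen` to `Vector(10.0, 1.0, -1.0)` passes the input through all 16 toy layers in order, just as `hiddenLayer` passes an image through its 16 convolution layers before the classifier.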

### Write loss function and combine the input layer and hidden layer

In [6]:
def crossEntropyLossFunction(
    implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val score = pair.head
  val label = pair.tail.head
  -(label * log(score * 0.9 + 0.1) + (1.0 - label) * log(1.0 - score * 0.9)).mean
}

def network(
    implicit pair: (INDArray :: INDArray :: HNil) @Symbolic): Double @Symbolic = {
  val input = pair.head
  val label = pair.tail.head
  val score: INDArray @Symbolic = predictor.compose(input)
  crossEntropyLossFunction.compose(score :: label :: HNil.toLayer)
}

val trainNetwork = network

defined function crossEntropyLossFunction
defined function network
trainNetwork: (Symbolic.To[Double]{type OutputData = Double;type OutputDelta = Double;type InputData = shapeless.::[org.nd4j.linalg.api.ndarray.INDArray,shapeless.::[org.nd4j.linalg.api.ndarray.INDArray,shapeless.HNil]];type InputDelta = shapeless.:+:[org.nd4j.linalg.api.ndarray.INDArray,shapeless.:+:[org.nd4j.linalg.api.ndarray.INDArray,shapeless.CNil]]})#@ = Compose(Negative(ReduceMean(PlusINDArray(MultiplyINDArray(Head(Tail(Identity())),Log(PlusDouble(MultiplyDouble(Head(Identity()),Literal(0.9)),Literal(0.1)))),Mu...
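The loss above rescales each score as `score * 0.9 + 0.1` (and `1.0 - score * 0.9`), which keeps both logarithms away from `log(0)`. The plain-Scala sketch below evaluates the same expression on ordinary vectors to show the effect; `LossSketch`, `term` and `loss` are hypothetical names.

```scala
object LossSketch {
  // Element-wise loss, mirroring the expression in crossEntropyLossFunction.
  // For scores in [0, 1] both logarithm arguments stay within [0.1, 1.0].
  def term(score: Double, label: Double): Double =
    -(label * math.log(score * 0.9 + 0.1) +
      (1.0 - label) * math.log(1.0 - score * 0.9))

  // Mean over all elements; assumes non-empty vectors of equal length.
  def loss(scores: Vector[Double], labels: Vector[Double]): Double = {
    require(scores.nonEmpty && scores.length == labels.length)
    scores.zip(labels).map { case (s, l) => term(s, l) }.sum / scores.length
  }
}
```

A perfect prediction (score 1 for label 1, or score 0 for label 0) contributes 0, while the worst possible prediction contributes about `log(10)`, roughly 2.3, instead of infinity.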

## Read and process the test set

As in [the previous article](https://thoughtworksinc.github.io/DeepLearning.scala/demo/TwoLayerNet.html), we read and process the test-set images and labels from the CIFAR10 dataset.

In [7]:
// The 10 labels of CIFAR10 images (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
val NumberOfClasses: Int = 10

val NumberOfTestSize = 100

val testNDArray =
  ReadCIFAR10ToNDArray.readFromResource("/cifar-10-batches-bin/test_batch.bin", NumberOfTestSize)

val testData = testNDArray.head

val testExpectLabel = testNDArray.tail.head

val testExpectLabelVectorized = Utils.makeVectorized(testExpectLabel, NumberOfClasses)

NumberOfClasses: Int = 10
NumberOfTestSize: Int = 100
testNDArray: INDArray :: INDArray :: HNil = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testData: INDArray = [[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0....
testExpectLabel: INDArray = [3.00, 8.00, 8.00, 0.00, 6.00, 6.00, 1.00, 6.00, 3.00, 1.00, 0.00, 9.00, 5.00, 7.00, 9.00, 8.00, 5.00, 7.00, 8.00, 6.00, 7.00, 0.00, 4.00, 9.00, 5.00, 2.00, 4...
testExpectLabelVectorized: INDArray = [[0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
 [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],
...

## Train the neural network

Training works as in the previous article, with one addition: `trainData` can also measure how accurately the neural network predicts the training set and the test set, and prints both accuracies after an epoch of training.

In [8]:
val reshapedTestData = testData.reshape(NumberOfTestSize, 3, InputSize, InputSize)

def trainData(randomIndexArray: Array[Int],
              isComputeAccuracy: Boolean): (Double, Double, Double) = {
  val trainNDArray :: expectLabel :: shapeless.HNil =
    ReadCIFAR10ToNDArray.getSGDTrainNDArray(randomIndexArray)
  val input =
    trainNDArray.reshape(MiniBatchSize, 3, InputSize, InputSize)

  val expectLabelVectorized =
    Utils.makeVectorized(expectLabel, NumberOfClasses)
  val trainLoss = trainNetwork.train(input :: expectLabelVectorized :: HNil)

  if (isComputeAccuracy) {
    val trainResult: INDArray = predictor.predict(input)

    val trainAccuracy = Utils.getAccuracy(trainResult, expectLabel) / 100

    val testResult: INDArray = predictor.predict(reshapedTestData)

    val testAccuracy = Utils.getAccuracy(testResult, testExpectLabel) / 100
    println(
      s"train accuracy : $trainAccuracy ,\t\ttest accuracy : $testAccuracy ,\t\ttrain loss : $trainLoss ")
    (trainLoss, trainAccuracy, testAccuracy)
  } else {
    println(s"train loss : $trainLoss ")
    (trainLoss, 0, 0)
  }
}

reshapedTestData: INDArray = [[[[0.62, 0.62, 0.64, 0.65, 0.62, 0.61, 0.63, 0.62, 0.62, 0.62, 0.63, 0.62, 0.63, 0.65, 0.66, 0.66, 0.65, 0.63, 0.62, 0.62, 0.61, 0.58, 0.59, 0.58, 0.58, 0.56, 0.55, 0.55, 0.56, 0.54, 0.49, 0.45],
   [0.59, 0.59, 0.62, 0.65, 0.63, 0.62, 0.64, 0.63, 0.64, 0.61, 0.61, 0.62, 0.64, 0.66, 0.67, 0.67, 0.66, 0.62, 0.60, 0.59, 0.57, 0.54, 0.55, 0.55, 0.58, 0.57, ...
defined function trainData
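`Utils.getAccuracy` lives in a separate file that is not shown here. As a rough idea of what such a helper computes, the sketch below picks the highest-scoring class in each row and compares it with the expected label. It is a hypothetical stand-in written in plain Scala, not the actual `Utils` code, and it returns a fraction between 0 and 1.

```scala
object AccuracySketch {
  // Index of the largest score in one non-empty row, i.e. the predicted class.
  def argMax(row: Vector[Double]): Int = row.indexOf(row.max)

  // Fraction of rows whose predicted class equals the expected label.
  def accuracy(scores: Vector[Vector[Double]], labels: Vector[Int]): Double = {
    require(scores.nonEmpty && scores.length == labels.length)
    val hits = scores.zip(labels).count { case (row, label) => argMax(row) == label }
    hits.toDouble / scores.length
  }
}
```

For example, with three rows of scores of which two put the highest score on the expected class, `accuracy` returns two thirds.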

Start training the neural network and observe how `loss` changes after each training step. Note: depending on your hardware, this training takes several hours, and the hyper-parameters need to be tuned according to how `loss` decreases and how the accuracy changes.

In [9]:
val resultTuple: Seq[(Double, Double, Double)] =
  (
    for (blocks <- 0 to 50) yield {
      if (blocks % 5 == 0 && blocks != 0) {
        isAEpochDone = true
      }
      val randomIndex = random
        .shuffle[Int, IndexedSeq](0 until 10000) //https://issues.scala-lang.org/browse/SI-6948
        .toArray
      for (times <- 0 until 10000 / MiniBatchSize) yield {
        val randomIndexArray =
          randomIndex.slice(times * MiniBatchSize,
                            (times + 1) * MiniBatchSize)
        trainData(randomIndexArray, isAEpochDone)
      }
    }
  ).flatten

train loss : 0.2887134552001953 
train loss : 0.2755943536758423 
train loss : 0.28840744495391846 
train loss : 0.27853400707244874 
train loss : 0.2880797147750854 
train loss : 0.2994242668151855 
train loss : 0.27094957828521726 
train loss : 0.2903454780578613 
train loss : 0.2740288257598877 
train loss : 0.28114173412322996 
train loss : 0.25999433994293214 
train loss : 0.2745546817779541 
train loss : 0.2774548053741455 
...
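The loop above reshuffles the sample indices on every pass and cuts them into mini-batches. The sketch below isolates that step in plain Scala; `MiniBatchSketch` and `batches` are hypothetical names, and a `Vector` is shuffled instead of a `Range`.

```scala
import scala.util.Random

object MiniBatchSketch {
  // Shuffle the indices 0 until total and cut them into full batches of batchSize.
  // A trailing partial batch is dropped, as in the training loop above.
  def batches(total: Int, batchSize: Int, rng: Random): Vector[Vector[Int]] = {
    val shuffled = rng.shuffle((0 until total).toVector)
    (0 until total / batchSize).toVector.map { i =>
      shuffled.slice(i * batchSize, (i + 1) * batchSize)
    }
  }
}
```

With 10000 samples and a batch size of 64, integer division gives 156 batches per pass, so 9984 samples are used and a different random set of 16 is skipped each time.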

Unzip the results into `trainLossSeq`, `trainAccuracySeq` and `testAccuracySeq`, plot them, and observe the trends of `trainLoss`, `trainAccuracy` and `testAccuracy`.
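Because accuracy is only measured occasionally, most entries of the accuracy sequences are placeholder zeros. One way to line the measured values up with the loss curve, shown here as a sketch with made-up toy numbers, is to keep each measured value together with its step index. `PlotDataSketch` and `measured` are hypothetical names.

```scala
object PlotDataSketch {
  // Toy results in the shape (loss, trainAccuracy, testAccuracy);
  // an accuracy of 0 means "not measured at this step".
  val results: Vector[(Double, Double, Double)] = Vector(
    (0.30, 0.0, 0.0),
    (0.28, 0.0, 0.0),
    (0.27, 0.35, 0.31),
    (0.26, 0.0, 0.0),
    (0.25, 0.0, 0.0),
    (0.24, 0.40, 0.36)
  )

  // Split the triples into three parallel sequences.
  val (losses, trainAccuracies, testAccuracies) = results.unzip3

  // Keep only the measured values, paired with the step at which they were taken.
  def measured(values: Vector[Double]): Vector[(Int, Double)] =
    values.zipWithIndex.collect { case (value, step) if value != 0.0 => (step, value) }
}
```

Here `measured(trainAccuracies)` yields the pairs `(2, 0.35)` and `(5, 0.40)`, which can be unzipped into x and y values for a scatter plot.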

In [10]:
val (trainLossSeq, trainAccuracySeq, testAccuracySeq) =
  resultTuple.unzip3

val filteredTrainAccuracySeq = 0 +: trainAccuracySeq.filter(item =>
  item != 0)

val filteredTestAccuracySeq = 0 +: testAccuracySeq.filter(item =>
  item != 0)

// Accuracy is recorded every 5 passes, and each pass has 10000 / MiniBatchSize steps
val interval = 5 * (10000 / MiniBatchSize)

plotly.JupyterScala.init()

val plot = Seq(
  Scatter(trainLossSeq.indices, trainLossSeq, name = "train loss"),
  Scatter(trainLossSeq.indices by interval,
          filteredTrainAccuracySeq,
          name = "train accuracy"),
  Scatter(trainLossSeq.indices by interval,
          filteredTestAccuracySeq,
          name = "test accuracy")
)

plot.plot(title = "train loss,train accuracy,test accuracy by time")


plot: Seq[Scatter] = List(
  Scatter(
    Some(
      Doubles(
        Vector(
          0.0,
          1.0,
          2.0,
          3.0,
          4.0,
          5.0,
          6.0,
...

## Summary

What we have learned in this article:

* Convolutional Neural Network (CNN)
* Adam
* Input layer and hidden layer
* Build a deep neural network

[Complete code](https://github.com/izhangzhihao/deeplearning-tutorial/blob/master/src/main/scala/com/thoughtworks/deeplearning/tutorial/CNNs.scala)