### Note

Please view the [README](https://github.com/deeplearning4j/deeplearning4j/tree/master/dl4j-examples/tutorials/README.md) to learn about installing, setting up dependencies, and importing notebooks in Zeppelin.

### Background

#### Convolutional Neural Networks (CNN)
Neural networks learn to extract features from the data they are fed. Previously, we worked with fully-connected feedforward networks, which multiply inputs by weights, add biases, and iteratively adjust those parameters during training. In effect, each parameter in such a network deals with an individual input feature and ignores the features surrounding it. Taking neighboring features into account, however, is important for improving performance on data such as images, where the meaning of a pixel depends on the pixels around it.

To accomplish this, we use convolutional neural networks (CNNs). A CNN uses small matrices of weights to compute more general features from local neighborhoods of input features. Such a set of weights is called a kernel (also known as a filter in 2D convolutions). Kernels are particularly well suited to images: they slide over the image pixels in 2D and produce another feature matrix as their response.

The last few layers of a CNN are fully-connected layers, which transform the learned feature matrices into a set of predictions that we can use to evaluate the network's performance.
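The output layer used later in this tutorial applies `Activation.SOFTMAX`, which turns the raw scores of the last fully-connected layer into class probabilities. Below is a minimal plain-Scala sketch of softmax for illustration only; `SoftmaxSketch` is a hypothetical name, not part of DL4J. Subtracting the maximum score before exponentiating is a common trick to keep `math.exp` from overflowing and does not change the result.

```scala
object SoftmaxSketch {
  // Softmax: exponentiate each score and normalize so the outputs sum to 1.
  // Subtracting the maximum first keeps math.exp from overflowing.
  def softmax(logits: Array[Double]): Array[Double] = {
    val m = logits.max
    val exps = logits.map(z => math.exp(z - m))
    val total = exps.sum
    exps.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    // Three made-up raw scores for three classes
    val probs = softmax(Array(2.0, 1.0, 0.1))
    println(probs.mkString(", ")) // largest score gets the largest probability
    println(probs.sum)            // approximately 1.0
  }
}
```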

--- 

#### Goals
- Learn how CNNs work
- Work with CNNs in DL4J

## 1. How CNNs work

### Basic concept

At each convolutional layer, the network learns features from its input. As we go deeper, the convolutional layers compute increasingly general features. For example, the first convolutional layer might detect edges in the image, and the next layer might learn shapes formed by those edges. The following image visualizes the features learned at each layer.

| Description | Image | Source | Site |
|---|---|---|---|
| Features learned by CNN Layers | ![Features learned by CNN Layers](http://parse.ele.tue.nl/cluster/0/fig1.png) | [Source](http://parse.ele.tue.nl/cluster/0/fig1.png) | [Site](http://derekjanni.github.io/Easy-Neural-Nets/) |

___Figure 1:___ The figure above shows how increasingly general features are learned at each layer of the network.

| Description | Image | Source | Site |
|---|---|---|---|
| Basic convolution operation | ![Basic convolution operation](http://www.songho.ca/dsp/convolution/files/conv2d_matrix.jpg) | [Source](http://www.songho.ca/dsp/convolution/files/conv2d_matrix.jpg) | [Site](http://www.songho.ca/dsp/convolution/convolution.html) |

___Figure 2:___ The figure above shows how a basic convolution operation is computed. Each output value is simply a weighted sum of the input values covered by the kernel, with the corresponding kernel values as weights.
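To make the weighted sum concrete, here is a minimal plain-Scala sketch of a 2D convolution with no padding and a stride of 1. It does not use DL4J, and the names are illustrative. One assumption to note: the sketch slides the kernel without flipping it (strictly a cross-correlation), which is the form deep learning libraries commonly compute; because the kernel weights are learned, the difference does not matter for training.

```scala
object Conv2DSketch {
  // "Valid" 2D convolution (no padding, stride 1): each output cell is the
  // weighted sum of the input window under the kernel. The kernel is not
  // flipped, i.e. this is the cross-correlation form.
  def conv2d(input: Array[Array[Double]], kernel: Array[Array[Double]]): Array[Array[Double]] = {
    val (h, w) = (input.length, input(0).length)
    val (kh, kw) = (kernel.length, kernel(0).length)
    Array.tabulate(h - kh + 1, w - kw + 1) { (i, j) =>
      var sum = 0.0
      for (a <- 0 until kh; b <- 0 until kw)
        sum += input(i + a)(j + b) * kernel(a)(b)
      sum
    }
  }

  def main(args: Array[String]): Unit = {
    val input = Array(
      Array(1.0, 2.0, 3.0),
      Array(4.0, 5.0, 6.0),
      Array(7.0, 8.0, 9.0))
    val kernel = Array(
      Array(1.0, 0.0),
      Array(0.0, 1.0))
    // A 3x3 input and a 2x2 kernel give a 2x2 output
    conv2d(input, kernel).foreach(row => println(row.mkString(" ")))
    // prints:
    // 6.0 8.0
    // 12.0 14.0
  }
}
```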

### CNN related terms

- ___Stride___
    _The number of columns and/or rows the kernel moves at each step as it slides over the input. A larger stride reduces the output volume, which in turn reduces the amount of computation in the network._

- ___Padding___
    _The number of zeros added around the border of the input matrix. Padding also helps control the output volume; for example, it can keep the output the same size as the input._

- ___Pooling___
    _A pooling layer (also called a subsampling layer) further reduces the output volume by applying an aggregation operation to windows of the convolutional layer's output. Common types include max pooling, average pooling, and L2-norm pooling. Max pooling, the most commonly used type, outputs the maximum of all values covered by the pooling kernel._
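The following plain-Scala sketch illustrates zero padding and max pooling on a small 4x4 matrix. The helpers `zeroPad` and `maxPool` are hypothetical and independent of DL4J, which handles both internally through the layer configuration. With a 2x2 window and a stride of 2, each window is visited once and the 4x4 input shrinks to 2x2; padding by 1 grows it to 6x6.

```scala
object PadPoolSketch {
  // Zero padding: surround the matrix with p rows/columns of zeros on every side.
  def zeroPad(m: Array[Array[Double]], p: Int): Array[Array[Double]] = {
    val (h, w) = (m.length, m(0).length)
    Array.tabulate(h + 2 * p, w + 2 * p) { (i, j) =>
      val (r, c) = (i - p, j - p)
      if (r >= 0 && r < h && c >= 0 && c < w) m(r)(c) else 0.0
    }
  }

  // Max pooling: slide a k x k window with the given stride and keep the maximum.
  def maxPool(m: Array[Array[Double]], k: Int, stride: Int): Array[Array[Double]] = {
    val outH = (m.length - k) / stride + 1
    val outW = (m(0).length - k) / stride + 1
    Array.tabulate(outH, outW) { (i, j) =>
      (for (a <- 0 until k; b <- 0 until k) yield m(i * stride + a)(j * stride + b)).max
    }
  }

  def main(args: Array[String]): Unit = {
    val x = Array(
      Array(1.0, 3.0, 2.0, 4.0),
      Array(5.0, 6.0, 1.0, 0.0),
      Array(7.0, 2.0, 9.0, 8.0),
      Array(0.0, 1.0, 3.0, 4.0))
    // prints "6.0 4.0" and "7.0 9.0"
    println(maxPool(x, 2, 2).map(_.mkString(" ")).mkString("\n"))
    println(zeroPad(x, 1).length) // 6 rows after padding by 1
  }
}
```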

## 2. CNN in DL4J
Let's see how all of this works in DL4J.

You can build a convolutional layer in DL4J as:
```
val convLayer = new ConvolutionLayer.Builder(Array[Int](5, 5), Array[Int](1, 1), Array[Int](0, 0))
        .name("cnn").nOut(50).biasInit(0).build
```
Here the convolutional layer is created with a kernel size of _5x5_, a _1x1_ stride, a padding of _0x0_, and _50_ kernels, with all biases initialized to _0_.

__Note:__ Kernels also have a depth, equal to the depth of the input coming from the previous layer. So, in the first layer, this kernel would have a size of 5x5x3 if a 3-channel RGB image were fed as input.
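Because every kernel spans the full input depth, the depth directly determines how many weights a convolutional layer learns. Below is a minimal sketch of that arithmetic, assuming one bias per kernel (matching the `biasInit` setting above); `convParams` is a hypothetical helper, not a DL4J function.

```scala
object ConvParamCount {
  // Each of the nOut kernels has kh * kw * inDepth weights, plus one bias per kernel.
  def convParams(kh: Int, kw: Int, inDepth: Int, nOut: Int): Long =
    kh.toLong * kw * inDepth * nOut + nOut

  def main(args: Array[String]): Unit = {
    println(convParams(5, 5, 3, 50))   // 5x5x3 kernels on an RGB input: 3800
    println(convParams(5, 5, 1, 50))   // 5x5x1 kernels on 1-channel MNIST: 1300
    println(convParams(5, 5, 50, 100)) // 100 kernels over a 50-deep input: 125100
  }
}
```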

The output of a convolutional layer has the shape (Wo x Ho x Do), where:

- __Wo = (W − Fw + 2Ph)/Sh + 1__, where __W__ is the input width, __Fw__ the kernel width, __Ph__ the horizontal padding, and __Sh__ the horizontal stride
- __Ho = (H − Fh + 2Pv)/Sv + 1__, where __H__ is the input height, __Fh__ the kernel height, __Pv__ the vertical padding, and __Sv__ the vertical stride
- __Do = K__, where __K__ is the number of kernels applied
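As a worked example of these formulas, the sketch below plugs in the layer settings from the snippet above. It is plain Scala, not a DL4J call. It uses integer division, so it assumes the result is rounded down when the kernel does not tile the input evenly. The third call shows how a padding of 2 keeps a 28x28 input at 28x28 with a 5x5 kernel.

```scala
object ConvOutputShape {
  // Wo = (W - Fw + 2Ph)/Sh + 1, Ho = (H - Fh + 2Pv)/Sv + 1, Do = K
  // Int division rounds down for positive values (an assumption of this sketch).
  def convOutput(w: Int, h: Int, fw: Int, fh: Int, ph: Int, pv: Int,
                 sh: Int, sv: Int, k: Int): (Int, Int, Int) =
    ((w - fw + 2 * ph) / sh + 1, (h - fh + 2 * pv) / sv + 1, k)

  def main(args: Array[String]): Unit = {
    // 32x32x3 image, 5x5 kernels, stride 1, no padding, 50 kernels
    println(convOutput(32, 32, 5, 5, 0, 0, 1, 1, 50)) // (28,28,50)
    // 28x28x1 MNIST digit with the same layer settings
    println(convOutput(28, 28, 5, 5, 0, 0, 1, 1, 50)) // (24,24,50)
    // Same layer with a padding of 2 keeps the spatial size
    println(convOutput(28, 28, 5, 5, 2, 2, 1, 1, 50)) // (28,28,50)
  }
}
```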

For a pooling layer we can do something like this:
```
val poolLayer = new SubsamplingLayer.Builder(Array[Int](2, 2), Array[Int](2, 2)).name("maxpool").build
```
The default pooling type is max pooling. Here we're building a pooling layer with a kernel size of _2x2_ and a stride of _2x2_.

The output of a pooling layer has the shape (Wo x Ho x Do), where:

- __Wo = (W − Fw)/Sh + 1__, where __W__ is the input width, __Fw__ the kernel width, and __Sh__ the horizontal stride
- __Ho = (H − Fh)/Sv + 1__, where __H__ is the input height, __Fh__ the kernel height, and __Sv__ the vertical stride
- __Do = Di__, where __Di__ is the input depth
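Combining both formulas, we can trace the shapes through the network built later in this tutorial: two 5x5 convolutional layers (50 and 100 kernels, stride 1, no padding), each followed by 2x2 max pooling with stride 2, applied to 28x28x1 MNIST images. The plain-Scala sketch below does this arithmetic for one spatial side, since the images are square, and shows that the dense layer receives 4x4x100 = 1600 values per image.

```scala
object TutorialShapes {
  // Convolution: (size - F + 2P)/S + 1 per spatial side; depth becomes the kernel count.
  def conv(size: Int, f: Int, p: Int, s: Int): Int = (size - f + 2 * p) / s + 1
  // Pooling: (size - F)/S + 1 per spatial side; depth is unchanged.
  def pool(size: Int, f: Int, s: Int): Int = (size - f) / s + 1

  def main(args: Array[String]): Unit = {
    val afterCnn1 = conv(28, 5, 0, 1)         // 24 -> 24x24x50
    val afterPool1 = pool(afterCnn1, 2, 2)    // 12 -> 12x12x50
    val afterCnn2 = conv(afterPool1, 5, 0, 1) // 8  -> 8x8x100
    val afterPool2 = pool(afterCnn2, 2, 2)    // 4  -> 4x4x100
    val flattened = afterPool2 * afterPool2 * 100
    println(s"Input to the dense layer: $afterPool2 x $afterPool2 x 100 = $flattened values")
  }
}
```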

We also have to configure how inputs are passed to the network. If each input is already an image with height, width, and channel (depth) dimensions, we use:
```
val conf = new NeuralNetConfiguration.Builder()
    // Hyperparameters code here
    .list
    // Layers code here
    .setInputType(InputType.convolutional(32, 32, 3)) // Setting our input type here (32x32x3 image: height, width, depth)
    .build
```

Otherwise, if each input is a flat (linear) array, such as a 784-element MNIST row, we can do something like this:
```
val conf = new NeuralNetConfiguration.Builder()
    // Hyperparameters code here
    .list
    // Layers code here
    .setInputType(InputType.convolutionalFlat(28, 28, 1)) // Setting our input type here (784 linear array to 28x28x1 input)
    .build
```
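`InputType.convolutionalFlat(28, 28, 1)` asks DL4J to treat each 784-element row as a 28x28x1 image, so you do not reshape the data yourself. To show what that mapping means, here is a plain-Scala sketch assuming pixels are stored row by row (row-major order), so pixel `(row, col)` lives at index `row * 28 + col`. `FlatToImage.toImage` is a hypothetical helper, not part of DL4J.

```scala
object FlatToImage {
  // Reshape a flat, row-major, single-channel pixel array into height x width.
  // Assumes pixel (row, col) is stored at index row * width + col.
  def toImage(flat: Array[Double], height: Int, width: Int): Array[Array[Double]] = {
    require(flat.length == height * width, s"expected ${height * width} values, got ${flat.length}")
    Array.tabulate(height, width)((row, col) => flat(row * width + col))
  }

  def main(args: Array[String]): Unit = {
    val flat = Array.tabulate(28 * 28)(_.toDouble) // 784 dummy pixel values: 0.0 to 783.0
    val img = toImage(flat, 28, 28)
    println(img(1)(0)) // 28.0: the first pixel of the second row
  }
}
```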

### Imports

```
import java.{lang, util}

import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
import org.deeplearning4j.nn.api.{Model, OptimizationAlgorithm}
import org.deeplearning4j.nn.conf.inputs.InputType
import org.deeplearning4j.nn.conf.layers.{ConvolutionLayer, DenseLayer, OutputLayer, SubsamplingLayer}
import org.deeplearning4j.nn.conf.{LearningRatePolicy, NeuralNetConfiguration, Updater}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.optimize.api.IterationListener
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.lossfunctions.LossFunctions
```

### Setting up the network inputs
We'll use the __MNIST__ dataset for this tutorial.

```
// Hyperparameters
val seed = 123
val epochs = 1
val batchSize = 64
val learningRate = 0.1
val learningRateDecay = 0.1

// MNIST Dataset: the second argument selects the training set (true) or the test set (false)
val mnistTrain = new MnistDataSetIterator(batchSize, true, seed)
val mnistTest = new MnistDataSetIterator(batchSize, false, seed)
```

### Creating a CNN network

```
// Learning rate schedule: iteration number -> learning rate
val lrSchedule: util.Map[Integer, lang.Double] = new util.HashMap[Integer, lang.Double]
lrSchedule.put(0, learningRate)
lrSchedule.put(1000, learningRate * learningRateDecay)
lrSchedule.put(4000, learningRate * Math.pow(learningRateDecay, 2))

// Network Configuration
val conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(1)
    .regularization(true).l2(0.005)
    .activation(Activation.RELU)
    .learningRateDecayPolicy(LearningRatePolicy.Schedule)
    .learningRateSchedule(lrSchedule)
    .weightInit(WeightInit.XAVIER)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .updater(Updater.ADAM)
    .list
    .layer(0, new ConvolutionLayer.Builder(Array[Int](5, 5), Array[Int](1, 1), Array[Int](0, 0))
      .name("cnn1").nOut(50).biasInit(0).build)
    .layer(1, new SubsamplingLayer.Builder(Array[Int](2, 2), Array[Int](2, 2)).name("maxpool1").build)
    .layer(2, new ConvolutionLayer.Builder(Array[Int](5, 5), Array[Int](1, 1), Array[Int](0, 0))
      .name("cnn2").nOut(100).biasInit(0).build)
    .layer(3, new SubsamplingLayer.Builder(Array[Int](2, 2), Array[Int](2, 2)).name("maxpool2").build)
    .layer(4, new DenseLayer.Builder().nOut(500).build)
    .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
      .nOut(10).activation(Activation.SOFTMAX).build)
    .backprop(true).pretrain(false)
    .setInputType(InputType.convolutionalFlat(28, 28, 1))
    .build

// Initializing model and setting a custom listener that prints the score every 100 iterations
val model = new MultiLayerNetwork(conf)
model.init()
model.setListeners(new IterationListener {
  private var isInvoked = false
  override def invoked(): Boolean = isInvoked
  override def invoke(): Unit = { isInvoked = true }
  override def iterationDone(model: Model, iteration: Int): Unit = {
    invoke()
    if (iteration % 100 == 0) {
      println("Score at iteration " + iteration + " is " + model.score())
    }
  }
})
```
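It is worth checking the learning-rate schedule against the number of iterations actually run. The sketch below assumes one iteration per minibatch (as configured with `.iterations(1)`), iterations counted from 0, MNIST's 60,000 training images, and a schedule in which each rate stays in effect until the next key; `lrAt` is a hypothetical helper, not DL4J's implementation. Under these assumptions, one epoch at a batch size of 64 is about 938 iterations, so with `epochs = 1` the decays at iterations 1000 and 4000 are never reached: the first would apply during epoch 2 and the second during epoch 5.

```scala
object ScheduleCheck {
  // Hypothetical helper: the rate set at the largest key <= iteration stays in effect.
  def lrAt(schedule: Map[Int, Double], iteration: Int): Double =
    schedule.filter { case (start, _) => start <= iteration }.maxBy(_._1)._2

  def main(args: Array[String]): Unit = {
    val (learningRate, decay) = (0.1, 0.1)
    val schedule = Map(0 -> learningRate, 1000 -> learningRate * decay, 4000 -> learningRate * decay * decay)
    // 60,000 training images in minibatches of 64; a final partial batch still counts
    val itersPerEpoch = math.ceil(60000 / 64.0).toInt // 938
    println(s"Iterations per epoch: $itersPerEpoch")
    println(s"Rate at the last iteration of epoch 1: ${lrAt(schedule, itersPerEpoch - 1)}") // 0.1
    println(s"First decay applies in epoch ${1000 / itersPerEpoch + 1}")  // 2
    println(s"Second decay applies in epoch ${4000 / itersPerEpoch + 1}") // 5
  }
}
```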

### Training and Evaluation

```
(1 to epochs).foreach(epochStep => {
  println("Epoch: " + epochStep)
  model.fit(mnistTrain) // Training
  mnistTrain.reset()    // Rewind the training iterator for the next epoch
  // Evaluate on the test set and print the basic statistics of the trained classifier
  println("Test set evaluation after epoch " + epochStep + " -> " + model.evaluate(mnistTest).stats(true))
  mnistTest.reset()
})
```
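For a multi-class problem like MNIST, the accuracy reported by `stats()` is the fraction of test images whose highest-scoring class matches the label. Here is a plain-Scala sketch of that computation using made-up softmax outputs rather than real model predictions; `AccuracySketch` is a hypothetical name. DL4J's `Evaluation` does this for you and also reports precision, recall, and F1 score.

```scala
object AccuracySketch {
  // Index of the largest score: the class predicted by a softmax output layer.
  def argMax(scores: Array[Double]): Int = scores.indices.maxBy(i => scores(i))

  // Fraction of examples whose predicted class matches the true label.
  def accuracy(predictions: Seq[Array[Double]], labels: Seq[Int]): Double = {
    require(predictions.size == labels.size, "one prediction per label")
    val correct = predictions.zip(labels).count { case (p, y) => argMax(p) == y }
    correct.toDouble / labels.size
  }

  def main(args: Array[String]): Unit = {
    // Three made-up softmax outputs over 3 classes, with their true labels
    val preds = Seq(Array(0.7, 0.2, 0.1), Array(0.1, 0.8, 0.1), Array(0.3, 0.3, 0.4))
    val labels = Seq(0, 1, 1)
    println(accuracy(preds, labels)) // 2 of 3 predictions are correct
  }
}
```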

### Summary

In short, on image data such as MNIST, CNNs typically achieve better accuracy than simple fully-connected networks, because their kernels take the surrounding features into account. In the example above, we got nearly 99% accuracy on MNIST with just a simple CNN.
Some famous and successful CNNs to study are LeNet, AlexNet, and InceptionNet.

### What's next?

- Check out all of our tutorials available [on GitHub](https://github.com/deeplearning4j/deeplearning4j/tree/master/dl4j-examples/tutorials). Notebooks are numbered for easy following.