# Basic convolutional network for MNIST classification tutorial

## Libraries

First we will load some libraries. We will need **Flux** *(obviously)*, from which we import `DataLoader` for creating minibatches, our loss function and tools for one-hot encoding the labels. The MNIST dataset itself comes from **MLDatasets**. We will also use **Statistics** for calculating accuracy.

In [85]:
using Flux
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, crossentropy
using Statistics
using MLDatasets



## Loading and transforming dataset

Now we will import the MNIST dataset.

In [101]:
x_train, y_train = MNIST.traindata()
x_valid, y_valid = MNIST.testdata();

In [102]:
x_train = Flux.unsqueeze(x_train, 3)
x_valid = Flux.unsqueeze(x_valid, 3);

28×28×1×10000 reshape(reinterpret(N0f8, ::Array{UInt8,3}), 28, 28, 1, 10000) with eltype Normed{UInt8,8}:
[:, :, 1, 1] =
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0    …  0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0       0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0       0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0       0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0       0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0    …  0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.329     0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.725     0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.624     0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.592     0.0    0.0    0.0    0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.235  …  0.0    0.239  0.475  0.475  0.0
 ⋮

Then we will one-hot encode our training and test labels:

In [103]:
y_train = onehotbatch(y_train, 0:9)
y_valid = onehotbatch(y_valid, 0:9);

10×10000 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 0  0  0  1  0  0  0  0  0  0  1  0  0  …  0  0  0  0  0  1  0  0  0  0  0  0
 0  0  1  0  0  1  0  0  0  0  0  0  0     0  0  0  0  0  0  1  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  1  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  1  0  0  0
 0  0  0  0  1  0  1  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  1  0  0
 0  0  0  0  0  0  0  0  1  0  0  0  0  …  1  0  0  0  0  0  0  0  0  0  1  0
 0  0  0  0  0  0  0  0  0  0  0  1  0     0  1  0  0  0  0  0  0  0  0  0  1
 1  0  0  0  0  0  0  0  0  0  0  0  0     0  0  1  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  1  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  1  0  1  0  0  1     0  0  0  0  1  0  0  0  0  0  0  0
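
To make the encoding concrete, here is a minimal sketch of one-hot encoding and decoding written in plain Julia. The helpers `onehot_sketch` and `onecold_sketch` are hypothetical names introduced only for illustration; they are not Flux functions, and Flux's own `onehotbatch` returns a specialised matrix type rather than a `BitMatrix`.

```julia
# Hand-rolled one-hot helpers for illustration only (not part of Flux).
# Each column of the result marks the class of one label.
onehot_sketch(labels, classes) = collect(classes) .== permutedims(labels)

# Decoding: pick the class whose row holds the largest value in each column.
onecold_sketch(m, classes) = [classes[argmax(col)] for col in eachcol(m)]

labels = [5, 0, 4, 1]
encoded = onehot_sketch(labels, 0:9)    # 10×4 BitMatrix, one column per label
decoded = onecold_sketch(encoded, 0:9)  # recovers the original labels

println(size(encoded))
println(decoded == labels)
```

Each column contains exactly one `true`, placed in the row that corresponds to the digit, which is the same layout as the 10×10000 matrix printed above.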

And finally we will use DataLoader to manage minibatches. Here I've chosen a batchsize of 128 images.

In [23]:
train_data = DataLoader(x_train, y_train, batchsize=128);
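
As a rough picture of what batching does, the sketch below splits sample indices into chunks of 128 using only Base Julia. It assumes the 60000-image MNIST training set and assumes the final, smaller batch is kept; it illustrates the idea and is not how `DataLoader` is implemented.

```julia
# Illustration only: how 60000 sample indices split into batches of 128.
n_samples = 60_000
batchsize = 128

# Each element is a range of sample indices belonging to one minibatch.
batches = collect(Iterators.partition(1:n_samples, batchsize))

println(length(batches))         # number of minibatches per epoch
println(length(first(batches)))  # size of a full batch
println(length(last(batches)))   # size of the final, partial batch
```

Under these assumptions one epoch consists of 469 batches, the last of which holds the remaining 96 images.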


## Creating the model

Now it is time to create our model. We will use **Chain()** and put our chosen layers there. You can check the full list of available layers [here](https://fluxml.ai/Flux.jl/stable/models/layers/).

But first let's define the layers manually and see exactly what outputs they produce. The first 4 layers are no mystery. We start with a 5x5 convolution kernel that goes from 1 channel to 8. The next 3 layers are similar but use a 3x3 kernel. Then we apply a GlobalMeanPool. This layer averages each of our 32 feature maps in every sample, leaving a single value per map, so its output has size 1x1x32xN. We can't feed that into the dense layer yet because the dimensions won't match. That is why we define layer 55 *(5.5)*, which simply drops the first 2 singleton dimensions and produces a tensor of size 32xN. You can see it in the for loop.

In [141]:
layer1 = Conv((5, 5), 1=>8, pad=2, stride=2, relu) # Conv layer with a kernel of size 5x5, skipping every second pixel
layer2 = Conv((3, 3), 8=>16, pad=1, stride=2, relu)
layer3 = Conv((3, 3), 16=>32, pad=1, stride=2, relu)
layer4 = Conv((3, 3), 32=>32, pad=1, stride=2, relu)

layer5 = GlobalMeanPool() # Layer to take average from our 2x2 feature map

layer55 = x -> reshape(x, :, size(x, 4)) # Layer to drop the first 2 singleton dimensions

layer6 = Dense(32, 10) # Standard Linear layer
layer7 = softmax

layer8 = onecold; # Reverse onehot encoding and get a vector with predictions

onecold (generic function with 4 methods)
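
The pooling and reshape steps can be reproduced without Flux to see what happens to the shapes. This is a sketch that assumes global mean pooling on a 4-D array is equivalent to averaging over the first two dimensions; the random array `x` is a stand-in for the output of the last convolution, with only 5 samples to keep it small.

```julia
using Statistics

# Stand-in for the last convolution's output: 2×2 maps, 32 channels, 5 samples.
x = rand(Float32, 2, 2, 32, 5)

pooled = mean(x, dims=(1, 2))               # average each 2×2 map -> 1×1×32×5
flat = reshape(pooled, :, size(pooled, 4))  # drop the singleton dims -> 32×5

println(size(pooled))
println(size(flat))
```

The reshape does not change any values; it only removes the two leading dimensions of length 1 so that each column is the 32-element feature vector of one sample.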

In [142]:
layers = [layer1, layer2, layer3, layer4, layer5, layer55, layer6, layer7, layer8]
x = x_valid

for layer in layers
    println("Input size to layer: $layer = $(size(x))")
    x = layer(x)
    println("Output size from layer: $layer = $(size(x))\n")
end

Input size to layer: Conv((5, 5), 1=>8, relu) = (28, 28, 1, 10000)
Output size from layer: Conv((5, 5), 1=>8, relu) = (14, 14, 8, 10000)

Input size to layer: Conv((3, 3), 8=>16, relu) = (14, 14, 8, 10000)
Output size from layer: Conv((3, 3), 8=>16, relu) = (7, 7, 16, 10000)

Input size to layer: Conv((3, 3), 16=>32, relu) = (7, 7, 16, 10000)
Output size from layer: Conv((3, 3), 16=>32, relu) = (4, 4, 32, 10000)

Input size to layer: Conv((3, 3), 32=>32, relu) = (4, 4, 32, 10000)
Output size from layer: Conv((3, 3), 32=>32, relu) = (2, 2, 32, 10000)

Input size to layer: GlobalMeanPool() = (2, 2, 32, 10000)
Output size from layer: GlobalMeanPool() = (1, 1, 32, 10000)

Input size to layer: #15 = (1, 1, 32, 10000)
Output size from layer: #15 = (32, 10000)

Input size to layer: Dense(32, 10) = (32, 10000)
Output size from layer: Dense(32, 10) = (10, 10000)

Input size to layer: softmax = (10, 10000)
Output size from layer: softmax = (10, 10000)

Input size to layer: onecold = (10, 10000)
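
The spatial sizes printed above follow the standard output-size formula for a convolution without dilation, `floor((n + 2*pad - k) / stride) + 1`. The helper `conv_out` below is a hypothetical name used only to check that arithmetic against the four convolution layers; it is not a Flux function.

```julia
# Output length along one spatial axis for kernel size k, symmetric padding
# `pad` and stride `stride` (standard formula, assuming no dilation).
conv_out(n, k; pad=0, stride=1) = fld(n + 2 * pad - k, stride) + 1

# Kernel sizes of the four conv layers; padding is k ÷ 2 (2 for 5x5, 1 for 3x3).
sizes = [28]
for k in (5, 3, 3, 3)
    push!(sizes, conv_out(sizes[end], k; pad=k ÷ 2, stride=2))
end

println(sizes)
```

With a stride of 2 each layer roughly halves the width and height, which gives the 28, 14, 7, 4, 2 progression seen in the loop output.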


In [144]:
model = Chain(
    Conv((5, 5), 1=>8, pad=2, stride=2, relu), # 28x28 => 14x14
    Conv((3, 3), 8=>16, pad=1, stride=2, relu), # 14x14 => 7x7
    Conv((3, 3), 16=>32, pad=1, stride=2, relu), # 7x7 => 4x4
    Conv((3, 3), 32=>32, pad=1, stride=2, relu), # 4x4 => 2x2
    
    GlobalMeanPool(), # Average pooling on each width x height feature map

    x -> reshape(x, :, size(x, 4)),
    
    Dense(32, 10),
    softmax);

Chain(Conv((5, 5), 1=>8, relu), Conv((3, 3), 8=>16, relu), Conv((3, 3), 16=>32, relu), Conv((3, 3), 32=>32, relu), GlobalMeanPool(), #17, Dense(32, 10), softmax)

In [150]:
ŷ = model(x_train) # Getting predictions 
ŷ = onecold(ŷ) # Decoding predictions
println("Prediction of first image: $(ŷ[1])")

Prediction of first image: 6


We see that the model works, but we haven't done any training yet, so it most likely won't win any Kaggle competitions just yet.

In [None]:
"""
    accuracy(ŷ, y)
Calculate the accuracy of one-hot encoded predictions `ŷ` against one-hot encoded labels `y`. The last dimension is the samples axis.

# Example
```
accuracy(model(x_train), y_train)
```
"""
accuracy(ŷ, y) = mean(onecold(ŷ) .== onecold(y))
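
To check the logic of the accuracy calculation by hand, here is a small sketch on made-up numbers that needs only Statistics. The names `classes` and `accuracy_sketch` are hypothetical; a column-wise `argmax` stands in for `onecold`, so this mirrors the function above rather than replacing it.

```julia
using Statistics

# Made-up scores for 3 classes (rows) and 4 samples (columns).
ŷ = [0.7 0.1 0.2 0.3;
     0.2 0.8 0.3 0.3;
     0.1 0.1 0.5 0.4]

# One-hot labels for the true classes 1, 2, 3, 1.
y = [1 0 0 1;
     0 1 0 0;
     0 0 1 0]

# Column-wise argmax plays the role of `onecold` here.
classes(m) = [argmax(col) for col in eachcol(m)]
accuracy_sketch(ŷ, y) = mean(classes(ŷ) .== classes(y))

println(accuracy_sketch(ŷ, y))
```

The first three samples are predicted correctly and the fourth is not, so the result is 0.75.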