### Simple MNIST CNN using Flux
Julia version 1.1.0
Inspired by: https://github.com/FluxML/model-zoo/blob/master/vision/mnist/conv.jl

#### Load the MNIST images:

In [1]:
using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated, partition
using StatsBase: countmap

train_imgs = Flux.Data.MNIST.images();

### Preprocess the data
Add one-hot labels and build the dataset as a list of (image batch, label batch) tuples.

In [3]:
labels = onehotbatch(MNIST.labels(), 0:9)
num_train_images = size(train_imgs)[1]

# Partition the training set into mini-batches
batch_size = 32
train_images = [(cat(float.(train_imgs[i])..., dims = 4), labels[:,i]) for i in partition(1:num_train_images, batch_size)];

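The ellipsis (`...`) is called a 'splat'. In a function call it expands the elements of a collection (a tuple or an array) into separate arguments; in a function definition it collects any trailing arguments into a single tuple. Here it passes each image in a batch to `cat` as its own argument.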
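
As a rough illustration of how the batching line fits together, the sketch below uses only Base Julia. The names `add3`, `count_args`, `toy_imgs`, and `toy_batch_size` are made up for the example, and the toy matrices merely stand in for MNIST images.

```julia
using Base.Iterators: partition

# Splat in a call: the elements of `args` become separate positional arguments.
add3(a, b, c) = a + b + c
args = (1, 2, 3)
splat_sum = add3(args...)            # same as add3(1, 2, 3)

# Splat in a definition: trailing arguments are collected into a tuple.
count_args(xs...) = length(xs)
n_args = count_args(10, 20, 30, 40)

# Toy stand-ins for MNIST images: five 28×28 matrices (hypothetical data).
toy_imgs = [fill(Float64(k), 28, 28) for k in 1:5]

# partition splits the index range into chunks of at most `toy_batch_size`.
toy_batch_size = 2
index_chunks = collect(partition(1:length(toy_imgs), toy_batch_size))

# Indexing with a chunk selects several images; splatting hands them to `cat`
# one by one, and `dims = 4` stacks them along a fourth axis.
batches = [cat(toy_imgs[idx]..., dims = 4) for idx in index_chunks]

println(size.(batches))
```

Note that the last chunk can be shorter than the batch size, so the final batch may hold fewer images than the others.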

In [4]:
num_test_images = 1000
test_labels_categorical = MNIST.labels(:test)[1:num_test_images]

train_images = gpu.(train_images)
test_images = cat(float.(MNIST.images(:test)[1:num_test_images])..., dims = 4) |> gpu
test_labels = onehotbatch(test_labels_categorical, 0:9) |> gpu
;
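
To make the one-hot encoding concrete, here is a plain-Julia sketch of the kind of matrix `onehotbatch` represents: one column per label, with a single `true` in the row of the matching class. This is not Flux's implementation (Flux returns its own one-hot matrix type), and `toy_labels`, `toy_onehot`, and `decoded` are hypothetical names.

```julia
# Hypothetical labels, standing in for MNIST.labels()
toy_labels = [3, 0, 9, 3]
classes = 0:9

# Row r corresponds to classes[r]; column j corresponds to the j-th label.
toy_onehot = [label == class for class in classes, label in toy_labels]

# Decode again: find the hot row in each column and map it back to its class.
decoded = [classes[findfirst(toy_onehot[:, j])] for j in 1:size(toy_onehot, 2)]

println(size(toy_onehot))
println(decoded)
```

Because the classes start at 0 while Julia indices start at 1, label `3` lights up row 4, which is why the decoding step indexes back into `classes`.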

### Data Exploration

In [5]:
println("Number of images: ", num_train_images)
println("Image dimensions: ", size(train_imgs[1]))

Number of images: 60000
Image dimensions: (28, 28)


Show the class balance for the training set and for the 1,000-image test subset:

In [6]:
sort(countmap(MNIST.labels()))

OrderedCollections.OrderedDict{Int64,Int64} with 10 entries:
  0 => 5923
  1 => 6742
  2 => 5958
  3 => 6131
  4 => 5842
  5 => 5421
  6 => 5918
  7 => 6265
  8 => 5851
  9 => 5949

In [7]:
sort(countmap(test_labels_categorical))

OrderedCollections.OrderedDict{Int64,Int64} with 10 entries:
  0 => 85
  1 => 126
  2 => 116
  3 => 107
  4 => 110
  5 => 87
  6 => 87
  7 => 99
  8 => 89
  9 => 94

### Build the model
Use categorical cross-entropy loss, the ADAM optimizer, and classification accuracy as the evaluation metric.

In [8]:
model = Chain(
        Conv((3, 3), 1=>32, relu),
        Conv((3, 3), 32=>32, relu),
        x -> maxpool(x, (2,2)),
        Conv((3, 3), 32=>16, relu),
        x -> maxpool(x, (2,2)),
        Conv((3, 3), 16=>10, relu),
        x -> reshape(x, :, size(x, 4)),
        Dropout(0.2),
        Dense(90, 10),
        softmax) |> gpu

# Sanity check: run one batch through the model to confirm the layer shapes line up
first_batch = train_images[1][1]
model(first_batch)

loss(x, y) = crossentropy(model(x), y)
accuracy(x, y) = mean(onecold(model(x)) .== onecold(y))

evalcb = throttle(() -> @show(accuracy(test_images, test_labels)), 10)
optimizer = ADAM();
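
The input size of `Dense(90, 10)` follows from the shapes produced by the earlier layers. The arithmetic can be sketched as below, assuming the convolutions use no padding and a stride of 1, and that each pooling step uses a stride equal to its window; `conv_out` and `pool_out` are hypothetical helpers, not Flux functions.

```julia
# Output width of an unpadded, stride-1 convolution with a k×k kernel.
conv_out(n, k) = n - k + 1
# Output width of non-overlapping pooling with window p (any remainder is dropped).
pool_out(n, p) = div(n, p)

n = 28               # MNIST images are 28×28
n = conv_out(n, 3)   # 26
n = conv_out(n, 3)   # 24
n = pool_out(n, 2)   # 12
n = conv_out(n, 3)   # 10
n = pool_out(n, 2)   # 5
n = conv_out(n, 3)   # 3

final_channels = 10  # output channels of the last Conv layer
flattened = n * n * final_channels
println(flattened)   # 90
```

Changing a kernel size, adding padding, or removing a pooling step changes this number, so the `Dense` layer's input size would need to be updated to match.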

### Train the network

In [9]:
Flux.train!(loss, params(model), train_images, optimizer, cb = evalcb);

accuracy(test_images, test_labels) = 0.089
accuracy(test_images, test_labels) = 0.507
accuracy(test_images, test_labels) = 0.677
accuracy(test_images, test_labels) = 0.772
accuracy(test_images, test_labels) = 0.799
accuracy(test_images, test_labels) = 0.822
accuracy(test_images, test_labels) = 0.854
accuracy(test_images, test_labels) = 0.882
accuracy(test_images, test_labels) = 0.885
accuracy(test_images, test_labels) = 0.872
accuracy(test_images, test_labels) = 0.903
accuracy(test_images, test_labels) = 0.884
accuracy(test_images, test_labels) = 0.921
accuracy(test_images, test_labels) = 0.925
accuracy(test_images, test_labels) = 0.922
accuracy(test_images, test_labels) = 0.925
accuracy(test_images, test_labels) = 0.924
accuracy(test_images, test_labels) = 0.937
accuracy(test_images, test_labels) = 0.939
accuracy(test_images, test_labels) = 0.951
accuracy(test_images, test_labels) = 0.935
accuracy(test_images, test_labels) = 0.937
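
The accuracy values printed above, and the loss being minimized, can be illustrated on a tiny example without Flux. This is a sketch of the usual definitions (cross-entropy averaged over the batch columns, accuracy as the fraction of matching arg-max rows), not Flux's own code; `probs`, `targets`, `toy_crossentropy`, `hot_index`, and `toy_accuracy` are hypothetical names.

```julia
using Statistics: mean

# Hypothetical softmax outputs: 4 classes (rows) for 3 samples (columns).
probs = [0.7 0.1 0.2;
         0.1 0.6 0.5;
         0.1 0.2 0.2;
         0.1 0.1 0.1]

# One-hot targets: the true classes are rows 1, 2, and 3.
targets = Bool[1 0 0;
               0 1 0;
               0 0 1;
               0 0 0]

# Categorical cross-entropy, averaged over the samples in the batch.
toy_crossentropy(p, t) = -sum(t .* log.(p)) / size(t, 2)

# Index of the largest entry in each column (the role `onecold` plays above).
hot_index(m) = [argmax(m[:, j]) for j in 1:size(m, 2)]

# Fraction of samples whose predicted class matches the target class.
toy_accuracy(p, t) = mean(hot_index(p) .== hot_index(t))

println(toy_crossentropy(probs, targets))
println(toy_accuracy(probs, targets))
```

In this toy batch the third sample is misclassified (row 2 has the largest probability while the target is row 3), so two of the three predictions count as correct.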
