# Simple Multilayer-Perceptron for MNIST classification

Is there any framework out there in which it is easier to build and train a network 
than with Knet and NNHelferlein?

In [1]:
using Knet
using NNHelferlein
using MLDatasets: MNIST

┌ Info: Precompiling NNHelferlein [b9e938e5-d80d-48a2-bb0e-6649b4a98aeb]
└ @ Base loading.jl:1423


### Get MNIST data from MLDatasets:

The data is already scaled to pixel values between 0.0 and 1.0.    
The only modification necessary is to set the class label for the digit "0" to 10
(because Julia arrays are 1-based, there is no array index 0):
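
The relabelling idiom used in the next cell can be tried on a small vector first. This is a minimal sketch in plain Julia; the vector `labels` is a hypothetical stand-in for `ytrn`, not real MNIST data.

```julia
# Hypothetical toy labels standing in for ytrn (digits 0-9).
labels = [5, 0, 4, 1, 9, 0]

# The broadcasted comparison builds a Bool mask; `.=` assigns in place.
labels[labels .== 0] .= 10

# Every label is now a valid 1-based class index.
@assert all(1 .<= labels .<= 10)
println(labels)   # [5, 10, 4, 1, 9, 10]
```

After this step, class 10 stands for the digit "0", so predictions of class 10 must be read as "0" when reporting results.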

In [2]:
mnist_dir = joinpath(NNHelferlein.DATA_DIR, "mnist")
xtrn,ytrn = MNIST.traindata(Float32, dir=mnist_dir)
ytrn[ytrn.==0] .= 10
@show dtrn = minibatch(xtrn, ytrn, 128; xsize = (28*28,:))

xtst,ytst = MNIST.testdata(Float32, dir=mnist_dir)
ytst[ytst.==0] .= 10
@show dtst = minibatch(xtst, ytst, 128; xsize = (28*28,:));

dtrn = minibatch(xtrn, ytrn, 128; xsize = (28 * 28, :)) = 468-element Knet.Train20.Data{Tuple{KnetArray{Float32}, Array{Int64}}}
dtst = minibatch(xtst, ytst, 128; xsize = (28 * 28, :)) = 78-element Knet.Train20.Data{Tuple{KnetArray{Float32}, Array{Int64}}}


Each minibatch is a 2-tuple of a 784×128 matrix with the flattened pixel data and a 128-element vector with the teaching input, i.e. the labels in the range 1-10.    
If a functional GPU is detected, the pixel data is stored as a `KnetArray`; otherwise it is a normal `Array`. 
Computations with `KnetArray`s are performed on the GPU without any special handling in the calling code.
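
To make the shapes concrete, here is a rough sketch of the flattening and batching in plain Julia on the CPU. It is not how `minibatch` is implemented; the random data, the sample count of 300 and the variable names are assumptions for illustration, and incomplete trailing batches are simply dropped.

```julia
# Hypothetical stand-in for the MNIST tensor: 28×28 images, 300 samples.
x = rand(Float32, 28, 28, 300)
y = rand(1:10, 300)
batchsize = 128

# Flatten each image into a 784-element column, as xsize = (28*28, :) requests.
xflat = reshape(x, 28 * 28, :)

# Keep only full batches in this sketch; leftover samples are dropped.
nbatches = size(xflat, 2) ÷ batchsize
batches = [(xflat[:, (i-1)*batchsize+1 : i*batchsize],
            y[(i-1)*batchsize+1 : i*batchsize]) for i in 1:nbatches]

@assert size(first(batches)[1]) == (784, 128)
@assert length(first(batches)[2]) == 128

# The same integer division reproduces the lengths reported above:
@assert 60_000 ÷ 128 == 468   # training minibatches
@assert 10_000 ÷ 128 == 78    # test minibatches
```

The last two assertions explain the `468-element` and `78-element` iterators in the output of the previous cell: 60,000 and 10,000 samples do not divide evenly by 128, and the remainder is not used.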

Data looks like:

In [3]:
first(dtrn)[1]  # first minibatch:

784×128 Knet.KnetArrays.KnetMatrix{Float32}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

In [4]:
first(dtrn)[2]'  # labels of first minibatch:

1×128 adjoint(::Vector{Int64}) with eltype Int64:
 5  10  4  1  9  2  1  3  1  4  3  5  …  2  10  10  2  10  2  7  1  8  6  4

### Define the MLP with NNHelferlein types:

The wrapper type `Classifier` provides a signature with nll loss 
(negative log-likelihood; cross-entropy for single-label classification tasks). 
For correct calculation of the nll, raw activations of the output layer are 
needed (no activation function applied), hence `actf=identity` in the last layer:
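
Why raw activations are needed becomes clearer when the loss is written out. The following is a minimal CPU sketch of an nll computed from unnormalised scores via log-softmax; the function names `logsoftmax_cols` and `nll_sketch` are hypothetical and this is not the code used by Knet or NNHelferlein.

```julia
# Numerically stable log-softmax over the class dimension (rows).
function logsoftmax_cols(scores::AbstractMatrix)
    m = maximum(scores, dims=1)
    shifted = scores .- m
    return shifted .- log.(sum(exp.(shifted), dims=1))
end

# Mean negative log-likelihood; labels are 1-based class indices.
function nll_sketch(scores::AbstractMatrix, labels::AbstractVector{<:Integer})
    logp = logsoftmax_cols(scores)
    return -sum(logp[labels[i], i] for i in eachindex(labels)) / length(labels)
end

scores = Float32[2 0; 0 2; 0 0]   # 3 classes × 2 samples, raw activations
labels = [1, 2]
loss = nll_sketch(scores, labels)

# Each sample contributes log(exp(2) + 2) - 2.
@assert isapprox(loss, log(exp(2) + 2) - 2; atol=1e-5)
```

Because the softmax normalisation happens inside the loss, applying a squashing function such as `sigm` in the last layer would distort the scores before they are normalised.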

In [5]:
mlp = Classifier(Dense(784, 256),
                 Dense(256, 64), 
                 Dense(64, 10, actf=identity))

Classifier((Dense(P(Knet.KnetArrays.KnetMatrix{Float32}(256,784)), P(Knet.KnetArrays.KnetVector{Float32}(256)), Knet.Ops20.sigm), Dense(P(Knet.KnetArrays.KnetMatrix{Float32}(64,256)), P(Knet.KnetArrays.KnetVector{Float32}(64)), Knet.Ops20.sigm), Dense(P(Knet.KnetArrays.KnetMatrix{Float32}(10,64)), P(Knet.KnetArrays.KnetVector{Float32}(10)), identity)))

In [6]:
print_network(mlp)

Neural network summary:
Classifier with 3 layers,                                       218058 params
Details:
 
    Dense layer 784 → 256 with sigm,                            200960 params
    Dense layer 256 → 64 with sigm,                              16448 params
    Dense layer 64 → 10 with identity,                             650 params
 
Total number of layers: 3
Total number of parameters: 218058


3
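
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output unit. A short sketch of that arithmetic (the helper `dense_params` is a hypothetical name, not part of NNHelferlein):

```julia
# Weights (nout × nin) plus one bias per output unit.
dense_params(nin, nout) = nin * nout + nout

layers = [(784, 256), (256, 64), (64, 10)]
counts = [dense_params(nin, nout) for (nin, nout) in layers]

@assert counts == [200_960, 16_448, 650]
@assert sum(counts) == 218_058
```
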

### Train with TensorBoard logger:

This takes only a few seconds on a GPU. 

Training curves can be visualised with TensorBoard, by pointing TensorBoard to the
specified log-directory:

In [7]:
mlp = tb_train!(mlp, Adam, dtrn, epochs=10, split=0.8,
        acc_fun=accuracy,
        eval_size=0.2, eval_freq=5, mb_loss_freq=100, 
        tb_name="mlp_run", tb_text="NNHelferlein example: MLP")

println("Test loss:           $(mlp(dtst))")
println("Test accuracy:       $(accuracy(mlp, data=dtst))");

Splitting dataset for training (80%) and validation (20%).
Training 10 epochs with 374 minibatches/epoch and 94 validation mbs.
Evaluation is performed every 75 minibatches with 19 mbs.
Watch the progress with TensorBoard at:
/home/andreas/.julia/dev/NNHelferlein/examples/logs/mlp_run/2022-01-23T09-47-20


Progress: 100%|█████████████████████████████████████████| Time: 0:00:18


Training finished with:
Training loss:       0.05316837587179666
Training accuracy:   0.9854403409090909
Validation loss:     0.1047060098300906
Validation accuracy: 0.9670877659574468
Test loss:           0.088928714
Test accuracy:       0.9730568910256411
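
The minibatch counts in the training log follow from the keyword arguments. The sketch below reproduces them with simple rounding. The rounding rule is an assumption that happens to match the log, not a statement about how `tb_train!` computes these values internally.

```julia
n_mbs = 468                             # minibatches in dtrn

n_train = round(Int, n_mbs * 0.8)       # split = 0.8
n_valid = n_mbs - n_train

eval_every = round(Int, n_train / 5)    # eval_freq = 5 evaluations per epoch
eval_mbs = round(Int, n_valid * 0.2)    # eval_size = 0.2 of the validation data

@assert (n_train, n_valid) == (374, 94)
@assert (eval_every, eval_mbs) == (75, 19)
```
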
