# Simple Multilayer-Perceptron for MNIST classification

Is there any framework out there in which it is easier to build and train a network 
than with Knet and NNHelferlein?

In [1]:
using Knet
using NNHelferlein

### Get MNIST data from MLDatasets:

The data is already scaled to pixel values between 0.0 and 1.0,
and the digit "0" is encoded as class label 10 (because Julia arrays are 1-indexed, there is no array index 0):

In [2]:
xtrn, ytrn, xtst, ytst = dataset_mnist()
@show dtrn = minibatch(xtrn, ytrn, 128; xsize = (28*28,:))
@show dtst = minibatch(xtst, ytst, 128; xsize = (28*28,:));

dtrn = minibatch(xtrn, ytrn, 128; xsize = (28 * 28, :)) = 468-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{Int64}}}
dtst = minibatch(xtst, ytst, 128; xsize = (28 * 28, :)) = 78-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{Int64}}}


Each minibatch is a 2-tuple of a 784×128 matrix with the flattened pixel data and a 128-element vector with the teaching input, i.e. the labels in the range 1-10.  
If a functional GPU is detected, the pixel data is stored as a `CuArray`; otherwise it is a normal `Array`. 
Computations with CuArrays are performed on the GPU without any special handling in the calling code.
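
The element counts printed above (468 and 78) can be checked by hand. The following is a minimal sketch, assuming the standard MNIST sizes of 60000 training and 10000 test images, and assuming that an incomplete last minibatch is dropped (which is consistent with the printed counts):

```julia
# Assumptions: standard MNIST split sizes; trailing partial minibatches are dropped.
n_train, n_test, batchsize = 60_000, 10_000, 128

n_mb_train = n_train ÷ batchsize   # integer division
n_mb_test  = n_test ÷ batchsize

println("training minibatches: ", n_mb_train)
println("test minibatches:     ", n_mb_test)
```

Under these assumptions, 96 training images and 16 test images do not fit into a full minibatch of 128.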

Data looks like:

In [3]:
first(dtrn)[1]  # pixel data of first minibatch:

784×128 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 ⋮

In [4]:
first(dtrn)[2]'  # labels of first minibatch:

1×128 adjoint(::Vector{Int64}) with eltype Int64:
 5  10  4  1  9  2  1  3  1  4  3  5  …  2  10  10  2  10  2  7  1  8  6  4
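
The label 10 in the row above stands for the digit 0. As a small sketch of this encoding, the two helper functions below (hypothetical names, not part of Knet or NNHelferlein) convert between digits and 1-based class labels, using the first five labels shown above as an example:

```julia
# Hypothetical helpers for illustration only; not part of Knet or NNHelferlein.
encode_label(digit::Integer) = digit == 0 ? 10 : digit   # digit 0 becomes class 10
decode_label(label::Integer) = label % 10                # class 10 becomes digit 0

first_labels = [5, 10, 4, 1, 9]             # taken from the output above
first_digits = decode_label.(first_labels)  # the digits these labels stand for

println(first_digits)
println(encode_label.(first_digits) == first_labels)
```

Keeping the labels in the range 1-10 means they can be used directly as row indices into the 10-row output of the network.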

### Define the MLP with NNHelferlein types:

The wrapper type `Classifier` provides a signature with nll loss 
(negative log-likelihood; cross-entropy for single-label classification tasks). 
For correct calculation of the nll, the raw activations of the output layer are 
needed (no activation function applied):

In [5]:
mlp = Classifier(Dense(784, 256),
                 Dense(256, 64), 
                 Dense(64, 10, actf=identity))

Classifier(Any[Dense(P(CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(256,784)), P(CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}(256)), Knet.Ops20.sigm), Dense(P(CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(64,256)), P(CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}(64)), Knet.Ops20.sigm), Dense(P(CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(10,64)), P(CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}(10)), identity)], Knet.Ops20.nll)
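
Why the output layer uses `actf=identity` becomes clearer when the loss is written out. The function below is an illustrative re-implementation, not the Knet source code: it takes raw output activations, normalises them column-wise with a log-softmax, and averages the negative log-probability of the correct class. The name `nll_sketch` and the example values are assumptions made for this sketch.

```julia
# Illustrative sketch only; `nll_sketch` is a hypothetical name, not the Knet implementation.
function nll_sketch(scores::AbstractMatrix, labels::AbstractVector{<:Integer})
    shifted = scores .- maximum(scores, dims=1)           # stabilise exp() numerically
    logp = shifted .- log.(sum(exp.(shifted), dims=1))    # column-wise log-softmax
    # mean negative log-probability of the correct class of each sample
    return -sum(logp[labels[i], i] for i in eachindex(labels)) / length(labels)
end

scores = Float32[2 0; 0 2; 0 0]          # 3 classes × 2 samples, raw activations
loss_right = nll_sketch(scores, [1, 2])  # labels match the largest scores
loss_wrong = nll_sketch(scores, [3, 3])  # labels point at small scores
println(loss_right < loss_wrong)
```

Because the normalisation is part of the loss, an additional activation function on the output layer (such as `sigm`) would squash the scores before the softmax and distort the loss.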

In [6]:
summary(mlp)

NNHelferlein neural network of type Classifier:
 
  Dense layer 784 → 256 with sigm,                              200960 params
  Dense layer 256 → 64 with sigm,                                16448 params
  Dense layer 64 → 10 with identity,                               650 params
 
Total number of layers: 3
Total number of parameters: 218058


3
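
The parameter counts in the summary can be reproduced by hand: each dense layer holds a weight matrix of size outputs × inputs plus one bias per output, as the array sizes in the printed `Classifier` show. A minimal sketch (the helper name `dense_params` is an assumption for illustration):

```julia
# Hypothetical helper: number of parameters of a fully connected layer.
dense_params(n_in, n_out) = n_in * n_out + n_out   # weight matrix plus bias vector

layers = [(784, 256), (256, 64), (64, 10)]
counts = [dense_params(n_in, n_out) for (n_in, n_out) in layers]

println(counts)
println(sum(counts))
```

The first layer accounts for more than 90% of all parameters, because it maps the 784 input pixels to 256 units.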

### Train with TensorBoard logger:

On a GPU, training for 100 epochs takes about two minutes (see the progress bar in the output). 

Training curves can be visualised with TensorBoard, by pointing TensorBoard to the
specified log-directory:

In [7]:
mlp = tb_train!(mlp, Adam, dtrn, epochs=100, split=0.8,
        acc_fun=accuracy,
        eval_size=0.2, eval_freq=5, mb_loss_freq=100, 
        tb_name="mlp_run", tb_text="NNHelferlein example: MLP")

println("Test loss:           $(mlp(dtst))")
println("Test accuracy:       $(accuracy(mlp, data=dtst))");

Splitting dataset for training (80%) and validation (20%).
Training 100 epochs with 374 minibatches/epoch and 94 validation mbs.
Evaluation is performed every 75 minibatches with 19 mbs.
Watch the progress with TensorBoard at:
/home/andreas/Documents/Projekte/2022-NNHelferlein_KnetML/NNHelferlein/examples/logs/mlp_run/2022-12-21T16-11-23


Progress: 100%|█████████████████████████████████████████| Time: 0:01:55


Training finished with:
Training loss:       6.223597e-6
Training accuracy:   1.0
Validation loss:     0.12577054
Validation accuracy: 0.9796376329787234
Test loss:           0.10543932
Test accuracy:       0.9814703525641025
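
The minibatch counts reported in the training log follow from the arguments `split=0.8`, `eval_freq=5` and `eval_size=0.2`. The sketch below reproduces them with simple rounding; the exact rounding used inside `tb_train!` is an assumption here, chosen because it matches the numbers in the log:

```julia
# Assumption: plain rounding; this matches the log, but is not taken from the tb_train! source.
n_mb       = 468                         # minibatches in dtrn
n_train    = round(Int, 0.8 * n_mb)      # split=0.8: minibatches used for training
n_valid    = n_mb - n_train              # remaining minibatches for validation
eval_every = round(Int, n_train / 5)     # eval_freq=5: evaluations per epoch
eval_mbs   = round(Int, 0.2 * n_valid)   # eval_size=0.2: validation minibatches per evaluation

println((n_train, n_valid, eval_every, eval_mbs))
```

Note that the training loss is several orders of magnitude smaller than the validation loss, while validation and test accuracy are close to each other at about 98%.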
