# Simple MLP for heavily unbalanced MNIST classification

To test the training of an MLP on heavily unbalanced data, the MNIST dataset is restricted 
to the *ones* and a random 10% of the *twos* 
(i.e. roughly 90% of the training data are *one* and roughly 10% are *two*).

A first, standard MLP is trained on the unbalanced data and shows the expected result 
(i.e. only the majority class *one* is learned).     
Training the same MLP with the *focal_nll* loss instead of *nll* increases the influence of the minority class
and makes training on unbalanced data possible.
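
For reference, the focal loss as it is commonly defined scales the negative log-likelihood of each sample by a factor that shrinks as the prediction becomes more confident. In the sketch below, $p_t$ (the predicted probability of the true class) and $\gamma \ge 0$ (a focusing parameter) are notation introduced here for illustration; the exact form and default $\gamma$ used by `focal_nll` should be checked in the NNHelferlein documentation.

```latex
\mathrm{NLL}(p_t) = -\log p_t,
\qquad
\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \, \log p_t
```

For $\gamma = 0$ the two losses coincide. For $\gamma > 0$, samples that are already classified confidently (typically the many *ones*) contribute very little, so the few *twos* carry more weight in the gradient.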

In [1]:
using Knet
using NNHelferlein

### Get MNIST data from MLDatasets:
... and use only the `1`s and a random 10% of the `2`s for training:

In [2]:
xtrn, ytrn, xtst, ytst = dataset_mnist()

trn1 = ytrn .== 1
trn2 = ytrn .== 2
trn2 = trn2 .& (rand(length(trn2)) .< 0.10)   # keep a random ~10% of the twos
trn_mask = trn1 .| trn2

tst1 = ytst .== 1
tst2 = ytst .== 2
tst_mask = tst1 .| tst2

println("Training instances for 1: $(sum(trn1))")
println("Training instances for 2: $(sum(trn2))")

Training instances for 1: 6742
Training instances for 2: 611


In [3]:
dtrn = minibatch(xtrn[:,:,trn_mask], ytrn[trn_mask], 128; xsize=(28*28,:))
dtst = minibatch(xtst[:,:,tst_mask], ytst[tst_mask], 128; xsize=(28*28,:))

16-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{Int64}}}

## Define the MLP with NNHelferlein types and default loss (NLL):

In [4]:
mlp = Classifier(Dense(28*28, 512),
                Dense(512, 256), 
                Dense(256, 64), 
                Dense(64,10, actf=identity)
        )

Classifier(Any[Dense(P(CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(512,784)), P(CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}(512)), Knet.Ops20.sigm), Dense(P(CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(256,512)), P(CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}(256)), Knet.Ops20.sigm), Dense(P(CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(64,256)), P(CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}(64)), Knet.Ops20.sigm), Dense(P(CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(10,64)), P(CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}(10)), identity)], Knet.Ops20.nll)

### Train:

In [5]:
tb_train!(mlp, Adam, dtrn, epochs=2,
        acc_fun=accuracy,
        tb_name="nll_loss_example", tensorboard=false)

println("Test loss:           $(mlp(dtst))")
println("Test accuracy:       $(accuracy(mlp, data=dtst))");

Training 2 epochs with 57 minibatches/epoch.
Evaluation is performed every 57 minibatches with 12 mbs.
TensorBoard logs are disabled!


Progress: 100%|█████████████████████████████████████████| Time: 0:00:22


Training finished with:
Training loss:       0.29593414
Training accuracy:   0.9168037280701754
Test loss:           1.0457104
Test accuracy:       0.52294921875


A training accuracy above 90% does not look bad at first glance, but the test accuracy of about 52% is close to chance, and the confusion matrix for the training data shows why:    
the MLP learned only one of the classes.

In [6]:
confusion_matrix(mlp, data=dtrn);

     "1"   "2"  "pred/true"
 6689     0     "1"
  607     0     "2"

## Define the MLP with focal loss (focal NLL):

In [7]:
mlp = Classifier(Dense(28*28, 512),
                Dense(512, 256), 
                Dense(256, 64), 
                Dense(64,10, actf=identity),
                loss=focal_nll
        );
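
To get a feeling for what the focal term does, the following minimal sketch compares the per-sample loss values for a few predicted probabilities of the true class. The helpers `plain_nll` and `focal` and the value `gamma=2.0` are assumptions made for this illustration; they are not the NNHelferlein implementation of `focal_nll`.

```julia
# Hypothetical helpers for illustration only (not the NNHelferlein code).
# p is the probability the model assigns to the true class of one sample.
plain_nll(p) = -log(p)
focal(p; gamma=2.0) = (1 - p)^gamma * plain_nll(p)   # gamma=2.0 is an assumed value

# Compare both losses for a confident, an uncertain and a wrong prediction.
for p in (0.95, 0.6, 0.1)
    println("p = ", p,
            "   nll = ", round(plain_nll(p), digits=5),
            "   focal = ", round(focal(p), digits=5))
end
```

With this assumed `gamma`, a confident prediction (`p = 0.95`) keeps only a factor of `0.05^2` of its NLL value, whereas a badly misclassified sample (`p = 0.1`) keeps a factor of `0.9^2`. Easy majority-class samples therefore contribute far less to the total loss than hard minority-class samples.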

### Train:

In [8]:
tb_train!(mlp, Adam, dtrn, epochs=2,
        acc_fun=accuracy,
        tb_name="focal_nll_example", tensorboard=false)

println("Test loss:           $(mlp(dtst))")
println("Test accuracy:       $(accuracy(mlp, data=dtst))");

Training 2 epochs with 57 minibatches/epoch.
Evaluation is performed every 57 minibatches with 12 mbs.
TensorBoard logs are disabled!


Progress: 100%|█████████████████████████████████████████| Time: 0:00:07


Training finished with:
Training loss:       0.009121515759293873
Training accuracy:   0.9901315789473685
Test loss:           0.02482672786559348
Test accuracy:       0.97216796875


Now the confusion matrix reveals a balanced training of both classes:

In [9]:
confusion_matrix(mlp, data=dtrn);

     "1"     "2"  "pred/true"
 6641      48     "1"
   24     583     "2"
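
The difference between the two runs is easier to see in per-class recall than in overall accuracy. The sketch below recomputes both from the two confusion matrices printed in this notebook. It assumes that rows are true classes and columns are predicted classes (the row sums are identical in both matrices, which supports this reading); the helper names are made up for this example.

```julia
# Confusion matrices copied from the outputs above.
# Assumption: rows are true classes, columns are predicted classes.
cm_nll   = [6689   0;
             607   0]
cm_focal = [6641  48;
              24 583]

# Hypothetical helpers, written for this illustration only.
overall_accuracy(cm) = sum(cm[i, i] for i in axes(cm, 1)) / sum(cm)
class_recall(cm)     = [cm[i, i] / sum(cm[i, :]) for i in axes(cm, 1)]

for (name, cm) in (("nll", cm_nll), ("focal_nll", cm_focal))
    println(name, ": accuracy = ", round(overall_accuracy(cm), digits=4),
            ", recall per class = ", round.(class_recall(cm), digits=4))
end
```

Under that reading, recall for the *twos* rises from 0 with *nll* to about 0.96 with *focal_nll*, while overall training accuracy moves from about 0.917 to about 0.990.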