# MATH2504 Semester 2, 2022 -  Project 3

## Task 2 - Basic ML on MNIST and FashionMNIST

Student name : Brandon Lowe <br />
Student ID : 43162950 <br />
<p><a href="https://github.com/Jaanlo/Brandon-Lowe-2504-2022-PROJECT3">GitHub Repo</a></p>

In [1]:
using Pkg
Pkg.activate(".")

include("src/mnist-classification/dependencies.jl"); # dependencies for task 2

  Activating project at `~/Documents/University/MATH2504/project-3/Brandon-Lowe-2504-2022-PROJECT3`


### Task 2.1 One vs. all (rest) Linear and Logistic

#### Linear Regression Models

The simple linear regression model is given by

$$ \hat{y} = \alpha + \beta x $$

Extending this to the MNIST dataset, where each image has $ 28 \times 28 = 784 $ pixels, the model for the $ i $-th image becomes

$$ \hat{y}^{(i)} = \beta_0 + \sum_{j=1}^{784}\beta_j x_j^{(i)} $$

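We'd like to find a "good" estimate of the coefficient vector $ \theta = (\beta_0, \beta_1, \ldots, \beta_{784}) \in \mathbb{R}^{785} $ by minimizing the quadratic loss. We form an $ n \times (p+1) $ matrix $ A $, called a **design matrix**, with $ p = 784 $ features and a leading column of ones for the intercept, such that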

$$ A = \begin{bmatrix}
1 & x_1^{(1)} & x_2^{(1)} &  \ldots & x_p^{(1)} \\ 
1 & x_1^{(2)} & x_2^{(2)} &  \ldots & x_p^{(2)} \\
\vdots & \vdots & \vdots &  & \vdots \\
1 & x_1^{(n)} & x_2^{(n)} &  \ldots & x_p^{(n)} \end{bmatrix} $$

Using this notation, we can now express the linear model via

$$ \hat{y} = A\hat{\theta} $$

First, let's define some variables for ease of use later.

In [2]:
n_train, n_test = length(MNIST_train_labels), length(MNIST_test_labels);
train_labels, test_labels = MNIST_train_labels, MNIST_test_labels
test_imgs = MNIST_test_imgs

X = vcat([vec(MNIST_train_imgs[:,:,k])' for k in 1:n_train]...);

A = [ones(n_train) X];

We know that for linear regression our loss function can be represented as,

$$ L(\theta) = ||y - A\theta||^2 = (y - A\theta)^\top (y - A\theta) $$

with the gradient being,

$$ \nabla L(\theta) = -2A^\top y + 2A^\top A\theta $$

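Setting the gradient to zero gives the normal equations $ A^\top A\theta = A^\top y $. When the matrix $ A^\top A $ is invertible, these have the unique solution $ \hat{\theta} = (A^\top A)^{-1}A^\top y $. More generally (for example when some pixel columns are constant, so that $ A^\top A $ is singular), the Moore-Penrose pseudoinverse $ A^\dag $ gives the minimum-norm least-squares solution

$$ \hat{\theta} = A^\dag y $$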

In [3]:
MNIST_linear_acc = train_linear()
println("Accuracy of model for MNIST test images: ", MNIST_linear_acc)

Accuracy of model for MNIST test images: 0.8603
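
The project's `train_linear` lives in the source files and is not shown here. As a rough illustration of the one vs. rest least-squares idea, the sketch below fits one pseudoinverse model per class with $+1/-1$ targets and classifies by the largest score. The helper names `fit_one_vs_rest` and `predict`, the $\pm 1$ target coding, and the synthetic data are all assumptions made for this example, not the project's actual implementation.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not the project's `train_linear`): one least-squares
# model per class, with target +1 for the class and -1 for everything else.
function fit_one_vs_rest(A::AbstractMatrix, labels::AbstractVector{<:Integer}, classes)
    Adag = pinv(A)                                        # Moore-Penrose pseudoinverse
    targets(c) = [l == c ? 1.0 : -1.0 for l in labels]
    return hcat([Adag * targets(c) for c in classes]...)  # (p+1) × K coefficient matrix
end

# Classify a feature vector x by the class whose model gives the largest score.
predict(Θ, x, classes) = classes[argmax(Θ' * [1.0; x])]

# Synthetic stand-in data (assumption): 3 well-separated classes in 5 dimensions.
Random.seed!(0)
classes = 0:2
n, p = 300, 5
labels = rand(classes, n)
centers = 4 .* randn(length(classes), p)
X = vcat([(centers[l + 1, :] .+ 0.3 .* randn(p))' for l in labels]...)
A = [ones(n) X]                                           # design matrix with intercept column

Θ = fit_one_vs_rest(A, labels, classes)
acc = sum(predict(Θ, X[i, :], classes) == labels[i] for i in 1:n) / n
println("Training accuracy on synthetic data: ", acc)
```

Computing `pinv(A)` once and reusing it for every class avoids repeating the expensive factorisation for each of the one vs. rest models.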


Looking now at the Fashion MNIST dataset, we use a similar approach to above for the linear regression model

In [4]:
class_names = FashionMNIST.classnames()

10-element Vector{String}:
 "T-Shirt"
 "Trouser"
 "Pullover"
 "Dress"
 "Coat"
 "Sandal"
 "Shirt"
 "Sneaker"
 "Bag"
 "Ankle boot"

This dataset has 10 classes, just like the MNIST dataset. Moving ahead with the linear regression model:

In [5]:
n_train, n_test = length(fMNIST_train_labels), length(fMNIST_test_labels);
train_labels, test_labels = fMNIST_train_labels, fMNIST_test_labels
test_imgs = fMNIST_test_imgs

X = vcat([vec(fMNIST_train_imgs[:,:,k])' for k in 1:n_train]...);

A = [ones(n_train) X];

In [6]:
fMNIST_linear_acc = train_linear()
println("Accuracy of model for Fashion MNIST test images: ", fMNIST_linear_acc)

Accuracy of model for Fashion MNIST test images: 0.8113


#### Logistic Regression Models

Moving on to the logistic regression model. From <a href="https://deeplearningmath.org/">The Mathematical Engineering of Deep Learning</a> Chapter 3, we are given that the logistic regression model can be represented as, 

$$ \hat{y} = \sigma (b + w^\top x), \quad \textrm{where} \quad \sigma(z) =\frac{1}{1 + e^{-z}} $$

We aim to minimize the cross-entropy loss, which is given by

$$ L(w) = -\sum_{i=1}^{N}(y^{(i)}\log{\hat{y}^{(i)}} + (1- y^{(i)})\log{(1-\hat{y}^{(i)})}) $$

with gradient

$$ \nabla L(w) = X^\top (\hat{Y} - Y) $$

The training function uses gradient descent written specifically for logistic regression models. I chose to use the loss function from Flux.jl for simplicity, while the gradient term is programmed explicitly, as per the project details. Both can be found in `/mnist-classification/logistic-regression.jl`
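
To make the update rule concrete, here is a minimal sketch of full-batch gradient descent with the explicit gradient $X^\top(\hat{Y} - Y)$. It is not the project's `train_logistic`: the function name `logistic_gd`, the division by the number of samples (mean rather than summed loss), the default step size, and the synthetic data are all assumptions for illustration.

```julia
using Random

sigmoid(z) = 1 / (1 + exp(-z))

# Hypothetical sketch: full-batch gradient descent on the mean cross-entropy
# loss, using the explicit gradient A' * (ŷ - y) / n.
function logistic_gd(A, y; η=0.1, n_epochs=200)
    n = size(A, 1)
    w = zeros(size(A, 2))              # weights, with the bias as the first entry
    for _ in 1:n_epochs
        ŷ = sigmoid.(A * w)            # predicted probabilities for all samples
        w -= η * (A' * (ŷ - y)) / n    # gradient step
    end
    return w
end

# Synthetic stand-in data (assumption): label is 1 when x₁ + x₂ > 0.
Random.seed!(1)
n = 200
X = randn(n, 2)
y = Float64.(X[:, 1] .+ X[:, 2] .> 0)
A = [ones(n) X]                        # design matrix with intercept column

w = logistic_gd(A, y)
acc = sum((sigmoid.(A * w) .> 0.5) .== (y .== 1)) / n
println("Training accuracy on synthetic data: ", acc)
```

For one vs. rest on ten classes, the same loop would be run once per class with 0/1 labels indicating membership of that class.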

Initialising the data for MNIST dataset to train model

In [7]:
n_train, n_test = length(MNIST_train_labels), length(MNIST_test_labels);

X = vcat([vec(MNIST_train_imgs[:,:,k])' for k in 1:n_train]...);
X_test = vcat([vec(MNIST_test_imgs[:,:,k])' for k in 1:n_test]...);
A = [ones(n_train) X];

train_labels = MNIST_train_labels;

Calling the train model function,

In [8]:
w = train_logistic(η=0.001);

Epoch = 1 (0.66 sec) Loss = 0.2085357315476932


Epoch = 2 (0.24 sec) Loss = 0.15744168722529453


Epoch = 3 (0.29 sec) Loss = 0.1366063471037671


Epoch = 4 (0.32 sec) Loss = 0.12440842067604052


Epoch = 5 (0.25 sec) Loss = 0.11610293517223355


Epoch = 6 (0.25 sec) Loss = 0.10997874547508622


Epoch = 7 (0.27 sec) Loss = 0.10523106867424734


Epoch = 8 (0.26 sec) Loss = 0.10141561082469965


Epoch = 9 (0.26 sec) Loss = 0.09826605185033173


Epoch = 10 (0.26 sec) Loss = 0.09561256862291026


Epoch = 11 (0.26 sec) Loss = 0.09334067468328755


Epoch = 12 (0.26 sec) Loss = 0.09136972199085368


Epoch = 13 (0.27 sec) Loss = 0.08964089817142187


Epoch = 14 (0.29 sec) Loss = 0.08811012253331892


Epoch = 15 (0.26 sec) Loss = 0.08674361141088945


Epoch = 16 (0.25 sec) Loss = 0.08551497880329456


Epoch = 17 (0.27 sec) Loss = 0.08440327734954925


Epoch = 18 (0.27 sec) Loss = 0.08339164331231777


Epoch = 19 (0.29 sec) Loss = 0.08246633905373343


Epoch = 20 (0.24 sec) Loss = 0.0816160606868035


Testing accuracy of model

In [9]:
T = [ones(n_test) X_test]
MNIST_logistic_acc = mean([logistic_classify(T'[:, k], w) for k in 1:n_test] .== MNIST_test_labels)

0.9035

Initialising the data for the Fashion MNIST dataset to train the model

In [10]:
n_train, n_test = length(fMNIST_train_labels), length(fMNIST_test_labels);

X = vcat([vec(fMNIST_train_imgs[:,:,k])' for k in 1:n_train]...);
X_test = vcat([vec(fMNIST_test_imgs[:,:,k])' for k in 1:n_test]...);
A = [ones(n_train) X]

train_labels = fMNIST_train_labels;

In [11]:
w = train_logistic(η=0.00037, n_epochs=100);

Epoch = 1 (0.27 sec) Loss = 0.3481510752147025


Epoch = 2 (0.3 sec) Loss = 0.274571224165549


Epoch = 3 (0.31 sec) Loss = 0.2415321441444385


Epoch = 4 (0.24 sec) Loss = 0.22163432203031005


Epoch = 5 (0.25 sec) Loss = 0.20764577476459456


Epoch = 6 (0.27 sec) Loss = 0.19688177367035947


Epoch = 7 (0.28 sec) Loss = 0.18821392803484874


Epoch = 8 (0.25 sec) Loss = 0.18104585754514801


Epoch = 9 (0.25 sec) Loss = 0.17500293269528266


Epoch = 10 (0.4 sec) Loss = 0.16983014505449692


Epoch = 11 (0.32 sec) Loss = 0.16534788601275996


Epoch = 12 (0.32 sec) Loss = 0.1614246712620103


Epoch = 13 (0.32 sec) Loss = 0.15796040843200762


Epoch = 14 (0.5 sec) Loss = 0.1548770105100023


Epoch = 15 (0.24 sec) Loss = 0.15211280471507488


Epoch = 16 (0.25 sec) Loss = 0.14961864643653713


Epoch = 17 (0.25 sec) Loss = 0.14735498443325878


Epoch = 18 (0.24 sec) Loss = 0.1452896447047561


Epoch = 19 (0.25 sec) Loss = 0.1433961821916812


Epoch = 20 (0.26 sec) Loss = 0.14165265622529172


Epoch = 21 (0.24 sec) Loss = 0.1400407112982488


Epoch = 22 (0.25 sec) Loss = 0.13854487726161907


Epoch = 23 (0.25 sec) Loss = 0.13715202947073515


Epoch = 24 (0.24 sec) Loss = 0.1358509676717609


Epoch = 25 (0.25 sec) Loss = 0.13463208458502762


Epoch = 26 (0.24 sec) Loss = 0.13348710318962217


Epoch = 27 (0.25 sec) Loss = 0.132408867201362


Epoch = 28 (0.25 sec) Loss = 0.13139117309365117


Epoch = 29 (0.24 sec) Loss = 0.13042863480554384


Epoch = 30 (0.24 sec) Loss = 0.1295165742817189


Epoch = 31 (0.24 sec) Loss = 0.12865093234449068


Epoch = 32 (0.25 sec) Loss = 0.1278281951846645


Epoch = 33 (0.24 sec) Loss = 0.12704533206246224


Epoch = 34 (0.24 sec) Loss = 0.12629973968526295


Epoch = 35 (0.25 sec) Loss = 0.1255891880148511


Epoch = 36 (0.24 sec) Loss = 0.12491176007430617


Epoch = 37 (0.24 sec) Loss = 0.12426577210490879


Epoch = 38 (0.25 sec) Loss = 0.12364964575226937


Epoch = 39 (0.24 sec) Loss = 0.12306168395040988


Epoch = 40 (0.25 sec) Loss = 0.12249972425536937


Epoch = 41 (0.25 sec) Loss = 0.12196082048281731


Epoch = 42 (0.24 sec) Loss = 0.1214413796243753


Epoch = 43 (0.25 sec) Loss = 0.12093794603006781


Epoch = 44 (0.24 sec) Loss = 0.12044779211850862


Epoch = 45 (0.24 sec) Loss = 0.11996793042017302


Epoch = 46 (0.25 sec) Loss = 0.11949215647072985


Epoch = 47 (0.24 sec) Loss = 0.11901195397646666


Epoch = 48 (0.24 sec) Loss = 0.11852965496825632


Epoch = 49 (0.25 sec) Loss = 0.11805456033426294


Epoch = 50 (0.25 sec) Loss = 0.11759065813718819


Epoch = 51 (0.24 sec) Loss = 0.11713910050950714


Epoch = 52 (0.25 sec) Loss = 0.11670052778950277


Epoch = 53 (0.25 sec) Loss = 0.11627587227130934


Epoch = 54 (0.24 sec) Loss = 0.11586589608895748


Epoch = 55 (0.27 sec) Loss = 0.11546935140372724


Epoch = 56 (0.24 sec) Loss = 0.11508172960858487


Epoch = 57 (0.24 sec) Loss = 0.11469880087156166


Epoch = 58 (0.25 sec) Loss = 0.11432016174016185


Epoch = 59 (0.25 sec) Loss = 0.11394735507620656


Epoch = 60 (0.24 sec) Loss = 0.11358167499539193


Epoch = 61 (0.25 sec) Loss = 0.11322382650509244


Epoch = 62 (0.24 sec) Loss = 0.11287412815300503


Epoch = 63 (0.25 sec) Loss = 0.11253267365941201


Epoch = 64 (0.24 sec) Loss = 0.11219941972594405


Epoch = 65 (0.25 sec) Loss = 0.11187423434538218


Epoch = 66 (0.24 sec) Loss = 0.11155692649666496


Epoch = 67 (0.24 sec) Loss = 0.11124726725301483


Epoch = 68 (0.25 sec) Loss = 0.11094500651259753


Epoch = 69 (0.25 sec) Loss = 0.11064988647661492


Epoch = 70 (0.24 sec) Loss = 0.11036165173310175


Epoch = 71 (0.25 sec) Loss = 0.11008005577463574


Epoch = 72 (0.25 sec) Loss = 0.10980486429875284


Epoch = 73 (0.25 sec) Loss = 0.10953585610786501


Epoch = 74 (0.25 sec) Loss = 0.10927282256239598


Epoch = 75 (0.25 sec) Loss = 0.10901556638470626


Epoch = 76 (0.25 sec) Loss = 0.10876390033381553


Epoch = 77 (0.24 sec) Loss = 0.1085176460147921


Epoch = 78 (0.24 sec) Loss = 0.10827663291245584


Epoch = 79 (0.24 sec) Loss = 0.1080406976461151


Epoch = 80 (0.25 sec) Loss = 0.10780968340505706


Epoch = 81 (0.25 sec) Loss = 0.1075834395180066


Epoch = 82 (0.24 sec) Loss = 0.10736182111609203


Epoch = 83 (0.25 sec) Loss = 0.10714468885868908


Epoch = 84 (0.25 sec) Loss = 0.10693190870055094


Epoch = 85 (0.25 sec) Loss = 0.10672335168574898


Epoch = 86 (0.25 sec) Loss = 0.1065188937590126


Epoch = 87 (0.24 sec) Loss = 0.10631841558854013


Epoch = 88 (0.24 sec) Loss = 0.10612180239664284


Epoch = 89 (0.24 sec) Loss = 0.10592894379607257


Epoch = 90 (0.24 sec) Loss = 0.10573973363083086


Epoch = 91 (0.24 sec) Loss = 0.10555406982083058


Epoch = 92 (0.24 sec) Loss = 0.10537185421012174


Epoch = 93 (0.25 sec) Loss = 0.10519299241859319


Epoch = 94 (0.25 sec) Loss = 0.10501739369712712


Epoch = 95 (0.24 sec) Loss = 0.10484497078620779


Epoch = 96 (0.24 sec) Loss = 0.10467563977796902


Epoch = 97 (0.24 sec) Loss = 0.10450931998161149


Epoch = 98 (0.25 sec) Loss = 0.10434593379207079


Epoch = 99 (0.24 sec) Loss = 0.1041854065617507


Epoch = 100 (0.24 sec) Loss = 0.10402766647508396


Testing the accuracy,

In [12]:
T = [ones(n_test) X_test]
fMNIST_logistic_acc = mean([logistic_classify(T'[:, k], w) for k in 1:n_test] .== fMNIST_test_labels)

0.8139

#### Final Results

So overall, we have the following accuracy results for linear regression and logistic regression for MNIST and Fashion MNIST datasets, respectively

In [13]:
println("Accuracy for MNIST linear regression model: $MNIST_linear_acc")
println("Accuracy for Fashion MNIST linear regression model: $fMNIST_linear_acc")
println("Accuracy for MNIST logistic regression model: $MNIST_logistic_acc")
println("Accuracy for Fashion MNIST logistic regression model: $fMNIST_logistic_acc")

Accuracy for MNIST linear regression model: 0.8603
Accuracy for Fashion MNIST linear regression model: 0.8113
Accuracy for MNIST logistic regression model: 0.9035
Accuracy for Fashion MNIST logistic regression model: 0.8139


### Task 2.2 One vs. One Linear and Logistic

Task 2.2 is unfinished in its current state. So far I have only experimented with one of the 45 pairwise models (one per pair of the 10 classes), namely 0 vs. 1. It works (at least I think so) for that one model, but I have yet to figure out a clean (and not RAM-hungry) method of building all 45 models. Every time I try, my IDE crashes -_-
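
One possible way to keep memory use down, sketched here under assumptions rather than taken from the project code, is to build each pair's training data on demand inside a loop instead of holding all 45 stacked matrices at once. The helper `pair_data`, the random stand-in images, and the placeholder least-squares fit are hypothetical; in the notebook the fit would be replaced by the actual training routine.

```julia
using LinearAlgebra, Random

# Hypothetical helper: build the feature matrix and 0/1 labels for one pair (a, b)
# on demand, so only one pair's data is alive at a time.
function pair_data(imgs::AbstractArray{<:Real,3}, labels, a, b)
    idx = findall(l -> l == a || l == b, labels)
    X = reshape(imgs[:, :, idx], :, length(idx))'   # one flattened image per row
    y = Float64.(labels[idx] .== b)                 # class a -> 0, class b -> 1
    return X, y
end

# Synthetic stand-in for the MNIST arrays (assumption: 28×28×n images, labels 0-9).
Random.seed!(0)
imgs = rand(28, 28, 500)
labels = rand(0:9, 500)

# All unordered pairs of distinct classes.
class_pairs = [(a, b) for a in 0:9 for b in (a + 1):9]
println(length(class_pairs))

models = Dict{Tuple{Int,Int},Vector{Float64}}()
for (a, b) in class_pairs
    X, y = pair_data(imgs, labels, a, b)
    # Placeholder fit (least squares); swap in the training routine of choice.
    models[(a, b)] = [ones(size(X, 1)) X] \ y
end
```

Only the weight vectors are stored in `models`; the per-pair matrices go out of scope after each iteration and can be garbage collected.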

#### MNIST Dataset

In [15]:
# Initialising img training sets
zero_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 0];
one_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 1];
# two_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 2]
# three_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 3]
# four_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 4]
# five_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 5]
# six_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 6]
# seven_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 7]
# eight_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 8]
# nine_imgs = MNIST_train_imgs[:,:,MNIST_train_labels .== 9];

28×28×6742 Array{Float64, 3}:
[:, :, 1] =
 0.0  0.0  0.0  0.0  0.0  0.0       …  0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0          0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0          0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0          0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0          0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0       …  0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0          0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0          0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0          0.25098   0.0941176  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0          0.984314  0.756863   0.0  0.0  0.0
 ⋮                        ⋮         ⋱                       ⋮         
 0.0  0.0  0.0  0.0  0.0  0.992157     0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  1.0       …  0.0       0.0        0.0  0.0  0.0

In [16]:
# Initialising img training set counts
n_zero_train = last(size(zero_imgs))
n_one_train = last(size(one_imgs));
# n_two_train = last(size(two_imgs))
# n_three_train = last(size(three_imgs))
# n_four_train = last(size(four_imgs))
# n_five_train = last(size(five_imgs))
# n_six_train = last(size(six_imgs))
# n_seven_train = last(size(seven_imgs))
# n_eight_train = last(size(eight_imgs))
# n_nine_train = last(size(nine_imgs));

6742

In [18]:
# Initialising img as vectors
zero_imgs_as_vectors = vcat([vec(zero_imgs[:,:,k])' for k in 1:n_zero_train]...);
one_imgs_as_vectors = vcat([vec(one_imgs[:,:,k])' for k in 1:n_one_train]...);
# two_imgs_as_vectors = vcat([vec(two_imgs[:,:,k])' for k in 1:n_two_train]...)
# three_imgs_as_vectors = vcat([vec(three_imgs[:,:,k])' for k in 1:n_three_train]...)
# four_imgs_as_vectors = vcat([vec(four_imgs[:,:,k])' for k in 1:n_four_train]...)
# five_imgs_as_vectors = vcat([vec(five_imgs[:,:,k])' for k in 1:n_five_train]...)
# six_imgs_as_vectors = vcat([vec(six_imgs[:,:,k])' for k in 1:n_six_train]...)
# seven_imgs_as_vectors = vcat([vec(seven_imgs[:,:,k])' for k in 1:n_seven_train]...)
# eight_imgs_as_vectors = vcat([vec(eight_imgs[:,:,k])' for k in 1:n_eight_train]...)
# nine_imgs_as_vectors = vcat([vec(nine_imgs[:,:,k])' for k in 1:n_nine_train]...);

6742×784 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 ⋮                        ⋮              ⋱                 ⋮              
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  

Next, we stack the image vectors of each pair of classes to form the training data for the corresponding one vs. one model.

In [19]:
# vcat of One vs One vectors -- zero && others
train_data_zero_one_class = vcat(zero_imgs_as_vectors, one_imgs_as_vectors);
# train_data_zero_two_class = vcat(zero_imgs_as_vectors, two_imgs_as_vectors)
# train_data_zero_three_class = vcat(zero_imgs_as_vectors, three_imgs_as_vectors)
# train_data_zero_four_class = vcat(zero_imgs_as_vectors, four_imgs_as_vectors)
# train_data_zero_five_class = vcat(zero_imgs_as_vectors, five_imgs_as_vectors)
# train_data_zero_six_class = vcat(zero_imgs_as_vectors, six_imgs_as_vectors)
# train_data_zero_seven_class = vcat(zero_imgs_as_vectors, seven_imgs_as_vectors)
# train_data_zero_eight_class = vcat(zero_imgs_as_vectors, eight_imgs_as_vectors)
# train_data_zero_nine_class = vcat(zero_imgs_as_vectors, nine_imgs_as_vectors)

# # vcat of One vs One vectors -- one && others
# train_data_one_two_class = vcat(one_imgs_as_vectors, two_imgs_as_vectors)
# train_data_one_three_class = vcat(one_imgs_as_vectors, three_imgs_as_vectors)
# train_data_one_four_class = vcat(one_imgs_as_vectors, four_imgs_as_vectors)
# train_data_one_five_class = vcat(one_imgs_as_vectors, five_imgs_as_vectors)
# train_data_one_six_class = vcat(one_imgs_as_vectors, six_imgs_as_vectors)
# train_data_one_seven_class = vcat(one_imgs_as_vectors, seven_imgs_as_vectors)
# train_data_one_eight_class = vcat(one_imgs_as_vectors, eight_imgs_as_vectors)
# train_data_one_nine_class = vcat(one_imgs_as_vectors, nine_imgs_as_vectors)

# # vcat of One vs One vectors -- two && others
# train_data_two_three_class = vcat(two_imgs_as_vectors, three_imgs_as_vectors)
# train_data_two_four_class = vcat(two_imgs_as_vectors, four_imgs_as_vectors)
# train_data_two_five_class = vcat(two_imgs_as_vectors, five_imgs_as_vectors)
# train_data_two_six_class = vcat(two_imgs_as_vectors, six_imgs_as_vectors)
# train_data_two_seven_class = vcat(two_imgs_as_vectors, seven_imgs_as_vectors)
# train_data_two_eight_class = vcat(two_imgs_as_vectors, eight_imgs_as_vectors)
# train_data_two_nine_class = vcat(two_imgs_as_vectors, nine_imgs_as_vectors)

# # vcat of One vs One vectors -- three && others
# train_data_three_four_class = vcat(three_imgs_as_vectors, four_imgs_as_vectors)
# train_data_three_five_class = vcat(three_imgs_as_vectors, five_imgs_as_vectors)
# train_data_three_six_class = vcat(three_imgs_as_vectors, six_imgs_as_vectors)
# train_data_three_seven_class = vcat(three_imgs_as_vectors, seven_imgs_as_vectors)
# train_data_three_eight_class = vcat(three_imgs_as_vectors, eight_imgs_as_vectors)
# train_data_three_nine_class = vcat(three_imgs_as_vectors, nine_imgs_as_vectors)

# # vcat of One vs One vectors -- four && others
# train_data_four_five_class = vcat(four_imgs_as_vectors, five_imgs_as_vectors)
# train_data_four_six_class = vcat(four_imgs_as_vectors, six_imgs_as_vectors)
# train_data_four_seven_class = vcat(four_imgs_as_vectors, seven_imgs_as_vectors)
# train_data_four_eight_class = vcat(four_imgs_as_vectors, eight_imgs_as_vectors)
# train_data_four_nine_class = vcat(four_imgs_as_vectors, nine_imgs_as_vectors)

# # vcat of One vs One vectors -- five && others
# train_data_five_six_class = vcat(five_imgs_as_vectors, six_imgs_as_vectors)
# train_data_five_seven_class = vcat(five_imgs_as_vectors, seven_imgs_as_vectors)
# train_data_five_eight_class = vcat(five_imgs_as_vectors, eight_imgs_as_vectors)
# train_data_five_nine_class = vcat(five_imgs_as_vectors, nine_imgs_as_vectors)

# # vcat of One vs One vectors -- six && others
# train_data_six_seven_class = vcat(six_imgs_as_vectors, seven_imgs_as_vectors)
# train_data_six_eight_class = vcat(six_imgs_as_vectors, eight_imgs_as_vectors)
# train_data_six_nine_class = vcat(six_imgs_as_vectors, nine_imgs_as_vectors)

# # vcat of One vs One vectors -- seven && others
# train_data_seven_eight_class = vcat(seven_imgs_as_vectors, eight_imgs_as_vectors)
# train_data_seven_nine_class = vcat(seven_imgs_as_vectors, nine_imgs_as_vectors)

# # vcat of One vs One vectors -- eight && others
# train_data_eight_nine_class = vcat(eight_imgs_as_vectors, nine_imgs_as_vectors);

12665×784 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 ⋮                        ⋮              ⋱                 ⋮              
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 

We also need labels for the paired datasets formed above. The labels are just 0s and 1s: 0 for the first class of the pair and 1 for the second.

In [20]:
# labels of One vs One vectors -- zero && others
train_labels_zero_one_class = vcat(zeros(n_zero_train), ones(n_one_train));
# train_labels_zero_two_class = vcat(zeros(n_zero_train), ones(n_two_train))
# train_labels_zero_three_class = vcat(zeros(n_zero_train), ones(n_three_train))
# train_labels_zero_four_class = vcat(zeros(n_zero_train), ones(n_four_train))
# train_labels_zero_five_class = vcat(zeros(n_zero_train), ones(n_five_train))
# train_labels_zero_six_class = vcat(zeros(n_zero_train), ones(n_six_train))
# train_labels_zero_seven_class = vcat(zeros(n_zero_train), ones(n_seven_train))
# train_labels_zero_eight_class = vcat(zeros(n_zero_train), ones(n_eight_train))
# train_labels_zero_nine_class = vcat(zeros(n_zero_train), ones(n_nine_train))

# # labels of One vs One vectors -- one && others
# train_labels_one_two_class = vcat(zeros(n_one_train), ones(n_two_train))
# train_labels_one_three_class = vcat(zeros(n_one_train), ones(n_three_train))
# train_labels_one_four_class = vcat(zeros(n_one_train), ones(n_four_train))
# train_labels_one_five_class = vcat(zeros(n_one_train), ones(n_five_train))
# train_labels_one_six_class = vcat(zeros(n_one_train), ones(n_six_train))
# train_labels_one_seven_class = vcat(zeros(n_one_train), ones(n_seven_train))
# train_labels_one_eight_class = vcat(zeros(n_one_train), ones(n_eight_train))
# train_labels_one_nine_class = vcat(zeros(n_one_train), ones(n_nine_train))

# # labels of One vs One vectors -- two && others
# train_labels_two_three_class = vcat(zeros(n_two_train), ones(n_three_train))
# train_labels_two_four_class = vcat(zeros(n_two_train), ones(n_four_train))
# train_labels_two_five_class = vcat(zeros(n_two_train), ones(n_five_train))
# train_labels_two_six_class = vcat(zeros(n_two_train), ones(n_six_train))
# train_labels_two_seven_class = vcat(zeros(n_two_train), ones(n_seven_train))
# train_labels_two_eight_class = vcat(zeros(n_two_train), ones(n_eight_train))
# train_labels_two_nine_class = vcat(zeros(n_two_train), ones(n_nine_train))

# # labels of One vs One vectors -- three && others
# train_labels_three_four_class = vcat(zeros(n_three_train), ones(n_four_train))
# train_labels_three_five_class = vcat(zeros(n_three_train), ones(n_five_train))
# train_labels_three_six_class = vcat(zeros(n_three_train), ones(n_six_train))
# train_labels_three_seven_class = vcat(zeros(n_three_train), ones(n_seven_train))
# train_labels_three_eight_class = vcat(zeros(n_three_train), ones(n_eight_train))
# train_labels_three_nine_class = vcat(zeros(n_three_train), ones(n_nine_train))

# # labels of One vs One vectors -- four && others
# train_labels_four_five_class = vcat(zeros(n_four_train), ones(n_five_train))
# train_labels_four_six_class = vcat(zeros(n_four_train), ones(n_six_train))
# train_labels_four_seven_class = vcat(zeros(n_four_train), ones(n_seven_train))
# train_labels_four_eight_class = vcat(zeros(n_four_train), ones(n_eight_train))
# train_labels_four_nine_class = vcat(zeros(n_four_train), ones(n_nine_train))

# # labels of One vs One vectors -- five && others
# train_labels_five_six_class = vcat(zeros(n_five_train), ones(n_six_train))
# train_labels_five_seven_class = vcat(zeros(n_five_train), ones(n_seven_train))
# train_labels_five_eight_class = vcat(zeros(n_five_train), ones(n_eight_train))
# train_labels_five_nine_class = vcat(zeros(n_five_train), ones(n_nine_train))

# # labels of One vs One vectors -- six && others
# train_labels_six_seven_class = vcat(zeros(n_six_train), ones(n_seven_train))
# train_labels_six_eight_class = vcat(zeros(n_six_train), ones(n_eight_train))
# train_labels_six_nine_class = vcat(zeros(n_six_train), ones(n_nine_train))

# # labels of One vs One vectors -- seven && others
# train_labels_seven_eight_class = vcat(zeros(n_seven_train), ones(n_eight_train))
# train_labels_seven_nine_class = vcat(zeros(n_seven_train), ones(n_nine_train))

# # labels of One vs One vectors -- eight && others
# train_labels_eight_nine_class = vcat(zeros(n_eight_train), ones(n_nine_train));

12665-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 ⋮
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0

#### Logistic Regression

In [21]:
w = ovo_train_logistic(train_data_zero_one_class, train_labels_zero_one_class)

first_zero_vector = zero_imgs_as_vectors[1,:]
first_one_vector = one_imgs_as_vectors[1,:]
logistic_predict(first_one_vector, w') # sanity check on a single training image of a one

Epoch = 1 (0.6 sec) Loss = 16.856419978187223


Epoch = 2 (0.53 sec) Loss = 0.2889672616554


Epoch = 3 (0.53 sec) Loss = 0.18498519307489092


1-element Vector{Float64}:
 1.0

In [22]:
test_one_imgs = MNIST_test_imgs[:,:,MNIST_test_labels .== 1]
test_one_imgs_as_vectors = vcat([vec(test_one_imgs[:,:,k])' for k in 1:last(size(test_one_imgs))]...)

mean(logistic_classifier(test_one_imgs_as_vectors', w'))

0.9955947136563876

#### Fashion MNIST Dataset

In [30]:
FashionMNIST.classnames()

10-element Vector{String}:
 "T-Shirt"
 "Trouser"
 "Pullover"
 "Dress"
 "Coat"
 "Sandal"
 "Shirt"
 "Sneaker"
 "Bag"
 "Ankle boot"

### Task 2.3 Multi-class Classifier (logistic softmax)

#### Logistic Softmax Regression

This task focusses on logistic softmax regression. From lectures we know that the softmax logistic regression is,

$$ \hat{y} = S_{\textrm{softmax}}(b + Wx) $$

where the softmax function is,

$$ S_{\textrm{softmax}}(z) = \frac{1}{\sum_{i=1}^{K}e^{z_{i}}}\begin{bmatrix} e^{z_{1}} \\ \vdots \\ e^{z_{K}} \end{bmatrix} $$
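
As a small illustration of the formula above, the following sketch computes the softmax of a score vector. The name `softmax_vec` is hypothetical (chosen to avoid clashing with library functions), and subtracting the maximum score before exponentiating is a common stability trick that is assumed here rather than taken from the project code; it leaves the result unchanged because the common factor cancels in the ratio.

```julia
# Hypothetical helper: softmax of a score vector z, shifted by its maximum
# so that exp never overflows for large scores.
function softmax_vec(z::AbstractVector{<:Real})
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

p = softmax_vec([1.0, 2.0, 3.0])
println(p)            # class probabilities, largest for the largest score
println(argmax(p))    # index of the predicted class
```

The predicted class is simply the index of the largest probability, which is also the index of the largest score.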

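Using cross-entropy loss once more, this time with softmax and a one-hot label vector $ y $, gives us the following,

$$ C(W) = -\sum_{i=1}^{N}\sum_{k=1}^{K} y_k^{(i)}\log{\hat{y}_k^{(i)}} $$

This means that the gradient contribution of a single example is,

$$ \nabla C(W) = (\hat{y} - y)x^\top $$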
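
To see where this gradient comes from, here is a short derivation for a single example, assuming a one-hot label vector $ y $ (so $ \sum_k y_k = 1 $) and writing $ z = b + Wx $. The index names $ j, k, m $ are introduced here only for illustration. Using $ \partial \hat{y}_k / \partial z_j = \hat{y}_k(\delta_{kj} - \hat{y}_j) $,

$$ \frac{\partial C}{\partial z_j} = -\sum_{k=1}^{K} \frac{y_k}{\hat{y}_k}\,\hat{y}_k(\delta_{kj} - \hat{y}_j) = \hat{y}_j\sum_{k=1}^{K} y_k - y_j = \hat{y}_j - y_j $$

and since $ \partial z_j / \partial W_{jm} = x_m $,

$$ \frac{\partial C}{\partial W_{jm}} = (\hat{y}_j - y_j)\,x_m $$

which is the $ (j, m) $ entry of the outer product of $ \hat{y} - y $ with $ x $.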

All functions for logistic softmax regression can be found in `/mnist-classification/logistic-softmax-regression.jl`.

In [23]:
n_train, n_test = length(MNIST_train_labels), length(MNIST_test_labels)

X = vcat([vec(MNIST_train_imgs[:,:,k])' for k in 1:n_train]...);
X_test = vcat([vec(MNIST_test_imgs[:,:,k])' for k in 1:n_test]...);
A = [ones(n_train) X]

train_labels, test_labels = MNIST_train_labels, MNIST_test_labels;

In [24]:
W = train_softmax_logistic(η=0.0025, n_epochs=40);

Epoch = 1 (0.74 sec) Loss = 0.8589541136761422


Epoch = 2 (0.31 sec) Loss = 0.6638230898520099


Epoch = 3 (0.3 sec) Loss = 0.5761267021479102


Epoch = 4 (0.3 sec) Loss = 0.5244914012414507


Epoch = 5 (0.31 sec) Loss = 0.4894022871270895


Epoch = 6 (0.31 sec) Loss = 0.4638410504966134


Epoch = 7 (0.53 sec) Loss = 0.44392259529455


Epoch = 8 (0.24 sec) Loss = 0.427918929445374


Epoch = 9 (0.23 sec) Loss = 0.41460106272722763


Epoch = 10 (0.3 sec) Loss = 0.40328080585328735


Epoch = 11 (0.24 sec) Loss = 0.39345923777247804


Epoch = 12 (0.23 sec) Loss = 0.38481967793027083


Epoch = 13 (0.27 sec) Loss = 0.37712721423076984


Epoch = 14 (0.25 sec) Loss = 0.3701947356026861


Epoch = 15 (0.24 sec) Loss = 0.3638914771135347


Epoch = 16 (0.25 sec) Loss = 0.3581172173529608


Epoch = 17 (0.24 sec) Loss = 0.35279579206130707


Epoch = 18 (0.23 sec) Loss = 0.3478723941946681


Epoch = 19 (0.25 sec) Loss = 0.34330007393955564


Epoch = 20 (0.25 sec) Loss = 0.3390427258044317


Epoch = 21 (0.26 sec) Loss = 0.33507093128027426


Epoch = 22 (0.23 sec) Loss = 0.3313571722215263


Epoch = 23 (0.23 sec) Loss = 0.3278839952228694


Epoch = 24 (0.24 sec) Loss = 0.32462610014438253


Epoch = 25 (0.23 sec) Loss = 0.3215727735001224


Epoch = 26 (0.23 sec) Loss = 0.31870348303797486


Epoch = 27 (0.29 sec) Loss = 0.31599300770106326


Epoch = 28 (0.23 sec) Loss = 0.3134836825706334


Epoch = 29 (0.24 sec) Loss = 0.3109733178153201


Epoch = 30 (0.25 sec) Loss = 0.30906871156945015


Epoch = 31 (0.24 sec) Loss = 0.30614284051106055


Epoch = 32 (0.27 sec) Loss = 0.30989047088123395


Epoch = 33 (0.23 sec) Loss = 0.3029336486181426


Epoch = 34 (0.23 sec) Loss = 0.303547574712528


Epoch = 35 (0.25 sec) Loss = 0.29887335637868795


Epoch = 36 (0.23 sec) Loss = 0.30205192572694556


Epoch = 37 (0.25 sec) Loss = 0.29618326345100765


Epoch = 38 (0.27 sec) Loss = 0.29996993410544964


Epoch = 39 (0.23 sec) Loss = 0.2936528126021526


Epoch = 40 (0.23 sec) Loss = 0.29311215940762153


In [25]:
MNIST_softmax_acc = mean([logistic_sofmax_classifier(X_test'[:,k], W) for k in 1:n_test] .== test_labels)

0.9168

In [26]:
n_train, n_test = length(fMNIST_train_labels), length(fMNIST_test_labels)

X = vcat([vec(fMNIST_train_imgs[:,:,k])' for k in 1:n_train]...);
X_test = vcat([vec(fMNIST_test_imgs[:,:,k])' for k in 1:n_test]...);
A = [ones(n_train) X]

train_labels, test_labels = fMNIST_train_labels, fMNIST_test_labels;

In [27]:
W = train_softmax_logistic(n_epochs = 100, η=0.00029)
fMNIST_softmax_acc = mean([logistic_sofmax_classifier(X_test'[:,k], W) for k in 1:n_test] .== test_labels)

Epoch = 1 (0.24 sec) Loss = 2.4676640377588863


Epoch = 2 (0.27 sec) Loss = 1.9331564143777646


Epoch = 3 (0.24 sec) Loss = 1.6817431979172146


Epoch = 4 (0.29 sec) Loss = 1.524183000184793


Epoch = 5 (0.31 sec) Loss = 1.413533981299424


Epoch = 6 (0.26 sec) Loss = 1.3296400246016409


Epoch = 7 (0.23 sec) Loss = 1.2631365253224183


Epoch = 8 (0.24 sec) Loss = 1.2092487160781569


Epoch = 9 (0.43 sec) Loss = 1.1644408422344374


Epoch = 10 (0.31 sec) Loss = 1.1261244137782562


Epoch = 11 (0.3 sec) Loss = 1.0926217423197022


Epoch = 12 (0.3 sec) Loss = 1.0628664191812471


Epoch = 13 (0.51 sec) Loss = 1.0361536974152157


Epoch = 14 (0.23 sec) Loss = 1.0119877995548272


Epoch = 15 (0.25 sec) Loss = 0.9900017545255059


Epoch = 16 (0.24 sec) Loss = 0.9699147815681457


Epoch = 17 (0.24 sec) Loss = 0.9515052940674795


Epoch = 18 (0.24 sec) Loss = 0.9345916059591892


Epoch = 19 (0.24 sec) Loss = 0.9190184420145636


Epoch = 20 (0.25 sec) Loss = 0.9046483238404551


Epoch = 21 (0.24 sec) Loss = 0.8913569795074036


Epoch = 22 (0.25 sec) Loss = 0.8790315937653499


Epoch = 23 (0.24 sec) Loss = 0.8675703557673793


Epoch = 24 (0.24 sec) Loss = 0.856882192807544


Epoch = 25 (0.24 sec) Loss = 0.8468863071980656


Epoch = 26 (0.23 sec) Loss = 0.8375115059685334


Epoch = 27 (0.23 sec) Loss = 0.8286953900169659


Epoch = 28 (0.24 sec) Loss = 0.8203834707941748


Epoch = 29 (0.24 sec) Loss = 0.8125282726128032


Epoch = 30 (0.23 sec) Loss = 0.805088462500861


Epoch = 31 (0.24 sec) Loss = 0.798028034308544


Epoch = 32 (0.24 sec) Loss = 0.7913155625527034


Epoch = 33 (0.23 sec) Loss = 0.7849235329501144


Epoch = 34 (0.24 sec) Loss = 0.7788277499778988


Epoch = 35 (0.24 sec) Loss = 0.7730068176418184


Epoch = 36 (0.24 sec) Loss = 0.7674416883115251


Epoch = 37 (0.23 sec) Loss = 0.762115275233188


Epoch = 38 (0.23 sec) Loss = 0.7570121255333062


Epoch = 39 (0.24 sec) Loss = 0.7521181507835638


Epoch = 40 (0.25 sec) Loss = 0.7474204111495663


Epoch = 41 (0.24 sec) Loss = 0.7429069474136576


Epoch = 42 (0.23 sec) Loss = 0.7385666536412148


Epoch = 43 (0.24 sec) Loss = 0.7343891825121257


Epoch = 44 (0.23 sec) Loss = 0.7303648754373882


Epoch = 45 (0.24 sec) Loss = 0.7264847103060147


Epoch = 46 (0.24 sec) Loss = 0.7227402607920265


Epoch = 47 (0.24 sec) Loss = 0.7191236623968102


Epoch = 48 (0.23 sec) Loss = 0.7156275816814405


Epoch = 49 (0.23 sec) Loss = 0.7122451863654962


Epoch = 50 (0.24 sec) Loss = 0.7089701150555558


Epoch = 51 (0.23 sec) Loss = 0.7057964462534779


Epoch = 52 (0.24 sec) Loss = 0.7027186669393959


Epoch = 53 (0.24 sec) Loss = 0.6997316414137242


Epoch = 54 (0.23 sec) Loss = 0.6968305812327164


Epoch = 55 (0.23 sec) Loss = 0.6940110170243091


Epoch = 56 (0.24 sec) Loss = 0.6912687727833008


Epoch = 57 (0.23 sec) Loss = 0.6885999429824827


Epoch = 58 (0.23 sec) Loss = 0.6860008725613722


Epoch = 59 (0.23 sec) Loss = 0.6834681396166055


Epoch = 60 (0.23 sec) Loss = 0.6809985404495467


Epoch = 61 (0.24 sec) Loss = 0.6785890765386149


Epoch = 62 (0.23 sec) Loss = 0.6762369429902043


Epoch = 63 (0.24 sec) Loss = 0.673939518064852


Epoch = 64 (0.24 sec) Loss = 0.6716943534518875


Epoch = 65 (0.23 sec) Loss = 0.6694991650544848


Epoch = 66 (0.24 sec) Loss = 0.6673518241313837


Epoch = 67 (0.23 sec) Loss = 0.6652503487114023


Epoch = 68 (0.24 sec) Loss = 0.6631928952480283


Epoch = 69 (0.23 sec) Loss = 0.6611777505141823


Epoch = 70 (0.25 sec) Loss = 0.6592033237545071


Epoch = 71 (0.24 sec) Loss = 0.6572681391183237


Epoch = 72 (0.23 sec) Loss = 0.6553708283946102


Epoch = 73 (0.23 sec) Loss = 0.6535101240644632


Epoch = 74 (0.23 sec) Loss = 0.6516848526789935


Epoch = 75 (0.23 sec) Loss = 0.6498939285631921


Epoch = 76 (0.22 sec) Loss = 0.6481363478399743


Epoch = 77 (0.17 sec) Loss = 0.6464111827637927


Epoch = 78 (0.17 sec) Loss = 0.644717576350015


Epoch = 79 (0.17 sec) Loss = 0.6430547372844623


Epoch = 80 (0.17 sec) Loss = 0.6414219350969341


Epoch = 81 (0.16 sec) Loss = 0.6398184955828135


Epoch = 82 (0.17 sec) Loss = 0.6382437964577062


Epoch = 83 (0.17 sec) Loss = 0.6366972632312936


Epoch = 84 (0.17 sec) Loss = 0.6351783652879043


Epoch = 85 (0.17 sec) Loss = 0.6336866121626853


Epoch = 86 (0.17 sec) Loss = 0.6322215500035219


Epoch = 87 (0.17 sec) Loss = 0.6307827582099882


Epoch = 88 (0.17 sec) Loss = 0.6293698462416081


Epoch = 89 (0.17 sec) Loss = 0.6279824505885252


Epoch = 90 (0.17 sec) Loss = 0.626620231898374


Epoch = 91 (0.17 sec) Loss = 0.6252828722537084


Epoch = 92 (0.17 sec) Loss = 0.6239700725947827


Epoch = 93 (0.17 sec) Loss = 0.6226815502828522


Epoch = 94 (0.17 sec) Loss = 0.6214170367994222


Epoch = 95 (0.17 sec) Loss = 0.6201762755770908


Epoch = 96 (0.17 sec) Loss = 0.6189590199577646


Epoch = 97 (0.18 sec) Loss = 0.6177650312741355


Epoch = 98 (0.17 sec) Loss = 0.6165940770503876


Epoch = 99 (0.17 sec) Loss = 0.6154459293181461


Epoch = 100 (0.17 sec) Loss = 0.6143203630437892


0.8079

#### Final Results

Final accuracies after explicitly deriving the gradient used in the gradient descent training algorithm:

In [28]:
@show MNIST_softmax_acc
@show fMNIST_softmax_acc;

MNIST_softmax_acc = 0.9168
fMNIST_softmax_acc = 0.8079
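
For reference, the gradient of the softmax cross-entropy loss has a simple closed form: for each sample it is the outer product of (predicted probabilities minus the one-hot label) with the feature vector. The following is a minimal sketch under my own assumptions (a weight matrix `W` of size classes × features, labels in `0:(K-1)`, toy data); the names `softmax` and `softmax_loss_grad` are illustrative and this is not the `train_softmax_logistic` implementation used above.

```julia
using Random

# Numerically stable softmax of a score vector.
function softmax(z::AbstractVector)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

# Mean cross-entropy loss and its gradient with respect to W.
# A is n × p (rows are samples), labels are integers in 0:(K-1), W is K × p.
function softmax_loss_grad(W::AbstractMatrix, A::AbstractMatrix,
                           labels::AbstractVector{<:Integer})
    n = size(A, 1)
    loss = 0.0
    G = zeros(size(W))
    for i in 1:n
        x = A[i, :]
        p = softmax(W * x)
        k = labels[i] + 1        # shift the 0-based label to a 1-based index
        loss -= log(p[k])
        p[k] -= 1.0              # p now holds (probabilities - one-hot label)
        G .+= p * x'             # outer product, K × p
    end
    return loss / n, G ./ n
end

# Toy problem (assumed data): 3 classes, 2 features plus an intercept column.
Random.seed!(0)
A_toy = [ones(6) randn(6, 2)]
labels_toy = [0, 1, 2, 0, 1, 2]
W_toy = zeros(3, 3)

loss0, G0 = softmax_loss_grad(W_toy, A_toy, labels_toy)
W_toy .-= 0.1 .* G0              # one gradient descent step with η = 0.1
loss1, _ = softmax_loss_grad(W_toy, A_toy, labels_toy)
println("Loss before step: ", loss0, ", after step: ", loss1)
```

With all weights at zero the predicted probabilities are uniform, so the initial loss equals $\log 3$, which gives a quick sanity check on the loss function.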


### Task 2.4 Comparison of results and discussion

Benchmarking regression models!!

In [29]:
n_train, n_test = length(MNIST_train_labels), length(MNIST_test_labels);
train_labels, test_labels = MNIST_train_labels, MNIST_test_labels
test_imgs = MNIST_test_imgs

X = vcat([vec(MNIST_train_imgs[:,:,k])' for k in 1:n_train]...);

A = [ones(n_train) X];

print("Runtime for MNIST linear regression model is: ")
@btime train_linear()

n_train, n_test = length(fMNIST_train_labels), length(fMNIST_test_labels);
train_labels, test_labels = fMNIST_train_labels, fMNIST_test_labels
test_imgs = fMNIST_test_imgs

X = vcat([vec(fMNIST_train_imgs[:,:,k])' for k in 1:n_train]...);

A = [ones(n_train) X];

print("\nRuntime for Fashion MNIST linear regression model is: ")
@btime train_linear()

n_train, n_test = length(MNIST_train_labels), length(MNIST_test_labels);

X = vcat([vec(MNIST_train_imgs[:,:,k])' for k in 1:n_train]...);
X_test = vcat([vec(MNIST_test_imgs[:,:,k])' for k in 1:n_test]...);
A = [ones(n_train) X];

train_labels = MNIST_train_labels;

print("\nRuntime for MNIST logistic regression model is: ")
@btime train_logistic(η=0.001, verbose=false);

n_train, n_test = length(fMNIST_train_labels), length(fMNIST_test_labels);

X = vcat([vec(fMNIST_train_imgs[:,:,k])' for k in 1:n_train]...);
X_test = vcat([vec(fMNIST_test_imgs[:,:,k])' for k in 1:n_test]...);
A = [ones(n_train) X]

train_labels = fMNIST_train_labels;

print("\nRuntime for Fashion MNIST logistic regression model is: ")
@btime train_logistic(η=0.00037, n_epochs=100, verbose=false);

n_train, n_test = length(MNIST_train_labels), length(MNIST_test_labels)

X = vcat([vec(MNIST_train_imgs[:,:,k])' for k in 1:n_train]...);
X_test = vcat([vec(MNIST_test_imgs[:,:,k])' for k in 1:n_test]...);
A = [ones(n_train) X]

train_labels, test_labels = MNIST_train_labels, MNIST_test_labels;

print("\nRuntime for  MNIST logistic softmax regression model is: ")
@btime train_softmax_logistic(η=0.0025, n_epochs=40, verbose=false)

n_train, n_test = length(fMNIST_train_labels), length(fMNIST_test_labels)

X = vcat([vec(fMNIST_train_imgs[:,:,k])' for k in 1:n_train]...);
X_test = vcat([vec(fMNIST_test_imgs[:,:,k])' for k in 1:n_test]...);
A = [ones(n_train) X]

train_labels, test_labels = fMNIST_train_labels, fMNIST_test_labels;

print("\nRuntime for  Fashion MNIST logistic softmax regression model is: ")
@btime train_softmax_logistic(n_epochs = 100, η=0.00037, verbose=false);


Runtime for MNIST linear regression model is:   4.078 s (369998 allocations: 2.11 GiB)

Runtime for Fashion MNIST linear regression model is:   4.000 s (369997 allocations: 2.11 GiB)

Runtime for MNIST logistic regression model is:   5.046 s (30064 allocations: 7.42 GiB)

Runtime for Fashion MNIST logistic regression model is:   24.804 s (150304 allocations: 37.11 GiB)

Runtime for  MNIST logistic softmax regression model is:   9.553 s (74524 allocations: 15.06 GiB)

Runtime for  Fashion MNIST logistic softmax regression model is:   24.456 s (186304 allocations: 37.65 GiB)
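
The times printed by `@btime` are the minimum over several runs, which filters out compilation and background noise. A rough Base-only sketch of the same idea is given here; `min_elapsed` is a hypothetical helper of my own, and the least-squares workload is only a stand-in for the training functions.

```julia
# Minimum wall-clock time of `f()` over `repeats` runs, in seconds.
function min_elapsed(f; repeats::Integer = 3)
    f()                                    # warm-up run so compilation is not timed
    return minimum(@elapsed(f()) for _ in 1:repeats)
end

# Stand-in workload (assumed): a least-squares solve on random data.
A_toy = rand(200, 20)
y_toy = rand(200)

t = min_elapsed(() -> A_toy \ y_toy)
println("Minimum runtime: ", round(t, digits = 6), " s")
```

This is much cruder than `BenchmarkTools`, which also chooses the number of samples automatically and reports allocations.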


Each binary regression model has 785 parameters: one weight per pixel ($28 \times 28 = 784$) plus an intercept. One vs. All trains 10 such models, one per class, so it stores $10 \times 785 = 7{,}850$ parameters. One vs. One trains one model per pair of classes, that is $\binom{10}{2} = 45$ models, so it stores $45 \times 785 = 35{,}325$ parameters. I was unable to complete Task 2.2, so the One vs. One accuracy and time entries in the table are left blank, as I am unable to comment on the accuracy or running time of the One vs. One functions. As One vs. One stores 4.5 times as many parameters as One vs. All, I would assume that its accuracy would improve, but that the time taken to train the models would increase considerably.

|Data Source|OvA Linear Regression|OvA Logistic Regression|OvO Linear Regression|OvO Logistic Regression|Logistic Softmax Regression|
|-----------|---------------------|-----------------------|---------------------|-----------------------|---------------------------|
|MNIST Accuracy|0.8603|0.9035|-|-|0.9168|
|MNIST Complexity (# parameters)|7850|7850|35325|35325|7850|
|MNIST Time (approx.)|4.1 s|5.0 s|-|-|9.6 s|
|Fashion MNIST Accuracy|0.8113|0.8191|-|-|0.8079|
|Fashion MNIST Complexity (# parameters)|7850|7850|35325|35325|7850|
|Fashion MNIST Time (approx.)|4.0 s|24.8 s|-|-|24.5 s|
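
The parameter counts can be checked with a few lines of arithmetic. This is a small sketch, assuming $28 \times 28$ images and one intercept per binary model (matching the design matrix $A$ with its column of ones); the variable names are my own.

```julia
n_pixels  = 28 * 28                      # 784 features per image
n_params  = n_pixels + 1                 # plus one intercept per binary model
n_classes = 10

n_ova_models = n_classes                 # one model per class
n_ovo_models = binomial(n_classes, 2)    # one model per unordered pair of classes

println("Parameters per binary model: ", n_params)
println("One vs. All total: ", n_ova_models * n_params)
println("One vs. One total: ", n_ovo_models * n_params)
```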

Even though we didn't carry out a separate multi-class linear model example, the multi-class linear model is equivalent to the one vs. all approach taken in Task 2.1. The one vs. all approach splits a multi-class dataset into multiple binary classification problems and trains one binary classification model on each. Predictions are made using the binary model that is the most confident. For example, in the MNIST dataset we are interested in the multi-class classification between the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. This is separated into 10 different binary classification datasets:
- 0 vs rest
- 1 vs rest
- $\vdots$
- 9 vs rest
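
The "most confident model wins" rule can be sketched on a toy problem. The code below is only an illustration under my own assumptions: targets are encoded as $+1$ for the class and $-1$ for the rest, each binary model is fitted with the pseudoinverse, and `fit_one_vs_all` and `predict_one_vs_all` are hypothetical names, not the functions used in Task 2.1.

```julia
using LinearAlgebra

# Fit one least-squares model per class: target is +1 for the class, -1 for the rest.
function fit_one_vs_all(A::AbstractMatrix, labels::AbstractVector{<:Integer}, classes)
    Apinv = pinv(A)                      # computed once and shared by all models
    return [Apinv * [l == c ? 1.0 : -1.0 for l in labels] for c in classes]
end

# Predict with the class whose binary model gives the largest score.
function predict_one_vs_all(thetas, classes, x::AbstractVector)
    scores = [dot(theta, x) for theta in thetas]
    return classes[argmax(scores)]
end

# Toy data (assumed): three well-separated 2-D clusters, with an intercept column.
points = [0.0 0.0; 0.1 0.2; 5.0 0.0; 5.1 0.1; 0.0 5.0; 0.2 5.1]
A_toy = [ones(size(points, 1)) points]
labels_toy = [0, 0, 1, 1, 2, 2]
classes = 0:2

thetas = fit_one_vs_all(A_toy, labels_toy, classes)
predictions = [predict_one_vs_all(thetas, classes, A_toy[i, :]) for i in 1:size(A_toy, 1)]
println("Predicted labels: ", predictions)
```

Because every binary model shares the same design matrix, the pseudoinverse only needs to be computed once, which is why the one vs. all linear approach stays cheap even with 10 classes.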

## Task 3 Your own Random Forest Implementation (MNIST and FashionMNIST)

All your MNIST data initialisation needs!

In [30]:
y_train = MNIST_train_labels
y_test = MNIST_test_labels

n_train, n_test = last(size(y_train)), last(size(y_test))

X_train = vcat([vec(MNIST_train_imgs[:,:,k])' for k in 1:n_train]...)
X_test = vcat([vec(MNIST_test_imgs[:,:,k])' for k in 1:n_test]...);

Reading up on Random Forest algorithms and the `DecisionTree.jl` docs gives the following tree model. The parameters were set by experimenting with the values until the model achieved $>0.92$ accuracy.

In [31]:
MNIST_tree_model = RandomForestClassifier(n_subfeatures = 50, n_trees = 100, partial_sampling=0.9, max_depth=-1, rng=0)

RandomForestClassifier
n_trees:             100
n_subfeatures:       50
partial_sampling:    0.9
max_depth:           -1
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             nothing
ensemble:            nothing

Fitting the model to the training data

In [4]:
DecisionTree.fit!(MNIST_tree_model, X_train, y_train)

RandomForestClassifier
n_trees:             100
n_subfeatures:       50
partial_sampling:    0.9
max_depth:           -1
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ensemble:            Ensemble of Decision Trees
Trees:      100
Avg Leaves: 3707.51
Avg Depth:  22.72

Predicting the test labels based on the model fitted above and then calculating the accuracy of the fitted model

In [5]:
predicted_labels = DecisionTree.predict(MNIST_tree_model, X_test)
MNIST_rf_accuracy = mean(predicted_labels .== y_test)
println("\nPrediction accuracy (measured on test set of size $n_test): ", MNIST_rf_accuracy)


Prediction accuracy (measured on test set of size 10000): 0.9685
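
Conceptually, a random forest classifier combines its trees by voting: each tree predicts a label and the most common label wins. Here is a minimal sketch of that step, with made-up per-tree predictions; `majority_vote` is my own helper here and the tie-breaking rule (smallest label) is an assumption, not necessarily what `DecisionTree.jl` does internally.

```julia
# Return the most frequent label in `votes`; ties go to the smallest label.
function majority_vote(votes::AbstractVector{<:Integer})
    counts = Dict{Int,Int}()
    for v in votes
        counts[v] = get(counts, v, 0) + 1
    end
    best_label, best_count = -1, 0
    for label in sort(collect(keys(counts)))
        if counts[label] > best_count
            best_label, best_count = label, counts[label]
        end
    end
    return best_label
end

# Rows are test samples, columns are the (hypothetical) predictions of 5 trees.
tree_predictions = [3 3 8 3 5;
                    7 1 1 1 7;
                    0 0 0 0 0]

forest_predictions = [majority_vote(tree_predictions[i, :])
                      for i in 1:size(tree_predictions, 1)]
println("Forest predictions: ", forest_predictions)
```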


Now we look at the Fashion MNIST dataset, doing exactly the same as above but tweaking the parameters to achieve the desired accuracy.

In [6]:
y_train = fMNIST_train_labels
y_test = fMNIST_test_labels

n_train, n_test = last(size(y_train)), last(size(y_test))

X_train = vcat([vec(fMNIST_train_imgs[:,:,k])' for k in 1:n_train]...)
X_test = vcat([vec(fMNIST_test_imgs[:,:,k])' for k in 1:n_test]...);

In [16]:
fMNIST_tree_model = RandomForestClassifier(n_subfeatures = 90, n_trees = 120, partial_sampling=0.9, max_depth=-1, rng=0)

RandomForestClassifier
n_trees:             120
n_subfeatures:       90
partial_sampling:    0.9
max_depth:           -1
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             nothing
ensemble:            nothing

In [17]:
DecisionTree.fit!(fMNIST_tree_model, X_train, y_train)

RandomForestClassifier
n_trees:             120
n_subfeatures:       90
partial_sampling:    0.9
max_depth:           -1
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ensemble:            Ensemble of Decision Trees
Trees:      120
Avg Leaves: 3425.225
Avg Depth:  25.95

In [18]:
predicted_labels = DecisionTree.predict(fMNIST_tree_model, X_test)
fMNIST_rf_accuracy = mean(predicted_labels .== y_test)
println("\nPrediction accuracy (measured on test set of size $n_test): ", fMNIST_rf_accuracy)


Prediction accuracy (measured on test set of size 10000): 0.8811
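
A single accuracy number hides which classes get confused with each other. A small sketch of a confusion matrix on made-up labels follows; `confusion_matrix` is a hypothetical helper of my own, and the same function could in principle be applied to `y_test` and `predicted_labels`.

```julia
# counts[i, j] = number of samples with true class classes[i] predicted as classes[j].
function confusion_matrix(y_true, y_pred, classes)
    index = Dict(c => i for (i, c) in enumerate(classes))
    counts = zeros(Int, length(classes), length(classes))
    for (t, p) in zip(y_true, y_pred)
        counts[index[t], index[p]] += 1
    end
    return counts
end

# Toy labels (assumed), three classes.
y_true_toy = [0, 0, 1, 1, 2, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0, 2]
C = confusion_matrix(y_true_toy, y_pred_toy, 0:2)

# Overall accuracy is the diagonal share; per-class recall is the diagonal over the row sum.
accuracy = sum(C[i, i] for i in 1:size(C, 1)) / sum(C)
per_class_recall = [C[i, i] / sum(C[i, :]) for i in 1:size(C, 1)]
println("Accuracy: ", accuracy)
println("Per-class recall: ", per_class_recall)
```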


Playing around with the parameters only seemed to get me to an accuracy of about 0.88; further changes gave negligible gains at a significant time penalty, so instead of waiting 5+ minutes to see a 0.0001 improvement in accuracy, I will call it here. Thus, the random forest accuracy is 0.9685 on the MNIST dataset and 0.8811 on the Fashion MNIST dataset.