# Exercises - Week 10 - Non-linear Methods
#### Simon Lee, BIO-322, Machine Learning for Bioengineers, Winter 2022

## Conceptual
#### Exercise 1
Below is an image with padding 1 already applied; padding 1 means that one row or column of zeros is added to each side (top, bottom, left and right) of the original image. We would like to process it with a convolutional network that has one convolutional layer with two $3 \times 3$ filters (depicted below the image), stride 1 and relu non-linearity. Stride 1 means that the filter moves 1 position between each application, as in the formula in the slides.
- Determine the width, height and depth of the volume after the convolutional layer.
- Compute the output of the convolutional layer assuming the two biases to be zero.
*Figure: `conv_exercise.png` (the padded input image, with the two $3 \times 3$ filters below it).*

#### Answer

After the convolutional layer the volume has dimensions $4 \times 4 \times 2$: width and height are $4 \times 4$ because this is the number of positions at which a $3 \times 3$ filter can be applied (with stride 1 and padding 1), and the depth is 2 because there are two filters.
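As a quick check, the width can be computed with the usual output-size formula for a convolution. This is a sketch that assumes the formula matches the one in the slides and that the original (unpadded) image is $4 \times 4$, i.e. $6 \times 6$ once padded, which is consistent with the dimensions stated above. Here $W$ is the unpadded input width, $F = 3$ the filter width, $P = 1$ the padding and $S = 1$ the stride; the height follows in the same way.

$$
W_\text{out} = \frac{W - F + 2P}{S} + 1 = \frac{4 - 3 + 2 \cdot 1}{1} + 1 = 4
$$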

The output for filter 1 is

$$
\begin{bmatrix}
3  & 3 & 5 & 1\\
7 & 4 & 4 & 5 \\
2 & 7 & 5 & 3 \\
4 & 3 & 2 & 4 \\
\end{bmatrix}
$$

and for filter 2

$$
\begin{bmatrix}
2  & 5 & 2 & 4\\
8 & 9 & 5 & 7 \\
7 & 3 & 4 & 6 \\
5 & 2 & 5 & 3 \\
\end{bmatrix}
$$
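The computation behind such output matrices can be sketched in a few lines of plain Julia. This is a minimal illustration, not the course's implementation: the function name `conv_relu`, the image and the two filters are hypothetical (they are not the values from the exercise figure), the biases are zero, and the filter is applied as a cross-correlation without flipping, which may differ from the convention of a given library.

```julia
# Minimal sketch: slide each filter over an already padded image,
# sum the elementwise products and apply relu (biases assumed to be zero).
function conv_relu(padded::AbstractMatrix, filters::Vector{<:AbstractMatrix}; stride::Int = 1)
    fh, fw = size(first(filters))
    out_h = (size(padded, 1) - fh) ÷ stride + 1
    out_w = (size(padded, 2) - fw) ÷ stride + 1
    out = zeros(out_h, out_w, length(filters))
    for (k, f) in enumerate(filters), j in 1:out_w, i in 1:out_h
        r = (i - 1) * stride + 1            # top row of the current patch
        c = (j - 1) * stride + 1            # left column of the current patch
        patch = @view padded[r:r+fh-1, c:c+fw-1]
        out[i, j, k] = max(0, sum(patch .* f))   # relu of the filter response
    end
    return out
end

# Hypothetical 4×4 image (NOT the one from the figure), zero-padded to 6×6.
image = [1 0 2 1; 0 1 1 0; 2 1 0 1; 1 0 1 2]
padded = zeros(Int, 6, 6)
padded[2:5, 2:5] = image

# Two hypothetical 3×3 filters: an identity filter and a filter with a
# single negative weight, whose response relu sets to zero everywhere.
filters = [[0 0 0; 0 1 0; 0 0 0], [-1 0 0; 0 0 0; 0 0 0]]

out = conv_relu(padded, filters)
size(out)  # (4, 4, 2)
```

With the identity filter, `out[:, :, 1]` reproduces the unpadded image, which is a convenient sanity check before plugging in the values from the figure.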


## Applied
#### Exercise 2
In this exercise our goal is to find a good machine learning model to classify images of Zalando's articles. You can load a description of the so-called Fashion-MNIST data set with `OpenML.describe_dataset(40996)` and load the data set with `OpenML.load(40996)`. Take our recipe for supervised learning (last slide of the presentation on "Model Assessment and Hyperparameter Tuning") as a guideline. Hints: cleaning is not necessary, but plotting some examples is advisable; linear classification is a good starting point for a first benchmark, but you should also explore other models like random forests (`RandomForestClassifier`), multilayer perceptrons (`NeuralNetworkClassifier`) and convolutional neural networks (`ImageClassifier`), and play manually a bit with their hyper-parameters (proper tuning with `TunedModel` may be too time-intensive). To reduce the computation time, we will only use the first 5000 images of the full data set for training, and we will not do cross-validation here but instead use samples 5001 to 10000 as a validation set to estimate the classification accuracy. *Hint:* you can use the resampling strategy `Holdout` for model tuning and evaluation.

In [1]:
begin
    using Pkg
    Pkg.activate(joinpath(Pkg.devdir(), "MLCourse"))
    using DataFrames, MLJ, MLJLinearModels, MLCourse, Random, Distributions, Plots, MLJFlux, Flux, OpenML, MLJDecisionTreeInterface
end

[32m[1m  Activating[22m[39m project at `~/.julia/dev/MLCourse`


In [2]:
OpenML.describe_dataset(40996)

**Author**: Han Xiao, Kashif Rasul, Roland Vollgraf

**Source**: [Zalando Research](https://github.com/zalandoresearch/fashion-mnist)

**Please cite**: Han Xiao and Kashif Rasul and Roland Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, arXiv, cs.LG/1708.07747

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. 

Raw data available at: https://github.com/zalandoresearch/fashion-mnist

### Target classes

Each training and test example is assigned to one of the following labels:

| Label | Description |
|-------|-------------|
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |


In [3]:
# Loading the full data set takes a significant amount of time; run at your own discretion.
fashion = OpenML.load(40996) |> DataFrame

Let us look a bit at the raw data.

In [None]:
(maximum(Array(fashion[:, 1:end-1])), minimum(Array(fashion[:, 1:end-1])))

As for MNIST, the images seem to be encoded with numbers between 0 and 255. Let us therefore scale the input to the range $[0, 1]$.

In [None]:
begin
	fashion_input = select(fashion[1:10000, :], Not(:class)) ./ 255
	train_input = fashion_input[1:5000, :]
	test_input = fashion_input[5001:end, :]
	train_class = fashion.class[1:5000]
	test_class = fashion.class[5001:10000]
end

In [None]:
images = coerce(PermutedDimsArray(reshape(Array(fashion_input), :, 28, 28),
                                  (3, 2, 1)),
                GrayImage);


In [None]:
plot([plot(images[i]) for i in 1:12]..., layout = (3, 4), ticks = false)

In [None]:
m1 = machine(LogisticClassifier(lambda = 1e-4), train_input, train_class) |> fit!;

In [None]:
mean(predict_mode(m1, test_input) .== test_class)

The test accuracy of linear classification with a little bit of L2 regularization (regularization constant $\lambda = 10^{-4}$) is approximately 81%.

In [None]:
m2 = machine(RandomForestClassifier(n_trees = 500), train_input, train_class) |> fit!;

In [None]:
mean(predict_mode(m2, test_input) .== test_class)

The test accuracy of a random forest with 500 trees is slightly better than with the linear method above.

In [None]:
m3 = machine(NeuralNetworkClassifier(builder = MLJFlux.Short(n_hidden = 128,
                                                             dropout = 0.1,
                                                             σ = relu),
                                    batch_size = 32,
                                    epochs = 30),
             train_input, train_class);

In [None]:
fit!(m3, verbosity = 2);

In [None]:
mean(predict_mode(m3, test_input) .== test_class)

A multilayer perceptron improves the accuracy further.

In [None]:
begin
    f() = 0
    builder = MLJFlux.@builder(
                  begin
                      front = Chain(Conv((5, 5), n_channels => 16, relu),
                                    Conv((3, 3), 16 => 32, relu),
                                    Flux.flatten)
                      d = first(Flux.outputsize(front, (n_in..., n_channels, 1)))
                      Chain(front, Dropout(0.1), Dense(d, n_out))
                  end)
end

In [None]:
m4 = machine(ImageClassifier(builder = builder,
                             batch_size = 32,
                             epochs = 20),
             images[1:5000], train_class);

In [None]:
fit!(m4, verbosity = 2);

mean(predict_mode(m4, images[5001:end]) .== test_class)

With this convolutional network we reach almost 85% accuracy. With further hyper-parameter tuning we expect even better results for the convolutional network.