# Homework: Not So Basic Artificial Neural Networks

Your task is to implement a simple framework for training convolutional neural networks. Although convolutional neural networks are the subject of lecture 3, we expect that many students are already familiar with the topic.

In order to successfully pass this homework, you will have to:

- Implement all the blocks in `homework_modules.ipynb` (especially the `Conv2d` and `MaxPool2d` layers). A good implementation should pass all the tests in `homework_test_modules.ipynb`.
- Work through a bit of math in `homework_differentiation.ipynb`.
- Train a CNN that has at least one `Conv2d` layer, one `MaxPool2d` layer and one `BatchNormalization` layer and achieves at least 97% accuracy on the MNIST test set.

Feel free to use `homework_main-basic.ipynb` for debugging or as a source of code snippets.

Note that this homework requires submitting **multiple** files. Please do not forget to include all of them when sending your work to the TA. The list of files:
- This notebook with the trained CNN
- `homework_modules.ipynb`
- `homework_differentiation.ipynb`

# Imports

In [None]:
%matplotlib inline
from time import time, sleep
import numpy as np
import matplotlib.pyplot as plt
from IPython import display
import gzip

In [None]:
# Mount your Google Drive with the notebooks
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
# move to folder with homework (all files need to be in one folder)
%cd '/content/drive/MyDrive/smthg/path_to_folder/'

In [None]:
# (re-)load layers
%run homework_modules.ipynb

# Digit Classification

## Load the MNIST dataset

In [None]:
def load_image(filename):
    # Read the inputs in Yann LeCun's binary format.
    with gzip.open(filename, 'rb') as f:
        data = np.frombuffer(f.read(), np.uint8, offset=16)
    # The inputs are vectors now, we reshape them to monochrome 2D images
    data = data.reshape(-1, 28, 28)
    # The inputs come as bytes, we convert them to float32 in range [0,1].
    return (data / np.float32(256)).squeeze()

def load_mnist_labels(filename):
    # Read the labels in Yann LeCun's binary format.
    with gzip.open(filename, 'rb') as f:
        data = np.frombuffer(f.read(), np.uint8, offset=8)
    # The labels are vectors of integers now, that's exactly what we want.
    return data

In [None]:
X_train = load_image('data/train-images-idx3-ubyte.gz')
X_test = load_image('data/t10k-images-idx3-ubyte.gz')
Y_train = load_mnist_labels('data/train-labels-idx1-ubyte.gz')
Y_test = load_mnist_labels('data/t10k-labels-idx1-ubyte.gz')
# We reserve the last 10000 training examples for validation.
X_train, X_val = X_train[:-10000], X_train[-10000:]
Y_train, Y_val = Y_train[:-10000], Y_train[-10000:]

In [None]:
print('X_train: ' + str(X_train.shape))
print('Y_train: ' + str(Y_train.shape))
print('X_val: ' + str(X_val.shape))
print('Y_val: ' + str(Y_val.shape))
print('X_test:  '  + str(X_test.shape))
print('Y_test:  '  + str(Y_test.shape))

In [None]:
plt.subplot(331)
plt.imshow(X_train[0], cmap=plt.get_cmap('gray'))
plt.show()
print()
print('Y_train[0]: ' + str(Y_train[0]))

## Preparing Data

### Task 1:

One-hot encode the labels. Hint: use [np.eye](https://numpy.org/doc/stable/reference/generated/numpy.eye.html).
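To illustrate the idea behind the hint, here is a minimal sketch on toy data. The names `toy_labels`, `num_classes` and `toy_one_hot` are made up for this example and are not part of the homework code.

```python
import numpy as np

# Toy labels, unrelated to MNIST: four samples, three classes.
toy_labels = np.array([2, 0, 1, 2])
num_classes = 3  # assumed number of classes for this toy example

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the label array stacks those rows.
toy_one_hot = np.eye(num_classes)[toy_labels]
print(toy_one_hot)
```

Each row of the result contains a single 1 at the position of the corresponding label.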

In [None]:
def one_hot_encode(y):
    # YOUR CODE HERE:
    ###########################
    ### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
    ###########################
    one_hot_y = None
    return one_hot_y

hot_y_train = one_hot_encode(Y_train)
hot_y_val = one_hot_encode(Y_val)
hot_y_test = one_hot_encode(Y_test)

#### Test task 1

In [None]:
def one_hot_encode_test(hot_y_train):
    first_ten_answers = np.array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
                        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
                        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
                        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
                        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
                        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])
    np.testing.assert_equal(hot_y_train[:10], first_ten_answers, err_msg="First ten samples are not equal")
    print("The test passed successfully!")

one_hot_encode_test(hot_y_train)

### Task 2:  

In `homework_main-basic.ipynb` we treated MNIST images as vectors, so we flattened them. For a CNN, we assume that images have shape `(bs, num_channels, w, h)`. MNIST images are grayscale, so they don't have a `num_channels` dimension. You need to reshape `X_train`, `X_val` and `X_test` to the appropriate shape.
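As a small illustration of adding a length-1 axis in NumPy, consider the sketch below. It uses a toy array (`toy_images` is a made-up name), not the real dataset.

```python
import numpy as np

# Toy batch of 5 grayscale images, standing in for the real MNIST arrays.
toy_images = np.zeros((5, 28, 28), dtype=np.float32)

# Indexing with np.newaxis inserts a length-1 axis at that position.
toy_with_channel = toy_images[:, np.newaxis, :, :]
print(toy_with_channel.shape)  # (5, 1, 28, 28)
```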

In [None]:
# YOUR CODE HERE:
###########################
### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
###########################
X_train, X_val, X_test = None, None, None

#### Test task 2

In [None]:
def dimension_test(X_train, X_val, X_test):
    true_train_shape = (50000, 1, 28, 28)
    true_test_shape = (10000, 1, 28, 28)
    np.testing.assert_equal(X_train.shape, true_train_shape, err_msg="Train shape is not the same")
    np.testing.assert_equal(X_val.shape, true_test_shape, err_msg="Validation shape is not the same")
    np.testing.assert_equal(X_test.shape, true_test_shape, err_msg="Test shape is not the same")
    print("The test passed successfully!")

dimension_test(X_train, X_val, X_test)

## CNN classification

### Task 3:

You need to define `in_features` for the final linear layer based on the `kernel_size` and `out_channels` variables.
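The general output-size formula for a sliding window may help here. The sketch below is only an illustration: the padding and stride values are assumptions, and the helper `conv_output_size` is hypothetical. Check how `Conv2d` and `MaxPool2d` in `homework_modules.ipynb` actually handle padding and stride before relying on these numbers.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (n + 2 * padding - kernel_size) // stride + 1

# Hypothetical layer settings, NOT taken from homework_modules.ipynb:
# an unpadded stride-1 convolution followed by non-overlapping pooling.
side = 28
after_conv = conv_output_size(side, kernel_size=3)                   # 26
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)   # 13

toy_out_channels = 4
flattened = toy_out_channels * after_pool * after_pool               # 676
print(after_conv, after_pool, flattened)
```

The flattened size is the number of channels times the two spatial sizes left after the last pooling step.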

In [None]:
def create_cnn(kernel_size, out_channels):
    CNN = Sequential()
    CNN.add(Conv2d(in_channels = 1, out_channels = out_channels, kernel_size = kernel_size))
    CNN.add(MaxPool2d(kernel_size = kernel_size))
    CNN.add(ReLU())
    CNN.add(Flatten())

    # YOUR CODE HERE: Define `in_features` for variables `kernel_size`, `out_channels`
    ###########################
    ### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
    ###########################
    in_features = None
    CNN.add(Linear(in_features, 10))
    CNN.add(LogSoftMax())
    return CNN

#### Test task 3

In [None]:
def test_cnn_creation():
    kernel_sizes = [1, 3, 5, 7]
    out_channels = [1, 2, 3, 4]
    rand_input = np.random.normal(size=(3, 1, 28, 28))
    try:
        rand_output = [create_cnn(ks, oc).forward(rand_input) for ks in kernel_sizes for oc in out_channels]
        print("The test passed successfully!")
    except Exception as e:
        # Python 3 exceptions have no `.message` attribute; print the exception itself.
        print(e)
        raise AssertionError from e
test_cnn_creation()

### Task 4

You need to fill in the gaps in the `train` pipeline. Note that `optimizer_name` can be one of `['sgd_momentum', 'adam_optimizer']`.

In [None]:
# batch generator
def get_batches(dataset, batch_size):
    X, Y = dataset
    n_samples = X.shape[0]

    # Shuffle at the start of epoch
    indices = np.arange(n_samples)
    np.random.shuffle(indices)

    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)

        batch_idx = indices[start:end]

        yield X[batch_idx], Y[batch_idx]

In [None]:
def train(net, criterion, optimizer_name, optimizer_config,
          n_epoch, X_train, y_train, X_val, y_val, batch_size):

    loss_train_history = []
    loss_val_history = []
    optimizer_state = {}

    for i in range(n_epoch):
        print('Epoch {}/{}:'.format(i, n_epoch - 1), flush=True)

        for phase in ['train', 'val']:
            if phase == 'train':
                X = X_train
                y = y_train
                net.train()
            else:
                X = X_val
                y = y_val
                net.evaluate()

            # Count the last, possibly smaller, batch as a full batch.
            num_batches = int(np.ceil(X.shape[0] / batch_size))
            running_loss = 0.
            running_acc = 0.

            for x_batch, y_batch in get_batches((X, y), batch_size):

                net.zeroGradParameters()

                # Forward
                predictions = # Your code goes here
                loss = # Your code goes here

                # Backward
                if phase == 'train':
                    dp = # Your code goes here
                    net.backward(x_batch, dp)

                    # Update weights
                    if optimizer_name == 'sgd_momentum':
                        # Your code goes here
                    else:
                        # Your code goes here

                running_loss += loss
                running_acc += np.sum(predictions.argmax(axis=1) == y_batch.argmax(axis=1))

            epoch_loss = running_loss / num_batches
            epoch_acc = running_acc / y.shape[0]
            if phase == 'train':
                loss_train_history.append(epoch_loss)
            else:
                loss_val_history.append(epoch_loss)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc), flush=True)

    return net, loss_train_history, loss_val_history

### Task 5

You need to perform hyperparameter selection.

To narrow the search space, we fix `optimizer_name='sgd_momentum'` and `batch_size=32`. That leaves the network hyperparameters (`kernel_size` and `out_channels`) and the optimizer hyperparameters (`lr` and `momentum`).

See the structure of `sgd_momentum` to pass an appropriate `optimizer_config` into the `train` pipeline.

Since the pipeline does not follow the sklearn estimator interface, using [`RandomizedSearchCV`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html) may be inappropriate. Use [`ParameterSampler`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ParameterSampler.html) instead. The target criterion for selection is the validation loss.

A neural network written from scratch does not use GPU computation, so training for one epoch takes a significant amount of time. For this reason, try 3 hyperparameter configurations and train each for 3 epochs (**it will take about 50 minutes**).
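For orientation, here is a minimal sketch of how `ParameterSampler` draws configurations. The search space `toy_space` and all values in it are placeholders invented for this example, not recommended settings.

```python
from sklearn.model_selection import ParameterSampler

# Hypothetical search space: the values are placeholders for illustration only.
toy_space = {
    "kernel_size": [3, 5],
    "out_channels": [2, 4, 8],
    "lr": [1e-3, 1e-2, 1e-1],
    "momentum": [0.0, 0.9],
}

# When every entry is a list, configurations are drawn without replacement;
# random_state makes the draw repeatable.
toy_samples = list(ParameterSampler(toy_space, n_iter=3, random_state=0))
for sample in toy_samples:
    print(sample)
```

Each sample is a plain dictionary, so its entries can be read by key when building the network and the optimizer configuration.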

In [None]:
import numpy as np
from sklearn.model_selection import ParameterSampler

# YOUR CODE HERE: # Define the hyperparameter grid.
###########################
### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
###########################
param_grid = None

# YOUR CODE HERE: # Define a list of parameters. The list should have length 3.
###########################
### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
###########################
param_list = None
assert len(param_list) == 3, "We search over only 3 runs."

best_loss = np.inf
best_params = None
results = []
n_epoch = 3
batch_size = 32
criterion = ClassNLLCriterion()

for params in param_list:
    # YOUR CODE HERE:
    ###########################
    ### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
    ###########################
    # 1. Create network with the given hyperparameters.
    net = None
    # 2. Build the optimizer configuration for sgd_momentum.
    optimizer_config = None


    # Run training.
    net, loss_train_history, loss_val_history = train(
        net, criterion, 'sgd_momentum', optimizer_config,
        n_epoch, X_train, hot_y_train, X_val, hot_y_val, batch_size
    )
    # The final epoch's validation loss as the metric.
    final_val_loss = loss_val_history[-1]
    results.append((params, final_val_loss))

    if final_val_loss < best_loss:
        best_loss = final_val_loss
        best_params = params

print("Best hyperparameters:", best_params)
print("Best validation loss: {:.4f}".format(best_loss))

### Task 6

For the selected hyperparameters, you need to run 3-fold cross-validation. Use [`KFold`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) for this (**it will also take about 50 minutes**).

**Why is this necessary if we have already optimized over the hyperparameter space?**
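A minimal sketch of the `KFold` splitting mechanics on toy data follows. The names `toy_X` and `toy_kf` are made up for this example, and the `shuffle` and `random_state` settings are illustrative choices rather than requirements of the task.

```python
import numpy as np
from sklearn.model_selection import KFold

toy_X = np.arange(12).reshape(6, 2)  # 6 toy samples, 2 features each
toy_kf = KFold(n_splits=3, shuffle=True, random_state=0)

fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(toy_kf.split(toy_X)):
    # split() yields integer index arrays; the same indices select the labels.
    fold_sizes.append((len(train_idx), len(val_idx)))
    print(fold, train_idx, val_idx)
```

Every sample lands in the validation part of exactly one fold.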

In [None]:
from sklearn.model_selection import KFold

# YOUR CODE HERE:
# Define a KFold object for cross-validation.
###########################
### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
###########################
kf = None

# YOUR CODE HERE:
# We need to merge images X_train and X_val
# and labels to make a random cross-validation
###########################
### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
###########################
X_all = None
y_all = None

best_nn_weights = None
best_cv_loss = np.inf
cv_losses = []
n_epoch = 3
batch_size = 32
criterion = ClassNLLCriterion()

for fold, (train_index, val_index) in enumerate(kf.split(X_all)):
    # Split the full dataset into training and validation subsets for this fold.
    ###########################
    ### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
    ###########################
    X_train_cv, X_val_cv = None, None
    y_train_cv, y_val_cv = None, None

    # Create a new instance of the network and optimizer using best hyperparameters.
    ###########################
    ### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
    ###########################
    net = None
    optimizer_config = None

    # Train the model for this fold.
    net, loss_train_history, loss_val_history = train(
        net, criterion, 'sgd_momentum', optimizer_config,
        n_epoch, X_train_cv, y_train_cv, X_val_cv, y_val_cv, batch_size
    )

    # Record the final validation loss for this fold.
    fold_val_loss = loss_val_history[-1]

    # Save best final weights
    if fold_val_loss < best_cv_loss:
        best_cv_loss = fold_val_loss
        # getParameters of all modules in a sequential container.
        # clue: see the corresponding method in `Sequential` Module.
        ###########################
        ### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
        ###########################
        best_nn_weights = None

### Task 7

The final step is testing the model on a separate subset. Fill in the gaps in the `test` function. Initialize the neural network with the best hyperparameters and load the best weights.

In [None]:
def test(net, criterion, X_test, y_test, batch_size):
    X_test, y_test = X_test[:3200], y_test[:3200]
    net.evaluate()
    num_batches = X_test.shape[0] / batch_size
    running_loss = 0.
    running_acc = 0.
    for x_batch, y_batch in get_batches((X_test, y_test), batch_size):
        net.zeroGradParameters()

        # Forward
        predictions = # Your code goes here
        loss = # Your code goes here
        running_loss += loss
        running_acc += (predictions.argmax(axis=1) == y_batch.argmax(axis=1)).astype(float).mean()

    epoch_loss = running_loss / num_batches
    epoch_acc = running_acc / num_batches
    print('Final Test Loss: {:.4f} Final Test Acc: {:.4f}'.format(epoch_loss, epoch_acc), flush=True)

    return epoch_loss, epoch_acc


###########################
### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
###########################
# 1. Init CNN with `best_params`
net = None
# 2. Set the parameters of the CNN to `best_nn_weights`.
# clue: see the corresponding method in `Sequential` Module.
None

epoch_loss, epoch_acc = test(net, criterion, X_test, hot_y_test, batch_size=32)

### Task 8

Now you need to apply your skills and creativity to improve the result:
1. Use more computation (more detailed hyperparameter selection, more training epochs);
2. Create your own CNN with dropouts and batch normalization.

In [1]:
from IPython.display import Image
Image(url='https://memepedia.ru/wp-content/uploads/2017/08/%D0%B1%D0%B5%D0%BD%D0%B4%D0%B5%D1%80-%D1%84%D1%83%D1%82%D1%83%D1%80%D0%B0%D0%BC%D0%B0.png', width=700)

3. Add augmentations to the training set.
4. Use adaptive optimization. Any other techniques are welcome!


In the end, you should reach at least 97% accuracy on the test set.

In [None]:
# YOUR CODE HERE:
###########################
### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
###########################

### Task 9 Feel the power of the GPU!

Let's achieve the same results with [`PyTorch`](https://pytorch.org/). Use the framework syntax to create the [CNN, optimizer, training and testing loops](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html). Don't forget to [convert](https://pytorch.org/docs/stable/generated/torch.from_numpy.html) the data to the torch tensor format and wrap it with [Dataset and DataLoader](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) for correct processing.
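As a starting point for the data side only, the sketch below wraps NumPy arrays for PyTorch. It uses random toy arrays (`toy_images` and `toy_labels` are made-up names), and `TensorDataset` is just one possible choice; a custom `Dataset` subclass would work as well.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy stand-ins for the MNIST arrays, in the (bs, num_channels, w, h) layout.
toy_images = np.random.rand(10, 1, 28, 28).astype(np.float32)
# Class-index labels as int64, the dtype torch.nn.CrossEntropyLoss expects for targets.
toy_labels = np.random.randint(0, 10, size=10).astype(np.int64)

# torch.from_numpy shares memory with the NumPy array instead of copying it.
images_t = torch.from_numpy(toy_images)
labels_t = torch.from_numpy(toy_labels)

toy_loader = DataLoader(TensorDataset(images_t, labels_t), batch_size=4, shuffle=True)
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in toy_loader]
print(batch_shapes)
```

With 10 samples and a batch size of 4, the loader yields two full batches and one batch of 2.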

In [None]:
# YOUR CODE HERE:
###########################
### ╰( ͡° ͜ʖ ͡° )つ──☆*:・ﾟ
###########################