# Goals

Understand the basic principles underlying:

- Classification
- Multilayer perceptrons
- Backpropagation

# TODO

The most important goal is to understand.

Each part of the notebook exposes hyperparameters. Your task is to change them and study their effect.

1) The dataset

Visualize the data and understand the task.

Hyperparameters to change:
- size of the dataset (n_samples)
- shape of the circles (factor)
- strength of the noise (noise)
- number of examples per batch (batch_size)


2) The network

If you are in "advanced coding mode", fill in the gaps.

Hyperparameters to change:
- the size of the hidden layer (dim_h)
- the scale of the initial random weights (w)

3) Training and testing the network


Hyperparameters to change:
- number of epochs (n_epochs)
- learning rate (learning_rate)




# 0) Loading useful packages and modules

In [None]:
import numpy as np  # to handle operations on arrays of numbers, similar to matlab
import matplotlib.pyplot as plt  # to plot figures
import matplotlib.cm as cm
import torch
import torch.nn as nn
import torch.optim
import sklearn as skl
import load_dataset as load  # home-made module with functions to load datasets

# 1) The dataset

We load the "circles" dataset.

For now, we set a batch_size of 1, meaning we will give the examples one by one to the network.

The loader returns a train_loader and a test_loader, holding 75 % and 25 % of the dataset examples respectively.

You can vary the "factor" parameter, which controls the shape of the circles, and the "noise" level.

If your computer is slow, you can reduce the number of examples n_samples.

In [None]:
batch_size = ...  # the number of examples per batch
train_loader, test_loader, dim_in, dim_out = load.load_circles(batch_size=batch_size, 
                                                               n_samples=..., 
                                                               shuffle=True, 
                                                               noise=..., 
                                                               factor=...)
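
The home-made `load` module is not shown here. Its parameter names (`n_samples`, `shuffle`, `noise`, `factor`) match those of `sklearn.datasets.make_circles`, so the sketch below assumes the loader builds on that function; this is an assumption, not a description of the actual module. All numeric values are made up for illustration.

```python
import torch
from sklearn.datasets import make_circles

# Hypothetical stand-in for load.load_circles; values chosen for illustration only
n_samples, noise, factor = 200, 0.05, 0.5

# X has shape (n_samples, 2); y holds 0 (outer circle) or 1 (inner circle)
X_np, y_np = make_circles(n_samples=n_samples, shuffle=True, noise=noise,
                          factor=factor, random_state=0)
X_all = torch.tensor(X_np, dtype=torch.float32)
y_all = torch.tensor(y_np, dtype=torch.long)

# 75 % / 25 % train/test split (the data are already shuffled)
n_train = int(0.75 * n_samples)
X_train, y_train = X_all[:n_train], y_all[:n_train]
X_test, y_test = X_all[n_train:], y_all[n_train:]

# The inner circle has a mean radius close to `factor`, the outer one close to 1
inner_radius = X_all[y_all == 1].norm(dim=1).mean()
outer_radius = X_all[y_all == 0].norm(dim=1).mean()
print(X_train.shape, X_test.shape, inner_radius.item(), outer_radius.item())
```

In `make_circles`, a `factor` close to 1 brings the two circles together and makes the task harder, while a larger `noise` blurs both rings.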

Let's look into the dataset.

We can plot it and look at some examples of inputs and targets.

Note that the inputs are PyTorch tensors (class `torch.Tensor`).

In [None]:
X = train_loader.dataset.data  # the inputs (here, the whole training set)
targets = train_loader.dataset.targets  # the targets (here, the whole training set)
ax = plt.subplot()
ax.scatter(X[:, 0], X[:, 1], c=targets, cmap=cm.coolwarm)
plt.show()

# 2) The network

We first code useful functions, then assemble them into a network.

We chose the ReLU as the activation function.

In [None]:
def relu(x):
    y = x.clone()  # copy, so that the input is not modified in place
    y[x < 0] = ...
    return y

In [None]:
def d_relu(x):
    y = torch.ones_like(x)  # the derivative is 1 wherever x is positive
    y[x < 0] = ...
    return y
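
To check your own `relu` and `d_relu`, you can compare them with PyTorch's built-in `torch.relu` and with the gradient computed by autograd. The snippet below is only a reference check on a made-up input vector; it avoids the point x = 0, where the derivative is a matter of convention.

```python
import torch

# Made-up test points, none of them exactly 0
x = torch.tensor([-2.0, -0.5, 0.5, 3.0], requires_grad=True)

y = torch.relu(x)   # built-in ReLU: max(0, x) element-wise
y.sum().backward()  # autograd stores d(sum y)/dx in x.grad

print(y.detach())   # tensor([0.0000, 0.0000, 0.5000, 3.0000])
print(x.grad)       # tensor([0., 0., 1., 1.])
```

Your `relu(x)` should match `y`, and your `d_relu(x)` should match `x.grad` on these points.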

In [None]:
def softmax(y):
    e = torch.exp(y)  # exponentiate each output
    z = e / e.sum(dim=-1, keepdim=True)  # normalize so that the last axis sums to 1
    return z
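
A side note on numerical stability: exponentiating large outputs overflows in floating point, so a direct softmax can return `nan`. A common remedy, sketched below under the hypothetical name `stable_softmax`, is to subtract the maximum before exponentiating, which leaves the result unchanged mathematically. The input values are made up to trigger the overflow.

```python
import torch

def naive_softmax(y):
    # Direct formula: exp(y) overflows to inf for large y
    e = torch.exp(y)
    return e / e.sum(dim=-1, keepdim=True)

def stable_softmax(y):
    # Subtracting the row-wise maximum does not change the softmax,
    # but keeps every exponent <= 0, so exp never overflows
    shifted = y - y.max(dim=-1, keepdim=True).values
    e = torch.exp(shifted)
    return e / e.sum(dim=-1, keepdim=True)

y = torch.tensor([[1000.0, 1001.0]])  # deliberately large outputs
print(naive_softmax(y))   # contains nan (inf / inf)
print(stable_softmax(y))  # finite probabilities that sum to 1
```

For the small weights and inputs used in this notebook, the direct formula is normally fine; the stable version matters once outputs grow large.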

Here we define the network: a multilayer perceptron with one hidden layer.

In [None]:
class multi_layer_perceptron():
    def __init__(self, dim_in, dim_h, dim_out, w):
        self.dim_in = dim_in # number of inputs
        self.dim_h = dim_h # number of hidden neurons
        self.dim_out = dim_out # number of outputs
        
        # Here we define the parameters (weights and biases) and initialize them randomly
        # Help: to create a random array drawn from a normal distribution with standard deviation 1: torch.randn(dim0, dim1)
        self.weights_1 = torch.randn(..., ...) * w
        self.bias_1 = torch.randn(...) * w
        self.weights_2 = torch.randn(..., ...) * w
        self.bias_2 = torch.randn(...) * w
        
    # The forward function is what the network does: what transformations are applied to the inputs.
    # Help: tensor multiplication torch.matmul(tensor1, tensor2)
    def forward(self, x):
        self.y_1 = ...
        self.h = ...
        y_2 = ...
        return y_2
    
    # This function computes the gradient of the loss with respect to each parameter
    def compute_grad(self, x, targets):
        y_2 = ... # we apply the network
        p = ...
        targets = torch.nn.functional.one_hot(targets, num_classes=2).float()
        loss = ... # cross entropy loss. Help: dot product torch.dot(vec1, vec2)
        
        # Layer 2 of weights and biases
        # Initialize arrays.
        self.dL_db2 = torch.zeros_like(self.bias_2)
        self.dL_dw2 = torch.zeros_like(self.weights_2)

        for j in range(self.dim_out):
            for i in range(self.dim_h):
                self.dL_dw2[i, j] = ...
            self.dL_db2[j] = ...

        # Layer 1 of weights and biases
        # Initialize arrays.
        self.dL_db1 = torch.zeros_like(self.bias_1)
        self.dL_dw1 = torch.zeros_like(self.weights_1)
        
        for j in range(self.dim_h):
            for i in range(self.dim_in):
                self.dL_dw1[i, j] = ...
            self.dL_db1[j] = ...
            
        return loss
            
    # This function updates the parameters using the gradients
    def update_params(self, learning_rate):
        self.weights_1 = ...
        self.bias_1 = ...
        self.weights_2 = ...
        self.bias_2 = ...
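
If you want to check your hand-written gradients, you can compare them with PyTorch's autograd on a single example. The sketch below is one possible vectorized formulation, not the notebook's official solution. It assumes weights of shape (inputs, outputs), as suggested by the index order `dL_dw1[i, j]` in the loops above, and uses made-up sizes, input and target.

```python
import torch

torch.manual_seed(0)
dim_in, dim_h, dim_out = 2, 5, 2  # made-up sizes

# Parameters with shape (inputs, outputs); autograd tracks them for the check
W1 = torch.randn(dim_in, dim_h, requires_grad=True)
b1 = torch.randn(dim_h, requires_grad=True)
W2 = torch.randn(dim_h, dim_out, requires_grad=True)
b2 = torch.randn(dim_out, requires_grad=True)

x = torch.tensor([0.3, -0.7])  # one made-up input example
t = torch.tensor([0.0, 1.0])   # its one-hot target (class 1)

# Forward pass
y1 = x @ W1 + b1               # hidden pre-activation
h = torch.relu(y1)             # hidden activity
y2 = h @ W2 + b2               # output layer
p = torch.softmax(y2, dim=-1)  # class probabilities
loss = -torch.dot(t, torch.log(p))  # cross-entropy loss

loss.backward()  # reference gradients from autograd

# Manual backpropagation
with torch.no_grad():
    delta2 = p - t                # dL/dy2 for softmax + cross-entropy
    dW2 = torch.outer(h, delta2)  # dL/dW2[i, j] = h[i] * delta2[j]
    db2 = delta2
    delta1 = (W2 @ delta2) * (y1 > 0).float()  # back through W2, then ReLU
    dW1 = torch.outer(x, delta1)  # dL/dW1[i, j] = x[i] * delta1[j]
    db1 = delta1

print(torch.allclose(dW1, W1.grad), torch.allclose(dW2, W2.grad))
```

The same quantities can be written with explicit loops over `i` and `j`, as in `compute_grad`; the outer products are only a compact way of filling the whole array at once.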

        

We make a function to plot the results of the network on the task.

In [None]:
def plot_results(network, x, targets):
    y = network.forward(x)
    p = softmax(y)
    pred = y.argmax(dim=1) # the predicted class is the output neuron with the highest value
    idxs = torch.nonzero(pred != targets).squeeze(dim=-1) # indexes of the misclassified examples
    fig, ax = plt.subplots(1, 2)
    ax[0].scatter(x[:, 0], x[:, 1], c=p[:,1], cmap=cm.coolwarm, vmin=0, vmax=1) # proba to be class 1
    ax[0].set_title('Probability to be class 1')
    ax[1].scatter(x[:, 0], x[:, 1], c=pred, cmap=cm.coolwarm, vmin=0, vmax=1) # class
    ax[1].scatter(x[idxs, 0], x[idxs, 1], facecolors='none', edgecolor='green', linewidths=2) # misclassified
    ax[1].set_title('Predicted class (red = 1)')

We make a function to compute the accuracy on the task, i.e. the proportion of correctly classified examples.

In [None]:
def accuracy(network, x, targets):
    y = network.forward(x)
    p = softmax(y)
    pred = p.argmax(dim=-1)
    acc = (pred == targets).double().mean()  # fraction of correct predictions
    return acc
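
As a quick illustration of the accuracy computation, here is a toy example with made-up outputs and targets. It also shows that taking the argmax of the raw outputs or of the softmax probabilities gives the same predicted class, because softmax preserves the ordering of the outputs.

```python
import torch

# Made-up network outputs for 4 examples and 2 classes
logits = torch.tensor([[2.0, -1.0],
                       [0.1, 0.3],
                       [-0.5, 1.5],
                       [1.0, 0.0]])
targets = torch.tensor([0, 1, 0, 0])  # made-up true classes

pred_from_logits = logits.argmax(dim=-1)
pred_from_probs = torch.softmax(logits, dim=-1).argmax(dim=-1)

# Proportion of examples whose predicted class equals the target
acc = (pred_from_logits == targets).double().mean()
print(pred_from_logits)  # tensor([0, 1, 1, 0])
print(acc.item())        # 0.75
```

Three of the four examples are classified correctly, hence an accuracy of 0.75.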

We create a network suitable to solve the circles task. 

Hyperparameters to change:
- the size of the hidden layer (dim_h)
- the scale of the initial random weights (w)

In [None]:
net = multi_layer_perceptron(dim_in = ..., dim_h = ..., dim_out = ..., w=...)
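
To get a feeling for the role of `w` before training anything, you can look at the probabilities an untrained network produces for different scales. The sketch below uses a hypothetical helper `initial_probabilities`, made-up sizes and random inputs, and assumes weights of shape (inputs, outputs); it is independent of the class above.

```python
import torch

torch.manual_seed(0)
dim_in, dim_h, dim_out = 2, 20, 2  # made-up sizes
x = torch.randn(100, dim_in)       # 100 made-up input points

def initial_probabilities(w):
    # Random parameters scaled by w, with weights of shape (inputs, outputs)
    W1, b1 = torch.randn(dim_in, dim_h) * w, torch.randn(dim_h) * w
    W2, b2 = torch.randn(dim_h, dim_out) * w, torch.randn(dim_out) * w
    h = torch.relu(x @ W1 + b1)
    return torch.softmax(h @ W2 + b2, dim=-1)

p_small = initial_probabilities(0.01)
p_large = initial_probabilities(10.0)

# Largest distance between the class-1 probability and 0.5
print((p_small[:, 1] - 0.5).abs().max())  # small: the network is undecided everywhere
print((p_large[:, 1] - 0.5).abs().max())  # typically much larger: confident but arbitrary outputs
```

With a very small `w`, all outputs start near 0.5; with a very large `w`, the initial outputs tend to be saturated, which usually makes the first training steps less well behaved.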

# 3) Training and testing the network

Let's test the network before training. The results should be poor: with random initial weights, the network is essentially guessing.

In [None]:
all_X_test = test_loader.dataset.data
all_targets_test = test_loader.dataset.targets

plot_results(net, all_X_test, all_targets_test)
print(f'accuracy is {accuracy(net, all_X_test, all_targets_test)*100:.2f} %')

Now let's train the network and track the evolution of the loss.

In [None]:
train_loss_list = []
test_accuracy_list = []

Choose the learning rate and the number of epochs here.

In [None]:
learning_rate = ...
n_epochs = ...

In [None]:
for epoch in range(n_epochs):
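    loss = 0  # average training loss over this epoch
    # Note: assuming train_loader is a torch DataLoader, len(train_loader) counts batches,
    # so it equals the number of training examples only when batch_size = 1
    for idx in range(len(train_loader)):
        data = train_loader.dataset.data[idx]  # one input example
        target = train_loader.dataset.targets[idx]  # its class label
        loss += net.compute_grad(data, target)/len(train_loader)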
        net.update_params(learning_rate = learning_rate)
    train_loss_list.append(loss)
    
    test_acc = accuracy(net, all_X_test, all_targets_test)*100
    test_accuracy_list.append(test_acc)
    plot_results(net, all_X_test, all_targets_test)
    plt.show()
    print(f'Epoch {epoch} loss: {loss}, accuracy: {test_acc:.2f} %')
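
The loop above reads the examples one by one from `train_loader.dataset`. If `train_loader` is a standard `torch.utils.data.DataLoader` (an assumption, since the `load` module is not shown), iterating over the loader itself yields mini-batches of `batch_size` examples. The sketch below illustrates this on made-up data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(10, 2)          # 10 made-up examples with 2 features
y = torch.randint(0, 2, (10,))  # made-up binary targets

loader = DataLoader(TensorDataset(X, y), batch_size=4, shuffle=True)
print(len(loader))  # 3 batches: 4 + 4 + 2 examples

n_seen = 0
for xb, yb in loader:
    # xb has shape (batch, 2), yb has shape (batch,)
    n_seen += xb.shape[0]
print(n_seen)  # 10: every example is visited once per epoch
```

Note that `len(loader)` is the number of batches, not the number of examples; the two coincide only for `batch_size = 1`.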

In [None]:
plt.style.use('plot_style.mplstyle')  # A file with parameters on what you want the figure to look like
fig, ax = plt.subplots(1, 2)  # a figure with 1 row and 2 columns of subplots
# fig is the figure itself, ax is an array of two independent subplots inside fig
# on the left side, plot the losses. 
ax[0].plot(train_loss_list, c='red', ls='-')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('Training Loss')
# on the right side, plot the accuracies. 
ax[1].plot(test_accuracy_list, c='red')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Test Accuracy (%)')
# Add a title to the figure.
fig_title = 'Training a perceptron'
fig.suptitle(fig_title)

In [None]:
plot_results(net, all_X_test, all_targets_test)