<h1>MALIS Lab Session 1 - Fall 2018</h1>

The aim of this lab is to practice with Neural Networks (Multi-Layer Perceptrons) via simple classification experiments and the (partial) implementation of the feedforward and backpropagation procedures. For this lab, the implementation of the MLP simulator is in Python 3.

Experiments should be made by groups of two students. Each group should produce a Jupyter Notebook with all their results and comments. We strongly encourage the addition of plots and visual representation to the report, bearing in mind that comments on the graphical data are still necessary. Code for adding images to your notebook: ```<img src="path/to/image.png" />```.

Submit your complete notebook as an archive (tar -cf groupXnotebook.tar lab1/) . Deadline for submitting your notebook: 30 November 2018.

<h2>Introduction</h2>
There are three parts to this lab session. 

1. A "theoretical" part: Given a set of training examples you have to decide on the architecture of the feed-forward neural network such as; number of layers, number of neuron per layers and finally the values of the weights. 

2. A "programming" part: Given the skeleton of the Python code of an MLP simulator, implement the missing functions (feedforward and backpropagation procedures). 

3. An "experimental" part: Having completed the implementation of the MLP simulator, the final step consist on training the network and testing its accuracy.

<h2>Part 1: Design a neural network</h2>
The aim of this part is to get a better understanding of the basics of Neural Networks construction. A number of sample points on a 128 by 128 grid have been assigned one out of three colors (red, green or blue). You should build a Neural Network with two inputs and three outputs which provides the exact coloring for these points. The problem can be visualized in the following figure: 

<img src="data_set.jpg" />

The file set30.x1x2rgb (in .\data\) contains the data corresponding to the problem defined above. The visual representation of the problem (above figure) is stored in data_set.jpg.

The problem:

Pairs of x1 and x2 coordinates (both ranging between 0 and 127) are associated with a specific color: 

* Red: output 1 0 0, 
* Green: output 0 1 0, 
* Blue: output 0 0 1. 

The objective of the network is to correctly determine for any given (x1, x2) coordinate pair the corresponding color. 
Your task is to <b>manually define a Neural Network which performs this task perfectly (do not forget to justify your answer)</b>. There is no need for programming or iterative training. The transfer function is assumed to be the step function: 

$f(t) = (t > 0)$ (it is equal to 1 if t is positive, 0 otherwise). 

Of course, it is your task to define the number of layers, the number of neurons per layer, and the exact values for the weights. 

<i>Hint: You may remember the XOR problem and how it was solved.</i>

First of all, we start to define our Neural Network. Our classes can be defined inside a convex sets. So we need just one hidden layer. We can devide all our sets just by 2 lines so our hidden layer contains two units.
<p>
<img src="Partition.PNG"/>
</p>
<table>
  <tr style="border: 1px solid #dddddd;">
    <th style="border: 1px solid #dddddd;">y1</th>
    <th style="border: 1px solid #dddddd;">y2</th>
    <th style="border: 1px solid #dddddd;">a1</th>
    <th style="border: 1px solid #dddddd;">a2</th>
    <th style="border: 1px solid #dddddd;">a3</th>
    <th style="border: 1px solid #dddddd;">Color</th>
  </tr>
  <tr style="border: 1px solid #dddddd;">
    <td style="border: 1px solid #dddddd;">1</td>
    <td style="border: 1px solid #dddddd;">1</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">1</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">Green</td>
  </tr>
  <tr style="border: 1px solid #dddddd;">
    <td style="border: 1px solid #dddddd;">1</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">1</td>
    <td style="border: 1px solid #dddddd;">Blue</td>
  </tr>
  <tr style="border: 1px solid #dddddd;">
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">1</td>
    <td style="border: 1px solid #dddddd;">1</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">Red</td>
  </tr>
  <tr>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">1</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">0</td>
    <td style="border: 1px solid #dddddd;">Red</td>
  </tr>
</table>

This table gives us a list of equations to determine the weights and biases.
$$ w^{}_{11}+w{}_{21}-b{}_1<0$$
$$ w^{}_{11}-b{}_1<0$$
$$ w{}_{21}-b{}_1>=0$$
$$ -b{}_1>=0$$

$$ w^{}_{12}+w{}_{22}-b{}_2>=0$$
$$ w^{}_{12}-b{}_2<0$$
$$ w{}_{22}-b{}_2<0$$
$$ -b{}_2<0$$

$$ w^{}_{13}+w{}_{23}-b{}_3<0$$
$$ w^{}_{13}-b{}_3>=0$$
$$ w{}_{23}-b{}_3<0$$
$$ -b{}_3<0$$

We choose this list of solutions
$$ w^{}_{11}= -4 ; w{}_{21}= -2 ; b{}_1=-3$$
$$ w^{}_{12}= 2 ; w{}_{22}=2; b{}_2=3 $$
$$ w^{}_{13}=2 ; w{}_{23}=-3 ; b{}_3=1 $$

For the first layer we determine the equations of the two lines that define our hyperplanes.
y1: $$ 4x1+3x2-255=0 $$
y2: $$ -3x1+2x2+75=0 $$

<h3>Your answer:</h3>

<h2>Part 2: Implementation of the MLP simulator</h2>
The task here is to implement the missing parts of a code written to simulate multi-layer perceptrons. The code can be found in your directory under the filename utils.py (but you will not edit that file, all your code will be written in your notebook). Here is a brief explanation about the MLP simulator: 

A network description file has to be provided. This is a text file which contains information about the number of layers in the network and the number of units (neuron) for each layer. Here is an example of such a file: 

This example describes a 2 layer network with 2 hidden units and 3 output units. 
Additionally a pattern (or example set) file has to be provided. This file contains a number of example pattern with input and output values. For an example of such a file look at ./data/set30.x1x2rgb.

As you know, transfer functions of an MLP need to be differentiable to train it. Therefore, we replace the step function by a sigmoid function.

Now that you have a broad overview of the program your task is to <b>implement the feedforward function of the Neuron class</b>. Obviously, you can find help in the notes from the course.

In [1]:
# First run this cell to import relevant classes and functions
from utils import Neuron, Dataset, Layer, MLP, sigmoid, d_sigmoid

import numpy as np
np.random.seed(1)

<h3>Your answer:</h3>

In [2]:
def feedforward(self):
    res = 0. # Contains the weighted sum of the inputs of the neuron
    for i in range(len(self.inputs)):
        res += self.weights[i]*self.inputs[i] ### IMPLEMENTATION REQUIRED ###
    res += self.bias
    self.u = res
    self.out = sigmoid(res)

Neuron.feedforward = feedforward

Before implementing the Backpropagation function, <b>write the recursive formula for the partial derivative of the error with respect to the activation (neuron j of layer i) as a function of the weights and partial derivative of the error in layer i+1 from the course material</b>.

<h3>Your answer:</h3>

$$\frac{\partial L}{\partial u^{(i)}_j} = \quad 2*(y^{}_j-t{}_j) $$

Now, <b>implement the compute_gradients() and the apply_gradient() functions of the MLP class</b>.

<h3>Your answer:</h3>

In [3]:
def compute_gradients(self):
    # First compute derivatives for the last layer
    layer = self.layers[-1]
    for i in range(len(layer)):
        # Compute dL/du_i
        neuron = layer.neurons[i]
        o = neuron.out
        u = neuron.u
        t = self.gt[i]
        neuron.d_u =2*(sigmoid(u)-t)*o*(1-o) ### IMPLEMENTATION REQUIRED ###
        for j in range(len(neuron.weights)):
            # Compute dL/dw_ji
            neuron.d_weights[j] =2*(sigmoid(u)-t)*o*(1-o)*neuron.inputs[j] ### IMPLEMENTATION REQUIRED ###

    # Then compute derivatives for other layers
    for l in range(2, len(self.layers)):
        layer = self.layers[-l]
        next_layer = self.layers[-l+1]
        for i in range(len(layer)):
            # Compute dL/du_i
            neuron = layer.neurons[i]
            d_u = 0.
            u = neuron.u
            for j in range(len(next_layer)):
                d_u+=next_layer.neurons[j].d_u*next_layer.neurons[j].weights[i]*neuron.out*(1-neuron.out)
                ### IMPLEMENTATION REQUIRED ###
            neuron.d_u = d_u
            for j in range(len(neuron.weights)):
                # Compute dL/dw_ji
                neuron.d_weights[j] =neuron.d_u*neuron.inputs[j] ### IMPLEMENTATION REQUIRED ###

def apply_gradients(self, learning_rate):
    # Change weights according to computed gradients
    for i in range(1, len(self.layers)):
        layer = self.layers[i]
        for j in range(1, len(layer)):
            neuron = layer.neurons[j]
            for k in range(len(neuron.d_weights)):
                neuron.weights[k] -=learning_rate*neuron.d_weights[k]### IMPLEMENTATION REQUIRED ###
            neuron.bias -=learning_rate*neuron.d_u ### IMPLEMENTATION REQUIRED ###

MLP.compute_gradients = compute_gradients
MLP.apply_gradients = apply_gradients

<h2>Part 3: Training and Accuracy experiments</h2>

Train the network on the problem stated in Part 1, using the training set set120.x1x2rgb and the following parameters:
* learning rate: 2.0; 
* number of training cycles: 1000

In order to do so you will need to create a network definition file (as described in the introduction) containing the details of the network architecture. 
Evaluate the accuracy using set30.x1x2rgb as the test set (you can use the setdataset() function of the MLP class to change between training and test sets).

Experiment with the learning rate and the number of training cycles. What do you notice?

<h3>Your answer:</h3>

In [4]:
# This is an example code that you can adjust to your liking


train_datafile = "data/set120.x1x2rgb"
train_data = Dataset(train_datafile)

test_datafile = "data/set30.x1x2rgb"
test_data = Dataset(test_datafile)

nnfile = "data/NN.dat"

mlp = MLP(nnfile, train_data, print_step=100, verbose=False)

#mlp.train(1000, learning_rate=3)
#mlp.make_plot()
#print("=== Result on test data ===")
#mlp.setdataset(test_data)
#mlp.print_accuracy()



# Calculate average accuracy for 100 experience for different learning rates
for i in range(0,11):
    l=[]
    lrate=2+0.2*i
    for f in range(1,100):
        mlp.train(1000, learning_rate=lrate)
        mlp.setdataset(test_data)
        l.append(mlp.compute_accuracy()*100)
    print("For a learning rate equals to: "+str(lrate)+"is :" +str(sum(l)/len(l)))
    
    
# Calculate average accuracy for 100 experience for a given learning rate
# l=[]
# for f in range(1,100):
#     mlp.train(1000, learning_rate=3)
#     mlp.setdataset(test_data)
#     l.append(mlp.compute_accuracy()*100)
# print(sum(l)/len(l))


For a learning rate equals to: 2.0is :88.68686868686859
For a learning rate equals to: 2.2is :87.84511784511783
For a learning rate equals to: 2.4is :93.3333333333333
For a learning rate equals to: 2.6is :93.3333333333333
For a learning rate equals to: 2.8is :93.3333333333333
For a learning rate equals to: 3.0is :93.3333333333333
For a learning rate equals to: 3.2is :93.3333333333333
For a learning rate equals to: 3.4000000000000004is :93.3333333333333
For a learning rate equals to: 3.6is :93.3333333333333
For a learning rate equals to: 3.8is :93.3333333333333
For a learning rate equals to: 4.0is :93.3333333333333


<h3>Your comments</h3>
Whenever we repeat the experience, we keep getting different plots. This is due to the fact that each time we start from random values of the bias and weights.
We try different combinations of the number of hidden layers and the number of units for each layer.For each setting we repeat the experience 100 times and calculate the average accuracy for each one. The obtained results are displayed onthe table
<table>
  <tr style="border: 1px solid #dddddd;">
    <th style="border: 1px solid #dddddd;">NN.dat file </th>
    <th style="border: 1px solid #dddddd;">Accuracy</th>
  </tr>
  <tr style="border: 1px solid #dddddd;">
    <td style="border: 1px solid #dddddd;">0 3</td>
    <td style="border: 1px solid #dddddd;">56.90235690235693%</td>
  </tr>
  <tr style="border: 1px solid #dddddd;">
    <td style="border: 1px solid #dddddd;">1 3</td>
    <td style="border: 1px solid #dddddd;">59.66329966329965%</td>
  </tr>
  <tr style="border: 1px solid #dddddd;">
    <td style="border: 1px solid #dddddd;">2 3</td>
    <td style="border: 1px solid #dddddd;">96.8350168350169%</td>
  </tr>
  <tr>
    <td style="border: 1px solid #dddddd;">3 3</td>
    <td style="border: 1px solid #dddddd;">99.96632996632998%</td>
  </tr>
    <tr>
    <td style="border: 1px solid #dddddd;">4 3</td>
    <td style="border: 1px solid #dddddd;">100.0%</td>
  </tr>
</table>

This proves even though we are not 100% accurate, we can approximate the results to a pretty decent accuracy having less computations to do.

For the 2 3 model, we try different values of the learning rate going from 2 to 4 for a step of 0.2. We notice that starting from a certain value (2.4), we keep getting the same average accuracy le