<a href="https://colab.research.google.com/github/AlbertoMontanelli/Machine-Learning/blob/class_unit/neural_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notation conventions
* **net** = $X \cdot W+b$, $\quad X$ : input matrix, $\quad W$ : weights matrix, $\quad b$ : bias array ; \\
* **number of examples** = $l$ ;
* **number of features** = $n$ ;
* **input_size** : for the layer $i$ -> $k_{i-1}$ : number of units of the previous layer $i-1$ ;
* **output_size** : for the layer $i$ -> $k_{i}$ : number of units of the current layer $i$ ;
* **output_value** $=f(net)$ of the last layer, where $f$ is the activation function.

### Input Layer $L_0$ with $k_0$ units :
* input_size = $n$;
* output_size = $k_0$;
* net = $X \cdot W + b$, $\quad X$ : $l \ \textrm{x} \ n$ matrix, $\quad W$ : $n \ \textrm{x} \ k_0$ matrix, $\quad b$ : $1 \ \textrm{x} \ k_0$ array ; \\
$⇒$ net: $l \ \textrm{x} \ k_0$ matrix.

### Generic Layer $L_i$ with $k_i$ units :
* input_size = $k_{i-1}$ ;
* output_size = $k_i$ ;
* net = $X \cdot W + b$, $\quad X$ : $l \ \textrm{x} \ k_{i-1}$ matrix, $\quad W$ : $k_{i-1} \ \textrm{x} \ k_i$ matrix, $\quad b$ : $1 \ \textrm{x} \ k_i$ array ; \\
$⇒$ net : $l \ \textrm{x} \ k_i$ matrix .
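These shapes can be checked directly with NumPy. In the sketch below the sizes are arbitrary and `np.tanh` stands in for a generic activation $f$; the bias row $b$ is broadcast over the $l$ rows of $X \cdot W$.

```python
import numpy as np

# Hypothetical sizes for illustration: l examples, n features, k_0 and k_1 units
l, n, k0, k1 = 4, 3, 5, 2
rng = np.random.default_rng(0)

X = rng.random((l, n))            # input matrix: l x n
W0 = rng.random((n, k0))          # weights of L_0: n x k_0
b0 = np.zeros((1, k0))            # bias of L_0: 1 x k_0
net0 = X @ W0 + b0                # b0 is broadcast over the l rows -> l x k_0
print(net0.shape)

W1 = rng.random((k0, k1))         # weights of L_1: k_0 x k_1
b1 = np.zeros((1, k1))            # bias of L_1: 1 x k_1
net1 = np.tanh(net0) @ W1 + b1    # the input of L_1 is f(net) of L_0 -> l x k_1
print(net1.shape)
```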

### Online vs mini-batch version :
* online version: $l$ = 1 ;
* mini-batch version: $l$ = number of examples in the batch ;
* batch version: $l$ = total number of examples in the training set .
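The online and mini-batch versions differ only in how many rows of the dataset are fed to the network at once. A possible helper that splits the dataset into mini-batches is sketched below; the name `iterate_minibatches` and the optional shuffling are assumptions, not part of the notebook. `batch_size=1` gives the online version, and `batch_size` equal to the number of examples gives a single batch containing all of them.

```python
import numpy as np

def iterate_minibatches(X, Y, batch_size, rng=None):
    """Hypothetical helper: yield (X_batch, Y_batch) pairs covering the dataset once.

    batch_size=1 -> online version; batch_size=len(X) -> one batch with all examples.
    If a numpy Generator is passed as rng, the examples are shuffled first.
    """
    indices = np.arange(len(X))
    if rng is not None:
        rng.shuffle(indices)  # in-place shuffle of the row indices
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], Y[batch]

X = np.arange(10).reshape(5, 2)  # l = 5 examples, n = 2 features
Y = np.arange(5).reshape(5, 1)   # one target per example
sizes = [xb.shape[0] for xb, _ in iterate_minibatches(X, Y, batch_size=2)]
print(sizes)  # [2, 2, 1]: the last batch holds the remaining example
```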

# Activation functions
Definition of the activation functions and their derivatives. \\
Definition of the loss functions. \\
TODO: describe how the losses change in the online vs. batch case.
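With the notebook's `mean_squared_error`, which sums over all entries, the batch loss is the sum of the online (per-example) losses, $E = \sum_{p=1}^{l} E_p$, so it grows with $l$; dividing by $l$ gives the mean over the examples, which keeps the loss and its gradient on the same scale for any batch size. A minimal sketch, assuming random data with $l = 4$ examples and 2 outputs (`squared_error` is a local copy of the same formula so the cell runs on its own):

```python
import numpy as np

def squared_error(y_true, y_pred):
    # Same formula as mean_squared_error in this notebook: sum over all entries
    return np.sum((y_true - y_pred) ** 2)

rng = np.random.default_rng(0)
y_true = rng.random((4, 2))   # l = 4 examples, 2 outputs
y_pred = rng.random((4, 2))

# Online: one loss E_p per example (l = 1 each time)
online_losses = [squared_error(y_true[p:p + 1], y_pred[p:p + 1]) for p in range(4)]

# Batch: a single call on the whole l x 2 matrix
batch_loss = squared_error(y_true, y_pred)

print(np.isclose(batch_loss, sum(online_losses)))          # E = sum_p E_p
print(np.isclose(batch_loss / 4, np.mean(online_losses)))  # mean over the l examples
```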


In [None]:
import numpy as np

# TODO: understand how mini-batch vs online training works
# TODO: decide which activation functions (and which derivatives) to use
# to be started later: cross validation, test vs training error, hyperparameter search (grid search, n layers, n units,
# learning rule), number of epochs/early stopping, Tikhonov regularization, momentum, adaline and other novelties

def sigmoid(net):
    return 1 / (1 + np.exp(-net))

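def d_sigmoid(net):
    # Same value as exp(-net) / (1 + exp(-net))**2, but that form gives nan (inf / inf)
    # when exp(-net) overflows for large negative net
    s = sigmoid(net)
    return s * (1 - s)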

def tanh(net):
    return np.tanh(net)

def d_tanh(net):
    return 1 - (np.tanh(net))**2

# TO BE REVIEWED: softmax_derivative returns a full Jacobian per example, so it cannot be passed
# as activation_derivative to Layer.backward, which multiplies d_Ep element by element.

def softmax(net):
    # Subtracting the row-wise max does not change the result but prevents overflow in np.exp
    exp_net = np.exp(net - np.max(net, axis=1, keepdims=True))
    return exp_net / np.sum(exp_net, axis=1, keepdims=True)

def softmax_derivative(net):

    # batch_size is the number of rows of the matrix net; current_neuron_size is the number of columns (units)
    batch_size, current_neuron_size = net.shape

    # Jacobian tensor: for each example in the batch, the derivative of each of the current_neuron_size outputs
    # with respect to each of the current_neuron_size nets. This gives a
    # batch_size x current_neuron_size x current_neuron_size tensor.
    jacobians = np.zeros((batch_size, current_neuron_size, current_neuron_size))
    out = softmax(net)  # the Jacobian is expressed through the softmax outputs, not through net

    for i in range(batch_size): # for each example i in the batch
        s = out[i].reshape(-1, 1)  # column vector (current_neuron_size x 1) with the softmax outputs of example i
        jacobians[i] = np.diagflat(s) - np.dot(s, s.T)  # d s_j / d net_k = s_j * (delta_jk - s_k)

    return jacobians

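def softplus(net):
    # log(1 + exp(net)), computed as logaddexp(0, net) to avoid overflow for large net
    return np.logaddexp(0, net)

def d_softplus(net):
    # exp(net) / (1 + exp(net)) is the sigmoid
    return sigmoid(net)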

def linear(net):
    return net

def d_linear(net):
    return 1

def ReLU(net):
    return np.maximum(net, 0)

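def d_ReLU(net):
    # Element-wise: 0 where net < 0, 1 elsewhere (works on whole net matrices, not only on scalars)
    return np.where(net < 0, 0.0, 1.0)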

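def mean_squared_error(y_true, y_pred):
    # NOTE: despite the name, this is the sum of the squared errors over all examples and outputs (not divided by l)
    return np.sum((y_true - y_pred)**2)

def d_mean_squared_error(y_true, y_pred):
    # The gradient w.r.t. y_pred is -2 * (y_true - y_pred); the minus sign is absorbed by the learning rule,
    # which adds learning_rate * (...) to the weights
    return 2 * (y_true - y_pred)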

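def mean_euclidian_error(y_true, y_pred):
    # Euclidean norm of the error of each example (row), averaged over the l examples
    return np.mean(np.sqrt(np.sum((y_true - y_pred)**2, axis=1)))

def d_mean_euclidian_error(y_true, y_pred):
    # Minus the gradient w.r.t. y_pred (the minus sign is absorbed by the learning rule, as for the MSE)
    norms = np.sqrt(np.sum((y_true - y_pred)**2, axis=1, keepdims=True))
    norms = np.maximum(norms, 1e-12)  # guard against division by zero when a prediction is exact
    return (y_true - y_pred) / (norms * y_true.shape[0])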
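A central finite difference, $(f(net+h) - f(net-h))/2h$, is a cheap way to check the derivatives defined in the cell above. The sketch below re-declares `sigmoid` so that it runs on its own and uses closed forms equivalent to `d_sigmoid`, `d_tanh` and `d_softplus`; `ReLU` is left out because it is not differentiable at $net = 0$, which the grid includes. The helper `numerical_derivative` and the tolerance `1e-6` are assumptions.

```python
import numpy as np

def numerical_derivative(f, net, h=1e-6):
    # Central difference, applied element by element: (f(net + h) - f(net - h)) / (2h)
    return (f(net + h) - f(net - h)) / (2 * h)

def sigmoid(net):
    return 1 / (1 + np.exp(-net))

# (activation, analytic derivative) pairs mirroring the cell above
pairs = {
    "sigmoid": (sigmoid, lambda net: sigmoid(net) * (1 - sigmoid(net))),
    "tanh": (np.tanh, lambda net: 1 - np.tanh(net) ** 2),
    "softplus": (lambda net: np.logaddexp(0, net), sigmoid),
}

net = np.linspace(-3, 3, 7).reshape(1, 7)  # one example with 7 units
results = {}
for name, (f, df) in pairs.items():
    max_err = np.max(np.abs(df(net) - numerical_derivative(f, net)))
    results[name] = bool(max_err < 1e-6)
print(results)  # every entry should be True
```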

#  class Layer
**Constructor parameters :**
 * input_size : $k_{i-1}$ ;
 * output_size : $k_i$ ;
 * activation_function ;
 * activation_derivative . \\

**Constructor attributes :**
* self.weights : $k_{i-1} \ \textrm{x} \ k_i$ matrix . \\
Initialized by sampling uniformly from $[-1/a, 1/a]$, where $a = \sqrt{k_{i-1}}$ ;
* self.biases : $1 \ \textrm{x} \ k_i$ array. Initialized to zeros;
* self.activation_function;
* self.activation_derivative .
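One way to read the $1/\sqrt{k_{i-1}}$ scale: $U[-1/a, 1/a]$ has variance $1/(3a^2) = 1/(3k_{i-1})$, so for zero-mean, unit-variance inputs each component of $net$ (with zero biases) has variance about $k_{i-1} \cdot 1/(3k_{i-1}) = 1/3$, whatever the fan-in $k_{i-1}$. The sketch below checks this numerically; the two fan-ins, the 300 units and the standard-normal inputs are assumptions for illustration. Both fan-ins should give values close to 1 and 1/3.

```python
import numpy as np

rng = np.random.default_rng(0)
results = {}
for k_prev in (10, 400):                                    # two hypothetical fan-ins k_{i-1}
    a = np.sqrt(k_prev)
    weights = rng.uniform(low=-1 / a, high=1 / a, size=(k_prev, 300))
    X = rng.standard_normal((2000, k_prev))                 # zero-mean, unit-variance inputs
    net = X @ weights                                       # biases start at zero
    results[k_prev] = (np.var(weights) * 3 * k_prev, np.var(net))
    print(f"k_prev={k_prev}: 3*k_prev*Var(w) = {results[k_prev][0]:.3f}, Var(net) = {results[k_prev][1]:.3f}")
```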

**Methods :**
* forward : TELL ME IF THIS IS RIGHT
 * parameter :
   * input_array : matrix $X$ (see above for the case $L_0$ or $L_i$) .
 * attributes :
   * self.input : input_array ;
   * self.net : net matrix $X \cdot W + b$ (see above for the case $L_0$ or $L_i$) .
 * return -> output = $f(net)$, where $f$ is the activation function; $f(net)$ has the same dimensions as $net$.
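* backward :
 * parameters :
   * d_Ep : for the output layer, minus the gradient of the loss with respect to output_value, element by element (for the MSE, 2 * (target_value $-$ output_value)); d_Ep is an $l \ \textrm{x} \ number\_of\_targets$ matrix. For any other layer, d_Ep is the sum_delta_weights returned by the following layer ;
   * learning_rate $\eta$ .
 * updates : $\delta =$ d_Ep $\odot f'(net)$, $\quad W \leftarrow W + \eta \, X^T \cdot \delta$, $\quad b \leftarrow b + \eta \sum_{p=1}^{l} \delta_p$ (sum over the rows of $\delta$) ;
 * return -> sum_delta_weights $= \delta \cdot W^T$, an $l \ \textrm{x} \ k_{i-1}$ matrix.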
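The backward pass described above can be checked on a single `tanh` layer: with d_Ep $= 2\,(T - f(net))$ and $\delta =$ d_Ep $\odot f'(net)$, the gradient of the summed squared error is $\partial E / \partial W = -X^T \cdot \delta$, so the update $W \leftarrow W + \eta \, X^T \cdot \delta$ is a gradient-descent step. The sketch compares this gradient with a central finite difference on one weight; the sizes, seed, targets and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
l, k_prev, k = 3, 4, 2                      # hypothetical sizes
X = rng.random((l, k_prev))                 # input of the layer: l x k_{i-1}
W = rng.uniform(-0.5, 0.5, (k_prev, k))     # weights: k_{i-1} x k_i
b = np.zeros((1, k))
T = rng.random((l, k))                      # targets

def loss(W, b):
    # Sum of squared errors, as in the notebook's mean_squared_error
    return np.sum((T - np.tanh(X @ W + b)) ** 2)

net = X @ W + b
d_Ep = 2 * (T - np.tanh(net))               # minus the gradient of the loss w.r.t. the output
delta = d_Ep * (1 - np.tanh(net) ** 2)      # delta = d_Ep * f'(net), element by element
grad_W = -X.T @ delta                       # dE/dW: note the minus sign
propagated = delta @ W.T                    # sum_delta_weights = delta . W^T: l x k_{i-1}

# Central finite difference of E with respect to the single weight W[0, 0]
h = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += h
W_minus[0, 0] -= h
numeric = (loss(W_plus, b) - loss(W_minus, b)) / (2 * h)
print(np.isclose(grad_W[0, 0], numeric))

# One step of the learning rule used in Layer.backward
learning_rate = 0.01
W_new = W + learning_rate * X.T @ delta
print(loss(W_new, b) < loss(W, b))          # the loss decreases for a small enough step
```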

In [None]:
class Layer:
    def __init__(self, input_size, output_size, activation_function, activation_derivative):
        self.weights = np.random.uniform(low=-1/np.sqrt(input_size), high=1/np.sqrt(input_size), size=(input_size, output_size))
        self.biases = np.zeros((1, output_size))
        self.activation_function = activation_function
        self.activation_derivative = activation_derivative

    # Computes the outputs of the layer. Takes as input the output of the previous layer (or the data X for L_0)
    def forward(self, input_array):
        # Stored as a 2-D array (a single example becomes a 1 x k_{i-1} row) so that self.input.T works in backward
        self.input = np.atleast_2d(input_array)
        self.net = np.dot(self.input, self.weights) + self.biases  # with more than one example, numpy broadcasts the biases over the rows
        output = self.activation_function(self.net)  # f(net)
        return output

    def backward(self, d_Ep, learning_rate):
        # d_Ep = minus the loss gradient (e.g. 2 * (target - output)) only for the output layer; for the other layers d_Ep = sum_delta_weights
        delta = d_Ep * self.activation_derivative(self.net)
        # delta . W^T must use the weights of the forward pass, so it is computed before the update
        sum_delta_weights = np.dot(delta, self.weights.T)
        self.weights += learning_rate * np.dot(self.input.T, delta)
        self.biases += learning_rate * np.sum(delta, axis = 0, keepdims = True)
        return sum_delta_weights


# class NeuralNetwork

In [None]:
class NeuralNetwork:
    def __init__(self):
        self.layers = []  # empty list: every layer created is appended to it by add_layer

    def add_layer(self, layer):
        self.layers.append(layer)

    # forward propagation through all the layers, in order
    def forward(self, input):  # input: the data we have available (examples x features)
        for layer in self.layers:
            input = layer.forward(input)  # the output of this layer becomes the input of the next one
        return input

    def backward(self, d_Ep, learning_rate):
        for layer in reversed(self.layers):  # traverse the layers in reverse order: the error gradient propagates backwards
            d_Ep = layer.backward(d_Ep, learning_rate)

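    # x_train: dataset, examples x features; target: examples x number of targets
    # epochs: number of passes over the whole training set (a hyperparameter, chosen by us or via early stopping)
    # loss_function, loss_function_derivative: e.g. mean_squared_error and d_mean_squared_error
    # Each epoch performs a single update using all the examples at once (batch version)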

    def train(self, x_train, target, epochs, learning_rate, loss_function, loss_function_derivative):
        for epoch in range(epochs):
            # Forward propagation
            predictions = self.forward(x_train)  # outputs of the last layer

            # Compute loss and loss gradient for backward function
            loss = loss_function(target, predictions)
            loss_gradient = loss_function_derivative(target, predictions)

            # Backward propagation
            self.backward(loss_gradient, learning_rate)

            # Print the loss at every epoch
            print(f"Epoch {epoch}, Loss: {loss}")


# Unit Test

In [None]:
#test
x = np.random.rand(3, 3)

target = np.random.rand(3, 2)
print('target:', target)
print('\n')

layer_one = Layer(3, 2, linear, d_linear)

layer_two = Layer(2, 2, linear, d_linear)

NN = NeuralNetwork()
NN.add_layer(layer_one)
NN.add_layer(layer_two)
NN.train(x, target, 2, 0.01, mean_squared_error, d_mean_squared_error)

target: [[0.03530235 0.14780322]
 [0.59554787 0.44419099]
 [0.47828313 0.72748292]]


Epoch 0, Loss: 2.015579698952875
Epoch 1, Loss: 1.7046882498737232
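The cell above has no random seed and no assertion, so the printed values change from run to run and the test cannot fail. A possible reproducible variant is sketched below: a seeded generator, a target that two linear layers can represent exactly, and an assertion that the loss decreases. To stay self-contained it re-implements a minimal linear layer with the same update rule as `Layer`; the name `LinearLayer`, the target coefficients, the learning rate and the number of epochs are illustrative choices, not taken from the notebook.

```python
import numpy as np

class LinearLayer:
    # Hypothetical minimal stand-in for Layer(input_size, output_size, linear, d_linear)
    def __init__(self, input_size, output_size, rng):
        bound = 1 / np.sqrt(input_size)
        self.weights = rng.uniform(-bound, bound, (input_size, output_size))
        self.biases = np.zeros((1, output_size))

    def forward(self, input_array):
        self.input = np.atleast_2d(input_array)
        return self.input @ self.weights + self.biases      # f(net) = net

    def backward(self, d_Ep, learning_rate):
        delta = d_Ep                                          # f'(net) = 1
        sum_delta_weights = delta @ self.weights.T            # weights before the update
        self.weights += learning_rate * self.input.T @ delta
        self.biases += learning_rate * np.sum(delta, axis=0, keepdims=True)
        return sum_delta_weights

rng = np.random.default_rng(42)                               # fixed seed: same numbers at every run
x = rng.random((20, 3))
target = x @ np.array([[0.5], [-0.3], [0.2]]) + 0.1           # linear, so two linear layers can fit it
layers = [LinearLayer(3, 4, rng), LinearLayer(4, 1, rng)]

losses = []
for epoch in range(200):
    output = x
    for layer in layers:                                      # forward pass
        output = layer.forward(output)
    losses.append(np.sum((target - output) ** 2))             # as mean_squared_error
    d_Ep = 2 * (target - output)                              # as d_mean_squared_error
    for layer in reversed(layers):                            # backward pass
        d_Ep = layer.backward(d_Ep, 0.005)

assert np.all(np.isfinite(losses)) and losses[-1] < losses[0]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```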
