<hr style="border-top: 3px solid black"></hr>
<center>
<h1><span style="color:black">Coding an Autoassociator from Scratch</span></h1>
</center>
<hr style="border-top: 3px solid black"></hr>

# 2°) With the hyperbolic tangent as the activation function (non-binary inputs)

**Activation function and its derivative:**
*Hyperbolic tangent*

$ \text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{\text{sinh}(x)}{\text{cosh}(x)} $

$\text{tanh}'(x) = 1 - \text{tanh}^2(x)  $
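
As a quick sanity check, the derivative formula above can be compared against a central finite difference. This is only a minimal sketch: the sample points `x` and the step size `h` are arbitrary choices, not values taken from the notebook.

```python
import numpy as np

def tanhprime(x):
    # analytical derivative: 1 - tanh(x)^2
    return 1 - np.tanh(x) ** 2

x = np.linspace(-3, 3, 13)   # arbitrary sample points (assumption)
h = 1e-5                     # arbitrary finite-difference step (assumption)

# central difference approximation of the derivative of tanh
numerical = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)

# largest disagreement between the numerical and analytical derivatives
max_gap = np.max(np.abs(numerical - tanhprime(x)))
print("largest gap between the two derivatives:", max_gap)
```

If the formula is right, the gap should be tiny compared with the derivative values themselves, which lie between 0 and 1.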

**Error function:** 
*The Mean Squared Error*

$\text{MSE}(y_{\text{true}}, y_{\text{pred}}) = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{pred},i} - y_{\text{true},i})^2$
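
To make the formula concrete, here is a small sketch that evaluates the MSE on two made-up vectors. The values of `y_true` and `y_pred` are illustrative assumptions, chosen so the result is easy to verify by hand.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # mean of the squared element-wise differences
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, -1.0, 1.0])   # illustrative target (assumption)
y_pred = np.array([0.5, -1.0, 0.0])   # illustrative prediction (assumption)

# squared errors are 0.25, 0.0 and 1.0, so the mean is 1.25 / 3
loss = mse_loss(y_true, y_pred)
print(loss)
```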

| Phase             | Formulas |
|-------------------|----------|
| **Forward pass:** | *Weighted Sum:* $z_1 = p \cdot w_1 + b_1$ <br> $a_1 = \text{tanh}(z_1)$ |
| **Backward pass:** | $\delta z_1 = (a_1 - t) \cdot \text{tanh'}(z_1)$ <br> $\frac{\partial \text{loss}}{\partial w_1} = \frac{p^T \cdot \delta z_1}{\text{len}(p)}$ <br> $\frac{\partial \text{loss}}{\partial b_1} = \frac{\text{sum}(\delta z_1)}{\text{len}(p)}$ |
| **Delta Rule:**   | $w_1 = w_1 - \text{lr} \cdot \frac{\partial \text{loss}}{\partial w_1}$ <br> $b_1 = b_1 - \text{lr} \cdot \frac{\partial \text{loss}}{\partial b_1}$ |


In the *backward pass*, the input pattern $p$ is transposed into a column vector so that it can be multiplied with the gradient $\delta z_1$. In matrix multiplication, the number of columns of the first matrix must equal the number of rows of the second. So $p$, which is first defined as a row vector of shape $(1, \text{elements})$, has to become a column vector of shape $(\text{elements}, 1)$.
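
The shape bookkeeping can be checked directly in NumPy. The sketch below uses a 3-element pattern and a made-up row of deltas `dz1`; both values are illustrative assumptions, and only the shapes matter here.

```python
import numpy as np

p = np.array([-1.0, -1.0, 1.0])       # one input pattern with 3 elements
dz1 = np.array([[0.1, -0.2, 0.3]])    # illustrative deltas, shape (1, 3) (assumption)

p_col = p.reshape(-1, 1)              # column vector, shape (3, 1)
dw1 = np.dot(p_col, dz1)              # (3, 1) times (1, 3) gives (3, 3)

print(p_col.shape, dz1.shape, dw1.shape)
```

Each entry `dw1[i, j]` is the product `p[i] * dz1[0, j]`, so the result has one gradient value per weight connecting input `i` to output `j`.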
<br><br>
A dot product takes two sequences of the same length and outputs a single number: the sum of the products of their corresponding elements. Here, with $p = (0, 1)$ and $w = (0.5, 0.2)$, we compute $z = p \cdot w = 0 \times 0.5 + 1 \times 0.2 = 0.2$.
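
The same computation in NumPy, as a minimal sketch using the vectors from the example above:

```python
import numpy as np

p = np.array([0, 1])
w = np.array([0.5, 0.2])

# sum of the element-wise products: 0 * 0.5 + 1 * 0.2
z = np.dot(p, w)
print(z)   # 0.2
```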
<br><br>
We divide by $\text{len}(p)$ to take the mean of the gradients. This normalizes the gradients relative to the number of elements in the input vector.
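
Putting the formulas from the table together, the sketch below runs one forward pass, one backward pass and one delta-rule update on a single pattern, with the target equal to the input (autoassociation). The constant initial weights, the learning rate of 0.1 and the helper name `step` are illustrative assumptions, not the notebook's own settings.

```python
import numpy as np

p = np.array([-1.0, -1.0, 1.0])   # input pattern; the target is p itself
w1 = np.full((3, 3), 0.1)         # illustrative weights (assumption)
b1 = np.zeros((1, 3))             # one bias per output unit
lr = 0.1                          # illustrative learning rate (assumption)

def step(p, w1, b1, lr):
    # hypothetical helper: one training step following the table's formulas
    # forward pass
    z1 = np.dot(p, w1) + b1
    a1 = np.tanh(z1)
    loss = np.mean((a1 - p) ** 2)
    # backward pass, with gradients divided by len(p)
    dz1 = (a1 - p) * (1 - np.tanh(z1) ** 2)
    dw1 = np.dot(p.reshape(-1, 1), dz1) / len(p)
    db1 = np.sum(dz1, axis=0, keepdims=True) / len(p)
    # delta rule: return updated copies of the parameters
    return w1 - lr * dw1, b1 - lr * db1, loss

w1_new, b1_new, loss_before = step(p, w1, b1, lr)
_, _, loss_after = step(p, w1_new, b1_new, lr)
print(loss_before, loss_after)
```

With these values the loss measured after the update is lower than before it, which is what a single delta-rule step on one pattern is expected to achieve.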

In [1]:
import numpy as np
import matplotlib.pyplot as plt

In [2]:
#define the patterns:
p1 = [0,0]
p2 = [0,1]
p3 = [1,0]
p4 = [1,1]
p5 = [0.5,0.5]
p6 = [0.1,0.5]
p_list = [p1,p2,p3,p4,p5,p6]
patterns = np.asarray(p_list)

t1 = 0.3
t2 = 0.3
t3 = 0.1
t4 = 0.3
t5 = 0.7
t6 = 0.4
targets = np.asarray([t1,t2,t3,t4,t5,t6])

input_shape = 2
output_shape= 1
epochs = 500
learning_rate = 0.01
lr = learning_rate

In [3]:
#including two stopping criteria
training_patterns = np.array([
    [-1, -1, 1],
    [1, 1, -1],
    [-1, -1, -1]])

test_patterns = np.array([
    [-1, -1, 1],
    [1, 1, -1],
    [-0.7, -0.8, 0.6]])

input_shape = 3
output_shape = 3
epochs = 90000000000  # upper bound only; training normally ends via the stopping criterion
learning_rate = 0.001

def tanh(x):
    return np.tanh(x)

def tanhprime(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1 - np.tanh(x)**2

def initialize_parameters(input_shape, output_shape):
    # small random weights, then min-max rescaled to the [0, 1] range
    wlayer1 = np.random.randn(input_shape, output_shape) * 0.1
    wlayer1 = (wlayer1 - np.min(wlayer1)) / (np.max(wlayer1) - np.min(wlayer1))
    # one bias per output unit
    blayer1 = np.zeros((1, output_shape))
    return {'w1': wlayer1, 'b1': blayer1}

def mse_loss(y_true, y_pred):
    return np.mean((y_pred - y_true) ** 2)
    
def fpass(p, parameters, activation_function):
    z1 = np.dot(p, parameters['w1']) + parameters['b1']
    activation1 = activation_function(z1)
    return {'z1': z1, 'a1': activation1}
    
def bpass(p, t, parameters, cache, activation_prime):
    dz1 = (cache['a1'] - t) * activation_prime(cache['z1'])
    p_reshaped = p.reshape(-1, 1)  # p must be a column vector
    dw1 = np.dot(p_reshaped, dz1) / len(p)
    db1 = np.sum(dz1, axis=0, keepdims=True) / len(p)
    return {'dw1': dw1, 'db1': db1}

def delta_rule(parameters, grads, lr=learning_rate):
    parameters['w1'] -= lr * grads['dw1']
    parameters['b1'] -= lr * grads['db1']
    return parameters

def train(activation_function, activation_prime, inputs=training_patterns, num_epochs=epochs, lr=learning_rate, input_shape=input_shape, output_shape=output_shape):
    parameters = initialize_parameters(input_shape, output_shape)
    for epoch in range(num_epochs):
        total_loss = 0
        list_loss = []
        for p in inputs:
            cache = fpass(p, parameters, activation_function=activation_function)
            loss = mse_loss(p, cache['a1'])
            total_loss += loss
            list_loss.append(loss)
            grads = bpass(p, p, parameters, cache=cache, activation_prime=activation_prime)
            parameters = delta_rule(parameters, grads=grads, lr=lr)
        average_loss = total_loss / len(inputs)
        if epoch % 200 == 0:
            print("Epoch number:", epoch, "MSE_loss:", average_loss)
        # #stopping criterion 1: 
        # if average_loss < 0.001:
        #     print("Epoch number:", epoch, "MSE_loss:", average_loss)
        #     break
        # stopping criterion 2 (preferable): stop only once every pattern's loss is below the threshold
        if np.max(list_loss) < 0.0005:
            print("Epoch number:", epoch, "MSE_loss:", average_loss, "Max_loss:", np.max(list_loss))
            break
    return parameters

trained_parameters = train(activation_function=tanh, activation_prime=tanhprime, inputs=training_patterns)

Epoch number: 0 MSE_loss: 0.6293899143777136
Epoch number: 200 MSE_loss: 0.35668531122245944
Epoch number: 400 MSE_loss: 0.20922796654479184
Epoch number: 600 MSE_loss: 0.1429252548297336
Epoch number: 800 MSE_loss: 0.10812875419078753
Epoch number: 1000 MSE_loss: 0.08685133923075596
Epoch number: 1200 MSE_loss: 0.07247581729221853
Epoch number: 1400 MSE_loss: 0.062108081050278374
Epoch number: 1600 MSE_loss: 0.05428183574540587
Epoch number: 1800 MSE_loss: 0.04817056867827115
Epoch number: 2000 MSE_loss: 0.04327113482844266
Epoch number: 2200 MSE_loss: 0.03925916760494565
Epoch number: 2400 MSE_loss: 0.03591606145383371
Epoch number: 2600 MSE_loss: 0.03308915184649297
Epoch number: 2800 MSE_loss: 0.030668653428367654
Epoch number: 3000 MSE_loss: 0.02857364419173133
Epoch number: 3200 MSE_loss: 0.026743203514149994
Epoch number: 3400 MSE_loss: 0.025130619758512784
Epoch number: 3600 MSE_loss: 0.02369949532289635
Epoch number: 3800 MSE_loss: 0.02242106240361816
Epoch number: 4000 MSE_lo

In [4]:
# test on the input patterns, then reinject the third (noisy) pattern to denoise it
test_patterns = np.array([
    [-1, -1, 1],
    [1, 1, -1],
    [-0.7, -0.8, 0.6]])
attractor = fpass(test_patterns[0], trained_parameters, activation_function=tanh)
print('SUPPOSED ATTRACTOR: ',np.round(attractor['a1'],4))
p = test_patterns[2]
cache = fpass(p, trained_parameters, activation_function=tanh)
print('STILL NOT REINJECTED: ',np.round(cache['a1'],4))
print('#########################################################')
# cache already holds the first pass of p; each iteration feeds the output back in as the next input
for reinjection_number in range(1, 11):
    cache = fpass(cache['a1'], trained_parameters, activation_function=tanh)
    mse_error = np.mean((cache['a1'] - attractor['a1']) ** 2)
    # despite its name, this is the element-wise absolute difference, not a single Euclidean norm
    euclidian_distance = np.absolute(np.subtract(cache['a1'], attractor['a1']))
    mean_distance = np.mean(euclidian_distance)
    print('Reinjection n°:', reinjection_number, ' Output: ',np.round(cache['a1'],4),
          'MSError:', np.round(mse_error,6))
    print('Euclidian distance:', np.round(euclidian_distance,4), 'Mean distance:',np.round(mean_distance,4))
    print('#########################################################')

SUPPOSED ATTRACTOR:  [[-0.9861 -0.9851  0.97  ]]
STILL NOT REINJECTED:  [[-0.9543 -0.9535  0.8247]]
#########################################################
Reinjection n°: 1  Output:  [[-0.9824 -0.9814  0.937 ]] MSError: 0.000373
Euclidian distance: [[0.0037 0.0037 0.033 ]] Mean distance: 0.0135
#########################################################
Reinjection n°: 2  Output:  [[-0.9848 -0.9837  0.9608]] MSError: 3e-05
Euclidian distance: [[0.0013 0.0014 0.0093]] Mean distance: 0.004
#########################################################
Reinjection n°: 3  Output:  [[-0.985  -0.9839  0.9644]] MSError: 1.1e-05
Euclidian distance: [[0.0011 0.0012 0.0056]] Mean distance: 0.0026
#########################################################
Reinjection n°: 4  Output:  [[-0.985  -0.9839  0.965 ]] MSError: 9e-06
Euclidian distance: [[0.0011 0.0012 0.005 ]] Mean distance: 0.0024
#########################################################
Reinjection n°: 5  Output:  [[-0.985  -0.9839  0.9651]