# Neural Network from Scratch

In this notebook we build a neural network from scratch, using only NumPy (plus Python's standard `math` module) for the implementation.

In [1]:
import math
import numpy as np

np.set_printoptions(precision=2, suppress=True)

Let us define a couple of activation functions (sigmoid and relu) and their derivatives.

In [2]:
##############################################
# activation functions
##############################################

def sigmoid(x): return 1 / (1 + math.exp(-x))

def sigderiv(x): return (sigmoid(x)*(1-sigmoid(x)))

def relu(x):
  if x >= 0: return x
  else: return 0

def reluderiv(x):
  if x >= 0: return 1
  else: return 0

def activate(x): return sigmoid(x)  #relu(x)
def actderiv(x): return sigderiv(x) #reluderiv(x)
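
As a quick sanity check, the derivative of the sigmoid can be compared with a central finite difference. The sketch below repeats the two sigmoid definitions so that it runs on its own; the helper `numeric_deriv` and the step size `h` are illustrative assumptions, not part of the notebook.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigderiv(x):
    return sigmoid(x) * (1 - sigmoid(x))

def numeric_deriv(f, x, h=1e-6):
    # central difference approximation of f'(x) (hypothetical helper)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.5, 3.0):
    exact = sigderiv(x)
    approx = numeric_deriv(sigmoid, x)
    print("x=%5.2f  sigderiv=%.6f  finite diff=%.6f" % (x, exact, approx))
```

The same check is less informative for relu at 0, where the function is not differentiable; `reluderiv` above simply adopts the convention of returning 1 there.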

A neural network is just a collection of numerical arrays describing the weights of the links at each layer. For instance, a dense layer between n input neurons and m output neurons is defined by a matrix w of dimension n×m for the weights and a vector b of dimension m for the biases.
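
To make the shapes concrete, here is a minimal sketch of a single dense layer with n = 8 and m = 3. The seeded generator `rng` and the one-hot input `x` are assumptions chosen for illustration only.

```python
import numpy as np

n, m = 8, 3                      # number of input and output neurons
rng = np.random.default_rng(0)   # seeded generator (an assumption, for reproducibility)

w = rng.random((n, m))           # weight matrix, shape (n, m)
b = rng.random(m)                # bias vector, shape (m,)

x = np.zeros(n)
x[2] = 1                         # one input vector of length n

z = np.dot(x, w) + b             # weighted sum of the layer, shape (m,)
print(w.shape, b.shape, z.shape)
```

Because `x` is one-hot, the weighted sum reduces to row 2 of `w` plus the bias.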

Supposing the network is dense, its architecture is fully specified by the number of neurons at each layer. For our example, we define a shallow network with 8 input neurons, 3 hidden neurons, and 8 output neurons, hence with dimension [8,3,8].

We initialize weights and biases with random values, drawn uniformly from [0, 1) by `np.random.rand`.

In [3]:
##############################################
# net parameters - initializing the parameters
##############################################

dim = [8,3,8]
l = len(dim)

w,b = [],[]

for i in range(1, l):
  w.append(np.random.rand(dim[i-1], dim[i]))
  b.append(np.random.rand(dim[i]))
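
Since `np.random.rand` is not seeded here, every run starts from different weights. A possible reproducible variant of the cell above is sketched below; the use of `np.random.default_rng` and the seed value 42 are assumptions, not something the notebook prescribes. It also prints the shape of each array and the total number of trainable parameters.

```python
import numpy as np

dim = [8, 3, 8]
rng = np.random.default_rng(42)   # the seed 42 is an arbitrary, illustrative choice

# one weight matrix and one bias vector per pair of consecutive layers
w = [rng.random((dim[i - 1], dim[i])) for i in range(1, len(dim))]
b = [rng.random(dim[i]) for i in range(1, len(dim))]

for i, (wi, bi) in enumerate(zip(w, b)):
    print("layer %d: w %s, b %s" % (i, wi.shape, bi.shape))

n_params = sum(wi.size + bi.size for wi, bi in zip(w, b))
print("trainable parameters:", n_params)
```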

## Creating the Network 
For the **backpropagation algorithm** we also need to compute, at each layer, **the weighted sum z** (the input to the activation function), **the activation a**, and **the partial derivative d** of the error with respect to z.

We define a version of the backpropagation algorithm working "**on line**", processing a single training sample (x,y) at a time and updating the network parameters at each iteration. The backpropagation function also returns the current error relative to (x,y).

An epoch is a full pass of the update over all training data; it returns the cumulative error on all data.
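
The recurrences computed by the `update` function can be written compactly. What follows is a reading of the code under its own conventions (row vectors, layers indexed from 0, $f$ standing for `activate`, $\mu$ for `mu`), assuming the error being minimized is $E = \tfrac{1}{2}\lVert y - a_{l-1}\rVert^2$:

$$
\begin{aligned}
a_0 &= x, \\
z_i &= a_i W_i + b_i, \qquad a_{i+1} = f(z_i), \qquad i = 0, \dots, l-2, \\
d_{l-2} &= (y - a_{l-1}) \odot f'(z_{l-2}), \\
d_i &= (d_{i+1} W_{i+1}^{\top}) \odot f'(z_i), \qquad i = l-3, \dots, 0, \\
W_i &\leftarrow W_i + \mu\, a_i^{\top} d_i, \qquad b_i \leftarrow b_i + \mu\, d_i .
\end{aligned}
$$

Under this assumption $d_i = -\partial E / \partial z_i$, so adding $\mu\, a_i^{\top} d_i$ to the weights is a gradient descent step. The value returned by `update` is the plain sum of squares, that is $2E$.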


**Explanation of the code below:**

The code trains a basic feedforward neural network by stochastic gradient descent (SGD) in its on-line form: the weights are updated after each individual training example `(x, y)` is presented.
- **Initialization**: The code initializes the lists `z`, `a`, and `d`, where `z` stores the weighted sums (the values before the activation function is applied), `a` stores the activations, and `d` stores the error terms at each layer.

- **Feedforward Pass**: The first part of the `update` function implements the feedforward pass. For each layer it computes the dot product of the previous layer's activations with the weights, adds the biases, and applies the activation function.

- **Backpropagation**: After computing the output error term (`d[l-2]`) from the desired output (`y`) and the actual output (`a[l-1]`), the code propagates it backward through the network to obtain the error term of each hidden layer (`d[i]`).

- **Weight Update**: Finally, the code updates the weights (`w`) and biases (`b`) using the error terms and the activations of each layer, scaled by the learning rate `mu`.

- **Training**: The `epoch` function implements a single epoch of training. It iterates through the provided data, calls `update` for each input-output pair, and accumulates the total error.

The network shape is read from `dim`, so the same code works for any number of dense layers; the example here has a single hidden layer. The activation function is selected through `activate` and `actderiv`.
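
A common way to gain confidence in a backpropagation routine is to compare one of its gradients with a finite-difference estimate. The sketch below is self-contained and does not touch the notebook's variables: it uses a vectorized sigmoid, a seeded generator, and the helper names `forward`, `half_sq_error` and `backprop_d`, all of which are assumptions made for this illustration. It also assumes the error being minimized is half the sum of squared differences.

```python
import numpy as np

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

def sigderiv(v):
    return sigmoid(v) * (1 - sigmoid(v))

def forward(x, w, b):
    # returns the lists of weighted sums z and activations a
    a, z = [np.asarray(x, dtype=float)], []
    for wi, bi in zip(w, b):
        z.append(np.dot(a[-1], wi) + bi)
        a.append(sigmoid(z[-1]))
    return z, a

def half_sq_error(x, y, w, b):
    _, a = forward(x, w, b)
    return 0.5 * np.sum((y - a[-1]) ** 2)

def backprop_d(x, y, w, b):
    # same recurrences as the notebook's update, without changing w and b
    z, a = forward(x, w, b)
    d = [None] * len(w)
    d[-1] = (y - a[-1]) * sigderiv(z[-1])
    for i in range(len(w) - 2, -1, -1):
        d[i] = np.dot(w[i + 1], d[i + 1]) * sigderiv(z[i])
    return a, d

rng = np.random.default_rng(0)
dim = [8, 3, 8]
w = [rng.random((dim[i - 1], dim[i])) for i in range(1, len(dim))]
b = [rng.random(dim[i]) for i in range(1, len(dim))]
x = np.eye(8)[2]
y = x.copy()

a, d = backprop_d(x, y, w, b)
analytic = -a[0][2] * d[0][1]   # d holds the negative gradient w.r.t. z

# central finite difference on the single weight w[0][2, 1]
h = 1e-6
w[0][2, 1] += h
e_plus = half_sq_error(x, y, w, b)
w[0][2, 1] -= 2 * h
e_minus = half_sq_error(x, y, w, b)
w[0][2, 1] += h                 # restore the original weight
numeric = (e_plus - e_minus) / (2 * h)

print("numeric = %.8f, analytic = %.8f" % (numeric, analytic))
```

The two numbers should agree to several decimal places; a large gap would point to an indexing or sign error in the backward pass.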

In [14]:
##############################################
# training - on line, one input data at a time
##############################################

mu = 1  # learning rate (step size of each update)

z,a,d=[],[],[]

## Allocate per-layer buffers: activations a, weighted sums z, error terms d
for i in range(0,l): 
    a.append(np.zeros(dim[i]))

for i in range(1,l):
    z.append(np.zeros(dim[i]))
    d.append(np.zeros(dim[i]))

def update(x,y):
    #input                
    a[0] = x
    
    #feed forward
    for i in range(0,l-1):
        z[i] = np.dot(a[i],w[i])+b[i]
        a[i+1] = np.vectorize(activate)(z[i])
  
    #output error
    d[l-2] = (y - a[l-1])*np.vectorize(actderiv)(z[l-2])
  
    #back propagation
    for i in range(l-3,-1,-1):
        d[i]=np.dot(w[i+1],d[i+1])*np.vectorize(actderiv)(z[i])
  
    #updating
    for i in range(0,l-1):
        for k in range (0,dim[i+1]):
            for j in range (0,dim[i]):
                w[i][j,k] = w[i][j,k] + mu*a[i][j]*d[i][k]
            b[i][k] = b[i][k] + mu*d[i][k]
        
        if False:
          print("d[%i] = %s" % (i,(d[i],)))
          print("b[%i] = %s" % (i,(b[i],)))
      #print("error = {}".format(np.sum((y-a[l-1])**2)))  
    return np.sum((y-a[l-1])**2)

def epoch(data):
    e = 0
    for (x,y) in data:
        e += update(x,y)
    return e
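
The nested loops in the updating step spell out the rule one weight at a time. As a sketch of an equivalent formulation, the same update can be written with `np.outer`; the arrays below are random stand-ins for `a[i]`, `d[i]` and `w[i]`, and their names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 1
a_prev = rng.random(8)         # stands in for a[i]
delta = rng.random(3)          # stands in for d[i]
w_loop = rng.random((8, 3))    # stands in for w[i]
w_vec = w_loop.copy()

# element-wise update, as in the cell above
for k in range(3):
    for j in range(8):
        w_loop[j, k] = w_loop[j, k] + mu * a_prev[j] * delta[k]

# the same update written as an outer product
w_vec += mu * np.outer(a_prev, delta)

print(np.allclose(w_loop, w_vec))
```

Along the same lines, `np.vectorize` is a convenience wrapper around a Python-level loop, so an activation written directly with NumPy operations (for example `np.exp`) would be a natural next step if speed matters.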

## Training the Model on Data
Now we define some data and fit the network to it.

We want to define a simple example of an autoencoder, taking as input a one-hot representation of the numbers between 0 and 7 and trying to compress them to a Boolean internal representation on 3 bits.

**The following code:**
1. Defines a matrix `X` whose rows are one-hot encoded vectors. Row k represents the number k: it has a 1 in column k and 0 everywhere else.

2. Defines a function `data()` that returns an iterator of (input, target) pairs in which each input is paired with itself, since an autoencoder is trained to reproduce its own input.

3. Defines a stopping criterion `final_error` and a loop that repeatedly calls the `epoch` function with the dataset generated by `data()` until the error (`dist`) falls below `final_error`.
    - We have defined the epoch function previously
    
4. After training, it runs a forward pass for each input vector in `X`, printing out the activations of the hidden layer.
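
For reference, the same one-hot matrix can be produced with `np.eye` instead of being typed out by hand. This is only a sketch; the names `X_eye` and `pairs` are hypothetical and are not used elsewhere in the notebook.

```python
import numpy as np

X_eye = np.eye(8, dtype=int)      # row k is the one-hot vector for the number k

pairs = list(zip(X_eye, X_eye))   # (input, target) pairs, like those produced by data()
x0, y0 = pairs[0]

print(X_eye[3])                   # one-hot vector for the number 3
print(len(pairs), np.array_equal(x0, y0))
```

Note that `zip` returns a one-shot iterator in Python 3, which is why the notebook calls `data()` again at every epoch rather than reusing a single result.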


In [19]:
X = [[1,0,0,0,0,0,0,0],
     [0,1,0,0,0,0,0,0],
     [0,0,1,0,0,0,0,0],
     [0,0,0,1,0,0,0,0],
     [0,0,0,0,1,0,0,0],
     [0,0,0,0,0,1,0,0],
     [0,0,0,0,0,0,1,0],
     [0,0,0,0,0,0,0,1]]

def data(): 
    return zip(X,X)  

final_error = .002
dist = epoch(data()) 

while dist > final_error:
#     print("distance= %f" % dist)
    dist = epoch(data())

print("distance= %f" % dist)

for x in X:
    print("Input = %s" % (x,))
    a[0] = x

    #feed forward
    for i in range(0,l-2):
        z[i] = np.dot(a[i],w[i])+b[i]
        a[i+1] = np.vectorize(activate)(z[i])
  
    print("Hidden level = %s" % (a[l-2],),  "\n")
    
    z[l-2] = np.dot(a[l-2],w[l-2])+b[l-2]
    a[l-1] = np.vectorize(activate)(z[l-2])
    #print("output = %s" % (a[l-1],))

distance= 0.002000
Input = [1, 0, 0, 0, 0, 0, 0, 0]
Hidden level = [0.01 0.02 0.01] 

Input = [0, 1, 0, 0, 0, 0, 0, 0]
Hidden level = [0.99 1.   1.  ] 

Input = [0, 0, 1, 0, 0, 0, 0, 0]
Hidden level = [0.94 0.01 0.01] 

Input = [0, 0, 0, 1, 0, 0, 0, 0]
Hidden level = [0.99 0.   0.97] 

Input = [0, 0, 0, 0, 1, 0, 0, 0]
Hidden level = [0.02 0.   0.91] 

Input = [0, 0, 0, 0, 0, 1, 0, 0]
Hidden level = [0.01 0.97 0.01] 

Input = [0, 0, 0, 0, 0, 0, 1, 0]
Hidden level = [0.98 0.98 0.  ] 

Input = [0, 0, 0, 0, 0, 0, 0, 1]
Hidden level = [0.   0.96 0.99] 
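
To read the result as a 3-bit code, the hidden activations printed above can be thresholded. The sketch below copies the printed values by hand; the threshold of 0.5 is an illustrative choice.

```python
import numpy as np

# hidden activations copied from the output above, one row per input 0..7
hidden = np.array([[0.01, 0.02, 0.01],
                   [0.99, 1.00, 1.00],
                   [0.94, 0.01, 0.01],
                   [0.99, 0.00, 0.97],
                   [0.02, 0.00, 0.91],
                   [0.01, 0.97, 0.01],
                   [0.98, 0.98, 0.00],
                   [0.00, 0.96, 0.99]])

bits = (hidden > 0.5).astype(int)   # threshold at 0.5 (an assumption)
codes = ["".join(map(str, row)) for row in bits]

print(codes)
print("distinct codes:", len(set(codes)))
```

In this run the eight inputs map to eight distinct 3-bit codes, so the hidden layer behaves like a binary encoding. The assignment of codes to inputs is not the usual binary numbering (input 1 maps to 111, for instance), and it can be expected to change from run to run because the weights are initialized randomly.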



### Exercises

1.   Change the specification of the network to allow a different activation function for each layer.
2.   Modify the backpropagation algorithm to work on a minibatch of samples.
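
As a possible starting point for the first exercise, one option is to keep a list with one (activation, derivative) pair per layer and to pick the pair by layer index. The sketch below covers the forward pass only and leaves the backward pass to the reader; the list name `acts`, the vectorized activation functions and the seeded generator are all assumptions made for this illustration.

```python
import numpy as np

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

def sigderiv(v):
    return sigmoid(v) * (1 - sigmoid(v))

def relu(v):
    return np.maximum(v, 0)

def reluderiv(v):
    return (v >= 0).astype(float)

# one (activation, derivative) pair per layer; this layout is a hypothetical choice
acts = [(relu, reluderiv), (sigmoid, sigderiv)]

dim = [8, 3, 8]
rng = np.random.default_rng(0)
w = [rng.random((dim[i - 1], dim[i])) for i in range(1, len(dim))]
b = [rng.random(dim[i]) for i in range(1, len(dim))]

def forward(x):
    # apply each layer with its own activation function
    a = np.asarray(x, dtype=float)
    for (f, _), wi, bi in zip(acts, w, b):
        a = f(np.dot(a, wi) + bi)
    return a

out = forward(np.eye(8)[0])
print(out.shape)
```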



