# Back Propagations/Neural NetworkNotes

Neural networks have a input layer. The input layer is of course the layer that takes in the inputs. We have as many input neurons as we do inputs. For example if we have a dataset of cats and dogs some example inputs would be `ear length`, `fur length`, `weight`, `height`. This would mean we would have a input layer with 4 neurons. These inputs would be in the input layer and would then be passed forward to the next layer. We pass them by getting the sum of the weights times the inputs and adding a bias(sum number). That value is then passed into an activation function. A common function is the sigmoid function so our values will be between 0 and 1. Depening on the value we get from the activation function the neuron will feed forward that information to the next layer or it will not (depends on value). 
<!-- Each nueron has a bias which is a number between 0 and 1. -->
Depending on how many hidden layers we have this would happen again passing from one layer to another. But eventually we would get to the output layer which in this case we have a output layer for 2 neurons. One for dog and one for cat. These would output the probability from 1 to 0 of it being a dog or cat. 

The math the sum of the weights times the neurosn plus the bias looks like this: Where $w$ is a weight, $n$ is a neuron and $b$ is the bias of the neuron we are moving the data forward to
$$ = (w_0*n_0 + w_1*n_1 + w_2*n_2 + w_3*n_3 .... + w_n*n_n + b) $$
we can also write this using vectors: 
# TODO: Seperate the math
$$\vec{w} =
\begin{bmatrix}
w_{0,0} & w_{0,1} & w_{0,2} & 0\\
 & .... & & 0  \\
  &   w_{n,3} & & 0 
\end{bmatrix},
\vec{n}=[n_0, n_1, n_2, n_3 ..... i_n],
\vec{b}=[b_0, b_1, b_2, b_3 ..... b_n]$$
$$activation = \sigma(\vec{w} * n + b)$$

In [1]:
from matplotlib.colors import ListedColormap # for grgphing decision boundaries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
from sklearn.model_selection import train_test_split

In [2]:
data_folder = 'data'
X_train = pd.read_csv(f'./{data_folder}/X_train.csv')
y_train = pd.read_csv(f'./{data_folder}/y_train.csv')
X_test = pd.read_csv(f'./{data_folder}/X_test.csv')
y_test = pd.read_csv(f'./{data_folder}/y_test.csv')

In [3]:
X_train

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.5,2.4,3.7,1.0
1,4.8,3.0,1.4,0.1
2,5.5,2.6,4.4,1.2
3,5.0,3.2,1.2,0.2
4,6.9,3.1,5.1,2.3
...,...,...,...,...
107,6.3,2.7,4.9,1.8
108,7.2,3.0,5.8,1.6
109,5.8,4.0,1.2,0.2
110,5.2,3.4,1.4,0.2


In [4]:
[X_train[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']].iloc[0]]

[sepal length (cm)    5.5
 sepal width (cm)     2.4
 petal length (cm)    3.7
 petal width (cm)     1.0
 Name: 0, dtype: float64]

# Defining the network

In [5]:
# Selecting the inputs
inputs = [X_train.iloc[1]['sepal length (cm)'], X_train.iloc[1]['sepal width (cm)'], X_train.iloc[1]['petal length (cm)'], X_train.iloc[1]['petal length (cm)']]
f'Number of input neurons: {len(inputs)}'

'Number of input neurons: 4'

In [6]:
# Creating a second layer
second_layer = [random.uniform(0, 1)] * 4
f'Number of neurons at second layer: {len(second_layer)}'

'Number of neurons at second layer: 4'

In [7]:
def create_weight_matrix(prev_layer, current_layer) -> list:
    """create a matrix of weights"""
    len_prev_layer, len_current_layer = len(prev_layer), len(current_layer)
    matrix_weights = [[random.uniform(0, 1) for x in range(len_prev_layer)] for y in range(len_current_layer)]
    return np.array(matrix_weights)

In [8]:
# The weights connecting the input layer to second layer
input_to_second_layer_weights = create_weight_matrix(inputs, second_layer)
f'Number of weights from input layer to second: {(input_to_second_layer_weights.size)}'

'Number of weights from input layer to second: 16'

In [9]:
# Creating a third layer
third_layer = [random.uniform(0, 1)] * 3
f'Number of neurons at third layer: {len(third_layer)}'

'Number of neurons at third layer: 3'

In [10]:
# The weights connecting the second layer to third layer
second_to_third_layer_weights = create_weight_matrix(second_layer, third_layer)
f'Number of weights from second layer to third: {second_to_third_layer_weights.size}'

'Number of weights from second layer to third: 12'

In [11]:
# Select the output values to get number of outpts
output_layer_length = len(set((y_train.iloc[:, 0].tolist())))
output_layer = [random.uniform(0, 1)] * output_layer_length
f'Number of neurons at output layer: {len(output_layer)}'

'Number of neurons at output layer: 3'

In [12]:
# The weights connecting the third layer to output layer
third_to_output_layer_weights = create_weight_matrix(third_layer, output_layer)
f'Number of weights from third layer to output: {third_to_output_layer_weights.size}'

'Number of weights from third layer to output: 9'

# Feeding forward the inputs

In [13]:
def sigmoid(num):
    return (1/(1+np.exp(-(num))))

In [14]:
def feed_forward(weights, layer) -> list:
    activations = []
    for w in weights:
        activation = sigmoid(np.dot(w, layer))
        activations.append(activation)
        print(activation)
        print()
    return activations

In [15]:
input_to_second_activations = feed_forward(input_to_second_layer_weights, inputs)

0.9997468190219861

0.9984373971531718

0.9521088299663936

0.9955099760013053



In [16]:
second_to_third_layer_activations = feed_forward(second_to_third_layer_weights, input_to_second_activations)

0.7652229881119174

0.8459972723071569

0.8035680228908066



In [17]:
feed_forward(third_to_output_layer_weights, second_to_third_layer_activations)

0.7399904950710097

0.7907503747433197

0.7422785462576861



[0.7399904950710097, 0.7907503747433197, 0.7422785462576861]

# Back Propagation

This is of course how the neural network works but there is a lot going on and how do we know what are weights and biases are. How is it trained, how do we fix the weight and biases if we have a lot of error. 

How does the cost function change with respect to weights, bias, activation

https://www.youtube.com/watch?v=tIeHLnjs5U8&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv&index=4

The way to figure this out is using back propagation. Back propagation is a algorithm that helps us figure out how we change the weights and biases. 

Similar to linear regression with gradient descent we want to find the the paramaters that will minimize the error in our cost funtion. We of course do this with calculus by taking the gradient of our cost function. Gradient descent will give us the direction in which a function is increasing its fastest. If we take the negative gradient of our function we will get the direction in which the function is decreasing and that is where our error will be minimal. 

In our first example we set the weights and biases to some random number. What we now want to do is change our weights and biases to minimize the cost function. 

For example if we take an example of `iris setosa` as one of the flowers we are trying to classify. We want the output neuron to have a high activation for the output neruon of `iris setosa` by the time the data has been feed forward. We can do this by chaning the weights and by chaning the bias. 

One thing to keep in mind is changing the size of certain weights can have a better outcome on the output. For example we can have two weights but only one of them will have a signinficant effect on the activation. While changing the other one will little to no effect. 

Now that we know that if we want a output neuron to have a high activation for an example we can look back at the weights that connect it to the previous layer and change those weights and biases. All these changes made for the example are the way that the example wants to change the weights and biases to get that desired output. We can also do that for the layer before and the layer before that.

Of course we have to do this for every example in our dataset that we we are classifying. If we only do that for the `iris setosa`. We will only be able to classify the `iris setosa`. Lastly all the changes for each example in our dataset to each weight and biases are added and average for each training example. All of those changes to the weights and biases are the negative gradient. 

Now what we want to do is find out by how much to change each weight and bias by. Knowing that we can direclty affect the impact of the output neuron by chaning the weights and biases of the layer connecting them with also that changing certain weights and biases will have a greater affect we can now. 

In [18]:
# class NeuralNetwork:
#     def __init__(self, X, y):
#         self.__input_layer, self.__output_layer = self.initialize_input_output_layer(X, y)
#         self.__y = y
#         self.__weights = []
#         self.__layers = []
#         self.__biases = []

#     @classmethod
#     def initialize_input_output_layer(self, X:list, y,neuron_type='input'):
#         """ function to give a neuron a activation value """
#         input_layer = [i for i in X]
#         output_layer = [0] * 3
#         if y == 'Iris-setosa':
#             output_layer[0] = 1
#         elif y == 'Iris-versicolor':
#             output_layer[1] = 1
#         else: 
#             output_layer[2] = 1
#         return input_layer, output_layer
    
#     def get_input_layer(self):
#         """ function to get the input layer """
#         return self.__input_layer

#     def get_output_layer(self):
#         """ function to get the input layer """
#         return self.__output_layer
    
#     def get_weights(self, num_weights:int):
#         return self.__weights[num_weights]
    
#     def set_layers(self, new_layers):
#         self.__layers = new_layers.copy()

#     def merge_layers(self):
#         """function to merge layers into one list"""
#         all_layers = []
#         layers = self.__layers
#         input_layer = self.get_input_layer()
#         output_layer = self.get_output_layer()
#         all_layers.append(input_layer)
#         all_layers.extend(layers)
#         all_layers.append(output_layer)
#         return all_layers

#     def create_biases(self):
#         """ create biases """
#         # create a list of biases for each layer except output
#         self.__biases = [[0] for i in range(len(self.__layers)-1)]
#         return 0
    
#     def create_hidden_layers(self, num_layers:int, neurons_in_layer:list):
#         """ function to create hidden layers """
#         # for num of layers
#         for i in range(num_layers):
#             # create a empty list for neurons
#             current_layer = []
#             # create a neurons
#             for j in range(neurons_in_layer[i]):
#                 # neuron = self.create_neuron(0.0 ,f'hidden layer #{i+1}' )
#                 current_layer.append(0.0)
#             # add list of neurons to list of layers
#             self.__layers.append(current_layer)
#         # create the weights now that we have layers
#         self.create_weights()
#         return 0
 

#     def create_weights(self):
#         # merge all the layers into one big list
#         layers = self.merge_layers()
#         # set the merged layers to the class layers
#         self.set_layers(layers)
#         # create the biases for the layers
#         self.create_biases()
#         # create weights (matrix) connecting the layers to each other
#         # for every layer
#         for layer in range(len(layers)-1):
#             # get the length of the current layer
#             current_layer = len(layers[layer])
#             # get the length of the next layer
#             next_layer = len(layers[layer+1])
#             # create a matrix and append to list using the lengths of the layers 
#             self.__weights.append(np.random.rand(current_layer,next_layer))
        
#         return 0 

    
#     def sigmoid(self, activation):
#         return (1/(1+np.exp(-(activation)))) 

#     def feed_forward(self):
#         """ feed the neural network forward """
#         layers = self.__layers
#         for i in range(len(layers)-1):
#             # calculate the activations
#             activations = self.sigmoid(np.dot(layers[i], self.get_weights(i)) + self.__biases[i])
#             # pass the activations to the next layer
#             layers[i+1] = activations.copy()
#         return layers[-1]

#     def back_propogation(self):
#         predicted = self.feed_forward()
#         layers = self.__layers # layer before output
#         layer = layers[-2]
#         print(layer)
#         print(self.__y)
#         cost_function_derivative = 2*(layer-self.__y)
#         print(cost_function_derivative)
#         return 0

In [19]:
example = [X_train.iloc[1]['sepal length (cm)'], X_train.iloc[1]['sepal width (cm)'], X_train.iloc[1]['petal length (cm)'], X_train.iloc[1]['petal length (cm)']]
example

[4.8, 3.0, 1.4, 1.4]

In [20]:
output = y_train.iloc[1]['class']
output

'Iris-setosa'

In [21]:
# nn = NeuralNetwork(example, output)

In [22]:
# nn.create_hidden_layers(2, [4, 3])

In [23]:
# nn.back_propogation()

In [24]:
class Network:
    def __init__(self, sample_input, num_layers, neurons_per_layer):
        self.layers = self.create_layers(sample_input, num_layers, neurons_per_layer)
        self.weights = self.create_weights()
        self.biases = [[0] for i in range(num_layers+1)]
        
    @classmethod
    def create_layers(self, sample_input, num_layers, neurons_per_layer):
        """ function to create hidden layers """
        input_layer = [i for i in sample_input]
        output_layer = [0 for i in range(3)]
        layers = []
        layers.append(input_layer)
        # for num of layers
        for i in range(num_layers):
            # create a empty list for neurons
            current_layer = []
            # create a neurons
            for j in range(neurons_per_layer[i]):
                # neuron = self.create_neuron(0.0 ,f'hidden layer #{i+1}' )
                current_layer.append(0.0)
            # add list of neurons to list of layers
            layers.append(current_layer)
        layers.append(output_layer)
        print(layers)
        return layers

    def create_weights(self):
        weights = []
        layers = self.layers
        for i in range(len(layers)-1):
            # get the length of the current layer
            current_layer = len(layers[i])
            # get the length of the next layer
            next_layer = len(layers[i+1])
            # create a matrix and append to list using the lengths of the layers 
            weights.append(np.random.rand(current_layer,next_layer))
        return weights 
        
        
    def sigmoid(self, activation):
        return (1/(1+np.exp(-(activation))))
    
    def feed_forward(self, X):
        layers = self.layers
        for i in range(len(layers)-1):
            activations = self.sigmoid(np.dot(layers[i], self.weights[i]) + self.biases[i])
            # pass the activations to the next layer
            layers[i+1] = activations.copy()
        return (layers[-1], max(layers[-1]))

    def back_propagation(self, output, y):
        layers = self.layers
        print(layers)
        print()
        for i in range((len(layers)), 0,-1):
            cost_function_derivative = 2*(output - y)
            print(i-1)
            sigmoid_derivative = cost_function_derivative * (-(np.exp(-(output))))/((1+np.exp(-(output))) ** 2)
            prev_layer = layers[i-1]
            print(prev_layer)
            print()
        return 0

In [25]:
new_class = Network(example,2, [4,3])

[[4.8, 3.0, 1.4, 1.4], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0, 0, 0]]


In [26]:
X = [X_train.iloc[1]['sepal length (cm)'], X_train.iloc[1]['sepal width (cm)'], X_train.iloc[1]['petal length (cm)'], X_train.iloc[1]['petal length (cm)']]

In [27]:
last_layer, predicted = new_class.feed_forward(X)

In [28]:
y = y_train.iloc[1]['class']
y

'Iris-setosa'

In [29]:
new_class.back_propagation(predicted, 1)

[[4.8, 3.0, 1.4, 1.4], array([0.99436741, 0.99754766, 0.99790173, 0.99129996]), array([0.80639164, 0.72572826, 0.8432921 ]), array([0.84828101, 0.84991827, 0.70959594])]

5


IndexError: list index out of range

In [None]:
# def feed_forward_and_back_propogate()