# Neural Network Conversion

In this Jupyter notebook I will convert a 2D grid into a neural network.


Code reference: http://neuralnetworksanddeeplearning.com/chap1.html

I used Michael Nielsen's online book Neural Networks and Deep Learning as a code reference for how to set up a neural network. The code examples in the text are written in Python 2, so I am testing, editing, and commenting them line by line.

In [None]:
#standard import cell
import random
import numpy as np
import matplotlib.pyplot as plt

In [None]:
class Network(object):

    def __init__(self, sizes):
        """Create and initialize the neural network as a Python class. The network
        has several callable behaviors, and making it a class makes it easy to
        switch between different network structures.

        Parameters:
        sizes: the number of nodes in each layer (list of int)

        Returns: nothing; the new network stores layers with the specified number
        of nodes, initially with randomized weights and biases"""

        self.num_layers = len(sizes)
        # the list of sizes has one value per layer
        self.sizes = sizes
        # keep the layer sizes for later reference
        self.biases = [np.random.normal(size=(y, 1), scale=1) for y in sizes[1:]]
        # the biases start out random; one column of biases is generated for every
        # layer after the first, since the first layer is simply the input
        self.weights = [np.random.normal(size=(y, x))
                        for x, y in zip(sizes[:-1], sizes[1:])]
        # one (y, x) weight matrix per pair of adjacent layers: a weight for every
        # connection from the x nodes of one layer to the y nodes of the next

    def feedforward(self, a):
        """Return the output of the network for the input a. This assumes a
        feedforward network: each layer only feeds the next layer, and nothing is
        fed back to earlier layers.

        Parameters:
        self: the neural network object in question
        a: the input activations, an array of shape (sizes[0], 1)
        Returns: the output activations, an array of shape (sizes[-1], 1)"""

        for b, w in zip(self.biases, self.weights):  # step through the layers in order
            # weighted sum plus bias, passed through the sigmoid function
            # (sigmoid is defined in a later cell)
            a = sigmoid(np.dot(w, a) + b)
        return a
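
As a quick sanity check of the shapes involved, here is a minimal standalone sketch that mirrors the initialization and the feedforward loop for a hypothetical network with layer sizes [2, 3, 1]. The layer sizes, the random seed, and the input values are assumptions chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Smooth squashing function used by each neuron."""
    return 1 / (1 + np.exp(-z))


np.random.seed(0)  # assumed seed, only so the sketch is repeatable
sizes = [2, 3, 1]  # hypothetical network: 2 inputs, 3 hidden nodes, 1 output

# same construction as in Network.__init__
biases = [np.random.normal(size=(y, 1)) for y in sizes[1:]]
weights = [np.random.normal(size=(y, x)) for x, y in zip(sizes[:-1], sizes[1:])]

weight_shapes = [w.shape for w in weights]
bias_shapes = [b.shape for b in biases]

# same loop as in Network.feedforward, on an assumed (2, 1) input column
a = np.array([[0.5], [-0.2]])
for b, w in zip(biases, weights):
    a = sigmoid(np.dot(w, a) + b)
output = a

print(weight_shapes, bias_shapes, output.shape)
```

Each weight matrix has one row per node in the receiving layer and one column per node in the sending layer, which is why `np.dot(w, a)` lines up with the bias column of the receiving layer.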
        

## How do neural networks improve?

In my (edited) prospectus I mentioned that neural network nodes which contribute to correct answers are promoted and given more weight, while incorrect nodes have their weight decreased. 

This makes sense logically, but how do we optimize this process? In a binary system with Y/N outputs it might be doable to just make correct nodes more important and incorrect nodes less important without overshooting the correct answer. However, the groundwater model is more complex than that.

So how much more important do we make a node, and how do we optimize this process?

## What are sigmoid neurons and why do we need them?

In optimizing our neural network, we want each data sample seen during training to produce only an incremental shift. If we made large shifts based on each data point we could either: A. Overshoot the needed weights and biases and get an answer that is less correct than the randomized starting point. B. Overtrain (overfit) our model so that it understands our training data very accurately but is too attached to that data to generalize to other applications.

So it is fair to say that we want to make small changes in the weights and biases of our neurons that produce small changes in the final answer they produce.

So instead of having a threshold at which our neurons switch from false (0) to true (1), we need a continuous range of outputs for each neuron (0, 0.1, 0.1112, ...) so that small changes can be recorded at all layers of the network.

We can say that true (1) or false (0) on either side of a threshold value is a step function. A similar function which is smooth and continuous is the sigmoid function.

So we have our inputs:
The index of each input to the neuron is j, the input value is x, and the weight multiplied by that input value is w. The bias or threshold to overcome is b, since if the weighted sum is less than b we get an answer less than zero, and if the weighted sum is greater than b we get an answer greater than zero. (The feedforward code above adds its bias rather than subtracting it, so the bias stored in the Network class plays the role of the negative of this threshold.)

\begin{equation*}
\left(\sum_{j} w_j x_j \right)-b
\end{equation*}

And we have the sigmoid function.
The height s(z) is based off of the input z.

\begin{equation*}
s(z)=  \frac{1}{1+ e^{-z}}
\end{equation*}

So all we need to do is substitute our inputs into the sigmoid equation to get our new output values:

\begin{equation*}
s(\text{input})=  \frac{1}{1+ e^{-\left(\left(\sum_{j} w_j x_j \right)-b\right)}}
\end{equation*}

If we consider the extremes of this equation, if the sum of the weighted inputs is much larger than the threshold our value is nearly 1, and if the sum is much less we get roughly zero. 
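
To make these extremes concrete, here is a minimal sketch of a single sigmoid neuron that uses the threshold convention from the equations (the weighted sum minus b). The helper name `neuron_output` and all of the weights, inputs, and the threshold are hypothetical values chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def neuron_output(w, x, b):
    """Sigmoid neuron: s(sum_j w_j * x_j - b), with b acting as the threshold."""
    return sigmoid(np.dot(w, x) - b)


w = np.array([2.0, 3.0])  # assumed weights
b = 1.0                   # assumed threshold

# weighted sum far above the threshold (50 versus 1)
far_above = neuron_output(w, np.array([10.0, 10.0]), b)
# weighted sum far below the threshold (-50 versus 1)
far_below = neuron_output(w, np.array([-10.0, -10.0]), b)
# weighted sum exactly equal to the threshold (1 versus 1)
at_threshold = neuron_output(w, np.array([0.5, 0.0]), b)

print(far_above, far_below, at_threshold)
```

When the weighted sum equals the threshold exactly, the neuron sits halfway between its two extremes at 0.5.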




In [None]:
x_domain = np.linspace(-10, 10, 100)

# step function: 0 below the threshold at x = 0, 1 above it
step_line = np.where(x_domain >= 0, 1, 0)


def sigmoid(x):
    """Smooth, continuous counterpart of the step function."""
    return 1 / (1 + np.exp(-x))


sigmoid_line = sigmoid(x_domain)

fig, ax = plt.subplots()

ax.plot(x_domain, step_line, label="step function")
ax.plot(x_domain, sigmoid_line, label="sigmoid function")
ax.set_ylabel("neuron output")
ax.set_xlabel("neuron input")
ax.legend()

#Great now I can add the sigmoid function to the Neural Network's functions!

So, because a small change in a function is approximately its derivative multiplied by the change in the variable, the small shift in output can be described by:

\begin{equation*}
\Delta \text{output} \approx \sum_{j} \frac{\partial\, \text{output}}{\partial w_j}\Delta w_j+  \frac{\partial\, \text{output}}{\partial b}\Delta b
\end{equation*}

This is very useful because, for small shifts, a change in the output is approximately a linear function of the changes in the weights and biases. So for any desired small correction in the output we can find a change in the weights and biases to make that correction.
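
The linear approximation can be checked numerically. The sketch below is for a single sigmoid neuron with the threshold convention used in the equations (weighted sum minus b), and it relies on the standard result that the derivative of the sigmoid is s(z)(1 - s(z)). The weights, inputs, threshold, and the small shifts are all assumed values for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def neuron_output(w, x, b):
    """Sigmoid neuron with threshold b: s(sum_j w_j * x_j - b)."""
    return sigmoid(np.dot(w, x) - b)


w = np.array([0.4, -0.3])  # assumed weights
x = np.array([1.0, 2.0])   # assumed inputs
b = 0.1                    # assumed threshold

z = np.dot(w, x) - b
slope = sigmoid(z) * (1 - sigmoid(z))  # derivative of the sigmoid at z

d_out_d_w = slope * x  # partial derivative of the output with respect to each w_j
d_out_d_b = -slope     # negative because b is subtracted inside the sigmoid

delta_w = np.array([0.001, -0.002])  # small assumed shifts in the weights
delta_b = 0.0005                     # small assumed shift in the threshold

# change predicted by the linear formula
predicted_change = np.dot(d_out_d_w, delta_w) + d_out_d_b * delta_b
# change actually produced by the neuron
actual_change = neuron_output(w + delta_w, x, b + delta_b) - neuron_output(w, x, b)

print(predicted_change, actual_change)
```

The two numbers agree closely only because the shifts are small; with large shifts the curvature of the sigmoid makes the linear formula drift away from the true change.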

### So what corrections do we need to make?

Well, a start is figuring out how far off we are from our desired target. We can find the exact error, since in the training set of our data we have the actual value (in this case percent water saturation).

So to do this we need a function which describes the error which we can minimize. In machine learning this is also called the cost function.

For my cost function I will use Mean Squared Error, or MSE.

\begin{equation*}
Cost(w,b) =\frac{1}{2n}\sum_{x} \left| y(x) - \text{guess}(x) \right|^2
\end{equation*}

where w and b are the weights and biases, n is the number of training data points, the sum runs over the training inputs x, and y(x) is the actual value for input x. Dividing by n averages the squared error over the training data, and the extra factor of 1/2 is a convenience that cancels the 2 that appears when the square is differentiated.

The guess is the network's output for the input x, which depends on w and b.
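
As a minimal sketch, the cost formula can be written directly in NumPy. The function name `mse_cost` and the sample values are hypothetical; in the real notebook the guesses would come from the network's output.

```python
import numpy as np


def mse_cost(actual, guess):
    """Cost = 1/(2n) * sum over the training points of (actual - guess)^2."""
    actual = np.asarray(actual, dtype=float)
    guess = np.asarray(guess, dtype=float)
    n = actual.size  # number of training data points
    return np.sum((actual - guess) ** 2) / (2 * n)


actual = [1.0, 0.0, 0.5]  # assumed true values
guess = [0.5, 0.0, 0.5]   # assumed network guesses

cost = mse_cost(actual, guess)      # one guess is off by 0.5
perfect = mse_cost(actual, actual)  # guesses that match exactly

print(cost, perfect)
```

The cost is zero only when every guess matches its actual value, and it grows with the square of each miss, so large errors are penalized much more than small ones.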

### Now since we are trying to find the minimum of a function of multiple variables we can use a gradient (flashback to Calc III)!

So what we need to do is to find where the gradient of the cost function is zero.
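
To illustrate what a zero gradient looks like, here is a small sketch on a toy cost with a known minimum. The function `toy_cost` is a hypothetical stand-in, not the network's real cost, and `numerical_gradient` is an assumed helper that estimates the gradient with central differences.

```python
import numpy as np


def toy_cost(params):
    """Hypothetical bowl-shaped cost with its minimum at w = 1, b = -2."""
    w, b = params
    return (w - 1.0) ** 2 + (b + 2.0) ** 2


def numerical_gradient(f, params, h=1e-6):
    """Estimate the gradient of f at params using central differences."""
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (f(params + step) - f(params - step)) / (2 * h)
    return grad


grad_at_minimum = numerical_gradient(toy_cost, [1.0, -2.0])
grad_elsewhere = numerical_gradient(toy_cost, [3.0, 0.0])

print(grad_at_minimum, grad_elsewhere)
```

At the minimum both components of the gradient vanish, while away from it the gradient points in the direction in which the cost rises fastest.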




