# Week 8
## Toy Neural Network

_This code was all pretty much copied from [this guide](https://iamtrask.github.io/2015/07/12/basic-python-network/) by "i am trask". I also followed [this video](https://www.youtube.com/watch?v=aircAruvnKk) to understand it better._

I did this in my free time and realised that it is rather important to understand, so I decided to write part of my Week 8 journal about it.

Fundamentally, the way I understand a neural network now is that, at its core, it's just one giant `y = mx + b`.

#### For Predictions

To extract a binary prediction out of a neural network with one hidden layer, we just:

1. Pass in a matrix of features
2. Multiply that matrix of features by a matrix of weights (our synapse)
3. Apply the sigmoid function to the result to end up at the hidden layer
4. Multiply the hidden layer by another matrix of weights and apply the sigmoid function again to get our outcome

Said in pseudo-code terms (where `*` is a matrix product):
```
hidden_layer = sigmoid([features]*[weights0])
prediction = sigmoid([hidden_layer]*[weights1])
```
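As a minimal sketch of those two lines in NumPy: the names `predict`, `weights0` and `weights1` are only illustrative, the weights here are random and untrained (so the numbers mean nothing yet), and the 0.5 cut-off for turning the sigmoid output into a binary answer is my own assumption.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(features, weights0, weights1, threshold=0.5):
    # Features -> hidden layer -> prediction, exactly as in the pseudo-code
    hidden_layer = sigmoid(features @ weights0)
    prediction = sigmoid(hidden_layer @ weights1)
    # Assumed cut-off: anything above the threshold counts as class 1
    return prediction, (prediction > threshold).astype(int)

rng = np.random.default_rng(0)
features = np.array([[0, 0, 1],
                     [1, 1, 1]])          # 2 samples, 3 features
weights0 = 2 * rng.random((3, 4)) - 1     # 3 features -> 4 hidden nodes
weights1 = 2 * rng.random((4, 1)) - 1     # 4 hidden nodes -> 1 output

probabilities, labels = predict(features, weights0, weights1)
print(probabilities.shape, labels.shape)  # (2, 1) (2, 1)
```

Each row of `features` gets one value between 0 and 1, plus a 0/1 label derived from it.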

#### For learning

To set up the architecture and learn the weights above, we:

- For our weights/synapses:
    - We assign random values to a matrix of shape
        
        ```(layer_n, layer_n+1)```
    - E.g.
    
        ```(num_of_features, num_of_nodes_in_layer0)``` 

- For our algorithm:


1. Create each layer by multiplying the layer before it by the synapse that sits between them, then applying the sigmoid function.
    - E.g.
    
    ```layer_2 = sigmoid([layer_1][synapse_2])```
    
    ```layer_n = sigmoid([layer_n-1][synapse_n])```

2. Compare the last layer (our predictions layer) with the actual values
    
    ```error = actual_y - last_layer``` 

3. Take that error and multiply it by the derivative of the sigmoid function evaluated at the last layer. The result is the delta that decides how much we shift each weight.

    ```delta = error*deriv_sigmoid(last_layer)```
    
    - We use the derivative of the sigmoid function because we want the gradient at the predicted value. This gradient reflects how confident our algorithm is given our features: it is largest when the output is near 0.5 (unsure) and close to zero when the output is near 0 or 1 (confident).
    - The derivative is multiplied onto the error values because it controls how much the weights change: confident predictions are only nudged slightly, unsure ones are moved more.

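4. Apply the last two steps to every layer except our input layer. For a hidden layer, the error is the next layer's delta multiplied by the transpose of the synapse between them.

5. Update each synapse by adding the transpose of the layer before it multiplied by the delta of the layer after it.

6. Repeat this as many times as needed for the network to correct itself.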
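
To make step 3 concrete, here is a small sketch with made-up numbers. The outputs and the fixed error of 0.3 are purely illustrative (not a realistic training case), and `deriv_sigmoid` is just written out the way the pseudo-code names it. It holds the error constant and only varies how confident the prediction is:

```python
import numpy as np

def deriv_sigmoid(output):
    # Slope of the sigmoid, written in terms of the sigmoid's own output
    return output * (1 - output)

last_layer = np.array([0.5, 0.9, 0.99])   # unsure -> very confident
error = np.array([0.3, 0.3, 0.3])         # same (made-up) error for each

slope = deriv_sigmoid(last_layer)
delta = error * slope

print(slope)  # approximately [0.25, 0.09, 0.0099]
print(delta)  # approximately [0.075, 0.027, 0.00297]
```

With the same error, the unsure output (0.5) gets the largest delta and the very confident one (0.99) gets barely any.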

That aside, let's view the...

## Code

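```python
import numpy as np

# The sigmoid function.
# With deriv=True, x must already be a sigmoid output (a layer's values),
# because the slope of the sigmoid at that point is output * (1 - output).
def sig(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
```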
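
One detail worth checking: with `deriv=True`, `sig` expects a value that has already been through the sigmoid, not the raw input. The sketch below compares `x*(1-x)` against a numerical estimate of the slope; the test points in `z` and the step size `h` are arbitrary choices of mine.

```python
import numpy as np

def sig(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])   # raw inputs to the sigmoid
h = 1e-5                          # small step for the numerical slope

# Central-difference estimate of the sigmoid's slope at z
numerical_slope = (sig(z + h) - sig(z - h)) / (2 * h)

# Correct usage: feed the sigmoid OUTPUT into the deriv branch
analytic_slope = sig(sig(z), deriv=True)

print(np.allclose(numerical_slope, analytic_slope))  # True
```

Passing the raw `z` straight into `sig(z, deriv=True)` would not give the slope, which is why the backward pass later calls it on `layer1` and `layer2` rather than on their inputs.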

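```python
# 3 feature columns (the third is always 1, so it acts like a bias term)
X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])

# Targets: the XOR of the first two feature columns
y = np.array([[0],
              [1],
              [1],
              [0]])
```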

```python
def ToyNN1Layer(X, y, nodes, epochs):
    # Create synapses as matrices of random weights in [-1, 1)
    synapse0 = 2*np.random.random((X.shape[1], nodes)) - 1
    synapse1 = 2*np.random.random((nodes, 1)) - 1

    # Apply as many epochs as you want
    for _ in range(epochs):
        # FORWARD PROPAGATION
        layer0 = X
        layer1 = sig(np.dot(layer0, synapse0))
        layer2 = sig(np.dot(layer1, synapse1))

        # BACKWARD PROPAGATION
        # Output layer: error scaled by the sigmoid slope at each prediction
        layer2_error = y - layer2
        layer2_delta = layer2_error * sig(layer2, deriv=True)

        # Hidden layer: pass the output delta back through synapse1
        layer1_error = layer2_delta.dot(synapse1.T)
        layer1_delta = layer1_error * sig(layer1, deriv=True)

        # UPDATING WEIGHTS IN SYNAPSES
        synapse1 += layer1.T.dot(layer2_delta)
        synapse0 += layer0.T.dot(layer1_delta)

    # Predictions for X from the last forward pass
    return layer2
```

```python
ToyNN1Layer(X, y, 4, 1000)
```

```
array([[0.07537822],
       [0.93151652],
       [0.93027104],
       [0.06121874]])
```

Each value shows how confidently the network leans towards 0 or 1. Rounded, our array is essentially `[0, 1, 1, 0]`, which matches `y`.

Unfortunately, this neural network only takes in a training sample and spits out how much it has learnt from that same sample. It can easily be augmented to predict new values and generate an accuracy score, though; it isn't much of a jump.
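
As a rough sketch of what that augmentation could look like: the function names `train`, `predict` and `accuracy`, the `seed` argument and the 0.5 threshold are all my own assumptions and are not from the guide. The idea is just to return the learnt synapses instead of the last layer, so they can be reused on any feature matrix.

```python
import numpy as np

def sig(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

def train(X, y, nodes, epochs, seed=0):
    # Same loop as ToyNN1Layer, but returns the weights instead of layer2
    rng = np.random.default_rng(seed)  # seeded so runs are repeatable
    synapse0 = 2 * rng.random((X.shape[1], nodes)) - 1
    synapse1 = 2 * rng.random((nodes, 1)) - 1
    for _ in range(epochs):
        layer1 = sig(X @ synapse0)
        layer2 = sig(layer1 @ synapse1)
        layer2_delta = (y - layer2) * sig(layer2, deriv=True)
        layer1_delta = (layer2_delta @ synapse1.T) * sig(layer1, deriv=True)
        synapse1 += layer1.T @ layer2_delta
        synapse0 += X.T @ layer1_delta
    return synapse0, synapse1

def predict(X, synapse0, synapse1):
    # Forward pass only, using weights that were already learnt
    return sig(sig(X @ synapse0) @ synapse1)

def accuracy(X, y, synapse0, synapse1, threshold=0.5):
    # Assumed cut-off: outputs above the threshold count as class 1
    labels = (predict(X, synapse0, synapse1) > threshold).astype(int)
    return np.mean(labels == y)

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

synapse0, synapse1 = train(X, y, nodes=4, epochs=1000)
print(predict(X, synapse0, synapse1))
print(accuracy(X, y, synapse0, synapse1))
```

Here the score is computed on the training data itself, since these notes have no held-out samples; with separate test rows, the same `predict` and `accuracy` calls would apply to them unchanged.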