# Neural Networks | Jasamrit Rahala
---

In this notebook we will cover the basics of neural networks ranging from the most basic network (a perceptron) to training your own fully connected neural network. It is useful to have some knowledge of calculus for this notebook. Through this notebook we will specifically look at:

- Theory behind Neural Networks
- Perceptrons
- Derivatives and training perceptrons
- Linearly separable problems
- Activation functions
- Backpropagation in networks

We will be using a minimal number of libraries to reinforce the maths and processes going on behind the scenes, hopefully demystifying Neural Networks. Please work through this notebook at your own pace and ask any questions you may have on the Discord.

Github (useful links + resources) : https://github.com/JRahala

Thanks Jam

## Theory
---


### Biological Theory
---

Neural networks are inspired by the human brain, specifically by the idea of neurons sending messages (electrical signals) through a network. Before we look at code, let's look at the components of a neuron.

<br>
    
![Neuron](https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1547672259/1_a74o1a.png)

<br>

Above we have the diagram of a neuron. A neuron's job is to process and transmit information (in the form of electricity) from one neuron to the next. It does this biologically through two main components: the dendrites and the axon.

The dendrites on the left receive incoming signals and the axon sends out signals based on what the dendrites receive, i.e. if the dendrites receive a strong incoming signal, then the axon is likely to send out a strong signal too.

Neurons are linked up into networks, so one neuron's output becomes another neuron's input, creating a chain of firing messages. Through repeated firing, certain messages can become stronger and others weaker. If we reward and punish the network's outputs, over time the network will begin to learn to recognise certain patterns and adjust how strong certain messages become. By doing so it starts to train itself to perform certain tasks.

Biologically, a neuron is a lot more complex; however, we abstract the concept of a neuron for AI. As far as we are concerned, neurons (aka nodes) have inputs and outputs and can weight certain inputs / outputs in order to achieve tasks. In AI we represent the electrical signals as numbers to simplify the process.

Now that we understand the theory we can start to understand the maths.

### Mathematical Theory
---

In order to model a network, we need to model a neuron. Neurons are the building blocks of a neural network and without them we cannot create one. We will take an object-oriented approach (we will use classes) towards neurons.

In order to model a neuron we have to satisfy three key requirements:

- a neuron can take in input (from preceding neurons)
- a neuron can process data
- a neuron can return output (to succeeding neurons)

As we delve further into neural networks, we can start to add more functionality.
Before we start to look at these processes, let's go over the structure of a network.

#### Neural Network | Structure

The neural networks that we will be coding are known as fully connected networks. Here is why.

Networks are composed of layers of neurons. Input is fed into the input layer of the network, then each layer fires in turn, one after another, until we reach the final 'output layer', where we can receive an answer from our neural network. Layers that are neither input nor output layers are known as 'hidden layers'. Each layer can consist of any number of neurons.

Between each pair of adjacent layers there are connections (which act as the dendrites and axons for our network). These connections pass messages forward from one layer to the next, and each connection carries a weight so that certain messages have more impact than others. Every neuron in a layer has one connection to each neuron in the next layer, hence the term fully connected network.
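As a rough illustration of what "fully connected" means in numbers, the sketch below counts the connections between consecutive layers. The function name `count_connections` and the layer sizes are hypothetical, chosen only for this example.

```python
def count_connections(layer_sizes):
    """Count the connections in a fully connected network.

    layer_sizes lists the number of neurons per layer, from the
    input layer to the output layer.
    """
    total = 0
    # Every neuron in one layer connects to every neuron in the next layer,
    # so each pair of adjacent layers contributes (current * following) links.
    for current, following in zip(layer_sizes, layer_sizes[1:]):
        total += current * following
    return total


# Hypothetical network: 2 inputs, one hidden layer of 3 neurons, 1 output.
layer_sizes = [2, 3, 1]
print(count_connections(layer_sizes))  # 2*3 + 3*1 = 9
```

Each of those connections carries its own weight, so this count is also the number of weights such a network would need to learn.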

Now that we have specified our structure we can start formalising the processes we started talking about above.
It is worth noting that a single neuron can act as a neural network, though it has limited capabilities.

#### Neuron | Input data

In a neural network each node is associated with some inputs (numbers). A node receives information through connections, each of which in turn has a weight. The stronger the weight of a connection, the more impact that input will have on the node.

<blockquote>
    
For example, I have a simple network that predicts the likelihood for consumers to order a new mattress. It takes in two inputs on the input layer:

- size of their current mattress
- how old their current mattress is 
    
Whilst the size of their mattress may have some effect, I think we can agree that the age of a mattress is much more important when figuring out how likely people are to buy a new mattress. As a result the size input should have less effect than the age input. We can control this through weights: the connection between the size input and the network has a much smaller weight than the connection between the age input and the network.
    
</blockquote>

In order to formalise the inputs and weights we can use the following formula.

![Node summation]

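$$z = \sum_{i=1}^{n} x_i w_i$$

Here $x_i$ is the $i$-th input, $w_i$ is the weight of its connection and $z$ is the weighted sum that the node receives.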


By summing the product of each input with its respective weight we can retain the information sent from the inputs, whilst representing the significance of each input.
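The weighted sum described above can be sketched in a few lines of plain Python. The helper name `weighted_sum`, the input values and the weights are all made up for illustration; they simply mirror the mattress example, with the age input weighted more heavily than the size input.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(x * w for x, w in zip(inputs, weights))


# Hypothetical values: mattress size (metres) and mattress age (years).
inputs = [2.0, 8.0]
# Hypothetical weights: age matters more than size.
weights = [0.25, 0.75]

print(weighted_sum(inputs, weights))  # 2.0*0.25 + 8.0*0.75 = 6.5
```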


#### Neurons | Processing data

Neurons can process the input data they receive. They achieve this by using an 'activation function'. An activation function is simply a function applied to the weighted sum of the inputs, and it is what allows a network to model complex relations. I will use a one-neuron network to demonstrate why activation functions are important.

<blockquote>

Continuing with the example of a consumer's probability of buying a mattress, if we were to use a neuron without an activation function, this neuron would fail to predict the probability, as it may start to return values outside the range 0 to 1. As well as this, the neuron would only be capable of representing some linear combination of the inputs. This may not be helpful, since certain inputs may become less important over time and not follow a linear scale of importance. Now let's try this example with a neuron that uses an activation function. To do this I will define an activation function below.

![Activation function](https://qph.fs.quoracdn.net/main-qimg-6b67bea3311c3429bfb34b6b1737fe0c)

This function (known as the sigmoid function) will squeeze a value between 0 and 1, therefore giving us output that makes sense. Since this function is not a linear function, our network will now be able to predict non-linear relations, which opens up a whole new world of problems that our network can solve. This network will now be able to return some sensible output to the user.

</blockquote>


The inputs to a neuron are multiplied by their weights and summed, then fed into the activation function. The output of this activation function is used as the output of the neuron.
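Putting the two steps together, here is a minimal sketch of a single neuron in the object-oriented style mentioned earlier. It assumes the standard definition of the sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$; the class name `Neuron`, the method name `forward` and the example weights are hypothetical and only meant to illustrate the idea.

```python
import math


def sigmoid(z):
    """Squeeze any real number into the range 0 to 1."""
    # Two branches so that math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Neuron:
    """A single neuron: weighted sum followed by an activation function."""

    def __init__(self, weights, activation=sigmoid):
        self.weights = list(weights)
        self.activation = activation

    def forward(self, inputs):
        # Step 1: multiply each input by its weight and sum the results.
        z = sum(x * w for x, w in zip(inputs, self.weights))
        # Step 2: pass the weighted sum through the activation function.
        return self.activation(z)


# Hypothetical weights and inputs, reusing the mattress example.
neuron = Neuron([0.25, 0.75])
print(neuron.forward([2.0, 8.0]))  # sigmoid(6.5), a value just below 1
```

Passing the activation in as a parameter is one possible design choice; it lets the same class be reused with a different activation function later on.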


#### Neurons | Output data

Outputting data from a neuron is very simple. Since we are using a fully connected network, the output of a node will simply be the output of its activation function. This output will be passed along each of the neuron's connections (that point to the next layer). The node highlighted in purple has all of its forward connections highlighted in red. As the outputs of this neuron are passed on to the connected neurons, the process repeats until the final layer is reached.

![INSERT PURPLE AND RED CONNECTIONS]

It is common to see a special type of activation on the last layer of a neural network. The last layer is chosen according to the problem we are trying to solve with the network. For example, if we had created a network to classify hand-drawn digits, then we might want one output node per digit in the final layer. We could then use an activation function that returns the node with the highest value. Our network could predict the following probabilities:

- [0] 54%
- [1] 4%
- [2] 3%
- [3] 6%
- [4] 5%
- [5] 7%
- [6] 2%
- [7] 0%
- [8] 15%
- [9] 4%

Here our network can output 0 as its answer since it has the highest probability in the output layer. You can create your own activation functions; however, for our networks we will go over two widely used functions: the Heaviside function and the sigmoid function.

Now that we understand the fundamental structure of a network, let's go through programming the most basic type of network: a perceptron.
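Choosing the answer from the output layer in the digit example can be sketched as picking the index of the largest value. The helper name `predict_digit` is hypothetical; the probabilities are the ones listed above, written as fractions.

```python
def predict_digit(outputs):
    """Return the index of the output node with the highest value."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


# Output layer from the example above: index i holds the probability of digit i.
probabilities = [0.54, 0.04, 0.03, 0.06, 0.05, 0.07, 0.02, 0.00, 0.15, 0.04]

print(predict_digit(probabilities))  # 0
```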

## Perceptron
---

A perceptron is an extremely simple neural network that uses one neuron. It is used for classification. It takes a set of inputs and feeds the weighted sum through an activation function known as the Heaviside step function, also referred to as the unit step function.

Let's go over some worked examples of processes with perceptrons.
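A minimal sketch of the Heaviside step function is shown below. Conventions differ on what to return when the input is exactly 0; returning 1 there is an assumption made for this example.

```python
def heaviside(z):
    """Unit step: 1 for non-negative input, 0 otherwise."""
    # Assumption: an input of exactly 0 counts as "on" and returns 1.
    return 1 if z >= 0 else 0


print(heaviside(-0.5))  # 0
print(heaviside(2.3))   # 1
```

Because the output is always either 0 or 1, a perceptron using this activation gives a yes/no answer, which is why it suits two-class classification.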


### Perceptron | Initialisation
---

For this example we will use two inputs for our model. We will calculate the weighted sum and pass this through our activation function for our output. Based on this output we will adjust the weights that link the inputs to the perceptron. Below is a visualisation of the network that we are creating.

![Network Visualisation goes here]

In order to predict with and train our network, we will need to assign some initial weights for our perceptron. Initially we can assign these weights randomly; over time, training will fine-tune them.
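Random initialisation could look something like the sketch below, which uses only Python's built-in `random` module. The function name `init_weights`, the range of -1 to 1 and the optional `seed` parameter are assumptions made for illustration, not requirements of the perceptron.

```python
import random


def init_weights(n_inputs, low=-1.0, high=1.0, seed=None):
    """Return one random starting weight per input."""
    # A private generator keeps the result reproducible when a seed is given.
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_inputs)]


# Our perceptron has two inputs, so it needs two weights.
weights = init_weights(2, seed=0)
print(weights)
```

Fixing the seed is optional, but it makes it easier to compare runs while you are experimenting.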

### Perceptron | Predicting

For this perceptron I will be using the iris dataset, which contains information on three types of iris flower: Iris Setosa, Iris Versicolour and Iris Virginica. For our example we will simply use a perceptron to classify a flower as either an Iris Setosa or not.

[More information on the dataset](https://archive.ics.uci.edu/ml/datasets/iris)
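To make the prediction step concrete, here is a small sketch of a two-input perceptron. Everything specific in it is an assumption for illustration: the choice of sepal length and petal length as the two inputs, the hand-picked (untrained) weights, the example measurements, and the convention that 1 means "Iris Setosa" and 0 means "not Iris Setosa".

```python
def heaviside(z):
    """Unit step: 1 for non-negative input, 0 otherwise (assumed convention)."""
    return 1 if z >= 0 else 0


def predict(inputs, weights):
    """Weighted sum of the inputs passed through the step function."""
    z = sum(x * w for x, w in zip(inputs, weights))
    return heaviside(z)


# Hypothetical hand-picked weights for [sepal length, petal length].
# They are not trained; they only show how a prediction is computed.
weights = [1.0, -2.0]

# Illustrative measurements in cm: [sepal length, petal length].
setosa_like = [5.1, 1.4]  # short petals
other_like = [7.0, 4.7]   # long petals

print(predict(setosa_like, weights))  # 5.1 - 2.8 = 2.3  -> 1 (Setosa)
print(predict(other_like, weights))   # 7.0 - 9.4 = -2.4 -> 0 (not Setosa)
```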