# Neural networks explained
> "An explanation of how neural networks work"

- toc: false
- branch: master
- badges: true
- comments: true
- categories: [jupyter, explanation]
- image: images/explanation_neural_net/neural_net.PNG
- hide: false
- search_exclude: true

# What is a neural network?
A neural network is a stack of neurons that work together towards a certain goal.
![](my_icons/neural_net.PNG)

# Neurons
So basically, a neural net is a collection of these neurons. But what are neurons?
A neuron is a variable that holds a value. 
What kind of value can a neuron hold?
Most of the time a neuron just holds a number:

- it can be a value from 0 to 255 for a pixel value (for images)
- it can be a value of 0 or 1 for a classifier
- or it can be whatever value you want it to be, depending on what you need
    
## Basically, a neuron just holds a value

# Layers
The neurons work in layers. What is a layer? A layer is an array of neurons on the same level. In the image above, ![](my_icons/layer.PNG) could be considered a layer.

So a layer is just a group of neurons on the same level. NNs (neural networks) can have a lot of layers of different sizes, but the basic thing to remember is that layers propagate the information further down the network. ![](my_icons/more_neural_nets.PNG)
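
To make the idea of neurons and layers concrete, here is a minimal sketch in plain Python. The names `input_layer`, `hidden_layer`, `output_layer` and the sizes are assumptions chosen only for illustration; real libraries store layers differently.

```python
# Each neuron is just a variable holding a value; a layer is a list of neurons.
input_layer = [0, 128, 255]           # e.g. three pixel values between 0 and 255
hidden_layer = [0.0, 0.0, 0.0, 0.0]   # four neurons whose values are not computed yet
output_layer = [0, 1]                 # e.g. classifier outputs of 0 or 1

# The network is the ordered stack of layers (sizes are illustrative assumptions).
network = [input_layer, hidden_layer, output_layer]
layer_sizes = [len(layer) for layer in network]
print(layer_sizes)  # [3, 4, 2]
```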

# Weights and biases
The next thing you need to understand is weights and biases. What are these? 
Weights and biases are part of a mathematical equation (m\*x+n), where m is a weight, n is a bias and x is the value of the neuron.


The weights and biases connect the layers together. Referring back to the first image, you can see that every neuron in a certain layer is connected to every neuron in the next layer. These connections have certain values. 

Imagine a network of people. You can have a strong connection to your relatives and a weak connection to me, let's say. Well, a "strong" connection would have a high weight and a "weak" connection would have a small weight. A bias also has an effect on the connection, but for large values usually not as big an effect as a weight, since the weight multiplies the neuron's value while the bias is only added to it.

```
ex. 10 * 10 = 100 and 10 + 10 = 20
but 10 * 100,000 = 1,000,000 and 10 + 100,000 = 100,010
```
So, to sum up:
Weights and biases connect the layers together and they decide whether a neuron in the first layer has a strong connection with a neuron in the second layer.
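
Here is a rough sketch of how m\*x+n can connect two layers. It assumes the usual convention that each neuron in the next layer adds up m\*x over every incoming connection and then adds its own bias n. The function name `connect` and all the numbers are made up for illustration.

```python
def connect(inputs, weights, biases):
    """Compute next-layer values: for each next neuron, sum of m * x, plus n."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # One weight m per incoming connection, one bias n per next-layer neuron.
        total = sum(m * x for m, x in zip(neuron_weights, inputs))
        outputs.append(total + bias)
    return outputs

inputs = [1.0, 2.0]                               # values of the first layer
weights = [[0.5, 0.5], [2.0, -1.0], [0.0, 0.1]]   # 3 next neurons x 2 inputs
biases = [0.0, 1.0, -0.2]
next_layer = connect(inputs, weights, biases)
print(next_layer)  # [1.5, 1.0, 0.0]
```

A weight of 0.0, like the first weight of the third neuron, means that connection has no influence at all.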

# Learning
Now, this is where the magic happens. What neural networks allow us to do is <b> LEARN </b> from our mistakes. We can change the parameters (weights and biases) of a connection based on a reward function. So, if you tell the network that a connection between two neurons is good, the network will remember that and not change the parameters a lot. But if you tell it a connection is bad, then the NN will change the parameters of that connection so that the next time you go through the same neurons you will get a better result.

This type of learning is called incremental learning: you take small steps, changing your parameters only a little each time. This is slow, but given enough time the network will <b> LEARN </b> to improve its parameters and the end result.

# Loss and reward
You're probably asking yourself: what determines if a connection is good or bad? What is a reward function?
Basically, a reward function is a simple function that tells you if the output is the one that you expected.
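
As a small sketch, such a function can be written as a loss, where a lower number means the output is closer to what you expected. The squared error used here is only one common choice and is an assumption, as are the name `squared_error` and the example values.

```python
def squared_error(predicted, expected):
    """Return 0.0 for a perfect output and a larger number the further off it is."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected))

expected = [0.0, 1.0]                          # the output we wanted
good = squared_error([0.0, 1.0], expected)     # output matches exactly
bad = squared_error([1.0, 0.0], expected)      # output is the opposite
print(good, bad)  # 0.0 2.0
```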

You see, the neural networks described here rely on a type of machine learning called supervised learning.

# Supervised Learning
Supervised learning refers to a type of machine learning where you give the computer a set of inputs together with the correct answer (the label) for each one, and it learns to predict the labels from the inputs. 

For example, you can give it images of hand-drawn digits (inputs) to learn from. The algorithm tries to predict which digit each image shows and checks the result against the label you gave it. If you say an image is a 3 and it initially predicts 1, the parameters will change so that the next time you get a different result. We cannot easily tell what the individual changes inside the network mean, but we can check whether the output is good (this is one of the reasons it is sometimes called a black box). You know the input and the output, but not how the output is produced. 
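
The checking step can be sketched like this. The dataset, the two-number "images" and the `predict` function are all hypothetical placeholders; a real network would replace `predict`.

```python
# Hypothetical labelled dataset: (input, label) pairs.
dataset = [
    ([0.9, 0.1], 3),
    ([0.2, 0.8], 1),
    ([0.7, 0.3], 3),
    ([0.4, 0.6], 3),   # this one will be predicted wrongly
]

def predict(features):
    # Placeholder model (an assumption): guesses 3 when the first value is larger.
    return 3 if features[0] > features[1] else 1

# Compare every prediction with the label we supplied.
mistakes = [(x, label) for x, label in dataset if predict(x) != label]
accuracy = 1 - len(mistakes) / len(dataset)
print(accuracy)  # 0.75
```

The entries in `mistakes` are the examples that would trigger a change in the parameters.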

# Going back to loss and reward
So, the loss function is the one that determines if the connection is good or bad. The loss function is usually minimized with SGD (Stochastic Gradient Descent), which requires more math to explain.

But, in short, the loss function just says "this output is not what I expected, change it" or "this is good, do not change it".
The next cell tries to explain SGD, so you can skip it if you want. 

# Stochastic Gradient Descent

Gradient Descent is an iterative optimization algorithm for finding a local minimum of a differentiable function. In simpler terms, it finds the minimum of a function - in our case the loss function. 

As you can imagine, in the ideal case the minimum of the loss function is zero, which means there is no loss. If there is no loss there is nothing left to change, thus the algorithm is complete. In reality we usually can't reach that minimum.
But how does <b> SGD </b> work?

Basically, you choose a direction and a step at which you advance and try to find the minimum.
![](my_icons/loss1.PNG) 

1. How do you choose the step? From my understanding thus far, the step (often called the learning rate) is important. Choose too small a step and it will take forever to get to the end, as each change will be really small; choose too big a step and you risk "bouncing" around and still never reaching the end.![](my_icons/loss2.PNG) I do not know yet how to choose a step.
2. How do you choose the direction? In order to choose the direction you try to go in both directions first and see which one is closer to the minimum. For example, you try a value of x + 1 and x - 1, see which one gives the lower loss, and choose that one. (In practice the slope, or gradient, of the loss function gives this direction directly, which is where the name comes from.)
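
The effect of the step size can be tried out on a toy example. This sketch uses plain (not stochastic) gradient descent on a made-up loss, `(x - 3) ** 2`, and uses the derivative to pick the direction instead of probing x + 1 and x - 1. The function names and the three step values are assumptions for illustration.

```python
def loss(x):
    return (x - 3.0) ** 2      # toy loss with its minimum at x = 3

def gradient(x):
    return 2.0 * (x - 3.0)     # derivative of the toy loss

def descend(start, step, iterations):
    """Repeatedly move against the gradient by `step` times the slope."""
    x = start
    for _ in range(iterations):
        x = x - step * gradient(x)
    return x

tiny = descend(0.0, 0.001, 20)   # step too small: barely moves from 0
good = descend(0.0, 0.1, 20)     # reasonable step: ends close to 3
huge = descend(0.0, 1.1, 20)     # step too big: bounces further away each time
print(tiny, good, huge)
```

With the too-big step, every update overshoots the minimum by more than the previous one, which is the "bouncing" described above.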

# Epochs and learning

I said the NN learns incrementally. It does so in epochs. An epoch is one pass of the NN through the whole dataset. During each epoch it uses the loss function and gradient descent to make small adjustments to the weights and biases (with SGD, typically after every small batch of examples rather than once at the end), thus getting closer, step by step, to the minimum of the loss function.
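
Putting epochs and small adjustments together, here is a minimal sketch that learns a single weight m and bias n so that m\*x+n matches a made-up dataset. The data, the step size, the squared-error loss and the choice to update after every single example are all assumptions made for this illustration.

```python
import random

# Hypothetical dataset generated from y = 2 * x + 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

def train(data, epochs, step=0.02, seed=0):
    rng = random.Random(seed)
    m, n = 0.0, 0.0                     # weight and bias start at zero
    for _ in range(epochs):             # one epoch = one pass over the dataset
        samples = data[:]
        rng.shuffle(samples)            # "stochastic": random order each epoch
        for x, y in samples:
            error = (m * x + n) - y     # how far off the prediction is
            m -= step * 2 * error * x   # gradient of error**2 with respect to m
            n -= step * 2 * error       # gradient of error**2 with respect to n
    return m, n

m, n = train(data, epochs=500)
print(round(m, 3), round(n, 3))
```

After enough epochs, m and n should end up very close to the 2 and 1 used to generate the data.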

# Conclusion
 
There are a lot more things to explain, such as activation functions, back propagation, feed forward, etc., but this is my understanding of NNs so far.

Neural networks learn by changing the weights and biases between different neurons. These values are changed based on the loss function, which is minimized using a method called Stochastic Gradient Descent. 


In conclusion, neural networks are extremely complicated and my "explanation" doesn't even come close to the reality, but it could be considered a starting point towards understanding these neural networks.