# Neural Networks

### Goal of this series
In this series, we will learn how to build a neural network from scratch. We will use Python and its libraries to build the network. I plan to explain a bit of the math, but I will not go into too much detail. There will be both code and some theory. This presentation is created as we go: I will create a new one for each meeting. Everything will be available in git.

### Prerequisites
* You will need Python. I will use Python 3.10.
* You can use Jupyter Notebook (Anaconda, IntelliJ) as I do, or copy the code into your IDE.

- __[First Lecture](#First-Lecture)__
    - __[What is Neural Network](#What-is-Neural-Network)__
    - __[Single Neuron](#Single-Neuron)__
    - __[Activation Function](#Activation-Function)__
    - __[Forward Propagation](#Forward-Propagation)__


# First Lecture
In the first lecture I will explain what a neural network is and how it works in basic terms. I will introduce some terminology, show how to calculate the output of a single neuron, and then put the pieces together to achieve forward propagation.

### What is Neural Network
A Neural Network is a type of machine learning model inspired by the structure and function of the human brain. It is a multi-layered network of artificial "neurons," each of which performs a simple calculation on the input data and passes it on to the next layer.

Each neuron in a layer takes in a weighted sum of its inputs, applies an activation function to the result, and outputs the result as its own activation.

The weights of the connections between the neurons are adjusted during the training process using algorithms such as backpropagation. The goal of this training is to find the optimal values for the weights such that the network accurately predicts the target outputs for a given set of inputs.

Neural networks are used for a wide range of applications, including image and speech recognition, natural language processing, and game playing. They are particularly well-suited for problems where there is a lot of complex, non-linear data, and where a traditional, rule-based approach would be too cumbersome to develop.


### Single Neuron
A forward pass, also known as forward propagation, is a key operation in the evaluation of a neural network. It refers to the process of computing the outputs of a neural network, given a set of inputs and the current values of the weights and biases.

The forward pass starts with the input layer and propagates the inputs forward through the network, layer by layer, until it reaches the output layer. At each neuron, the weighted sum of its inputs is computed and passed through an activation function to produce an output, which is then passed on to the next layer as input. The outputs of the final layer are the predictions of the network.

Now how do we calculate this forward pass? To keep things simple, let's focus on just one neuron. Each neuron receives one or more inputs and has a bias associated with it. Each neuron in one layer is connected to all neurons in the next layer, and each connection is represented by a weight. For a single input, the output of the neuron is calculated as follows:
\begin{equation}
    \hat{y} = wx + b
\end{equation}
Where $w$ is the weight, $x$ is the input and $b$ is the bias.
But a neuron is usually connected to multiple neurons in the previous layer, so we need to calculate something called the weighted sum: each output of the previous layer is multiplied by the weight of its connection, and the products are added together with the bias. This is calculated as follows:
\begin{equation}
    \hat{y} = (w_1 x_1 + w_2 x_2 + \dots + w_n x_n) + b
\end{equation}
But as you can see this will quickly become tedious. So to simplify this we can use vectors and linear algebra to calculate the sum. This would be denoted like this: (note that instead of using the sum we are using a dot product of two vectors which achieves the same result)
\begin{equation}
    \hat{y} = b + \textbf{w} \cdot \textbf{x}
\end{equation}
Where $\textbf{x}$ is the vector of inputs: $\textbf{x} = [x_1, x_2, \dots, x_n]$, $\textbf{w}$ is the vector of weights: $\textbf{w} = [w_1, w_2, \dots, w_n]$ and $b$ is the bias, as before.

Now let's see some code:

In [8]:
import numpy as np # NumPy is the fundamental package for linear algebra and multidimensional arrays.

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0

output = np.dot(weights, inputs) + bias # Using numpy the weighted sum is very simple. Using the dot product we get exactly what we want.

print('Our first neuron output yay!: ', output)

Our first neuron output yay!:  4.8
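
To see that the dot product really is the same weighted sum written out term by term, here is a minimal sketch that computes both for the values used above. The helper name `weighted_sum_loop` is made up for this illustration.

```python
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0


def weighted_sum_loop(weights, inputs, bias):
    """Hypothetical helper: add up w_i * x_i one term at a time, then add the bias."""
    total = 0.0
    for w, x in zip(weights, inputs):
        total += w * x
    return total + bias


loop_output = weighted_sum_loop(weights, inputs, bias)
dot_output = np.dot(weights, inputs) + bias

# Both approaches agree up to floating-point rounding.
print(np.isclose(loop_output, dot_output))
```

The loop makes every multiplication visible, while `np.dot` does the same work in a single call and scales better as the number of inputs grows.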


### Activation Function
One key part of the neuron that I skipped over is the activation function. This is a function applied to the weighted sum of the neuron to introduce non-linearity into the network. This is important because without it the whole network, no matter how many layers it has, would be just a linear function of its inputs, which would be very limiting. Now what does that actually mean? Let's look at some [graphs](https://www.desmos.com/calculator/con0g5kv6z). In this graph we are using something called the ReLU function, or Rectified Linear Unit. This function is defined as follows:
\begin{equation}
    f(x) = \max(0, x)
\end{equation}
This function is very simple: it returns the input if it is positive and 0 otherwise. Because it is so cheap to calculate and still introduces non-linearity, it is used in many neural networks.

In the graph we can clearly see the difference between a linear function and a non-linear function like ReLU. You can go to the folders "W and B Linear" and "W and b Relu" to play with the values of the biases and weights. The goal is to fine-tune the values to fit the displayed data as closely as possible. If you play with the knobs for a minute, you can clearly see that the linear function is very limited in how it can fit the data, while the ReLU function can fit it very well because it is non-linear. This is why it is so important to use non-linear functions in neural networks.
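
The claim that a network without activation functions is just a linear function can also be checked numerically. The sketch below uses made-up shapes and random values (none of them come from the graph) and shows that two stacked linear layers give the same result as a single linear layer with combined weights and biases.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

x = rng.standard_normal((1, 4))    # one sample with 4 features (illustrative shape)
W1 = rng.standard_normal((4, 3))   # first layer: 4 inputs -> 3 neurons
b1 = rng.standard_normal((1, 3))
W2 = rng.standard_normal((3, 2))   # second layer: 3 inputs -> 2 neurons
b2 = rng.standard_normal((1, 2))

# Two linear layers applied one after the other, with no activation in between...
stacked = (x @ W1 + b1) @ W2 + b2

# ...are equivalent to one linear layer with combined weights and biases.
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
collapsed = x @ W_combined + b_combined

print(np.allclose(stacked, collapsed))
```

However many linear layers are stacked this way, they can always be folded into one, so the extra layers add no expressive power until a non-linear function is placed between them.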

Some other activation functions are:
* Sigmoid
* Tanh
* Softmax
* Leaky ReLU
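
For reference, here is a minimal NumPy sketch of the activation functions listed above, using their usual textbook definitions. The function names and the `alpha=0.01` slope for Leaky ReLU are choices made for this illustration, not something fixed by this series.

```python
import numpy as np


def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    # Squashes any real number into the range (-1, 1).
    return np.tanh(x)


def softmax(x):
    # Turns a vector of scores into probabilities that sum to 1.
    # Subtracting the maximum first avoids overflow in np.exp.
    shifted = np.exp(x - np.max(x))
    return shifted / np.sum(shifted)


def leaky_relu(x, alpha=0.01):
    # Like ReLU, but negative inputs are scaled by a small slope instead of set to 0.
    return np.where(x > 0, x, alpha * x)


values = np.array([-2.0, 0.0, 3.0])
print(sigmoid(values))
print(tanh(values))
print(softmax(values))
print(leaky_relu(values))
```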

So if we now want to adjust our neuron to use the ReLU function, we can do it like this:
\begin{equation}
    g(x) = \max(0, x)
\end{equation}
\begin{equation}
    \hat{y} = g(b + \textbf{w} \cdot \textbf{x})
\end{equation}
Now we have fully calculated a single neuron. This is the basic building block of neural networks. We can now use this to do something called forward propagation.

In [6]:
import numpy as np # NumPy is the fundamental package for linear algebra and multidimensional arrays.

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0

g = lambda x: max(0, x) # This is the ReLU function. We can use lambda to define it.

output = g(np.dot(weights, inputs) + bias) # Apply ReLU to the weighted sum plus bias. The result is unchanged here because 4.8 is positive.

print('Our first neuron output yay!: ', output)

Our first neuron output yay!:  4.8
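
With these values the weighted sum is positive, so ReLU leaves the output at 4.8. To see the function actually change something, the sketch below reuses the same inputs and weights with a hypothetical bias of -5.0, which pushes the weighted sum below zero.

```python
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
negative_bias = -5.0  # assumed value, chosen only to make the weighted sum negative


def relu(x):
    # ReLU: keep positive values, replace everything else with 0.
    return max(0.0, x)


pre_activation = np.dot(weights, inputs) + negative_bias  # about -2.2
output = relu(pre_activation)

print('Before ReLU:', pre_activation)
print('After ReLU: ', output)  # 0.0
```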


### Forward Propagation
So how do we put it together? Putting it together isn't as hard as it may seem. We have seen how we can use vectors to simplify some calculations. Now we will step it up a notch and use matrices. This will allow us to calculate the output of multiple neurons at once. This is very important because we usually have multiple neurons in a layer. This is how we can calculate the output of a layer:
\begin{equation}
    y = g(\textbf{b} + \textbf{W} \cdot \textbf{X})
\end{equation}

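Looks very similar, right? But now we use matrices for the inputs and weights and a vector for the biases. The order of the matrix product depends on how the matrices are laid out: the `Layer` class below stores one sample per row of the inputs and one neuron per column of the weights, so it computes `np.dot(inputs_x, self.weights)`.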
\[
X = \begin{bmatrix}
    x_{11} & \dots  & x_{1j}\\
    \vdots & \ddots & \vdots\\
    x_{n1} & \dots  & x_{nj}
    \end{bmatrix}
\qquad
W = \begin{bmatrix}
    w_{11} & \dots  & w_{1j}\\
    \vdots & \ddots & \vdots\\
    w_{n1} & \dots  & w_{nj}
    \end{bmatrix}
\qquad
b = \begin{bmatrix}
    b_1\\
    \vdots\\
    b_n
    \end{bmatrix}
\]
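
To connect the matrix form back to the single neuron, the sketch below uses small made-up numbers and checks that one matrix product gives the same values as computing each neuron separately with a dot product. It assumes a layout where each row of the input matrix is one sample and each column of the weight matrix belongs to one neuron.

```python
import numpy as np

# Two samples with three features each (illustrative values).
X = np.array([[1.0, 2.0, 3.0],
              [0.5, -1.0, 2.0]])
# Three inputs feeding two neurons: column k holds the weights of neuron k.
W = np.array([[0.2, -0.4],
              [0.8, 0.1],
              [-0.5, 0.3]])
b = np.array([2.0, -1.0])  # one bias per neuron

# Whole layer at once: one matrix product plus the bias vector, then ReLU.
layer_output = np.maximum(0, X @ W + b)

# Same calculation, one sample and one neuron at a time.
manual = np.zeros((2, 2))
for i in range(X.shape[0]):        # loop over samples
    for k in range(W.shape[1]):    # loop over neurons
        manual[i, k] = max(0.0, np.dot(X[i], W[:, k]) + b[k])

print(layer_output)
print(np.allclose(layer_output, manual))
```

The bias vector is added to every row automatically through NumPy broadcasting, which is why no explicit loop over samples is needed in the matrix version.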

In [23]:
import numpy as np

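class Activation:
    def relu(self, x):
        # Element-wise ReLU: negative values become 0.
        return np.maximum(0, x)

class Layer:
    def __init__(self, n_inputs: int, n_neurons: int):
        # Small random weights, one column per neuron; biases start at zero.
        self.weights = 0.10 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def linear(self, inputs_x):
        # Weighted sum plus bias for every neuron in the layer (no activation applied here).
        return np.dot(inputs_x, self.weights) + self.biases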


inputs_x = np.random.randn(1, 10)

layer1 = Layer(10, 3)
layer2 = Layer(3, 2)

layer1Out = layer1.linear(inputs_x)
layer2Out = layer2.linear(layer1Out)

print(layer2Out)

[[ 0.03637784 -0.04531111]]
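
The cell above only chains the linear parts of the two layers; the `Activation` class is defined but not yet used. A minimal sketch of a full forward pass could look like the following. It assumes ReLU is applied after every layer, including the last one, which is a simplification. The `DenseLayer` class, its `forward` method, and the batch of 5 samples are hypothetical and not part of the original code.

```python
import numpy as np


class DenseLayer:
    """Hypothetical layer that combines the weighted sum with a ReLU activation."""

    def __init__(self, n_inputs: int, n_neurons: int, rng: np.random.Generator):
        # Same initialisation idea as in the cell above: small random weights, zero biases.
        self.weights = 0.10 * rng.standard_normal((n_inputs, n_neurons))
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs_x):
        linear_part = np.dot(inputs_x, self.weights) + self.biases
        return np.maximum(0, linear_part)  # ReLU, element-wise


rng = np.random.default_rng(42)          # seed chosen arbitrarily for repeatability
inputs_x = rng.standard_normal((5, 10))  # a batch of 5 samples with 10 features each

layer1 = DenseLayer(10, 3, rng)
layer2 = DenseLayer(3, 2, rng)

# Forward propagation: the output of one layer is the input of the next.
prediction = layer2.forward(layer1.forward(inputs_x))

print(prediction.shape)  # (5, 2): one row per sample, one column per output neuron
```

Passing the random generator into each layer keeps the example repeatable; the original cell uses the global `np.random.randn`, which gives different weights on every run.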
