# Neural Networks

## Biological Neuron
A biological neuron is an unusual-looking cell mostly found in animal brains. It’s composed of a cell body containing the nucleus and most of the cell’s complex components, many branching extensions called dendrites, plus one very long extension called the axon. The axon’s length may be just a few times longer than the cell body, or up to tens of thousands of times longer. Near its extremity the axon splits off into many branches called telodendria, and at the tip of these branches are minuscule structures called synaptic terminals (or simply synapses), which are connected to the dendrites or cell bodies of other neurons. Biological neurons produce short electrical impulses called action potentials (or simply signals), which travel along the axons and make the synapses release chemical signals called neurotransmitters. When a neuron receives a sufficient amount of these neurotransmitters within a few milliseconds, it fires its own electrical impulses.

<img src='images\neurons.png' height='200px'>
<img src='images\bnn.png' height='200px'>

## Artificial Neurons

Artificial neural networks are inspired by the human brain and how it is interconnected. Both have neurons, activations, and large interconnectivity. However, it is not a perfect comparison, as the underlying processes are different.


<img src='images\bio vs ai.png' height='300px'>

An artificial neuron is similar to what we saw in logistic regression; the only difference is that in logistic regression the activation function was always a sigmoid. In neural networks, there is a wide variety of activation functions to choose from. We will discuss activation functions later in this lesson.

Within a neural network, each neuron performs a computation akin to logistic regression. It takes input from the previous layer, applies weights to these inputs, sums them up, and then applies an activation function. This process resembles the calculation in logistic regression where features are weighted and summed before being passed through a sigmoid activation function.

A shallow neural network would have only 2 layers, one hidden and one output layer. 

<img src='images\shallow network.png' height='300px'>

Adding more hidden layers will make it a deep neural network.

<img src='images\2hidden.jpg' height='300px'>


**- Learning Complex Patterns:** By linking multiple neurons together in layers, neural networks can capture intricate patterns and relationships in the data. While logistic regression on its own is limited to linear decision boundaries, neural networks excel at modeling nonlinearities due to their layered structure and the activation functions applied at each neuron.

**- Scaling Complexity:** Just as complex structures can be built from simple building blocks, neural networks scale the capabilities of logistic regression by layering multiple logistic regression units. Each layer adds another level of abstraction, allowing the network to learn increasingly complex representations of the data.

### Anatomy of a Neuron:

**1. Inputs:** Neurons receive input signals from the previous layer or directly from the input data. These inputs represent the features of the dataset.

**2. Weights:** Each input is associated with a weight, which determines its importance in the computation. Just like logistic regression assigns coefficients to features, neural networks adjust weights during training to optimize performance.

**3. Activation Function:** After summing the weighted inputs, the neuron applies an activation function. This function introduces nonlinearity into the model, allowing neural networks to learn complex mappings between inputs and outputs. Common activation functions include the sigmoid function, which is used in logistic regression, as well as others like Softmax, ReLU (Rectified Linear Unit) and tanh.

**4. Output:** Finally, the neuron produces an output value, which is transmitted to the next layer of neurons. In classification tasks, this output typically represents the probability of a certain class, just as logistic regression outputs probabilities for binary classification problems.
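The four parts above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `neuron` and the input, weight, and bias values are hypothetical and chosen only for the example, and sigmoid is used as the activation to mirror logistic regression.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """One artificial neuron: weighted sum of the inputs, then an activation."""
    z = np.dot(weights, inputs) + bias   # weighted sum plus bias
    return activation(z)                 # value passed on to the next layer

# Hypothetical values, chosen only for illustration
x = np.array([0.5, -1.0, 2.0])   # 1. inputs (three features)
w = np.array([0.4, 0.3, -0.2])   # 2. weights (one per input)
b = 0.1                          # bias term

output = neuron(x, w, b)         # 3. activation applied, 4. output produced
print(output)
```

Swapping `activation` for another function changes the neuron's behaviour without touching the weighted sum.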

#### Notation

$$ f_i^{[l]}(x) = w_{i, 1}^{[l]}.f_1^{[l-1]}(x)+ w_{i, 2}^{[l]}.f_2^{[l-1]}(x) + b_i^{[l]} $$

- The superscript $[l]$ indicates the layer
- In the subscript $i, j$, $i$ indicates the neuron number within layer $l$, and $j$ indicates the coefficient number, i.e. which output of the previous layer the weight multiplies
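To make the indices concrete, here is a small sketch that stores the weights of one layer in a matrix, so that row $i$ holds the coefficients of neuron $i$. The layer size and all numeric values are assumptions made for illustration only. It computes the formula neuron by neuron and then for the whole layer at once.

```python
import numpy as np

# Hypothetical layer l with 2 neurons, fed by 2 outputs from layer l-1
f_prev = np.array([1.0, 2.0])     # f_1^{[l-1]}(x), f_2^{[l-1]}(x)
W = np.array([[0.5, -1.0],        # row 0 holds w_{1,1}, w_{1,2} (neuron 1)
              [2.0, 0.25]])       # row 1 holds w_{2,1}, w_{2,2} (neuron 2)
b = np.array([0.1, -0.3])         # b_1^{[l]}, b_2^{[l]}

# Neuron by neuron, term for term as in the formula
f_loop = np.array([
    W[i, 0] * f_prev[0] + W[i, 1] * f_prev[1] + b[i]
    for i in range(2)
])

# The same computation for the whole layer in one matrix product
f_layer = W @ f_prev + b

print(f_loop, f_layer)
```

Note that Python indices start at 0, so `W[i - 1, j - 1]` corresponds to $w_{i, j}^{[l]}$.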

## Activation Functions

Activation functions play a vital role in neural networks by transforming the input signal of a node into an output signal, which is then forwarded to the next layer. They enable neural networks to learn intricate patterns in data, breaking away from solely linear relationships. By introducing non-linearities, activation functions empower neural networks to capture and understand complex mappings between inputs and outputs. Without them, the network's capacity to learn would be limited.
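For reference, the activation functions named in this lesson (sigmoid, tanh, ReLU and Softmax) can be written directly in NumPy. This is a minimal sketch using their standard textbook definitions; the sample vector `z` is an arbitrary assumption.

```python
import numpy as np

def sigmoid(z):
    """Maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Maps any real value into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Keeps positive values and replaces negative ones with 0."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turns a vector of scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))   # subtracting the max avoids overflow
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])  # arbitrary example inputs
for fn in (sigmoid, tanh, relu, softmax):
    print(fn.__name__, fn(z))
```

Sigmoid, tanh and ReLU act on each value independently, while Softmax looks at the whole vector, which is why it is typically used on an output layer with several classes.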

So why do we need activation functions in the first place? A neural network whose layers all use linear activations, no matter how many layers it has, is equivalent to a single linear layer.

<img src='images\linear activation.png' height='200px'>

$$
\begin{aligned}
\text{Layer 1, neuron 1:} \quad f_1^{[1]}(x) &= w_{11}^{[1]}.x + b_1^{[1]} \\[10pt]
\text{Layer 1, neuron 2:} \quad f_2^{[1]}(x) &= w_{21}^{[1]}.x + b_2^{[1]} \\[20pt]
\text{Layer 2, neuron 1:} \quad f_1^{[2]}(x) &= w_{11}^{[2]}.f_1^{[1]}(x)+ w_{12}^{[2]}.f_2^{[1]}(x) + b_1^{[2]} \\[10pt]
f_1^{[2]}(x) &= w_{11}^{[2]}.(w_{11}^{[1]}.x + b_1^{[1]})+ w_{12}^{[2]}.(w_{21}^{[1]}.x + b_2^{[1]}) + b_1^{[2]} \\[10pt]
f_1^{[2]}(x) &= (w_{11}^{[2]}w_{11}^{[1]} + w_{12}^{[2]}w_{21}^{[1]})x + w_{11}^{[2]}b_1^{[1]} + w_{12}^{[2]}b_2^{[1]} + b_1^{[2]} \\[10pt]
f_1^{[2]}(x) &= w.x + b
\end{aligned}
$$
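The derivation above can be checked numerically. The sketch below draws random weights (an assumption, any values would do), evaluates the two stacked linear layers, and compares the result with the single line $w.x + b$ built from the combined coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer 1: two neurons with one input each. Layer 2: one neuron with two inputs.
w1 = rng.normal(size=2)   # w_{11}^{[1]}, w_{21}^{[1]}
b1 = rng.normal(size=2)   # b_1^{[1]},  b_2^{[1]}
w2 = rng.normal(size=2)   # w_{11}^{[2]}, w_{12}^{[2]}
b2 = rng.normal()         # b_1^{[2]}

def two_layer(x):
    """Two stacked layers with linear (identity) activations."""
    hidden = w1 * x + b1
    return w2 @ hidden + b2

# Combined coefficients from the last two lines of the derivation
w = w2 @ w1
b = w2 @ b1 + b2

xs = np.linspace(-3.0, 3.0, 7)
stacked = np.array([two_layer(x) for x in xs])
collapsed = w * xs + b

print(np.allclose(stacked, collapsed))
```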

We are back to a linear model... That's why we need activations to fit more complex relations. As an example, consider a neural network with 1 input, two hidden layers of 8 neurons each, and a 1-neuron output layer, trained to fit a sine function:

<img src='images\1881 nn.png' height='300px'>

With only linear activations, here is how it will fit a sine function:

<img src='images\1881 linear activation nn.png' height='200px'>

Whereas this is the result when we use a ***ReLU*** activation function:

<img src='images\relu function.png' height='200px'> &nbsp; <img src='images\1881 relu activation nn.png' height='200px'>
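One way to build intuition for this result: each ReLU neuron can add one "bend" to the output, so a sum of ReLU neurons is a piecewise-linear curve. The sketch below is not the trained network from the figures. It is a hand-built network with a single hidden layer of 8 ReLU neurons whose weights are set by hand (an assumption made purely for illustration) so that the bends sit on the sine curve.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hand-picked (not trained) parameters: 9 knots give 8 straight segments
knots = np.linspace(0.0, 2.0 * np.pi, 9)
slopes = np.diff(np.sin(knots)) / np.diff(knots)   # slope of each segment

# Output weight of neuron k = how much the slope changes at knot k
out_weights = np.concatenate(([slopes[0]], np.diff(slopes)))
out_bias = np.sin(knots[0])

def relu_net(x):
    """One hidden layer of 8 ReLU neurons followed by a linear output neuron."""
    x = np.asarray(x, dtype=float)
    hidden = relu(x[..., None] - knots[:-1])   # neuron k computes relu(x - knot_k)
    return hidden @ out_weights + out_bias

xs = np.linspace(0.0, 2.0 * np.pi, 200)
max_error = np.max(np.abs(relu_net(xs) - np.sin(xs)))
print(max_error)
```

A trained network finds its own bend positions and slopes with gradient descent instead of having them set by hand, but the resulting curve is built from the same kind of pieces.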


Let's see how using the ***ReLU*** activation function can help a neural network learn a sine function:

<video controls src="videos\how activations work.mp4"></video>

## Backpropagation

For many years, researchers struggled to find a way to train neural networks, without success. But in 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published a groundbreaking paper that introduced the backpropagation training algorithm, which is still used today.

In short, it is Gradient Descent using an efficient technique for computing the gradients automatically: in just two passes through the network (one forward, one backward), the backpropagation algorithm is able to compute the gradient of the network’s error with regard to every single model parameter.

In other words, it can find out how each connection weight and each bias term should be tweaked in order to reduce the error. Once it has these gradients, it just performs a regular Gradient Descent step, and the whole process is repeated until the network converges to the solution.
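The loop described above (forward pass, backward pass, Gradient Descent step, repeat) can be sketched by hand for a tiny network. Everything specific here is an assumption made for illustration: the 1-8-1 architecture, the tanh hidden activation, the mean squared error, the learning rate and the number of steps. In practice a deep learning library computes the backward pass automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = sin(x) on 50 points
X = np.linspace(-np.pi, np.pi, 50).reshape(-1, 1)   # shape (50, 1)
Y = np.sin(X)

# Hypothetical 1-8-1 network with a tanh hidden layer
params = {
    "W1": rng.normal(scale=0.5, size=(1, 8)), "b1": np.zeros(8),
    "W2": rng.normal(scale=0.5, size=(8, 1)), "b2": np.zeros(1),
}

def forward(p, X):
    """Forward pass: also return the hidden activations for the backward pass."""
    H = np.tanh(X @ p["W1"] + p["b1"])   # hidden layer, shape (50, 8)
    out = H @ p["W2"] + p["b2"]          # output layer, shape (50, 1)
    return H, out

def loss(p):
    """Mean squared error over the toy data."""
    _, out = forward(p, X)
    return np.mean((out - Y) ** 2)

def backward(p):
    """Backward pass: chain rule from the error back to every parameter."""
    H, out = forward(p, X)
    d_out = 2.0 * (out - Y) / len(X)               # d(loss) / d(out)
    d_pre = (d_out @ p["W2"].T) * (1.0 - H ** 2)   # back through tanh
    return {
        "W2": H.T @ d_out, "b2": d_out.sum(axis=0),
        "W1": X.T @ d_pre, "b1": d_pre.sum(axis=0),
    }

initial_loss = loss(params)
learning_rate = 0.01
for step in range(2000):
    grads = backward(params)                          # one forward + one backward pass
    for name in params:
        params[name] -= learning_rate * grads[name]   # Gradient Descent step
final_loss = loss(params)

print(initial_loss, final_loss)
```

Each iteration tweaks every weight and bias a little in the direction that reduces the error, which is why the final loss ends up below the initial one.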