<a href="https://colab.research.google.com/github/MaralAminpour/IVM_supplementary_materials/blob/main/NN_Introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neuron structure

- **Dendrites**: Think of these as the neuron's 'inbox' – they receive messages in the form of chemical signals from other neurons.
- **Soma**: This is the 'main office' of the neuron where all the incoming messages get read and sorted out. It's like the boss deciding if there's enough reason to pass on a message.
- **Axon**: Consider this as the 'delivery guy' – if the soma gives a thumbs up, the axon carries the 'go' signal down the line to other neurons.
- **Myelin Sheath**: Some axons have this special 'insulation tape' wrapped around them, helping the 'go' signal travel faster and further without getting weaker.
- **Synapses**: These are the 'meet-and-greet' spots where one neuron connects to another. They're the social hubs of the neuron world.
- **Chemical Synapses**: Picture a tiny space where the pre-synaptic neuron (the sender) doesn't actually touch the post-synaptic neuron (the receiver). Instead, it sends chemical messengers across this gap to pass on its message.
- **Neurotransmitters**: These are the 'text messages' sent by the pre-synaptic neuron, which tell the receiving neuron what to do next – either get excited and send its own message or chill out and remain quiet.
- **Excitatory vs. Inhibitory Synapses**: It's like having two kinds of text messages – one kind gets you all hyped up to do something ('Let's go!'), and the other is more like a calming message to take it easy ('Relax, no rush').

In summary, neurons are like tiny, busy offices that take in information, decide what's important, and then pass on messages to the next neuron in line. The whole process is a mix of receiving signals, making decisions, and sending responses.


- **Dendrites**: These are extensions of the neuron that act as the input channels. They receive chemical signals from other neurons and convert them into electrical signals.
- **Soma**: Also known as the cell body, this part of the neuron integrates the electrical signals received from dendrites to determine if the neuron will activate and send a signal along to other neurons.
- **Axon**: This is a long, slender projection that transmits the electrical signal (action potential) away from the neuron's soma toward other neurons.
- **Myelin Sheath**: Some axons are wrapped in this protective sheath, which helps speed up the transmission of the action potential over long distances.
- **Synapses**: These are the junctions where neurons communicate with each other, transferring information from one neuron to the next.
- **Chemical Synapses**: In this type, there's a small gap between neurons where the signal is transferred using chemical messengers called neurotransmitters, which are released from one neuron and bind to receptors on the next.
- **Neurotransmitters**: These chemicals can either excite the next neuron, prompting it to send a signal, or inhibit it from sending a signal.
- **Excitatory and Inhibitory Synapses**: These are the two types of chemical synapses. Excitatory synapses encourage the next neuron to send a signal, while inhibitory synapses discourage it from doing so.

Each component of the neuron plays a critical role in processing and transmitting information throughout the nervous system.

Biological Neural Networks (BNNs) and Artificial Neural Networks (ANNs) have distinct parts that correspond to each other, underpinning their conceptual similarities:

**Parts of Biological Neural Networks:**

- **Neurons**: The fundamental cells that process and transmit information through electrical and chemical signals.
- **Dendrites**: Receive signals from other neurons.
- **Soma (Cell Body)**: Integrates incoming signals to determine if the neuron will activate.
- **Axon**: Transmits the electrical signal to other neurons.
- **Synapses**: Junctions where neurons communicate, using neurotransmitters to send signals.
- **Myelin Sheath**: Insulation around some axons that speeds up signal transmission.
- **Neurotransmitters**: Chemicals that transmit signals across synapses.

**Parts of Artificial Neural Networks:**

- **Artificial Neurons (Nodes)**: Basic processing units that simulate biological neurons.
- **Inputs**: Analogous to dendrites, they receive data to be processed.
- **Weights**: Equivalent to the strength of synaptic connections, determining the influence of inputs.
- **Activation Function**: Serves a similar purpose to the soma, deciding the level of output signal based on input strength.
- **Outputs**: Correspond to the axon, transmitting the signal to the next layer or as a final output.
- **Layers**: Structured groupings of nodes, including input, hidden, and output layers.
- **Learning Algorithm (e.g., Backpropagation)**: Method for adjusting weights in the network, similar to how experiences rewire synaptic connections.

**Similarities:**

- **Signal Processing**: Both BNNs and ANNs process information through a network of interconnected units (neurons/nodes).
- **Adaptation**: Neurons in BNNs adapt through changes in synaptic strength, while ANNs adapt through changes in weights.
- **Integration and Activation**: Neurons integrate signals and fire based on a threshold; similarly, nodes calculate weighted sums and apply an activation function.
- **Transmission**: Just as axons transmit signals to other neurons, ANNs transmit processed data from one node to the next.
- **Learning**: Both networks learn from repeated exposure to stimuli (data), although the mechanisms differ (biological processes vs. computational algorithms).

The conceptual similarity is rooted in the inspiration ANNs take from BNNs, using an abstracted and simplified model to replicate the complex patterns of data processing and learning observed in biological systems.

- **Artificial Neuron Basics**: They're simplified versions of biological neurons, where the complex brain processes are approximated in a limited fashion.
- **Signal Multiplication**: Inputs ($x_i$) are multiplied by corresponding weights ($w_i$) to represent the synaptic strength in the network.
- **Adding Bias**: A bias term is added to shift the activation threshold away from zero, adjusting the response of the neuron.
- **Activation Function**: The weighted sum of inputs plus the bias is passed through a nonlinear activation function ($f$). The neuron 'fires' (produces a large output) when this sum is large enough for $f$ to return a high value.
- **Learning Process**: Neural networks learn by adjusting weights and biases to improve performance on specific tasks.
- **Engineering vs. Biology**: While artificial neurons are inspired by biology, their design in fields like computer vision is often driven by engineering needs rather than an attempt to closely simulate brain function.
- **Distinct Research**: There are networks explicitly designed to mimic biological processes, but these biologically inspired networks are a different branch of study within the broader field of artificial intelligence.
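The bullet points above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `neuron_output`, the example numbers, and the choice of `tanh` and ReLU as activation functions are assumptions made here, not something prescribed by these notes.

```python
import math

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    z = sum(w_i * x_i for w_i, x_i in zip(weights, inputs)) + bias
    return activation(z)

def relu(z):
    """Rectified linear unit: passes positive values, outputs 0 otherwise."""
    return max(0.0, z)

# Hypothetical inputs, weights, and bias chosen for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.5

# Weighted sum plus bias: 0.5*1 - 1.0*2 + 0.25*3 + 0.5 = -0.25
print(neuron_output(x, w, b, math.tanh))  # small negative output
print(neuron_output(x, w, b, relu))       # 0.0, because the sum is negative
```

Swapping the `activation` argument is all it takes to change how the same weighted sum is turned into an output.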

Artificial neurons in a neural network are meant to somewhat replicate the way biological neurons work, but the key word here is "somewhat." They do this by taking inputs (think of these as signals or pieces of information) and assigning each one a weight—this is similar to how biological neurons have different strengths of connections. These weights are like the volume knobs for each input, determining how much each signal should be amplified or dialed down.

Then there's the bias. You can think of the bias as setting the baseline or starting point for where the decision-making begins. It's not just about whether the signals add up to a positive or negative number; the bias adjusts the level at which the neuron's output becomes significant enough to be considered.

After all the inputs have been adjusted by their weights and the bias is added, we don't just add them up to get a result. Instead, we pass this sum through a special function called the activation function. This function is designed to introduce complexity into the equation, allowing the neuron to make more nuanced decisions than just a simple yes or no. It's like deciding whether to forward an email based on a quick scan—it needs to be interesting enough, not just the first one in your inbox.

The whole goal of a neural network is to figure out the best weights and biases to use for making predictions or decisions, which is done during the learning phase. The network adjusts these weights and biases little by little, each time it looks at new data, trying to get better at whatever job it's supposed to do.

While all this is inspired by how our brains work, artificial neurons are a lot simpler and more focused on specific tasks than the vast complexity of human neurons. The design choices in artificial neural networks, especially in areas like computer vision, are more about engineering the best solution for a problem rather than mimicking the human brain's functionality. There are some neural networks that really try to get close to biological reality, but they're a different kind of project and aren't as common in everyday tech.

Artificial neurons are the building blocks of machine learning models, and they draw inspiration from the human brain's neurons. But let's be clear, they're a simplified version. Each artificial neuron acts like a tiny filter for whatever data you throw at it, deciding what's important and what's not by assigning weights to the inputs—kind of like giving different levels of attention to the information it receives.

Now, let's break down the process:

- We start with inputs, which are your raw data points or features. Each one is paired with a weight, representing its importance in the decision-making process. It's a bit like focusing on specific aspects of a situation to make a decision.
- Next up is the bias. This is like a built-in judgment that shifts the starting point of the calculation. It ensures that the neuron doesn't activate (or fire) for just any input combination. There's a threshold that needs to be crossed, which the bias helps to set.
- After the inputs are weighted and the bias is added, we don't just sum it up. This total goes through an activation function, which decides whether the neuron should activate. This function allows the neuron to pick up on more complex patterns by introducing nonlinearity to the system.
- The real magic happens during the learning phase. The network tweaks the weights and biases in response to the errors it makes, slowly improving its accuracy and decision-making capabilities. This is somewhat similar to learning from mistakes and experiences, although it's all about calculations and adjustments in the artificial setup.

Artificial neurons are focused on efficiency and solving specific tasks, often taking a more engineered approach rather than trying to mimic the exact way our brains work. While some neural networks aim to be more like our brain's neural networks, they're usually part of specialized research and not what you typically see in everyday tech applications.

The human brain is an incredibly complex network comprised of around 86 billion neurons, all linked together by an estimated 100 trillion to 1 quadrillion synapses. The accompanying illustration simplifies this complexity by showing a sketch of a biological neuron next to its mathematical representation.

In essence, a neuron processes incoming signals at its dendrites and sends outgoing signals down its axon. This axon then splits and forms connections, through synapses, with the dendrites of other neurons. In the mathematical model we use for computation, the signals (like $x_0$) moving along the axon are modified by weights ($w_0$) representing the synapse's strength, which determines how much one neuron will influence another. These weights are adjustable, allowing the system to learn and determine the impact of one neuron's activity on another, whether it be stimulating (positive weight) or dampening (negative weight).

In this model, signals received by the dendrites are summed up in the cell body. If this total surpasses a specific limit, the neuron 'fires', sending a signal down its axon. Simplifying further for computational purposes, we don't consider the exact timing of these signals; instead, we focus on the rate of firing. This rate is modeled using an activation function, denoted as $ f $, to represent the frequency of the output signals.

Historically, the sigmoid function ($\sigma$) has been a popular choice for an activation function because it compresses a real-valued input (the accumulated signal strength) into a range between 0 and 1. Later on, we'll dive into the specifics of various activation functions and their roles in neural computation.

# Biological motivation and connections
The basic computational unit of the brain is a neuron. Approximately 86 billion neurons can be found in the human nervous system, and they are connected by approximately $10^{14}$ to $10^{15}$ synapses. The diagram below shows a cartoon drawing of a biological neuron (left) and a common mathematical model (right). Each neuron receives input signals from its dendrites and produces output signals along its (single) axon. The axon eventually branches out and connects via synapses to dendrites of other neurons.

In the computational model of a neuron, the signals that travel along the axons (e.g. $x_0$) interact multiplicatively (e.g. $w_0 x_0$) with the dendrites of the other neuron based on the synaptic strength at that synapse (e.g. $w_0$). The idea is that the synaptic strengths (the weights $w$) are learnable and control the strength of influence (and its direction: excitatory (positive weight) or inhibitory (negative weight)) of one neuron on another. In the basic model, the dendrites carry the signal to the cell body where they all get summed. If the final sum is above a certain threshold, the neuron can fire, sending a spike along its axon.

In the computational model, we assume that the precise timings of the spikes do not matter, and that only the frequency of the firing communicates information. Based on this rate code interpretation, we model the firing rate of the neuron with an activation function $f$, which represents the frequency of the spikes along the axon. Historically, a common choice of activation function is the sigmoid function $\sigma$, since it takes a real-valued input (the signal strength after the sum) and squashes it to range between 0 and 1. We will see details of these activation functions later in this section.



# Modeling one neuron

Neural Networks started off trying to mimic how the brain's neurons work, but they've since shifted towards more of an engineering approach to improve how we do machine learning. Even so, it's useful to have a quick look at the biological inspiration behind it all before we dive deeper.


In a computational model, a neuron is represented as a node that processes incoming signals and produces an output. Here's how it works: signals travel through pathways analogous to axons in the biological sense, where each signal is assigned a variable (like $x_0$). This signal is then scaled by a corresponding weight (such as $w_0$) before it reaches the next neuron, giving the product $w_0 x_0$. The weight $w_0$ represents the strength and type of connection: a positive weight amplifies the signal (excitatory), while a negative weight diminishes it (inhibitory). This mimics the way biological synapses control the influence of one neuron on another.

The weights—these crucial elements of the neural network—are adjustable. They are 'learned' through repeated exposure to data during the training process. As the network processes more data, it adjusts the weights to improve its predictions, much like how our brains strengthen or weaken synaptic connections based on experiences.

Once the signals reach a neuron, they are collected by structures akin to dendrites and brought together in the neuron's main body, just like the summing junction in biological neurons. Here, all the incoming weighted signals are totaled. If this total surpasses a specific threshold—a predefined limit that determines whether the neuron should activate—the neuron outputs a signal. This output then travels down what would be considered the neuron's axon in a biological context.
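As a rough illustration of this "sum, then compare with a threshold" idea, here is a minimal sketch of a threshold unit. The function name `threshold_neuron` and all numbers are hypothetical, and the hard 0/1 output is a simplification used only for this example.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return 1 (fire) if the total weighted input exceeds the threshold, else 0."""
    total = sum(w_i * x_i for w_i, x_i in zip(weights, inputs))
    return 1 if total > threshold else 0

# Three incoming signals: two excitatory connections (positive weights)
# and one inhibitory connection (negative weight).
signals = [1.0, 1.0, 1.0]
weights = [0.8, 0.6, -0.5]

# The weighted total is about 0.9.
fires = threshold_neuron(signals, weights, threshold=0.5)   # 1: total is above 0.5
silent = threshold_neuron(signals, weights, threshold=1.0)  # 0: total is below 1.0
print(fires, silent)
```

The same inputs and weights give different outputs depending on the threshold, which is the role the bias plays in the artificial neuron.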

One simplification in this computational model is that we don't consider the exact timing of each signal. Instead, we focus on how often the neuron fires, or its firing rate. This rate is assumed to carry the essential information, rather than the precise pattern of spikes. This approach is based on the 'rate code' theory of neural communication, which postulates that it's the number of spikes over time, not the exact timing of them, that's most important for conveying information.

# Artificial neurons vs Biological neurons

The concept of artificial neural networks comes from biological neurons found in animal brains, so the two share many similarities in structure and function.

**Structure:** The structure of artificial neural networks is inspired by biological neurons. A biological neuron has a cell body or soma **to process the impulses**, dendrites **to receive them**, and an axon that **transfers them to other neurons**. The input nodes of artificial neural networks receive input signals, the hidden layer nodes process these signals, and the output layer nodes compute the final output by processing the hidden layer’s results using **activation functions**.

**Synapses:** Synapses are the links between biological neurons that enable the **transmission of impulses from the axon of one neuron to the dendrites of the next**. In artificial neural networks, the counterpart of synapses is the **weights that join the nodes of one layer to the nodes of the next layer**. The strength of each link is determined by its weight value.

**Learning:** In biological neurons, the cell body (soma), which contains the nucleus, helps to **process the impulses**. An action potential is produced and travels along the axon if the impulses are strong enough to reach the threshold. Learning is made possible by synaptic plasticity, the ability of synapses to become stronger or weaker over time in response to changes in their activity.

In artificial neural networks, **backpropagation** is a technique used for learning, which **adjusts the weights between nodes according to the error or differences between predicted and actual outcomes.**

**Activation:** In biological neurons, activation corresponds to the neuron firing, which happens when the **impulses are strong enough to reach the threshold**. In artificial neural networks, a mathematical function known as an activation function maps a node's input to its output.

# How do Artificial Neural Networks learn?

Artificial neural networks are trained using a training set. For example, suppose you want to teach an ANN to recognize a cat. The network is shown thousands of different images, each labeled by a human as either a cat image or not, and it classifies each one. The network's output is then compared with the human-provided label. If the ANN classifies an image incorrectly, backpropagation is used to adjust what it has learned: the weights of the connections between ANN units are fine-tuned based on the error. This process continues until the network recognizes cats in images with the lowest error rate it can reach. Once training is complete, the network is checked on images it has not seen before to confirm that it identifies cat images correctly.

In the realm of artificial neural networks, each neuron—or rather, each node—calculates the dot product of its inputs and the corresponding weights. This process is akin to taking several pairs of numbers, multiplying each pair together, and then summing up all the products. This sum is then adjusted by adding a bias, a unique value for each neuron that helps to fine-tune the output.

Following this, the neuron applies an activation function, which in many introductory cases is the sigmoid function. The sigmoid function, expressed mathematically as $ \sigma(x) = \frac{1}{1 + e^{-x}} $, has a distinctive 'S' shaped curve. **It smoothly maps the input values, which can be any real number, to a range between 0 and 1.** This is useful because it **converts the dot product—a potentially large or small value—into something manageable** that tells the network how 'activated' or 'fired up' the neuron should be.
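To see the squashing behaviour concretely, here is a small sketch that evaluates the sigmoid at a few points. The two-branch form is an implementation choice made here to avoid overflow in `math.exp` for large negative inputs; it computes the same function as the formula above.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so this cannot overflow
    return ez / (1.0 + ez)

# Large negative inputs land near 0, large positive inputs near 1,
# and an input of 0 lands exactly in the middle at 0.5.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.5f}")
```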

The choice of the sigmoid function isn't arbitrary. It's historically popular because it closely resembles the way biological neurons seem to work: they either fire or they don't, with a gradual buildup as inputs increase. However, it's not the only activation function used in neural networks. Towards the end of this section, we'll explore other activation functions that can be employed, each with its own mathematical characteristics and use cases, tailored to different aspects of learning and pattern recognition that the network aims to achieve.

**Coarse model**: It's crucial to acknowledge that the way we model neurons in artificial neural networks is quite simplified compared to their biological counterparts. Real neurons come in various types, each with unique characteristics and functions. In a living brain, dendrites are responsible for intricate nonlinear calculations, far beyond the simple signal processing we typically model in artificial networks. Also, synapses in nature are not merely single values representing strength; they're dynamic and complex, involving a multitude of factors that affect their behavior.

Furthermore, in certain biological systems, the precise timing of a neuron's firing—down to the exact millisecond—is critical, which challenges the assumption that the frequency of firing is all that matters (the rate code model). Given these complexities, and many others we simplify or overlook, it's common for neuroscientists to express a bit of frustration when direct comparisons are made between the functioning of neural networks in AI and the workings of the human brain.

# Single neuron as a linear classifier

The operational principle of a single neuron in a neural network is mathematically analogous to **linear classifiers** you may have encountered before. A neuron can exhibit a preference for certain input patterns, showing a high level of activation (close to 1) for favored patterns or a low level of activation (close to 0) for others. By coupling this neuron with a suitable **loss function**, we can mold it into a linear classifier.

**Binary Softmax classifier**: Take the Binary Softmax classifier as an instance. Here, the output of the neuron after applying the sigmoid function, denoted as $\sigma(\sum_{j} w_j x_j + b)$ where $j$ indexes the input features, can be interpreted as the probability $P(y_i = 1 \mid x_i; w)$ that the class label $y_i$ of example $i$ is 1, given that example's input features $x_i$ and the learned parameters $w$ (the weights) and $b$ (the bias). The probability of the alternative class (where $y_i$ is 0) is simply $1 - P(y_i = 1 \mid x_i; w)$, ensuring that the total probability sums to 1. The **cross-entropy loss function**, familiar from linear classification contexts, can then be applied to optimize this neuron, yielding a binary Softmax classifier, also recognized as **logistic regression**. The prediction is class 1 when the neuron's output is above 0.5 and class 0 otherwise, since the sigmoid function's output is confined to the range from 0 to 1.
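A minimal sketch of this interpretation for a single example is shown below. The helper names `predict_proba` and `cross_entropy`, and the weights and inputs, are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability that the label is 1, for one example with feature list x."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sigmoid(score)

def cross_entropy(y, p):
    """Binary cross-entropy for a label y in {0, 1} and a predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

# Hypothetical learned parameters and one input example.
w, b = [2.0, -1.0], 0.0
x = [1.0, 1.0]                  # score = 2 - 1 + 0 = 1

p = predict_proba(x, w, b)      # sigmoid(1), roughly 0.73
label = 1 if p > 0.5 else 0     # decision threshold at 0.5

print(p, label)
print(cross_entropy(1, p))      # small loss if the true label is 1
print(cross_entropy(0, p))      # larger loss if the true label is 0
```

The loss is low when the predicted probability agrees with the true label and grows as the two disagree, which is what training pushes against.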

**Binary SVM classifier**: Alternatively, if we choose to attach a **max-margin hinge loss** to the neuron's output, we steer the training process towards a binary Support Vector Machine (SVM) classifier. This type of classifier aims for the largest margin between the data points of different classes, which can be advantageous for generalization.
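For comparison, here is a sketch of the hinge loss. It follows the usual convention (an assumption here, since the notes do not spell it out) that labels are coded as $-1$ and $+1$ and that the loss is applied to the raw score $\sum_j w_j x_j + b$ rather than to a sigmoid output.

```python
def hinge_loss(y, score):
    """Max-margin hinge loss for a label y in {-1, +1} and a raw score."""
    return max(0.0, 1.0 - y * score)

# Correct side of the boundary with a margin of at least 1: no loss.
print(hinge_loss(+1, 2.5))
# Correct side, but inside the margin: a small loss.
print(hinge_loss(+1, 0.3))
# Wrong side of the boundary: a larger loss.
print(hinge_loss(-1, 0.3))
```

Unlike cross-entropy, the hinge loss is exactly zero once an example is classified correctly with enough margin, so such examples stop influencing the weights.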

**Regularization interpretation**: From another angle, if we consider regularization, common to both SVM and Softmax classification, it can be likened to a process of **gradual forgetting within a biological framework**. Regularization penalizes large weights, effectively nudging them towards zero with each update during training, akin to how a biological system might gradually diminish the synaptic strengths that are not frequently used.
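One way to see the "forgetting" effect is to write out a single gradient descent update with L2 regularization. The symbols here are assumptions for illustration: $L$ is the data loss, $\frac{\lambda}{2}\lVert w \rVert^2$ is the added penalty, $\eta$ is the learning rate, and $\lambda$ is the regularization strength.

$$
w \leftarrow w - \eta \left( \frac{\partial L}{\partial w} + \lambda w \right) = (1 - \eta \lambda)\, w - \eta \frac{\partial L}{\partial w}
$$

The factor $(1 - \eta \lambda)$ is slightly less than 1, so every update shrinks each weight a little before the data-driven correction is applied. A weight that the data never pushes on decays steadily toward zero.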

In essence, even a solitary neuron within a neural network has the potential to act as a binary classifier—whether it's a Softmax or an SVM—by drawing on these mathematical principles and interpretations.

# How do Artificial Neural Networks learn?

Artificial Neural Networks (ANNs) learn through a process called training, during which they adjust their internal parameters to make better predictions or decisions based on input data. Here's a step-by-step breakdown of how this learning process typically works:

1. **Initialization**: Before learning begins, the weights (the parameters that determine the importance of input signals) and biases (offsets added to each neuron's weighted sum) are initialized. Weights are usually set to small random values, and biases are often set to zero.

2. **Feedforward**: During the feedforward phase, input data is passed through the network. Each neuron in the network processes its input by computing a weighted sum of the inputs, adding a bias, and applying an activation function to the result. The activation function's output determines the neuron's output signal, which then becomes the input for the next layer in the network.

3. **Loss Calculation**: The output of the network is compared to the desired output, and the difference between them is calculated using a loss function. The loss function measures the error of the network's predictions and provides a single value that the network aims to minimize through training.

4. **Backpropagation**: Backpropagation is used to calculate the gradient of the loss function with respect to each weight and bias in the network. This process involves applying the chain rule from calculus to compute the gradients systematically from the output layer back through to the input layer.

5. **Weight Update**: Once the gradients are computed, the weights and biases are updated, typically using an optimization algorithm like gradient descent. This involves nudging the weights and biases in the opposite direction of the gradient by a small amount, proportional to a learning rate parameter. The learning rate controls how big a step is taken during each update and is crucial for the convergence and performance of the network.

6. **Iterate**: Steps 2-5 are repeated for many iterations over the training dataset, with the network continuing to adjust its weights and biases to reduce the loss.

7. **Evaluation**: After the training is complete, the network's performance is evaluated on a separate dataset not seen during training, called the validation set, to ensure that the network generalizes well to new data.

8. **Fine-tuning**: Based on the network's performance on the validation set, further fine-tuning of the model may occur, which can involve adjusting the learning rate, trying different architectures, or using regularization techniques to prevent overfitting.

Through these iterative processes, ANNs learn the complex relationships within the data they are trained on, allowing them to make predictions or decisions when presented with new, unseen data.
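The steps above can be sketched for the simplest possible "network": a single sigmoid neuron trained with cross-entropy loss on a toy dataset. Everything specific here is an assumption made for illustration (the logical-OR data, the learning rate of 0.5, the 1000 iterations), and a real network would have more layers and a separate validation set.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset (hypothetical): logical OR of two binary inputs.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

def mean_loss(w, b):
    """Step 3: average cross-entropy loss over the dataset."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(data)

# Step 1: initialization with small random weights and a zero bias.
random.seed(0)
w = [random.uniform(-0.1, 0.1) for _ in range(2)]
b = 0.0
learning_rate = 0.5
initial_loss = mean_loss(w, b)

for _ in range(1000):                      # Step 6: iterate
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for x, y in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)   # Step 2: feedforward
        # Step 4: for a sigmoid output with cross-entropy loss, the chain rule
        # gives d(loss)/d(weighted sum) = p - y.
        dz = p - y
        grad_w[0] += dz * x[0] / len(data)
        grad_w[1] += dz * x[1] / len(data)
        grad_b += dz / len(data)
    # Step 5: move against the gradient, scaled by the learning rate.
    w = [w_i - learning_rate * g_i for w_i, g_i in zip(w, grad_w)]
    b -= learning_rate * grad_b

final_loss = mean_loss(w, b)
predictions = [1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) > 0.5 else 0 for x, _ in data]
print(initial_loss, final_loss, predictions)
```

The loss after training should be well below the loss at initialization, and the thresholded predictions should match the OR labels.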


# Overview of a Neural Network’s Learning Process

The learning (training) process of a neural network is an iterative process in which the calculations are carried out forward and backward through each layer in the network until the loss function is minimized.

<img src="https://raw.githubusercontent.com/MaralAminpour/ML-BME-Course-UofA-Fall-2023/main/Week-8-Neural-Networks/imgs/learning_image1.webp" width = "700" >

The entire learning process can be divided into three main parts:


*   Forward propagation (Forward pass)
*   Calculation of the loss function
*   Backward propagation (Backward pass/Backpropagation)

We’ll begin with forward propagation.


## Forward propagation (Feed Forward Networks)

A feedforward network consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the input into the neural network, and each input has a weight attached to it.

The weights associated with each input are numerical values. These weights are an indicator of the importance of the input in predicting the final output. For example, an input associated with a large weight will have a greater influence on the output than an input associated with a small weight.
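As a rough sketch of a forward pass, the code below pushes one input through a tiny network with two inputs, three hidden neurons, and one output neuron. The layer sizes, the weight values, and the use of a sigmoid in every layer are all assumptions chosen for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of `weights` belongs to one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters for a 2-3-1 network.
hidden_w = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.6, -0.3, 0.8]]
output_b = [0.05]

x = [1.0, 2.0]
hidden = layer_forward(x, hidden_w, hidden_b)        # input layer -> hidden layer
output = layer_forward(hidden, output_w, output_b)   # hidden layer -> output layer
print(hidden, output)
```

Each layer's output list becomes the next layer's input list, which is all that "feeding forward" means.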

<img src="https://raw.githubusercontent.com/MaralAminpour/ML-BME-Course-UofA-Fall-2023/main/Week-8-Neural-Networks/imgs/learning2.png" width = "700" >


When training begins, the neural network is fed with input. Since the neural network isn’t trained yet, we don’t know which weights to use for each input, so each input is randomly assigned a weight. With randomly assigned weights, the neural network will likely make wrong predictions and give incorrect output.

When the neural network gives out the incorrect output, this leads to an output error. This error is the difference between the actual and predicted outputs. A cost function measures this error.

The cost function (J) indicates how accurately the model performs. It tells us how far-off our predicted output values are from our actual values. It is also known as the error. Because the cost function quantifies the error, we aim to minimize the cost function.
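These notes do not say which cost function $J$ is used. As one common choice, assumed here purely for illustration, the mean squared error averages the squared differences between actual and predicted values.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical targets and two sets of predictions.
actual = [1.0, 0.0, 1.0]
close_predictions = [0.9, 0.1, 0.8]
poor_predictions = [0.2, 0.9, 0.4]

cost_close = mean_squared_error(actual, close_predictions)
cost_poor = mean_squared_error(actual, poor_predictions)
print(cost_close, cost_poor)  # the closer predictions give the lower cost
```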

<img src="https://raw.githubusercontent.com/MaralAminpour/ML-BME-Course-UofA-Fall-2023/main/Week-8-Neural-Networks/imgs/minimum_loss.png" width = "400" >


What we want is to reduce the output error. Since the weights affect the error, we will need to readjust the weights. We have to adjust the weights such that we have a combination of weights that minimizes the cost function.


## This is where Backpropagation comes in…

Backpropagation allows us to readjust our weights to reduce output error. The error is propagated backward during backpropagation from the output to the input layer. This error is then used to calculate the gradient of the cost function with respect to each weight.

<img src="https://raw.githubusercontent.com/MaralAminpour/ML-BME-Course-UofA-Fall-2023/main/Week-8-Neural-Networks/imgs/back_propogation.png" width = "700" >

Essentially, backpropagation calculates the gradient of the cost function with respect to the weights. The negative of this gradient is what helps in adjusting the weights: it tells us how we need to change each weight so that we can reduce the cost function.

Backpropagation uses the chain rule to calculate the gradient of the cost function. The chain rule breaks the derivative of the cost into a product of simpler derivatives, one for each step between a weight and the output. For each weight, we calculate the partial derivative of the cost function by differentiating with respect to that weight while treating the others as constants. Collecting these partial derivatives gives us the gradient.
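To make the chain rule concrete, here is a small sketch for a single neuron. It assumes a sigmoid activation, a squared-error cost, and no bias term; none of these choices come from the text, and the input values are hypothetical. The analytic gradient is checked against a numerical estimate obtained by nudging one weight at a time while holding the other constant.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, x, y):
    """Squared error of a single sigmoid neuron (no bias, for brevity)."""
    a = sigmoid(np.dot(w, x))
    return 0.5 * (a - y) ** 2

def gradient(w, x, y):
    """Chain rule: dJ/dw = dJ/da * da/dz * dz/dw."""
    z = np.dot(w, x)
    a = sigmoid(z)
    dJ_da = a - y          # derivative of the cost w.r.t. the activation
    da_dz = a * (1.0 - a)  # derivative of the sigmoid
    dz_dw = x              # derivative of the weighted sum w.r.t. each weight
    return dJ_da * da_dz * dz_dw

# Hypothetical input, target, and weights.
x = np.array([0.5, -1.0])
y = 1.0
w = np.array([0.1, 0.2])

analytic = gradient(w, x, y)

# Numerical check: change one weight at a time, keeping the other fixed.
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    numeric[i] = (cost(w + step, x, y) - cost(w - step, x, y)) / (2 * eps)

print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
```

The two results should agree closely, which is a common sanity check when implementing gradients by hand.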

Since we have calculated the gradients, we will be able to adjust the weights.

## Gradient Descent

The weights are adjusted using a process called gradient descent.

Gradient descent is an optimization algorithm that is used to find the weights that minimize the cost function. Minimizing the cost function means getting to the minimum point of the cost function. So, gradient descent aims to find a weight corresponding to the cost function’s minimum point.

<img src="https://raw.githubusercontent.com/MaralAminpour/ML-BME-Course-UofA-Fall-2023/main/Week-8-Neural-Networks/imgs/cost_min.png" width = "400" >


To find this weight, we must navigate down the cost function until we find its minimum point.

But first, to navigate the cost function, we need two things: the direction in which to navigate and the size of the steps for navigating.

### The Direction

The direction for navigating the cost function is found using the gradient.

### The Gradient

To know in which direction to navigate, gradient descent uses backpropagation. More specifically, it uses the gradients calculated through backpropagation. These gradients are used for determining the direction to navigate to find the minimum point.

<img src="https://raw.githubusercontent.com/MaralAminpour/ML-BME-Course-UofA-Fall-2023/main/Week-8-Neural-Networks/imgs/gradient.png" width = "500" >

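Specifically, we move in the direction of the negative gradient. The gradient points in the direction in which the cost function increases fastest, so its negative points downhill, toward the minimum point. For example, if the slope at the current weight is positive, decreasing the weight lowers the cost; if the slope is negative, increasing the weight lowers the cost.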
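A small sketch of this idea, using a made-up one-weight cost function J(w) = (w - 3)² whose minimum is at w = 3. The function names and the step factor are assumptions for illustration.

```python
def J(w):
    """Hypothetical cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def dJ_dw(w):
    """Slope (gradient) of J at w."""
    return 2.0 * (w - 3.0)

small_step = 0.1  # illustrative step factor

for w in (1.0, 5.0):
    g = dJ_dw(w)
    w_next = w - small_step * g  # move against the gradient
    direction = "increase w" if g < 0 else "decrease w"
    print(f"w={w}: gradient={g:+.1f} -> {direction}; cost {J(w):.2f} -> {J(w_next):.2f}")
```

On either side of the minimum, stepping against the sign of the gradient moves the weight toward w = 3 and lowers the cost.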

### The Step Size

The step size for navigating the cost function is determined using the learning rate.

### Learning Rate

The learning rate is a tuning parameter that determines the step size at each iteration of gradient descent. It determines the speed at which we move down the slope.

The step size plays an important part in ensuring a balance between optimization time and accuracy. The learning rate is denoted by the parameter alpha (α). A small α means a small step size, and a large α means a large step size. If the step size is too large, we could overshoot the minimum point, or even move further away from it with each step, which yields inaccurate results. If the step size is too small, the optimization process could take too much time and waste computational power.
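The trade-off can be sketched on a toy cost function J(w) = w², whose gradient is 2w and whose minimum is at w = 0. The function `run_gradient_descent`, the starting weight, and the three learning rates are hypothetical choices for illustration.

```python
def run_gradient_descent(alpha, w_start=1.0, n_steps=20):
    """Minimize J(w) = w**2 (gradient 2*w) using a fixed learning rate alpha."""
    w = w_start
    for _ in range(n_steps):
        w = w - alpha * 2.0 * w
    return w

# Small, moderate, and too-large learning rates.
results = {alpha: run_gradient_descent(alpha) for alpha in (0.01, 0.1, 1.1)}

for alpha, w_final in results.items():
    print(f"alpha={alpha}: w after 20 steps = {w_final:.4f}")
```

With the smallest rate the weight is still far from 0 after 20 steps, the moderate rate gets close to the minimum, and the largest rate overshoots so badly that the weight moves away from the minimum.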

<img src="https://raw.githubusercontent.com/MaralAminpour/ML-BME-Course-UofA-Fall-2023/main/Week-8-Neural-Networks/imgs/learning_rate.png" width = "600" >

The size of each step depends on both the learning rate and the gradient of the cost function. For a fixed learning rate, a larger gradient (a steeper slope) produces a larger step, and the steps shrink as the slope flattens near the minimum. Likewise, for a given gradient, a higher learning rate results in a larger step, and a lower learning rate results in a smaller step. If the gradient of the cost function is zero, the weights no longer change and the model stops learning.

### Descending the Cost Function

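Navigating the cost function consists of adjusting the weights. The weights are adjusted using the following formula:

$$w_{\text{new}} = w_{\text{old}} - \alpha \frac{\partial J}{\partial w}$$

This is the formula for gradient descent. As we can see, to obtain the new weight, we use the gradient ($\partial J / \partial w$), the learning rate ($\alpha$), and an initial weight ($w_{\text{old}}$).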

Adjusting the weights consists of multiple iterations. We take a new step down for each iteration and calculate a new weight. Using the initial weight and the gradient and learning rate, we can determine the subsequent weights.
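The iterations described above can be sketched as a short loop. This is a minimal example under stated assumptions: a model with a single weight (`y_pred = w * x`), a mean squared error cost, toy data generated from y = 2x, and a learning rate of 0.05. None of these specifics come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data generated from y = 2x, so the ideal weight is 2 (hypothetical example).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = rng.normal()  # random initial weight
alpha = 0.05      # learning rate (assumed value)
history = []      # cost recorded at each iteration

for _ in range(50):
    y_pred = w * x                        # forward pass
    cost = np.mean((y_pred - y) ** 2)     # mean squared error
    grad = np.mean(2 * (y_pred - y) * x)  # derivative of the cost w.r.t. w
    w = w - alpha * grad                  # gradient descent update
    history.append(cost)

print("final weight:", w)
print("first cost:", history[0], "last cost:", history[-1])
```

Each pass through the loop takes one step down the cost function: the cost shrinks from iteration to iteration and the weight approaches 2.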

Let’s consider a graphical example of this:

