# **Training Neural Networks**

## 1. **Backpropagation**
- **Definition**: Backpropagation (backward propagation of errors) is the standard algorithm for computing the gradients used to train neural networks. It calculates the error (loss) at the output and propagates it backward, layer by layer, from the output layer toward the input layer so that each weight can be adjusted according to its contribution to the error.
- **Process**:
  1. Perform a **forward pass** to compute the output.
  2. Calculate the error between the actual (target) output and the predicted output using a **loss function**.
  3. Compute the **gradient** (partial derivative of the error with respect to each weight) using the chain rule.
  4. Update the weights in the opposite direction of the gradient to reduce the error.
  5. Repeat the process iteratively over several epochs to minimize the error.
![1_J-v2B6T9RKxdvwThtQ1NVg.png](attachment:1_J-v2B6T9RKxdvwThtQ1NVg.png)
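
The chain-rule step in the process above can be sketched on the smallest possible case. This is only an illustration under stated assumptions: a single sigmoid neuron (not a multi-layer network), a squared-error loss for one example, and made-up values for `w`, `b`, `x`, and `y`. The analytic gradient is compared against a finite-difference estimate as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward pass of a single sigmoid neuron (illustrative model)."""
    return sigmoid(w * x + b)

def loss(y_hat, y):
    """Squared error for one example."""
    return 0.5 * (y_hat - y) ** 2

def backward(w, b, x, y):
    """Gradients of the loss with respect to w and b via the chain rule."""
    y_hat = forward(w, b, x)
    dL_dyhat = y_hat - y               # derivative of the loss w.r.t. the prediction
    dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
    dL_dz = dL_dyhat * dyhat_dz        # chain rule
    return dL_dz * x, dL_dz            # dz/dw = x, dz/db = 1

# Hypothetical values chosen only for the demonstration.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backward(w, b, x, y)

# Finite-difference estimate of dL/dw for comparison.
eps = 1e-6
num_grad_w = (loss(forward(w + eps, b, x), y)
              - loss(forward(w - eps, b, x), y)) / (2 * eps)
```

In a deeper network the same idea applies at every layer: each layer multiplies the incoming gradient by its own local derivative and passes the result backward.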

## 2. **Weight Updates**
- **Importance**: Weight updates are essential for reducing prediction errors and improving model accuracy.
- **Process**:
  1. **Gradient Calculation**: Compute how much the error changes with a small change in weight.
  2. **Weight Adjustment**: Adjust the weights using the formula:  
     $ \text{New Weight} = \text{Old Weight} - \text{Learning Rate} \times \text{Gradient} $
  3. **Learning Rate**: Controls the step size of each update. A high learning rate can speed up convergence but risks overshooting the minimum or even diverging, while a low learning rate makes learning slow.
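
The update rule and the effect of the learning rate can be tried on a toy problem. The sketch below assumes a one-parameter "loss" $ f(w) = w^2 $ (gradient $ 2w $, minimum at $ w = 0 $); the function name and the three learning rates are illustrative choices, not values from these notes.

```python
def gradient_descent(w, learning_rate, steps):
    """Repeatedly apply: new weight = old weight - learning rate * gradient."""
    for _ in range(steps):
        gradient = 2.0 * w              # derivative of f(w) = w**2
        w = w - learning_rate * gradient
    return w

start = 5.0
w_small = gradient_descent(start, learning_rate=0.01, steps=20)  # slow progress
w_good = gradient_descent(start, learning_rate=0.1, steps=20)    # close to 0
w_large = gradient_descent(start, learning_rate=1.1, steps=20)   # overshoots and diverges
```

With the largest rate, each step jumps past the minimum and lands farther away than it started, which is the overshooting behaviour described above.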



## 3. **Optimizers**
### a. **Gradient Descent**
- **Description**: A fundamental optimization algorithm that updates weights to minimize the loss function.
- **Variants**:
  1. **Batch Gradient Descent**: Uses the entire dataset for each weight update. Each update is expensive to compute but follows the exact gradient of the training loss.
  2. **Stochastic Gradient Descent (SGD)**: Updates weights after each training example, making each update cheap but noisy.
  3. **Mini-Batch Gradient Descent**: Uses small batches of data for updates, balancing the speed and stability of learning.
![Gradient-Descent-of-Machine-Learning.jpg](attachment:Gradient-Descent-of-Machine-Learning.jpg)
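
The three variants differ only in how many examples feed each update, which a single function with a `batch_size` argument can illustrate. This is a rough sketch under assumptions: a one-weight model $ \hat{y} = w x $, synthetic noise-free data generated from $ y = 3x $, and hypothetical names (`train`, `gradient`) and hyperparameters.

```python
import random

def gradient(w, batch):
    """Mean gradient of 0.5 * (w*x - y)**2 over the examples in the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def train(data, batch_size, epochs=50, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    updates = 0
    for _ in range(epochs):
        rng.shuffle(data)                           # new order every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            w -= learning_rate * gradient(w, batch)
            updates += 1
    return w, updates

# Synthetic data for illustration only: y = 3x for x = 0.1, 0.2, ..., 1.0
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 11)]

w_batch, n_batch = train(data, batch_size=len(data))  # batch gradient descent
w_sgd, n_sgd = train(data, batch_size=1)              # stochastic gradient descent
w_mini, n_mini = train(data, batch_size=5)            # mini-batch gradient descent
```

Over the same 50 epochs, batch gradient descent makes 50 updates, mini-batch makes 100, and SGD makes 500. Because this toy data has no noise, all three end up near `w = 3`; on real data the smaller batches would give visibly noisier trajectories.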


### b. **Adam Optimizer (Adaptive Moment Estimation)**
- **Description**: An advanced optimization algorithm that combines the benefits of **momentum** and **RMSprop**. It uses adaptive learning rates for each parameter, adjusting them dynamically.
- **Advantages**: Handles noisy gradients well, often converges faster than plain gradient descent, and is effective with sparse data.
- **Parameters**:
  - **Learning Rate**: Step size for weight updates (a common default is 0.001).
  - **Beta 1**: Decay rate for the moving average of the gradient (commonly 0.9).
  - **Beta 2**: Decay rate for the moving average of the squared gradient (commonly 0.999).
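
To make the roles of these parameters concrete, here is a minimal sketch of one Adam update for a single scalar weight. It follows the commonly published form of the algorithm with bias correction; the small constant `eps` (to avoid division by zero) and the function name `adam_step` are additions for this example, and the toy objective $ f(w) = w^2 $ is an assumption.

```python
import math

def adam_step(w, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimise the toy objective f(w) = w**2 (gradient 2w) from w = 5.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * w
    w, m, v = adam_step(w, grad, m, v, t, learning_rate=0.1)
```

Dividing by the root of the squared-gradient average is what gives each parameter its own effective step size: on the very first step the update has magnitude close to the learning rate regardless of how large the gradient is.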



## 4. **Loss Functions (Cost Functions)**
- **Purpose**: Quantifies how well the model’s predictions match the actual outcomes, guiding weight updates.
- **Common Types**:
  1. **Mean Squared Error (MSE)**: Measures average squared difference between actual and predicted values. Common in regression tasks.
     $ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $
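  2. **Cross-Entropy Loss**: Measures the difference between two probability distributions. Common in classification tasks. The formula below is the binary case (binary cross-entropy), where $ y_i \in \{0, 1\} $ is the true label and $ \hat{y}_i $ is the predicted probability of class 1: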
     $ L(y, \hat{y}) = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)] $
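
Both formulas translate almost directly into code. The sketch below uses plain Python lists; the clipping constant `eps` is an addition for numerical safety (it avoids `log(0)`) and the sample values are made up for illustration.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - y_hat_i)**2."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip so that log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / n

mse_value = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])       # (0 + 0 + 4) / 3
bce_good = binary_cross_entropy([1, 0], [0.9, 0.1])     # confident and correct
bce_bad = binary_cross_entropy([1, 0], [0.1, 0.9])      # confident and wrong
```

Confident wrong predictions are penalised far more heavily than confident correct ones, which is why `bce_bad` comes out much larger than `bce_good`.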

## 5. **Regularization Techniques**
### a. **Dropout Regularization**
- **Definition**: A technique to prevent overfitting by randomly setting the outputs of a fraction of neurons to zero during training, forcing the network to learn more robust features. At inference time, all neurons are used.
- **Benefits**: Enhances generalization, reduces reliance on specific neurons.
- **Dropout Rate**: Fraction of neurons to drop (e.g., 0.5 means 50% of the neurons are turned off on each training step).
![Neural-Networks-with-Dropout-for-Effective-Regularization.webp](attachment:Neural-Networks-with-Dropout-for-Effective-Regularization.webp)
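
A dropout layer can be sketched in a few lines. Note one assumption that goes beyond the notes above: this version uses "inverted" dropout, which scales the surviving activations by `1 / (1 - rate)` during training so that their expected value is unchanged and nothing needs rescaling at inference. The function name and arguments are illustrative.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate` (training only)."""
    if not training or rate == 0.0:
        return list(activations)          # inference: keep every neuron
    keep_prob = 1.0 - rate
    # Inverted dropout (assumption): rescale kept values by 1 / keep_prob.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)                    # fixed seed for reproducibility
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.5, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)    # roughly 0.5
mean_activation = sum(dropped) / len(dropped)        # roughly 1.0
```

A fresh random mask is drawn on every call, so a different subset of neurons is silenced on each training step.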

## 6. **Learning Parameters**
### a. **Batch Size**
- **Definition**: The number of training samples processed before the model’s weights are updated.
- **Impact**:
  - **Small Batch**: More frequent, cheaper updates, but noisier gradient estimates.
  - **Large Batch**: Fewer, more expensive updates, but more stable gradient estimates.
  
### b. **Iterations**
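- **Definition**: A single pass of one batch through the network (forward and backward), resulting in one weight update.
- **Formula** (per epoch, rounded up when the dataset size is not a multiple of the batch size): $ \text{Iterations per epoch} = \left\lceil \frac{\text{Total Samples}}{\text{Batch Size}} \right\rceil $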
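
As a quick arithmetic check of the formula, the helper below counts iterations per epoch. The `drop_last` flag is a hypothetical option for this example: it models the common choice of discarding a final partial batch instead of processing it.

```python
import math

def iterations_per_epoch(total_samples, batch_size, drop_last=False):
    """Number of weight updates needed to see the whole dataset once."""
    if drop_last:
        return total_samples // batch_size        # ignore the partial batch
    return math.ceil(total_samples / batch_size)  # count the partial batch

even_split = iterations_per_epoch(1000, 100)                    # 10
with_partial = iterations_per_epoch(1000, 64)                   # 16 (15 full + 1 of 40)
without_partial = iterations_per_epoch(1000, 64, drop_last=True)  # 15
```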




### c. **Epochs**
- **Definition**: One full cycle through the entire training dataset.
- **Impact**: Training for multiple epochs gives the model repeated exposure to the data so it can learn patterns better, although too many epochs can lead to overfitting.

An **epoch** is one complete pass of the entire training dataset through the network. During an epoch, the model processes every data point exactly once, and each data point takes part in one forward pass and one backward pass.

- **Forward Pass**: Data flows through the network and predictions are made.
- **Backward Pass**: The error is calculated and propagated backward, and the weights are adjusted accordingly.
![epoch-fwd-bwd-pass.webp](attachment:epoch-fwd-bwd-pass.webp)
#### **Key Points**
- **One Epoch** = One full cycle through the entire training dataset.
- **Batches**: An epoch is usually broken down into smaller batches (subsets of the dataset).
  - A **batch** is the portion of the dataset processed before each update of the model’s weights.
  - **Iteration**: One pass through a single batch is called an **iteration**. Multiple iterations make up one epoch.
- The goal is to repeat the training process for multiple epochs to improve the model's accuracy and minimize the error.
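
The relationship between epochs, batches, and iterations is easiest to see as two nested loops. The sketch below uses a stand-in "dataset" of ten integers and a hypothetical helper `iterate_batches`; the actual forward pass, loss, and weight update are left as a comment.

```python
import random

def iterate_batches(dataset, batch_size, rng):
    """Yield shuffled batches that together cover the dataset exactly once."""
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

dataset = list(range(10))        # placeholder samples for illustration
rng = random.Random(0)
epochs, batch_size = 3, 4

iteration_count = 0
seen_per_epoch = []
for epoch in range(epochs):                       # outer loop: epochs
    seen = []
    for batch in iterate_batches(dataset, batch_size, rng):  # inner loop: iterations
        # forward pass, loss, backward pass, and weight update would go here
        seen.extend(batch)
        iteration_count += 1
    seen_per_epoch.append(sorted(seen))
```

With 10 samples and a batch size of 4, each epoch has 3 iterations (batches of 4, 4, and 2), so 3 epochs give 9 iterations in total, and every sample appears exactly once per epoch.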


## 7. **How Weights Update During Backpropagation**
- **Step-by-Step Process**:
  1. **Forward Pass**: The network computes outputs based on current weights.
  2. **Calculate Loss**: Error is measured using a loss function.
  3. **Backward Pass**: Compute gradients using backpropagation, determining how to adjust each weight.
  4. **Gradient Descent**: Adjust each weight in the opposite direction of its gradient, scaled by the learning rate, to reduce the error.
  5. **Repeat**: Continue for multiple epochs to refine weights.
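
The five steps above can be put together in one small training loop. This is a sketch under assumptions rather than a reference implementation: a single sigmoid neuron with two inputs, the OR truth table as training data, binary cross-entropy as the loss, full-batch gradient descent, and illustrative hyperparameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset (assumption): the logical OR of two inputs.
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [0.0, 1.0, 1.0, 1.0]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5
loss_history = []

for epoch in range(2000):                          # 5. repeat for many epochs
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    total_loss = 0.0
    for (x1, x2), y in zip(inputs, targets):
        # 1. forward pass
        y_hat = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
        # 2. calculate loss (binary cross-entropy)
        total_loss += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
        # 3. backward pass: for sigmoid + cross-entropy, dL/dz = y_hat - y
        dz = y_hat - y
        grad_w[0] += dz * x1
        grad_w[1] += dz * x2
        grad_b += dz
    n = len(inputs)
    # 4. gradient descent update using the mean gradient
    weights[0] -= learning_rate * grad_w[0] / n
    weights[1] -= learning_rate * grad_w[1] / n
    bias -= learning_rate * grad_b / n
    loss_history.append(total_loss / n)

predictions = [round(sigmoid(weights[0] * x1 + weights[1] * x2 + bias))
               for x1, x2 in inputs]
```

Plotting or printing `loss_history` shows the error shrinking from epoch to epoch, and the rounded predictions should reproduce the OR targets once training has converged.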


