<details>
  <summary>Supervised Learning Steps</summary>
    
1. Data Collection
   * 1.1\. Data Sources
   * 1.2\. Data Collection Considerations
2. Data Exploration and Preparation
   * 2.1\. Data Exploration
   * 2.2\. Data Preparation/Cleaning
3. Split Data into Training and Test Sets
   * 3.1\. Holdout Method
   * 3.2\. Cross Validation
   * 3.3\. Data Leakage
   * 3.4\. Best Practices
4. Choose a Supervised Learning Algorithm
   * 4.1\. Consider algorithm categories
   * 4.2\. Evaluate algorithm characteristics
   * 4.3\. Try multiple algorithms
5. Train the Model
   * 5.1\. Objective Function (Loss/Cost Function)
   * 5.2\. Optimization Algorithms
   * 5.3\. Overfitting and Underfitting
6. Evaluate Model Performance
   * 6.1\. Evaluate Model Performance
   * 6.2\. Performance Metrics for Classification Models
   * 6.3\. Interpreting and Reporting Model Performance
7. Model Tuning and Selection
   * 7.1\. Hyperparameter Tuning
   * 7.2\. Ensemble Methods
</details>

# 5. Train the Model

![image.png](https://pbs.twimg.com/media/D3SwgeEWAAAaSEv.jpg)

## 5.1. Objective Function (Loss/Cost Function)

The objective function, also known as the loss or cost function, measures how well the model's predictions match the true labels in the training data. The goal is to minimize this function during training. Common loss functions include:

- **For regression**: Mean Squared Error (MSE), Mean Absolute Error (MAE)
- **For classification**: Cross-entropy loss, hinge loss

The loss is typically averaged over the training examples, either the full dataset or one mini-batch at a time, and is a key quantity to monitor during training.
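The loss functions listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: the function names, the toy arrays, the `eps` clipping constant, and the choice of the binary (two-class) form of cross-entropy are all assumptions made for this example.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return float(np.mean(np.abs(y_true - y_pred)))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy for labels in {0, 1} and predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def hinge(y_true_pm1, scores):
    """Hinge loss for labels in {-1, +1} and raw (unbounded) scores."""
    return float(np.mean(np.maximum(0.0, 1.0 - y_true_pm1 * scores)))

# Toy regression targets and predictions (made up for illustration)
y_reg_true = np.array([3.0, -0.5, 2.0, 7.0])
y_reg_pred = np.array([2.5, 0.0, 2.0, 8.0])
print("MSE:", mse(y_reg_true, y_reg_pred))
print("MAE:", mae(y_reg_true, y_reg_pred))

# Toy classification labels and predictions (made up for illustration)
print("BCE:", binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.1])))
print("Hinge:", hinge(np.array([1.0, -1.0]), np.array([2.0, 0.5])))
```

MSE penalizes large errors more heavily than MAE because the residuals are squared, which is one reason MAE is often preferred when the data contains outliers.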

## 5.2. Optimization Algorithms

To find the model parameters (weights and biases) that minimize the loss function, optimization algorithms are used. Some popular ones:

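- **Gradient Descent**: Computes the gradient of the loss over the full training set and updates the parameters in the direction of the negative gradient, scaled by a learning rate.
- **Stochastic Gradient Descent (SGD)**: Estimates the gradient from a single example or a small subset of examples (a mini-batch) instead of the full dataset. Each update is noisier but much cheaper, allowing faster iterations.
- **Adam Optimizer**: An extension of SGD that adapts the learning rate for each parameter using running averages of the gradient and its square. It often converges faster in practice, though this is not guaranteed for every problem.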
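To make the difference between the first two optimizers concrete, here is a minimal sketch that fits a one-feature linear model with full-batch gradient descent and with mini-batch SGD. The synthetic data (true slope 3, intercept 2), the learning rates, the batch size, and all function names are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus a little noise (assumed for this example)
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + data_rng.normal(scale=0.1, size=200)

def mse_gradients(w, b, X_batch, y_batch):
    """Gradients of the MSE loss with respect to slope w and intercept b."""
    err = X_batch[:, 0] * w + b - y_batch
    grad_w = 2.0 * np.mean(err * X_batch[:, 0])
    grad_b = 2.0 * np.mean(err)
    return grad_w, grad_b

def gradient_descent(X, y, lr=0.1, steps=200):
    """Full-batch gradient descent: every step uses all examples."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = mse_gradients(w, b, X, y)
        w -= lr * gw
        b -= lr * gb
    return w, b

def minibatch_sgd(X, y, lr=0.05, epochs=50, batch_size=20, seed=1):
    """Mini-batch SGD: every step uses a random subset of examples."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            gw, gb = mse_gradients(w, b, X[idx], y[idx])
            w -= lr * gw
            b -= lr * gb
    return w, b

w_gd, b_gd = gradient_descent(X, y)
w_sgd, b_sgd = minibatch_sgd(X, y)
print(f"Gradient descent: w={w_gd:.3f}, b={b_gd:.3f}")
print(f"Mini-batch SGD:   w={w_sgd:.3f}, b={b_sgd:.3f}")
```

Both runs should land close to the slope and intercept used to generate the data. The SGD estimate fluctuates slightly from run to run because each update sees only part of the data.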

## 5.3. Overfitting and Underfitting

Overfitting and underfitting are the two main ways a trained model can fail. Underfitting is caused by high bias (an oversimplified model), while overfitting is caused by high variance (an overly complex model that fits noise). The goal is to balance bias and variance by choosing a model complexity that captures the true patterns in the data without also fitting its noise.
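For squared-error loss, this trade-off has a standard decomposition. Assuming the data is generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ (the symbols $f$, $\hat{f}$, and $\sigma^2$ are notation introduced here for illustration), the expected prediction error at a point $x$, averaged over possible training sets, splits into three parts:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Making the model more flexible usually lowers the bias term and raises the variance term, which is why neither extreme gives the lowest total error.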

### Overfitting (high variance and low bias)

Overfitting occurs when the model learns the training data too well, including its noise, and fails to generalize to new, unseen data. The result is poor performance on the validation or test set despite high training accuracy.

Variance is the amount by which the model's predictions change when it is trained on different samples of the training data.

- High-variance models are complex enough to fit noise and random fluctuations in the training data instead of the true signal.

- The typical symptom is a large gap between training and validation performance: training error keeps falling while validation error stalls or rises.

### Underfitting (high bias and low variance)

Underfitting happens when the model is too simple to capture the underlying patterns in the data, resulting in poor performance on both the training and test sets.

Bias is the error introduced by overly simplistic assumptions in the learning algorithm, which leave the model unable to represent the true relationship between the input features and the target variable.

- High-bias models are oversimplified and cannot learn complex patterns, for example a straight line fitted to a clearly curved relationship.

- The typical symptom is that the training error itself stays high, so training and test performance are similarly poor.
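Both failure modes can be seen in a small experiment that fits polynomials of increasing degree to noisy samples of a sine curve. This is a sketch under stated assumptions: the target function, noise level, sample sizes, and the degrees 1, 3, and 12 are all chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(pi * x) on [-1, 1] (assumed toy problem)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(25)    # small training set
x_test, y_test = make_data(200)     # larger held-out set

def fit_and_score(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return float(train_mse), float(test_mse)

# Degree 1 is too simple, degree 3 is a reasonable match, degree 12 is very flexible
results = {degree: fit_and_score(degree) for degree in (1, 3, 12)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Training error can only go down as the degree increases, because each higher-degree polynomial contains the lower-degree ones as special cases. The straight line typically shows high error on both sets (underfitting), while the most flexible fit typically shows a very low training error paired with a noticeably worse test error (overfitting).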


### Preventing Overfitting

Methods to prevent overfitting:

- **Regularization techniques**:
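    - **L1 (Lasso) Regularization**: Adds a penalty proportional to the sum of the absolute values of the weights to the loss, which drives some weights to exactly zero and yields sparse models.
    - **L2 (Ridge) Regularization**: Adds a penalty proportional to the sum of the squared weights to the loss, which shrinks all weights toward zero but rarely makes any of them exactly zero.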
- **Dropout**: Randomly drops units from the neural network during training to prevent co-adaptation of features.
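- **Early Stopping**: Stopping training when the validation error starts increasing, even if the training error is still falling.
- **Cross-validation**: Splitting the training data into k folds, training on k-1 folds and validating on the remaining one, and rotating until every fold has served as the validation set. The averaged validation score is used to tune hyperparameters and to detect overfitting, while the test set stays held out for the final evaluation.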
- **Data augmentation**: Increasing the size and diversity of the training data by applying transformations like flipping, rotating, or adding noise. This helps the model generalize better.
- **Reducing model complexity**: Using a simpler model with fewer parameters, or techniques like pruning to remove unnecessary connections.
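- **Ensemble methods**: Combining multiple models. Bagging averages models trained on bootstrap samples and mainly reduces variance; boosting builds models sequentially and mainly reduces bias, so it usually needs regularization of its own to avoid overfitting.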
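As one concrete case from the list above, the effect of L2 regularization can be sketched with the closed-form ridge solution for linear regression, $w = (X^\top X + \lambda I)^{-1} X^\top y$. The synthetic data, the `ridge_fit` helper, and the three values of the regularization strength `lam` are assumptions made for this example; the model has no intercept term to keep the sketch short.

```python
import numpy as np

# Small synthetic problem: 30 samples, 10 features, only 3 of which matter
rng = np.random.default_rng(0)
n_samples, n_features = 30, 10
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(scale=0.5, size=n_samples)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

lambdas = [0.01, 1.0, 100.0]
weight_norms = []
for lam in lambdas:
    w = ridge_fit(X, y, lam)
    weight_norms.append(float(np.linalg.norm(w)))
    print(f"lambda={lam:6.2f}  ||w||={weight_norms[-1]:.4f}")
```

The norm of the weight vector shrinks as `lam` grows. A value that is too large pushes the model toward underfitting, so in practice the strength is chosen on validation data.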

### Preventing Underfitting

Methods to prevent underfitting:

- **Increasing model complexity**: Using a more complex model with more parameters or layers to capture the underlying patterns in the data.
- **Feature engineering**: Adding more relevant features or transforming existing ones to better represent the data.
- **Removing noise**: Cleaning and preprocessing the data to remove irrelevant or noisy features.
- **Increasing training time**: Training the model for more epochs or iterations to allow it to learn the patterns better.
- **Reducing regularization**: Decreasing the regularization strength if it is causing underfitting by overly constraining the model.