<details>
  <summary>Supervised Learning Steps</summary>
    
1. Data Collection
   * 1.1\. Data Sources
   * 1.2\. Data Collection Considerations
2. Data Exploration and Preparation
   * 2.1\. Data Exploration
   * 2.2\. Data Preparation/Cleaning
3. Split Data into Training and Test Sets
   * 3.1\. Holdout Method
   * 3.2\. Cross Validation
   * 3.3\. Data Leakage
   * 3.4\. Best Practices
4. Choose a Supervised Learning Algorithm
   * 4.1\. Consider algorithm categories
   * 4.2\. Evaluate algorithm characteristics
   * 4.3\. Try multiple algorithms
5. Train the Model
   * 5.1\. Objective Function (Loss/Cost Function)
   * 5.2\. Optimization Algorithms
   * 5.3\. Overfitting and Underfitting
6. Evaluate Model Performance
   * 6.1\. Performance Metrics for Regression Models
   * 6.2\. Performance Metrics for Classification Models
7. Model Tuning and Selection
   * 7.1\. Hyperparameter Tuning
   * 7.2\. Ensemble Methods
</details>

## 6. Evaluate Model Performance

![image](https://miro.medium.com/v2/resize:fit:1280/0*6syXK-mCaQnvmjgS)

### 6.1. Performance Metrics for Regression Models

#### Mean Squared Error (MSE)

\begin{equation*}
\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
\end{equation*}

- Calculates the average squared difference between predicted and actual values
- Squaring the errors gives more weight to larger errors
- Sensitive to outliers, as squaring amplifies the effect of large errors

#### Root Mean Squared Error (RMSE)

\begin{equation*}
\sqrt{\frac{1}{n} \sum_{i=1}^n(y_i - \hat{y}_i)^2}
\end{equation*}

- Square root of MSE, providing the same units as the target variable
- Easier to interpret than MSE, as it represents the typical magnitude of error

#### Mean Absolute Error (MAE)

\begin{equation*}
\frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|
\end{equation*}

- Calculates the average absolute difference between predicted and actual values
- Less sensitive to outliers compared to MSE/RMSE
- Expressed in the same units as the target variable, so it reads directly as the average size of an error
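To make the difference in outlier sensitivity concrete, here is a minimal sketch in plain Python. The helper names (`mse`, `rmse`, `mae`) and the toy data are assumptions chosen for illustration. Both sets of predictions have the same MAE, but the one that concentrates its error in a single point has a much larger MSE and RMSE.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data (illustrative only)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred_small = [2.0, 6.0, 6.0, 10.0]    # every prediction is off by 1
y_pred_outlier = [3.0, 5.0, 7.0, 13.0]  # three exact, one off by 4

for name, y_pred in [("small errors", y_pred_small), ("one outlier", y_pred_outlier)]:
    print(name, "MSE:", mse(y_true, y_pred),
          "RMSE:", rmse(y_true, y_pred),
          "MAE:", mae(y_true, y_pred))
```

The total absolute error is 4 in both cases, so MAE cannot tell them apart, while squaring turns the single error of 4 into a contribution of 16.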

#### R-squared (Coefficient of Determination)

\begin{equation*}
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
\end{equation*}

- Measures the proportion of variance in the target variable that is explained by the model
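- At most 1, with 1 indicating a perfect fit; a value of 0 means the model does no better than always predicting the mean $\bar{y}$, and the value can be negative when the model does worse than that (for example, on held-out data)
- Useful for comparing different models, but can be misleading: for a linear model fit by least squares, R-squared on the training data never decreases when more features are added, even irrelevant ones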
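The formula can be checked on a few toy cases. The sketch below uses a hypothetical helper `r_squared` and made-up data; it shows a perfect fit, a model that always predicts the mean, and a model whose predictions are worse than the mean, for which the formula gives a negative value.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy data (illustrative only)
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])    # predictions match exactly
mean_only = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
reversed_ = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean

print(perfect, mean_only, reversed_)
```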

#### Residual Analysis

\begin{equation*}
y_i - \hat{y}_i
\end{equation*}

- Residuals: Differences between actual and predicted values
- Residual plots: Visualize residuals against predicted values or other features
- Identify patterns, outliers, and violations of regression assumptions
- Useful for diagnosing issues with the model or data
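A residual check does not need a plot to get started. The following sketch computes residuals for made-up data and flags points whose residual lies more than two standard deviations from the mean residual. The two-standard-deviation cutoff is an assumption chosen for illustration, not a rule from this section.

```python
import statistics

# Toy data (illustrative only); the last prediction is far off
y_true = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
y_pred = [11.0, 12.5, 14.0, 15.5, 17.0, 26.0]

# Residual = actual - predicted
residuals = [t - p for t, p in zip(y_true, y_pred)]

# A mean residual far from 0 suggests systematic over- or under-prediction
mean_residual = statistics.mean(residuals)
spread = statistics.pstdev(residuals)

# Flag indices whose residual is unusually far from the mean (assumed cutoff: 2 std devs)
flagged = [i for i, r in enumerate(residuals)
           if abs(r - mean_residual) > 2 * spread]

print("residuals:", residuals)
print("mean residual:", mean_residual)
print("flagged indices:", flagged)
```

Plotting `residuals` against `y_pred` would be the natural next step for spotting patterns such as curvature or a spread that grows with the prediction.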


-----

### 6.2. Performance Metrics for Classification Models

#### Confusion Matrix

![image](https://miro.medium.com/v2/0*-oGC3SE8sPCPdmxs.jpg)

(Sensitivity = recall)

A few notes:
- Accuracy is the proportion of all predictions that are correct. It is simple and easy to understand, but it can be misleading for imbalanced datasets, because it does not show which types of errors the model makes
- Precision is the proportion of true positives out of predicted positives (how many predicted positives were actually correct)
- Recall is the proportion of true positives out of actual positives (how many actual positives were correctly identified)
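The sketch below counts the four confusion-matrix cells for a made-up imbalanced dataset (10 positives, 90 negatives) and derives accuracy, precision, and recall from them. The helper name `confusion_counts` and the data are assumptions for illustration. Accuracy looks good here even though the model finds only 2 of the 10 positives.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Toy imbalanced data (illustrative only)
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 2 + [0] * 8 + [1] * 2 + [0] * 88

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
```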


#### F1 Score

The F1 score is a measure that combines precision and recall into a single metric for evaluating the performance of a classification model. It is calculated as the harmonic mean of precision and recall.

The formula for the F1 score is:

\begin{equation*}
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\end{equation*}

The F1 score ranges from 0 to 1, with 1 being the best possible value, indicating perfect precision and recall.

- It provides a balanced way to combine precision and recall into a single metric.
- It is particularly useful when there is an uneven class distribution (imbalanced dataset) since accuracy alone can be misleading in such cases.
- A high F1 score indicates that the model has both high precision (minimizing false positives) and high recall (minimizing false negatives).
- It is widely used in areas like information retrieval, natural language processing, and machine learning classification tasks.
- The F1 score can be adjusted to give more weight to precision (F0.5 score) or recall (F2 score) based on the specific requirements of the problem.
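The weighted variants mentioned above are usually written as the F-beta score, $F_\beta = (1+\beta^2)\cdot\frac{\text{precision}\cdot\text{recall}}{\beta^2\cdot\text{precision}+\text{recall}}$, which reduces to F1 when $\beta = 1$. Here is a minimal sketch with a hypothetical helper `f_beta` and made-up precision and recall values. Returning 0 when both inputs are 0 is a convention assumed here to avoid dividing by zero.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    if precision == 0 and recall == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative values: precision is higher than recall
precision, recall = 0.5, 0.2

f1 = f_beta(precision, recall)
f_half = f_beta(precision, recall, beta=0.5)  # weights precision more
f2 = f_beta(precision, recall, beta=2.0)      # weights recall more

print("F1:", round(f1, 4), "F0.5:", round(f_half, 4), "F2:", round(f2, 4))

# The harmonic mean punishes imbalance more than the arithmetic mean does
lopsided = f_beta(0.9, 0.1)
print("F1 for precision 0.9, recall 0.1:", round(lopsided, 4))
```

Because precision exceeds recall in this example, the precision-weighted F0.5 is the largest of the three scores and the recall-weighted F2 is the smallest.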

#### Receiver Operating Characteristic (ROC) Curve and Area Under the Curve (AUC)

![image](https://i0.wp.com/sefiks.com/wp-content/uploads/2020/12/roc-curve-original.png?ssl=1)

The ROC curve and AUC are useful for evaluating and comparing the performance of binary classification models.

**ROC Curve**

The ROC curve is a graphical representation of the performance of a binary classification model at different classification thresholds. It plots the True Positive Rate (TPR) or Recall on the y-axis against the False Positive Rate (FPR) on the x-axis.
- True Positive Rate (TPR) or Recall = $\frac{\text{TP}}{\text{TP}+\text{FN}}$
- False Positive Rate (FPR) =  $\frac{\text{FP}}{\text{FP}+\text{TN}}$

The ROC curve is created by varying the classification threshold from 0 to 1 and calculating the TPR and FPR at each threshold. The resulting curve shows the trade-off between the two rates, which lets you choose a classification threshold that gives the desired balance between catching positives and raising false alarms.
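The threshold sweep can be sketched in a few lines of plain Python. The helper name `roc_points`, the toy labels, and the scores are assumptions for illustration. An instance is predicted positive when its score is at or above the threshold, and each distinct score is tried as a threshold.

```python
def roc_points(y_true, scores):
    """Return a list of (FPR, TPR) points, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy data (illustrative only): 3 positives, 3 negatives
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold can only add predicted positives, so both rates move from 0 toward 1 and never decrease along the sweep.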

**AUC**

The Area Under the ROC Curve (AUC) is a single scalar value that summarizes the overall performance of the binary classifier across all possible classification thresholds. It ranges from 0 to 1, with higher values indicating better performance.

- An AUC of 1.0 represents a perfect classifier that can correctly classify all instances.
- An AUC of 0.5 represents a random classifier, where the model's predictions are no better than a random guess.
- An AUC of 0.0 represents a classifier that ranks every negative instance above every positive one; inverting its predictions would give a perfect classifier.

The AUC can be interpreted as the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.

Because the AUC does not depend on any one threshold, it makes it easier to compare different models or different configurations of the same model.
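The probabilistic interpretation of the AUC given above can be computed directly by comparing every positive instance with every negative instance. This is a minimal sketch with a hypothetical helper `auc_pairwise` and made-up data. Counting ties as half a win is a common convention assumed here. The approach is quadratic in the number of instances, so it is meant for illustration rather than large datasets.

```python
def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed convention: a tie counts as half
    return wins / (len(pos) * len(neg))

# Toy data (illustrative only)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_pairwise([1, 1, 0, 1, 0, 0], scores)  # one positive ranked below a negative

# Extreme cases
auc_perfect = auc_pairwise([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
auc_inverted = auc_pairwise([0, 0, 1, 1], [0.9, 0.8, 0.3, 0.1])

print(round(auc, 4), auc_perfect, auc_inverted)
```

In the first example, 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9.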

#### Precision-Recall Curve

![image](https://miro.medium.com/v2/resize:fit:1400/0*1z69voTBb04MIzig)

A Precision-Recall Curve is a plot that visualizes the trade-off between precision and recall for different probability thresholds in a binary classification model. The curve shows the precision (y-axis) against the recall (x-axis) at various thresholds.

High precision means that few of the predicted positives are false positives, while high recall means that few of the actual positives are missed (few false negatives). The ideal model would have both precision and recall equal to 1, but typically there is a trade-off where increasing precision reduces recall and vice versa.

This curve is especially useful for imbalanced datasets and for applications where recall is more important than precision (or vice versa), as it helps visualize the trade-off and select a threshold that gives the desired balance of precision and recall.
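The points on a precision-recall curve can be computed with the same kind of threshold sweep. The sketch below uses a hypothetical helper `pr_points` and made-up data. Each distinct score is used as a threshold, so at least one instance is always predicted positive and precision is always defined.

```python
def pr_points(y_true, scores):
    """Return a list of (threshold, precision, recall), from high to low threshold."""
    n_pos = sum(y_true)
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((thr, tp / (tp + fp), tp / n_pos))
    return points

# Toy data (illustrative only): 3 positives, 3 negatives
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

pr = pr_points(y_true, scores)
for thr, precision, recall in pr:
    print(f"threshold={thr:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

As the threshold drops from 0.9 to 0.2, recall rises from 1/3 to 1 while precision falls from 1 to 0.5, which is the trade-off the curve visualizes.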