<details>
  <summary>Supervised Learning Steps</summary>
    
1. Data Collection
   * 1.1\. Data Sources
   * 1.2\. Data Collection Considerations
2. Data Exploration and Preparation
   * 2.1\. Data Exploration
   * 2.2\. Data Preparation/Cleaning
3. Split Data into Training and Test Sets
   * 3.1\. Holdout Method
   * 3.2\. Cross Validation
   * 3.3\. Data Leakage
   * 3.4\. Best Practices
4. Choose a Supervised Learning Algorithm
   * 4.1\. Consider algorithm categories
   * 4.2\. Evaluate algorithm characteristics
   * 4.3\. Try multiple algorithms
5. Train the Model
   * 5.1\. Objective Function (Loss/Cost Function)
   * 5.2\. Optimization Algorithms
   * 5.3\. Overfitting and Underfitting
6. Evaluate Model Performance
   * 6.1\. Performance Metrics for Regression Models
   * 6.2\. Performance Metrics for Classification Models
   * 6.3\. Interpreting and Reporting Model Performance
7. Model Tuning and Selection
   * 7.1\. Hyperparameter Tuning
   * 7.2\. Ensemble Methods
</details>

## 6. Evaluate Model Performance

### 6.1. Performance Metrics for Regression Models

**Mean Squared Error (MSE)**
- Calculates the average squared difference between predicted and actual values
- Squaring the errors gives more weight to larger errors
- Sensitive to outliers, as squaring amplifies the effect of large errors

**Root Mean Squared Error (RMSE)**
- Square root of MSE, providing the same units as the target variable
- Easier to interpret than MSE, as it represents the typical magnitude of error

**Mean Absolute Error (MAE)**
- Calculates the average absolute difference between predicted and actual values
- Less sensitive to outliers compared to MSE/RMSE
- Easy to interpret: it is the average magnitude of the errors, in the same units as the target variable
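
The three error metrics above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation; the function names and the sample values are made up for the example, and the last prediction is deliberately a large miss to show how squaring amplifies it.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only; the last prediction is off by 10.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 3.0, 17.0]

print("MSE: ", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAE: ", mae(y_true, y_pred))
```

On this data the single large error dominates MSE and RMSE, while MAE grows only in proportion to it.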

**R-squared (Coefficient of Determination)**
- Measures the proportion of variance in the target variable that is explained by the model
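- Equals 1 for a perfect fit and 0 for a model that always predicts the mean of the target; it can be negative when the model does worse than that baseline (for example, on held-out data)
- Useful for comparing models on the same dataset, but can be misleading: on training data it never decreases when features are added to a linear model, and a high value does not by itself guarantee accurate predictions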
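
A small sketch of the usual definition, R² = 1 - SS_res / SS_tot, may help here. The function name and data are illustrative assumptions. Note that under this definition a model that does worse than always predicting the mean gets a negative score.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, y_true)                 # predictions match exactly
baseline = r_squared(y_true, [2.5] * 4)             # always predict the mean
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])     # reversed predictions

print(perfect, baseline, worse)
```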

**Residual Analysis**
- Residuals: Differences between actual and predicted values (actual - predicted)
- Residual plots: Visualize residuals against predicted values or other features
- Identify patterns, outliers, and violations of regression assumptions
- Useful for diagnosing issues with the model or data
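
As a rough sketch of a numeric residual check (a plot is usually more informative), the snippet below computes residuals as actual minus predicted and flags points whose residual lies more than two standard deviations from the mean residual. The two-standard-deviation cutoff and the data are arbitrary assumptions chosen for illustration.

```python
import statistics

# Illustrative values only; the last observation is far from its prediction.
y_true = [10.0, 12.0, 14.0, 16.0, 18.0, 40.0]
y_pred = [11.0, 12.5, 13.5, 16.5, 17.5, 20.0]

# Residual convention used here: actual - predicted.
residuals = [t - p for t, p in zip(y_true, y_pred)]

mean_res = statistics.mean(residuals)
std_res = statistics.pstdev(residuals)

# Flag indices whose residual is unusually far from the mean residual.
outliers = [i for i, r in enumerate(residuals)
            if abs(r - mean_res) > 2 * std_res]

print("residuals:", residuals)
print("possible outliers at indices:", outliers)
```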
 
### 6.2. Performance Metrics for Classification Models

**Accuracy**
- Definition: Proportion of correct predictions out of total predictions
- Pros: Simple and easy to understand
- Cons: Can be misleading for imbalanced datasets, doesn't provide insight into types of errors
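
To make the imbalance caveat concrete, here is a minimal sketch with made-up labels: 5 positives out of 100 samples, and a hypothetical "model" that always predicts the negative class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Illustrative imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # never predicts the positive class

acc = accuracy(y_true, always_negative)
print(acc)
```

The accuracy looks high even though not a single positive example is detected.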

**Precision, Recall, and F1-score**
- Precision: Proportion of true positives out of predicted positives (how many predicted positives were actually correct)
- Recall: Proportion of true positives out of actual positives (how many actual positives were correctly identified)
- F1-score: Harmonic mean of precision and recall, provides a balanced measure
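
The definitions above can be sketched as follows for binary labels. The function name, the `positive` parameter, and the choice to return 0.0 when a denominator is zero are assumptions made for this example; libraries differ in how they treat that edge case.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Illustrative labels: 2 true positives, 1 false positive, 2 false negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```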

**Confusion Matrix**
- Tabular representation of correct and incorrect predictions
- Rows represent actual classes, columns represent predicted classes
- Provides insights into types of errors (false positives, false negatives)
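
A minimal sketch of building such a matrix by hand, with rows as actual classes and columns as predicted classes. The helper name and the labels are illustrative assumptions.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where matrix[i][j] counts samples with
    actual class labels[i] and predicted class labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Illustrative binary labels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# With labels ordered [0, 1]: row 0 is actual negatives, row 1 actual positives.
tn, fp = cm[0]
fn, tp = cm[1]
print(cm)
print("TN, FP, FN, TP =", tn, fp, fn, tp)
```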

**Receiver Operating Characteristic (ROC) Curve and Area Under the Curve (AUC)**
- ROC curve: Plots true positive rate (recall) against false positive rate at different classification thresholds
- AUC: Area under the ROC curve, ranges from 0 to 1 (higher is better)
- Useful for evaluating binary classification models and comparing different models
- AUC of 0.5 indicates a random classifier, while 1.0 indicates a perfect classifier
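
The following sketch sweeps the classification threshold over the distinct scores, collects (FPR, TPR) points, and integrates them with the trapezoidal rule. It assumes binary labels coded as 0/1 with at least one example of each class; the function names and scores are made up for illustration.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs from (0, 0) to (1, 1), one per threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; predict positive when score >= thr.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Illustrative labels and predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
roc_auc = auc(points)
print(points)
print(roc_auc)
```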

**Precision-Recall Curve**
- Plots precision against recall at different classification thresholds
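- Useful for imbalanced datasets and for applications where the trade-off between precision and recall matters (for example, when missing a positive is costlier than a false alarm, or vice versa)
- Often more informative than ROC curves when the positive class is rare, because precision is not inflated by a large number of true negatives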
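
A precision-recall curve can be sketched with the same threshold sweep used for ROC curves. The snippet below assumes binary 0/1 labels with at least one positive; the function name and scores are illustrative.

```python
def pr_points(y_true, scores):
    """Return (recall, precision) pairs, one per distinct threshold."""
    n_pos = sum(y_true)
    points = []
    # Predict positive when score >= thr; at least one sample always
    # qualifies, so tp + fp is never zero inside the loop.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((tp / n_pos, tp / (tp + fp)))
    return points

# Illustrative labels and predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

curve = pr_points(y_true, scores)
for recall, precision in curve:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```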

**Metrics for Multi-class Classification**
- Micro-averaging: Calculates metrics globally by pooling true positives, false positives, and false negatives across all classes
- Macro-averaging: Calculates metrics for each class and takes the unweighted average, so every class counts equally
- Weighted averaging: Calculates metrics for each class and takes the average weighted by class frequency (the number of true instances of each class)
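
The three averaging schemes can be compared on a small example using precision. This is a sketch under stated assumptions: the class names and predictions are made up, and a class that is never predicted is given precision 0.0 by convention.

```python
from collections import Counter

def per_class_precision(y_true, y_pred, labels):
    """Precision for each class, treating that class as the positive one."""
    result = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
        predicted = sum(1 for p in y_pred if p == label)
        # Assumption: precision is 0.0 for a class that is never predicted.
        result[label] = tp / predicted if predicted else 0.0
    return result

labels = ["a", "b", "c"]
# Illustrative data: class "a" is frequent, class "c" is rare and never predicted.
y_true = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
y_pred = ["a"] * 5 + ["b"] + ["b", "b", "a"] + ["a"]

prec = per_class_precision(y_true, y_pred, labels)
support = Counter(y_true)  # number of true instances per class

macro = sum(prec.values()) / len(labels)
weighted = sum(prec[l] * support[l] for l in labels) / len(y_true)
# Micro precision pools all predictions: total TP / total predicted.
micro = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_pred)

print("macro:   ", macro)
print("weighted:", weighted)
print("micro:   ", micro)
```

Here the macro average is pulled down by the rare class, while the micro and weighted averages are dominated by the frequent class.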

### 6.3. Interpreting and Reporting Model Performance

**Analyzing and Interpreting Evaluation Metrics**
- Understand the strengths and weaknesses of different metrics
- Consider the problem context and business requirements
- Analyze trade-offs between different metrics (e.g., precision vs. recall)

**Visualizing Model Performance**
- Learning curves: Plot performance metric against training set size to diagnose bias and variance issues
- Residual plots: Plot residuals (actual - predicted) against predicted values or other features to identify patterns and outliers

**Comparing Model Performance**
- Use appropriate statistical tests (e.g., paired t-test, McNemar's test) to determine if performance differences are statistically significant
- Consider confidence intervals and effect sizes in addition to point estimates
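
As one illustration, McNemar's test compares two classifiers evaluated on the same test examples by looking only at the cases where exactly one of them is correct. The sketch below uses the continuity-corrected chi-square approximation with one degree of freedom, which is usually considered reasonable only when the number of disagreements is not small; the function name and data are assumptions for the example.

```python
import math

def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar statistic and approximate p-value."""
    rows = list(zip(y_true, pred_a, pred_b))
    # b: model A correct, model B wrong; c: model A wrong, model B correct.
    b = sum(1 for t, a, bb in rows if a == t and bb != t)
    c = sum(1 for t, a, bb in rows if a != t and bb == t)
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Illustrative data: A alone is right 15 times, B alone is right 5 times.
y_true = [1] * 100
pred_a = [1] * 15 + [0] * 5 + [1] * 80
pred_b = [0] * 15 + [1] * 5 + [1] * 80

stat, p_value = mcnemar(y_true, pred_a, pred_b)
print(f"statistic={stat:.2f}, p-value={p_value:.4f}")
```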

**Reporting Model Performance**
- Report performance metrics on a held-out test set (not the training or validation set)
- Include confidence intervals or standard deviations for performance estimates
- Describe the evaluation methodology (data splitting, cross-validation, etc.)
- Discuss limitations, assumptions, and potential sources of bias or error
- Follow best practices and guidelines for reporting machine learning results (e.g., MLCommons, NIST guidelines)
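
One common way to attach a confidence interval to a test-set metric is the percentile bootstrap: resample the test examples with replacement, recompute the metric each time, and report the empirical quantiles. The sketch below does this for accuracy; the function name, the number of resamples, the seed, and the data are all illustrative assumptions.

```python
import random

def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]

    stats = []
    for _ in range(n_resamples):
        # Resample test examples with replacement and recompute accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()

    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(correct) / n, lower, upper

# Illustrative held-out predictions: 40 of 50 correct.
y_true = [1] * 50
y_pred = [1] * 40 + [0] * 10

point, lower, upper = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy={point:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```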