# Supervised Learning - Model Evaluation

Model evaluation estimates how well a machine learning model will perform on unseen data. It uses several metrics, each capturing a different aspect of prediction quality, because no single number describes a classifier completely.

---

## Key Concepts

### Metrics

#### Accuracy
The ratio of correct predictions to total predictions.

**Formula:**
$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

Where:
- **TP**: True Positives  
- **TN**: True Negatives  
- **FP**: False Positives  
- **FN**: False Negatives  
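
As a minimal sketch of the formula, the snippet below counts the four outcomes from two label lists and computes accuracy. The helper name `confusion_counts` and the labels are hypothetical, made up for illustration; scikit-learn's `accuracy_score` computes the same quantity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Hypothetical labels, made up for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tp, tn, fp, fn)  # 4 4 1 1
print(accuracy)        # 0.8
```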

---

#### Precision
The ratio of true positives to the total predicted positives.

**Formula:**
$$
\text{Precision} = \frac{TP}{TP + FP}
$$
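
The formula is undefined when the model predicts no positives at all (TP + FP = 0). The sketch below handles that case by returning 0.0, which is a convention assumed here rather than a mathematical result. The function name and counts are hypothetical.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        return 0.0  # convention chosen for this sketch, not a true value
    return tp / predicted_positives


# Hypothetical counts: 4 correct positive predictions, 1 false alarm
print(precision(tp=4, fp=1))  # 0.8

# A model that never predicts the positive class has no defined precision
print(precision(tp=0, fp=0))  # 0.0
```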

---

#### Recall (Sensitivity)
The ratio of true positives to the total actual positives.

**Formula:**
$$
\text{Recall} = \frac{TP}{TP + FN}
$$
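
Recall alone can be gamed: a model that labels everything positive never misses a positive. The sketch below, using hypothetical labels, shows such a model reaching perfect recall while its precision is no better than the positive rate of the data.

```python
def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return 0.0  # convention chosen for this sketch
    return tp / actual_positives


# Hypothetical data: 5 actual positives, 5 actual negatives
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
# A trivial "model" that predicts the positive class for every sample
y_pred_all_positive = [1] * len(y_true)

pairs = list(zip(y_true, y_pred_all_positive))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)

print(recall(tp, fn))  # 1.0
print(tp / (tp + fp))  # 0.5 (precision of the same predictions)
```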

---

#### F1-Score
The harmonic mean of precision and recall.

**Formula:**
$$
\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
$$
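
Because it is a harmonic mean, the F1-Score is pulled toward the lower of the two values. The sketch below compares it with the arithmetic mean for a hypothetical precision of 0.5 and recall of 1.0, and checks the equivalent count form 2TP / (2TP + FP + FN). The function name `f1_from_rates` is made up for this example.

```python
def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: low precision, perfect recall
p, r = 0.5, 1.0
f1 = f1_from_rates(p, r)
arithmetic_mean = (p + r) / 2

print(round(f1, 4))     # 0.6667
print(arithmetic_mean)  # 0.75

# The same F1 expressed with counts that give p = 0.5 and r = 1.0
tp, fp, fn = 5, 5, 0
f1_from_counts = 2 * tp / (2 * tp + fp + fn)
print(round(f1_from_counts, 4))  # 0.6667
```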

---

#### ROC Curve (Receiver Operating Characteristic)
Plots the **True Positive Rate (TPR)** against the **False Positive Rate (FPR)** at various threshold values.

- **True Positive Rate (TPR)**, identical to recall:
$$
\text{TPR} = \frac{TP}{TP + FN}
$$

- **False Positive Rate (FPR)**, equal to $1 - \text{specificity}$:
$$
\text{FPR} = \frac{FP}{FP + TN}
$$
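
To make "at various threshold values" concrete, the sketch below sweeps a threshold over a small set of hypothetical scores and records one (FPR, TPR) point per threshold. A sample is predicted positive when its score is at or above the threshold. The function name `roc_points` is made up, and the code assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, strictest threshold first."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    # Threshold above every score: nothing is predicted positive
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical labels and predicted scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

for fpr, tpr in roc_points(y_true, scores):
    print(fpr, tpr)
# 0.0 0.0
# 0.0 0.5
# 0.5 0.5
# 0.5 1.0
# 1.0 1.0
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the curve.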

---

#### AUC (Area Under Curve)
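The **AUC** summarizes the ROC curve as a single number. It equals the probability that the model scores a randomly chosen positive example higher than a randomly chosen negative one. A higher AUC indicates better separation between the classes:
- **AUC = 1.0**: Perfect ranking of positives above negatives.
- **AUC = 0.5**: No better than random guessing.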
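
The sketch below computes AUC in two ways on a small hypothetical example: as the fraction of (positive, negative) pairs that are ranked correctly, and as the trapezoidal area under hand-listed ROC points. Both function names are made up for illustration, and the two results should agree.

```python
def auc_by_ranking(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_trapezoids(points):
    """Area under a curve given as (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# ROC points for these scores, worked out by hand at thresholds 0.8, 0.4, 0.35, 0.1
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]

print(auc_by_ranking(y_true, scores))  # 0.75
print(auc_by_trapezoids(points))       # 0.75
```

Three of the four positive/negative pairs are ordered correctly here (only 0.35 versus 0.4 is not), which gives 0.75.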

---

## Practical: Evaluate Models in Python

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import auc, classification_report, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the data; "target" is assumed to be a binary label column
data = pd.read_csv("sample_dataset.csv")  # Replace with your dataset
X = data.drop("target", axis=1)
y = data["target"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # Probability of the positive class, for the ROC curve

print("Classification Report:")
print(classification_report(y_test, y_pred))

# ROC points and the area under the curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='blue', label=f"ROC Curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], color='gray', linestyle='--')  # Diagonal line: random guessing
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend(loc="lower right")
plt.show()
```

### Explanation of the Code
- **Model Training:**

    - Trains a decision tree classifier on the training split (80% of the data).
- **Classification Report:**

    - `classification_report`: Reports precision, recall, F1-score, and support for each class, along with overall accuracy and the macro and weighted averages.
- **ROC Curve:**

    - `roc_curve`: Computes the FPR and TPR for different thresholds.
    - `auc`: Calculates the area under the ROC curve using the trapezoidal rule.
    - The plot shows the trade-off between sensitivity (TPR) and specificity (1 - FPR); the dashed diagonal marks random guessing.
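
If `sample_dataset.csv` is not available, the same workflow can be tried on synthetic data. The sketch below is one possible variant, not the code above: it assumes a dataset generated with `make_classification`, an arbitrary `max_depth=3`, and a stratified split, and it uses `confusion_matrix` and `roc_auc_score` to print the raw counts and the AUC without plotting.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary dataset standing in for sample_dataset.csv (an assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# stratify=y keeps the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# max_depth=3 is an arbitrary choice; a shallow tree is more likely to have
# mixed-class leaves, so predict_proba returns more than just 0.0 and 1.0
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
roc_auc = roc_auc_score(y_test, y_prob)

print(cm)
print(f"AUC = {roc_auc:.2f}")
```

`roc_auc_score` computes the AUC directly from labels and scores, so it should match `auc(fpr, tpr)` computed from `roc_curve` on the same inputs.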

### Example Output
The values below are illustrative; actual results depend on the dataset.

#### Classification Report

| Metric      | Class 0 | Class 1 | Weighted Avg |
|-------------|---------|---------|--------------|
| Precision   | 0.85    | 0.90    | 0.87         |
| Recall      | 0.80    | 0.92    | 0.85         |
| F1-Score    | 0.82    | 0.91    | 0.86         |

#### ROC Curve
- A graph with TPR (True Positive Rate) on the Y-axis and FPR (False Positive Rate) on the X-axis.
- AUC = 0.95, indicating that the model separates the two classes well.

---

## Summary
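- Accuracy is most informative when classes are roughly balanced; on imbalanced data it can look high even when the minority class is never predicted correctly.
- Precision and recall matter most for imbalanced datasets: precision measures how many predicted positives are correct, and recall measures how many actual positives are found.
- F1-Score balances precision and recall in a single number.
- The ROC curve and AUC show the trade-off between sensitivity and specificity across thresholds.

Used together, these metrics give a more complete picture of model performance than any one of them alone.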
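
The point about accuracy and imbalanced data can be made concrete with a small sketch. The labels below are hypothetical: 95 negatives, 5 positives, and a trivial "model" that always predicts the majority class.

```python
# Hypothetical imbalanced test set: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority (negative) class
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
# Guard against division by zero when there are no actual positives
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(accuracy)  # 0.95
print(recall)    # 0.0
```

The accuracy looks strong, yet the model finds none of the positives, which is why recall (and precision) should be reported alongside it.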