# Module 4 — Session 2: Practical Exercises (Classification Metrics)

This notebook follows the instructions from your **M4, S2_ Practical Exercises** file.


## Exercise 1: The Metrics Calculator (Conceptual)

A model was tested on 100 patient records to predict whether each patient has a disease. The confusion matrix below summarizes the results:

|                | Predicted: No Disease | Predicted: Disease |
|----------------|----------------------:|-------------------:|
| **Actual: No Disease** | 85 | 5 |
| **Actual: Disease**    | 8  | 2 |

Tasks (treat **Disease** as the positive class):
1) Identify TP, TN, FP, FN  
2) Calculate Accuracy  
3) Calculate Precision  
4) Calculate Recall  
5) Calculate F1-Score  

Show the formulas and your calculations.
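
As a reference for the manual work, the standard formulas can be written out with the counts from the table. This is a sketch that assumes **Disease** is the positive class, so TP = 2, TN = 85, FP = 5, FN = 8:

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{2 + 85}{100} = 0.87 \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{2}{2 + 5} \approx 0.286 \\
\text{Recall} &= \frac{TP}{TP + FN} = \frac{2}{2 + 8} = 0.20 \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN} = \frac{4}{17} \approx 0.235
\end{aligned}
$$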


In [None]:
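# Exercise 1 (optional): compute the metrics in Python to check the manual work.
# Counts are read from the confusion matrix, with "Disease" as the positive class.
TP = 2   # actual Disease, predicted Disease
TN = 85  # actual No Disease, predicted No Disease
FP = 5   # actual No Disease, predicted Disease
FN = 8   # actual Disease, predicted No Disease

accuracy = (TP + TN) / (TP + TN + FP + FN)
# Guard each ratio against a zero denominator.
precision = TP / (TP + FP) if (TP + FP) else 0
recall = TP / (TP + FN) if (TP + FN) else 0
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) else 0

accuracy, precision, recall, f1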
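
The four counts can also be read straight from the table layout rather than typed by hand. The sketch below is one way to do that, assuming rows are actual classes and columns are predicted classes with the negative class first (the same layout scikit-learn's `confusion_matrix` uses for 0/1 labels). The helper name `unpack_binary_confusion` is hypothetical, introduced only for illustration. It also compares the model's accuracy with the accuracy of always predicting "No Disease".

```python
# Confusion matrix copied from the exercise table.
# Rows = actual class, columns = predicted class, negative class first.
matrix = [
    [85, 5],  # Actual: No Disease -> [pred. No Disease, pred. Disease]
    [8, 2],   # Actual: Disease    -> [pred. No Disease, pred. Disease]
]


def unpack_binary_confusion(matrix):
    """Return (TN, FP, FN, TP) for a 2x2 matrix laid out as above.

    Hypothetical helper for illustration; not part of any library.
    """
    (tn, fp), (fn, tp) = matrix
    return tn, fp, fn, tp


tn, fp, fn, tp = unpack_binary_confusion(matrix)
total = tn + fp + fn + tp

model_accuracy = (tp + tn) / total
# A "model" that always predicts No Disease is right for every actual negative.
baseline_accuracy = (tn + fp) / total

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"model accuracy    = {model_accuracy:.2f}")
print(f"baseline accuracy = {baseline_accuracy:.2f}")
```

With these counts the always-negative baseline reaches a higher accuracy than the model, which is why the exercise also asks for precision, recall, and F1.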


## Exercise 2: The Scikit-learn Report Card (Coding)

**Setup:** Use the same notebook/code from Session 1. You should already have:
- `X_train, X_test, y_train, y_test`
- a trained `model`
- `predictions = model.predict(X_test)`

Import the necessary evaluation functions below, then run the tasks:
- `classification_report`
- confusion matrix + seaborn heatmap
- ROC curve + AUC (needs probabilities)


In [None]:
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score
import matplotlib.pyplot as plt
import seaborn as sns


### 2.1 Classification Report

Run:
- `print(classification_report(y_test, predictions))`

Then in a Markdown cell, write down the **Precision, Recall, F1-score** for class **'1' (Survived)**.


In [None]:
# Make sure these variables exist from Session 1:
# X_train, X_test, y_train, y_test, model
# predictions = model.predict(X_test)

print(classification_report(y_test, predictions))
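
To pull the class `1` numbers out programmatically instead of reading them off the printed table, `classification_report` accepts `output_dict=True`. The sketch below uses small made-up label lists (`y_true_toy`, `y_pred_toy`), not the Session 1 data, so the values it prints are illustrative only.

```python
from sklearn.metrics import classification_report

# Toy labels for illustration only (NOT the Session 1 data).
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred_toy = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]

# output_dict=True returns a nested dict keyed by the class label as a string.
report = classification_report(y_true_toy, y_pred_toy, output_dict=True)
class_1 = report["1"]

print(f"precision = {class_1['precision']:.2f}")
print(f"recall    = {class_1['recall']:.2f}")
print(f"f1-score  = {class_1['f1-score']:.2f}")
print(f"support   = {class_1['support']}")
```

In the notebook, the same pattern would apply with `y_test` and `predictions` in place of the toy lists.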


### 2.2 Confusion Matrix (Seaborn Heatmap)

In [None]:
cm = confusion_matrix(y_test, predictions)

plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()


### 2.3 ROC Curve and AUC

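The ROC curve is traced by sweeping the decision threshold, so it needs the predicted probability of the positive class rather than the hard labels in `predictions`:
- `y_pred_proba = model.predict_proba(X_test)[:, 1]`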


In [None]:
y_pred_proba = model.predict_proba(X_test)[:, 1]

fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
auc = roc_auc_score(y_test, y_pred_proba)

plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], linestyle='--')  # random guess line
plt.title('ROC Curve')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()

auc
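
Each point on the ROC curve is the (FPR, TPR) pair obtained at one decision threshold. The sketch below computes that pair by hand for a few thresholds to show where the curve's points come from. The function `rates_at_threshold` and the toy labels and scores are hypothetical, chosen only for illustration.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted positive.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


# Toy labels and predicted probabilities (illustrative only).
y_true_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

# Lowering the threshold moves along the curve from (0, 0) toward (1, 1).
for threshold in (0.9, 0.5, 0.3, 0.0):
    fpr, tpr = rates_at_threshold(y_true_toy, scores_toy, threshold)
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

`roc_curve` performs this sweep over the thresholds it derives from the scores, and the AUC summarizes the resulting curve in a single number.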


## Exercise 3: The Strategist (Critical Thinking)

Write 2–3 sentences each:

1) A situation where **Precision** is much more important than **Recall**  
2) A situation where **Recall** is much more important than **Precision**


### 3.1 Precision > Recall

- **Scenario:** Spam detection / email filtering  
- **Reasoning (2–3 sentences):** We want to be very confident before labeling an email as spam, because a false positive can hide an important message (bank alerts, password resets, job offers). It’s better to let a bit of spam into the inbox than to mistakenly block a critical email, so precision matters most.

### 3.2 Recall > Precision

- **Scenario:** Screening for a serious disease (e.g., cancer screening)  
- **Reasoning (2–3 sentences):** In screening, missing a true case (false negative) can delay treatment and cause serious harm. It’s acceptable to flag some healthy patients for additional tests (false positives) if it ensures that most real cases are detected, so recall matters most.
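
The trade-off behind both scenarios can be seen by moving the decision threshold on the same set of scores. The sketch below is a minimal illustration with made-up labels and scores. The helper `precision_recall_at` is hypothetical and the thresholds are arbitrary choices, not recommended values.

```python
def precision_recall_at(y_true, scores, threshold):
    """Return (precision, recall) when scores >= threshold are predicted positive.

    Hypothetical helper for illustration only.
    """
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels and predicted probabilities (illustrative only).
y_true_toy = [1, 1, 1, 0, 1, 0, 0, 0]
scores_toy = [0.95, 0.85, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

# A strict threshold flags only confident cases (spam-filter style).
strict = precision_recall_at(y_true_toy, scores_toy, 0.8)
# A lenient threshold flags anything suspicious (screening style).
lenient = precision_recall_at(y_true_toy, scores_toy, 0.35)

print(f"strict  (0.80): precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"lenient (0.35): precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

On this toy data the strict threshold produces no false positives but misses half of the true cases, while the lenient one catches every true case at the cost of one false alarm.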