# Session 57: Classification Metrics Beyond Accuracy

**Unit 5: Basics of Predictive Analytics**
**Hour: 57**
**Mode: Practical Lab**

---

### 1. Objective

This lab dives deeper into classification evaluation, moving beyond accuracy. We will learn how to calculate and interpret **Precision**, **Recall**, and the **F1-Score** from the confusion matrix. These metrics give us a more nuanced understanding of our model's performance, especially on imbalanced datasets.
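To see why accuracy alone can mislead on imbalanced data, here is a minimal sketch using made-up labels (90 non-churners and 10 churners, not the Telco data). The `toy_` names are hypothetical and chosen so they do not clash with variables used later in the lab.

```python
# Hypothetical imbalanced sample: 90 non-churners, 10 churners
toy_true = ["No"] * 90 + ["Yes"] * 10
# A trivial baseline that never predicts churn
toy_pred = ["No"] * len(toy_true)

# Accuracy: share of all predictions that match the true label
toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)

# Recall: share of actual churners that the baseline found
toy_tp = sum(t == "Yes" and p == "Yes" for t, p in zip(toy_true, toy_pred))
toy_fn = sum(t == "Yes" and p == "No" for t, p in zip(toy_true, toy_pred))
toy_recall = toy_tp / (toy_tp + toy_fn)

print(f"Accuracy: {toy_accuracy:.2f}")
print(f"Recall:   {toy_recall:.2f}")
```

The baseline scores 0.90 on accuracy while finding none of the churners (recall 0.00), which is the gap the metrics in this lab are meant to expose.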

### 2. Setup

Let's recreate our Logistic Regression model, predictions, and confusion matrix.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

# Load and prep data
url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)
df_subset = df[['tenure', 'MonthlyCharges', 'Contract', 'Churn']].copy()
df_subset.dropna(inplace=True)

# Prep data and train model
X = df_subset.drop('Churn', axis=1)
y = df_subset['Churn']
X_encoded = pd.get_dummies(X, columns=['Contract'], drop_first=True)
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)

# Fit model and make predictions
log_model = LogisticRegression(max_iter=1000)
log_model.fit(X_train, y_train)
y_pred = log_model.predict(X_test)

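# Get our TN, FP, FN, TP values, as in the last session
# labels=['No', 'Yes'] fixes the row/column order so that 'Yes' (churn) is the positive class
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=['No', 'Yes']).ravel()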
print(f"TN: {tn}, FP: {fp}, FN: {fn}, TP: {tp}")

### 3. Understanding the Metrics

#### 3.1. Precision
*   **Question it answers:** Of all the customers the model **predicted** would churn, what percentage actually did?
*   **Formula:** `Precision = TP / (TP + FP)`
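*   **Business Relevance:** High precision means few **False Positives** among the customers the model flags as churners. (This is not the same as a low False Positive Rate, `FP / (FP + TN)`, which is measured against all actual non-churners.) When your model predicts a customer will churn, it's very likely to be correct. This is important when the cost of acting on a prediction is high (e.g., you don't want to waste money giving discounts to happy customers).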

In [None]:
# Calculate using the formula
precision_manual = tp / (tp + fp)
print(f"Manual Precision: {precision_manual:.4f}")

# Calculate using scikit-learn
# pos_label='Yes' tells the function that 'Yes' is our positive class
precision_sklearn = precision_score(y_test, y_pred, pos_label='Yes')
print(f"Scikit-learn Precision: {precision_sklearn:.4f}")

**Interpretation:** When our model predicts a customer will churn, it is correct about **63.6%** of the time.
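Precision is easy to confuse with the False Positive Rate, because both involve false positives but use different denominators. The sketch below uses hypothetical counts (the `ex_` values are invented for illustration and are not from our churn model) to show that the two can differ widely.

```python
# Hypothetical confusion-matrix counts (not from the Telco model)
ex_tp, ex_fp, ex_fn, ex_tn = 60, 40, 40, 860

# Precision: of everyone predicted to churn, how many actually churned
ex_precision = ex_tp / (ex_tp + ex_fp)

# False Positive Rate: of everyone who did NOT churn, how many were flagged
ex_fpr = ex_fp / (ex_fp + ex_tn)

print(f"Precision:           {ex_precision:.3f}")
print(f"False Positive Rate: {ex_fpr:.3f}")
```

With these counts the False Positive Rate is low (about 0.044) even though precision is only 0.600, because the non-churners greatly outnumber the churners.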

#### 3.2. Recall (or Sensitivity)
*   **Question it answers:** Of all the customers who **actually** churned, what percentage did our model correctly identify?
*   **Formula:** `Recall = TP / (TP + FN)`
*   **Business Relevance:** High recall means you have a low **False Negative Rate**. Your model is good at "finding" all the true churners. This is crucial when the cost of missing a case is very high (like missing a fraudulent transaction or a patient with a disease).

In [None]:
# Calculate using the formula
recall_manual = tp / (tp + fn)
print(f"Manual Recall: {recall_manual:.4f}")

# Calculate using scikit-learn
recall_sklearn = recall_score(y_test, y_pred, pos_label='Yes')
print(f"Scikit-learn Recall: {recall_sklearn:.4f}")

**Interpretation:** Our model identified only **51.7%** of the customers who actually churned. This is a weakness: we are missing almost half of the churners (a False Negative Rate of about 48.3%).

#### 3.3. The Precision-Recall Trade-off

Often, improving precision will lower recall, and vice versa. A model that flags a customer only when it is very confident will have high precision but will miss some cases (low recall). A model that tries to catch every case will have high recall but will raise more false alarms (low precision). The right balance depends on the relative cost of a false positive versus a false negative in your business problem.
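One way to see the trade-off is to vary the probability cutoff used to call a customer a churner. The sketch below uses invented scores and labels (the `toy_` data and the `precision_recall_at` helper are hypothetical, not part of the lab's model) and recomputes both metrics at three cutoffs.

```python
# Hypothetical churn probabilities and true labels (1 = churned)
toy_scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
toy_labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

def precision_recall_at(threshold, scores, labels):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    n_tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    n_fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    n_fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    # Guard against division by zero when nothing is predicted or nothing is positive
    precision = n_tp / (n_tp + n_fp) if (n_tp + n_fp) > 0 else 0.0
    recall = n_tp / (n_tp + n_fn) if (n_tp + n_fn) > 0 else 0.0
    return precision, recall

for threshold in (0.8, 0.5, 0.2):
    p, r = precision_recall_at(threshold, toy_scores, toy_labels)
    print(f"threshold={threshold:.1f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy data, lowering the cutoff from 0.8 to 0.2 raises recall from 0.400 to 1.000 while precision falls from 1.000 to 0.625. In the lab itself, comparable scores could be taken from `log_model.predict_proba(X_test)`, with the column order given by `log_model.classes_`.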

#### 3.4. F1-Score
*   **Question it answers:** How well does the model balance Precision and Recall? (It is the harmonic mean of the two.)
*   **Formula:** `F1 = 2 * (Precision * Recall) / (Precision + Recall)`
*   **Business Relevance:** The F1-score is a single metric that combines both precision and recall. It's useful when you need to balance both concerns and find a model that is both accurate in its positive predictions and good at finding all the positive cases.

In [None]:
f1 = f1_score(y_test, y_pred, pos_label='Yes')
print(f"F1-Score: {f1:.4f}")

**Interpretation:** The F1-score of 0.57 summarizes our model's performance on the positive class in one number. It sits between our precision (0.636) and recall (0.517), slightly closer to the lower of the two.
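As a check on the formula, the sketch below recomputes F1 by hand from the precision and recall reported above, then compares the harmonic mean with a plain average for a lopsided, made-up model. The `f1_from` helper and the `lopsided_` values are hypothetical names used only for this illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported in this lab's interpretation notes
print(f"Lab model F1:    {f1_from(0.636, 0.517):.2f}")

# Hypothetical lopsided model: very precise, but finds few churners
lopsided_p, lopsided_r = 1.0, 0.1
print(f"Arithmetic mean: {(lopsided_p + lopsided_r) / 2:.2f}")
print(f"F1 (harmonic):   {f1_from(lopsided_p, lopsided_r):.2f}")
```

The lab values give 0.57, matching `f1_score`. For the lopsided case, the plain average is 0.55 but F1 is only 0.18, which shows how the harmonic mean penalizes a model that does well on one metric and poorly on the other.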

### 4. Conclusion

In this lab, you went beyond accuracy to evaluate your classification model:
1.  **Precision:** Measures how many of the model's positive predictions are correct.
2.  **Recall:** Measures the model's ability to find all actual positive cases.
3.  **F1-Score:** Provides a single, balanced metric between Precision and Recall.

For our churn model, we learned it has decent precision but poor recall (it misses too many churners). This tells us that if we wanted to improve this model, we should focus on techniques that reduce False Negatives.

**Next Session:** We will learn how to interpret our model to understand *why* it is making the predictions it does.