# Session 55: Classification Evaluation (Accuracy)

**Unit 5: Basics of Predictive Analytics**
**Hour: 55**
**Mode: Practical Lab**

---

### 1. Objective

This lab introduces the most intuitive metric for classification tasks: **Accuracy**. We will cover what it represents, how to calculate it, and where it falls short.

### 2. Setup

Let's recreate our Logistic Regression model and predictions from the previous session.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the data and keep only the columns used in this lab
url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)
df_subset = df[['tenure', 'MonthlyCharges', 'Contract', 'Churn']].copy()
df_subset.dropna(inplace=True)

# Separate features and target, one-hot encode Contract, then split into train and test sets
X = df_subset.drop('Churn', axis=1)
y = df_subset['Churn']
X_encoded = pd.get_dummies(X, columns=['Contract'], drop_first=True)
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)

# Fit model and make predictions
log_model = LogisticRegression(max_iter=1000)
log_model.fit(X_train, y_train)
y_pred = log_model.predict(X_test)

### 3. Understanding Accuracy

**Definition:** Accuracy is the proportion of predictions that the model got correct.

**Formula:** `Accuracy = (Number of Correct Predictions) / (Total Number of Predictions)`

It's simple to understand and is a good starting point for most classification problems.
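To make the formula concrete, here is a minimal sketch that applies it by hand. The two label lists are invented purely for illustration and are not taken from the Telco data; the variable names are hypothetical.

```python
# Toy labels, invented for illustration (not taken from the Telco data)
y_true_toy = ["No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]
y_pred_toy = ["No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No", "No"]

# Count the positions where the prediction matches the true label
n_correct = sum(t == p for t, p in zip(y_true_toy, y_pred_toy))
n_total = len(y_true_toy)

# Accuracy = correct predictions / total predictions
manual_accuracy = n_correct / n_total

print(f"Correct predictions: {n_correct} of {n_total}")
print(f"Accuracy: {manual_accuracy:.2f}")
```

Two of the ten predictions are wrong (one missed "Yes" and one false "Yes"), so the accuracy is 8 / 10 = 0.80.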

### 4. Calculating Accuracy

Scikit-learn provides the `accuracy_score` function. We compare the true labels (`y_test`) with our model's predictions (`y_pred`).

In [None]:
acc = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {acc:.4f}")
print(f"Model Accuracy as a percentage: {acc*100:.2f}%")

**Interpretation:** Our model correctly classifies about **79.53%** of the customers in the unseen test set as churners or non-churners. This is a respectable first result.

### 5. The Limitation of Accuracy: Imbalanced Datasets

Accuracy can be misleading when you have an **imbalanced dataset**. An imbalanced dataset is one where one class is much more frequent than the other.

Let's check the balance of our target variable, `Churn`.

In [None]:
y_test.value_counts(normalize=True) * 100

**The Problem:** In our test set, about 73% of customers did **not** churn. This means that a lazy model that **always** predicts "No" for churn, without looking at any features, would still achieve an accuracy of about 73%!

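Our model's accuracy is `79.53%`, which is better than the lazy model's `73%`, so it appears to have learned something useful. But the gain over doing nothing is only about 6.5 percentage points, which is far less impressive than `79.53%` sounds on its own. This is why accuracy alone isn't enough: it must always be read against the class balance.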
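The always-"No" baseline can be sketched with Scikit-learn's `DummyClassifier`. The data below is a synthetic stand-in (an assumption chosen to mirror the roughly 73/27 split), not the actual Telco test set, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in labels (an assumption): 73 "No" and 27 "Yes"
y_demo = np.array(["No"] * 73 + ["Yes"] * 27)
# Placeholder features; the most_frequent strategy ignores them entirely
X_demo = np.zeros((len(y_demo), 1))

# A baseline that always predicts the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_demo, y_demo)
baseline_pred = baseline.predict(X_demo)

baseline_acc = accuracy_score(y_demo, baseline_pred)
print(f"Baseline accuracy: {baseline_acc:.2f}")
```

In this lab's setting, the same idea would mean fitting the baseline on `X_train`, `y_train` and scoring it on `X_test`, `y_test`, then comparing that figure with the Logistic Regression accuracy.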

**Analogy:** Imagine you have a test to detect a rare disease that only affects 1 in 1000 people. A model that always predicts "No Disease" would be 99.9% accurate, but it would be completely useless because it would never find the one person who is actually sick.
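The arithmetic behind this analogy can be checked with a short sketch. The population is hypothetical and built directly from the numbers in the analogy (1 sick person among 1000); `1` stands for "Disease" and `0` for "No Disease".

```python
# Hypothetical population from the analogy: 1 sick person among 1000
y_true_disease = [1] + [0] * 999
# A model that always predicts "No Disease"
y_pred_always_no = [0] * 1000

pairs = list(zip(y_true_disease, y_pred_always_no))

# Share of predictions that match the true label
accuracy = sum(t == p for t, p in pairs) / len(pairs)

# How many of the truly sick people the model actually flagged
sick_total = sum(y_true_disease)
sick_found = sum(1 for t, p in pairs if t == 1 and p == 1)

print(f"Accuracy: {accuracy:.3f}")
print(f"Sick patients detected: {sick_found} of {sick_total}")
```

The accuracy comes out at 0.999 while the number of sick patients detected is 0 of 1, which is the gap that accuracy alone hides.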

### 6. Conclusion

In this lab, you learned:
1.  **Accuracy** is the proportion of correct predictions, often reported as a percentage.
2.  How to calculate it using Scikit-learn's `accuracy_score`.
3.  The critical limitation of accuracy, especially on **imbalanced datasets**, where it can be a misleading metric.

Because accuracy can be misleading, data scientists need more nuanced tools to evaluate classification models.

**Next Session:** We will introduce the **Confusion Matrix**, a powerful tool that breaks down a model's performance and gives us a much deeper understanding of its strengths and weaknesses.