# Logistic Regression — Classification Workflow

This notebook implements a **metric-first classification workflow** for logistic regression.

**We will focus on:**
- Class imbalance awareness
- Confusion matrix interpretation
- Precision / Recall / F1
- Simple threshold tuning
- (Optional) ROC–AUC

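**Goal:** show why accuracy alone is not enough on imbalanced data. With a 90/10 class split, a model that always predicts the majority class is about 90% accurate yet never finds a single positive case.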
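
To make the goal concrete, here is a minimal sketch of the majority-class baseline. The labels are hypothetical (90 negatives and 10 positives chosen for illustration, not taken from the dataset below), and the "model" simply predicts class 0 for every row.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels for illustration: 90 negatives, 10 positives.
y_true = np.array([0] * 90 + [1] * 10)

# A trivial "model" that always predicts the majority class (0).
y_pred = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_pred)
baseline_recall = recall_score(y_true, y_pred)  # recall for the positive class (1)

print(f"Accuracy: {baseline_accuracy:.2f}")
print(f"Recall:   {baseline_recall:.2f}")
```

Accuracy comes out at 0.90 while recall is 0.00, which is why the rest of the notebook reports precision, recall, and F1 alongside accuracy.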

## 1. Setup and Data Loading

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    f1_score,
)

# Plot defaults for the whole notebook
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (10, 6)

# Synthetic stand-in data.
# TODO: replace with your real dataset (e.g., churn / loan default)

X, y = make_classification(
    n_samples=2000,
    n_features=5,
    n_informative=3,
    n_redundant=2,
    n_classes=2,
    weights=[0.9, 0.1],  # roughly 90% class 0, 10% class 1 (imbalanced)
    random_state=42,
)

df = pd.DataFrame(X, columns=[f"feature_{i+1}" for i in range(X.shape[1])])
df["target"] = y

print("Shape:", df.shape)
print("Class balance:")
print(df["target"].value_counts(normalize=True))
df.head()

## 2. Train / Validation / Test Split (Stratified)

In [None]:
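# Separate features from the label
X = df.drop(columns="target")
y = df["target"]

# Step 1: hold out 20% of all rows as the test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 2: take 25% of the remaining 80% as validation,
# giving a 60% / 20% / 20% train / validation / test split
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=42
)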

print("Train / Val / Test sizes:")
print(len(X_train), len(X_val), len(X_test))
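
One way to confirm that stratification did its job is to compare the positive-class rate across the three splits. The sketch below is self-contained: it regenerates the same synthetic data with the same split parameters, and the `_demo` names and the `split_summary` table are illustrative additions, not part of the notebook's pipeline.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same synthetic data settings as in the setup cell.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=2,
    n_classes=2, weights=[0.9, 0.1], random_state=42,
)
y_demo = pd.Series(y_demo)

# Same two-step stratified split: 20% test, then 25% of the rest as validation.
X_tmp, X_te, y_tmp, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42
)

# Row count and share of positive labels in each split.
splits = {"train": y_tr, "val": y_va, "test": y_te}
split_summary = pd.DataFrame({
    "n_rows": {name: len(labels) for name, labels in splits.items()},
    "positive_rate": {name: labels.mean() for name, labels in splits.items()},
})
print(split_summary)
```

With 2,000 rows this gives 1,200 / 400 / 400 rows, and the positive rate should be nearly identical in every split. Dropping `stratify` lets that rate drift between splits, which makes validation and test metrics harder to compare.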