
# Logistic Regression — Super Comprehensive Masterclass (English Only)

*Built on September 27, 2025 (UTC).*  
This notebook places a **full explanation before each code cell**. For plotting, all commands live in a **single cell per plot** so figures render correctly in Jupyter.

### Explanation
**Learning Goals**
- Build intuition for probabilities, odds, log-odds, and the sigmoid function.
- Define the logistic regression model and the decision rule.
- Train from scratch with NumPy (log-loss, gradients, gradient descent).
- Use scikit-learn for regularized logistic regression and pipelines.
- Evaluate with accuracy, precision, recall, F1, ROC/AUC, PR curves.
- Tune thresholds, handle class imbalance, and interpret coefficients (odds ratios).
- Understand multiclass extensions (OvR vs. multinomial/softmax).
- Avoid common pitfalls (data leakage, scaling, convergence, interpretation).

### Explanation
Import core libraries and set a seed for reproducibility.

In [None]:
import numpy as np

In [None]:
import pandas as pd

In [None]:
import matplotlib.pyplot as plt

In [None]:
np.random.seed(42)

### Explanation
Define the sigmoid and sample it for visualization.

In [None]:
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

In [None]:
z = np.linspace(-8, 8, 400)

In [None]:
s = sigmoid(z)
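
### Explanation
For very negative `z`, `np.exp(-z)` overflows and NumPy emits a RuntimeWarning (the result still rounds to 0). One way around this is to evaluate the two sign branches separately. The sketch below assumes a hypothetical helper name, `stable_sigmoid`, that is not part of the notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid evaluated branch-wise so np.exp never receives a large positive argument."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so this form is safe.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, rewrite as exp(z) / (1 + exp(z)); exp(z) is below 1 here.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

extreme = stable_sigmoid([-1000.0, 0.0, 1000.0])
```

On the plotting range used here (-8 to 8) both versions agree, so the simpler `sigmoid` is fine for this notebook.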

### Explanation
Plot the sigmoid curve (single cell for a stable figure).

In [None]:

plt.figure()
plt.plot(z, s)
plt.xlabel('z')
plt.ylabel('sigmoid(z)')
plt.title('Sigmoid Function')
plt.grid(True)
plt.show()


### Explanation
Define odds and logit; logistic regression models log-odds as linear in features.

In [None]:
def odds(p):
    return p / (1 - p)

In [None]:
def logit(p):
    return np.log(odds(p))

In [None]:
probs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])

In [None]:
probs_odds = odds(probs)

In [None]:
probs_logit = logit(probs)

### Explanation
Show a comparison table.

In [None]:
pd.DataFrame({'p': probs, 'odds': probs_odds, 'logit(p)': probs_logit})
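
### Explanation
The logit is the inverse of the sigmoid, which is why a model that is linear in log-odds yields probabilities through `sigmoid`. A small self-contained check of that round trip (the functions are redefined here so the snippet runs on its own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1 - p))

probs_demo = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
# Mapping probability -> log-odds -> probability should return the input.
roundtrip = sigmoid(logit(probs_demo))
max_roundtrip_error = float(np.max(np.abs(roundtrip - probs_demo)))
```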

### Explanation
Model: compute the linear score z = w^T x + b and map it to a probability p = σ(z) = P(y = 1 | x). By default, predict class 1 if p ≥ 0.5, which is equivalent to z ≥ 0.

In [None]:
def linear_score(X, w, b):
    return X @ w + b

In [None]:
def predict_proba(X, w, b):
    return sigmoid(linear_score(X, w, b))

In [None]:
def predict_class(X, w, b, threshold=0.5):
    return (predict_proba(X, w, b) >= threshold).astype(int)
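
### Explanation
Because the sigmoid is monotonic and σ(0) = 0.5, thresholding the probability at 0.5 gives the same labels as checking the sign of the linear score. The weights and points below (`w_demo`, `b_demo`, `X_demo`) are made-up values for illustration only.

```python
import numpy as np

w_demo = np.array([2.0, -1.0])   # hypothetical weights
b_demo = 0.5                     # hypothetical bias
X_demo = np.array([[0.0, 0.0], [1.0, 3.0], [-1.0, -1.0], [0.25, 1.0]])

z_demo = X_demo @ w_demo + b_demo
p_demo = 1.0 / (1.0 + np.exp(-z_demo))

by_prob = (p_demo >= 0.5).astype(int)   # rule stated on probabilities
by_score = (z_demo >= 0).astype(int)    # same rule stated on the linear score
```

The last point lies exactly on the boundary (z = 0), and the `>=` convention assigns it to class 1.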

### Explanation
Log-loss with probability clipping for numerical stability.

In [None]:
def log_loss(y_true, p):
    eps = 1e-15
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
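
### Explanation
To see how log-loss behaves, it helps to compare confident-correct predictions with a single confident mistake, and to confirm that clipping keeps the loss finite when a probability is exactly 0. The arrays below are illustrative values, not notebook data.

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_demo = np.array([1, 1, 0])
loss_good = log_loss(y_demo, np.array([0.9, 0.9, 0.1]))  # every prediction confident and correct
loss_bad = log_loss(y_demo, np.array([0.1, 0.9, 0.1]))   # first prediction confidently wrong
loss_clipped = log_loss(np.array([1]), np.array([0.0]))  # would be infinite without clipping
```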

### Explanation
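The gradients of the mean log-loss have a simple analytic form: grad_w = X^T (p - y) / n and grad_b = mean(p - y). No numerical differentiation is needed, so each gradient-descent step costs about one matrix-vector product.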

In [None]:
def gradients(X, y, w, b):
    p = predict_proba(X, w, b)
    error = p - y
    grad_w = X.T @ error / X.shape[0]
    grad_b = np.mean(error)
    return grad_w, grad_b
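
### Explanation
A common sanity check for hand-derived gradients is to compare them with central finite differences of the loss. The sketch below does this on random data; all names ending in `_chk` and the helper `mean_log_loss` are hypothetical and exist only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_log_loss(X, y, w, b):
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradients(X, y, w, b):
    error = sigmoid(X @ w + b) - y
    return X.T @ error / X.shape[0], np.mean(error)

rng = np.random.default_rng(0)
X_chk = rng.normal(size=(50, 3))
y_chk = rng.integers(0, 2, size=50)
w_chk = rng.normal(size=3)
b_chk = 0.3

gw, gb = gradients(X_chk, y_chk, w_chk, b_chk)

h = 1e-6  # finite-difference step
num_gw = np.array([
    (mean_log_loss(X_chk, y_chk, w_chk + h * e, b_chk)
     - mean_log_loss(X_chk, y_chk, w_chk - h * e, b_chk)) / (2 * h)
    for e in np.eye(3)
])
num_gb = (mean_log_loss(X_chk, y_chk, w_chk, b_chk + h)
          - mean_log_loss(X_chk, y_chk, w_chk, b_chk - h)) / (2 * h)

max_err = max(float(np.max(np.abs(gw - num_gw))), float(abs(gb - num_gb)))
```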

### Explanation
Synthesize a clean 2D dataset (two Gaussian blobs).

In [None]:
n = 600

In [None]:
X0 = np.random.multivariate_normal(mean=[-1.5, -1.0], cov=[[1.0, 0.2],[0.2, 1.0]], size=n//2)

In [None]:
X1 = np.random.multivariate_normal(mean=[1.3, 1.0], cov=[[1.0, -0.1],[-0.1, 1.0]], size=n//2)

In [None]:
X = np.vstack([X0, X1])

In [None]:
y = np.hstack([np.zeros(n//2, dtype=int), np.ones(n//2, dtype=int)])

### Explanation
Shuffle rows so classes are mixed.

In [None]:
perm = np.random.permutation(n)

In [None]:
X = X[perm]

In [None]:
y = y[perm]

### Explanation
Scatter plot (single cell).

In [None]:

plt.figure()
plt.scatter(X[y==0,0], X[y==0,1], alpha=0.6, label='class 0')
plt.scatter(X[y==1,0], X[y==1,1], alpha=0.6, label='class 1')
plt.legend()
plt.title('Synthetic Dataset')
plt.xlabel('x1')
plt.ylabel('x2')
plt.grid(True)
plt.show()


### Explanation
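Standardize features to zero mean and unit variance. The statistics here come from the full dataset, which is acceptable for the from-scratch demo because it trains and evaluates on all 600 points.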

In [None]:
X_mean = X.mean(axis=0)

In [None]:
X_std = X.std(axis=0)

In [None]:
Xz = (X - X_mean) / X_std
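
### Explanation
When a held-out set is involved, the usual practice is to compute the mean and standard deviation on the training rows only and reuse them for the test rows. A minimal NumPy sketch on made-up data (`X_raw`, the 150/50 split, and the other names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(loc=[5.0, -2.0], scale=[3.0, 0.5], size=(200, 2))

idx = rng.permutation(200)
train_idx, test_idx = idx[:150], idx[150:]

# Statistics come from the training rows only.
mu = X_raw[train_idx].mean(axis=0)
sd = X_raw[train_idx].std(axis=0)

X_train_z = (X_raw[train_idx] - mu) / sd
X_test_z = (X_raw[test_idx] - mu) / sd   # test rows reuse the training statistics
```

The test columns will not have exactly zero mean or unit variance, and that is expected.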

### Explanation
Initialize parameters and choose training hyperparameters.

In [None]:
w = np.zeros(Xz.shape[1])

In [None]:
b = 0.0

In [None]:
lr = 0.1

In [None]:
epochs = 300

### Explanation
Training loop; record log-loss periodically.

In [None]:

history = []
for epoch in range(epochs):
    gw, gb = gradients(Xz, y, w, b)
    w -= lr * gw
    b -= lr * gb
    if (epoch+1) % 10 == 0:
        p = predict_proba(Xz, w, b)
        loss = log_loss(y, p)
        history.append(loss)


### Explanation
Plot log-loss vs epochs (single cell).

In [None]:

plt.figure()
plt.plot(np.arange(10, epochs+1, 10), history)
plt.xlabel('Epoch')
plt.ylabel('Log-Loss')
plt.title('From-Scratch Training Loss')
plt.grid(True)
plt.show()


### Explanation
Training accuracy (measured on the same data used for fitting, so it is an optimistic estimate).

In [None]:
yhat = predict_class(Xz, w, b)

In [None]:
acc = (yhat == y).mean()

In [None]:
acc

### Explanation
Decision boundary with a robust colorbar (pass the mappable explicitly).

In [None]:

xx, yy = np.meshgrid(
    np.linspace(Xz[:,0].min()-0.5, Xz[:,0].max()+0.5, 200),
    np.linspace(Xz[:,1].min()-0.5, Xz[:,1].max()+0.5, 200),
)
grid = np.c_[xx.ravel(), yy.ravel()]
pp = predict_proba(grid, w, b).reshape(xx.shape)

fig, ax = plt.subplots()
cf = ax.contourf(xx, yy, pp, levels=30, alpha=0.3)
ax.scatter(Xz[y==0,0], Xz[y==0,1], alpha=0.6, label='class 0')
ax.scatter(Xz[y==1,0], Xz[y==1,1], alpha=0.6, label='class 1')
cbar = fig.colorbar(cf, ax=ax)
cbar.set_label('P(y=1)')
ax.set_title('Decision Boundary (From Scratch)')
ax.set_xlabel('x1 (standardized)')
ax.set_ylabel('x2 (standardized)')
ax.legend()
plt.show()
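
### Explanation
In two dimensions the p = 0.5 contour is the straight line w1*x1 + w2*x2 + b = 0, which can be solved for x2 whenever w2 is nonzero. The sketch below uses hypothetical parameter values (`w_demo`, `b_demo`) rather than the fitted ones.

```python
import numpy as np

w_demo = np.array([1.5, 2.0])   # hypothetical weights
b_demo = -0.5                   # hypothetical bias

x1_line = np.linspace(-3, 3, 5)
# Solve w1*x1 + w2*x2 + b = 0 for x2 (requires w2 != 0).
x2_line = -(w_demo[0] * x1_line + b_demo) / w_demo[1]

# Every point on the line should have a linear score of zero, i.e. p = 0.5.
scores_on_line = np.c_[x1_line, x2_line] @ w_demo + b_demo
```

Passing `x1_line` and `x2_line` to `ax.plot` would overlay the boundary on a scatter plot.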


### Explanation
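Switch to scikit-learn with a stratified train/test split. Note that Xz was standardized with statistics from the full dataset, so the test set is not strictly unseen; in real work, fit the scaler on the training split only or use a Pipeline.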

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, roc_curve, auc, confusion_matrix

In [None]:
X_train, X_test, y_train, y_test = train_test_split(Xz, y, test_size=0.25, random_state=42, stratify=y)

### Explanation
Instantiate and fit regularized LR (scikit-learn applies an L2 penalty with C=1.0 by default); then get predictions and probabilities.

In [None]:
clf = LogisticRegression(max_iter=1000)

In [None]:
clf.fit(X_train, y_train)

In [None]:
y_pred = clf.predict(X_test)

In [None]:
test_acc = accuracy_score(y_test, y_pred)

In [None]:
test_acc

In [None]:
y_proba = clf.predict_proba(X_test)[:, 1]

### Explanation
Precision/Recall/F1 and the confusion matrix.

In [None]:
prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average='binary', zero_division=0)

In [None]:
(prec, rec, f1)

In [None]:
cm = confusion_matrix(y_test, y_pred)

In [None]:
cm
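
### Explanation
Precision, recall, and F1 can be read directly off the confusion matrix, which is a useful cross-check on the library output. The labels below are a small made-up example, not the notebook's test set.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

# For binary labels, scikit-learn orders the flattened matrix as tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision_manual = tp / (tp + fp)
recall_manual = tp / (tp + fn)
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

prec_sk, rec_sk, f1_sk, _ = precision_recall_fscore_support(
    y_true_demo, y_pred_demo, average='binary', zero_division=0
)
```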

### Explanation
ROC curve & AUC (single cell).

In [None]:

fpr, tpr, _ = roc_curve(y_test, y_proba)
roc_auc = auc(fpr, tpr)
fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f'ROC (AUC = {roc_auc:.3f})')
ax.plot([0,1], [0,1], linestyle='--')
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('ROC Curve')
ax.legend(loc='lower right')
ax.grid(True)
plt.show()
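
### Explanation
AUC has a ranking interpretation: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties count one half). The sketch below checks this against `roc_auc_score` on synthetic scores; the data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
scores_demo = 0.8 * y_demo + rng.normal(size=200)   # positives score higher on average

pos = scores_demo[y_demo == 1]
neg = scores_demo[y_demo == 0]

# Compare every positive with every negative.
diff = pos[:, None] - neg[None, :]
auc_pairs = float((diff > 0).mean() + 0.5 * (diff == 0).mean())
auc_sklearn = float(roc_auc_score(y_demo, scores_demo))
```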


### Explanation
Precision–Recall curve & Average Precision (single cell).

In [None]:

from sklearn.metrics import precision_recall_curve, average_precision_score
pr_prec, pr_rec, _ = precision_recall_curve(y_test, y_proba)
ap = average_precision_score(y_test, y_proba)
fig, ax = plt.subplots()
ax.plot(pr_rec, pr_prec, label=f'PR (AP = {ap:.3f})')
ax.set_xlabel('Recall')
ax.set_ylabel('Precision')
ax.set_title('Precision-Recall Curve')
ax.legend(loc='lower left')
ax.grid(True)
plt.show()


### Explanation
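Inspect coefficients/intercept and convert to odds ratios. Because the features are standardized, each odds ratio is the multiplicative change in the odds of class 1 per one-standard-deviation increase in that feature, holding the other feature fixed.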

In [None]:
clf.coef_, clf.intercept_

In [None]:
odds_ratios = np.exp(clf.coef_)

In [None]:
odds_ratios
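
### Explanation
To report an odds ratio per original unit instead of per standard deviation, divide the standardized coefficient by the feature's standard deviation before exponentiating. The numbers below (`coef_std`, `feature_sd`) are hypothetical and chosen only to show the arithmetic.

```python
import numpy as np

coef_std = 0.8      # hypothetical coefficient on a standardized feature
feature_sd = 10.0   # hypothetical standard deviation of the raw feature

or_per_sd = np.exp(coef_std)                 # odds multiplier per +1 standard deviation
or_per_unit = np.exp(coef_std / feature_sd)  # odds multiplier per +1 raw unit

# Applying the per-unit ratio feature_sd times recovers the per-SD ratio.
recovered = or_per_unit ** feature_sd
```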

### Explanation
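Show the effect of stronger regularization. C is the inverse of regularization strength, so C=0.1 shrinks the coefficients toward zero relative to the default C=1.0.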

In [None]:
clf_strong = LogisticRegression(C=0.1, max_iter=1000)

In [None]:
clf_strong.fit(X_train, y_train)

In [None]:
clf_strong.coef_
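
### Explanation
The shrinkage is easier to see across several values of C. The sketch below fits the same model on synthetic overlapping blobs (an assumption, not the notebook's data) and records the coefficient norm for each C.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-1, 1, (150, 2)), rng.normal(1, 1, (150, 2))])
y_demo = np.r_[np.zeros(150, dtype=int), np.ones(150, dtype=int)]

Cs = [0.01, 0.1, 1.0, 10.0]
norms = []
for C in Cs:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    # Euclidean norm of the coefficient vector: smaller means more shrinkage.
    norms.append(float(np.linalg.norm(model.coef_)))
```

The norms should grow as C increases, since a larger C means a weaker penalty.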

### Explanation
Pipeline: scaling + one-hot encoding + LR (prevents leakage).

In [None]:
from sklearn.pipeline import Pipeline

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
from sklearn.compose import ColumnTransformer

In [None]:
from sklearn.preprocessing import OneHotEncoder

In [None]:
df = pd.DataFrame({'age':[22,25,47,52,46,56,55,60],'income':[35,40,80,90,83,120,110,130],'city':['A','B','A','B','B','A','B','A'],'y':[0,0,1,1,1,1,1,1]})

In [None]:
X_mixed = df[['age','income','city']]

In [None]:
y_mixed = df['y'].values

In [None]:
num_features = ['age','income']

In [None]:
cat_features = ['city']

In [None]:
preprocess = ColumnTransformer([('num', StandardScaler(), num_features), ('cat', OneHotEncoder(handle_unknown='ignore'), cat_features)])

In [None]:
pipe = Pipeline(steps=[('prep', preprocess), ('clf', LogisticRegression(max_iter=1000))])

In [None]:
pipe.fit(X_mixed, y_mixed)

In [None]:
pipe.predict_proba(pd.DataFrame({'age':[28], 'income':[50], 'city':['A']}))

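### Explanation
Grid search over C with stratified CV (ROC-AUC scoring). The toy dataset has only two negative examples, so use 2 folds: with more folds, some validation folds would contain a single class and ROC-AUC would be undefined.

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
param_grid = {'clf__C':[0.01, 0.1, 1.0, 10.0], 'clf__penalty':['l2'], 'clf__solver':['lbfgs']}

In [None]:
grid = GridSearchCV(pipe, param_grid=param_grid, cv=2, scoring='roc_auc')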

In [None]:
grid.fit(X_mixed, y_mixed)

In [None]:
grid.best_params_, grid.best_score_
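
### Explanation
ROC-AUC can only be computed on a validation fold that contains both classes. The toy labels include just two negatives, so at most two folds can each hold one. A quick way to check fold composition before running a search is sketched below; `y_toy` mirrors the toy labels and `folds_have_both_classes` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y_toy = np.array([0, 0, 1, 1, 1, 1, 1, 1])
X_placeholder = np.zeros((len(y_toy), 1))   # features are irrelevant for inspecting the split

skf = StratifiedKFold(n_splits=2)
folds_have_both_classes = [
    len(np.unique(y_toy[test_idx])) == 2
    for _, test_idx in skf.split(X_placeholder, y_toy)
]
```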

### Explanation
Threshold sweep plot (single cell so it renders reliably).

In [None]:

thresholds = np.linspace(0, 1, 101)
prec_list, rec_list = [], []
for t in thresholds:
    y_pred_t = (y_proba >= t).astype(int)
    pr, rc, _, _ = precision_recall_fscore_support(y_test, y_pred_t, average='binary', zero_division=0)
    prec_list.append(pr)
    rec_list.append(rc)
fig, ax = plt.subplots()
ax.plot(thresholds, prec_list, label='Precision')
ax.plot(thresholds, rec_list, label='Recall')
ax.set_xlabel('Threshold')
ax.set_ylabel('Score')
ax.set_title('Precision/Recall vs. Threshold')
ax.legend()
ax.grid(True)
plt.show()
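
### Explanation
Once precision and recall are available per threshold, one common (but not the only) way to choose an operating point is to maximize F1. The sketch below uses `precision_recall_curve` on synthetic probabilities; the data and names are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=300)
# Synthetic predicted probabilities that are higher, on average, for positives.
proba_demo = 1.0 / (1.0 + np.exp(-(1.5 * (2 * y_demo - 1) + rng.normal(size=300))))

prec_c, rec_c, thr_c = precision_recall_curve(y_demo, proba_demo)

# The returned precision/recall arrays have one more entry than thresholds; drop the last.
p_t, r_t = prec_c[:-1], rec_c[:-1]
f1_t = 2 * p_t * r_t / np.clip(p_t + r_t, 1e-12, None)

best_idx = int(np.argmax(f1_t))
best_threshold = float(thr_c[best_idx])
best_f1 = float(f1_t[best_idx])
```

If false positives and false negatives have different costs, a cost-weighted criterion would be a better fit than F1.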


### Explanation
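Handle imbalance with class weights; use multinomial (softmax) logistic regression for naturally multiclass problems. The data here is balanced and binary, so both fits below only demonstrate the API.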

In [None]:
imb_clf = LogisticRegression(class_weight='balanced', max_iter=1000)

In [None]:
imb_clf.fit(X_train, y_train)
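
### Explanation
With `class_weight='balanced'`, each class is weighted by n_samples / (n_classes * class_count), so rare classes count more in the loss. The sketch below computes those weights for a made-up 90/10 label vector using scikit-learn's `compute_class_weight` and compares them with the formula.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_imb = np.array([0] * 90 + [1] * 10)   # hypothetical imbalanced labels
classes = np.array([0, 1])

weights_sk = compute_class_weight(class_weight='balanced', classes=classes, y=y_imb)

# Same quantity by hand: n_samples / (n_classes * count of each class).
weights_manual = len(y_imb) / (len(classes) * np.bincount(y_imb))
```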

In [None]:
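multi_clf = LogisticRegression(solver='lbfgs', max_iter=1000)  # lbfgs fits a multinomial (softmax) model when y has more than two classes; the multi_class argument is deprecated since scikit-learn 1.5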

In [None]:
multi_clf.fit(X_train, y_train)
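
### Explanation
The multinomial model replaces the sigmoid with a softmax over one score per class. With two classes, the softmax of scores (z, 0) reduces to the sigmoid of z, which the sketch below verifies. The helper `softmax` is written here for illustration and is not a notebook function.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    Z = np.asarray(Z, dtype=float)
    Z = Z - Z.max(axis=1, keepdims=True)
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

z_vals = np.linspace(-5, 5, 11)
two_class_scores = np.c_[z_vals, np.zeros_like(z_vals)]   # class scores (z, 0)

p_softmax = softmax(two_class_scores)[:, 0]
p_sigmoid = 1.0 / (1.0 + np.exp(-z_vals))
```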

### Explanation
Add an L2 penalty, lam * sum(w^2), to the from-scratch objective; its gradient adds 2*lam*w to grad_w. The bias b is left unpenalized.

In [None]:
def log_loss_l2(y_true, p, w, lam=1.0):
    return log_loss(y_true, p) + lam * np.sum(w*w)

In [None]:
def gradients_l2(X, y, w, b, lam=1.0):
    gw, gb = gradients(X, y, w, b)
    gw = gw + 2*lam*w
    return gw, gb

In [None]:

w2 = np.zeros_like(w)
b2 = 0.0
lam = 0.1
for epoch in range(150):
    gw, gb = gradients_l2(Xz, y, w2, b2, lam=lam)
    w2 -= 0.1 * gw
    b2 -= 0.1 * gb
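
### Explanation
The from-scratch `lam` and scikit-learn's `C` are on different scales. Assuming scikit-learn's documented objective, 0.5 * sum(w^2) + C * (sum of per-sample log-losses), dividing by n*C shows that mean log-loss + lam * sum(w^2) corresponds to C = 1 / (2 * n * lam). The sketch below checks this on synthetic blobs; the data, iteration count, and `_demo`/`_gd` names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_demo = 200
X_demo = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y_demo = np.r_[np.zeros(100), np.ones(100)]

# Gradient descent on mean log-loss + lam * sum(w^2); the bias is not penalized.
lam_demo = 0.1
w_gd, b_gd = np.zeros(2), 0.0
for _ in range(5000):
    err = 1.0 / (1.0 + np.exp(-(X_demo @ w_gd + b_gd))) - y_demo
    w_gd -= 0.1 * (X_demo.T @ err / n_demo + 2 * lam_demo * w_gd)
    b_gd -= 0.1 * err.mean()

# Matching inverse-regularization strength under the assumed objective.
C_equiv = 1.0 / (2 * n_demo * lam_demo)
sk_model = LogisticRegression(C=C_equiv, tol=1e-10, max_iter=10000)
sk_model.fit(X_demo, y_demo.astype(int))

params_gd = np.r_[w_gd, b_gd]
params_sk = np.r_[sk_model.coef_.ravel(), sk_model.intercept_]
max_gap = float(np.max(np.abs(params_gd - params_sk)))
```

A small `max_gap` suggests the two parameterizations agree; a large one would usually point to unconverged gradient descent or a mismatch in how the penalty is scaled.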


### Explanation
**Pitfalls & Checklist**
- [ ] Scale features before regularized LR.
- [ ] Avoid data leakage (fit preprocessing on train or use a Pipeline).
- [ ] Inspect PR/ROC on imbalanced data (accuracy can mislead).
- [ ] Watch collinearity; regularize.
- [ ] Ensure convergence (solver, max_iter).
- [ ] Interpret coefficients in standardized units.
- [ ] Linear decision boundary: consider interactions or other models if needed.