# Logistic Regression Demo (Binary Classification)

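This notebook walks through training a logistic regression model on the scikit-learn breast cancer dataset. It covers the sigmoid function, predicted probabilities, and how to evaluate the classifier. In this dataset the target encodes malignant as 0 and benign as 1, so the "positive class" throughout is benign.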

## Sigmoid Function
The sigmoid squashes any real-valued input into (0, 1):

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

In logistic regression, $z = w^\top x + b$ is the linear combination of the features $x$ and the learned weights $w$, plus an intercept $b$. The sigmoid output $\sigma(z)$ is interpreted as the probability of the positive class.
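As a quick numerical check of the formula above, the following sketch implements the sigmoid with NumPy. The helper name `sigmoid` and the sample inputs are illustrative choices, not part of the notebook's pipeline.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real input into the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs: negative, zero, and positive values of z.
z_samples = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
p_samples = sigmoid(z_samples)

print(np.round(p_samples, 3))
# Symmetry check: sigma(-z) = 1 - sigma(z)
print(np.allclose(sigmoid(-z_samples), 1.0 - p_samples))
```

Two properties are worth noticing: $\sigma(0) = 0.5$, which is why 0.5 is the natural default threshold, and $\sigma(-z) = 1 - \sigma(z)$. For inputs of very large magnitude this direct formula can emit overflow warnings; `scipy.special.expit` is a numerically stable alternative.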

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    RocCurveDisplay,
    PrecisionRecallDisplay,
)
import matplotlib.pyplot as plt
import seaborn as sns

## Load and Inspect Data

In [None]:
dataset = load_breast_cancer(as_frame=True)
X = dataset.data
y = dataset.target
print(X.shape)
X.head()

## Train/Test Split and Scaling
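Standardizing features puts them on similar scales, which helps solvers like `lbfgs` converge in fewer iterations. The scaler is fit on the training split only and then applied to the test split, so no test-set statistics leak into training.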
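To make the effect of standardization concrete, here is a minimal sketch on a made-up array. The `toy_*` names and values are hypothetical and chosen only so the two features sit on very different scales. It checks that `StandardScaler` subtracts the training mean and divides by the training standard deviation, and that new data is transformed with those same training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales.
toy_train = np.array([[1.0, 1000.0],
                      [2.0, 2000.0],
                      [3.0, 3000.0],
                      [4.0, 4000.0]])
toy_test = np.array([[2.5, 5000.0]])

toy_scaler = StandardScaler()
toy_train_scaled = toy_scaler.fit_transform(toy_train)  # learns mean/std from train
toy_test_scaled = toy_scaler.transform(toy_test)        # reuses the train mean/std

# Manual equivalent: (x - mean) / std, using the population std (ddof=0).
toy_manual = (toy_train - toy_train.mean(axis=0)) / toy_train.std(axis=0)

print(np.allclose(toy_train_scaled, toy_manual))
print(toy_train_scaled.mean(axis=0), toy_train_scaled.std(axis=0))
print(toy_test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1. The test row is not re-centered on its own values: its second feature lies well above the training range, so it stays large after the transform.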

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## Train Logistic Regression
The model uses the sigmoid to convert linear outputs to probabilities. A probability >= 0.5 becomes class 1; otherwise class 0.
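The relationship between the linear output, the sigmoid, and the predicted class can be verified directly. The sketch below uses a synthetic dataset from `make_classification` rather than the breast cancer data, and the `*_demo` names are illustrative. It recomputes the probabilities from the fitted coefficients and compares them with `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem, used only for illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
clf_demo = LogisticRegression().fit(X_demo, y_demo)

# Linear score z = w . x + b, computed by hand from the fitted parameters.
z_demo = X_demo @ clf_demo.coef_.ravel() + clf_demo.intercept_[0]
proba_manual = 1.0 / (1.0 + np.exp(-z_demo))

# Column 1 of predict_proba holds P(y = 1).
proba_sklearn = clf_demo.predict_proba(X_demo)[:, 1]
labels_from_threshold = (proba_sklearn >= 0.5).astype(int)

print(np.allclose(proba_manual, proba_sklearn))
print(np.mean(labels_from_threshold == clf_demo.predict(X_demo)))
```

Thresholding the probability at 0.5 is equivalent to checking the sign of the linear score, since $\sigma(z) \ge 0.5$ exactly when $z \ge 0$.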

In [None]:
model = LogisticRegression(max_iter=200, solver="lbfgs", random_state=42)
model.fit(X_train_scaled, y_train)
probabilities = model.predict_proba(X_test_scaled)[:, 1]
preds = (probabilities >= 0.5).astype(int)

## Evaluation

In [None]:
acc = accuracy_score(y_test, preds)
prec = precision_score(y_test, preds)
rec = recall_score(y_test, preds)
f1 = f1_score(y_test, preds)
cm = confusion_matrix(y_test, preds)
print({"accuracy": acc, "precision": prec, "recall": rec, "f1": f1})

In [None]:
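# scikit-learn encodes this dataset as 0 = malignant, 1 = benign, so tick labels follow that order.
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["malignant", "benign"], yticklabels=["malignant", "benign"])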
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()

## Probability vs Label
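Working with the raw probabilities instead of hard labels lets you pick a decision threshold other than 0.5. Because class 1 is benign here, a false positive is a malignant tumor predicted as benign, so a stricter (higher) threshold may be preferable. Here is the distribution of predicted probabilities for the positive class.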
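To illustrate how the threshold changes the outcome, the following sketch sweeps three thresholds over a small set of made-up labels and probabilities. The `toy_*` arrays are hypothetical, not model output from this notebook.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels and predicted probabilities, for illustration only.
toy_y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
toy_proba = np.array([0.05, 0.20, 0.45, 0.60, 0.35, 0.55, 0.70, 0.80, 0.90, 0.95])

toy_results = {}
for threshold in (0.3, 0.5, 0.7):
    toy_pred = (toy_proba >= threshold).astype(int)
    toy_prec = precision_score(toy_y_true, toy_pred)
    toy_rec = recall_score(toy_y_true, toy_pred)
    toy_results[threshold] = (toy_prec, toy_rec)
    print(f"threshold={threshold:.1f}  precision={toy_prec:.2f}  recall={toy_rec:.2f}")
```

On this toy data, raising the threshold increases precision and lowers recall. The same loop can be applied to `probabilities` and `y_test` once the model above has been fit.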

In [None]:
plt.hist(probabilities, bins=20, color="teal", edgecolor="black")
plt.axvline(0.5, color="red", linestyle="--", label="Decision threshold = 0.5")
plt.xlabel("Predicted probability (class 1)")
plt.ylabel("Count")
plt.legend()
plt.show()

## ROC and Precision-Recall Curves

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
RocCurveDisplay.from_predictions(y_test, probabilities, ax=axes[0])
axes[0].set_title("ROC Curve")
PrecisionRecallDisplay.from_predictions(y_test, probabilities, ax=axes[1])
axes[1].set_title("Precision-Recall Curve")
plt.show()