# **Threshold Analysis**

Logistic regression outputs a probability for each sample, not a class label.

The predicted class depends on the threshold applied to that probability.
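
As a minimal sketch of that idea, the snippet below turns probabilities into labels at the three thresholds used later in this notebook. The helper `apply_threshold` and the probability values are illustrative assumptions. Whether `src.model.LogisticRegression.predict` compares with `>=` or `>` is not shown in this notebook.

```python
import numpy as np

def apply_threshold(probs, threshold=0.5):
    """Return 1 where the probability is at or above the threshold, else 0.

    Hypothetical helper: the `>=` comparison is an assumption, not taken
    from src.model.
    """
    return (np.asarray(probs) >= threshold).astype(int)

# Illustrative probabilities, not output from the trained model.
probs = np.array([0.10, 0.35, 0.55, 0.80])

# The same probabilities yield different labels as the threshold moves.
labels_by_threshold = {
    t: apply_threshold(probs, t).tolist() for t in (0.3, 0.5, 0.7)
}
```

Only the samples whose probability lies between two thresholds change label when moving from one threshold to the other.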

In [1]:
import sys
import os

current_dir = os.getcwd()

project_root = os.path.abspath(os.path.join(current_dir, "..")) 

if project_root not in sys.path:
    sys.path.append(project_root)

In [2]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from src.model import LogisticRegression
from src.preprocessing import StandardScaler
from src.metrics import f1_score
from src.plotting import handle_plot

In [3]:
data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

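# Fit the scaler on the training split only, then reuse its statistics
# on the test split so no test information leaks into preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)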

model = LogisticRegression(learning_rate=0.01, n_iters=1000)
model.fit(X_train_scaled, y_train)


Epoch 0 | Loss: 0.693147
Epoch 100 | Loss: 0.244663
Epoch 200 | Loss: 0.182222
Epoch 300 | Loss: 0.153932
Epoch 400 | Loss: 0.137189
Epoch 500 | Loss: 0.125909
Epoch 600 | Loss: 0.117686
Epoch 700 | Loss: 0.111363
Epoch 800 | Loss: 0.106313
Epoch 900 | Loss: 0.102163


**F1 is evaluated at several thresholds to observe how the precision-recall trade-off shifts.**

In [4]:
thresholds = [0.3, 0.5, 0.7]
f1_scores = []

for t in thresholds:
    preds = model.predict(X_test_scaled, threshold=t)
    f1_scores.append(f1_score(y_test, preds))
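
Three thresholds give only a coarse picture. A finer sweep might look like the sketch below, which runs on toy labels and probabilities rather than the notebook's model. The helper `f1_at_threshold`, the grid spacing, and the data are all illustrative assumptions.

```python
import numpy as np

def f1_at_threshold(y_true, probs, threshold):
    """F1 for the labels obtained by thresholding probs (hypothetical helper)."""
    y_pred = (probs >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    return float(2 * tp / denom) if denom else 0.0

# Toy data, not the breast cancer test split.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
probs = np.array([0.04, 0.31, 0.47, 0.62, 0.91, 0.56, 0.76, 0.22])

# 17 evenly spaced thresholds from 0.1 to 0.9 (step 0.05).
grid = np.linspace(0.1, 0.9, 17)
scores = np.array([f1_at_threshold(y_true, probs, t) for t in grid])

# np.argmax returns the first maximum, so ties resolve to the lowest threshold.
best_threshold = float(grid[np.argmax(scores)])
```

If a threshold is chosen this way, it is safer to select it on a validation split and report the test score separately, since tuning on the test set biases the reported metric.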


In [None]:
SAVE_PLOTS = False

In [6]:
plt.figure(figsize=(6, 4))
plt.plot(thresholds, f1_scores, marker="o")
plt.xlabel("Threshold")
plt.ylabel("F1 Score")
plt.title("Threshold vs F1 Score")
plt.grid(True)

handle_plot(filename="threshold_vs_f1.png",
            save_plots=SAVE_PLOTS)

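**Lowering the threshold labels more samples as positive, which tends to raise recall at the cost of precision.**

**Raising the threshold labels fewer samples as positive, which tends to raise precision at the cost of recall.**

**F1, the harmonic mean of precision and recall, summarizes the balance between the two at each threshold.**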
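
To make the trade-off concrete, the following sketch computes precision, recall, and F1 from confusion counts at each threshold. It uses a small hand-made example rather than the notebook's model. The function `precision_recall_f1` is a hypothetical stand-in and is not the implementation in `src.metrics`.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy labels and probabilities, chosen by hand for illustration.
y_true = np.array([0, 0, 1, 1, 1, 0])
probs = np.array([0.2, 0.6, 0.4, 0.8, 0.9, 0.1])

results = {}
for t in (0.3, 0.5, 0.7):
    y_pred = (probs >= t).astype(int)
    results[t] = precision_recall_f1(y_true, y_pred)
```

On this toy data, recall is highest at 0.3 and precision is highest at 0.7. F1 is not guaranteed to peak at the middle threshold: here it is lowest at 0.5, which is why the curve is worth plotting rather than assuming.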