
# Logistic Regression – Binary Classification with Evaluation Metrics

This notebook trains a simple **Logistic Regression** model for binary classification and evaluates it. The steps are:
- Train a model on synthetic data
- Predict labels on a test set
- Evaluate performance using:
  - Confusion Matrix
  - Accuracy, Precision, Recall, F1 Score
  - Visual summary of these metrics


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    confusion_matrix,
    classification_report,
    ConfusionMatrixDisplay,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score
)

In [None]:
# Create binary classification dataset
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=0,
    random_state=42
)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [None]:
# Train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

In [None]:
# Confusion matrix: rows are true labels, columns are predicted labels
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(cmap="Blues")
plt.title("Confusion Matrix")
plt.show()

In [None]:
# Print classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, digits=3))

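# Compute summary scores for the bar chart; precision, recall, and F1
# refer to the positive class (label 1, the scikit-learn default)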
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

metrics = pd.DataFrame({
    'Metric': ['Accuracy', 'Precision', 'Recall', 'F1 Score'],
    'Score': [acc, prec, rec, f1]
})

In [None]:
# Plot evaluation metrics
plt.figure(figsize=(8, 5))
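# seaborn >= 0.13 deprecates `palette` without `hue`; assigning hue to the
# x variable and hiding the legend gives the same one-color-per-bar plot
sns.barplot(x='Metric', y='Score', hue='Metric', data=metrics,
            palette='viridis', legend=False)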
plt.ylim(0, 1.05)
plt.title("Evaluation Metrics – Logistic Regression")
plt.ylabel("Score")
plt.tight_layout()
plt.show()


## Interpretation & Insights

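- **Confusion Matrix**: Counts of correct predictions (diagonal) vs. incorrect ones (off-diagonal). With rows as true labels and columns as predicted labels, the top-right cell holds false positives (FP) and the bottom-left cell holds false negatives (FN).
- **Accuracy**: Fraction of all predictions that are correct, (TP + TN) / total. It can look deceptively high when one class dominates.
- **Precision**: Fraction of predicted positives that are actually positive, TP / (TP + FP).
- **Recall**: Fraction of actual positives that the model finds, TP / (TP + FN).
- **F1 Score**: Harmonic mean of precision and recall; more informative than accuracy when classes are imbalanced.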
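
To make these definitions concrete, here is a minimal sketch that derives each metric by hand from the four confusion-matrix counts. The arrays `y_true_demo` and `y_pred_demo` are hypothetical labels invented for the arithmetic, not output from the model above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, chosen only to keep the arithmetic easy to follow
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_demo = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels {0, 1}, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are right
precision = tp / (tp + fp)                   # share of predicted positives that are right
recall = tp / (tp + fn)                      # share of actual positives that were found
f1_manual = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1_manual:.2f}")
```

These hand-computed values should agree with `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` on the same arrays, which is a quick way to check that the counts are being read in the right order.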

> If **precision is high** and **recall is low**, the model is conservative (avoids false positives).  
> If **recall is high** but **precision is low**, the model is aggressive (catches more positives but with false alarms).

Reading precision and recall together helps guide decisions about **threshold tuning**, **model adjustments**, or **business risk tolerance**.
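
As one possible illustration of threshold tuning, the sketch below rebuilds the same synthetic dataset and model so it runs on its own, then compares precision and recall at a few cutoffs applied to `predict_proba`. The threshold values (0.3, 0.5, 0.7) and the names `proba_pos` and `sweep` are assumptions made for this example; 0.5 roughly corresponds to what `model.predict()` does for a binary problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Same synthetic setup as the notebook, rebuilt so this block is self-contained
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=0,
    random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)

# Predicted probability of the positive class (column 1) for each test sample
proba_pos = model.predict_proba(X_test)[:, 1]

# Illustrative thresholds only; a real choice depends on the cost of each error type
sweep = []
for t in [0.3, 0.5, 0.7]:
    y_pred_t = (proba_pos >= t).astype(int)  # label as positive when probability >= t
    sweep.append({
        "threshold": t,
        "precision": precision_score(y_test, y_pred_t, zero_division=0),
        "recall": recall_score(y_test, y_pred_t, zero_division=0),
    })

for row in sweep:
    print(f"threshold={row['threshold']:.1f}  "
          f"precision={row['precision']:.3f}  recall={row['recall']:.3f}")
```

Raising the threshold can only shrink the set of predicted positives, so recall never increases as the cutoff goes up. Precision usually rises but is not guaranteed to, which is why it is worth inspecting both columns before settling on a cutoff.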
