# Logistic Regression with GridSearchCV
## Credit Card Fraud Detection

**Objective:**
To optimize a Logistic Regression model using GridSearchCV by tuning
class weights and regularization parameters for an imbalanced dataset.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    precision_score,
    recall_score,
    make_scorer,
    PrecisionRecallDisplay
)

In [2]:
df = pd.read_csv("data/creditcard.csv")

X = df.drop(columns=["Class"])
y = df["Class"]

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    stratify=y,
    random_state=42
)

The dataset is split 80/20 with stratification on `Class`, so the training and test sets keep the same fraud rate as the full data.
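
The effect of `stratify=y` can be checked directly by comparing positive rates before and after the split. The sketch below uses synthetic labels with a 1% positive rate as a stand-in for `creditcard.csv`; the names `X_demo` and `y_demo` and the 1% rate are assumptions for illustration, not values taken from the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud labels (assumption): 1% positives.
rng = np.random.default_rng(42)
y_demo = np.zeros(10_000, dtype=int)
y_demo[:100] = 1
X_demo = rng.normal(size=(10_000, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# With stratify set, both splits keep the original positive rate
# (up to rounding of the per-class counts).
print(f"overall: {y_demo.mean():.4f}")
print(f"train:   {y_tr.mean():.4f}")
print(f"test:    {y_te.mean():.4f}")
```

Without `stratify`, a rare class can end up noticeably over- or under-represented in the test set purely by chance.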

In [3]:
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(
        max_iter=1000,
        solver="liblinear",
        random_state=42
    ))
])

In [4]:
# Alternative scorer: recall on the fraud class (label 1) only
recall_scorer = make_scorer(recall_score)

In [5]:
def min_precision_recall(y_true, y_pred):
    # Score by the weaker of precision and recall, so neither can be neglected
    return min(
        precision_score(y_true, y_pred, zero_division=0),
        recall_score(y_true, y_pred, zero_division=0)
    )

custom_scorer = make_scorer(min_precision_recall)
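
To see why the minimum of precision and recall is a useful target, it helps to score a few hand-made prediction vectors. The labels and predictions below are toy values invented for illustration; the function is repeated so the snippet runs on its own.

```python
from sklearn.metrics import precision_score, recall_score

def min_precision_recall(y_true, y_pred):
    return min(
        precision_score(y_true, y_pred, zero_division=0),
        recall_score(y_true, y_pred, zero_division=0),
    )

# Toy labels (assumption): 4 fraud cases followed by 6 legitimate ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Flags everything as fraud: recall 4/4, precision 4/10.
flag_all = [1] * 10
# Flags a single true fraud: precision 1/1, recall 1/4.
flag_one = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Catches 3 of 4 frauds with one false alarm: precision 3/4, recall 3/4.
balanced = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

score_all = min_precision_recall(y_true, flag_all)
score_one = min_precision_recall(y_true, flag_one)
score_balanced = min_precision_recall(y_true, balanced)
print(score_all, score_one, score_balanced)
```

Both extreme strategies score poorly because one of the two metrics collapses, while the balanced predictions score highest. A plain recall scorer would instead rank "flag everything" first.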

In [6]:
param_grid = {
    "model__class_weight": [
        {0: 1, 1: w} for w in np.linspace(1, 20, 10)
    ],
    "model__C": np.logspace(-3, 1, 5)
}

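The grid covers 10 fraud-class weights evenly spaced from 1 to 20 and 5 values of `C` from 0.001 to 10.
Class weights are tuned to control the trade-off between fraud recall
and false positives; `C` sets the inverse regularization strength.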

In [None]:
grid = GridSearchCV(
    estimator=pipe,
    param_grid=param_grid,
    scoring=custom_scorer,  # or recall_scorer to optimize recall alone
    cv=5,
    n_jobs=-1,
    verbose=1
)

grid.fit(X_train, y_train)

Fitting 5 folds for each of 50 candidates, totalling 250 fits
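
The candidate count in this log line follows from the grid: 10 class weights times 5 values of `C`. A quick way to check a grid's size before launching a search is scikit-learn's `ParameterGrid`; the snippet rebuilds the same `param_grid` so it runs on its own, and `n_folds` simply mirrors `cv=5`.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

param_grid = {
    "model__class_weight": [{0: 1, 1: w} for w in np.linspace(1, 20, 10)],
    "model__C": np.logspace(-3, 1, 5),
}

n_candidates = len(ParameterGrid(param_grid))  # every weight/C combination
n_folds = 5                                    # matches cv=5 in GridSearchCV
print(n_candidates, n_candidates * n_folds)
```

With the default `refit=True`, GridSearchCV also fits the best candidate once more on the full training set after the cross-validated fits finish.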


In [None]:
best_model = grid.best_estimator_
print(grid.best_params_)

In [None]:
y_pred = best_model.predict(X_test)
y_prob = best_model.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred))
print(f"ROC AUC: {roc_auc_score(y_test, y_prob):.4f}")

In [None]:
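cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.title("Confusion Matrix (GridSearch)")
plt.show()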

In [None]:
PrecisionRecallDisplay.from_predictions(y_test, y_prob)
plt.show()

## Observations

- GridSearchCV improved fraud recall relative to the untuned baseline
- Increasing the fraud class weight improves recall at the cost of precision (more false positives)
- Logistic Regression remains interpretable and efficient to train
- The tuned model outperforms the baseline in detecting fraud

## Conclusion

Using GridSearchCV to tune class weights and regularization strength
improves the performance of Logistic Regression on highly imbalanced
fraud detection data, here by selecting the combination that maximizes
the smaller of precision and recall.