In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [None]:
# ------------------------------------------------------
# 1. Load dataset
# ------------------------------------------------------
df = pd.read_csv("final_engineered.csv")

# Target variable
TARGET = "pretrial_recidivism"

X = df.drop(columns=[TARGET])
y = df[TARGET]

In [None]:
# ------------------------------------------------------
# 2. Train/Validation/Test Split (70/15/15)
# ------------------------------------------------------
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, random_state=42, stratify=y_temp
)
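
A quick sanity check for the two-stage split: holding out 30% and then halving that holdout should give roughly 70/15/15 shares, with a similar positive rate in each part because of `stratify`. The sketch below runs on synthetic stand-in data (an assumption, not `final_engineered.csv`); `split_summary`, `X_demo`, and `y_demo` are hypothetical names used only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 10,000 rows, roughly 23% positives.
rng = np.random.default_rng(42)
y_demo = (rng.random(10_000) < 0.23).astype(int)
X_demo = rng.normal(size=(10_000, 3))

# Same two-stage recipe as the cell above: 30% holdout, then split it in half.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)
X_va, X_te, y_va, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp
)


def split_summary(parts, total):
    """Return {name: (share of all rows, positive rate)} for each split."""
    return {
        name: (len(y_part) / total, float(np.mean(y_part)))
        for name, y_part in parts.items()
    }


summary = split_summary({"train": y_tr, "val": y_va, "test": y_te}, len(y_demo))
for name, (share, pos_rate) in summary.items():
    print(f"{name}: share={share:.3f}, positive rate={pos_rate:.3f}")
```

Running the same helper on `y_train`, `y_val`, and `y_test` would confirm that the real split behaves the same way.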

In [None]:
# ------------------------------------------------------
# 3. Baseline Model (Majority Class Predictor)
# ------------------------------------------------------
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)

In [None]:
# ------------------------------------------------------
# 4. Evaluate Baseline
# ------------------------------------------------------
val_pred = baseline.predict(X_val)
test_pred = baseline.predict(X_test)

In [None]:
print("\nBaseline Model (Most Frequent Class) Results")
print("--------------------------------------------")
print("Validation Accuracy:", accuracy_score(y_val, val_pred))
print("Test Accuracy:", accuracy_score(y_test, test_pred))

print("\nClassification Report (Test):")
# zero_division=0 reports undefined precision as 0 without raising a warning
print(classification_report(y_test, test_pred, zero_division=0))

print("\nConfusion Matrix (Test):")
print(confusion_matrix(y_test, test_pred))


Baseline Model (Most Frequent Class) Results
--------------------------------------------
Validation Accuracy: 0.7672065709679896
Test Accuracy: 0.7671947924342913

Classification Report (Test):
              precision    recall  f1-score   support

         0.0       0.77      1.00      0.87     49972
         1.0       0.00      0.00      0.00     15164

    accuracy                           0.77     65136
   macro avg       0.38      0.50      0.43     65136
weighted avg       0.59      0.77      0.67     65136


Confusion Matrix (Test):
[[49972     0]
 [15164     0]]




This baseline model predicts only the majority class (0 = non-recidivism). It achieves ~77% accuracy because the dataset is imbalanced: 49,972 of the 65,136 test samples belong to class 0. However, it fails to identify any recidivism cases (class 1), so precision, recall, and F1-score for that class are all 0. This baseline sets the minimum performance that real machine learning models must surpass, particularly in detecting class 1.

I chose DummyClassifier with strategy="most_frequent" as my baseline because it is the simplest benchmark for the recidivism prediction task: it ignores the input features and learns nothing beyond the most common label in the training set.

Its ~77% accuracy therefore reflects only the class distribution, not predictive power, which is why accuracy alone is a misleading metric for this task.
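
The link between the accuracy and the class distribution can be checked by hand from the test-set confusion matrix printed above. The snippet below is a small worked example using only those four counts; the variable names are illustrative.

```python
# Counts copied from the test-set confusion matrix in the output above:
# rows are true classes (0, 1), columns are predicted classes (0, 1).
tn, fp = 49972, 0
fn, tp = 15164, 0

total = tn + fp + fn + tp

# Accuracy is the fraction of correct predictions.
accuracy = (tp + tn) / total

# Share of the test set that belongs to the majority class (class 0).
majority_share = (tn + fp) / total

# Class 1 metrics; precision is undefined (0/0) when nothing is predicted
# positive, so it is reported as 0.0 here, matching the report above.
recall_pos = tp / (tp + fn)
precision_pos = tp / (tp + fp) if (tp + fp) > 0 else 0.0

# Macro-averaged recall: mean of the per-class recalls.
recall_neg = tn / (tn + fp)
macro_recall = (recall_neg + recall_pos) / 2

print(f"accuracy        = {accuracy:.4f}")
print(f"majority share  = {majority_share:.4f}")
print(f"class 1 recall  = {recall_pos:.2f}")
print(f"macro recall    = {macro_recall:.2f}")
```

Accuracy and majority share are the same number because every prediction is class 0, while macro recall sits at 0.50, the same value shown in the `macro avg` row of the report.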

Goals of the Predictive Model (1-liners)

- Exceed the baseline accuracy (~0.77).

- Achieve non-zero recall and precision for class 1 (baseline is 0.00).

- Improve F1-score for class 1 beyond the baseline value of 0.00.

- Increase class 1 (recidivism) recall to reduce false negatives; the baseline misses all 15,164 positive test cases.

- Handle class imbalance so the model does not default to predicting only class 0.

- Demonstrate better validation and test performance than the majority-class classifier.

- Produce interpretable predictions that show meaningful patterns in the data.

- Maintain good generalization without large gaps between train/val/test scores.
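
One way to track these goals is to score every candidate with the same helper that reports accuracy alongside the class 1 metrics. The sketch below is illustrative only: it uses synthetic data from `make_classification` rather than `final_engineered.csv`, and a class-weighted `LogisticRegression` is an assumed stand-in for whichever model is eventually chosen. `class1_metrics` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption): about 77% class 0.
X_syn, y_syn = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.77, 0.23], random_state=42,
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=42, stratify=y_syn
)


def class1_metrics(model, X, y):
    """Accuracy plus precision, recall, and F1 for class 1 only."""
    pred = model.predict(X)
    p, r, f1, _ = precision_recall_fscore_support(
        y, pred, labels=[1], zero_division=0
    )
    return {
        "accuracy": accuracy_score(y, pred),
        "precision_1": float(p[0]),
        "recall_1": float(r[0]),
        "f1_1": float(f1[0]),
    }


dummy = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
# class_weight="balanced" reweights samples inversely to class frequency.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

results = {
    "baseline": class1_metrics(dummy, X_va, y_va),
    "weighted_logreg": class1_metrics(weighted, X_va, y_va),
}
# Generalization check: difference between train and validation F1 for class 1.
train_val_gap = class1_metrics(weighted, X_tr, y_tr)["f1_1"] - results["weighted_logreg"]["f1_1"]

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
print("train/val F1 gap (class 1):", round(train_val_gap, 3))
```

Class weighting typically raises class 1 recall at some cost in overall accuracy, so the accuracy goal and the recall goal may pull in different directions and are worth reporting side by side.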