In [1]:
import multiprocessing
print(multiprocessing.cpu_count())

import psutil
print(f"Available memory before training: {psutil.virtual_memory().available / 1e9:.2f} GB")

10
Available memory before training: 6.24 GB


# Diabetes Readmission – Logistic Regression with Regularization 
 
## Introduction 
 
This notebook implements logistic regression models with L1 (Lasso) and L1+L2 (Elastic Net) regularization for predicting hospital readmission within 30 days for diabetic patients. We use the preprocessed dataset created in the previous notebook, which includes: 
 
- **Statistical independence**: First encounter per patient only (71,518 patients) to meet logistic regression assumptions 
- **Diagnostic code consolidation**: ICD-9 codes grouped into high-level categories rather than one-hot encoded 
- **Box-Cox transformations**: Applied to skewed numeric features for better linear model performance 
- **Engineered features**: Service utilization scores, medication changes, and discharge groupings 
 
## Methodology 
 
**Class Imbalance Handling**: We use SMOTENC (Synthetic Minority Over-sampling Technique for Nominal and Continuous features) to address the class imbalance in readmission outcomes, generating synthetic minority class examples while preserving the categorical nature of our features. 
 
**Regularization Approaches**: 
1. **Lasso (L1)**: Performs automatic feature selection by driving coefficients to zero 
2. **Elastic Net**: Combines L1 and L2 penalties, balancing feature selection with coefficient shrinkage 
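
For reference, the two penalized objectives can be written in the parameterization that scikit-learn's `LogisticRegression` uses. The notation is an assumption borrowed from the scikit-learn user guide rather than from this notebook: labels $y_i \in \{-1, +1\}$, weights $w$, intercept $c$, inverse regularization strength $C$, and $\rho$ for `l1_ratio`.

```latex
% Lasso (L1)
\min_{w,\,c} \; \|w\|_1 + C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i \left(x_i^\top w + c\right)\right)\right)

% Elastic Net
\min_{w,\,c} \; \frac{1-\rho}{2}\, w^\top w + \rho\, \|w\|_1 + C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i \left(x_i^\top w + c\right)\right)\right)
```

Because $C$ multiplies the loss rather than the penalty, a smaller $C$ means stronger regularization; setting $\rho = 1$ in the Elastic Net objective recovers the Lasso objective.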
 
**Hyperparameter Optimization**: Optuna's default sampler (the Tree-structured Parzen Estimator, a Bayesian optimization method) searches the regularization parameter space adaptively, concentrating trials in promising regions instead of evaluating a fixed grid. 
 
**Preprocessing Pipeline**: MinMax scaling for numeric features and one-hot encoding for categorical features, resulting in 2,871 features after expansion. 
 
The goal is to build interpretable models that can identify key predictors of readmission while maintaining good predictive performance through proper regularization.

In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pickle
import time

In [3]:
token = 'f11' # version tag; increment it for each new experiment
randy = 42 # random seed for reproducibility
log_reg = pd.read_pickle("../models/logReg.pkl") # See prior notebook, p02.

In [4]:
# Fill all categorical NaNs

categorical_cols_with_nans = [
    "primary_group",
    "primary_subgroup",
    "secondary_group",
    "secondary_subgroup",
    "secondary2_group",
    "secondary2_subgroup",
]

for col in categorical_cols_with_nans:
    log_reg[col] = log_reg[col].fillna("Missing")

## Memory Optimization

The `optimize_dtypes()` function reduces memory usage by downcasting numeric types to their smallest sufficient representation:
- `int64` → `int8/int16/int32` based on value ranges
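- `float64` → `float32` when the column's values fit within the `float32` range (only the range is checked, so precision drops to about 7 significant digits)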

This optimization is particularly valuable for large datasets and memory-intensive operations like SMOTE resampling.

In [5]:
def optimize_dtypes(df):
    """
    Downcast int64/float64 columns to the smallest dtype whose range holds the
    column's values, saving memory and time. Modifies `df` in place and returns it.
    """
    for col in df.columns:
        col_type = df[col].dtype

        if col_type == 'int64':
            c_min = df[col].min()
            c_max = df[col].max()

            # Try the narrowest integer type first; bounds are inclusive
            for int_type in (np.int8, np.int16, np.int32):
                info = np.iinfo(int_type)
                if c_min >= info.min and c_max <= info.max:
                    df[col] = df[col].astype(int_type)
                    break

        elif col_type == 'float64':
            c_min = df[col].min()
            c_max = df[col].max()

            # Range check only: float32 keeps about 7 significant digits
            if c_min >= np.finfo(np.float32).min and c_max <= np.finfo(np.float32).max:
                df[col] = df[col].astype(np.float32)

    return df

In [6]:

log_reg = optimize_dtypes(log_reg)
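
As a rough cross-check of the savings that downcasting can give, the sketch below measures memory before and after on a small, hypothetical DataFrame. It uses pandas' built-in `pd.to_numeric(..., downcast=...)` rather than `optimize_dtypes()`, and the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({
    "small_int": np.arange(100, dtype=np.int64),         # values fit in int8
    "medium_int": np.arange(100, dtype=np.int64) * 300,  # values need int16
    "value": np.linspace(0.0, 1.0, 100),                 # float64
})

before = toy.memory_usage(deep=True).sum()

# pandas' own downcasting, used here in place of optimize_dtypes()
toy["small_int"] = pd.to_numeric(toy["small_int"], downcast="integer")
toy["medium_int"] = pd.to_numeric(toy["medium_int"], downcast="integer")
toy["value"] = pd.to_numeric(toy["value"], downcast="float")

after = toy.memory_usage(deep=True).sum()
print(toy.dtypes.to_dict())
print(f"{before} bytes -> {after} bytes")
```

The same before/after comparison can be run on `log_reg` itself to see how much the real dataset shrinks.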

In [7]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder 
from sklearn.model_selection import train_test_split, cross_val_score

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

from imblearn.over_sampling import SMOTENC

import optuna

from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix, roc_curve, auc
from sklearn.metrics import precision_score, recall_score, f1_score



## Model Evaluation and Persistence Function

The `evaluate_and_save_pipeline()` function provides standardized evaluation across all modeling approaches in this project:

**Comprehensive Metrics Calculation:**
- **Classification performance**: Accuracy, precision, recall, F1-score, specificity
- **Probability-based metrics**: ROC curve data and AUC score for threshold optimization
- **Confusion matrix**: True/false positive/negative counts for detailed performance analysis
- **Prediction arrays**: Both binary predictions and probability scores for ensemble building

**Standardized Output Format:**
All metrics are saved in identical pickle format enabling:
- Direct performance comparison across different model types
- Consistent evaluation methodology regardless of underlying algorithm
- Easy integration into ensemble methods and model selection workflows
- Reproducible results with preserved prediction arrays

**Model Persistence:**
Trained pipelines are saved with preprocessing steps intact, ensuring deployment-ready models that can handle new data with the same transformations applied during training.

This standardization is critical for fair model comparison and supports the ensemble modeling approach in later notebooks.

In [8]:
def evaluate_and_save_pipeline(pipeline, namestring, token, 
                                X_train, X_test, 
                                y_train, y_test, 
                                console_out = False):
    """
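    Evaluate a fitted pipeline on the test set and pickle the results.

    The pipeline must contain a step named 'preprocessor'. Metrics, prediction
    arrays, ROC data, and SHAP inputs are written to
    ../models/fits_pickle_{token}_{namestring}.pkl and returned as a dict.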
    """

    # Input validation
    if any(v is None for v in [X_train, X_test, y_train, y_test]):
        raise ValueError("X_train, X_test, y_train, or y_test must not be None.")

    # Convert to numpy if needed
    y_train = y_train.values if hasattr(y_train, "values") else y_train
    y_test = y_test.values if hasattr(y_test, "values") else y_test

    # Make predictions once
    y_train_pred = pipeline.predict(X_train)
    y_test_pred = pipeline.predict(X_test)

    # Get probability predictions
    if hasattr(pipeline, "predict_proba"):
        y_test_pred_pct = pipeline.predict_proba(X_test)[:, 1]
    elif hasattr(pipeline, "decision_function"):
        y_test_pred_pct = pipeline.decision_function(X_test)
    else:
        raise AttributeError("Pipeline needs predict_proba() or decision_function() for ROC/AUC.")

    # Classification metrics (not regression metrics)
    accuracy = pipeline.score(X_test, y_test)
    precision = precision_score(y_test, y_test_pred)
    recall = recall_score(y_test, y_test_pred)  # Same as sensitivity
    f1 = f1_score(y_test, y_test_pred)

    # Confusion matrix metrics
    tn, fp, fn, tp = confusion_matrix(y_test, y_test_pred).ravel()
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0

    # ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_test_pred_pct)
    roc_auc = auc(fpr, tpr)

    # Safe access to classes
    classes_ = getattr(pipeline, 'classes_', np.unique(y_train))

    # Save metrics
    pickle_metrics = {
        'model_version': f"{token}_{namestring}",
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'specificity': specificity,
        'roc_auc': roc_auc,
        'y_test': y_test,
        'y_train_pred': y_train_pred,
        'y_test_pred': y_test_pred,
        'y_test_pred_proba': y_test_pred_pct,
        'display_labels': classes_,
        'confusion_matrix': {'tn': tn, 'fp': fp, 'fn': fn, 'tp': tp},
        'roc_curve': {'fpr': fpr, 'tpr': tpr, 'thresholds': thresholds},

        # SHAP-specific additions
        'shap_data': {
            'model': pipeline,
            'X_train_processed': pipeline.named_steps['preprocessor'].transform(X_train),
            'X_test_processed': pipeline.named_steps['preprocessor'].transform(X_test),
            'feature_names': pipeline.named_steps['preprocessor'].get_feature_names_out(),
            'original_feature_names': list(X_train.columns)
        }
    }
    
    # Save to file
    filename = f"../models/fits_pickle_{token}_{namestring}.pkl"
    with open(filename, "wb") as file:
        pickle.dump(pickle_metrics, file)

    if console_out:
        # Print summary
        print(f"Metrics saved to {filename}")
        print(f'Accuracy:    {accuracy:.4f}')
        print(f'Precision:   {precision:.4f}')
        print(f'Recall:      {recall:.4f}')
        print(f'F1-Score:    {f1:.4f}')
        print(f'Specificity: {specificity:.4f}')
        print(f'ROC AUC:     {roc_auc:.4f}')

        # Plot confusion matrix
        cm = confusion_matrix(y_test, y_test_pred)
        disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classes_)
        disp.plot(cmap=plt.cm.Blues)
        plt.title(f"Confusion Matrix - {namestring}")
        plt.show()
    
    return pickle_metrics
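
The ROC arrays stored under `'roc_curve'` can later be used to pick an operating threshold. The sketch below shows one common choice, Youden's J (the threshold maximizing TPR minus FPR). The helper name `youden_threshold` and the toy labels and scores are hypothetical; the notebook itself does not prescribe a threshold rule.

```python
import numpy as np
from sklearn.metrics import roc_curve


def youden_threshold(fpr, tpr, thresholds):
    """Return the threshold that maximises TPR - FPR (Youden's J) and that J value."""
    j = np.asarray(tpr) - np.asarray(fpr)
    best = int(np.argmax(j))
    return thresholds[best], j[best]


# Hypothetical labels and scores, standing in for y_test and y_test_pred_proba
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.85, 0.4, 0.65, 0.8, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
thr, j = youden_threshold(fpr, tpr, thresholds)
print(f"threshold={thr:.2f}, J={j:.2f}")
```

With a saved metrics dictionary, the same helper could be called on `metrics['roc_curve']['fpr']`, `['tpr']`, and `['thresholds']`.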

In [9]:
X = log_reg.drop(["readmitted"], axis=1)
y = log_reg["readmitted"]

## Feature Type Usage

`exclude_features`: Used as a filter when defining the other feature types - ensures ID columns and target variable don't get included in modeling features.

`numeric_features`:
- Fed into the `MinMaxScaler` in the `ColumnTransformer` preprocessor
- Scales values to `[0,1]` range for logistic regression
- These remain as continuous variables (15 features)

`boolean_features`:
- Combined with `object_features` and passed to `OneHotEncoder`
- Gets one-hot encoded despite being boolean (creates dummy variables)
- Used in SMOTENC `categorical_features` index calculation

`object_features`:
- Combined with `boolean_features` and passed to `OneHotEncoder`
- Creates dummy variables for each category (`drop="first"` removes one level per feature to avoid perfect multicollinearity)
- Used in SMOTENC `categorical_features` index calculation

Combined usage:
- `categorical_features = [X.columns.get_loc(col) for col in object_features + boolean_features]` creates column indices for SMOTENC to know which features are categorical
- `ColumnTransformer` applies different preprocessing: `MinMaxScaler` to numeric, `OneHotEncoder` to categorical
- Results in feature expansion from 50 → 2,871 features after one-hot encoding

The separation allows proper preprocessing - continuous features get scaled, categorical features get encoded, and SMOTENC knows which columns must be treated as categorical when generating synthetic samples.

In [10]:
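# Training features to include
exclude_features = ["patient_nbr", "encounter_id", "readmitted"]
# is_numeric_dtype() is also True for bool columns, so exclude them explicitly;
# otherwise they would be both scaled and one-hot encoded.
numeric_features = [col for col in X.columns
                    if col not in exclude_features
                    and pd.api.types.is_numeric_dtype(X[col])
                    and not pd.api.types.is_bool_dtype(X[col])
]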
boolean_features = [col for col in X.columns 
                    if col not in exclude_features and X[col].dtype == "bool"
]
object_features = [col for col in X.columns 
                   if col not in exclude_features and X[col].dtype == "object"
]
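
One detail worth checking when splitting columns by dtype: pandas treats `bool` as a numeric dtype, so a bool column passes an `is_numeric_dtype` test as well as a `dtype == "bool"` test. The toy frame below is hypothetical (`change_flag` is an invented column name) and only illustrates how the dtype predicates behave.

```python
import pandas as pd

# Hypothetical three-column frame: one integer, one bool, one string column
toy = pd.DataFrame({
    "num_medications": [5, 12, 7],
    "change_flag": [True, False, True],
    "race": ["Caucasian", "Other", "Asian"],
})

is_numeric = {col: pd.api.types.is_numeric_dtype(toy[col]) for col in toy.columns}
is_bool = {col: pd.api.types.is_bool_dtype(toy[col]) for col in toy.columns}

# Columns that are numeric but not boolean
numeric_only = [col for col in toy.columns if is_numeric[col] and not is_bool[col]]

print(is_numeric)
print(is_bool)
print(numeric_only)
```

Comparing `numeric_features` against `boolean_features` for overlap is a quick way to confirm that no column is routed to both the scaler and the encoder.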

In [11]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=randy
)

## Handling class imbalance

We'll use Synthetic Minority Oversampling Technique (SMOTE) to handle class imbalance issues. It's a method for handling imbalanced datasets by creating synthetic examples of the minority class rather than just duplicating existing ones. SMOTE generates new samples by interpolating between existing minority class samples and their nearest neighbors.

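SMOTE is applied to the training split only; the test set keeps its original class distribution. Note that the cross-validation below runs on the already-resampled training data, so each validation fold contains synthetic samples and the CV scores tend to be optimistic relative to the held-out test set.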
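
When cross-validating with oversampling, one way to keep validation folds free of synthetic rows is to resample inside each fold. The sketch below illustrates the idea with plain random oversampling as a stand-in for SMOTENC, on synthetic data; the helper names are hypothetical. In practice, `imblearn.pipeline.Pipeline` offers the same behaviour by applying the sampler only during `fit`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def random_oversample(X, y, rng):
    """Duplicate minority rows until classes are balanced (stand-in for SMOTENC)."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    extra = rng.choice(np.flatnonzero(y == minority), size=deficit, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]


def cv_auc_resampling_inside_folds(model, X, y, n_splits=5, seed=42):
    """Resample only the training part of each fold; score on untouched data."""
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in cv.split(X, y):
        X_res, y_res = random_oversample(X[train_idx], y[train_idx], rng)
        fitted = clone(model).fit(X_res, y_res)
        proba = fitted.predict_proba(X[val_idx])[:, 1]
        scores.append(roc_auc_score(y[val_idx], proba))
    return float(np.mean(scores))


# Imbalanced toy data (roughly 80/20), not the readmission dataset
X_toy, y_toy = make_classification(n_samples=600, n_features=10,
                                   weights=[0.8, 0.2], random_state=42)
auc_cv = cv_auc_resampling_inside_folds(LogisticRegression(max_iter=1000), X_toy, y_toy)
print(f"CV AUC with fold-wise resampling: {auc_cv:.3f}")
```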

### SMOTENC vs SMOTE: Handling Mixed Data Types

**Why SMOTENC is Required:**

Standard SMOTE (Synthetic Minority Oversampling Technique) only works with continuous numerical features. It generates synthetic samples by:
1. Finding k-nearest neighbors of minority class samples
2. Interpolating between a sample and its neighbors using linear combinations
3. Creating new points along the line segments between samples

**The Problem with Mixed Data Types:**
Our dataset contains both continuous (age, time_in_hospital, num_medications) and categorical features (race, medical_specialty, A1Cresult). Standard SMOTE would try to interpolate categorical values, potentially creating impossible combinations like:
- `race = 1.7` (meaningless interpolation between `"Caucasian"=1` and `"African American"=2`)
- `medical_specialty = "Cardiology + 0.3 * Internal Medicine"` (nonsensical categorical interpolation)

**SMOTENC Solution:**
SMOTENC (SMOTE for Nominal and Continuous) handles mixed data types by:
1. Continuous features: Uses standard SMOTE interpolation
2. Categorical features: Uses the mode (most frequent value) from the k-nearest neighbors instead of interpolation
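3. Nearest neighbor calculation: Uses Euclidean distance on the continuous features, adding a fixed penalty (the median of the continuous features' standard deviations in the minority class) for each categorical feature whose values differ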

Implementation Details:
`categorical_features = [X.columns.get_loc(col) for col in object_features + boolean_features]`
This tells SMOTENC which column indices contain categorical data, ensuring proper synthetic sample generation that respects the categorical nature of features like diagnosis codes and medication names.

The result is realistic synthetic samples that maintain the integrity of both continuous measurements and discrete categorical classifications.
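
The generation step described above can be sketched in a few lines of NumPy. This is a simplified illustration of the idea only, not imbalanced-learn's implementation: the neighbours are given rather than searched for, and the function name `synth_sample` and all values are hypothetical.

```python
from collections import Counter

import numpy as np


def synth_sample(base_num, neighbor_nums, neighbor_cats, rng):
    """Create one synthetic row from a minority sample and its k neighbours."""
    # Continuous part: random point on the segment to one randomly chosen neighbour
    j = rng.integers(len(neighbor_nums))
    gap = rng.random()
    new_num = base_num + gap * (neighbor_nums[j] - base_num)

    # Categorical part: most frequent value among the neighbours, column by column
    new_cat = [Counter(col).most_common(1)[0][0] for col in zip(*neighbor_cats)]
    return new_num, new_cat


rng = np.random.default_rng(42)
base = np.array([3.0, 10.0])  # e.g. time_in_hospital, num_medications
nbr_nums = np.array([[5.0, 14.0], [4.0, 12.0], [2.0, 9.0]])
nbr_cats = [("Caucasian", "None"), ("Caucasian", ">7"), ("AfricanAmerican", "None")]

num, cat = synth_sample(base, nbr_nums, nbr_cats, rng)
print(num, cat)
```

The continuous values land somewhere between the base sample and one neighbour, while each categorical value is always one that actually occurs in the data.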

In [12]:
%%time
# Specify which columns are categorical (by index) 
categorical_features = [X.columns.get_loc(col) for col in object_features + boolean_features]

smote_nc = SMOTENC(categorical_features=categorical_features, random_state=randy)
X_train_resampled, y_train_resampled = smote_nc.fit_resample(X_train, y_train)
y_train_resampled.value_counts()
print(f"Training data shape: {X_train_resampled.shape}")
print(f"Memory usage: {X_train_resampled.memory_usage(deep=True).sum() / 1e9:.2f} GB")

Training data shape: (68610, 50)
Memory usage: 0.12 GB
CPU times: user 22.6 s, sys: 1.45 s, total: 24 s
Wall time: 24.2 s


## Preprocessing with ColumnTransformer

This applies different transformers to different columns simultaneously:

1. "num" step: Applies MinMaxScaler() to numeric_features
  - Scales numeric values to range [0,1]
2. "cat" step: Applies OneHotEncoder() to object_features and boolean_features
  - drop="first": Removes the first category of each feature to avoid perfect multicollinearity
  - sparse_output=True: Returns sparse matrix (memory efficient)
  - handle_unknown="ignore": Encodes unseen categories as all zeros for that feature, which makes them indistinguishable from the dropped first category

The output combines both transformations into a single feature matrix - scaled numerics + one-hot encoded categoricals.

In [13]:
preprocessor = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), numeric_features),
        (
            "cat",
            OneHotEncoder(drop="first", sparse_output=True, handle_unknown="ignore"),
            object_features + boolean_features,
        ),  # Combine both categorical types
    ]
)

In [14]:
# Final check of NaN's before training
print("Checking for NaNs in categorical columns...")
categorical_cols = object_features + boolean_features
nan_check = {}

for col in categorical_cols:
    nan_count = X[col].isna().sum()
    if nan_count > 0:
        nan_check[col] = nan_count

if nan_check:
    print("STOP! Still have NaNs:")
    for col, count in nan_check.items():
        print(f"  {col}: {count} NaNs")
    print("\nFill these before training!")
else:
    print("No NaNs found in categorical columns - safe to train!")

Checking for NaNs in categorical columns...
No NaNs found in categorical columns - safe to train!


## Lasso implementation

We'll start with the Lasso approach.

In [15]:
# Check feature expansion after preprocessing - in the modeling call, this is done by the pipeline function
X_processed = preprocessor.fit_transform(X_train_resampled)
print(f"After preprocessing: {X_processed.shape}")
print(f"Feature expansion: {X_processed.shape[1]} features (from {X_train_resampled.shape[1]})")
print(f"Sparsity: {1 - X_processed.nnz / (X_processed.shape[0] * X_processed.shape[1]):.3f}")

After preprocessing: (68610, 2871)
Feature expansion: 2871 features (from 50)
Sparsity: 0.987


### Lasso (L1) Hyperparameter Tuning Strategy

Hyperparameter Being Tuned:
- `C`: Inverse regularization strength (0.05-1.0, log scale) - smaller values mean stronger regularization

Solver Choice - LibLinear:
`liblinear` is one of the two scikit-learn `LogisticRegression` solvers that support the L1 penalty (the other is `saga`). It is used here because:
- Coordinate descent optimization: Efficiently handles the non-differentiable L1 penalty at zero
- Sparse solution handling: Works well when many coefficients become exactly zero
- Sparse input support: Operates directly on the sparse one-hot matrix (2,871 features, about 98.7% zeros)
- Binary target: It is limited to binary or one-vs-rest problems, which matches the readmission outcome

The L1 penalty drives coefficients to exactly zero, performing automatic feature selection - critical for interpretability with our 2,871 features after one-hot encoding.
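
Searching `C` on a log scale means drawing log(C) uniformly, so each order of magnitude gets equal attention. The NumPy sketch below contrasts this with a plain uniform draw over the same 0.05 to 1.0 interval; it illustrates the sampling idea only and does not call Optuna.

```python
import numpy as np

rng = np.random.default_rng(42)
low, high = 0.05, 1.0

# Log-uniform: draw log(C) uniformly, then exponentiate
log_uniform = np.exp(rng.uniform(np.log(low), np.log(high), size=10_000))
# Plain uniform draw over the same interval, for comparison
uniform = rng.uniform(low, high, size=10_000)

# Share of draws below the geometric midpoint sqrt(low * high)
mid = np.sqrt(low * high)
frac_log = (log_uniform < mid).mean()
frac_uni = (uniform < mid).mean()
print(f"log-uniform below {mid:.3f}: {frac_log:.2f}")
print(f"uniform below {mid:.3f}:     {frac_uni:.2f}")
```

Roughly half of the log-uniform draws fall below the geometric midpoint, versus under a fifth of the uniform draws, so the strongly regularized end of the range is explored much more densely.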

In [16]:
%%time
# Optuna implementation 

def objective(trial):
    # Sample C log-uniformly; smaller C means stronger regularization
    C = trial.suggest_float('C', .05, 1, log=True)

    model = Pipeline([
        ('preprocessor', preprocessor),
        ('model', LogisticRegression(penalty='l1',
                                     solver='liblinear',
                                     C=C,
                                     random_state=randy))
    ])

    # Mean 5-fold ROC AUC on the resampled training data
    scores = cross_val_score(model,
                             X_train_resampled,
                             y_train_resampled,
                             cv=5,
                             scoring='roc_auc',
                             n_jobs=-1)
    return scores.mean()

study = optuna.create_study(direction='maximize')
# n_trials is an upper bound; MaxTrialsCallback stops the study once
# 15 trials have completed successfully
study.optimize(objective,
               n_trials=100,
               callbacks=[optuna.study.MaxTrialsCallback(
                   n_trials=15,
                   states=[optuna.trial.TrialState.COMPLETE])])

[I 2025-07-07 09:03:01,677] A new study created in memory with name: no-name-76c545af-0d21-4185-83e5-c9424a16ae54




[I 2025-07-07 09:20:35,019] Trial 14 finished with value: 0.7365113967633599 and parameters: {'C': 0.264729366398427}. Best is trial 11 with value: 0.7365989452921752.


CPU times: user 2.9 s, sys: 1.13 s, total: 4.03 s
Wall time: 17min 33s


In [17]:
# After study.optimize() completes
best_params = study.best_params
print(f"Best parameters: {best_params}")
print(f"Best AUC: {study.best_value:.4f}")

Best parameters: {'C': 0.21689040246062266}
Best AUC: 0.7366


In [18]:
%%time 
log_Lasso = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', LogisticRegression(penalty='l1',
                                 solver='liblinear',
                                 C=best_params['C'],
                                 random_state=randy,
                                 max_iter=100000))  # n_jobs is ignored by liblinear, so it is omitted
])
log_Lasso.fit(X_train_resampled, y_train_resampled)

# Save the trained model
with open(f"../models/{token}_log_Lasso.pkl", "wb") as file:
    pickle.dump(log_Lasso, file)
print(f"Model saved as {token}_log_Lasso.pkl") 



Model saved as f11_log_Lasso.pkl
CPU times: user 1min 18s, sys: 597 ms, total: 1min 19s
Wall time: 1min 19s


### Final Lasso Pipeline and Evaluation

**Pipeline Components:**
This final pipeline combines preprocessing and the optimized Lasso model:
1. **Preprocessing**: MinMax scaling + one-hot encoding (50 → 2,871 features)
2. **Lasso Model**: L1 regularization with optimized C parameter for automatic feature selection

**Model Persistence:**
The complete pipeline is saved as `{token}_log_Lasso.pkl`, preserving both the fitted preprocessor and trained model for deployment.

**Standardized Evaluation:**
The `evaluate_and_save_pipeline()` function is applied here so that the results are stored in the same format used by the other notebooks.
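
To see how much feature selection the L1 penalty actually performs, the non-zero coefficients of the fitted model step can be counted. The sketch below does this on synthetic data with a hypothetical helper, `count_selected`; it assumes the classifier step is named `'model'`, as in the pipelines in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def count_selected(pipeline, step="model"):
    """Return (non-zero coefficients, total coefficients) for a fitted pipeline."""
    coef = pipeline.named_steps[step].coef_.ravel()
    return int(np.count_nonzero(coef)), coef.size


# Toy data standing in for the real training set
X_toy, y_toy = make_classification(n_samples=300, n_features=40,
                                   n_informative=5, random_state=42)

results = {}
for C in (0.01, 1.0):
    pipe = Pipeline([
        ("preprocessor", MinMaxScaler()),
        ("model", LogisticRegression(penalty="l1", solver="liblinear",
                                     C=C, random_state=42)),
    ])
    pipe.fit(X_toy, y_toy)
    results[C] = count_selected(pipe)
    print(f"C={C}: {results[C][0]} of {results[C][1]} coefficients are non-zero")
```

Calling `count_selected(log_Lasso)` on the fitted pipeline would report how many of the 2,871 expanded features survive at the tuned `C`.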

In [19]:
# Open a trained model
# log_Lasso = pd.read_pickle("../models/f01_log_Lasso.pkl")

In [20]:
evaluate_and_save_pipeline(
    pipeline=log_Lasso, 
    namestring='log_Lasso',
    token=token, 
    X_train=X_train_resampled, 
    X_test=X_test, 
    y_train=y_train_resampled, 
    y_test=y_test
)





{'model_version': 'f11_log_Lasso',
 'accuracy': 0.624064881493393,
 'precision': 0.52331897359056,
 'recall': 0.4966228226093139,
 'f1_score': 0.5096215230278158,
 'specificity': np.float64(0.7066958626253314),
 'roc_auc': np.float64(0.6529259331653884),
 'y_test': array([0, 0, 0, ..., 1, 0, 0], dtype=int8),
 'y_train_pred': array([1, 1, 1, ..., 1, 1, 1], dtype=int8),
 'y_test_pred': array([0, 0, 0, ..., 1, 1, 0], dtype=int8),
 'y_test_pred_proba': array([0.49744732, 0.21709086, 0.41688131, ..., 0.6531187 , 0.57877947,
        0.34670747]),
 'display_labels': array([0, 1], dtype=int8),
 'confusion_matrix': {'tn': np.int64(6132),
  'fp': np.int64(2545),
  'fn': np.int64(2832),
  'tp': np.int64(2794)},
 'roc_curve': {'fpr': array([0.00000000e+00, 1.15247205e-04, 1.15247205e-04, ...,
         9.94237640e-01, 9.94237640e-01, 1.00000000e+00]),
  'tpr': array([0.00000000e+00, 0.00000000e+00, 1.77746178e-04, ...,
         9.99822254e-01, 1.00000000e+00, 1.00000000e+00]),
  'thresholds': array
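
As a sanity check, the headline metrics in the output above can be re-derived by hand from the four confusion-matrix counts. The snippet copies those counts from the Lasso output; it should reproduce the reported accuracy, precision, recall, specificity, and F1 to four decimal places.

```python
# Counts copied from the Lasso confusion matrix in the output above
tn, fp, fn, tp = 6132, 2545, 2832, 2794

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted readmissions that are real
recall = tp / (tp + fn)                     # share of real readmissions that are caught
specificity = tn / (tn + fp)                # share of non-readmissions correctly cleared
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} "
      f"specificity={specificity:.4f} f1={f1:.4f}")
```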

## Elastic Net implementation

### Elastic Net Hyperparameter Tuning Strategy

Hyperparameters Being Tuned:
- `C`: Inverse regularization strength (0.01-0.5, log scale) - controls overall penalty magnitude
- `l1_ratio`: Mixing parameter (0.1-0.9) - balances L1 vs L2 regularization (0=Ridge, 1=Lasso)

Solver Choice - SAGA:
SAGA is required for Elastic Net because:
- Dual penalty support: It is the only `LogisticRegression` solver in scikit-learn that accepts `penalty='elasticnet'`
- Stochastic optimization: Uses variance-reduced stochastic gradients, which suits large datasets
- Scale sensitivity: Fast convergence is only guaranteed when features share roughly the same scale, which MinMax scaling and one-hot encoding provide here
- Flexibility: Can handle the full spectrum from pure Ridge (`l1_ratio=0`) to pure Lasso (`l1_ratio=1`)

Elastic Net combines L1's feature selection with L2's coefficient shrinkage, potentially providing better performance when groups of correlated features exist (common in our medical dataset with related diagnostic codes).
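
To make the role of `l1_ratio` concrete, the sketch below evaluates the Elastic Net penalty term for a small, made-up weight vector. It assumes scikit-learn's parameterization, in which the L2 part is halved; the function name `elastic_net_penalty` is hypothetical.

```python
import numpy as np


def elastic_net_penalty(w, l1_ratio):
    """Penalty term only; the data-fit term is scaled separately by C."""
    w = np.asarray(w, dtype=float)
    l1 = np.abs(w).sum()     # Lasso part: sum of absolute weights
    l2 = 0.5 * np.dot(w, w)  # Ridge part: half the squared norm
    return l1_ratio * l1 + (1.0 - l1_ratio) * l2


w = [1.0, -2.0, 0.0]
penalties = {rho: elastic_net_penalty(w, rho) for rho in (0.0, 0.5, 1.0)}
for rho, value in penalties.items():
    print(f"l1_ratio={rho}: penalty={value}")
```

At `l1_ratio=0` only the Ridge part remains, at `l1_ratio=1` only the Lasso part, and values in between blend the two linearly.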

In [21]:
%%time

def objective_enet(trial):
    C = trial.suggest_float('C', 0.01, 0.5, log=True)  # Narrower range, based on earlier CV results
    l1_ratio = trial.suggest_float('l1_ratio', 0.1, 0.9)  # ElasticNet mixing parameter

    model = Pipeline([
        ('preprocessor', preprocessor),
        ('model', LogisticRegression(
            penalty='elasticnet',
            solver='saga',  # Only solver that supports elasticnet
            C=C,
            l1_ratio=l1_ratio,  # New parameter for ElasticNet
            random_state=randy,
            max_iter=100000
        ))
    ])

    scores = cross_val_score(
        model,
        X_train_resampled,
        y_train_resampled,
        cv=5,
        scoring='roc_auc',
        n_jobs=-1  # Keep for CV parallelization
    )
    return scores.mean()

study_enet = optuna.create_study(direction='maximize')
study_enet.optimize(
    objective_enet,
    n_trials=50,  # Fewer trials to speed up
    timeout=3600  # Stop after 1 hour max
)


[I 2025-07-07 09:21:54,977] A new study created in memory with name: no-name-05cb662a-1b0c-4946-894a-247eb589e298


[I 2025-07-07 10:22:16,689] Trial 7 finished with value: 0.7374019266666993 and parameters: {'C': 0.14720803388280645, 'l1_ratio': 0.3640218127141768}. Best is trial 7 with value: 0.7374019266666993.


CPU times: user 3.55 s, sys: 1.88 s, total: 5.43 s
Wall time: 1h 21s


In [22]:
# After study.optimize() completes
best_params = study_enet.best_params
print(f"Best ElasticNet params: {study_enet.best_params}")
print(f"Best ElasticNet AUC: {study_enet.best_value:.4f}")

Best ElasticNet params: {'C': 0.14720803388280645, 'l1_ratio': 0.3640218127141768}
Best ElasticNet AUC: 0.7374


In [23]:
%%time 
log_ENet = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', LogisticRegression(penalty='elasticnet',
                                 solver='saga',
                                 C=best_params['C'],
                                 l1_ratio=best_params['l1_ratio'],
                                 random_state=randy,
                                 max_iter=100000,
                                 n_jobs=-1))
])
log_ENet.fit(X_train_resampled, y_train_resampled)

# Save the trained model
with open(f"../models/{token}_log_ENet.pkl", "wb") as file:
    pickle.dump(log_ENet, file)
print(f"Model saved as {token}_log_ENet.pkl") 

Model saved as f11_log_ENet.pkl
CPU times: user 5min 32s, sys: 151 ms, total: 5min 32s
Wall time: 5min 32s


### Final Elastic Net Pipeline and Evaluation

**Pipeline Components:**
This final pipeline combines preprocessing and the optimized Elastic Net model:
1. Preprocessing: MinMax scaling + one-hot encoding (50 → 2,871 features)
2. Elastic Net Model: Combined L1+L2 regularization with optimized C and l1_ratio parameters

**Model Persistence:**
The complete pipeline is saved as `{token}_log_ENet.pkl`, preserving both the fitted preprocessor and trained model for deployment.

**Standardized Evaluation:**
The same `evaluate_and_save_pipeline()` function is used here as well.

In [24]:
# Open a trained model
# log_ENet = pd.read_pickle("../models/f01_log_ENet.pkl")

In [25]:
evaluate_and_save_pipeline(
    pipeline=log_ENet, 
    namestring='log_ENet',
    token=token, 
    X_train=X_train_resampled, 
    X_test=X_test, 
    y_train=y_train_resampled, 
    y_test=y_test
)





{'model_version': 'f11_log_ENet',
 'accuracy': 0.6234356428721247,
 'precision': 0.5222222222222223,
 'recall': 0.5012442232492001,
 'f1_score': 0.511518229639035,
 'specificity': np.float64(0.7026622104413968),
 'roc_auc': np.float64(0.6533970414530635),
 'y_test': array([0, 0, 0, ..., 1, 0, 0], dtype=int8),
 'y_train_pred': array([1, 1, 1, ..., 1, 1, 1], dtype=int8),
 'y_test_pred': array([1, 0, 0, ..., 1, 1, 0], dtype=int8),
 'y_test_pred_proba': array([0.50322397, 0.21800525, 0.4239119 , ..., 0.6370222 , 0.55686554,
        0.34375856]),
 'display_labels': array([0, 1], dtype=int8),
 'confusion_matrix': {'tn': np.int64(6097),
  'fp': np.int64(2580),
  'fn': np.int64(2806),
  'tp': np.int64(2820)},
 'roc_curve': {'fpr': array([0.00000000e+00, 1.15247205e-04, 1.15247205e-04, ...,
         9.93430909e-01, 9.93430909e-01, 1.00000000e+00]),
  'tpr': array([0.00000000e+00, 0.00000000e+00, 1.77746178e-04, ...,
         9.99822254e-01, 1.00000000e+00, 1.00000000e+00]),
  'thresholds': arra