In [1]:
import multiprocessing
print(multiprocessing.cpu_count())

import psutil
print(f"Available memory before training: {psutil.virtual_memory().available / 1e9:.2f} GB")

10
Available memory before training: 6.49 GB


# Diabetes Readmission – Neural Network Classification

## Introduction

This notebook implements a deep neural network using TensorFlow/Keras for predicting hospital readmission within 30 days for diabetic patients. We use a specialized preprocessed dataset optimized for neural networks, which includes:

- Full dataset: All encounters retained (101,763 records), as neural networks can learn complex patterns from correlated observations
- Mixed feature representation: Both consolidated diagnostic categories and binary indicator features for comprehensive pattern recognition
- Box-Cox transformations: Applied to skewed numeric features for improved neural network convergence
- Standardized features: All numeric inputs scaled for optimal gradient-based learning

## Methodology

**Architecture Search**: Using Keras Tuner's Bayesian Optimization to systematically explore neural network architectures across:
- Network depth: Number of hidden layers
- Layer width: Neurons per layer 
- Regularization: Dropout rates and batch normalization
- Learning dynamics: Learning rate optimization

**Advanced Training Techniques**:
- Early stopping: Prevents overfitting with patience-based monitoring
- Learning rate reduction: Adaptive LR scheduling for fine-tuned convergence
- Class weighting: Handles imbalanced dataset without synthetic sampling
- Batch normalization: Stabilizes training and improves convergence speed

**Preprocessing Pipeline**: `StandardScaler` for numeric features and ordinal encoding for categoricals. Numeric inputs are standardized, while categorical inputs reach the network as integer codes.

**Hardware Optimization**: Configured for Apple Silicon GPU acceleration while maintaining CPU fallback compatibility for broader reproducibility. The tuning run below is forced onto the CPU, which proved faster on the M4 Mac mini used here.

The goal is to leverage neural networks' ability to capture complex non-linear feature interactions and patterns that traditional ML methods may miss, while maintaining robust performance through proper architecture selection and training regularization.

In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import shutil
import pickle
import json
import time

In [3]:
token = 'f11' # run identifier; change it when trying a new configuration
randy = 42 # random seed for reproducibility
nn = pd.read_pickle("../models/neural_net.pkl") # See prior notebook, p02.

In [4]:
nn.info()

<class 'pandas.core.frame.DataFrame'>
Index: 101763 entries, 0 to 101765
Columns: 102 entries, encounter_id to has_E990_E999
dtypes: bool(4), float64(6), int64(64), object(28)
memory usage: 77.3+ MB


## Memory Optimization

The `optimize_dtypes()` function reduces memory usage by downcasting numeric types to their smallest sufficient representation:
- `int64` → `int8/int16/int32` based on value ranges
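- `float64` → `float32` when the values fit within the `float32` range (the check is on range only, so precision drops to roughly 7 significant digits)

This optimization is particularly valuable for large datasets and memory-intensive operations like the repeated training trials in the hyperparameter search below.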
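As a minimal sketch of the range-based downcasting described above, the snippet below picks the narrowest signed integer type whose range holds a column's minimum and maximum. The helper name `smallest_int_dtype` and the sample values are hypothetical, and the inclusive bounds are an assumption about the intended behaviour rather than a copy of the notebook's function.

```python
import numpy as np

def smallest_int_dtype(values):
    """Return the narrowest signed integer dtype that can hold every value."""
    lo, hi = int(np.min(values)), int(np.max(values))
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dtype)
        if info.min <= lo and hi <= info.max:  # inclusive bounds (assumption)
            return np.dtype(dtype)
    raise ValueError("values do not fit in a signed 64-bit integer")

# Hypothetical columns, for illustration only
small_counts = np.array([0, 45, 99], dtype=np.int64)
large_ids = np.array([12_522, 443_867_222], dtype=np.int64)

print(smallest_int_dtype(small_counts))  # int8
print(smallest_int_dtype(large_ids))     # int32

# Memory footprint before and after downcasting the small column
downcast = small_counts.astype(smallest_int_dtype(small_counts))
print(small_counts.nbytes, "bytes before,", downcast.nbytes, "bytes after")
```

Checking only the range is enough for integers, since every value is represented exactly. For floats, a range check alone does not guarantee that precision is preserved.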

In [5]:
def optimize_dtypes(df):
    
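    """
    Downcast int64 and float64 columns to the smallest type whose range holds
    the column's values, saving memory and time. Modifies `df` in place and
    returns it. Bounds are compared strictly, so a column that exactly reaches
    a type's limit stays in the next larger type.
    """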
    
    for col in df.columns:
        col_type = df[col].dtype

        if col_type == 'int64':
            c_min = df[col].min()
            c_max = df[col].max()

            if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                df[col] = df[col].astype(np.int8)
            elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                df[col] = df[col].astype(np.int16)
            elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                df[col] = df[col].astype(np.int32)

        elif col_type == 'float64':
            c_min = df[col].min()
            c_max = df[col].max()

            if c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                df[col] = df[col].astype(np.float32)

    return df

In [6]:
nn = optimize_dtypes(nn)
nn.info() # about 45 MB saved (77.3 MB → 32.1 MB as reported by info())

<class 'pandas.core.frame.DataFrame'>
Index: 101763 entries, 0 to 101765
Columns: 102 entries, encounter_id to has_E990_E999
dtypes: bool(4), float32(6), int16(1), int32(2), int8(61), object(28)
memory usage: 32.1+ MB


In [7]:
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

In [8]:
import tensorflow as tf 

In [9]:
print("TensorFlow version:", tf.__version__)
print("Available devices:")
for device in tf.config.list_physical_devices():
    print(f"  {device}")
print("Metal GPU available:", len(tf.config.list_physical_devices('GPU')) > 0)

TensorFlow version: 2.19.0
Available devices:
  PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')
  PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
Metal GPU available: True


In [10]:
from concurrent.futures import ThreadPoolExecutor
import threading

from tensorflow import keras 
from tensorflow.keras import layers 

import keras_tuner as kt

from itertools import product
import random

from sklearn.compose import ColumnTransformer 
from sklearn.preprocessing import StandardScaler, OrdinalEncoder 
from sklearn.model_selection import train_test_split 
from sklearn.model_selection import ParameterGrid

from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import accuracy_score, roc_curve, auc, confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score

## Model Evaluation and Persistence Function

The `evaluate_and_save_pipeline()` function provides standardized evaluation across the other modeling approaches in this project. The neural network uses its own variant, `evaluate_and_save_nn()`, which differs in three ways:

- **TensorFlow integration**: Keras models return probabilities from `predict()` and don't fit the sklearn pipeline structure
- **Additional data**: Captures the training history and architecture parameters unique to neural networks
- **Preprocessing separation**: Works with pre-transformed numpy arrays rather than pipeline wrappers

**Consistent Output:**
Despite these implementation differences, the function saves metrics in the same pickle format as the other models, which keeps model comparison fair and supports the ensemble modeling approach in later notebooks.
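To make the saved metrics concrete, here is a small sketch of how each summary metric follows from the four confusion-matrix counts. The helper name `metrics_from_counts` is hypothetical. The counts are copied from the test-set evaluation output recorded in this notebook, so the printed values should agree with the metrics reported there.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive the summary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }

# Counts copied from this notebook's test-set evaluation output
metrics = metrics_from_counts(tn=8140, fp=2832, fn=4837, tp=4544)
for name, value in metrics.items():
    print(f"{name:12s} {value:.4f}")
```

ROC AUC is not included because it depends on the full set of predicted probabilities, not on a single thresholded confusion matrix.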

In [11]:
# Evaluate and save
def evaluate_and_save_nn(model, history, params, namestring, token, 
                        X_train_processed, X_test_processed, 
                        y_train, y_test,
                        preprocessor, original_feature_names,
                        console_out = False):
    """Evaluate neural network and save metrics in same format as other models"""

    # Predictions
    y_train_pred = (model.predict(X_train_processed).flatten() > 0.5).astype(int)
    y_test_pred_proba = model.predict(X_test_processed).flatten()
    y_test_pred = (y_test_pred_proba > 0.5).astype(int)

    # Calculate metrics
    accuracy = accuracy_score(y_test, y_test_pred)
    precision = precision_score(y_test, y_test_pred)
    recall = recall_score(y_test, y_test_pred)
    f1 = f1_score(y_test, y_test_pred)

    # ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_test_pred_proba)
    roc_auc = auc(fpr, tpr)

    # Confusion matrix
    tn, fp, fn, tp = confusion_matrix(y_test, y_test_pred).ravel()
    specificity = tn / (tn + fp)

    # Create metrics dict (same format as other models)
    pickle_metrics = {
        'model_version': f"{token}_{namestring}",
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'specificity': specificity,
        'roc_auc': roc_auc,
        'y_test': y_test,
        'y_train_pred': y_train_pred,
        'y_test_pred': y_test_pred,
        'y_test_pred_proba': y_test_pred_proba,
        'display_labels': [0, 1],
        'confusion_matrix': {'tn': tn, 'fp': fp, 'fn': fn, 'tp': tp},
        'roc_curve': {'fpr': fpr, 'tpr': tpr, 'thresholds': thresholds},
        'best_params': params,
        'training_history': history.history,
        'shap_data': {
            'model': model,
            'preprocessor': preprocessor,
            'X_train_processed': X_train_processed,
            'X_test_processed': X_test_processed,
            'feature_names': preprocessor.get_feature_names_out(),
            'original_feature_names': original_feature_names
        }
    }

    # Save to file
    filename = f"../models/fits_pickle_{token}_{namestring}.pkl"
    with open(filename, "wb") as file:
        pickle.dump(pickle_metrics, file)

    if console_out:
        # Print summary (same layout as the evaluation function used for the other models)
        print(f"Metrics saved to {filename}")
        print(f'Accuracy:    {accuracy:.4f}')
        print(f'Precision:   {precision:.4f}')
        print(f'Recall:      {recall:.4f}')
        print(f'F1-Score:    {f1:.4f}')
        print(f'Specificity: {specificity:.4f}')
        print(f'ROC AUC:     {roc_auc:.4f}')

        # Plot confusion matrix
        cm = confusion_matrix(y_test, y_test_pred)
        disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
        disp.plot(cmap=plt.cm.Blues)
        plt.title(f"Confusion Matrix - {namestring}")
        plt.show()

    return pickle_metrics

## Neural Network Architecture Builder

**Dynamic Model Construction**: The `build_model()` function creates neural networks with configurable hyperparameters:

**Architecture Parameters:**
- `hidden_layers`: Number of dense layers for network depth
- `neurons_per_layer`: Units per layer controlling model capacity
- `dropout_rate`: Regularization strength preventing overfitting
- `learning_rate`: Adam optimizer step size

**Layer Construction:**
- Dense layers: Fully connected with ReLU activation for non-linear transformations
- BatchNormalization: Stabilizes training and accelerates convergence
- Dropout: Randomly zeroes neurons during training for regularization
- Output layer: Single sigmoid unit for binary classification

**Flexible Design:** The function accepts either:
- Search space dictionary: For Keras Tuner hyperparameter optimization
- Default values: For fallback when no search space provided

This modular approach enables both automated architecture search and manual model construction with consistent layer patterns optimized for binary classification.
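To give a sense of how depth and width translate into model capacity, the sketch below counts parameters for the layer pattern described above (Dense, then BatchNormalization, then Dropout, repeated, with one sigmoid output unit). The helper name `count_parameters` and the input width of 100 features are hypothetical. The real input width is whatever the preprocessor produces.

```python
def count_parameters(n_features, hidden_layers, neurons_per_layer):
    """Count parameters for repeated Dense -> BatchNormalization -> Dropout
    blocks followed by a single sigmoid output unit."""
    total = 0
    inputs = n_features
    for _ in range(hidden_layers):
        # Dense: one weight per input-unit pair, plus one bias per unit
        total += inputs * neurons_per_layer + neurons_per_layer
        # BatchNormalization: gamma, beta, moving mean, moving variance per unit
        total += 4 * neurons_per_layer
        # Dropout has no parameters
        inputs = neurons_per_layer
    total += inputs + 1  # output Dense(1): weights plus bias
    return total

# Hypothetical input width of 100 features, for illustration only
print(count_parameters(100, hidden_layers=2, neurons_per_layer=128))  # 30593
print(count_parameters(100, hidden_layers=4, neurons_per_layer=64))   # 20033
```

Half of the BatchNormalization parameters (the moving mean and variance) are not trainable, so the trainable count reported by `model.summary()` would be slightly lower than the total.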


In [12]:
def build_model(hp, search_space=None):
    """Build model with configurable search space and fallback defaults"""
    model = keras.Sequential()
    model.add(keras.Input(shape=(X_train_processed.shape[1],)))

    # Use search space if provided, otherwise use single default values
    if search_space:
        # Use Int/Float ranges for Bayesian optimization
        hidden_layers = hp.Int('hidden_layers',
                            min_value=search_space['hidden_layers'][0],
                            max_value=search_space['hidden_layers'][1],
                            step=1)
        neurons_per_layer = hp.Int('neurons_per_layer',
                                min_value=search_space['neurons_per_layer'][0],
                                max_value=search_space['neurons_per_layer'][1],
                                step=search_space['neurons_per_layer'][2])
        dropout_rate = hp.Float('dropout_rate',
                                min_value=search_space['dropout_rate'][0],
                                max_value=search_space['dropout_rate'][1])
        learning_rate = hp.Float('learning_rate',
                                min_value=search_space['learning_rate'][0],
                                max_value=search_space['learning_rate'][1],
                                sampling='log')  # Log sampling for learning rate
        
    else:
        # Fallback defaults
        hidden_layers = 2
        neurons_per_layer = 128
        dropout_rate = 0.2
        learning_rate = 1e-3

    # Build the layers
    for i in range(hidden_layers):
        model.add(layers.Dense(neurons_per_layer, activation='relu'))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(dropout_rate))

    # Output layer
    model.add(layers.Dense(1, activation='sigmoid'))

    # Compile
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss='binary_crossentropy',
        metrics=['AUC']
    )
    return model

### Modeling execution: 

In [13]:
# Fill all categorical NaNs

categorical_cols_with_nans = [
    "primary_group",
    "primary_subgroup",
    "secondary_group",
    "secondary_subgroup",
    "secondary2_group",
    "secondary2_subgroup",
]

for col in categorical_cols_with_nans:
    nn[col] = nn[col].fillna("Missing")

In [14]:
X = nn.drop(["readmitted"], axis=1)
y = nn["readmitted"]

In [15]:
# Training features to include
exclude_features = ["patient_nbr", "encounter_id", "readmitted"]
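numeric_features = [
    col
    for col in X.columns
    if col not in exclude_features
    and pd.api.types.is_numeric_dtype(X[col])
    # is_numeric_dtype() is also True for bool columns; skip them here so they
    # are not scaled as numerics and then encoded again as categoricals
    and not pd.api.types.is_bool_dtype(X[col])
]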
boolean_features = [
    col for col in X.columns if col not in exclude_features and X[col].dtype == "bool"
]
object_features = [
    col for col in X.columns if col not in exclude_features and X[col].dtype == "object"
]
categorical_features = [
    X.columns.get_loc(col) for col in object_features + boolean_features
]

In [16]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=randy, stratify=y
)

## OrdinalEncoder for Neural Networks

**Why Not OneHotEncoder**: OneHotEncoder would expand ~30 categorical features into hundreds of sparse binary columns, creating dimensionality and memory issues.

**Neural Network Advantage**: Unlike linear models, neural networks can learn non-linear mappings from the integer codes through their hidden layers, so they are less constrained by the arbitrary ordering that ordinal encoding imposes. This approximates learned embeddings without using a dedicated embedding layer.

**Practical Benefits**: Maintains computational efficiency and consistency with the other model pipelines, at the cost of feeding the network unscaled integer codes for the categorical columns.
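A minimal sketch of what ordinal encoding with an unknown-value fallback does, using the same `OrdinalEncoder` settings as this notebook. The category labels are hypothetical stand-ins, not values taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical category labels, for illustration only
train_col = np.array([["Circulatory"], ["Diabetes"], ["Missing"]], dtype=object)
test_col = np.array([["Diabetes"], ["Neoplasms"]], dtype=object)  # "Neoplasms" is unseen

encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(train_col)

# Categories are sorted, so Circulatory=0, Diabetes=1, Missing=2
print(encoder.categories_[0])

# Known categories map to their index; unseen ones map to -1
codes = encoder.transform(test_col)
print(codes.ravel())
```

Each categorical column stays a single input column. One-hot encoding would instead add one column per distinct category.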

In [17]:
# Neural network preprocessor - MUST scale numeric features
preprocessor_nn = ColumnTransformer(
    [
        ('num', StandardScaler(), numeric_features),
        (
            'cat',
            OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
            object_features + boolean_features,
        ),
    ],
    remainder='drop',
)


In [18]:
# Final check of NaN's before training
print("Checking for NaNs in categorical columns...")
categorical_cols = object_features + boolean_features
nan_check = {}

for col in categorical_cols:
    nan_count = X[col].isna().sum()
    if nan_count > 0:
        nan_check[col] = nan_count

if nan_check:
    print("STOP! Still have NaNs:")
    for col, count in nan_check.items():
        print(f"  {col}: {count} NaNs")
    print("\nFill these before training!")
else:
    print("No NaNs found in categorical columns - safe to train!")

Checking for NaNs in categorical columns...
No NaNs found in categorical columns - safe to train!


In [19]:
# Apply preprocessing
X_train_processed = preprocessor_nn.fit_transform(X_train)
X_test_processed = preprocessor_nn.transform(X_test)

## Neural Network Search Space Definition

**Architecture Hyperparameters:**
- `hidden_layers` (1-8): Network depth - deeper networks can capture more complex patterns but risk overfitting
- `neurons_per_layer` (64-1024, in steps of 128): Layer width - more neurons increase model capacity but also computational cost
- `dropout_rate` (0.01-0.5): Regularization strength - higher values prevent overfitting but may cause underfitting
- `learning_rate` (1e-4 - 1e-1, log-sampled): Optimizer step size - balances convergence speed vs stability

**Search Strategy:**
Keras Tuner's Bayesian Optimization explores this 4-dimensional space efficiently, learning from previous trials to suggest promising architecture combinations. The ranges cover everything from lightweight to complex models.

**Search Budget:** Depth and width are discrete, but dropout and learning rate are sampled from continuous ranges, so the space is not a finite grid. The search is capped at 60 trials (`max_trials=60`) to keep compute time reasonable.

In [20]:
search_space = { #[min, max]
    'hidden_layers': [1, 8],  
    'neurons_per_layer': [64, 1024, 128], # step needed
    'dropout_rate': [0.01, 0.5], 
    'learning_rate': [1e-4, 1e-1],
} 

In [21]:
# Last opportunity to force CPU usage
tf.config.set_visible_devices([], 'GPU') ## faster on CPU on m4 mac mini

In [22]:
tuner = kt.BayesianOptimization(
    lambda hp: build_model(hp, search_space),
    objective='val_AUC',
    max_trials=60,
    directory='neural_net_tuning',
    project_name=f'{token}_nn_optimization',
    overwrite=False
)

In [23]:
tuner.search_space_summary()

Search space summary
Default search space size: 4
hidden_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 8, 'step': 1, 'sampling': 'linear'}
neurons_per_layer (Int)
{'default': None, 'conditions': [], 'min_value': 64, 'max_value': 1024, 'step': 128, 'sampling': 'linear'}
dropout_rate (Float)
{'default': 0.01, 'conditions': [], 'min_value': 0.01, 'max_value': 0.5, 'step': None, 'sampling': 'linear'}
learning_rate (Float)
{'default': 0.0001, 'conditions': [], 'min_value': 0.0001, 'max_value': 0.1, 'step': None, 'sampling': 'log'}
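To get a feel for the size of the space in the summary above, the sketch below enumerates its discrete part and illustrates log-uniform sampling of the learning rate. It assumes integer hyperparameters take the values `min + k * step` and uses a common formulation of log-uniform sampling. Keras Tuner's internals may differ in detail, and the helper name `log_uniform` is hypothetical.

```python
# Ranges copied from the notebook's search_space dictionary
hidden_layer_options = list(range(1, 8 + 1))
width_options = list(range(64, 1024 + 1, 128))  # assumes values are min + k * step

def log_uniform(u, low, high):
    """Map u in [0, 1] to a value spaced uniformly on a log scale."""
    return low * (high / low) ** u

print(width_options)
print(len(hidden_layer_options) * len(width_options), "depth/width combinations")

# Equal steps in u correspond to equal ratios in learning rate
for u in (0.0, 0.5, 1.0):
    print(f"u={u:.1f} -> learning_rate={log_uniform(u, 1e-4, 1e-1):.5f}")
```

Under these assumptions there are 64 depth/width combinations, each crossed with continuous dropout and learning-rate ranges.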


## Training Callbacks

**Early Stopping** (`patience=5` during the search, `patience=10` for the final model): Prevents overfitting by stopping when validation loss (the Keras default monitor) has not improved for that many epochs, then restoring the best weights.

**Learning Rate Reduction** (`patience=3, factor=0.5` during the search, `patience=5` for the final model): Halves the learning rate when validation loss stagnates, enabling fine-tuned convergence.

These callbacks automate training duration and learning rate scheduling across all architecture trials.
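The patience rule behind early stopping can be illustrated with a short simulation on a validation-loss series, which is what Keras's `EarlyStopping` monitors by default. This is a simplified sketch rather than the Keras implementation: the function name `simulate_early_stopping` and the loss values are hypothetical, and any strict decrease counts as an improvement.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stopped_epoch, best_epoch) for a simplified patience rule.

    Epochs are 0-indexed. Any strict decrease counts as an improvement.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Training stops here; weights from best_epoch would be restored
                return epoch, best_epoch
    # Ran out of epochs without exhausting patience
    return len(val_losses) - 1, best_epoch

# Hypothetical validation-loss curve (illustrative values, not from this notebook)
val_losses = [0.70, 0.66, 0.65, 0.66, 0.67, 0.66, 0.68, 0.69]
print(simulate_early_stopping(val_losses, patience=5))  # (7, 2)
print(simulate_early_stopping(val_losses, patience=3))  # (5, 2)
```

A larger patience tolerates longer plateaus at the cost of extra epochs. In both runs the best weights come from the same epoch.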

In [24]:
%%time
# Run the search

class_weights = {0: 1.0, 1: 1.17}

tuner.search(
    X_train_processed, y_train,
    epochs=60,
    batch_size=1024,
    validation_split=0.2,
    class_weight=class_weights,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(patience=3, factor=0.5)
    ],
    verbose=1
)

# Save 
zip_filename=f"../models/{token}_nn_tuning_run"
shutil.make_archive(zip_filename, 'zip', tuner.directory)
print(f"Model saved as {zip_filename}")

Trial 60 Complete [00h 00m 52s]
val_AUC: 0.6656467914581299

Best val_AUC So Far: 0.6836284399032593
Total elapsed time: 00h 33m 37s


Model saved as ../models/f11_nn_tuning_run
CPU times: user 3h 6min 42s, sys: 24min 38s, total: 3h 31min 21s
Wall time: 34min 12s
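The `class_weights` dictionary passed to the search is hard-coded. One common way to arrive at such a value, sketched below, is to weight the positive class by the ratio of negative to positive examples. The helper name `balanced_positive_weight` is hypothetical, and whether the notebook's 1.17 was derived this way is an assumption. The counts come from the test-set confusion matrix reported in this notebook, and the stratified split keeps the training ratio essentially the same.

```python
def balanced_positive_weight(n_negative, n_positive):
    """Weight for class 1 that equalises its total loss contribution with class 0."""
    return n_negative / n_positive

# Test-set class counts implied by this notebook's confusion matrix
n_negative = 8140 + 2832  # tn + fp
n_positive = 4837 + 4544  # fn + tp

class_weights = {0: 1.0, 1: round(balanced_positive_weight(n_negative, n_positive), 2)}
print(class_weights)  # {0: 1.0, 1: 1.17}
```

A weight this close to 1.0 reflects a mild imbalance, so class weighting changes the loss only slightly here.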


In [25]:
## In case of error - here's how to reload tuning results (uncomment to load)

# zip_filename = f"../models/{token}_nn_tuning_run"

# # Re-create the tuner with the same directory and project name as the saved run.
# # With overwrite=False, Keras Tuner picks up the existing trials in that folder.
# # ('f01' is a hard-coded run token; substitute the run you want to restore.)
# tuner = kt.BayesianOptimization(
#     lambda hp: build_model(hp, search_space),
#     objective='val_AUC',  # must match the objective used in the original search
#     max_trials=60,
#     directory='neural_net_tuning',
#     project_name='f01_nn_optimization',
#     overwrite=False
# )
# tuner.reload()

In [26]:
tuning_metadata = {
    'model_type': 'neural_network',
    'search_method': 'BayesianOptimization',
    'best_hyperparameters': tuner.get_best_hyperparameters()[0].values,
    'best_score': tuner.oracle.get_best_trials(num_trials=1)[0].score,
    'total_trials': len(tuner.oracle.trials),
    'archive_name': f"{zip_filename}.zip",
    'search_space': search_space
}

metadata_filename = f"../models/{token}_nn_tuning_metadata.pkl"
with open(metadata_filename, "wb") as file:
    pickle.dump(tuning_metadata, file)
print(f"Metadata saved as {metadata_filename}")

# Extract the best hyperparameters
best_hps = tuner.get_best_hyperparameters()[0]
print("\nBest hyperparameters:")
for param, value in best_hps.values.items():
    print(f"  {param}: {value}")

Metadata saved as ../models/f11_nn_tuning_metadata.pkl

Best hyperparameters:
  hidden_layers: 4
  neurons_per_layer: 64
  dropout_rate: 0.01
  learning_rate: 0.1


In [27]:
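# Optional: re-enable the GPU for the final fit.
# TensorFlow cannot change visible devices once its runtime has initialized, so
# undoing the CPU-only setting above means restarting the kernel and skipping that cell.
# tf.config.set_visible_devices([], 'GPU') ## hides the GPU; CPU was faster on m4 mac mini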

In [28]:
%%time
# Build and train final model with best hyperparameters
final_model = build_model(best_hps, search_space)

# Train the final model (longer training for final model)
class_weights = {0: 1.0, 1: 1.17}

history = final_model.fit(
    X_train_processed, y_train,
    epochs=100,  # More epochs for final model
    batch_size=1024,
    validation_split=0.2,
    class_weight=class_weights,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(patience=5, factor=0.5)
    ],
    verbose=1
)

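# Save the trained model itself in Keras's native format
final_model.save(f"../models/{token}_nn_final_model.keras")

# Archive the tuner directory (search trials, not final_model) alongside it
zip_filename=f"../models/{token}_nn_final_run"
shutil.make_archive(zip_filename, 'zip', tuner.directory)
print(f"Model saved as {zip_filename}")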

Epoch 1/100
64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - AUC: 0.6691 - loss: 0.6979 - val_AUC: 0.6663 - val_loss: 0.6600 - learning_rate: 0.0500


Model saved as ../models/f11_nn_final_run
CPU times: user 36.8 s, sys: 2.84 s, total: 39.6 s
Wall time: 37.8 s


The trained model is evaluated on the test set and key metrics are saved in the same format as other pipelines to enable cross-model comparison. 

In [29]:
# Load model from disk:
# zip_filename = "../models/f03_nn_final_run"
# shutil.unpack_archive(f"{zip_filename}.zip", "temp_neural_net_tuning")

# Load the tuner to get best hyperparameters
# tuner = kt.BayesianOptimization(
#     lambda hp: build_model(hp, search_space),
#     objective='val_AUC',
#     max_trials=60,
#     directory='/Users/cwaters/diabetes/notebooks/neural_net_tuning',
#     project_name='f03_nn_optimization',
#     overwrite=False
# )

# # Get the best hyperparameters, the best trial, and its trained model
# best_hps = tuner.get_best_hyperparameters()[0]
# best_trial = tuner.oracle.get_best_trials(1)[0]
# final_model = tuner.get_best_models(1)[0]

# # Create a history-like object from the trial metrics
# class HistoryFromTrial:
#     def __init__(self, trial_metrics):
#         # Extract the metrics from the trial
#         self.history = {}
#         # If trial has metrics, convert them to history format
#         if hasattr(trial_metrics, 'metrics'):
#             for metric_name, metric_history in trial_metrics.metrics.items():
#                 self.history[metric_name] = metric_history.get_history()
#         else:
#             # Fallback empty history
#             self.history = {
#                 'accuracy': [],
#                 'loss': [],
#                 'val_accuracy': [],
#                 'val_loss': []
#             }

# history = HistoryFromTrial(best_trial.metrics)

In [30]:
# Use existing evaluation function
evaluate_and_save_nn(
    model=final_model,
    history=history,
    params=best_hps.values,  # Pass the best hyperparameters
    namestring='Neural_Network_KerasTuner',
    token=token,
    X_train_processed=X_train_processed,
    X_test_processed=X_test_processed,
    y_train=y_train,
    y_test=y_test,
    preprocessor = preprocessor_nn, 
    original_feature_names = list(X_train.columns),
)

[1m   1/2545[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1:31[0m 36ms/step

{'model_version': 'f11_Neural_Network_KerasTuner',
 'accuracy': 0.6232005109811821,
 'precision': 0.6160520607375272,
 'recall': 0.48438332800341116,
 'f1_score': 0.5423405144118876,
 'specificity': np.float64(0.7418884433102443),
 'roc_auc': np.float64(0.659714363193994),
 'y_test': 27827    1
 84192    0
 60829    0
 84663    1
 72262    1
         ..
 46646    1
 64740    0
 9515     1
 89761    1
 16019    0
 Name: readmitted, Length: 20353, dtype: int8,
 'y_train_pred': array([1, 0, 1, ..., 0, 0, 1]),
 'y_test_pred': array([1, 1, 1, ..., 1, 1, 0]),
 'y_test_pred_proba': array([0.50016296, 0.68076545, 0.56447834, ..., 0.54499114, 0.7407975 ,
        0.31072548], dtype=float32),
 'display_labels': [0, 1],
 'confusion_matrix': {'tn': np.int64(8140),
  'fp': np.int64(2832),
  'fn': np.int64(4837),
  'tp': np.int64(4544)},
 'roc_curve': {'fpr': array([0.00000000e+00, 0.00000000e+00, 9.11410864e-05, ...,
         9.99362012e-01, 9.99362012e-01, 1.00000000e+00]),
  'tpr': array([0.000000