Part 2: AI Development Workflow — Hospital Readmission Prediction

Scenario: A hospital wants an AI system to predict patient readmission risk within 30 days of discharge.

Step 1: Problem Scope

1️⃣ Problem definition

Predict which patients are at risk of readmission within 30 days after discharge.

Goal: Help clinicians take preventive actions to reduce avoidable readmissions.

2️⃣ Objectives

Accurately predict readmission risk.

Reduce 30-day readmissions and optimize post-discharge care.

Provide interpretable outputs (risk factors) for clinicians.

3️⃣ Stakeholders

Clinical staff: Doctors, nurses — use predictions to plan follow-ups.

Hospital administrators: Measure readmission reduction and allocate resources.

Data/IT team: Build and deploy the model securely.

Patients: Indirect stakeholders affected by interventions.

4️⃣ Success criteria / KPIs

Model performance: AUC-ROC ≥ 0.75, recall ≥ 0.7

Operational: Integration with hospital EHR, prediction latency <2s

Clinical impact: ≥10% reduction in avoidable readmissions for target patients
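The two model-performance KPIs can be checked automatically after each evaluation run. Here is a minimal sketch. It assumes the model outputs a readmission probability per patient and that a 0.5 cut-off turns that probability into a flag; the cut-off is an assumption and is usually tuned. The helper `check_performance_kpis` and the sample labels and scores are illustrative, not project code.

```python
from sklearn.metrics import recall_score, roc_auc_score

# Targets taken from the success criteria above
KPI_TARGETS = {"auc_roc": 0.75, "recall": 0.70}

def check_performance_kpis(y_true, y_score, threshold=0.5):
    """Return {metric: (value, target, met?)} for the offline performance KPIs.

    y_score: predicted readmission probabilities (e.g. predict_proba(X)[:, 1]).
    threshold: assumed cut-off that turns a probability into a readmission flag.
    """
    y_flag = [int(s >= threshold) for s in y_score]
    values = {
        "auc_roc": roc_auc_score(y_true, y_score),  # threshold-free ranking quality
        "recall": recall_score(y_true, y_flag),     # depends on the chosen cut-off
    }
    return {name: (v, KPI_TARGETS[name], v >= KPI_TARGETS[name]) for name, v in values.items()}

# Illustrative labels and scores, not real model output
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1]

for name, (value, target, ok) in check_performance_kpis(y_true, y_score).items():
    print(f"{name}: {value:.2f} (target >= {target}) -> {'met' if ok else 'NOT met'}")
```

The operational KPI (latency) and the clinical KPI (reduction in readmissions) have to be measured in production and are not covered by this offline check.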

Step 2: Data Strategy
1️⃣ Proposed data sources

EHR (Electronic Health Records): admission/discharge dates, diagnosis codes (ICD), procedures.

Demographics: age, gender, race/ethnicity, insurance type.

Clinical measurements: vitals (BP, HR, Temp), labs (blood tests, glucose, etc.).

Medications: inpatient meds, discharge meds.

Care logistics: follow-up appointment scheduled, discharge destination (home/facility).

Hospital utilization: prior admissions, length of stay, ER visits.
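Most EHRs do not store the target `readmitted_within_30d` directly, so it has to be derived from admission and discharge dates. Below is a minimal pandas sketch. It assumes a hypothetical admissions table with `patient_id`, `admit_date` and `discharge_date` columns, and it counts a return on day 30 as a readmission. A real label definition would usually also exclude planned readmissions and transfers, which this sketch ignores.

```python
import pandas as pd

# Hypothetical admissions extract: one row per inpatient stay (column names are assumptions)
admissions = pd.DataFrame({
    "patient_id": ["A", "A", "B", "C", "C"],
    "admit_date": pd.to_datetime(["2024-01-01", "2024-01-20", "2024-02-01",
                                  "2024-03-01", "2024-05-15"]),
    "discharge_date": pd.to_datetime(["2024-01-05", "2024-01-25", "2024-02-03",
                                      "2024-03-04", "2024-05-20"]),
})

def add_30d_readmission_label(adm: pd.DataFrame) -> pd.DataFrame:
    """Flag each stay whose patient is admitted again within 30 days of discharge."""
    adm = adm.sort_values(["patient_id", "admit_date"]).copy()
    # Admission date of the same patient's next stay (NaT if there is none)
    next_admit = adm.groupby("patient_id")["admit_date"].shift(-1)
    days_to_next = (next_admit - adm["discharge_date"]).dt.days
    # Comparisons with NaN are False, so a stay with no later admission gets 0
    adm["readmitted_within_30d"] = (days_to_next <= 30).astype(int)
    return adm

print(add_30d_readmission_label(admissions)[["patient_id", "admit_date", "readmitted_within_30d"]])
```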

2️⃣ Ethical concerns

Patient privacy: Must comply with HIPAA or local regulations; de-identify or pseudonymize PHI.

Bias and fairness: Historical data may reflect unequal care; monitor model performance across groups (age, gender, race).
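A simple starting point for monitoring performance across groups is a per-group recall table. Here is a minimal sketch. `recall_by_group` is a hypothetical helper, and the labels, predictions and gender values are invented for illustration. The same pattern works for age bands or race/ethnicity.

```python
import pandas as pd
from sklearn.metrics import recall_score

def recall_by_group(y_true, y_pred, groups):
    """Recall per subgroup, to spot groups whose readmissions are missed more often."""
    df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": groups})
    return {
        name: recall_score(g["y_true"], g["y_pred"], zero_division=0)
        for name, g in df.groupby("group")
    }

# Illustrative values only
y_true = [1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

per_group = recall_by_group(y_true, y_pred, gender)
print(per_group)
gap = max(per_group.values()) - min(per_group.values())
print(f"Largest recall gap between groups: {gap:.2f}")
```

A large gap signals that the data and features need investigation. In this example, one group's readmissions are missed twice as often as the other's. A gap like this does not prove bias on its own.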

3️⃣ Preprocessing pipeline

We’ll create a pipeline in scikit-learn to clean and prepare data for modeling.

Steps:

Handle missing values (median for numeric, most frequent for categorical).

Encode categorical features (one-hot).

Scale numeric features (StandardScaler).

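Train/test split (stratified on the target so both sets keep the same readmission rate; this needs several patients per class, so it is not applied to the five-row mock dataset below).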

Feature engineering (optional, e.g., num_prev_admissions, avg_length_of_stay, followup_scheduled).
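The engineered utilization features can be derived from the same kind of admissions history. This sketch makes several assumptions. The admissions table and its column names are hypothetical. `num_prev_admissions` counts stays strictly before the current one. `avg_length_of_stay` averages all stays up to and including the current one. Together, these rules ensure no feature uses information from after the discharge being scored.

```python
import pandas as pd

# Hypothetical admissions extract (assumed column names)
admissions = pd.DataFrame({
    "patient_id": ["A", "A", "A", "B"],
    "admit_date": pd.to_datetime(["2024-01-01", "2024-02-10", "2024-04-01", "2024-03-01"]),
    "discharge_date": pd.to_datetime(["2024-01-05", "2024-02-12", "2024-04-07", "2024-03-04"]),
})

def engineer_utilization_features(adm: pd.DataFrame) -> pd.DataFrame:
    """Add features that only use information available at each discharge."""
    adm = adm.sort_values(["patient_id", "admit_date"]).copy()
    adm["length_of_stay"] = (adm["discharge_date"] - adm["admit_date"]).dt.days
    by_patient = adm.groupby("patient_id")
    # Number of this patient's stays strictly before the current one (0 for the first stay)
    adm["num_prev_admissions"] = by_patient.cumcount()
    # Mean length of stay over all stays up to and including the current one
    adm["avg_length_of_stay"] = by_patient["length_of_stay"].transform(
        lambda s: s.expanding().mean()
    )
    return adm

print(engineer_utilization_features(admissions))
```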



4️⃣ Example Dataset

Here’s a small mock dataset to simulate patient data:

In [2]:
import pandas as pd
import numpy as np

# Example dataset
data = {
    'age': [65, 54, 72, 43, np.nan],
    'gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'num_prev_admissions': [2, 1, 3, 0, 1],
    'avg_length_of_stay': [5.2, 3.4, 7.1, 2.8, 4.5],
    'followup_scheduled': [1, 0, 1, 0, 1],
    'discharge_to_facility': [0, 1, 0, 0, 1],
    'readmitted_within_30d': [1, 0, 1, 0, 0]  # Target variable
}

df = pd.DataFrame(data)


5️⃣ Preprocessing Pipeline

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Split features and target
X = df.drop('readmitted_within_30d', axis=1)
y = df['readmitted_within_30d']

# Define numeric and categorical features
numeric_features = ['age', 'num_prev_admissions', 'avg_length_of_stay']
categorical_features = ['gender', 'followup_scheduled', 'discharge_to_facility']

# Numeric preprocessing: impute missing values & scale
numeric_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

# Categorical preprocessing: impute & one-hot encode
categorical_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Combine into a single ColumnTransformer
preprocessor = ColumnTransformer([
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)
])


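# Unstratified split: with only 5 rows, stratify=y would need a 1-row test set
# to contain both classes, so scikit-learn would raise a ValueError
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)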

# Apply preprocessing pipeline
X_train_prep = preprocessor.fit_transform(X_train)
X_test_prep = preprocessor.transform(X_test)

print("Preprocessing complete! X_train shape:", X_train_prep.shape)


Preprocessing complete! X_train shape: (4, 9)


Step 3: Model Development 
1️⃣ Model selection & justification

Chosen model: Random Forest Classifier

Why Random Forest?

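Handles numeric features and one-hot-encoded categorical features well (scikit-learn's implementation needs categorical data encoded first, as in the pipeline above).

Captures nonlinear relationships.

Less prone to overfitting than a single decision tree, because averaging many trees trained on bootstrap samples reduces variance.

Provides global feature importances, a starting point for explaining the model's behaviour to clinicians (per-patient explanations need additional tools).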

Works well with small-to-medium tabular datasets typical in hospitals.
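A fitted `RandomForestClassifier` exposes impurity-based feature importances through `feature_importances_`. Here is a sketch on synthetic stand-in data. The column names mirror the mock dataset. The data-generating rule, where readmission depends on prior admissions and follow-up only, is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; avg_length_of_stay is deliberately pure noise here
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "num_prev_admissions": rng.integers(0, 6, n),
    "avg_length_of_stay": rng.normal(4.5, 1.5, n),
    "followup_scheduled": rng.integers(0, 2, n),
})
# Assumed rule: more prior admissions raise risk, a scheduled follow-up lowers it
logit = 0.8 * X["num_prev_admissions"] - 1.0 * X["followup_scheduled"] - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=42, class_weight="balanced")
model.fit(X, y)

# Impurity-based importances: one global (whole-model) score per feature, summing to 1
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))
```

These scores rank features for the model as a whole, not for an individual patient. They are also known to favour continuous, high-cardinality features, so the pure-noise `avg_length_of_stay` column may still receive a noticeable share. `sklearn.inspection.permutation_importance` on held-out data is a less biased alternative.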

2️⃣ Train the Random Forest model

In [4]:
from sklearn.ensemble import RandomForestClassifier

# Initialize Random Forest
rf_model = RandomForestClassifier(
    n_estimators=100,          # number of trees
    random_state=42,           # for reproducibility
    class_weight='balanced'    # handle class imbalance
)

# Train model on preprocessed training data
rf_model.fit(X_train_prep, y_train)


3️⃣ Make predictions

In [5]:
# Predict on the test set
y_pred = rf_model.predict(X_test_prep)

# For demonstration, using small example dataset
print("Predictions:", y_pred)


Predictions: [0]


4️⃣ Confusion matrix

In [6]:
from sklearn.metrics import confusion_matrix

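# labels=[0, 1] keeps a fixed 2x2 layout (rows = actual, columns = predicted)
# even when the test set contains only one class
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("Confusion Matrix:\n", cm)


Confusion Matrix:
 [[1 0]
 [0 0]]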




5️⃣ Precision and Recall

In [7]:
from sklearn.metrics import precision_score, recall_score

# zero_division=0 reports 0.0 instead of warning when a score is undefined
# (here the 1-row test set has no predicted and no actual readmissions)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")


Precision: 0.00
Recall: 0.00


Explanation:

Precision: Of all patients predicted to be readmitted, how many actually were.

Recall: Of all patients who were actually readmitted, how many were correctly predicted.

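Example calculation with hypothetical counts (the one-patient demo test set above contains no readmissions and no positive predictions, so its 0.00 scores only mean the metrics are undefined there):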

TP = 3, FP = 0 → Precision = 3 / (3+0) = 1.0

TP = 3, FN = 1 → Recall = 3 / (3+1) = 0.75
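The arithmetic above can be reproduced with scikit-learn by building label vectors that contain exactly those counts. The vectors are hypothetical. TN = 2 is an arbitrary choice, since true negatives affect neither metric.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels with exactly TP = 3, FN = 1, FP = 0 (and an arbitrary TN = 2)
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)  # harmonic mean: 2 * P * R / (P + R)

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"Precision = {tp}/({tp}+{fp}) = {precision:.2f}")
print(f"Recall    = {tp}/({tp}+{fn}) = {recall:.2f}")
print(f"F1        = {f1:.2f}")
```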

Step 4: Evaluation & Deployment
1️⃣ Evaluation Metrics

We will select two key metrics for evaluating our Random Forest model:

1. Recall (Sensitivity)

Definition: Percentage of actual readmissions correctly predicted by the model.

Relevance: In a hospital setting, missing a high-risk patient is costly — we want to catch as many true readmissions as possible to intervene early.

Example: If recall = 0.75, 75% of patients who are readmitted are correctly flagged.

2. Precision

Definition: Percentage of predicted readmissions that were actually readmitted.

Relevance: Helps reduce false alarms, so hospital staff don’t waste resources on unnecessary interventions.

Example: Precision = 0.8 means 80% of flagged patients were actually readmitted.

Optional: You can also track F1-score (harmonic mean of precision and recall) for balanced evaluation.
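Precision and recall trade off through the probability cut-off used to flag a patient. One common way to honour a recall-first priority is to pick the highest cut-off that still meets the recall target (0.7 in the success criteria). This flags as few patients as possible while meeting that target. The sketch below uses `precision_recall_curve`. The helper name and the scores are illustrative. In practice, the cut-off should be chosen on validation data, not on the test set.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

def threshold_for_recall(y_true, y_score, min_recall=0.70):
    """Highest probability cut-off whose recall still meets min_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall end with an extra point (recall = 0) that has no threshold; drop it
    meets_target = recall[:-1] >= min_recall
    # The lowest returned threshold always has recall 1.0, so a candidate always exists
    return thresholds[meets_target].max()

# Illustrative held-out labels and predicted probabilities
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1, 0.35, 0.5]

t = threshold_for_recall(y_true, y_score)
y_flag = (np.array(y_score) >= t).astype(int)
print(f"threshold = {t:.2f}")
print(f"precision = {precision_score(y_true, y_flag):.2f}, recall = {recall_score(y_true, y_flag):.2f}")
```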

2️⃣ Concept Drift & Monitoring

Concept Drift:

Concept drift occurs when the relationship between input features and the target variable changes over time.

Example: Changes in hospital protocols, patient demographics, or new treatments can alter readmission patterns.

Monitoring Concept Drift Post-Deployment:

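Continuously compare the distribution of incoming patient data with the training data; such data drift often signals, but does not by itself prove, concept drift.

Monitor model performance metrics over time (e.g., rolling recall, precision, AUC-ROC) as the 30-day readmission outcomes become available, so performance can only be measured with at least a 30-day delay.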

Set thresholds to trigger retraining if metrics degrade.

Optionally, use automated drift detection tools such as Evidently AI or River (the successor to scikit-multiflow).
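Incoming data is often compared with training data one feature at a time using the Population Stability Index (PSI). Here is a minimal NumPy sketch that assumes bins are cut at training-sample quantiles. The alert level of 0.25 follows a common rule of thumb (below 0.1 stable, above 0.25 a significant shift), which is a convention rather than a standard. The age samples are synthetic.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI of one feature: training sample (expected) vs. recent production sample (actual)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior cut points at training quantiles; values beyond them land in the end bins
    cuts = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    exp_frac = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=n_bins) / len(expected)
    act_frac = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=n_bins) / len(actual)
    # eps avoids log(0) when a bin is empty
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(0)
train_age = rng.normal(65, 10, 5000)
recent_same = rng.normal(65, 10, 5000)     # no drift
recent_shifted = rng.normal(72, 10, 5000)  # older patient mix

PSI_ALERT = 0.25  # hypothetical retraining trigger
for label, sample in [("same", recent_same), ("shifted", recent_shifted)]:
    psi = population_stability_index(train_age, sample)
    flag = "  -> investigate / consider retraining" if psi > PSI_ALERT else ""
    print(f"{label}: PSI = {psi:.3f}{flag}")
```

PSI only detects changes in the inputs. Confirming concept drift still requires the delayed 30-day outcomes, which is why performance monitoring runs alongside it.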

3️⃣ Technical Deployment Challenge

Scalability:

Deploying the model for thousands of patients daily may stress hospital IT systems.

Solution: Use a lightweight API (Flask/FastAPI), containerized with Docker, and deploy on a hospital server or cloud platform with auto-scaling.

Preprocessing and model inference should be optimized for low latency (<2 seconds per patient) to integrate smoothly into EHR workflows.
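Here is a sketch of how the inference side could look. Preprocessing and the Random Forest are wrapped in one scikit-learn `Pipeline`, saved with `joblib`, reloaded, and timed on a single patient record against the 2-second budget. The synthetic training data, the file location and `score_patient` are illustrative assumptions. A FastAPI or Flask endpoint would call something like `score_patient` inside its request handler.

```python
import tempfile
import time
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic training data with mock-dataset columns; random labels, so scores are meaningless
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "age": rng.integers(20, 90, n).astype(float),
    "gender": rng.choice(["Male", "Female"], n),
    "num_prev_admissions": rng.integers(0, 5, n),
    "avg_length_of_stay": rng.normal(4.5, 1.5, n),
})
y = rng.integers(0, 2, n)

# Preprocessing + model in one Pipeline, so inference runs exactly the training-time steps
model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", Pipeline([("imputer", SimpleImputer(strategy="median")),
                          ("scaler", StandardScaler())]),
         ["age", "num_prev_admissions", "avg_length_of_stay"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
    ])),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
model.fit(train, y)

MODEL_PATH = Path(tempfile.gettempdir()) / "readmission_model.joblib"  # hypothetical location
joblib.dump(model, MODEL_PATH)
loaded = joblib.load(MODEL_PATH)

def score_patient(record: dict):
    """Return (readmission probability, latency in seconds) for one patient record."""
    start = time.perf_counter()
    proba = loaded.predict_proba(pd.DataFrame([record]))[0, 1]
    return float(proba), time.perf_counter() - start

risk, latency = score_patient({"age": 70.0, "gender": "Female",
                               "num_prev_admissions": 2, "avg_length_of_stay": 5.0})
print(f"risk={risk:.2f}, latency={latency * 1000:.1f} ms (target < 2000 ms)")
```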

Ensuring Compliance with Healthcare Regulations (HIPAA)

To ensure the AI system complies with healthcare privacy and security laws such as HIPAA (in the U.S.) or equivalent local regulations:

1️⃣ Data Privacy and Security

De-identify patient data before training (remove names, IDs, exact dates, contact info).

Encrypt data at rest and in transit (use TLS for API communication).

Limit access to data through role-based permissions — only authorized personnel can view PHI.
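The de-identification step above can be approximated with a keyed pseudonym and coarsened dates. Here is a minimal sketch using the standard-library `hmac` module. The column names, the sample rows and the placeholder key are assumptions. Whoever holds the key can still link tokens back to MRNs, so this is pseudonymization rather than full HIPAA de-identification. A compliance review would decide which level is required.

```python
import hashlib
import hmac

import pandas as pd

# Placeholder only: a real key would come from a secrets manager, never from source code
PSEUDONYM_KEY = b"replace-with-secret-key"
DIRECT_IDENTIFIERS = ["name", "phone"]  # assumed column names

def pseudonymize(df: pd.DataFrame) -> pd.DataFrame:
    """Drop direct identifiers, replace the MRN with a keyed token, coarsen dates to the year."""
    out = df.drop(columns=DIRECT_IDENTIFIERS)
    mrns = out.pop("mrn")
    # HMAC-SHA256: the same MRN always yields the same token, so a patient's stays remain
    # linkable, but the token cannot be recomputed from an MRN without the key
    out["patient_token"] = [
        hmac.new(PSEUDONYM_KEY, str(m).encode("utf-8"), hashlib.sha256).hexdigest() for m in mrns
    ]
    # Derive what the model needs from exact dates, then keep only the admission year
    out["length_of_stay"] = (out["discharge_date"] - out["admit_date"]).dt.days
    out["admit_year"] = out["admit_date"].dt.year
    return out.drop(columns=["admit_date", "discharge_date"])

raw = pd.DataFrame({
    "mrn": ["100234", "100234", "200871"],
    "name": ["Jane Doe", "Jane Doe", "John Roe"],
    "phone": ["555-0101", "555-0101", "555-0199"],
    "admit_date": pd.to_datetime(["2024-01-01", "2024-02-10", "2024-03-01"]),
    "discharge_date": pd.to_datetime(["2024-01-05", "2024-02-12", "2024-03-04"]),
})
print(pseudonymize(raw))
```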

2️⃣ Data Handling Policies

Maintain audit logs of all data access and model predictions.

Sign Business Associate Agreements (BAAs) with any cloud provider handling medical data.

Store model outputs securely within the hospital’s EHR system, not external servers.
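One simple form of the audit log is an append-only JSON-lines file with one entry per prediction, written with the standard `logging` module. In this sketch, the file location, field names, user ID and model version string are all illustrative assumptions. In production, the log would go to an access-controlled store that the hospital already audits. Records carry only the pseudonymized patient token, not direct PHI.

```python
import json
import logging
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical location; production would use an append-only, access-controlled store
AUDIT_PATH = Path(tempfile.gettempdir()) / "prediction_audit.jsonl"

audit_logger = logging.getLogger("readmission_audit")
audit_logger.setLevel(logging.INFO)
audit_logger.propagate = False  # keep audit records out of general application logs
if not audit_logger.handlers:  # avoid duplicate handlers when the cell is re-run
    audit_logger.addHandler(logging.FileHandler(AUDIT_PATH))

def log_prediction(user_id: str, patient_token: str, model_version: str, risk: float) -> dict:
    """Write one JSON line: who requested the score, for which patient token, with which model."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "patient_token": patient_token,  # pseudonymized ID only
        "model_version": model_version,
        "risk": round(risk, 4),
    }
    audit_logger.info(json.dumps(entry))
    return entry

log_prediction("nurse_0412", "a1b2c3d4", "rf-v1", 0.73)
```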

3️⃣ Model Transparency and Accountability

Document model behavior, version, and training data lineage.

Provide explainable outputs to clinicians (e.g., top contributing features for each prediction).

Establish a review process to retrain and validate the model periodically.

Step 5: Optimization
Problem: Random Forest can overfit if trees are too deep or if the model learns noise from small datasets.

Proposed method to reduce overfitting:

Limit tree depth (max_depth) and number of features (max_features):

In [8]:
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=5,          # limit depth of trees
    max_features='sqrt',  # use subset of features at each split
    random_state=42,
    class_weight='balanced'
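)

# Retrain with the constrained settings on the preprocessed training data
rf_model.fit(X_train_prep, y_train)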


Additional options:

Increase training data if possible.

Use cross-validation to tune hyperparameters.

Prune unnecessary features or remove noisy variables.
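Cross-validated tuning of `max_depth` and `max_features` can be sketched with `GridSearchCV`. The example uses a synthetic, imbalanced stand-in from `make_classification` because the five-row mock dataset is far too small for 5-fold cross-validation. The grid values are illustrative. Scoring by AUC-ROC matches the threshold-free success criterion, and the recall-oriented cut-off can be tuned afterwards.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for preprocessed patient features (about 20% positives)
X, y = make_classification(n_samples=600, n_features=12, n_informative=4,
                           weights=[0.8, 0.2], random_state=42)

param_grid = {
    "max_depth": [3, 5, None],      # None lets trees grow until leaves are pure
    "max_features": ["sqrt", 0.5],  # share of features considered at each split
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=42),
    param_grid,
    scoring="roc_auc",
    # Stratified folds keep the readmission rate similar in every fold
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print(f"Mean cross-validated AUC-ROC: {search.best_score_:.3f}")
```

With real data, the `preprocessor` from Step 2 would be placed in a `Pipeline` together with the model and passed to `GridSearchCV`. That way, imputation and scaling are re-fit within each training fold and never see the validation fold.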

Why it works:

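Limiting depth restricts how closely each tree can fit (memorize) the training set. Considering only a random subset of features at each split decorrelates the trees, so averaging them reduces variance more.

Cross-validation estimates how well a configuration generalizes to unseen patients, so hyperparameters are chosen on held-out performance rather than training accuracy.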