# Orthopedic No-Show Prevention Tool
This notebook demonstrates a thin slice of the predictive and reporting tool requested by the orthopedic service line. It is built on synthetic data for illustration, but every step mirrors what we will deploy once real encounter-level data (with `SCH_STATE_DISPLAY`, `NO_SHOW_FLAG`, reminders, etc.) becomes available.

### Context & guiding principles
- Focus on *at-scheduling* risk so staff can intervene days before the appointment.
- Favor interpretable baselines (logistic regression + permutation importance) and only layer tree ensembles when they offer material lift.
- Keep artifacts clinic-friendly: patient-level call lists and aggregate day-of-clinic drill-downs.
- All results below use `Synthetic_Patient_Dataset_with_Target.csv` (n=25). Treat numbers as placeholders; the code scaffolding is what matters.

In [None]:

import warnings
warnings.filterwarnings('ignore')

from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (
    classification_report,
    roc_auc_score,
    average_precision_score,
    roc_curve,
    precision_recall_curve
)
from sklearn.inspection import permutation_importance
from sklearn.base import clone

sns.set_theme(style='ticks', palette='deep')
pd.set_option('display.max_columns', 50)


## 1. Load synthetic encounter-level data
In production the extract will land from HealtheIntent / Encounter tables with the critical fields listed in the implementation memo. For prototyping we reuse the evidence-based synthetic file generated earlier in the repo.

In [None]:

data_path = Path('data/Synthetic_Patient_Dataset_with_Target.csv')
df = pd.read_csv(
    data_path,
    parse_dates=['BIRTH_DT_TM', 'APPT_START_DATE', 'APPOINTMENT_SCHEDULED_DATE']
)
print(f"Records: {len(df)} | Columns: {df.shape[1]}")
df.head()


In [None]:

fig, ax = plt.subplots(figsize=(6, 4))
target_counts = df['APPOINTMENT_STATUS'].value_counts()
ax.bar(target_counts.index, target_counts.values, color=['#2ca02c', '#d62728', '#1f77b4'])
ax.set_title('Observed appointment outcomes (synthetic)')
ax.set_ylabel('Count of appointments')
# Rotate the existing tick labels; calling set_xticklabels without set_xticks can misalign them.
plt.setp(ax.get_xticklabels(), rotation=45, ha='right')
plt.tight_layout()
plt.show()

print('No-show + cancellation rate:', f"{df['no_show_binary'].mean():.2%}")


## 2. Feature engineering blueprint
We engineer only features that would exist at scheduling time so the model stays actionable. Additional elements (prior no-shows, reminder logs, ADI score) will be merged once supplied by the data warehouse team.

In [None]:

df_model = df.copy()

df_model['appointment_month'] = df_model['APPT_START_DATE'].dt.month
df_model['appointment_week'] = df_model['APPT_START_DATE'].dt.isocalendar().week.astype(int)
df_model['appointment_dayofweek_name'] = df_model['APPT_START_DATE'].dt.day_name()
# Open-ended top bin so lead times beyond 60 days still land in '>21d' instead of NaN.
df_model['lead_time_bucket'] = pd.cut(
    df_model['lead_time_days'],
    bins=[0, 7, 14, 21, np.inf],
    labels=['≤7d', '8-14d', '15-21d', '>21d'],
    include_lowest=True
).astype(str)

numeric_features = [
    'patient_age',
    'lead_time_days',
    'appointment_month',
    'appointment_week',
    'is_new_patient',
    'is_follow_up'
]

categorical_features = [
    'PRIMARY_INSURANCE',
    'PRIMARY_PLAN_TYPE',
    'ATTENDING_SPECIALTY',
    'SCH_REASON_DISPLAY',
    'LOC_FACILITY_DISPLAY',
    'age_group',
    'SEX',
    'RACE',
    'ETHNIC_GROUP',
    'appointment_dayofweek_name',
    'lead_time_bucket'
]

feature_columns = numeric_features + categorical_features
X = df_model[feature_columns]
y = df_model['no_show_binary']

X.head()
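
The cell above takes `lead_time_days` as a ready-made column. As a minimal sketch of how it could be derived from fields known at scheduling time, assuming lead time is simply the day difference between `APPOINTMENT_SCHEDULED_DATE` and `APPT_START_DATE` (the toy rows are made up and the synthetic file may define the column differently):

```python
import pandas as pd

# Toy rows (hypothetical) with the two date columns parsed in the load cell.
toy = pd.DataFrame({
    'APPOINTMENT_SCHEDULED_DATE': pd.to_datetime(['2024-01-02', '2024-01-10', '2024-02-01']),
    'APPT_START_DATE': pd.to_datetime(['2024-01-05', '2024-02-01', '2024-02-01']),
})

# Lead time = whole days between booking and the appointment itself.
toy['lead_time_days'] = (toy['APPT_START_DATE'] - toy['APPOINTMENT_SCHEDULED_DATE']).dt.days
toy['appointment_dayofweek_name'] = toy['APPT_START_DATE'].dt.day_name()

# A negative lead time would mean the appointment precedes its own booking,
# which usually signals a data-quality problem worth flagging before modeling.
negative_lead = toy[toy['lead_time_days'] < 0]
print(toy)
print(f"Rows with negative lead time: {len(negative_lead)}")
```

Both inputs are known when the slot is booked, so the derived features stay consistent with the at-scheduling constraint.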


## 3. Train/test split and candidate models
We benchmark an interpretable logistic regression (balanced class weights) against a Gradient Boosting Classifier. With real data we will add cross-site validation, temporal splits, and potentially XGBoost/LightGBM if they materially outperform while maintaining explainability via SHAP.

In [None]:

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ]
)

# Clone the preprocessor so each pipeline fits its own copy rather than sharing state.
models = {
    'Logistic Regression (balanced)': Pipeline([
        ('preprocess', clone(preprocessor)),
        ('clf', LogisticRegression(max_iter=1000, class_weight='balanced', solver='liblinear'))
    ]),
    'Gradient Boosting': Pipeline([
        ('preprocess', clone(preprocessor)),
        ('clf', GradientBoostingClassifier(random_state=42))
    ])
}

results = []
fitted_models = {}
roc_curves = {}
pr_curves = {}

for name, pipe in models.items():
    pipe.fit(X_train, y_train)
    fitted_models[name] = pipe

    probs = pipe.predict_proba(X_test)[:, 1]
    preds = pipe.predict(X_test)

    roc_auc = roc_auc_score(y_test, probs)
    pr_auc = average_precision_score(y_test, probs)
    fpr, tpr, _ = roc_curve(y_test, probs)
    precision, recall, _ = precision_recall_curve(y_test, probs)

    roc_curves[name] = (fpr, tpr)
    pr_curves[name] = (recall, precision)

    results.append({'model': name, 'roc_auc': roc_auc, 'pr_auc': pr_auc})

    print(f"\n{name}")
    print(f"ROC AUC: {roc_auc:.3f} | PR AUC: {pr_auc:.3f}")
    print(classification_report(y_test, preds, target_names=['Kept', 'No-Show'], zero_division=0))

results_df = pd.DataFrame(results).sort_values(by='roc_auc', ascending=False)
results_df


In [None]:

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for name, (fpr, tpr) in roc_curves.items():
    axes[0].plot(fpr, tpr, label=f"{name}")
axes[0].plot([0, 1], [0, 1], linestyle='--', color='gray')
axes[0].set_title('ROC curves (synthetic split)')
axes[0].set_xlabel('False Positive Rate')
axes[0].set_ylabel('True Positive Rate')
axes[0].legend()

for name, (recall, precision) in pr_curves.items():
    axes[1].plot(recall, precision, label=f"{name}")
axes[1].set_title('Precision-Recall curves')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].legend()

plt.tight_layout()
plt.show()
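
The cross-site validation mentioned at the top of this section could be sketched with scikit-learn's `LeaveOneGroupOut`, which holds out one facility at a time. The data, site names, and plain logistic model below are hypothetical stand-ins; with real data the groups would presumably come from `LOC_FACILITY_DISPLAY` and the estimator would be the full pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic stand-in data: three features, outcome loosely driven by the first one.
rng = np.random.default_rng(0)
n = 300
X_toy = rng.normal(size=(n, 3))
y_toy = (X_toy[:, 0] + rng.normal(size=n) > 0).astype(int)
sites = rng.choice(['Site A', 'Site B', 'Site C'], size=n)  # hypothetical facilities

site_auc = {}
for train_idx, test_idx in LeaveOneGroupOut().split(X_toy, y_toy, groups=sites):
    # Train on every site except one, then evaluate on the held-out site.
    model = LogisticRegression(max_iter=1000, class_weight='balanced')
    model.fit(X_toy[train_idx], y_toy[train_idx])
    probs = model.predict_proba(X_toy[test_idx])[:, 1]
    held_out_site = str(sites[test_idx][0])
    site_auc[held_out_site] = roc_auc_score(y_toy[test_idx], probs)

for site, auc in sorted(site_auc.items()):
    print(f"{site}: ROC AUC = {auc:.3f}")
```

A large gap between sites would suggest the model is leaning on facility-specific patterns that may not transfer.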


## 4. Interpretability guardrails
The clinical teams asked for an interpretable solution. We therefore keep Logistic Regression as the reference model and use both coefficients and permutation importance to surface the drivers. With real data we will complement this view with SHAP summary plots and fairness dashboards (e.g., compare calibration across zip codes / payer).

In [None]:

log_model = fitted_models['Logistic Regression (balanced)']
feature_names = log_model.named_steps['preprocess'].get_feature_names_out()
coefficients = log_model.named_steps['clf'].coef_[0]
coef_df = pd.DataFrame({
    'feature': feature_names,
    'coefficient': coefficients,
    'abs_coefficient': np.abs(coefficients)
}).sort_values('abs_coefficient', ascending=False).head(10)

print('Top logistic coefficients (positive => higher no-show odds)')
coef_df[['feature', 'coefficient']]


In [None]:

perm = permutation_importance(
    log_model,
    X_test,
    y_test,
    n_repeats=25,
    random_state=42
)
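# Permuting through the full pipeline scores the raw input columns,
# so importances align with X_test.columns, not the one-hot feature names.
imp_df = pd.DataFrame({
    'feature': X_test.columns,
    'importance': perm.importances_mean
}).sort_values('importance', ascending=False).head(10)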

fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=imp_df, x='importance', y='feature', ax=ax, color='#1f77b4')
ax.set_title('Permutation importance (logistic baseline)')
plt.tight_layout()
plt.show()
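
The fairness view mentioned at the start of this section (comparing calibration across payer groups) could begin as a simple table of observed versus mean predicted no-show rates per group. This is a rough sketch on made-up rows; `calibration_by_group` is a hypothetical helper, and a real dashboard would likely use binned calibration curves rather than a single gap per group.

```python
import pandas as pd

def calibration_by_group(scores: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Compare observed and mean predicted no-show rates within each group."""
    return (
        scores.groupby(group_col)
        .agg(
            n=('no_show_binary', 'size'),
            observed_rate=('no_show_binary', 'mean'),
            mean_predicted=('predicted_no_show_prob', 'mean'),
        )
        # Positive gap = the model over-predicts no-shows for that group.
        .assign(gap=lambda d: d['mean_predicted'] - d['observed_rate'])
        .reset_index()
    )

# Toy scored rows (hypothetical values).
toy = pd.DataFrame({
    'PRIMARY_INSURANCE': ['Medicaid', 'Medicaid', 'Commercial', 'Commercial'],
    'no_show_binary': [1, 0, 0, 0],
    'predicted_no_show_prob': [0.6, 0.2, 0.1, 0.3],
})
report = calibration_by_group(toy, 'PRIMARY_INSURANCE')
print(report)
```

Groups with a consistently positive or negative gap are the ones to inspect before the scores drive outreach decisions.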


## 5. Operational reporting prototypes
Supervisors asked for clarity on *how* the predictions will be used. Below we mock the two core artifacts:
1. **Patient-level action list** – ranked roster for call center / care navigators.
2. **Clinic-day overview** – aggregate risk by clinic day and facility to guide overbooking or reminder intensity.

In [None]:

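# Refit on all rows for the mock report. Scores below are in-sample and therefore optimistic;
# in production the fitted model would score upcoming appointments it has not seen.
production_model = clone(log_model)
production_model.fit(X, y)

df_scores = df_model.copy()
df_scores['predicted_no_show_prob'] = production_model.predict_proba(X)[:, 1]
df_scores['risk_band'] = pd.cut(
    df_scores['predicted_no_show_prob'],
    bins=[0, 0.15, 0.3, 1],
    labels=['Low', 'Medium', 'High'],
    include_lowest=True  # keep a probability of exactly 0 in 'Low' rather than NaN
)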

columns_to_show = [
    'APPT_START_DATE', 'PATIENT_PERSON_ID', 'PRIMARY_INSURANCE',
    'SCH_REASON_DISPLAY', 'lead_time_days', 'LOC_FACILITY_DISPLAY',
    'predicted_no_show_prob', 'risk_band'
]

individual_view = df_scores[columns_to_show].sort_values(
    by='predicted_no_show_prob', ascending=False
).head(10)
individual_view['predicted_no_show_prob'] = (individual_view['predicted_no_show_prob'] * 100).round(1)
individual_view.rename(columns={'predicted_no_show_prob': 'risk_percent'})
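
A call center rarely has capacity to contact every flagged patient, so the roster may need a per-day cap. The sketch below keeps the highest-risk appointments for each clinic day; `DAILY_CALL_CAPACITY` and the toy rows are hypothetical, and the real limit would come from staffing.

```python
import pandas as pd

DAILY_CALL_CAPACITY = 2  # hypothetical number of outreach calls staff can make per clinic day

# Toy scored appointments (made-up IDs and probabilities).
toy_scores = pd.DataFrame({
    'APPT_START_DATE': pd.to_datetime(['2024-03-04'] * 3 + ['2024-03-05'] * 2),
    'PATIENT_PERSON_ID': [101, 102, 103, 104, 105],
    'predicted_no_show_prob': [0.10, 0.40, 0.50, 0.20, 0.05],
})

# Sort by day, then by descending risk, and keep the top rows within each day.
call_list = (
    toy_scores
    .sort_values(['APPT_START_DATE', 'predicted_no_show_prob'], ascending=[True, False])
    .groupby('APPT_START_DATE')
    .head(DAILY_CALL_CAPACITY)
    .reset_index(drop=True)
)
print(call_list)
```

Ranking within each day, rather than across the whole extract, keeps one unusually risky clinic day from crowding out the rest of the week.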


In [None]:

agg_day = df_scores.groupby('APPT_START_DATE').agg(
    appointments=('ENCNTR_ID', 'count'),
    avg_pred=('predicted_no_show_prob', 'mean'),
    high_risk=('predicted_no_show_prob', lambda x: (x > 0.30).sum())  # matches the 'High' band (0.30, 1]
).reset_index()
agg_day['avg_pred_pct'] = (agg_day['avg_pred'] * 100).round(1)

agg_facility = df_scores.groupby('LOC_FACILITY_DISPLAY').agg(
    appointments=('ENCNTR_ID', 'count'),
    avg_pred=('predicted_no_show_prob', 'mean')
).reset_index().sort_values('avg_pred', ascending=False)
agg_facility['avg_pred_pct'] = (agg_facility['avg_pred'] * 100).round(1)

fig, axes = plt.subplots(1, 2, figsize=(14, 4))
axes[0].bar(agg_day['APPT_START_DATE'], agg_day['avg_pred_pct'], color='#ff7f0e')
axes[0].set_title('Daily mean predicted no-show %')
axes[0].set_ylabel('Predicted no-show %')
axes[0].set_xlabel('Appointment date')
axes[0].tick_params(axis='x', rotation=45)
for idx, row in agg_day.iterrows():
    axes[0].text(row['APPT_START_DATE'], row['avg_pred_pct'] + 0.2, f"n={row['appointments']}", ha='center', fontsize=8)

axes[1].barh(agg_facility['LOC_FACILITY_DISPLAY'], agg_facility['avg_pred_pct'], color='#1f77b4')
axes[1].set_title('Facility-level mean risk')
axes[1].set_xlabel('Predicted no-show %')

plt.tight_layout()
plt.show()

agg_day[['APPT_START_DATE', 'appointments', 'avg_pred_pct', 'high_risk']]
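
For overbooking decisions, the sum of predicted probabilities per clinic day is often more useful than the mean, because it estimates how many slots are expected to go unused. The sketch below uses made-up scores and assumes the probabilities are calibrated; a model trained with `class_weight='balanced'` tends to overstate minority-class probabilities, so recalibration would be needed before reading these sums as counts.

```python
import pandas as pd

# Toy scored appointments (hypothetical values).
toy_scores = pd.DataFrame({
    'APPT_START_DATE': pd.to_datetime(['2024-03-04'] * 3 + ['2024-03-05'] * 2),
    'predicted_no_show_prob': [0.10, 0.40, 0.50, 0.20, 0.05],
})

# Sum of per-appointment probabilities = expected number of no-shows that day.
expected = (
    toy_scores.groupby('APPT_START_DATE')['predicted_no_show_prob']
    .agg(appointments='size', expected_no_shows='sum')
    .reset_index()
)
print(expected)
```

On the first toy day, three appointments with probabilities 0.10, 0.40, and 0.50 add up to one expected no-show.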


## 6. Next steps for production data
- Request the missing operational fields (scheduling timestamp, reminder channel, prior no-shows) highlighted in the implementation memo and rerun the notebook to validate lift.
- Expand evaluation to time-based splits (train on months 1-3, test on 4-6) to mimic go-live.
- Promote the notebook into a scheduled pipeline (Databricks or hospital ETL) that refreshes scores daily and writes them to a governance-reviewed table for Power BI/Tableau dashboards.
- Layer on fairness monitoring (e.g., compare calibration curves by insurance category or ADI quartile) before exposing the tool to frontline staff.
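
The time-based split in the list above could be sketched as a simple date cutoff on `APPT_START_DATE`. The helper name, cutoff date, and toy rows are hypothetical; the real boundary depends on the date range of the production extract.

```python
import pandas as pd

def time_based_split(frame: pd.DataFrame, date_col: str, cutoff: str):
    """Train on rows strictly before the cutoff date, test on rows on or after it."""
    cutoff_ts = pd.Timestamp(cutoff)
    train = frame[frame[date_col] < cutoff_ts]
    test = frame[frame[date_col] >= cutoff_ts]
    return train, test

# Toy appointments spanning six months (made-up values).
toy = pd.DataFrame({
    'APPT_START_DATE': pd.to_datetime([
        '2024-01-15', '2024-02-20', '2024-03-31',
        '2024-04-01', '2024-05-10', '2024-06-30',
    ]),
    'no_show_binary': [0, 1, 0, 0, 1, 0],
})

# Months 1-3 go to training, months 4-6 to testing.
train, test = time_based_split(toy, 'APPT_START_DATE', '2024-04-01')
print(f"Train rows: {len(train)} | Test rows: {len(test)}")
```

Unlike a random split, this never lets the model see appointments that occur after the ones it is evaluated on, which is closer to how it will be used after go-live.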