# 🧠 Personalized Rehab Scoring with V2 Hybrid Model

This notebook performs **real-time simulation of recovery prediction** using trained sub-models and the Hybrid Meta-Model V2. Unlike V1, this version uses true recovery outcomes (not synthetic targets), better-cleaned data, and more reliable percentile and score calibration.

The notebook loads saved models, prepares new patient data, predicts recovery outcomes, and generates a report to guide post-operative planning.


### 📦 Step 1: Load Dependencies

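This step loads the Python libraries the notebook relies on:
- `pandas`, `numpy`: For data processing
- `joblib`: To load trained models (.pkl files)
- `shap`, `matplotlib`: To compute and plot per-patient SHAP explanations
- `scikit-learn`, `xgboost`: Model classes for the saved sub-models (`RandomForestRegressor`, `XGBRegressor`)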

These libraries power the **model inference engine** that simulates real patient evaluations.


In [1]:
# rehab_predictor

import pandas as pd
import numpy as np
import shap
import joblib
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor



### 🧠 Step 2: Load Trained Cardiac, Mobility, and Meta Models

We load three previously trained models from disk:
- `cardiac_rf_model.pkl`: Predicts recovery from ECG + treadmill features
- `mobility_xgb_model.pkl`: Predicts mobility readiness from gait features
- `hybrid_meta_model_v2.pkl`: Combines the two into a final calibrated recovery score

The same cell also reads the ECG, treadmill, and wearable CSVs that serve as feature templates in later steps.

📌 This mirrors a deployed hospital server loading saved models during intake or review.
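
The loading cell hard-codes absolute Windows paths. A minimal sketch of one alternative, assuming a hypothetical helper `resolve_model_paths` and a caller-supplied base directory (neither is part of the original notebook), checks that each model file exists before it is handed to `joblib.load`:

```python
from pathlib import Path

# File names taken from the loading cell; the base directory is an assumption.
MODEL_FILES = {
    "cardiac": "cardiac_rf_model.pkl",
    "mobility": "mobility_xgb_model.pkl",
    "meta": "hybrid_meta_model_v2.pkl",
}


def resolve_model_paths(model_dir):
    """Return {name: Path} for every model file, failing early if one is missing."""
    model_dir = Path(model_dir)
    paths = {name: model_dir / fname for name, fname in MODEL_FILES.items()}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing model files: {missing}")
    return paths
```

Each returned path could then be passed to `joblib.load` exactly as in the cell below.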


In [2]:
# Load Models
cardiac_model = joblib.load(r"D:\AI_finaltrial\project\models\cardiac_rf_model.pkl")
mobility_model = joblib.load(r"D:\AI_finaltrial\project\models\mobility_xgb_model.pkl")
meta_model = joblib.load(r"D:\AI_finaltrial\project\models\hybrid_meta_model_v2.pkl")

# Load feature templates
ecg_features = pd.read_csv(r"D:\AI_finaltrial\project\results\ecg_features1.csv").rename(columns={"Patient_ID": "Subject_ID"})
treadmill = pd.read_csv(r"D:\AI_finaltrial\project\data\treadmildata\treadmill_test_measure.csv")
wearable_info = pd.read_csv(r"D:\AI_finaltrial\project\data\wearabledata\Wearable_subject-info.csv")
wearable_avail = pd.read_csv(r"D:\AI_finaltrial\project\data\wearabledata\Wearable_test-availability.csv")

### 🏃 Step 3: Extract Cardiac Features from Treadmill Data

This function processes each treadmill test log to extract:
- VO₂ max (peak oxygen consumption)
- HR recovery: the drop in heart rate 1 minute after peak HR
- VE/VO₂ ratio at peak VO₂

These are standard cardiopulmonary recovery markers used in **rehab risk stratification**.

📌 *Clinical Utility*: Enables the model to interpret **real physiological performance** instead of just raw waveform patterns.
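
To make the three markers concrete, here is a small worked example on a toy trace. The numbers are illustrative only (not real patient data), and `time` is assumed to be in seconds, which is what the 60-unit offset in the function below implies.

```python
import pandas as pd

# Toy trace for a single test; values are made up for illustration.
trace = pd.DataFrame({
    "time": [0, 60, 120, 180, 240],
    "HR":   [90, 130, 170, 150, 120],
    "VO2":  [500.0, 1500.0, 2500.0, 1800.0, 900.0],
    "VE":   [15.0, 45.0, 80.0, 60.0, 30.0],
})

# Peak heart rate and the time at which it occurs
peak_hr_idx = trace["HR"].idxmax()
t_peak = trace.loc[peak_hr_idx, "time"]

# HR recovery: peak HR minus the first HR sample at least 60 s after the peak
after_1min = trace[trace["time"] >= t_peak + 60]
hr_recovery_1min = trace.loc[peak_hr_idx, "HR"] - after_1min["HR"].iloc[0]

# VO2 max and the VE/VO2 ratio taken at the same sample
peak_vo2_idx = trace["VO2"].idxmax()
vo2_max = trace.loc[peak_vo2_idx, "VO2"]
ve_vo2_ratio = trace.loc[peak_vo2_idx, "VE"] / vo2_max

print(vo2_max, hr_recovery_1min, round(ve_vo2_ratio, 3))
```

Here peak HR (170) occurs at 120 s, the first sample at or after 180 s reads 150, so the 1-minute recovery is 20 beats.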


In [3]:
# Compute treadmill metrics
def compute_recovery_metrics(df):
    """Return one row per treadmill test: VO2 max, 1-minute HR recovery, VE/VO2 at peak VO2."""
    results = []
    for test_id, group in df.groupby("ID_test"):
        group = group.sort_values(by="time").dropna(subset=["VO2", "HR", "VE"])
        if group.empty:
            continue
        sid = group["ID"].iloc[0]
        max_vo2 = group["VO2"].max()
        max_hr = group["HR"].max()
        t_max = group.loc[group["HR"].idxmax(), "time"]
        # First sample at least 60 s after peak HR (assumes `time` is in seconds)
        after_peak = group[group["time"] >= t_max + 60]
        if after_peak.empty:
            continue  # test ended less than 1 minute after peak HR
        hr_rec = max_hr - after_peak["HR"].iloc[0]
        ve_vo2 = group.loc[group["VO2"].idxmax(), "VE"] / max_vo2
        results.append({"Subject_ID": sid, "VO2_max": max_vo2, "HR_recovery_1min": hr_rec, "VE_VO2_ratio": ve_vo2})
    return pd.DataFrame(results)

### 🦿 Step 4: Clean Mobility Sensor Features

Wearable test exports can contain `"value ± SD"` style entries. This utility:
- Parses and extracts clean numeric values
- Converts columns into usable features for model input

📌 *Outcome*: Results in clean input for the mobility model — essential to avoid garbage-in, garbage-out effects during prediction.
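
As a quick illustration of the parsing step, the sketch below applies the same regular expression used in the cleaning function to a small made-up Series. The sample values are assumptions chosen only to show the behaviour on `"value ± SD"` strings, plain numbers, and missing cells.

```python
import pandas as pd

# Made-up entries in the style of a wearable export
raw = pd.Series(["4.2 ± 0.3", "3.9 ± 0.5", None, "5.0"])

# Keep the first number in each cell (the value before "±"); missing cells stay NaN.
# expand=False returns a Series rather than a one-column DataFrame.
cleaned = raw.str.extract(r"([-+]?[0-9]*\.?[0-9]+)", expand=False).astype(float)

print(cleaned.tolist())
```

The standard deviation after `±` is discarded, so only the central value reaches the mobility model.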


In [4]:
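# Clean gait columns
def clean_numeric_columns(df):
    """Convert "value ± SD" string columns to floats, keeping only the value."""
    for col in df.columns:
        # na=False treats missing cells as non-matching instead of propagating NaN
        if df[col].dtype == 'object' and df[col].str.contains("±", na=False).any():
            # expand=False returns a Series rather than a one-column DataFrame
            df[col] = df[col].str.extract(r'([-+]?[0-9]*\.?[0-9]+)', expand=False).astype(float)
    return df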

# Base data
recovery_metrics = compute_recovery_metrics(treadmill)
cardiac_df = pd.merge(ecg_features, recovery_metrics, on="Subject_ID").drop(columns=["Subject_ID"], errors="ignore")

wearable_merged = pd.merge(wearable_info, wearable_avail, on="Patient ID")
wearable_cleaned = clean_numeric_columns(wearable_merged)
wearable_cleaned = wearable_cleaned.select_dtypes(include=[np.number]).drop(columns=["Patient ID"], errors="ignore").fillna(0)

X_card_base = cardiac_df[cardiac_model.feature_names_in_]
X_mob_base = wearable_cleaned[mobility_model.feature_names_in_]

### 📂 Step 5: Load Test Patient Input Data

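The scoring cell reads one patient file and reuses the feature tables loaded earlier:
- `rehab_test_cases.csv`: One row per simulated patient, with treadmill metrics (`VO2_max`, `HR_recovery_1min`, `VE_VO2_ratio`) and gait metrics (`Velocity_kmph`, `Cadence`, `Stride_time`)
- `ecg_features1.csv`: Processed cardiac features, used as template rows for the cardiac model
- Wearable subject-info and test-availability files: Gait features, used as template rows for the mobility model

📌 *Purpose*: These files are passed through the prediction pipeline to simulate how new patients would be scored.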
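
Because the scoring loop indexes each patient row by column name, a missing column only surfaces as a `KeyError` mid-run. A minimal sketch of an up-front check, assuming a hypothetical helper `missing_columns` (not in the original notebook) and a made-up stand-in for the test-case file:

```python
import io
import pandas as pd

# Columns the scoring loop reads from each row of the test-case file.
REQUIRED_COLUMNS = [
    "VO2_max", "HR_recovery_1min", "VE_VO2_ratio",
    "Velocity_kmph", "Cadence", "Stride_time",
]


def missing_columns(df):
    """Return the required columns that are absent from df (hypothetical helper)."""
    return [c for c in REQUIRED_COLUMNS if c not in df.columns]


# Stand-in for rehab_test_cases.csv; the values are made up for illustration.
sample_csv = io.StringIO(
    "VO2_max,HR_recovery_1min,VE_VO2_ratio,Velocity_kmph,Cadence\n"
    "32.5,18,0.031,4.1,105\n"
)
patients = pd.read_csv(sample_csv)
print(missing_columns(patients))  # → ['Stride_time']
```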

---

### 🧮 Step 6: Predict Recovery Outcomes

In this critical step:
1. A template row of cardiac features, overwritten with the patient's treadmill metrics, goes into the **cardiac model**
2. A template row of wearable gait features, overwritten with the patient's gait metrics, goes into the **mobility model**
3. Their outputs are combined into `X_meta`, a 2-feature DataFrame (`Cardiac_Score`, `Mobility_Score`)
4. `meta_model` (loaded from `hybrid_meta_model_v2.pkl`) processes that to generate the **final recovery score**

📌 This mirrors how the model would score patients in a hospital: two streams of evidence are merged into a single, interpretable metric.
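
The two-stage flow can be sketched with stand-in models. `ConstantModel`, `WeightedMeta`, `hybrid_score`, and the 0.5/0.5 weights below are all assumptions made for illustration; the real sub-models and meta-model are the pickled estimators loaded in Step 2, and their learned behaviour is not reproduced here.

```python
import pandas as pd


class ConstantModel:
    """Stand-in for a fitted sub-model: always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value] * len(X)


class WeightedMeta:
    """Stand-in for the meta-model: a fixed, made-up weighted sum of the two sub-scores."""

    def predict(self, X):
        return (0.5 * X["Cardiac_Score"] + 0.5 * X["Mobility_Score"]).tolist()


def hybrid_score(cardiac_model, mobility_model, meta_model, ecg_row, mob_row):
    """Run both sub-models, stack their outputs, and score the stack with the meta-model."""
    cardiac_score = float(cardiac_model.predict(ecg_row)[0])
    mobility_score = float(mobility_model.predict(mob_row)[0])
    # Column names match those used in the scoring loop.
    X_meta = pd.DataFrame([{"Cardiac_Score": cardiac_score, "Mobility_Score": mobility_score}])
    return float(meta_model.predict(X_meta)[0])


ecg_row = pd.DataFrame([{"VO2_max": 30.0}])
mob_row = pd.DataFrame([{"Cadence": 100.0}])
score = hybrid_score(ConstantModel(2.0), ConstantModel(1.0), WeightedMeta(), ecg_row, mob_row)
print(score)
```

The point of the sketch is the data flow: the meta-model never sees raw features, only the two sub-model scores.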

---

### 📏 Step 7: Translate Model Score into Estimated Recovery Days

We convert the final hybrid score, clamped to the range 0 to 3, to a **predicted recovery duration** (in days) using the formula applied in the scoring loop:
```python
recovery_days = int(180 - final_score * 50)
```
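
The scoring loop further down clamps the score to [0, 3] and then computes `int(180 - final_score * 50)`. A minimal sketch of that conversion in isolation, using a hypothetical helper name `score_to_recovery_days`:

```python
def score_to_recovery_days(score):
    """Clamp the hybrid score to [0, 3] and map it linearly to days, as the scoring loop does."""
    clamped = min(max(score, 0), 3)
    return int(180 - clamped * 50)


# Scores outside [0, 3] are clamped before conversion
for s in (-0.5, 0.0, 1.5, 3.0, 4.2):
    print(s, score_to_recovery_days(s))
```

Under this mapping a score of 0 corresponds to 180 days and a score of 3 to 30 days; `int()` truncates any fractional day.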


In [5]:
# Load CSV of patients to simulate
patients_df = pd.read_csv(r"D:\AI_finaltrial\project\results\rehab_test_cases.csv")

all_reports = []

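for i, row in patients_df.iterrows():
    # Start from a sampled template row so every model feature has a value;
    # random_state makes the draw reproducible across runs
    ecg_row = X_card_base.sample(1, random_state=i).copy()
    mob_row = X_mob_base.sample(1, random_state=i).copy()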

    # Inject test data
    ecg_row["VO2_max"] = row["VO2_max"]
    ecg_row["HR_recovery_1min"] = row["HR_recovery_1min"]
    ecg_row["VE_VO2_ratio"] = row["VE_VO2_ratio"]

    mob_row["Velocity, km/h"] = row["Velocity_kmph"]
    mob_row["Cadence, steps/min"] = row["Cadence"]
    mob_row["Stride time, s"] = row["Stride_time"]

    # Predict
    cardiac_score = float(cardiac_model.predict(ecg_row)[0])
    mobility_score = float(mobility_model.predict(mob_row)[0])
    final_score = float(meta_model.predict(pd.DataFrame([{
        'Cardiac_Score': cardiac_score,
        'Mobility_Score': mobility_score
    }]))[0])

    # Clamp the score to [0, 3], then map it linearly to days (0 -> 180, 3 -> 30)
    final_score = min(max(final_score, 0), 3)
    recovery_days = int(180 - final_score * 50)

    # SHAP analysis
    explainer = shap.Explainer(meta_model)
    shap_values = explainer(pd.DataFrame([{"Cardiac_Score": cardiac_score, "Mobility_Score": mobility_score}]))
    shap.plots.bar(shap_values, max_display=2, show=False)
    plt.title(f"SHAP Impact for Patient {i+1}")
    plt.tight_layout()
    plt.savefig(f"patient_{i+1}_shap.png")
    plt.close()

    # Recommendations
    suggestions = []
    if cardiac_score < 40:
        suggestions.append("Increase supervised cardio rehab")
    if mobility_score < 100:
        suggestions.append("Continue mobility strengthening")
    if cardiac_score > 80 and mobility_score > 140:
        suggestions.append("Ready for transition to independent rehab")
    if mobility_score > 120 and cardiac_score < 50:
        suggestions.append("High mobility, monitor cardiac stress tolerance")

    all_reports.append({
        "Patient_ID": i+1,
        "VO2_max": row["VO2_max"],
        "HR_recovery_1min": row["HR_recovery_1min"],
        "VE_VO2_ratio": row["VE_VO2_ratio"],
        "Velocity": row["Velocity_kmph"],
        "Cadence": row["Cadence"],
        "Stride_time": row["Stride_time"],
        "Cardiac_Score": round(cardiac_score, 2),
        "Mobility_Score": round(mobility_score, 2),
        "Final_Score": round(final_score, 2),
        "Recovery_Days": recovery_days,
        "Suggestions": "; ".join(suggestions)
    })
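
The recommendation rules in the loop above can be exercised on their own. The sketch below copies the same thresholds into a hypothetical function `build_suggestions` (the name is an assumption) so that individual score pairs can be checked without running the models.

```python
def build_suggestions(cardiac_score, mobility_score):
    """Apply the same threshold rules as the scoring loop and return the matching suggestions."""
    suggestions = []
    if cardiac_score < 40:
        suggestions.append("Increase supervised cardio rehab")
    if mobility_score < 100:
        suggestions.append("Continue mobility strengthening")
    if cardiac_score > 80 and mobility_score > 140:
        suggestions.append("Ready for transition to independent rehab")
    if mobility_score > 120 and cardiac_score < 50:
        suggestions.append("High mobility, monitor cardiac stress tolerance")
    return suggestions


print(build_suggestions(35, 130))  # low cardiac score, high mobility score
print(build_suggestions(60, 110))  # mid-range scores
```

Mid-range pairs such as (60, 110) match no rule, so the `Suggestions` field in the report is an empty string for those patients.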


### 📝 Step 8: Generate Output Report

We compile the results into a structured table containing:
- Input metrics for each patient (treadmill + gait)
- Submodel scores (cardiac + mobility)
- Final recovery score
- Translated recovery estimate (days)
- Rule-based rehab suggestions

Saved as: `rehab_batch_report_using_V2.csv`

📌 *Impact*: This file can feed clinician-facing tools like:

- Electronic Medical Record (EMR) integration
- Rehab scheduling dashboards
- Post-discharge planning software


In [6]:
# Export summary report
report_df = pd.DataFrame(all_reports)
report_df.to_csv(r"D:\AI_finaltrial\finalmodels\rehab_batch_report_using_V2.csv", index=False)
print("✔️ All predictions complete. Output saved as 'rehab_batch_report_using_V2.csv'")


✔️ All predictions complete. Output saved as 'rehab_batch_report_using_V2.csv'
