## Notebook Objectives

1. Consolidate findings across utilization, length of stay (LOS), cost, equity, and performance
2. Identify system-level bottlenecks and inefficiencies
3. Translate analytics into public health and policy insights
4. Provide evidence-based recommendations

In [None]:
import pandas as pd
import numpy as np

from pathlib import Path

In [None]:
DATA_PATH = Path("../data/processed/hospital_inpatient_discharges_cleaned.csv")
df = pd.read_csv(DATA_PATH)
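Every metric below assumes that `length_of_stay` and `total_charges` are numeric and that `hospital_name` exists. A quick schema check right after loading catches problems early. The sketch below is one way to do it. `REQUIRED_COLUMNS` and `validate_discharges` are hypothetical names. The `"120 +"` value is an illustrative non-numeric string, not a claim about what the cleaned file contains.

```python
import pandas as pd

# Hypothetical list of columns this notebook relies on; adjust to the cleaned file's schema.
REQUIRED_COLUMNS = ["hospital_name", "length_of_stay", "total_charges"]


def validate_discharges(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> pd.DataFrame:
    """Return a copy with numeric columns coerced; raise KeyError if required columns are missing."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")
    out = frame.copy()
    for col in ("length_of_stay", "total_charges"):
        # errors="coerce" turns non-numeric strings into NaN instead of raising
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Illustrative rows only, not taken from the dataset
sample = pd.DataFrame({
    "hospital_name": ["A", "B"],
    "length_of_stay": ["3", "120 +"],
    "total_charges": [1500.0, 98000.0],
})
checked = validate_discharges(sample)
print(checked["length_of_stay"].tolist())  # [3.0, nan]
```

Coercing rather than raising keeps the load step robust. It is still worth counting the resulting NaNs before computing means.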

In [None]:
summary_metrics = {
    "total_discharges": len(df),
    "avg_length_of_stay": df["length_of_stay"].mean(),
    "median_length_of_stay": df["length_of_stay"].median(),
    "avg_total_charges": df["total_charges"].mean(),
    "median_total_charges": df["total_charges"].median(),
    "total_system_charges": df["total_charges"].sum()
}

pd.Series(summary_metrics)
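The summary reports both the mean and the median because charges and LOS are usually right-skewed. A few very long or expensive stays pull the mean well above the median. One way to express that gap as a single number is the mean-to-median ratio. The sketch below assumes a hypothetical helper, `skew_ratio`, and uses made-up charges rather than values from this dataset.

```python
import pandas as pd


def skew_ratio(values: pd.Series) -> float:
    """Mean divided by median; values well above 1 suggest a long right tail."""
    median = values.median()
    if median == 0:
        raise ValueError("median is zero; ratio is undefined")
    return float(values.mean() / median)


# Illustrative charges (not from the dataset): one very expensive stay
charges = pd.Series([1000, 1200, 1500, 2000, 50000])
print(round(skew_ratio(charges), 2))  # 7.43
```

Applied to `df["total_charges"]`, a ratio far above 1 would suggest that system totals are driven by a small number of cases. The high-cost flag below examines that directly.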

In [None]:
# Flag stays at or above the 90th-percentile length of stay as "prolonged"
los_p90 = df["length_of_stay"].quantile(0.90)
df["prolonged_los"] = df["length_of_stay"] >= los_p90

In [None]:
prolonged_share = df["prolonged_los"].mean()
prolonged_share
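Because LOS is counted in whole days, many discharges share the same value. A `>=` comparison against the 90th percentile can therefore flag far more than 10% of stays. The sketch below uses made-up day counts to show the effect. It is worth checking `prolonged_share` against 0.10 for the same reason.

```python
import pandas as pd

# Illustrative integer LOS values in days (not from the dataset); ties are typical of day counts
los = pd.Series([1, 2, 2, 3, 3, 3, 4, 5, 7, 7, 7, 7])

p90 = los.quantile(0.90)        # linear interpolation lands exactly on the tied value 7.0
share_ge = (los >= p90).mean()  # ">=" flags every tied stay: 4 of 12
share_gt = (los > p90).mean()   # ">" flags none here

print(p90, round(share_ge, 3), share_gt)  # 7.0 0.333 0.0
```

Neither operator is "correct". Reporting the flagged share alongside the threshold makes the definition of "prolonged" transparent.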

In [None]:
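# Flag discharges at or above the 95th-percentile charge as high-cost
charges_p95 = df["total_charges"].quantile(0.95)
df["high_cost_case"] = df["total_charges"] >= charges_p95
df["high_cost_case"].mean()  # share of discharges flagged (mean of a boolean column)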
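The share of *cases* flagged as high-cost is most informative when paired with the share of *total charges* those cases account for. The sketch below computes both. `charge_concentration` is a hypothetical helper, and the toy series is not drawn from the data.

```python
import pandas as pd


def charge_concentration(charges: pd.Series, q: float = 0.95) -> tuple[float, float]:
    """Return (share of cases, share of total charges) for cases at or above quantile q."""
    flag = charges >= charges.quantile(q)
    return float(flag.mean()), float(charges[flag].sum() / charges.sum())


# Toy data: 19 routine stays and one very expensive stay
charges = pd.Series([100.0] * 19 + [8100.0])
cases, dollars = charge_concentration(charges)
print(cases, dollars)  # 0.05 0.81
```

Called as `charge_concentration(df["total_charges"])`, it would show how much of `total_system_charges` sits in the top tail.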

In [None]:
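if "age" in df.columns:
    # include_lowest=True keeps age 0 in the first band; observed=True skips empty bands
    age_band = pd.cut(df["age"], bins=[0, 17, 35, 50, 65, 80, 120], include_lowest=True)
    age_summary = df.groupby(age_band, observed=True)[["length_of_stay", "total_charges"]].mean()
    display(age_summary)  # expressions inside an if-block are not rendered automatically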
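`pd.cut` builds right-closed intervals by default, so a bin edge of `0` excludes age 0 (newborns) and leaves it as NaN. The toy ages below show the difference that `include_lowest=True` makes.

```python
import pandas as pd

# Toy ages, not from the dataset
ages = pd.Series([0, 17, 18, 64, 65, 90])
bins = [0, 17, 35, 50, 65, 80, 120]

default_bands = pd.cut(ages, bins=bins)                         # first interval is (0, 17], so age 0 -> NaN
inclusive_bands = pd.cut(ages, bins=bins, include_lowest=True)  # lowest edge is nudged so age 0 is kept

print(default_bands.isna().sum(), inclusive_bands.isna().sum())  # 1 0
```

With right-closed bins, age 65 also falls in the 50 to 65 band rather than the next one. That matters if the analysis is meant to separate the Medicare-eligible population. Passing `right=False` gives left-closed bands instead.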

In [None]:
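if "gender" in df.columns:
    gender_summary = df.groupby("gender")[["length_of_stay", "total_charges"]].mean()
    display(gender_summary)  # explicit display: the if-block result is not rendered otherwise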
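Raw group means are easier to compare for equity when expressed relative to the overall mean, where 1.0 means parity. The sketch below uses a hypothetical helper, `relative_to_overall`, on toy data. These ratios are unadjusted for diagnosis or severity, so gaps may reflect case mix rather than inequity.

```python
import pandas as pd


def relative_to_overall(frame: pd.DataFrame, group_col: str, metrics: list[str]) -> pd.DataFrame:
    """Group means divided by the overall mean; 1.0 means parity with the whole population."""
    group_means = frame.groupby(group_col)[metrics].mean()
    overall = frame[metrics].mean()
    return group_means / overall  # divides each column by its overall mean


# Toy data, not from the dataset
demo = pd.DataFrame({
    "gender": ["F", "F", "M", "M"],
    "length_of_stay": [2, 4, 6, 8],
    "total_charges": [1000.0, 3000.0, 5000.0, 7000.0],
})
ratios = relative_to_overall(demo, "gender", ["length_of_stay", "total_charges"])
print(ratios)
```

The same helper applies unchanged to the age bands or to any other grouping column in `df`.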

In [None]:
hospital_summary = (
    df.groupby("hospital_name")
      .agg(
          discharges=("hospital_name", "count"),
          avg_los=("length_of_stay", "mean"),
          avg_charges=("total_charges", "mean")
      )
)

In [None]:
hospital_summary.describe()
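`describe()` shows the spread across hospitals but does not name the outliers. One way to surface candidate bottlenecks is to z-score `avg_los` among hospitals with enough volume to be stable. The `min_discharges` and `z_cut` defaults below are illustrative assumptions, not established thresholds. `flag_outlier_hospitals` is a hypothetical helper, and its results are not adjusted for case mix.

```python
import pandas as pd


def flag_outlier_hospitals(summary: pd.DataFrame, min_discharges: int = 50,
                           z_cut: float = 2.0) -> pd.DataFrame:
    """Return hospitals whose avg_los z-score exceeds z_cut, among those with enough volume."""
    eligible = summary[summary["discharges"] >= min_discharges].copy()
    # Sample standard deviation across eligible hospitals
    std = eligible["avg_los"].std()
    eligible["los_z"] = (eligible["avg_los"] - eligible["avg_los"].mean()) / std
    return eligible[eligible["los_z"] > z_cut].sort_values("los_z", ascending=False)


# Toy summary with the same columns as hospital_summary (values are made up)
demo = pd.DataFrame(
    {
        "discharges": [200] * 10 + [180, 12],
        "avg_los": [4.0] * 10 + [10.0, 30.0],
    },
    index=[f"Hospital {i}" for i in range(12)],
)
print(flag_outlier_hospitals(demo))  # only Hospital 10; Hospital 11 is excluded for low volume
```

The volume filter keeps a small hospital with a handful of long stays from dominating the list. That hospital may still deserve a separate look.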

### Key Modeling Insights

* LOS and total charges are **highly predictable** using routine administrative data
* Non-linear ML models significantly outperform linear baselines
* Predictive models enable:

  * Early identification of high-risk admissions
  * Proactive discharge planning
  * Financial forecasting and budgeting