## **Notebook Objective**

This notebook translates analytical findings into:

* Risk segmentation frameworks
* Pricing and underwriting logic
* Preventive health incentives
* Policy and regulatory insights
* Executive-ready summaries

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import joblib

pd.set_option("display.float_format", "{:.2f}".format)
plt.style.use("seaborn-v0_8")

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

DATA_PATH = "../data/raw/insurance.csv"
PREPROCESSOR_PATH = "../data/processed/preprocessor.pkl"

df = pd.read_csv(DATA_PATH)
preprocessor = joblib.load(PREPROCESSOR_PATH)

In [None]:
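# Feature definitions
# age_squared, smoker_bmi_interaction, and bmi_category are engineered columns;
# they must already be present in df before this cell runs.
numerical_features = [
    "age", "age_squared", "bmi", "children", "smoker_bmi_interaction"
]

categorical_features = [
    "sex", "region", "bmi_category"
]

feature_columns = numerical_features + categorical_features
missing_columns = [col for col in feature_columns if col not in df.columns]
if missing_columns:
    raise KeyError(f"Missing feature columns in df: {missing_columns}")

X = df[feature_columns]
# Model the log of charges to reduce right skew; predictions are inverted with np.expm1
y = np.log1p(df["charges"])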

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

In [None]:
# Champion model (from Notebook 04)
gb_pipeline = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("model", GradientBoostingRegressor(
            n_estimators=300,
            learning_rate=0.05,
            max_depth=3,
            random_state=42
        ))
    ]
)

gb_pipeline.fit(X_train, y_train)

In [None]:
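# Score every row (training and test). Training rows are predicted in-sample,
# so the comparison with actual charges below is optimistic.
df["predicted_log_charges"] = gb_pipeline.predict(X)
# Invert the log1p transform to return to the original charge scale
df["predicted_charges"] = np.expm1(df["predicted_log_charges"])

df[["charges", "predicted_charges"]].head()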
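
Because the model is trained on log-transformed charges, errors are easier to interpret once they are converted back to currency units. The block below is a minimal sketch of that back-transformation, not the evaluation used in the earlier notebooks. The helper name `dollar_scale_errors` and the three sample values are hypothetical and purely illustrative.

```python
import numpy as np


def dollar_scale_errors(y_true_log, y_pred_log):
    """Return (MAE, RMSE) in original currency units from log1p-scale values."""
    y_true = np.expm1(np.asarray(y_true_log, dtype=float))
    y_pred = np.expm1(np.asarray(y_pred_log, dtype=float))
    residuals = y_true - y_pred
    mae = float(np.mean(np.abs(residuals)))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return mae, rmse


# Illustrative values only (not results from this notebook)
true_charges = np.array([1000.0, 5000.0, 20000.0])
pred_charges = np.array([1100.0, 4500.0, 26000.0])

mae, rmse = dollar_scale_errors(np.log1p(true_charges), np.log1p(pred_charges))
print(f"MAE: {mae:,.2f}  RMSE: {rmse:,.2f}")
```

In this notebook the same helper could be applied to the held-out split, for example `dollar_scale_errors(y_test, gb_pipeline.predict(X_test))`, since both arguments are on the log1p scale.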

In [None]:
df["risk_tier"] = pd.qcut(
    df["predicted_charges"],
    q=4,
    labels=["Low Risk", "Moderate Risk", "High Risk", "Very High Risk"]
)

df["risk_tier"].value_counts()
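
Quantile tiers computed with `pd.qcut` depend on the population they are fitted on, so scoring a new applicant requires the original cut points. The sketch below shows one possible way to freeze the quartile edges and reuse them with `pd.cut`. The sample costs and the names `reference_costs`, `frozen_edges`, and `new_costs` are hypothetical.

```python
import numpy as np
import pandas as pd

TIER_LABELS = ["Low Risk", "Moderate Risk", "High Risk", "Very High Risk"]

# Illustrative predicted costs, not values from this notebook
reference_costs = pd.Series(
    [1200.0, 2500.0, 4000.0, 6500.0, 9000.0, 14000.0, 30000.0, 45000.0]
)

# retbins=True returns the quartile edges alongside the tier assignment
reference_tiers, edges = pd.qcut(
    reference_costs, q=4, labels=TIER_LABELS, retbins=True
)

# Open the outer edges so costs outside the reference range still get a tier
frozen_edges = edges.copy()
frozen_edges[0], frozen_edges[-1] = -np.inf, np.inf

# Assign tiers to new predictions using the frozen boundaries
new_costs = pd.Series([800.0, 7000.0, 60000.0])
new_tiers = pd.cut(new_costs, bins=frozen_edges, labels=TIER_LABELS)
print(new_tiers.tolist())
```

Storing `frozen_edges` next to the model keeps tier assignment stable between scoring runs, at the cost of tiers drifting away from exact quartiles as the portfolio changes.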

In [None]:
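# observed=True keeps only tiers that occur in the data and avoids the
# pandas FutureWarning raised when grouping on a categorical column
risk_profile = df.groupby("risk_tier", observed=True).agg(
    avg_actual_cost=("charges", "mean"),
    avg_predicted_cost=("predicted_charges", "mean"),
    population=("charges", "count")
).reset_index()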

risk_profile

In [None]:
sns.barplot(
    data=risk_profile,
    x="risk_tier",
    y="avg_predicted_cost"
)
plt.title("Average Predicted Cost by Risk Tier")
plt.show()

In [None]:
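# Smoker rate is the share of smokers per tier. The mean of
# smoker_bmi_interaction is not a rate, so a boolean flag is used instead.
# Assumes df has a `smoker` column coded "yes"/"no".
risk_driver_summary = (
    df.assign(is_smoker=df["smoker"].eq("yes"))
    .groupby("risk_tier", observed=True)
    .agg(
        smoker_rate=("is_smoker", "mean"),
        avg_bmi=("bmi", "mean"),
        avg_age=("age", "mean"),
    )
    .reset_index()
)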

risk_driver_summary

In [None]:
premium_adjustments = pd.DataFrame({
    "Risk Tier": ["Low Risk", "Moderate Risk", "High Risk", "Very High Risk"],
    "Suggested Premium Multiplier": [0.85, 1.00, 1.30, 1.75],
    "Underwriting Notes": [
        "Eligible for wellness discounts",
        "Standard pricing",
        "Enhanced monitoring recommended",
        "Requires medical underwriting"
    ]
})

premium_adjustments
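
The multiplier table becomes a quote once it is joined to individual policyholders. The following is a minimal sketch of that step under stated assumptions: `BASE_PREMIUM`, the `policy_id` values, and the `policyholders` frame are hypothetical, and only the multipliers come from the table above.

```python
import pandas as pd

# Hypothetical flat base premium used only for illustration
BASE_PREMIUM = 10_000.0

# Multipliers copied from the premium_adjustments table
multiplier_by_tier = {
    "Low Risk": 0.85,
    "Moderate Risk": 1.00,
    "High Risk": 1.30,
    "Very High Risk": 1.75,
}

# Hypothetical policyholders with an already assigned risk tier
policyholders = pd.DataFrame({
    "policy_id": [101, 102, 103, 104],
    "risk_tier": ["Low Risk", "Moderate Risk", "High Risk", "Very High Risk"],
})

# Look up each tier's multiplier, then scale the base premium
policyholders["premium_multiplier"] = policyholders["risk_tier"].map(multiplier_by_tier)
policyholders["quoted_premium"] = BASE_PREMIUM * policyholders["premium_multiplier"]
print(policyholders)
```

When applying this to `df["risk_tier"]`, which is categorical, casting with `.astype(str)` before `.map(...)` is a simple way to make sure the result is a plain numeric column.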

In [None]:
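# Simulated scenario: every smoker quits.
# Zeroing smoker_bmi_interaction removes the only smoking-related input
# in the feature list; all other features are left unchanged.
df_scenario = df.copy()
df_scenario["smoker_bmi_interaction"] = 0

df_scenario["scenario_predicted_cost"] = np.expm1(
    gb_pipeline.predict(df_scenario[numerical_features + categorical_features])
)

# Mean predicted saving per policyholder, averaged over the whole portfolio
# (rows whose interaction term was already 0 contribute no change)
cost_reduction = (
    df["predicted_charges"] - df_scenario["scenario_predicted_cost"]
).mean()

cost_reduction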
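
A portfolio-wide mean mixes smokers, who are the only rows affected by the scenario, with non-smokers, whose prediction does not change. The sketch below separates the two views. The toy frame, its values, and the `smoker` column coded "yes"/"no" are assumptions for illustration, not results from this notebook.

```python
import pandas as pd

# Hypothetical baseline and scenario predictions
scenario = pd.DataFrame({
    "smoker": ["yes", "no", "yes", "no"],
    "predicted_charges": [32000.0, 6000.0, 41000.0, 9000.0],
    "scenario_predicted_cost": [12000.0, 6000.0, 15000.0, 9000.0],
})

# Per-row predicted saving if the policyholder did not smoke
scenario["saving"] = (
    scenario["predicted_charges"] - scenario["scenario_predicted_cost"]
)

# Average over everyone versus average over the affected group only
population_avg_saving = scenario["saving"].mean()
smoker_avg_saving = scenario.loc[scenario["smoker"] == "yes", "saving"].mean()

print(f"Per policyholder: {population_avg_saving:,.0f}")
print(f"Per smoker:       {smoker_avg_saving:,.0f}")
```

The per-smoker figure is usually the more relevant number when sizing a cessation incentive, while the portfolio average describes the effect on overall claims cost.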

In [None]:
df.groupby("sex").agg(
    avg_predicted_cost=("predicted_charges", "mean"),
    avg_actual_cost=("charges", "mean")
)
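
Comparing group averages is easier to read as a ratio of predicted to actual cost, where values far from 1 suggest the model systematically over- or under-predicts for that group. This is a rough sketch on hypothetical data, and the `audit` frame and `predicted_to_actual` column are illustrative names rather than part of the original analysis.

```python
import pandas as pd

# Hypothetical actual and predicted charges by group
audit = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "charges": [10000.0, 14000.0, 12000.0, 18000.0],
    "predicted_charges": [11000.0, 13000.0, 12000.0, 15000.0],
})

by_group = audit.groupby("sex").agg(
    avg_predicted_cost=("predicted_charges", "mean"),
    avg_actual_cost=("charges", "mean"),
)

# Ratio above 1 means over-prediction, below 1 means under-prediction
by_group["predicted_to_actual"] = (
    by_group["avg_predicted_cost"] / by_group["avg_actual_cost"]
)
print(by_group)
```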

In [None]:
executive_summary = {
    "Primary Cost Driver": "Smoking status",
    "Secondary Drivers": "BMI, Age",
    "Best Predictive Model": "Gradient Boosting Regressor",
    "Risk Segmentation": "4-tier quantile-based",
    "Preventable Cost Component": "High (behavior-driven)",
    "Policy Readiness": "High"
}

pd.DataFrame.from_dict(
    executive_summary,
    orient="index",
    columns=["Summary"]
)

## **Strategic Recommendations**

### **For Insurers**

* Implement behavior-sensitive pricing
* Use explainable ML for underwriting
* Introduce wellness-linked premium discounts

### **For Policymakers**

* Target smoking and obesity interventions
* Encourage insurer–public health collaboration
* Promote transparency in risk scoring

### **For Employers / Group Plans**

* Incentivize preventive health participation
* Use tier-based contribution schemes
