# ⚖️ Ontario Deep-Dive: Quantifying Health Disparities

### **Project Objective**
This notebook provides the final analytical step: quantifying the **practical significance** of demographic factors on chronic disease prevalence **specifically within Ontario**. While the previous notebook confirmed these disparities are statistically significant, this analysis measures their real-world magnitude.

### **Key Question**
* Within Ontario, how large are the health gaps between the most and least affected groups, and which factors (age, sex, income) have the most substantial impact?

---

### **Methodology**
The analysis uses Ontario-specific bootstrap estimates to calculate three key metrics that measure the size and impact of the observed health disparities:

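* **Absolute Difference:** The gap, in percentage points, between the highest- and lowest-prevalence groups.
* **Relative Risk (RR):** The ratio of the highest group's prevalence to the lowest group's, i.e. how many times as likely the highest-risk group is to have a condition. Because the inputs are prevalence estimates, this is strictly a prevalence ratio.
* **Cramér's V:** A standardized measure, ranging from 0 (no association) to 1 (perfect association), of the strength of association between a demographic factor and a health outcome. It is computed for the age and income stratifiers only.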
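
As a rough illustration of the three metrics, the sketch below applies them to a small made-up table. The group labels and numbers are hypothetical and are not CCHS estimates; only the column names `Estimated Prevalence (%)` and `Weighted N` are borrowed from the notebook's code.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical three-group table; values are illustrative, not CCHS estimates.
toy = pd.DataFrame({
    "Group": ["A", "B", "C"],
    "Estimated Prevalence (%)": [10.0, 20.0, 40.0],
    "Weighted N": [1000, 1000, 1000],
})

prev = toy["Estimated Prevalence (%)"]
absolute_difference = prev.max() - prev.min()  # percentage points
relative_risk = prev.max() / prev.min()        # ratio of prevalences

# Rebuild a groups x (has condition, does not) table of weighted counts.
positive = prev / 100 * toy["Weighted N"]
negative = toy["Weighted N"] - positive
chi2, _, _, _ = chi2_contingency(np.column_stack([positive, negative]))

# With only two outcome columns, min(rows - 1, cols - 1) equals 1,
# so Cramér's V reduces to sqrt(chi2 / n).
cramers_v = np.sqrt(chi2 / toy["Weighted N"].sum())

print(absolute_difference, relative_risk, round(cramers_v, 3))
```

In this toy case the gap is 30 percentage points and the ratio is 4, which shows how the same pair of groups can look moderate on one scale and large on another.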

### **Data Disclaimer**
Bootstrap estimate files derived from CCHS data are not shared in this repository due to licensing restrictions. The workflow and code logic are provided for transparency and reproducibility with similar datasets.
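
For readers reproducing the workflow with their own data, the sketch below shows the per-sheet layout that the code in this notebook appears to expect, inferred from the column names it reads. The helper `check_sheet` and every value in `example_sheet` are hypothetical; the real workbook may contain additional columns.

```python
import pandas as pd

# Columns the analysis code reads from each sheet.
REQUIRED_COLUMNS = [
    "Group",
    "Estimated Prevalence (%)",
    "95% CI Lower",
    "95% CI Upper",
    "Weighted N",
]


def check_sheet(df: pd.DataFrame) -> list:
    """Return the required columns missing from df (hypothetical helper)."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


# Synthetic stand-in for one sheet such as "Diabetes_by_Sex"; numbers are made up.
example_sheet = pd.DataFrame({
    "Group": ["Male", "Female"],
    "Estimated Prevalence (%)": [8.0, 6.5],
    "95% CI Lower": [7.4, 6.0],
    "95% CI Upper": [8.6, 7.0],
    "Weighted N": [5_000_000, 5_200_000],
})

print(check_sheet(example_sheet))  # an empty list means the layout matches
```

Each sheet holds one condition crossed with one stratifier, with one row per group.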

---

In [1]:
# ===========================================================
# Ontario Practical Significance Testing (Age, Sex, Income)
# Author: Arun Acharya
# ===========================================================

import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

# -----------------------------------------------------------
# 1. FILE PATHS
# -----------------------------------------------------------
bootstrap_file = "C:/Users/achar/myprojects/Healthcare/ontario_bootstrap_estimates.xlsx"
output_file = "C:/Users/achar/myprojects/Healthcare/Ontario_Practical_Significance.xlsx"

# -----------------------------------------------------------
# 2. CONDITIONS
# -----------------------------------------------------------
conditions = [
    "Sleep Apnea",
    "High Blood Pressure",
    "High Blood Cholesterol",
    "Diabetes",
    "Chronic Fatigue Syndrome",
    "Mood Disorder",
    "Anxiety Disorder",
    "Respiratory Condition",
    "Musculoskeletal Condition",
    "Cardiovascular Condition"
]

# -----------------------------------------------------------
# 3. SHEET NAME DETECTION
# -----------------------------------------------------------
all_sheets = pd.ExcelFile(bootstrap_file).sheet_names
sheet_map = {cond: {"age": None, "sex": None, "income": None} for cond in conditions}

for sheet in all_sheets:
    for cond in conditions:
        if sheet.startswith(cond[:30]):
            lower = sheet.lower()
            if "_by_ag" in lower:  # also matches "_by_age"; Excel caps sheet names at 31 characters
                sheet_map[cond]["age"] = sheet
            elif "_by_se" in lower:
                sheet_map[cond]["sex"] = sheet
            elif "_by_in" in lower:
                sheet_map[cond]["income"] = sheet

print("=== Auto-generated Sheet Mapping (Ontario) ===")
for cond, mapping in sheet_map.items():
    print(f"{cond}: {mapping}")

# -----------------------------------------------------------
# 4. FUNCTION: CRAMÉR'S V
# -----------------------------------------------------------
def calculate_cramers_v(df):
    """Cramér's V for a k x 2 table (groups x has / does not have the condition)."""
    k = len(df)
    if k < 2:
        return 0.0
    # Work on local Series so the caller's DataFrame is not modified.
    prop = df['Estimated Prevalence (%)'] / 100
    weighted_n = df['Weighted N']
    positive = prop * weighted_n
    negative = weighted_n - positive
    contingency = np.column_stack([positive, negative])
    chi2, _, _, _ = chi2_contingency(contingency)
    n = weighted_n.sum()
    # With two outcome columns, min(rows - 1, cols - 1) is 1 whenever k >= 2.
    return np.sqrt(chi2 / n)

# -----------------------------------------------------------
# 5. FUNCTION: PRACTICAL SIGNIFICANCE
# -----------------------------------------------------------
def practical_significance(df, condition_name, stratifier_name):
    highest = df.loc[df['Estimated Prevalence (%)'].idxmax()]
    lowest = df.loc[df['Estimated Prevalence (%)'].idxmin()]
    
    # Metrics
    absolute_difference = highest['Estimated Prevalence (%)'] - lowest['Estimated Prevalence (%)']
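    # Guard against a zero-prevalence group, which would make the ratio undefined.
    lowest_prev = lowest['Estimated Prevalence (%)']
    relative_risk = highest['Estimated Prevalence (%)'] / lowest_prev if lowest_prev > 0 else np.nan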
    highest_ci = (highest['95% CI Lower'], highest['95% CI Upper'])
    lowest_ci = (lowest['95% CI Lower'], lowest['95% CI Upper'])
    
    # Cramér's V only for stratifiers with >2 groups
    cramers_v = None
    if stratifier_name != "Sex (Male vs Female)":
        cramers_v = calculate_cramers_v(df)
    
    # Interpretation
    interpretation = (
        f"For {stratifier_name}, the highest prevalence group is '{highest['Group']}' "
        f"({highest['Estimated Prevalence (%)']:.2f}%, CI: {highest_ci[0]:.2f}-{highest_ci[1]:.2f}) "
        f"and the lowest is '{lowest['Group']}' "
        f"({lowest['Estimated Prevalence (%)']:.2f}%, CI: {lowest_ci[0]:.2f}-{lowest_ci[1]:.2f}). "
        f"The absolute difference is {absolute_difference:.2f} percentage points "
        f"and the relative risk is {relative_risk:.2f}."
    )
    if cramers_v is not None:
        interpretation += f" Overall association strength (Cramér's V) is {cramers_v:.3f}."
    
    return {
        "Condition": condition_name,
        "Stratifier": stratifier_name,
        "Highest Group": highest['Group'],
        "Highest Prevalence (%)": round(highest['Estimated Prevalence (%)'], 2),
        "Highest CI": highest_ci,
        "Lowest Group": lowest['Group'],
        "Lowest Prevalence (%)": round(lowest['Estimated Prevalence (%)'], 2),
        "Lowest CI": lowest_ci,
        "Absolute Difference (%)": round(absolute_difference, 2),
        "Relative Risk": round(relative_risk, 2),
        "Cramer's V": round(cramers_v, 3) if cramers_v is not None else None,
        "Interpretation": interpretation
    }

# -----------------------------------------------------------
# 6. LOOP THROUGH ALL CONDITIONS & STRATIFIERS (ONTARIO)
# -----------------------------------------------------------
all_results = []
for cond in conditions:
    mapping = sheet_map[cond]
    # Age
    if mapping["age"]:
        df_age = pd.read_excel(bootstrap_file, sheet_name=mapping["age"])
        all_results.append(practical_significance(df_age, cond, "Age Group"))
    # Sex
    if mapping["sex"]:
        df_sex = pd.read_excel(bootstrap_file, sheet_name=mapping["sex"])
        all_results.append(practical_significance(df_sex, cond, "Sex (Male vs Female)"))
    # Income
    if mapping["income"]:
        df_income = pd.read_excel(bootstrap_file, sheet_name=mapping["income"])
        all_results.append(practical_significance(df_income, cond, "Income Group"))

# -----------------------------------------------------------
# 7. SAVE RESULTS
# -----------------------------------------------------------
results_df = pd.DataFrame(all_results)
results_df.to_excel(output_file, index=False)

print("\n=== ONTARIO PRACTICAL SIGNIFICANCE RESULTS (with Prevalence Columns) ===")
print(results_df)
print(f"\nResults saved to: {output_file}")


=== Auto-generated Sheet Mapping (Ontario) ===
Sleep Apnea: {'age': 'Sleep Apnea_by_Age Group', 'sex': 'Sleep Apnea_by_Sex', 'income': 'Sleep Apnea_by_Income Group'}
High Blood Pressure: {'age': 'High Blood Pressure_by_Age Grou', 'sex': 'High Blood Pressure_by_Sex', 'income': 'High Blood Pressure_by_Income G'}
High Blood Cholesterol: {'age': 'High Blood Cholesterol_by_Age G', 'sex': 'High Blood Cholesterol_by_Sex', 'income': 'High Blood Cholesterol_by_Incom'}
Diabetes: {'age': 'Diabetes_by_Age Group', 'sex': 'Diabetes_by_Sex', 'income': 'Diabetes_by_Income Group'}
Chronic Fatigue Syndrome: {'age': 'Chronic Fatigue Syndrome_by_Age', 'sex': 'Chronic Fatigue Syndrome_by_Sex', 'income': 'Chronic Fatigue Syndrome_by_Inc'}
Mood Disorder: {'age': 'Mood Disorder_by_Age Group', 'sex': 'Mood Disorder_by_Sex', 'income': 'Mood Disorder_by_Income Group'}
Anxiety Disorder: {'age': 'Anxiety Disorder_by_Age Group', 'sex': 'Anxiety Disorder_by_Sex', 'income': 'Anxiety Disorder_by_Income Grou'}
Respirat

# Conclusion and Final Project Summary

### Summary of Findings
This focused analysis confirms that the profound health disparities driven by age and income are not just a national phenomenon but are deeply embedded **within Ontario**.

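* **Age remains the most powerful determinant** of chronic disease in the province. For high blood pressure, the prevalence among Ontario seniors (**52.36%**) is roughly **436 times** that among adolescents (**0.12%**), an absolute gap of about 52 percentage points [3]. Because the adolescent prevalence is so close to zero, the ratio is sensitive to small changes in that estimate, so the absolute gap is the more stable measure of practical significance.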
* **Income-based disparities are also substantial.** For musculoskeletal conditions, the prevalence in Ontario's lower-income groups is nearly **double** that of the highest-income groups, highlighting significant socioeconomic inequities within the province [26].
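
The high blood pressure figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two prevalence values from the summary; the alternative adolescent values in the loop are hypothetical and are there only to show how strongly a near-zero denominator moves the ratio.

```python
# Prevalence figures quoted in the summary (percent).
seniors = 52.36
adolescents = 0.12

absolute_difference = seniors - adolescents  # percentage points
relative_risk = seniors / adolescents        # ratio of prevalences

print(f"{absolute_difference:.2f} percentage points, ratio {relative_risk:.1f}")

# Hypothetical sensitivity check: shift the small denominator by 0.05 points.
for alt in (0.07, 0.17):
    print(f"adolescent prevalence {alt:.2f}% -> ratio {seniors / alt:.0f}")
```

The absolute gap barely moves under such a shift, while the ratio changes by hundreds, which is one reason to report both metrics together.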

### Final Project Conclusion
This five-part analysis has provided a comprehensive, multi-layered view of chronic disease in Canada. We began with a broad exploratory analysis, confirmed our findings with national and provincial statistical testing, and finally quantified the real-world magnitude of these health gaps.

The consistent and central finding is that **age and income are the most significant drivers of health inequity** for chronic conditions, both in Canada and within Ontario. This provides strong, data-driven evidence for public health officials to focus resources on age-appropriate interventions and strategies that address the social determinants of health to create a more equitable healthcare landscape.

---

### References
* [3] Statistics Canada. *Canadian Community Health Survey (CCHS) 2019–2020: Ontario Subsample Analysis*.
* [26] Public Health Ontario. *Socioeconomic Inequalities in Chronic Disease Prevalence in Ontario*.
