# 🔬 Ontario-Specific Inferential Statistics

### **Project Objective**
Following the national-level analysis, this notebook performs a focused **inferential statistical analysis on the Ontario subset** of the Canadian Community Health Survey (CCHS) data. The objective is to test whether the health disparities observed across demographic groups within Ontario are **statistically significant**.

### **Key Question**
* Within Ontario, is there a statistically significant association between chronic disease prevalence and key demographic factors like **age, sex, and income**?

---

### **Methodology**
The analytical approach mirrors the national-level notebook, applying two main statistical tests to Ontario-specific bootstrap estimates:

* **Chi-square Test of Independence (χ²):** Used to test for significant associations between multi-category variables (Age Group, Income Group) and the prevalence of chronic conditions.
* **Two-Proportion Z-test:** Used to test for a significant difference in prevalence between males and females.
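
For reference, the two test statistics can be written as follows. The z statistic matches the unpooled standard error computed in `run_z_test`; the chi-square statistic is the standard Pearson form. The symbols $O_{ij}$, $E_{ij}$, $\hat p_k$, $n_k$, and $\Phi$ (the standard normal CDF) are notation introduced here for illustration and do not appear in the notebook itself.

$$
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{(\text{row}_i\ \text{total})\,(\text{column}_j\ \text{total})}{\text{grand total}}
$$

$$
z = \frac{\hat p_1 - \hat p_2}{\sqrt{\dfrac{\hat p_1(1-\hat p_1)}{n_1} + \dfrac{\hat p_2(1-\hat p_2)}{n_2}}}, \qquad p = 2\,\bigl(1 - \Phi(|z|)\bigr)
$$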

Results with a p-value below 0.05 are treated as statistically significant.
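
As a rough illustration of how the two tests and the 0.05 threshold fit together, the sketch below runs both on made-up numbers. The counts, the group sizes, and the `ALPHA` constant are hypothetical and are not CCHS estimates; only the `scipy.stats` functions are the ones this notebook imports.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

ALPHA = 0.05  # significance threshold stated in the methodology

# Hypothetical counts for three groups: [with condition, without condition]
table = np.array([
    [30, 170],
    [45, 155],
    [80, 120],
])
chi2, p_chi, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, significant = {p_chi < ALPHA}")

# Hypothetical prevalence and group size for two groups
p1, n1 = 0.20, 500
p2, n2 = 0.15, 500
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
z = (p1 - p2) / se
p_z = 2 * norm.sf(abs(z))  # two-sided p-value
print(f"z = {z:.2f}, significant = {p_z < ALPHA}")
```

Each row of the table is one group and the two columns are the outcome, which is the same layout the chi-square helper in this notebook builds from prevalence and weighted totals.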

### **Data Disclaimer**
Bootstrap estimate files derived from CCHS data are not shared in this repository due to licensing restrictions. The workflow and code logic are provided for transparency and reproducibility with similar datasets.
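
Because the estimate files are not distributed, it may help to see the sheet layout the code appears to expect. The column names below (`Group`, `Estimated Prevalence (%)`, `Weighted N`) are the ones the notebook's helper functions read. The values and the `check_sheet_layout` helper are hypothetical placeholders for anyone building a stand-in dataset.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Group", "Estimated Prevalence (%)", "Weighted N"]

def check_sheet_layout(df):
    """Return a list of problems found in one stand-in sheet (empty if none)."""
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in df.columns]
    if problems:
        return problems
    if not df["Estimated Prevalence (%)"].between(0, 100).all():
        problems.append("prevalence must be a percentage between 0 and 100")
    if not (df["Weighted N"] > 0).all():
        problems.append("Weighted N must be positive")
    return problems

# Made-up values for illustration only; not CCHS estimates
example_sex_sheet = pd.DataFrame({
    "Group": ["Male", "Female"],
    "Estimated Prevalence (%)": [12.5, 10.0],
    "Weighted N": [5000, 5200],
})
print(check_sheet_layout(example_sex_sheet))
```

A sheet that passes this check has the columns both tests need; the sex sheet additionally needs rows labelled `Male` and `Female` in `Group`.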

---

In [1]:
# ==============================================================
# Ontario-Specific Statistical Testing for All Chronic Conditions
# Author: Arun Acharya
# Description:
#   Auto-detects condition sheets, performs Chi-square & Z-tests
#   for Ontario population, outputs one master results table.
# ==============================================================

import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency, norm

# --------------------------------------------------------------
# 1. File path (Ontario bootstrap estimates)
# --------------------------------------------------------------
bootstrap_file = "C:/Users/achar/myprojects/Healthcare/ontario_bootstrap_estimates.xlsx"

# --------------------------------------------------------------
# 2. List of chronic conditions
# --------------------------------------------------------------
conditions = [
    "Sleep Apnea",
    "High Blood Pressure",
    "High Blood Cholesterol",
    "Diabetes",
    "Chronic Fatigue Syndrome",
    "Mood Disorder",
    "Anxiety Disorder",
    "Respiratory Condition",
    "Musculoskeletal Condition",
    "Cardiovascular Condition"
]

# --------------------------------------------------------------
# 3. Helper functions
# --------------------------------------------------------------
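def run_chi_square(df):
    """Chi-square test of independence across the groups in one sheet."""
    # Rebuild approximate positive/negative counts from prevalence and weighted totals.
    # Note: this treats 'Weighted N' as the effective sample size of each group.
    prop = df['Estimated Prevalence (%)'] / 100
    positive = prop * df['Weighted N']
    negative = df['Weighted N'] - positive
    # One row per group; columns are (has condition, does not have condition).
    # Built from local variables so the caller's DataFrame is not modified.
    contingency = np.column_stack([positive, negative])
    chi2, p, _, _ = chi2_contingency(contingency)
    return chi2, p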

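def run_z_test(df, group1_name, group2_name):
    """Two-sided z-test for two proportions (e.g., Male vs Female), unpooled SE."""
    g1 = df.loc[df['Group'] == group1_name]
    g2 = df.loc[df['Group'] == group2_name]
    if g1.empty or g2.empty:
        raise ValueError(f"Expected groups '{group1_name}' and '{group2_name}' in column 'Group'")
    p1, n1 = g1['Estimated Prevalence (%)'].iloc[0] / 100, g1['Weighted N'].iloc[0]
    p2, n2 = g2['Estimated Prevalence (%)'].iloc[0] / 100, g2['Weighted N'].iloc[0]
    # Standard error of the difference, using each group's own variance (unpooled)
    se = np.sqrt((p1*(1-p1))/n1 + (p2*(1-p2))/n2)
    z = (p1 - p2) / se
    # norm.sf avoids the precision loss of 1 - norm.cdf for large |z|
    p_value = 2 * norm.sf(abs(z))
    return z, p_value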

# --------------------------------------------------------------
# 4. Detect available sheets
# --------------------------------------------------------------
with pd.ExcelFile(bootstrap_file) as xls:  # close the workbook after reading sheet names
    all_sheets = xls.sheet_names
sheet_map = {cond: {"age": None, "sex": None, "income": None} for cond in conditions}

for sheet in all_sheets:
    for cond in conditions:
        if sheet.startswith(cond[:30]):  # prefix match; Excel caps sheet names at 31 characters
            lower = sheet.lower()
            if "_by_ag" in lower:  # also matches "_by_age"
                sheet_map[cond]["age"] = sheet
            elif "_by_se" in lower:
                sheet_map[cond]["sex"] = sheet
            elif "_by_in" in lower:
                sheet_map[cond]["income"] = sheet

print("=== Auto-generated Sheet Mapping for Ontario ===")
for cond, mapping in sheet_map.items():
    print(f"{cond}: {mapping}")

# --------------------------------------------------------------
# 5. Run statistical tests for Ontario
# --------------------------------------------------------------
all_results = []

for cond in conditions:
    print(f"\nProcessing: {cond}")
    mapping = sheet_map[cond]
    
    # Age Group → Chi-square
    if mapping["age"]:
        df_age = pd.read_excel(bootstrap_file, sheet_name=mapping["age"])
        chi2_age, p_age = run_chi_square(df_age)
        all_results.append([cond, "Age Group", "Chi-square", chi2_age, p_age, p_age < 0.05])
        print(f"  Age sheet used: {mapping['age']}")
    else:
        print(f"⚠ No Age sheet found for {cond}")
    
    # Sex → Z-test
    if mapping["sex"]:
        df_sex = pd.read_excel(bootstrap_file, sheet_name=mapping["sex"])
        z_sex, p_sex = run_z_test(df_sex, "Male", "Female")
        all_results.append([cond, "Sex (Male vs Female)", "Z-test", z_sex, p_sex, p_sex < 0.05])
        print(f"  Sex sheet used: {mapping['sex']}")
    else:
        print(f"⚠ No Sex sheet found for {cond}")
    
    # Income → Chi-square
    if mapping["income"]:
        df_income = pd.read_excel(bootstrap_file, sheet_name=mapping["income"])
        chi2_income, p_income = run_chi_square(df_income)
        all_results.append([cond, "Income Group", "Chi-square", chi2_income, p_income, p_income < 0.05])
        print(f"  Income sheet used: {mapping['income']}")
    else:
        print(f"⚠ No Income sheet found for {cond}")

# --------------------------------------------------------------
# 6. Save Ontario results
# --------------------------------------------------------------
results_df = pd.DataFrame(all_results, columns=[
    "Condition", "Stratifier", "Test", "Test Statistic", "p-value", "Significant (p<0.05)"
])

output_file = "C:/Users/achar/myprojects/Healthcare/Ontario_Significance_Results.xlsx"
results_df.to_excel(output_file, index=False)

print("\n=== ONTARIO STATISTICAL TEST RESULTS ===")
print(results_df)
print(f"\nOntario-specific results saved to: {output_file}")


=== Auto-generated Sheet Mapping for Ontario ===
Sleep Apnea: {'age': 'Sleep Apnea_by_Age Group', 'sex': 'Sleep Apnea_by_Sex', 'income': 'Sleep Apnea_by_Income Group'}
High Blood Pressure: {'age': 'High Blood Pressure_by_Age Grou', 'sex': 'High Blood Pressure_by_Sex', 'income': 'High Blood Pressure_by_Income G'}
High Blood Cholesterol: {'age': 'High Blood Cholesterol_by_Age G', 'sex': 'High Blood Cholesterol_by_Sex', 'income': 'High Blood Cholesterol_by_Incom'}
Diabetes: {'age': 'Diabetes_by_Age Group', 'sex': 'Diabetes_by_Sex', 'income': 'Diabetes_by_Income Group'}
Chronic Fatigue Syndrome: {'age': 'Chronic Fatigue Syndrome_by_Age', 'sex': 'Chronic Fatigue Syndrome_by_Sex', 'income': 'Chronic Fatigue Syndrome_by_Inc'}
Mood Disorder: {'age': 'Mood Disorder_by_Age Group', 'sex': 'Mood Disorder_by_Sex', 'income': 'Mood Disorder_by_Income Group'}
Anxiety Disorder: {'age': 'Anxiety Disorder_by_Age Group', 'sex': 'Anxiety Disorder_by_Sex', 'income': 'Anxiety Disorder_by_Income Grou'}
Respir
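
The shortened sheet names in the mapping above, such as `High Blood Pressure_by_Age Grou`, are consistent with Excel's 31-character limit on sheet names. The sketch below is an assumption about how such names could arise and be classified. `make_sheet_name` and `classify_sheet` are hypothetical helpers, not functions from this notebook, and the naming pattern `<condition>_by_<stratifier>` is inferred from the printed output.

```python
EXCEL_SHEET_NAME_LIMIT = 31  # maximum sheet-name length in Excel

def make_sheet_name(condition, stratifier):
    """Build '<condition>_by_<stratifier>' and cut it to Excel's limit."""
    return f"{condition}_by_{stratifier}"[:EXCEL_SHEET_NAME_LIMIT]

def classify_sheet(sheet_name):
    """Map a (possibly truncated) sheet name to 'age', 'sex', 'income', or None."""
    lower = sheet_name.lower()
    for marker, label in (("_by_ag", "age"), ("_by_se", "sex"), ("_by_in", "income")):
        if marker in lower:
            return label
    return None

for stratifier in ("Age Group", "Sex", "Income Group"):
    name = make_sheet_name("High Blood Cholesterol", stratifier)
    print(f"{name!r} -> {classify_sheet(name)}")
```

Matching on short markers such as `_by_ag` is what lets the detection step tolerate names that were cut off partway through the stratifier.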

# Conclusion and Interpretation

### Summary of Statistical Findings
The analysis indicates that the significant health disparities identified at the national level also hold **within Ontario**. For every chronic condition examined, the tests showed **statistically significant associations (p < 0.05)** with age, sex, and income.
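
One way to check a claim like this against the master results table is to ask whether every row is flagged as significant. The sketch below uses a small made-up table with the same column names as the notebook's results table. The rows and the `example_results` name are hypothetical placeholders, not the notebook's actual output.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the results table
example_results = pd.DataFrame(
    [
        ["Diabetes", "Age Group", "Chi-square", 120.4, 1e-6, True],
        ["Diabetes", "Sex (Male vs Female)", "Z-test", 3.1, 0.002, True],
        ["Diabetes", "Income Group", "Chi-square", 45.9, 1e-4, True],
    ],
    columns=["Condition", "Stratifier", "Test", "Test Statistic",
             "p-value", "Significant (p<0.05)"],
)

# True only if every test in the table met the threshold
all_significant = bool(example_results["Significant (p<0.05)"].all())

# Count significant results per condition to spot any exceptions
per_condition = example_results.groupby("Condition")["Significant (p<0.05)"].sum()

print(all_significant)
print(per_condition)
```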

### Practical Implications
* **Confirms Internal Disparities:** The results provide statistical evidence that key health equity challenges for Ontario lie within its own population, particularly age- and income-related disparities.
* **Supports Targeted Policy:** The findings validate the conclusion from the main report: focusing on provincial averages is insufficient. These results justify the need for targeted, intra-provincial public health strategies aimed at supporting older adults and lower-income communities in Ontario.
* **Completes the Analytical Narrative:** This Ontario-specific analysis reinforces the project's overall thesis that understanding internal, demographically driven health gaps is more useful for policy-making than broad provincial-level comparisons.