# H2: Clinical Role Moderates SDT → Acceptance Relationship

## Research Question
**Does clinical role moderate the relationship between self-determination (SDT) and acceptance of AI mental health interventions?**

## Hypothesis (H2)
**"Clinical role moderates this relationship × 3 technologies"**

We test whether the SDT → Acceptance relationship differs by clinical role, separately for each technology.

---

## TWO Separate Analyses

### Analysis A (H2a): Combined China + USA Sample
- **Sample:** China + USA participants with clear clinician or patient roles
- **Role Variable:** `role_binary` (clinician vs patient)
- **N ≈ 1632** (excludes USA community members)

**Models:**
1. Avatar: `Accept_avatar_imputed ~ TENS_Life × role_binary + confounders + Country`
2. Chatbot: `Accept_chatbot_imputed ~ TENS_Life × role_binary + confounders + Country`
3. Teletherapy: `Accept_tele_imputed ~ TENS_Life × role_binary + confounders + Country`

*"We can do the analysis of clinician vs patient in the Chinese and USA sample combined"*

---

### Analysis B (H2b): USA Sample Only
- **Sample:** USA participants only
- **Role Variable:** `role_label_usa3` (clinician vs patient vs community)
- **N ≈ 1742**

**Models:**
1. Avatar: `Accept_avatar_imputed ~ TENS_Life × role_label_usa3 + confounders`
2. Chatbot: `Accept_chatbot_imputed ~ TENS_Life × role_label_usa3 + confounders`
3. Teletherapy: `Accept_tele_imputed ~ TENS_Life × role_label_usa3 + confounders`

*"We can do the analysis on clinician vs patient vs community in the USA sample only"*

**Note:** Clinicians are coded as clinicians even if they have lived experience as patients.

---

## Confounders / Controls
All models control for:
- **GAAIS_mean_imputed** (general AI attitudes - control, NOT moderator)
- **ET_mean_imputed** (epistemic trust - control, NOT moderator)
- PHQ5_mean_imputed (depression)
- SSRPH_mean_imputed (stigma)
- age_imputed
- gender
- Country (in H2a only)

**Important:** GAAIS and ET are **covariates**, not moderators in H2.

---

## Analytic Approach
1. **Fit baseline model:** SDT + role + all confounders (main effects only)
2. **Fit interaction model:** SDT × role + all confounders
3. **Compare models:** Nested-model F-test (`anova_lm`) of the interaction model against the baseline
4. **If significant:** Probe simple slopes by role level

**Each analysis tests 3 technologies separately.**
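Step 4 (probing simple slopes) can be carried out directly from a fitted interaction model. The following is a minimal sketch on synthetic data, not the analysis itself: the column names `y`, `x_c`, and `role`, the helper `simple_slope`, and the simulated slopes are all assumptions made for illustration. The simple slope for a non-reference role is the SDT coefficient plus the matching interaction coefficient, and its standard error comes from the coefficient covariance matrix.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (illustrative only): the SDT slope is 0.2 for clinicians
# and 0.6 for patients.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "x_c": rng.normal(size=n),
    "role": np.repeat(["clinician", "patient"], n // 2),
})
true_slope = demo["role"].map({"clinician": 0.2, "patient": 0.6})
demo["y"] = true_slope * demo["x_c"] + rng.normal(scale=0.5, size=n)

demo_fit = smf.ols("y ~ x_c * C(role)", data=demo).fit()


def simple_slope(fit, focal, interaction=None):
    """Return (slope, SE) of `focal`, optionally adding one interaction term."""
    cov = fit.cov_params()
    slope = fit.params[focal]
    var = cov.loc[focal, focal]
    if interaction is not None:
        slope = slope + fit.params[interaction]
        var = var + cov.loc[interaction, interaction] + 2 * cov.loc[focal, interaction]
    return slope, np.sqrt(var)


# Reference level (clinician): the focal coefficient alone.
slope_clin, se_clin = simple_slope(demo_fit, "x_c")
# Non-reference level (patient): focal coefficient + interaction coefficient.
slope_pat, se_pat = simple_slope(demo_fit, "x_c", "x_c:C(role)[T.patient]")

print(f"clinician slope = {slope_clin:.3f} (SE {se_clin:.3f})")
print(f"patient slope   = {slope_pat:.3f} (SE {se_pat:.3f})")
```

For the real models, the focal term would be `TENS_Life_mean_imputed_c` and the interaction term would be read from `role_model.params.index`, since its exact name depends on the reference level.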

# 0.0 Imports and Path Setup

In [None]:
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Dict, List

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from IPython.display import display


warnings.filterwarnings("ignore", category=FutureWarning)
sns.set(style="whitegrid")

PROJECT_ROOT = Path.cwd()
DATA_DIR = PROJECT_ROOT / "data"
OUTPUT_DIR = DATA_DIR / "output"

PROCESSED_PATH = OUTPUT_DIR / "processed_for_analysis.csv"

In [None]:
processed = pd.read_csv(PROCESSED_PATH)
print("Processed shape:", processed.shape)

In [None]:
print("Unique Country values:")
print(processed["Country"].value_counts(dropna=False))

In [None]:
print("Unique role_binary values (clinician vs patient cross-country):")
print(processed["role_binary"].value_counts(dropna=False))

In [None]:
print("Unique role_label_usa3 values (USA-only 3-level role):")
print(processed["role_label_usa3"].value_counts(dropna=False))

# 1.0. Common H2 Definition

## 1.1. H2 Outcomes: Three Technology-Specific Acceptance Measures

**"Clinical role moderates this relationship × 3 technologies"**

Each technology is analyzed in separate models (NOT pooled):

In [None]:
h2_outcomes = [
    "Accept_avatar_imputed",    # Avatar / AI therapist acceptance
    "Accept_chatbot_imputed",   # Chatbot acceptance
    "Accept_tele_imputed",      # Teletherapy (human therapist) acceptance
]

## 1.2. Continuous variables to be centered within each analytic sample
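With the SDT score mean-centered, the role coefficients in an interaction model describe role differences at the sample-average SDT score rather than at a score of zero. Centering is done per analytic sample because the H2a and H2b samples have different means. A minimal sketch of the operation, using a hypothetical helper `center_columns` and made-up toy values:

```python
from __future__ import annotations

import pandas as pd


def center_columns(df: pd.DataFrame, cols: list[str], suffix: str = "_c") -> pd.DataFrame:
    """Return a copy of `df` with mean-centered versions of `cols` added."""
    out = df.copy()
    for col in cols:
        out[f"{col}{suffix}"] = out[col] - out[col].mean()
    return out


# Toy frame standing in for an analytic sample (values are made up).
toy = pd.DataFrame({"score": [1.0, 2.0, 3.0, 6.0], "age": [20.0, 30.0, 40.0, 50.0]})
toy_centered = center_columns(toy, ["score", "age"])
print(toy_centered[["score_c", "age_c"]].mean())
```

The variables listed in the next cell are the ones that receive this treatment.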

In [None]:
continuous_imputed = [
    "TENS_Life_mean_imputed", "GAAIS_mean_imputed", "ET_mean_imputed", 
    "PHQ5_mean_imputed", "SSRPH_mean_imputed", "age_imputed"
    ]

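## 1.3. Helper to fit the baseline and role-moderation models

The helper fits two nested models so they can be compared cleanly with ANOVA:
- Baseline main-effects model: SDT + role + confounders
- Role-moderation model: SDT * role + the same confounders

Arguments:
- `outcome`: e.g., 'Accept_chatbot_imputed'
- `df`: analytic sample containing the centered (`_c`) predictors
- `role_var`: 'role_binary' or 'role_label_usa3'
- `include_country`: True when using China+USA; False for USA-only
- `label`: string for printing (e.g., 'H2a' / 'H2b')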
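
The ANOVA comparison of the two models is a nested-model F-test. As a reminder of what `anova_lm` reports for two nested OLS fits (the notation is introduced here for illustration: $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$ are the residual sums of squares of the baseline and role-moderation models, and $df_0$ and $df_1$ are their residual degrees of freedom):

$$
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1) / (df_0 - df_1)}{\mathrm{RSS}_1 / df_1}
$$

With the two-level `role_binary`, $df_0 - df_1 = 1$ (one interaction term). With the three-level `role_label_usa3`, $df_0 - df_1 = 2$, so the F-test is an omnibus test of both interaction terms together.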

In [None]:
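def fit_role_moderation_for_outcome(
    outcome: str,
    df: pd.DataFrame,
    role_var: str,
    include_country: bool,
    label: str,
):
    """Fit baseline and SDT × role models on complete cases and compare them.

    Returns (sub_df, baseline_model, role_model), or (None, None, None)
    when no complete cases are available.
    """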
    cols = [
        outcome,
        "TENS_Life_mean_imputed_c",
        "GAAIS_mean_imputed_c",
        "ET_mean_imputed_c",
        "PHQ5_mean_imputed_c",
        "SSRPH_mean_imputed_c",
        "age_imputed_c",
        "gender",
        role_var,
    ]
    if include_country:
        cols.append("Country")

    sub_df = df[cols].dropna().copy()
    if sub_df.empty:
        print(f"[{label}] {outcome}: no complete cases available.")
        return None, None, None

    print(f"{label} – Role moderation for {outcome} (N={len(sub_df)})")

    # Baseline model: SDT + role + confounders
    baseline_formula = (
        f"{outcome} ~ "
        "TENS_Life_mean_imputed_c "
        "+ GAAIS_mean_imputed_c "
        "+ ET_mean_imputed_c "
        "+ PHQ5_mean_imputed_c "
        "+ SSRPH_mean_imputed_c "
        "+ age_imputed_c "
        "+ C(gender) "
        f"+ C({role_var})"
    )
    if include_country:
        baseline_formula += " + C(Country)"

    baseline_model = smf.ols(formula=baseline_formula, data=sub_df).fit()
    print("Baseline model (SDT + role + confounders):")
    display(baseline_model.summary().tables[1])
    print(f"R² (baseline) = {baseline_model.rsquared:.3f}")

    # Role-moderation model: SDT * role + confounders
    role_formula = (
        f"{outcome} ~ "
        f"TENS_Life_mean_imputed_c * C({role_var}) "
        "+ GAAIS_mean_imputed_c "
        "+ ET_mean_imputed_c "
        "+ PHQ5_mean_imputed_c "
        "+ SSRPH_mean_imputed_c "
        "+ age_imputed_c "
        "+ C(gender)"
    )
    if include_country:
        role_formula += " + C(Country)"

    role_model = smf.ols(formula=role_formula, data=sub_df).fit()
    print("Role-moderation model (SDT × role):")
    display(role_model.summary().tables[1])
    print(f"R² (role-moderation) = {role_model.rsquared:.3f}")

    # Model comparison via ANOVA
    print("Model comparison (Baseline vs Role-moderation):")
    comp = anova_lm(baseline_model, role_model)
    display(comp)

    return sub_df, baseline_model, role_model

# 2.0. Analysis A (H2a): Combined Sample - Clinician vs Patient

## Research Question
Does clinical role (clinician vs patient) moderate the SDT → Acceptance relationship across three technologies?

## Sample Characteristics
- **Countries:** China + USA combined
- **Role Variable:** `role_binary` (clinician vs patient)
- **Sample Size:** N ≈ 1632 (excludes USA community members)
- **Key Feature:** Tests whether SDT effects differ between clinicians and patients cross-culturally

## Models (Three Separate)
Each technology gets its own interaction model:
1. **Avatar:** Accept_avatar ~ TENS_Life × role_binary + confounders + Country
2. **Chatbot:** Accept_chatbot ~ TENS_Life × role_binary + confounders + Country  
3. **Teletherapy:** Accept_tele ~ TENS_Life × role_binary + confounders + Country

**Katie's requirement:** *"Analysis of clinician vs patient in the Chinese and USA sample combined"*

**Note:** Country is included as a **covariate** (not moderator - that's H3)

## 2.1. Build combined analytic sample with clinician vs patient

In [None]:
h2a_vars = (h2_outcomes + continuous_imputed + ["gender", "Country", "role_binary"])
h2a_df = processed[h2a_vars].copy()

In [None]:
# Keep only China + USA
h2a_df = h2a_df[h2a_df["Country"].isin(["China", "USA"])].copy()

# Keep only clear clinician vs patient (role_binary) cases
h2a_df = h2a_df[h2a_df["role_binary"].isin(["clinician", "patient"])].copy()

## 2.2. Drop missing gender / country to match confounder sets

In [None]:
h2a_df = h2a_df.dropna(subset=["gender", "Country", "role_binary"])

In [None]:
print("H2a analytic sample size:", len(h2a_df))
print("H2a Country distribution:")
print(h2a_df["Country"].value_counts(dropna=False))

In [None]:
print("H2a role_binary distribution (clinician vs patient):")
print(h2a_df["role_binary"].value_counts(dropna=False))

## 2.3. Center all continuous imputed predictors in combined sample

In [None]:
for col in continuous_imputed:
    mean_val = h2a_df[col].mean()
    h2a_df[f"{col}_c"] = h2a_df[col] - mean_val
    print(f"H2a – {col} mean for centering: {mean_val:.3f}")

In [None]:
print("H2a – means of centered variables:")
display(h2a_df[[f"{c}_c" for c in continuous_imputed]].mean())

## 2.4. Fit H2a models for each intervention-specific outcome

In [None]:
h2a_models: Dict[str, Dict[str, object]] = {}

for outcome in h2_outcomes:
    sub_df, base_m, role_m = fit_role_moderation_for_outcome(
        outcome=outcome,
        df=h2a_df,
        role_var="role_binary",
        include_country=True,
        label="H2a (China & USA, clinician vs patient)",
    )
    h2a_models[outcome] = {
        "data": sub_df,
        "baseline": base_m,
        "role_model": role_m,
    }

## 2.5. Summary table for SDT × Role (clinician vs patient) interaction

In [None]:
h2a_summary_rows = []

for outcome in h2_outcomes:
    role_model = h2a_models[outcome]["role_model"]
    if role_model is None:
        continue

    # Find the SDT × role_binary interaction term for the non-reference level,
    # e.g. 'TENS_Life_mean_imputed_c:C(role_binary)[T.patient]'
    int_terms = [
        p for p in role_model.params.index
        if p.startswith("TENS_Life_mean_imputed_c:C(role_binary)")
    ]
    if not int_terms:
        # Guard against an IndexError when role_binary has only one level
        print(f"[H2a] {outcome}: no SDT × role interaction term found.")
        continue

    term_name = int_terms[0]
    role_level = term_name.split("[T.")[-1].rstrip("]")

    beta = role_model.params[term_name]
    se = role_model.bse[term_name]
    p = role_model.pvalues[term_name]
    ci_low, ci_high = role_model.conf_int().loc[term_name]
    r2 = role_model.rsquared

    h2a_summary_rows.append({
        "Outcome": outcome,
        "Interaction_level": role_level,
        "beta_TENSxRole(diff_vs_clinician)": beta,
        "SE": se,
        "p": p,
        "CI_low": ci_low,
        "CI_high": ci_high,
        "R2_role_model": r2,
    })

h2a_summary_df = pd.DataFrame(h2a_summary_rows)
print("H2a: SDT × Role (clinician vs patient) interaction summary (China + USA):")
display(h2a_summary_df)

# 3.0. Analysis B (H2b): USA Sample Only - Three-Level Role

## Research Question  
Does clinical role (clinician vs patient vs community) moderate the SDT → Acceptance relationship in the USA sample?

## Sample Characteristics
- **Country:** USA only
- **Role Variable:** `role_label_usa3` (clinician vs patient vs community)
- **Sample Size:** N ≈ 1742
- **Key Feature:** Tests finer-grained role differences including community members

## Models (Three Separate)
Each technology gets its own interaction model:
1. **Avatar:** Accept_avatar ~ TENS_Life × role_label_usa3 + confounders
2. **Chatbot:** Accept_chatbot ~ TENS_Life × role_label_usa3 + confounders
3. **Teletherapy:** Accept_tele ~ TENS_Life × role_label_usa3 + confounders

*"Analysis on clinician vs patient vs community in the USA sample only"*

*"We can use the clinicians as clinicians (even if they are also patients, which is very common). We can use all patients unless they are also clinicians."*

**Note:** Country is constant (USA), so not included as covariate
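
Because `role_label_usa3` has three levels, the interaction model yields two SDT × role coefficients, each a contrast against a reference level. By default, patsy's treatment coding takes the first level in sorted order as the reference. The sketch below, on synthetic data with hypothetical column names `y`, `x_c`, and `role`, shows how the reference could be pinned explicitly with `Treatment(reference=...)` if a different baseline were wanted. It assumes lowercase role labels, which may not match the actual coding of `role_label_usa3`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (illustrative only); labels mirror the three USA roles.
rng = np.random.default_rng(1)
n = 300
demo = pd.DataFrame({
    "x_c": rng.normal(size=n),
    "role": np.tile(["clinician", "patient", "community"], n // 3),
})
demo["y"] = 0.3 * demo["x_c"] + rng.normal(size=n)

# Default treatment coding: the first level in sorted order is the reference.
default_fit = smf.ols("y ~ x_c * C(role)", data=demo).fit()

# Explicit reference level via patsy's Treatment contrast.
pinned_fit = smf.ols(
    "y ~ x_c * C(role, Treatment(reference='patient'))", data=demo
).fit()

default_terms = [p for p in default_fit.params.index if p.startswith("x_c:")]
pinned_terms = [p for p in pinned_fit.params.index if p.startswith("x_c:")]
print(default_terms)
print(pinned_terms)
```

Changing the reference level only reparameterizes the model: the fit and the omnibus ANOVA comparison are unchanged, but each interaction coefficient then answers a different pairwise question.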

In [None]:
h2b_vars = (h2_outcomes + continuous_imputed + ["gender", "Country", "role_label_usa3"])
h2b_df = processed[h2b_vars].copy()

## 3.1. Restrict to USA only

In [None]:
h2b_df = h2b_df[h2b_df["Country"] == "USA"].copy()

In [None]:
# Keep non-missing role_label_usa3 + gender
h2b_df = h2b_df.dropna(subset=["gender", "role_label_usa3"])

In [None]:
print("H2b analytic sample size (USA-only):", len(h2b_df))
print("H2b role_label_usa3 distribution:")
print(h2b_df["role_label_usa3"].value_counts(dropna=False))

In [None]:
# Center continuous variables within USA-only sample
for col in continuous_imputed:
    mean_val = h2b_df[col].mean()
    h2b_df[f"{col}_c"] = h2b_df[col] - mean_val
    print(f"H2b – {col} mean for centering (USA-only): {mean_val:.3f}")

In [None]:
print("H2b – means of centered variables (should be ≈ 0):")
display(h2b_df[[f"{c}_c" for c in continuous_imputed]].mean())

In [None]:
# Fit H2b models for each outcome (Country is constant, so not included)
h2b_models: Dict[str, Dict[str, object]] = {}
for outcome in h2_outcomes:
    sub_df, base_m, role_m = fit_role_moderation_for_outcome(
        outcome=outcome,
        df=h2b_df,
        role_var="role_label_usa3",
        include_country=False,
        label="H2b (USA-only, clinician vs patient vs community)",
    )
    h2b_models[outcome] = {
        "data": sub_df,
        "baseline": base_m,
        "role_model": role_m,
    }

In [None]:
# Summary table: SDT × each non-reference USA role. The reference is the first
# level in sorted order under patsy's default coding (expected: 'clinician').
h2b_summary_rows = []
for outcome in h2_outcomes:
    role_model = h2b_models[outcome]["role_model"]
    if role_model is None:
        continue

    print(f"[H2b] SDT × role interaction terms for {outcome}:")
    int_terms = [p for p in role_model.params.index
                 if "TENS_Life_mean_imputed_c:C(role_label_usa3)" in p]
    print(int_terms)

    for term in int_terms:
        beta = role_model.params[term]
        se = role_model.bse[term]
        p = role_model.pvalues[term]
        ci_low, ci_high = role_model.conf_int().loc[term]
        r2 = role_model.rsquared

        h2b_summary_rows.append({
            "Outcome": outcome,
            "Interaction_term": term,
            "beta": beta,
            "SE": se,
            "p": p,
            "CI_low": ci_low,
            "CI_high": ci_high,
            "R2_role_model": r2,
        })

h2b_summary_df = pd.DataFrame(h2b_summary_rows)
print("H2b: SDT × Role (role_label_usa3) interaction summary (USA-only):")
display(h2b_summary_df)

# 4.0. H2 OVERALL SUMMARY: Role Moderation Results

## Research Question
**Does clinical role moderate the relationship between self-determination and acceptance of AI mental health interventions?**

## Two Analyses Completed

### Analysis A (H2a): Combined China + USA  
- **Sample:** N ≈ 1632 (clinicians + patients only)
- **Role:** Clinician vs Patient (2-level)
- **Technologies:** Avatar, Chatbot, Teletherapy (3 separate models)
- **Covariate:** Country

### Analysis B (H2b): USA Only
- **Sample:** N ≈ 1742 (all role categories)
- **Role:** Clinician vs Patient vs Community (3-level)
- **Technologies:** Avatar, Chatbot, Teletherapy (3 separate models)
- **Note:** Country constant (USA), not included

---

## Combined Results Summary

Below we synthesize interaction effects across both analyses and all technologies.

In [None]:
# Create comprehensive H2 summary table
print("=" * 90)
print("H2 COMPREHENSIVE SUMMARY: Role Moderation Across Both Analyses")
print("=" * 90)

# Collect all interaction effects
h2_all_results = []

# H2a results (Combined sample)
print("\n" + "─" * 90)
print("ANALYSIS A (H2a): Combined China + USA - Clinician vs Patient")
print("─" * 90)

for outcome in h2_outcomes:
    base_model = h2a_models[outcome]["baseline"]
    role_model = h2a_models[outcome]["role_model"]
    
    if role_model is None:
        continue
    
    # Get interaction term (skip the outcome if none is present)
    int_terms = [p for p in role_model.params.index
                 if "TENS_Life_mean_imputed_c:C(role_binary)" in p]
    if not int_terms:
        continue
    int_term = int_terms[0]
    
    beta = role_model.params[int_term]
    se = role_model.bse[int_term]
    p = role_model.pvalues[int_term]
    
    r2_base = base_model.rsquared
    r2_full = role_model.rsquared
    delta_r2 = r2_full - r2_base
    
    tech_name = outcome.replace("Accept_", "").replace("_imputed", "").title()
    
    h2_all_results.append({
        "Analysis": "H2a (Combined)",
        "Technology": tech_name,
        "Role_Contrast": "Patient vs Clinician",
        "beta_interaction": beta,
        "SE": se,
        "p": p,
        "R2_baseline": r2_base,
        "R2_full": r2_full,
        "Delta_R2": delta_r2,
        "Significant": "Yes" if p < 0.05 else "No"
    })
    
    sig_symbol = "**" if p < 0.01 else "*" if p < 0.05 else ""
    print(f"{tech_name:12} | β = {beta:7.3f}, SE = {se:.3f}, p = {p:.3f}{sig_symbol:2} | ΔR² = {delta_r2:.4f}")

# H2b results (USA only)
print("\n" + "─" * 90)
print("ANALYSIS B (H2b): USA Only - Clinician vs Patient vs Community")
print("─" * 90)

for outcome in h2_outcomes:
    base_model = h2b_models[outcome]["baseline"]
    role_model = h2b_models[outcome]["role_model"]
    
    if role_model is None:
        continue
    
    tech_name = outcome.replace("Accept_", "").replace("_imputed", "").title()
    
    # Get all interaction terms (community and patient vs clinician)
    int_terms = [p for p in role_model.params.index 
                 if "TENS_Life_mean_imputed_c:C(role_label_usa3)" in p]
    
    r2_base = base_model.rsquared
    r2_full = role_model.rsquared
    delta_r2 = r2_full - r2_base
    
    print(f"\n{tech_name}:")
    
    for term in int_terms:
        role_level = term.split("[T.")[-1].rstrip("]").title()
        beta = role_model.params[term]
        se = role_model.bse[term]
        p = role_model.pvalues[term]
        
        h2_all_results.append({
            "Analysis": "H2b (USA only)",
            "Technology": tech_name,
            "Role_Contrast": f"{role_level} vs Clinician",
            "beta_interaction": beta,
            "SE": se,
            "p": p,
            "R2_baseline": r2_base,
            "R2_full": r2_full,
            "Delta_R2": delta_r2,
            "Significant": "Yes" if p < 0.05 else "No"
        })
        
        sig_symbol = "**" if p < 0.01 else "*" if p < 0.05 else ""
        print(f"  {role_level:10} vs Clinician | β = {beta:7.3f}, SE = {se:.3f}, p = {p:.3f}{sig_symbol}")
    
    print(f"  Overall ΔR² = {delta_r2:.4f}")

# Create comprehensive summary DataFrame
h2_summary_df = pd.DataFrame(h2_all_results)

print("\n" + "=" * 90)
print("COMBINED H2 SUMMARY TABLE")
print("=" * 90)
display(h2_summary_df)