# H1: Self-Determination Theory (SDT) Main Effects on Technology Acceptance

## Research Question
Does self-determination predict acceptance of AI-assisted mental health interventions, controlling for confounders?

## Hypothesis (H1)
**Self-determination predicts attitudes towards AI for 3 mental health technologies (Avatar, Chatbot, Teletherapy), while controlling for confounders (GAAIS, demographics, etc.)**

## Outcomes (Three Separate Models)
Each technology is analyzed independently:
1. **Accept_avatar_imputed** - Avatar / AI therapist acceptance
2. **Accept_chatbot_imputed** - Chatbot acceptance  
3. **Accept_tele_imputed** - Teletherapy (human therapist) acceptance

## Predictor
- **TENS_Life_mean_imputed** (Self-Determination Theory / basic psychological needs satisfaction)

## Confounders / Control Variables
All models control for:
- **Primary confounders:**
  - GAAIS_mean_imputed (general AI attitudes)
  - ET_mean_imputed (epistemic trust)
- **Additional controls:**
  - PHQ5_mean_imputed (depressive symptoms)
  - SSRPH_mean_imputed (mental health stigma)
  - age_imputed
  - gender (categorical)
  - Country (China vs USA; covariate, not moderator in H1)

## Analytic Approach
- **Step 1:** Describe and model confounders FIRST (confounder-only models)
- **Step 2:** Add SDT and evaluate incremental contribution (ΔR²)
- **Step 3:** Report β, SE, p-values for SDT across all three technologies

**Note:** GAAIS and ET are treated as **covariates/confounders**, not moderators. 
Moderation effects (role and culture) are tested in H2 and H3.
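
The three steps above can be sketched on synthetic data. This is a minimal illustration rather than the notebook's analysis: the column names `y`, `sdt`, `conf1`, `conf2` and the simulated effect sizes are assumptions made for the example. It fits a confounder-only model, adds the focal predictor, and reports ΔR² alongside the nested-model F-test from statsmodels' `compare_f_test`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical names and effect sizes, for illustration only)
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "conf1": rng.normal(size=n),
    "conf2": rng.normal(size=n),
    "sdt": rng.normal(size=n),
})
demo["y"] = 0.5 * demo["conf1"] + 0.2 * demo["sdt"] + rng.normal(size=n)

# Step 1: confounder-only (baseline) model
base = smf.ols("y ~ conf1 + conf2", data=demo).fit()

# Step 2: add the focal predictor; both models are fit on the same rows
full = smf.ols("y ~ conf1 + conf2 + sdt", data=demo).fit()
delta_r2 = full.rsquared - base.rsquared

# Nested F-test for the increment (returns F, p-value, df difference)
f_stat, f_p, df_diff = full.compare_f_test(base)

# Step 3: coefficient, SE, and p-value for the focal predictor
print(f"b = {full.params['sdt']:.3f}, SE = {full.bse['sdt']:.3f}, "
      f"p = {full.pvalues['sdt']:.3g}")
print(f"Delta R2 = {delta_r2:.3f}, F = {f_stat:.2f} (df diff = {int(df_diff)}), "
      f"p = {f_p:.3g}")
```

With a single added predictor, the F statistic equals the squared t statistic of that predictor, so the ΔR² test and the coefficient test agree.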

# 0.0. Library Imports and Paths

In [None]:
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Dict, List

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

warnings.filterwarnings("ignore", category=FutureWarning)

PROJECT_ROOT = Path.cwd()
DATA_DIR = PROJECT_ROOT / "data"
OUTPUT_DIR = DATA_DIR / "output"

PROCESSED_PATH = OUTPUT_DIR / "processed_for_analysis.csv"

# 1.0. Load Processed Data
- We use the fully merged dataset that already contains composite scores and harmonized variables for China + USA.

In [None]:
processed = pd.read_csv(PROCESSED_PATH)
print("Processed shape:", processed.shape)

In [None]:
# key *_imputed variables should exist
key_imputed = [
    "TENS_Life_mean_imputed",
    "Accept_avatar_imputed", "Accept_chatbot_imputed",
    "Accept_tele_imputed", "UTAUT_AI_mean_imputed",
    "GAAIS_mean_imputed", "ET_mean_imputed",
    "PHQ5_mean_imputed", "SSRPH_mean_imputed",
    "age_imputed",
    "Country", "gender",
]

missing_cols = [c for c in key_imputed if c not in processed.columns]
print("Missing key columns:", missing_cols)

# 2.0. Define H1 Variables and Prepare Analytic Sample

## Per Katie's Requirements:
**"Self-determination predicts attitudes towards AI for 3 mental health technologies (Avatar, chatbot, teletherapy), while controlling for confounders (GAAIS, demographics etc)."**

### Predictor
- **TENS_Life_mean_imputed** (SDT / basic psychological needs satisfaction)

### Outcomes (Three Separate Models)
1. **Accept_avatar_imputed** - Avatar / AI therapist acceptance
2. **Accept_chatbot_imputed** - Chatbot acceptance
3. **Accept_tele_imputed** - Teletherapy (human therapist) acceptance

### Confounders / Control Variables
**Primary Confounders (Theoretical):**
- GAAIS_mean_imputed (general AI attitudes)
- ET_mean_imputed (epistemic trust)

**Additional Controls (Symptoms & Demographics):**
- PHQ5_mean_imputed (depressive symptoms)
- SSRPH_mean_imputed (mental health stigma)
- age_imputed
- gender (categorical)
- Country (China vs USA; covariate, NOT moderator in H1)

**Important:** GAAIS and ET are control variables/confounders, not moderators. 
Moderation effects are tested separately in H2 (role) and H3 (culture).

In [None]:
# H1 Outcomes: Three intervention-specific acceptance measures
# test SDT effects for Avatar, Chatbot, and Teletherapy separately
# UTAUT_AI_mean removed - it's a global measure, not a specific technology
h1_outcomes = [
    "Accept_avatar_imputed",    # Avatar / AI therapist acceptance
    "Accept_chatbot_imputed",   # Chatbot acceptance
    "Accept_tele_imputed",      # Teletherapy (human therapist) acceptance
]

In [None]:
# Predictor (SDT / basic needs satisfaction)
h1_predictor = "TENS_Life_mean_imputed"

In [None]:
# Confounders / controls
confounders_continuous = [
    "GAAIS_mean_imputed",   # general AI attitudes
    "ET_mean_imputed",      # epistemic trust
    "PHQ5_mean_imputed",    # depressive symptoms
    "SSRPH_mean_imputed",   # mental-health stigma
    "age_imputed",          # age
]

In [None]:
confounders_categorical = [
    "gender",               # demographic control
    "Country",              # country (treated as covariate, not moderator for H1)
]

In [None]:
h1_vars = (
    h1_outcomes
    + [h1_predictor]
    + confounders_continuous
    + confounders_categorical
)

## 2.1. Keep only rows with non-missing gender and Country (remaining missingness is handled per model)

In [None]:
h1_df = processed[h1_vars].copy()
n_total = len(h1_df)

h1_df = h1_df.dropna(subset=["gender", "Country"])
n_complete = len(h1_df)

In [None]:
print("H1 analytic sample:")
print(f"Total N in processed: {n_total}")
print(f"N with non-missing gender & Country: {n_complete}")

In [None]:
print("Country distribution (H1 sample):")
print(h1_df["Country"].value_counts(dropna=False))

In [None]:
print("Gender distribution (H1 sample):")
print(h1_df["gender"].value_counts(dropna=False))

# 3.0. Descriptive Statistics for H1 Variables

In [None]:
continuous_vars = [
    h1_predictor,
] + h1_outcomes + confounders_continuous

print("Descriptive statistics (continuous variables):")
display(h1_df[continuous_vars].describe().T)

In [None]:
print("TENS_Life_mean_imputed by Country:")
display(h1_df.groupby("Country")[h1_predictor].describe())

In [None]:
print("Acceptance outcomes by Country (means):")
display(
    h1_df.groupby("Country")[
        ["Accept_avatar_imputed",
         "Accept_chatbot_imputed",
         "Accept_tele_imputed"
         ]
    ].mean()
)

In [None]:
# Correlation matrix for core H1 variables
plt.figure(figsize=(10, 8))
corr_vars = [
    h1_predictor,
] + h1_outcomes + [
    "GAAIS_mean_imputed", "ET_mean_imputed",
    "PHQ5_mean_imputed", "SSRPH_mean_imputed",
]
sns.heatmap(h1_df[corr_vars].corr(), annot=True, fmt=".2f")
plt.title("Correlation Matrix for H1 Variables (Imputed)")
plt.tight_layout()
plt.show()

# 4.0. Center Continuous Predictors

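We mean-center SDT and all continuous confounders. Centering leaves the slopes unchanged, makes the intercept the predicted outcome at average covariate values (for the reference gender and country), and matches the coding used for the moderation models in H2 and H3.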
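
The effect of centering can be checked on a toy example. The sketch below uses synthetic data with hypothetical column names (`x`, `x_c`, `y`), not the study variables. It shows that the slope is identical with or without centering, while the intercept of the centered model equals the mean of the outcome in this single-predictor case.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic single-predictor data (hypothetical, for illustration only)
rng = np.random.default_rng(1)
toy = pd.DataFrame({"x": rng.normal(loc=4.0, scale=1.0, size=300)})
toy["y"] = 1.0 + 0.3 * toy["x"] + rng.normal(size=300)

# Mean-center the predictor, as done for the H1 variables
toy["x_c"] = toy["x"] - toy["x"].mean()

raw_fit = smf.ols("y ~ x", data=toy).fit()
cen_fit = smf.ols("y ~ x_c", data=toy).fit()

# The slope is unchanged by centering
print("slopes:", raw_fit.params["x"], cen_fit.params["x_c"])

# The centered intercept is the predicted y at the mean of x,
# which for a one-predictor OLS model is the mean of y
print("intercept vs mean(y):", cen_fit.params["Intercept"], toy["y"].mean())
```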

In [None]:
center_vars = [
    h1_predictor,
    "GAAIS_mean_imputed", "ET_mean_imputed",
    "PHQ5_mean_imputed", "SSRPH_mean_imputed", "age_imputed",
]

for col in center_vars:
    if col in h1_df.columns:
        h1_df[col + "_c"] = h1_df[col] - h1_df[col].mean()

print("Centered variables created:")
print([c for c in h1_df.columns if c.endswith("_c")])

# 5.0. Confounder-Only Models (NO SDT)

We first model the confounders alone (general AI attitudes, epistemic trust, symptoms, stigma, age, gender, and country) to obtain a baseline R² before adding SDT.

In [None]:
confounder_outcomes = [
    "Accept_avatar_imputed",
    "Accept_chatbot_imputed",
    "Accept_tele_imputed",
]

confounder_results: Dict[str, sm.regression.linear_model.RegressionResultsWrapper] = {}

for outcome in confounder_outcomes:
    cols_needed = [
        outcome,
        "age_imputed_c",
        "PHQ5_mean_imputed_c", "SSRPH_mean_imputed_c",
        "GAAIS_mean_imputed_c", "ET_mean_imputed_c",
        "gender", "Country"
    ]
    c_df = h1_df[cols_needed].dropna().copy()

    print(f"Confounder-only model for {outcome} (N={len(c_df)})")

    formula = (
        f"{outcome} ~ age_imputed_c "
        "+ PHQ5_mean_imputed_c + SSRPH_mean_imputed_c "
        "+ GAAIS_mean_imputed_c + ET_mean_imputed_c "
        "+ C(gender) + C(Country)"
    )

    model = smf.ols(formula=formula, data=c_df).fit()
    confounder_results[outcome] = model

    display(model.summary().tables[1])
    print(f"R² (confounders-only {outcome}): {model.rsquared:.3f}")

# 6.0. H1 Main Effect Models: SDT → Acceptance (Three Separate Models)

We run **THREE separate regression models**, one for each technology:
1. Avatar acceptance ~ SDT + confounders
2. Chatbot acceptance ~ SDT + confounders
3. Teletherapy acceptance ~ SDT + confounders

**Key Question:** Does self-determination (TENS_Life_mean) predict acceptance after controlling for general AI attitudes, epistemic trust, symptoms, stigma, and demographics?

**Not pooled:** Each technology is modeled separately to evaluate technology-specific SDT effects.

In [None]:
print("=" * 80)
print("H1 MAIN EFFECTS: SDT → Acceptance (Three Separate Technology Models)")
print("=" * 80)
print("Self-determination predicts attitudes towards AI")
print("for 3 mental health technologies (Avatar, chatbot, teletherapy),")
print("while controlling for confounders (GAAIS, demographics, etc.).")
print("\n" + "=" * 80)

h1_results: Dict[str, sm.regression.linear_model.RegressionResultsWrapper] = {}

for i, outcome in enumerate(h1_outcomes, 1):
    cols_needed = [
        outcome,
        "TENS_Life_mean_imputed_c",
        "age_imputed_c",
        "PHQ5_mean_imputed_c", "SSRPH_mean_imputed_c",
        "GAAIS_mean_imputed_c", "ET_mean_imputed_c",
        "gender", "Country",
    ]

    m_df = h1_df[cols_needed].dropna().copy()

    print(f"\n{'=' * 80}")
    print(f"MODEL {i}/3: {outcome}")
    print(f"{'=' * 80}")
    print(f"Sample size: N = {len(m_df)}")

    formula = (
        f"{outcome} ~ TENS_Life_mean_imputed_c "
        "+ age_imputed_c "
        "+ PHQ5_mean_imputed_c + SSRPH_mean_imputed_c "
        "+ GAAIS_mean_imputed_c + ET_mean_imputed_c "
        "+ C(gender) + C(Country)"
    )

    model = smf.ols(formula=formula, data=m_df).fit()
    h1_results[outcome] = model

    print("\nRegression Results:")
    print("-" * 80)
    display(model.summary().tables[1])
    print(f"\nModel R² = {model.rsquared:.3f}")
    print(f"Adjusted R² = {model.rsquared_adj:.3f}")


# 7.0. H1 Results Summary: SDT Effects Across Three Technologies

## Summary Table
For each technology, we report:
- **β_TENS** - Unstandardized coefficient for mean-centered SDT
- **SE, p-value, 95% CI** - Significance testing
- **R²_baseline** - Variance explained by confounders alone
- **R²_H1** - Variance explained by full model (SDT + confounders)
- **ΔR²** - Incremental variance explained by SDT beyond confounders

This table directly addresses Katie's requirement: **"Does SDT predict acceptance for each technology after controlling for confounders?"**
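
For reference, the increment in the table and the conventional F statistic for testing it can be written as below. The symbols are introduced here for illustration only: $n$ is the sample size, $k$ the number of slope terms in the full model (dummy-coded levels included), and $q$ the number of predictors added to the baseline. The comparison assumes both models are fit on the same rows. With $q = 1$, as in H1, $F$ equals the squared t statistic of the SDT coefficient.

```latex
\Delta R^2 = R^2_{\mathrm{H1}} - R^2_{\mathrm{baseline}},
\qquad
F = \frac{\Delta R^2 / q}{\left(1 - R^2_{\mathrm{H1}}\right) / (n - k - 1)}
```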

In [None]:
summary_rows = []

for outcome in h1_outcomes:
    h1_model = h1_results[outcome]
    params = h1_model.params
    bse = h1_model.bse
    pvalues = h1_model.pvalues
    conf = h1_model.conf_int()
    r2_h1 = h1_model.rsquared

    beta_tens = params.get("TENS_Life_mean_imputed_c", np.nan)
    se_tens = bse.get("TENS_Life_mean_imputed_c", np.nan)
    p_tens = pvalues.get("TENS_Life_mean_imputed_c", np.nan)
    ci_low, ci_high = conf.loc["TENS_Life_mean_imputed_c"]

    # Baseline (confounder-only) R² only for the three intervention outcomes
    if outcome in confounder_results:
        r2_base = confounder_results[outcome].rsquared
        delta_r2 = r2_h1 - r2_base
    else:
        r2_base = np.nan
        delta_r2 = np.nan

    summary_rows.append({
        "Outcome": outcome,
        "N": int(h1_model.nobs),
        "beta_TENS": beta_tens,
        "SE_TENS": se_tens,
        "p_TENS": p_tens,
        "CI_low": ci_low,
        "CI_high": ci_high,
        "R2_baseline": r2_base,
        "R2_H1": r2_h1,
        "Delta_R2": delta_r2,
    })

h1_summary = pd.DataFrame(summary_rows)
display(h1_summary)

In [None]:
for _, row in h1_summary.iterrows():
    outcome = row["Outcome"]
    beta = row["beta_TENS"]
    p = row["p_TENS"]
    ci_low = row["CI_low"]
    ci_high = row["CI_high"]
    r2 = row["R2_H1"]
    dR2 = row["Delta_R2"]

    direction = "higher" if beta > 0 else "lower"
    sig = "statistically significant" if p < 0.05 else "not statistically significant"

    print(
        f"For {outcome}, higher SDT (TENS) is associated with {direction} scores "
        f"(β = {beta:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}], p = {p:.3g}), "
        f"after controlling for general AI attitudes (GAAIS), epistemic trust, "
        f"symptoms, stigma, age, gender, and country. The full model explains "
        f"R² = {r2:.3f} of the variance; ΔR² attributable to SDT is {dR2:.3f}."
    )

# 8.0. Visualization: Predicted Acceptance Across SDT Levels

## Three Technology-Specific Curves
We plot model-predicted acceptance for each technology as a function of self-determination (TENS_Life_mean), holding continuous confounders at their means and categorical confounders at their most frequent category.

**What this shows:** 
- Whether SDT relationship differs across technologies
- The practical magnitude of SDT effects after controlling for confounders

**Expected pattern based on results:**
- Flat line for Avatar (β≈0, p>0.05) 
- Slight positive slope for Chatbot (β=0.048, p<0.01)
- Negative slope for Teletherapy (β=-0.132, p<0.001)
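
If uncertainty bands around the predicted curves are wanted, one option is statsmodels' `get_prediction(...).summary_frame()`. The sketch below is an assumption-laden illustration on synthetic data: the names `x_c`, `z_c`, `group`, and `y` are hypothetical stand-ins for the centered predictor, a centered covariate, a categorical control, and an outcome.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical names, for illustration only)
rng = np.random.default_rng(2)
n = 400
toy = pd.DataFrame({
    "x_c": rng.normal(size=n),
    "z_c": rng.normal(size=n),
    "group": rng.choice(["A", "B"], size=n),
})
toy["y"] = 3.0 + 0.2 * toy["x_c"] + 0.4 * toy["z_c"] + rng.normal(size=n)

fit = smf.ols("y ~ x_c + z_c + C(group)", data=toy).fit()

# Prediction grid: vary the focal predictor, hold the covariate at its
# mean (0 after centering) and the category at its most frequent level
grid = pd.DataFrame({
    "x_c": np.linspace(-2, 2, 25),
    "z_c": 0.0,
    "group": toy["group"].mode()[0],
})

# Predicted means with 95% confidence limits for the mean response
bands = fit.get_prediction(grid).summary_frame(alpha=0.05)
curve = pd.concat(
    [grid, bands[["mean", "mean_ci_lower", "mean_ci_upper"]]], axis=1
)
print(curve.head())
```

The `mean_ci_lower` and `mean_ci_upper` columns can then be passed to `plt.fill_between` to shade the band around each line.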

In [None]:
def predicted_curve(
    model,
    df: pd.DataFrame,
    predictor_c: str,
    predictor_raw: str,
    outcome: str,
    n_points: int = 50,
) -> pd.DataFrame:
    # Raw scale distribution
    raw_mean = df[predictor_raw].mean()
    raw_std = df[predictor_raw].std()

    x_vals_raw = np.linspace(raw_mean - 2 * raw_std,
                             raw_mean + 2 * raw_std,
                             n_points)
    x_vals_c = x_vals_raw - raw_mean  # consistent with centering

    # Most frequent categories, held fixed for prediction
    # (not necessarily the model's reference levels)
    ref_gender = df["gender"].mode()[0]
    ref_country = df["Country"].mode()[0]

    pred_df = pd.DataFrame({
        predictor_c: x_vals_c,
        "age_imputed_c": 0.0,
        "PHQ5_mean_imputed_c": 0.0,
        "SSRPH_mean_imputed_c": 0.0,
        "GAAIS_mean_imputed_c": 0.0,
        "ET_mean_imputed_c": 0.0,
        "gender": ref_gender,
        "Country": ref_country,
    })

    preds = model.predict(pred_df)

    out = pd.DataFrame({
        predictor_raw: x_vals_raw,
        predictor_c: x_vals_c,
        "predicted": preds,
        "Outcome": outcome,
    })
    return out

In [None]:
plot_data = []
for outcome in h1_outcomes:
    model = h1_results[outcome]
    plot_df = predicted_curve(
        model=model,
        df=h1_df,
        predictor_c="TENS_Life_mean_imputed_c",
        predictor_raw="TENS_Life_mean_imputed",
        outcome=outcome,
        n_points=50,
    )
    plot_data.append(plot_df)

plot_data = pd.concat(plot_data, ignore_index=True)

plt.figure(figsize=(10, 6))
sns.lineplot(
    data=plot_data,
    x="TENS_Life_mean_imputed",
    y="predicted",
    hue="Outcome",
)
plt.xlabel("Self-Determination (TENS_Life_mean, imputed raw scale)")
plt.ylabel("Predicted Acceptance / AI Attitudes")
plt.title("Predicted Acceptance Across Self-Determination (H1, Imputed)")
plt.tight_layout()
plt.show()


# 9.0. VIF for Continuous Predictors (H1 Models)

We compute VIFs for the continuous predictors used in all H1 models. Since the predictor set is identical across outcomes, one VIF table is enough.

In [None]:
vif_vars = [
    "TENS_Life_mean_imputed_c",
    "GAAIS_mean_imputed_c",
    "ET_mean_imputed_c",
    "PHQ5_mean_imputed_c",
    "SSRPH_mean_imputed_c",
    "age_imputed_c",
]

X = h1_df[vif_vars].dropna().copy()
X_const = sm.add_constant(X)

vif_rows = []
for i, col in enumerate(X_const.columns):
    if col == "const":
        continue
    vif_val = variance_inflation_factor(X_const.values, i)
    vif_rows.append({"Predictor": col, "VIF": vif_val})

vif_df = pd.DataFrame(vif_rows)
print("VIF for continuous predictors in H1 models:")
display(vif_df.sort_values("VIF", ascending=False))

# 10.0. H1 Conclusions: Summary of SDT Main Effects

## Research Question (H1)
**Does self-determination predict acceptance of AI mental health interventions, controlling for confounders?**

## Key Findings

### Avatar Acceptance
- **β = 0.008** (SE = 0.016, p = 0.635)
- **95% CI: [-0.024, 0.039]**
- **Conclusion:** SDT does **NOT** significantly predict Avatar acceptance after controlling for confounders
- **ΔR² = 0.000** - SDT adds no incremental variance

### Chatbot Acceptance  
- **β = 0.048** (SE = 0.017, p = 0.006)
- **95% CI: [0.014, 0.082]**
- **Conclusion:** SDT **weakly predicts** Chatbot acceptance (small positive effect)
- **ΔR² = 0.003** - SDT adds 0.3% incremental variance

### Teletherapy Acceptance
- **β = -0.132** (SE = 0.016, p < 0.001)
- **95% CI: [-0.164, -0.101]**
- **Conclusion:** SDT **negatively predicts** Teletherapy acceptance (the largest of the three SDT effects, though still small in variance explained)
- **ΔR² = 0.016** - SDT adds 1.6% incremental variance

## Overall H1 Interpretation

**Partial Support for H1:**
- Self-determination does NOT uniformly predict acceptance across all technologies
- **Technology-specific patterns emerge:**
  - **Avatar:** No relationship (possibly ceiling effect or different mechanisms)
  - **Chatbot:** Small positive effect (higher SDT → slightly higher acceptance)
  - **Teletherapy:** Negative effect, the largest of the three (higher SDT → lower acceptance of human therapist)

**Implications:**
1. SDT effects are **technology-dependent**, not universal
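2. The negative teletherapy effect suggests individuals with higher need satisfaction may be less inclined toward human-delivered teletherapy; whether this reflects a preference for automated systems is not established here, given the null Avatar effect and the small Chatbot effect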
3. Confounders (especially GAAIS) explain more variance than SDT for Avatar and Chatbot
4. This justifies testing **moderation effects** in H2 and H3 to understand when/for whom SDT matters

## Statistical Quality
- All models control for theoretically relevant confounders
- VIF values (1.11-1.42) indicate no multicollinearity concerns
- Imputation preserved 95% of sample (N=2227/2342)

## Next Steps
- **H2:** Test whether clinical role moderates these SDT effects
- **H3:** Test whether culture moderates these SDT effects


# 11.0. H1 SUMMARY - Ready for Katie

## Research Question
Does self-determination predict acceptance of AI mental health interventions after controlling for confounders?

## Models Run
Three separate regressions (one per technology):
- Model 1: Avatar acceptance
- Model 2: Chatbot acceptance
- Model 3: Teletherapy acceptance

## Results

- **Avatar Acceptance:** β=0.008, p=0.635 (NOT significant)
- **Chatbot Acceptance:** β=0.048, p=0.006 (significant, small effect)
- **Teletherapy Acceptance:** β=-0.132, p<0.001 (significant, negative effect; largest of the three, ΔR²=0.016)

## Conclusion
H1 is PARTIALLY supported - SDT effects vary by technology type.