# Self-Determination Theory (SDT) → Acceptance of AI Mental-Health Interventions (H1)

### Goal of H1

Test whether self-determination (SDT; TENS_Life_mean_imputed) predicts acceptance of:
- Accept_avatar_imputed (AI avatar / generic AI therapist)
- Accept_chatbot_imputed (AI chatbot)
- Accept_tele_imputed (teletherapy / human therapist)
- UTAUT_AI_mean_imputed (general AI-assisted mental-health interventions)

Step 1: Model acceptance as a function of confounders
- General AI attitudes (GAAIS_mean_imputed)
- Epistemic trust (ET_mean_imputed)
- Symptoms, stigma, age, and demographics (controls)

Step 2: Add SDT (TENS) and evaluate its incremental contribution (ΔR²).

No role moderator is included here because role is unclear in the USA sample; role moderation will be tested later in the China sample only.

# 0.0. Library Imports and Paths

In [None]:
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Dict, List

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

warnings.filterwarnings("ignore", category=FutureWarning)

PROJECT_ROOT = Path.cwd()
DATA_DIR = PROJECT_ROOT / "data"
OUTPUT_DIR = DATA_DIR / "output"

PROCESSED_PATH = OUTPUT_DIR / "processed_for_analysis.csv"

# 1.0. Load Processed Data
- We use the fully merged dataset that already contains composite scores and harmonized variables for China + USA.

In [None]:
processed = pd.read_csv(PROCESSED_PATH)
print("Processed shape:", processed.shape)

In [None]:
# key *_imputed variables should exist
key_imputed = [
    "TENS_Life_mean_imputed",
    "Accept_avatar_imputed", "Accept_chatbot_imputed",
    "Accept_tele_imputed", "UTAUT_AI_mean_imputed",
    "GAAIS_mean_imputed", "ET_mean_imputed",
    "PHQ5_mean_imputed", "SSRPH_mean_imputed",
    "age_imputed",
    "Country", "gender",
]

missing_cols = [c for c in key_imputed if c not in processed.columns]
print("Missing key columns:", missing_cols)
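
The check above only prints the missing columns, so a later cell would fail with a less informative error. A minimal sketch of a fail-fast variant follows; `require_columns` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def require_columns(df: pd.DataFrame, required: list[str]) -> None:
    """Raise KeyError listing every required column that is absent from df."""
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")


# Toy frame standing in for the processed data (values are made up).
toy = pd.DataFrame(
    {"TENS_Life_mean_imputed": [3.2, 4.1], "Country": ["USA", "China"]}
)

# Passes silently because both columns exist.
require_columns(toy, ["TENS_Life_mean_imputed", "Country"])

# Fails loudly because "gender" is absent from the toy frame.
try:
    require_columns(toy, ["TENS_Life_mean_imputed", "gender"])
    failed = False
    message = ""
except KeyError as err:
    failed = True
    message = str(err)
```

In the notebook this would presumably be called as `require_columns(processed, key_imputed)` right after loading.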

# 2.0. Define H1 Variables and Prepare Analytic Sample
Predictor
- TENS_Life_mean_imputed (SDT / basic psychological needs satisfaction)

Outcomes
- Accept_avatar_imputed
- Accept_chatbot_imputed
- Accept_tele_imputed
- UTAUT_AI_mean_imputed

Confounders / Controls
- Primary confounders:
    - GAAIS_mean_imputed (general AI attitudes)
    - ET_mean_imputed (epistemic trust)
- Additional controls:
    - PHQ5_mean_imputed (depressive symptoms)
    - SSRPH_mean_imputed (mental-health stigma)
    - age_imputed (age)
    - gender (categorical)
    - Country (China vs. USA; treated as covariate, not moderator here)

In [None]:
# Outcomes (acceptance of interventions and global AI mental-health interventions)
h1_outcomes = [
    "Accept_avatar_imputed",
    "Accept_chatbot_imputed",
    "Accept_tele_imputed",
    "UTAUT_AI_mean_imputed",
]

In [None]:
# Predictor (SDT / basic needs satisfaction)
h1_predictor = "TENS_Life_mean_imputed"

In [None]:
# Confounders / controls
confounders_continuous = [
    "GAAIS_mean_imputed",   # general AI attitudes
    "ET_mean_imputed",      # epistemic trust
    "PHQ5_mean_imputed",    # depressive symptoms
    "SSRPH_mean_imputed",   # mental-health stigma
    "age_imputed",          # age
]

In [None]:
confounders_categorical = [
    "gender",               # demographic control
    "Country",              # country (treated as covariate, not moderator for H1)
]

In [None]:
h1_vars = (
    h1_outcomes
    + [h1_predictor]
    + confounders_continuous
    + confounders_categorical
)

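## 2.1. Keep only rows with non-missing gender and Country

In [None]:
h1_df = processed[h1_vars].copy()
n_total = len(h1_df)

# Only the categorical controls are screened here; each model cell later
# drops any remaining missing values for its own columns.
h1_df = h1_df.dropna(subset=["gender", "Country"])
n_complete = len(h1_df)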

In [None]:
print("H1 analytic sample:")
print(f"Total N in processed: {n_total}")
print(f"N with non-missing gender & Country: {n_complete}")

In [None]:
print("Country distribution (H1 sample):")
print(h1_df["Country"].value_counts(dropna=False))

In [None]:
print("Gender distribution (H1 sample):")
print(h1_df["gender"].value_counts(dropna=False))

# 3.0. Descriptive Statistics for H1 Variables

In [None]:
continuous_vars = [
    h1_predictor,
] + h1_outcomes + confounders_continuous

print("Descriptive statistics (continuous variables):")
display(h1_df[continuous_vars].describe().T)

In [None]:
print("TENS_Life_mean_imputed by Country:")
display(h1_df.groupby("Country")[h1_predictor].describe())

In [None]:
print("Acceptance outcomes by Country (means):")
display(
    h1_df.groupby("Country")[
        ["Accept_avatar_imputed",
         "Accept_chatbot_imputed",
         "Accept_tele_imputed",
         "UTAUT_AI_mean_imputed"]
    ].mean()
)

In [None]:
# Correlation matrix for core H1 variables
plt.figure(figsize=(10, 8))
corr_vars = [
    h1_predictor,
] + h1_outcomes + [
    "GAAIS_mean_imputed", "ET_mean_imputed",
    "PHQ5_mean_imputed", "SSRPH_mean_imputed",
]
sns.heatmap(h1_df[corr_vars].corr(), annot=True, fmt=".2f")
plt.title("Correlation Matrix for H1 Variables (Imputed)")
plt.tight_layout()
plt.show()

# 4.0. Center Continuous Predictors

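We mean-center SDT and all continuous confounders. Centering leaves the slopes unchanged, makes the intercept the predicted outcome at average values of the continuous predictors (for the reference gender and country), and keeps the variables consistent with the later moderation analyses.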

In [None]:
center_vars = [
    h1_predictor,
    "GAAIS_mean_imputed", "ET_mean_imputed",
    "PHQ5_mean_imputed", "SSRPH_mean_imputed", "age_imputed",
]

for col in center_vars:
    if col in h1_df.columns:
        h1_df[col + "_c"] = h1_df[col] - h1_df[col].mean()

print("Centered variables created:")
print([c for c in h1_df.columns if c.endswith("_c")])
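
To illustrate what centering does and does not change, here is a small sketch on synthetic data (the variables `x1`, `x2`, `y` and the helper `ols_coefs` are made up for illustration). It fits the same regression with raw and with mean-centered predictors and compares the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(5.0, 1.0, n)    # synthetic predictor on an arbitrary scale
x2 = rng.normal(30.0, 8.0, n)   # synthetic covariate (for example, age)
y = 1.0 + 0.5 * x1 - 0.02 * x2 + rng.normal(0.0, 1.0, n)


def ols_coefs(columns, y):
    """Return [intercept, slopes...] from an ordinary least-squares fit."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


raw = ols_coefs([x1, x2], y)
centered = ols_coefs([x1 - x1.mean(), x2 - x2.mean()], y)

# Slopes are identical; only the intercept moves.
slopes_match = np.allclose(raw[1:], centered[1:])
# With every predictor centered, the intercept equals the outcome mean.
intercept_is_mean_y = np.isclose(centered[0], y.mean())
print(slopes_match, intercept_is_mean_y)
```

In the models below, categorical terms are also present, so the intercept refers to the reference gender and country rather than to the overall outcome mean.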

# 5.0. Confounder-Only Models (NO SDT)

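We first report confounder-only models (general AI attitudes and epistemic trust, plus symptoms, stigma, age, gender, and country) before adding SDT. These baseline models are fit for the three intervention-specific outcomes; no baseline is fit for UTAUT_AI_mean_imputed.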

In [None]:
confounder_outcomes = [
    "Accept_avatar_imputed",
    "Accept_chatbot_imputed",
    "Accept_tele_imputed",
]

confounder_results: Dict[str, sm.regression.linear_model.RegressionResultsWrapper] = {}

for outcome in confounder_outcomes:
    cols_needed = [
        outcome,
        "age_imputed_c",
        "PHQ5_mean_imputed_c", "SSRPH_mean_imputed_c",
        "GAAIS_mean_imputed_c", "ET_mean_imputed_c",
        "gender", "Country"
    ]
    c_df = h1_df[cols_needed].dropna().copy()

    print(f"Confounder-only model for {outcome} (N={len(c_df)})")

    formula = (
        f"{outcome} ~ age_imputed_c "
        "+ PHQ5_mean_imputed_c + SSRPH_mean_imputed_c "
        "+ GAAIS_mean_imputed_c + ET_mean_imputed_c "
        "+ C(gender) + C(Country)"
    )

    model = smf.ols(formula=formula, data=c_df).fit()
    confounder_results[outcome] = model

    display(model.summary().tables[1])
    print(f"R² (confounders-only {outcome}): {model.rsquared:.3f}")

# 6.0. H1 Models: Add SDT (TENS_Life_mean_imputed_c)

- We test whether SDT adds explanatory power beyond confounders.

In [None]:
print("=== H1: SDT (TENS) → Acceptance per Intervention and Global AI Interventions ===")

h1_results: Dict[str, sm.regression.linear_model.RegressionResultsWrapper] = {}

for outcome in h1_outcomes:
    cols_needed = [
        outcome,
        "TENS_Life_mean_imputed_c",
        "age_imputed_c",
        "PHQ5_mean_imputed_c", "SSRPH_mean_imputed_c",
        "GAAIS_mean_imputed_c", "ET_mean_imputed_c",
        "gender", "Country",
    ]

    m_df = h1_df[cols_needed].dropna().copy()

    print(f"\nH1 model for {outcome} (N={len(m_df)})")

    formula = (
        f"{outcome} ~ TENS_Life_mean_imputed_c "
        "+ age_imputed_c "
        "+ PHQ5_mean_imputed_c + SSRPH_mean_imputed_c "
        "+ GAAIS_mean_imputed_c + ET_mean_imputed_c "
        "+ C(gender) + C(Country)"
    )

    model = smf.ols(formula=formula, data=m_df).fit()
    h1_results[outcome] = model

    display(model.summary().tables[1])
    print(f"R² (H1 {outcome}): {model.rsquared:.3f}")


# 7.0. ΔR² and Effect Size Summary for TENS
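For each outcome, the summary table reports:
- β_TENS (unstandardized coefficient), SE, p, 95% CI
- R²_baseline (confounder-only model; available for the three intervention outcomes only)
- R²_H1
- ΔR² = R²_H1 - R²_baseline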

In [None]:
summary_rows = []

for outcome in h1_outcomes:
    h1_model = h1_results[outcome]
    params = h1_model.params
    bse = h1_model.bse
    pvalues = h1_model.pvalues
    conf = h1_model.conf_int()
    r2_h1 = h1_model.rsquared

    beta_tens = params.get("TENS_Life_mean_imputed_c", np.nan)
    se_tens = bse.get("TENS_Life_mean_imputed_c", np.nan)
    p_tens = pvalues.get("TENS_Life_mean_imputed_c", np.nan)
    ci_low, ci_high = conf.loc["TENS_Life_mean_imputed_c"]

    # Baseline (confounder-only) R² only for the three intervention outcomes
    if outcome in confounder_results:
        r2_base = confounder_results[outcome].rsquared
        delta_r2 = r2_h1 - r2_base
    else:
        r2_base = np.nan
        delta_r2 = np.nan

    summary_rows.append({
        "Outcome": outcome,
        "N": int(h1_model.nobs),
        "beta_TENS": beta_tens,
        "SE_TENS": se_tens,
        "p_TENS": p_tens,
        "CI_low": ci_low,
        "CI_high": ci_high,
        "R2_baseline": r2_base,
        "R2_H1": r2_h1,
        "Delta_R2": delta_r2,
    })

h1_summary = pd.DataFrame(summary_rows)
display(h1_summary)
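
The `beta_TENS` column is an unstandardized slope, since the predictors are centered but not scaled. If a standardized effect size is wanted, one common conversion is b × SD(x) / SD(y). The sketch below checks that conversion on synthetic data; the variable names and the helpers `fit` and `z` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
tens = rng.normal(4.0, 0.8, n)     # stand-in for the SDT predictor
gaais = rng.normal(3.0, 0.6, n)    # stand-in for one covariate
accept = 2.0 + 0.4 * tens + 0.7 * gaais + rng.normal(0.0, 1.0, n)


def fit(y, *xs):
    """Return [intercept, slopes...] from an ordinary least-squares fit."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


def z(v):
    """Z-score a vector using the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)


# Unstandardized slope converted to a standardized coefficient.
b_tens = fit(accept, tens, gaais)[1]
beta_std = b_tens * tens.std(ddof=1) / accept.std(ddof=1)

# The same quantity obtained by regressing z-scored variables.
beta_from_z = fit(z(accept), z(tens), z(gaais))[1]
print(round(beta_std, 4), round(beta_from_z, 4))
```

Applied to the fitted models, the same conversion would use the standard deviations of the rows that actually entered each model.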

In [None]:
for _, row in h1_summary.iterrows():
    outcome = row["Outcome"]
    beta = row["beta_TENS"]
    p = row["p_TENS"]
    ci_low = row["CI_low"]
    ci_high = row["CI_high"]
    r2 = row["R2_H1"]
    dR2 = row["Delta_R2"]

    direction = "higher" if beta > 0 else "lower"
    sig = "statistically significant" if p < 0.05 else "not statistically significant"
    # ΔR² is NaN for outcomes that have no confounder-only baseline model
    dr2_text = (
        f"ΔR² attributable to SDT is {dR2:.3f}."
        if pd.notna(dR2)
        else "ΔR² is unavailable (no confounder-only baseline was fit)."
    )

    print(
        f"For {outcome}, higher SDT (TENS) is associated with {direction} scores "
        f"(β = {beta:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}], p = {p:.3g}; {sig}), "
        f"after controlling for general AI attitudes (GAAIS), epistemic trust, "
        f"symptoms, stigma, age, gender, and country. The full model explains "
        f"R² = {r2:.3f} of the variance; {dr2_text}"
    )
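
ΔR² on its own does not indicate whether the increment is larger than chance. One way to test it is a nested-model F-test, which statsmodels exposes as `compare_f_test` on the fitted full model. The sketch below uses synthetic data with made-up column names; it assumes both models are fit to exactly the same rows, which the comparison requires.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
toy = pd.DataFrame({
    "gaais_c": rng.normal(0, 1, n),
    "et_c": rng.normal(0, 1, n),
    "tens_c": rng.normal(0, 1, n),
})
toy["accept"] = (
    3 + 0.5 * toy["gaais_c"] + 0.2 * toy["et_c"]
    + 0.3 * toy["tens_c"] + rng.normal(0, 1, n)
)

# Step 1: confounders only. Step 2: confounders plus the SDT stand-in.
base = smf.ols("accept ~ gaais_c + et_c", data=toy).fit()
full = smf.ols("accept ~ gaais_c + et_c + tens_c", data=toy).fit()

delta_r2 = full.rsquared - base.rsquared
f_value, p_value, df_diff = full.compare_f_test(base)

# The same F statistic written in terms of ΔR².
f_manual = (delta_r2 / df_diff) / ((1 - full.rsquared) / full.df_resid)
print(f"ΔR² = {delta_r2:.3f}, F = {f_value:.2f}, p = {p_value:.3g}")
```

With a single added predictor, this F-test is equivalent to the t-test on that predictor's coefficient, so it mainly becomes informative when a block of several terms is added at once.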

# 8.0. Visualization – Predicted Acceptance Across SDT Levels
- For each outcome, we plot model-predicted acceptance across TENS_Life_mean_imputed on its raw scale (mean ± 2 SD), holding continuous covariates at their means and categorical covariates at their most frequent category.

In [None]:
def predicted_curve(
    model,
    df: pd.DataFrame,
    predictor_c: str,
    predictor_raw: str,
    outcome: str,
    n_points: int = 50,
) -> pd.DataFrame:
    # Raw scale distribution
    raw_mean = df[predictor_raw].mean()
    raw_std = df[predictor_raw].std()

    x_vals_raw = np.linspace(raw_mean - 2 * raw_std,
                             raw_mean + 2 * raw_std,
                             n_points)
    x_vals_c = x_vals_raw - raw_mean  # consistent with centering

    # Most frequent categories (may differ from the model's reference levels)
    ref_gender = df["gender"].mode()[0]
    ref_country = df["Country"].mode()[0]

    pred_df = pd.DataFrame({
        predictor_c: x_vals_c,
        "age_imputed_c": 0.0,
        "PHQ5_mean_imputed_c": 0.0,
        "SSRPH_mean_imputed_c": 0.0,
        "GAAIS_mean_imputed_c": 0.0,
        "ET_mean_imputed_c": 0.0,
        "gender": ref_gender,
        "Country": ref_country,
    })

    preds = model.predict(pred_df)

    out = pd.DataFrame({
        predictor_raw: x_vals_raw,
        predictor_c: x_vals_c,
        "predicted": preds,
        "Outcome": outcome,
    })
    return out

In [None]:
plot_data = []
for outcome in h1_outcomes:
    model = h1_results[outcome]
    plot_df = predicted_curve(
        model=model,
        df=h1_df,
        predictor_c="TENS_Life_mean_imputed_c",
        predictor_raw="TENS_Life_mean_imputed",
        outcome=outcome,
        n_points=50,
    )
    plot_data.append(plot_df)

plot_data = pd.concat(plot_data, ignore_index=True)

plt.figure(figsize=(10, 6))
sns.lineplot(
    data=plot_data,
    x="TENS_Life_mean_imputed",
    y="predicted",
    hue="Outcome",
)
plt.xlabel("Self-Determination (TENS_Life_mean, imputed raw scale)")
plt.ylabel("Predicted Acceptance")
plt.title("Predicted Acceptance Across Self-Determination (H1, Imputed)")
plt.tight_layout()
plt.show()
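
The lines above show point predictions only. If uncertainty bands are wanted, statsmodels can return confidence limits for the predicted mean through `get_prediction(...).summary_frame()`. The following is a sketch on synthetic data with made-up column names, not the notebook's fitted models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300
toy = pd.DataFrame({
    "tens_c": rng.normal(0, 1, n),
    "gaais_c": rng.normal(0, 1, n),
    "Country": rng.choice(["China", "USA"], n),
})
toy["accept"] = 3 + 0.3 * toy["tens_c"] + 0.5 * toy["gaais_c"] + rng.normal(0, 1, n)

model = smf.ols("accept ~ tens_c + gaais_c + C(Country)", data=toy).fit()

# Prediction grid: vary the focal predictor, hold the covariate at its
# centered mean (0) and the category at its most frequent level.
grid = pd.DataFrame({
    "tens_c": np.linspace(-2, 2, 25),
    "gaais_c": 0.0,
    "Country": toy["Country"].mode()[0],
})

# 95% confidence limits for the predicted mean at each grid point.
bands = model.get_prediction(grid).summary_frame(alpha=0.05)
curve = grid.assign(
    predicted=bands["mean"].to_numpy(),
    ci_low=bands["mean_ci_lower"].to_numpy(),
    ci_high=bands["mean_ci_upper"].to_numpy(),
)
print(curve.head())
```

The `ci_low` and `ci_high` columns could then be passed to `plt.fill_between` to shade a band around each predicted line.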


# 9.0. VIF for Continuous Predictors (H1 Models)

We compute VIFs for the continuous predictors used in all H1 models. Since the predictor set is identical across outcomes, one VIF table is enough.

In [None]:
vif_vars = [
    "TENS_Life_mean_imputed_c",
    "GAAIS_mean_imputed_c",
    "ET_mean_imputed_c",
    "PHQ5_mean_imputed_c",
    "SSRPH_mean_imputed_c",
    "age_imputed_c",
]

X = h1_df[vif_vars].dropna().copy()
X_const = sm.add_constant(X)

vif_rows = []
for i, col in enumerate(X_const.columns):
    if col == "const":
        continue
    vif_val = variance_inflation_factor(X_const.values, i)
    vif_rows.append({"Predictor": col, "VIF": vif_val})

vif_df = pd.DataFrame(vif_rows)
print("VIF for continuous predictors in H1 models:")
display(vif_df.sort_values("VIF", ascending=False))

Variance inflation factors (VIFs) for all continuous predictors ranged from 1.11 to 1.42, indicating negligible multicollinearity. SDT, general AI attitudes, epistemic trust, symptoms, stigma, and age therefore overlap little as linear predictors, so collinearity is unlikely to inflate the standard errors of the regression estimates. The categorical controls (gender, Country) are not part of this VIF table.
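
If the categorical controls should be covered as well, one option is to compute VIFs on the design matrix of a fitted model, which already contains the dummy-coded terms. The sketch below uses synthetic data and made-up column names; it relies on the `exog` and `exog_names` attributes of the underlying statsmodels model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 300
toy = pd.DataFrame({
    "tens_c": rng.normal(0, 1, n),
    "gaais_c": rng.normal(0, 1, n),
    "gender": rng.choice(["female", "male", "other"], n),
    "Country": rng.choice(["China", "USA"], n),
})
toy["accept"] = 3 + 0.3 * toy["tens_c"] + rng.normal(0, 1, n)

model = smf.ols(
    "accept ~ tens_c + gaais_c + C(gender) + C(Country)", data=toy
).fit()

# Design matrix as used in estimation, including intercept and dummies.
exog = model.model.exog
names = model.model.exog_names

# One VIF per non-intercept column of the design matrix.
vif_full = pd.DataFrame(
    [
        {"Term": name, "VIF": variance_inflation_factor(exog, i)}
        for i, name in enumerate(names)
        if name != "Intercept"
    ]
)
print(vif_full.sort_values("VIF", ascending=False))
```

VIFs for individual dummy columns depend on which level is the reference category, so they are best read as a rough screen rather than a property of the categorical variable as a whole.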