
# Data Key
*   gender
   * 0 = Male
   * 1 = Female
   * 2 = Non-binary / Third Gender
*   age
   * 0 = 18 - 24
   * 1 = 25 - 34
   * 2 = 35+
*   gym_exp
   * 0 = Less than 1 year
   * 1 = 1 - 3 years
   * 2 = 3 - 5 years
   * 3 = 5+ years
*   student
   * 0 = No
   * 1 = Yes
*   treatment
   * 0 = control (Human Expert survey)
   * 1 = treatment (AI survey)
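
The coding above can be captured in a small lookup so that coded values print as readable labels. This is a minimal sketch; the names `DATA_KEY` and `decode` are illustrative and are not part of the original notebook.

```python
# Hypothetical helper: the label maps are copied from the data key above.
DATA_KEY = {
    "gender": {0: "Male", 1: "Female", 2: "Non-binary / Third Gender"},
    "age": {0: "18 - 24", 1: "25 - 34", 2: "35+"},
    "gym_exp": {
        0: "Less than 1 year",
        1: "1 - 3 years",
        2: "3 - 5 years",
        3: "5+ years",
    },
    "student": {0: "No", 1: "Yes"},
    "treatment": {0: "control (Human Expert survey)", 1: "treatment (AI survey)"},
}


def decode(column, code):
    """Return the label for a coded value, or the raw code if it is unknown."""
    return DATA_KEY.get(column, {}).get(code, code)


print(decode("gym_exp", 2))
```

With the survey loaded into a DataFrame, the same dictionaries could be passed to `Series.map`, for example `exp["gender"].map(DATA_KEY["gender"])`.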

In [3]:
!pip install pingouin
import pandas as pd
import pingouin
import statsmodels.formula.api as smf
import seaborn as sns



In [4]:
exp = pd.read_csv("/content/exp830.csv")

In [5]:
exp.drop(columns=['Unnamed: 0'], inplace=True)

In [6]:
exp

Unnamed: 0,gender,age,gym_exp,student,program/occupation,shoulders_reasonable,shoulders_trust,shoulders_follow,eating_reasonable,eating_trust,eating_follow,rest_reasonable,rest_trust,rest_followable,treatment
0,0,0,2,1,Computer Science,2,2,2,10,10,10,7,7,5,0
1,0,1,1,0,Software Engineer\n,8,8,9,7,7,4,8,8,7,0
2,0,0,2,1,BUSINESS ANALYTIC,4,4,2,8,7,6,9,10,10,0
3,0,1,0,0,investment manager,7,6,7,6,6,7,7,6,8,0
4,1,0,2,1,EMS com,9,8,8,7,7,7,6,6,6,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,1,0,0,0,Teaching Fellow,8,10,2,10,10,10,10,10,10,1
86,1,1,0,1,Mph,7,5,2,10,10,10,10,10,9,1
87,0,0,1,1,Data Science,7,6,6,5,5,6,1,0,6,1
88,0,2,3,0,Psychologist,6,4,4,6,5,5,6,5,5,1


In [25]:
exp["total_reasonable"] = exp[["shoulders_reasonable", "eating_reasonable", "rest_reasonable"]].sum(axis=1)
exp["total_trust"] = exp[["shoulders_trust", "eating_trust", "rest_trust"]].sum(axis=1)
exp["total_follow"] = exp[["shoulders_follow", "eating_follow", "rest_followable"]].sum(axis=1)
exp

Unnamed: 0,gender,age,gym_exp,student,program/occupation,shoulders_reasonable,shoulders_trust,shoulders_follow,eating_reasonable,eating_trust,...,rest_reasonable,rest_trust,rest_followable,treatment,average_reasonable,average_trust,average_follow,total_reasonable,total_trust,total_follow
0,0,0,2,1,Computer Science,2,2,2,10,10,...,7,7,5,0,19,19,17,19,19,17
1,0,1,1,0,Software Engineer\n,8,8,9,7,7,...,8,8,7,0,23,23,20,23,23,20
2,0,0,2,1,BUSINESS ANALYTIC,4,4,2,8,7,...,9,10,10,0,21,21,18,21,21,18
3,0,1,0,0,investment manager,7,6,7,6,6,...,7,6,8,0,20,18,22,20,18,22
4,1,0,2,1,EMS com,9,8,8,7,7,...,6,6,6,0,22,21,21,22,21,21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,1,0,0,0,Teaching Fellow,8,10,2,10,10,...,10,10,10,1,28,30,22,28,30,22
86,1,1,0,1,Mph,7,5,2,10,10,...,10,10,9,1,27,25,21,27,25,21
87,0,0,1,1,Data Science,7,6,6,5,5,...,1,0,6,1,13,11,18,13,11,18
88,0,2,3,0,Psychologist,6,4,4,6,5,...,6,5,5,1,18,14,14,18,14,14


## demographic information

# EXPERIMENT

## 01 Checking for proper randomization

### Check the proportion

In [9]:
from statsmodels.stats.proportion import proportions_ztest

In [13]:
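count = 45    # participants in one arm
nobs = 90     # total participants
values = .5   # expected share under a 1:1 allocation
# Two-sided z-test of the observed share against the expected share
stat, pval = proportions_ztest(count, nobs, value=values)
print('{0:0.3}'.format(pval))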

1.0


The p-value (1.0) is larger than 0.05, so we fail to reject the null hypothesis that participants were allocated in a 1:1 ratio. The group sizes are consistent with successful randomization.
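
To make the arithmetic behind this check explicit, here is a minimal sketch of a one-sample proportion z-test using only the standard library. The function name `one_sample_prop_ztest` is illustrative, and the sketch assumes the variance is estimated from the sample proportion, which is my understanding of the default in `proportions_ztest`. Because the observed split is exactly 45 of 90, the statistic is 0 and the p-value is 1 by construction.

```python
import math


def one_sample_prop_ztest(count, nobs, value=0.5):
    """Two-sided z-test of a sample proportion against a hypothesized value."""
    p_hat = count / nobs
    # Standard error based on the sample proportion (assumed default behavior)
    se = math.sqrt(p_hat * (1 - p_hat) / nobs)
    z = (p_hat - value) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = one_sample_prop_ztest(45, 90)
print(z, p)  # 0.0 1.0
```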

### Check if treatment and control have similar pre-experiment characteristics

In [17]:
import statsmodels.api as sm
balance_vars = ["gender", "age", "gym_exp", "student"]

regression_results = []
for col in balance_vars:
    X = exp["treatment"]
    X = sm.add_constant(X)
    y = exp[col]

    model = sm.OLS(y, X).fit()
    coef = model.params["treatment"]
    p_value = model.pvalues["treatment"]

    regression_results.append([col, coef, p_value])

# transform to DataFrame
regression_df = pd.DataFrame(regression_results, columns=["Variable", "Treatment Effect (Coef.)", "P-Value"])
regression_df

Unnamed: 0,Variable,Treatment Effect (Coef.),P-Value
0,gender,0.022222,0.841942
1,age,-0.111111,0.244704
2,gym_exp,-0.2,0.337615
3,student,-0.111111,0.200761


All p-values are greater than 0.05, so gender, age, gym experience, and student status show no significant differences between the Treatment and Control groups. The pre-experiment characteristics are balanced, which is consistent with successful randomization.
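
When a covariate is regressed on a binary treatment indicator, the slope equals the difference between the treated and control group means, which is why each coefficient in the balance table can be read as a mean difference. The sketch below checks this on made-up toy data (not the survey data); `ols_slope` is an illustrative helper.

```python
def mean(xs):
    return sum(xs) / len(xs)


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Toy data for illustration only: a 0/1 treatment flag and a coded covariate
treatment = [0, 0, 0, 0, 1, 1, 1, 1]
gym_exp = [2, 1, 2, 0, 0, 1, 3, 0]

slope = ols_slope(treatment, gym_exp)
treated_mean = mean([g for t, g in zip(treatment, gym_exp) if t == 1])
control_mean = mean([g for t, g in zip(treatment, gym_exp) if t == 0])

# The two quantities agree up to floating-point error
print(slope, treated_mean - control_mean)
```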

## 02 ATE

In [31]:
# ATE for Reasonability
mean_ctr_re = exp[exp['treatment'] == 0]['total_reasonable'].mean()
mean_trt_re = exp[exp['treatment'] == 1]['total_reasonable'].mean()
print("ATE of Reasonability :",(mean_trt_re - mean_ctr_re).round(3))

# ATE for Trust
mean_ctr_tr = exp[exp['treatment'] == 0]['total_trust'].mean()
mean_trt_tr = exp[exp['treatment'] == 1]['total_trust'].mean()
print("ATE of Trust :",(mean_trt_tr - mean_ctr_tr).round(3))

# ATE for Followability
mean_ctr_fo = exp[exp['treatment'] == 0]['total_follow'].mean()
mean_trt_fo = exp[exp['treatment'] == 1]['total_follow'].mean()
print("ATE of Followability :",(mean_trt_fo - mean_ctr_fo).round(3))

ATE of Reasonability : 0.244
ATE of Trust : -0.267
ATE of Followability : -0.2


**01 Reasonability:** On the summed three-item score, the AI group rates its recommendations 0.244 points higher than the expert group (ATE = +0.244). One possible explanation is that AI provides structured analysis, which makes the recommendations appear more logical to users.

**02 Trust**: The trust score for the AI group is 0.267 points lower, so users in this sample trust human experts slightly more than AI. This could be due to AI-generated recommendations lacking personalization or users harboring doubts about AI-generated content.

**03 Followability**: The followability score for the AI group is 0.2 points lower, so users in this sample are slightly more willing to follow expert recommendations. This may be because AI-generated advice lacks human experience and intuition, making users uncertain about its applicability. All three figures are raw differences in means; their statistical significance is assessed in the regression section.
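
A difference in means is easier to interpret next to its standard error. The following is a minimal sketch, assuming independent groups with possibly unequal variances; `ate_with_se` is an illustrative name and the scores are made-up toy values, not the survey data.

```python
import math
from statistics import mean, variance


def ate_with_se(treated, control):
    """Difference in means and its unequal-variance standard error."""
    diff = mean(treated) - mean(control)
    # statistics.variance is the sample variance (n - 1 denominator)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, se


# Toy scores for illustration only
treated_scores = [20, 22, 18, 24]
control_scores = [19, 21, 17, 23]

diff, se = ate_with_se(treated_scores, control_scores)
print(diff, round(se, 3))
```

An estimate that is small relative to its standard error cannot be distinguished from sampling noise, whatever its sign.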

## 03 Regression

In [32]:
%pip install stargazer



In [33]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from stargazer.stargazer import Stargazer
import matplotlib.pyplot as plt
import seaborn as sns

### Reasonability

In [37]:
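# Baseline fit with conventional standard errors. `fit` is not defined in the
# cells shown; this definition is an assumption chosen to match column (1) below.
fit = smf.ols('total_reasonable ~ treatment', data=exp).fit()
# Same model with heteroskedasticity-robust (HC1) standard errors
reg_robust_re = smf.ols('total_reasonable ~ treatment', data=exp).fit(cov_type='HC1')
Stargazer([fit, reg_robust_re])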

0,1,2
,,
,Dependent variable: total_reasonable,Dependent variable: total_reasonable
,,
,(1),(2)
,,
Intercept,19.156***,19.156***
,(0.730),(0.648)
treatment,0.244,0.244
,(1.032),(1.032)
Observations,90,90


The treatment coefficient is 0.244, suggesting that AI-generated recommendations are perceived as slightly more reasonable than expert recommendations. However, the effect is not statistically significant: the coefficient is only about a quarter of its standard error (1.032), so the observed difference could be due to random variation. Additionally, the R² value is extremely low (0.001), indicating that the treatment variable explains almost none of the variation in reasonability scores.

### Trust

In [45]:
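# Trust model with heteroskedasticity-robust (HC1) standard errors
reg_robust_tr = smf.ols('total_trust ~ treatment', data=exp).fit(cov_type='HC1')
# Note: `fit` is still the total_reasonable model from the Reasonability cell,
# so column (1) below is reasonability and only column (2) is trust.
Stargazer([fit, reg_robust_tr])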

0,1,2
,,
,,
,(1),(2)
,,
Intercept,19.156***,18.200***
,(0.730),(0.836)
treatment,0.244,-0.267
,(1.032),(1.247)
Observations,90,90
R2,0.001,0.001


In column (2), the treatment coefficient is -0.267, meaning users in this sample trust expert recommendations slightly more than AI-generated recommendations. However, this effect is not statistically significant (the coefficient is far smaller than its standard error of 1.247), so we cannot conclude that AI has a negative impact on trust. The R² value remains very low (0.001), suggesting that the treatment variable alone does not meaningfully explain trust levels.

### Followability

In [47]:
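# Followability model with heteroskedasticity-robust (HC1) standard errors
reg_robust_fl = smf.ols('total_follow ~ treatment', data=exp).fit(cov_type='HC1')
# Note: `fit` is still the total_reasonable model from the Reasonability cell,
# so column (1) below is reasonability and only column (2) is followability.
Stargazer([fit, reg_robust_fl])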

0,1,2
,,
,,
,(1),(2)
,,
Intercept,19.156***,17.156***
,(0.730),(0.923)
treatment,0.244,-0.200
,(1.032),(1.315)
Observations,90,90
R2,0.001,0.000


In column (2), the treatment coefficient is -0.200, indicating that users in this sample are slightly less likely to follow AI-generated recommendations than expert recommendations. However, the effect is not statistically significant (the coefficient is far smaller than its standard error of 1.315), meaning the observed difference could be due to random chance. The R² value is nearly zero (0.000), showing that the treatment variable explains essentially none of the variation in followability scores.
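
The significance statements for the three outcomes can be checked directly from the reported coefficients and robust standard errors. This is a rough sketch that assumes a normal reference distribution and a critical value of 1.96; `z_summary` is an illustrative helper, and the inputs are the values printed in the tables above.

```python
import math

# (coefficient, HC1 standard error) as reported in the regression tables
reported = {
    "total_reasonable": (0.244, 1.032),
    "total_trust": (-0.267, 1.247),
    "total_follow": (-0.200, 1.315),
}


def z_summary(coef, se, z_crit=1.96):
    """z statistic, two-sided normal p-value, and approximate 95% interval."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    interval = (coef - z_crit * se, coef + z_crit * se)
    return z, p_value, interval


for outcome, (coef, se) in reported.items():
    z, p_value, (low, high) = z_summary(coef, se)
    print(f"{outcome}: z={z:.2f}, p={p_value:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Each interval contains zero, which matches the conclusion that none of the three treatment effects is statistically significant at the 5% level.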