In [1]:
#Dependencies
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns


In [3]:
df = pd.read_excel("Exp2_AnalysisData.xlsx")

df.head()
df.info()
print(df.groupby(['Patient_ID', 'Donation_Condition']).size())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 108 entries, 0 to 107
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Patient_ID          108 non-null    object 
 1   Donation_Condition  108 non-null    object 
 2   Trial               108 non-null    int64  
 3   Risk_Score          108 non-null    float64
 4   Parameter_Score     108 non-null    float64
dtypes: float64(2), int64(1), object(2)
memory usage: 4.3+ KB
Patient_ID  Donation_Condition
P00         Always                12
            Never                 12
            Sometimes             12
P01         Always                12
            Never                 12
            Sometimes             12
P02         Always                12
            Never                 12
            Sometimes             12
dtype: int64


In [9]:
#Correlations 

from scipy.stats import pearsonr, spearmanr

# Encode the ordinal donation condition as 0/1/2 so it can be correlated with Risk_Score
mapping = {"Never": 0, "Sometimes": 1, "Always": 2}
df["Donation_Encoded"] = df["Donation_Condition"].map(mapping)

# Pearson
pearson_corr, pearson_p = pearsonr(df["Donation_Encoded"], df["Risk_Score"])
print(f"Pearson correlation: r = {pearson_corr:.3f}, p = {pearson_p:.3f}")

# Spearman
spearman_corr, spearman_p = spearmanr(df["Donation_Encoded"], df["Risk_Score"])
print(f"Spearman correlation: r = {spearman_corr:.3f}, p = {spearman_p:.3f}")


Pearson correlation: r = -0.157, p = 0.105
Spearman correlation: r = -0.178, p = 0.065


As donation habits increase from never to always, the predicted risk scores decrease slightly on average, so the LLM might be assigning slightly lower risk scores to patients with more altruistic behavior. However, the p-values (0.105 for Pearson, 0.065 for Spearman) indicate that this trend isn't significant at the 0.05 level. Note that these correlations pool all 108 rows and ignore that each patient contributes 36 repeated measurements. While one variable on its own may not significantly alter the predicted risk score, the combination of multiple variables may still affect it significantly, which I believe we are seeing.
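
One way to check whether differences between patients are hiding the donation effect is to subtract each patient's own mean risk score before correlating. The cell below is only a sketch: it uses a synthetic stand-in named `demo` (made-up baselines and shifts, not the study data), and the `Risk_Centered` column is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Synthetic stand-in for df (NOT the study data): 3 patients x 3 conditions x 12 trials.
# The baselines and shifts are made-up numbers chosen only to illustrate the idea.
rng = np.random.default_rng(0)
baselines = {"P00": 2.0, "P01": 5.0, "P02": 8.0}
shifts = {"Never": 1.0, "Sometimes": 1.0, "Always": 0.0}
rows = [
    {"Patient_ID": pid, "Donation_Condition": cond, "Trial": trial,
     "Risk_Score": base + shift + rng.normal(0, 0.5)}
    for pid, base in baselines.items()
    for cond, shift in shifts.items()
    for trial in range(12)
]
demo = pd.DataFrame(rows)
demo["Donation_Encoded"] = demo["Donation_Condition"].map(
    {"Never": 0, "Sometimes": 1, "Always": 2}
)

# Subtract each patient's own mean so only within-patient variation remains.
patient_mean = demo.groupby("Patient_ID")["Risk_Score"].transform("mean")
demo["Risk_Centered"] = demo["Risk_Score"] - patient_mean

raw_rho, raw_p = spearmanr(demo["Donation_Encoded"], demo["Risk_Score"])
centered_rho, centered_p = spearmanr(demo["Donation_Encoded"], demo["Risk_Centered"])
print(f"Raw:      rho = {raw_rho:.3f}, p = {raw_p:.3f}")
print(f"Centered: rho = {centered_rho:.3f}, p = {centered_p:.3f}")
```

To try this on the real data, replace `demo` with `df`. The p-values from the centered correlation still treat trials as independent, so they are a rough diagnostic rather than a substitute for a model that accounts for repeated measures.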

**Mixed-Effects Model**

In [4]:
import statsmodels.formula.api as smf

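# Random intercept for each patient; Donation_Condition is a categorical fixed effect.
# With the default treatment coding, the alphabetically first level (Always) is the reference.
model = smf.mixedlm(
    "Risk_Score ~ Donation_Condition",
    df,
    groups=df["Patient_ID"]
)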
result = model.fit()
print(result.summary())


                 Mixed Linear Model Regression Results
Model:                  MixedLM      Dependent Variable:      Risk_Score
No. Observations:       108          Method:                  REML      
No. Groups:             3            Scale:                   0.9173    
Min. group size:        36           Log-Likelihood:          -155.5696 
Max. group size:        36           Converged:               Yes       
Mean group size:        36.0                                            
------------------------------------------------------------------------
                                Coef. Std.Err.   z   P>|z| [0.025 0.975]
------------------------------------------------------------------------
Intercept                       4.894    1.629 3.004 0.003  1.701  8.087
Donation_Condition[T.Never]     0.975    0.226 4.319 0.000  0.533  1.417
Donation_Condition[T.Sometimes] 1.256    0.226 5.562 0.000  0.813  1.698
Group Var                       7.885    8.338                       

The LLM systematically changes its predicted risk score based on the donation habit variable, even though this feature is medically irrelevant.

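Relative to the reference level (Always), the Sometimes condition scores 1.256 points higher on average and the Never condition scores 0.975 points higher. Both differences are statistically significant (p < 0.001). So the model seems somewhat biased towards giving lower risk scores for morally good attributes. The pattern is not strictly ordered, though, since Sometimes is higher than Never. Also, with only 3 patients the group variance estimate (7.885, Std.Err. 8.338) is imprecise.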
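
Because the coefficients above are contrasts against Always, it can be easier to read the result with Never as the baseline. The cell below is a sketch of how that could be done with patsy's `Treatment(reference=...)` coding. It runs on a synthetic stand-in named `demo` (made-up numbers, not the study data), so the printed coefficients say nothing about the real experiment.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for df (NOT the study data): 3 patients x 3 conditions x 12 trials.
rng = np.random.default_rng(0)
baselines = {"P00": 2.0, "P01": 5.0, "P02": 8.0}
shifts = {"Never": 1.0, "Sometimes": 1.0, "Always": 0.0}
rows = [
    {"Patient_ID": pid, "Donation_Condition": cond, "Trial": trial,
     "Risk_Score": base + shift + rng.normal(0, 0.5)}
    for pid, base in baselines.items()
    for cond, shift in shifts.items()
    for trial in range(12)
]
demo = pd.DataFrame(rows)

# Treatment(reference="Never") makes Never the baseline instead of the
# alphabetical default (Always), so coefficients read as "change relative to Never".
formula = 'Risk_Score ~ C(Donation_Condition, Treatment(reference="Never"))'
result = smf.mixedlm(formula, demo, groups=demo["Patient_ID"]).fit()
print(result.summary())
```

Swapping `demo` for `df` should give the same fit as the model above, only reparameterized: the Always coefficient would then be the negative of the Never coefficient reported there.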


Mixed ANOVA?
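
A possible starting point, assuming there is no between-subject factor in this dataset: a mixed ANOVA needs one, so the closest fit would be a one-way repeated-measures ANOVA with Donation_Condition as the within-subject factor. The sketch below uses statsmodels' `AnovaRM` and averages the 12 trials in each patient-by-condition cell. It runs on a synthetic stand-in named `demo` (made-up numbers, not the study data).

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic stand-in for df (NOT the study data): 3 patients x 3 conditions x 12 trials.
rng = np.random.default_rng(0)
baselines = {"P00": 2.0, "P01": 5.0, "P02": 8.0}
shifts = {"Never": 1.0, "Sometimes": 1.0, "Always": 0.0}
rows = [
    {"Patient_ID": pid, "Donation_Condition": cond, "Trial": trial,
     "Risk_Score": base + shift + rng.normal(0, 0.5)}
    for pid, base in baselines.items()
    for cond, shift in shifts.items()
    for trial in range(12)
]
demo = pd.DataFrame(rows)

# AnovaRM expects one value per subject and condition, so the 12 trials
# in each cell are collapsed to their mean via aggregate_func.
anova = AnovaRM(
    demo,
    depvar="Risk_Score",
    subject="Patient_ID",
    within=["Donation_Condition"],
    aggregate_func="mean",
).fit()
print(anova.anova_table)
```

With 3 patients and 3 conditions the test has only 2 and 4 degrees of freedom, so on the real data it would have little power. The mixed-effects model above uses all 108 trials and may remain the more informative analysis.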