# System Level Analysis

Validate the assumptions of the mixed-effects model following [this guide](https://ademos.people.uic.edu/Chapter18.html).

The presence of credibility indicators in the system will:

- $H_1$: *decrease* the *affirmation rate* for rumours with lower evidence levels.
- $H_2$: *increase* the *denial rate* for rumours with lower evidence levels.
- $H_3$: *increase* the *affirmation rate* for rumours with higher evidence levels.
- $H_4$: *decrease* the *denial rate* for rumours with higher evidence levels.

**Alternatively, if focusing only on rumour proportions:**

The presence of credibility indicators in the system will:

- $H_{1b}$: *increase* the *rumour proportion* for rumours with higher evidence levels.
- $H_{2b}$: *decrease* the *rumour proportion* for rumours with lower evidence levels.
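
Each hypothesis above amounts to a prediction about the sign of the Treatment-minus-Control difference within one evidence level. As a rough illustration, the sketch below computes those differences on a small made-up table; the values in `toy` are hypothetical and only the column names (`condition`, `evidence`, `Affirms`, `Denies`) are taken from the data loaded in this notebook.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the real dataset.
toy = pd.DataFrame({
    "condition": ["Treatment", "Treatment", "Control", "Control"] * 2,
    "evidence": ["High", "Low"] * 4,
    "Affirms": [0.1, 0.1, 0.5, 0.4, 0.2, 0.3, 0.3, 0.2],
    "Denies": [0.0, 0.1, 0.0, 0.2, 0.2, 0.6, 0.1, 0.3],
})

# Mean rate in each evidence x condition cell
cell_means = toy.groupby(["evidence", "condition"])[["Affirms", "Denies"]].mean()

# Treatment minus Control within each evidence level.
# H1/H4 predict negative entries (Low/Affirms, High/Denies),
# H2/H3 predict positive entries (Low/Denies, High/Affirms).
effect = (cell_means.xs("Treatment", level="condition")
          - cell_means.xs("Control", level="condition"))
print(effect)
```

These raw differences are only descriptive; whether they are distinguishable from noise is what the tests in the following sections address.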

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns

sns.set_style("whitegrid",{'axes.spines.left' : False,
                           'axes.spines.right': False,
                           'axes.spines.top': False,
                           'grid.linestyle': ':'})
sns.set_context("talk")
participants = pd.read_csv("../data/processed/60b37265a9f60881975de69e-participants.csv",index_col=0)
posts = pd.read_csv("../data/processed/60b37265a9f60881975de69e-rumour-results.csv",index_col=0)
reshare_rates = pd.read_csv("../data/processed/60b37265a9f60881975de69e-reshare_rates.csv")
print("Shape: {}x{}".format(*reshare_rates.shape))
reshare_rates.head()

Shape: 165x49


|  | user_id | consent | timeSubmitted | educationLevel | politicalAffiliation | attendsProtests | age | gender_man | gender_nonBinary | gender_woman | ... | socialMedias_snapchat | socialMedias_tiktok | socialMedias_twitter | condition | evidence | Affirms | Denies | Neutral | Questions | rumour_proportion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1091051618337871317 | 1 | 2021-06-03 14:35:53.721000+00:00 | undergrad | centre | False | 22 | 0 | 0 | 1 | ... | 1 | 1 | 1 | Treatment | High | 0.1 | 0.0 | 0.0 | 0.25 | 1.0 |
| 1 | -1091051618337871317 | 1 | 2021-06-03 14:35:53.721000+00:00 | undergrad | centre | False | 22 | 0 | 0 | 1 | ... | 1 | 1 | 1 | Treatment | Low | 0.1 | 0.1 | 0.5 | 0.5 | 0.5 |
| 2 | -1363632287012916703 | 1 | 2021-06-03 14:35:28.980000+00:00 | highSchool | left | False | 31 | 1 | 0 | 0 | ... | 1 | 1 | 0 | Treatment | High | 0.2 | 0.2 | 0.25 | 0.5 | 0.5 |
| 3 | -1363632287012916703 | 1 | 2021-06-03 14:35:28.980000+00:00 | highSchool | left | False | 31 | 1 | 0 | 0 | ... | 1 | 1 | 0 | Treatment | Low | 0.3 | 0.6 | 0.25 | 0.0 | 0.333333 |
| 4 | -1782714510727608207 | 1 | 2021-06-03 14:34:50.118000+00:00 | graduateSchool | centreRight | False | 36 | 1 | 0 | 0 | ... | 0 | 0 | 1 | Control | High | 0.5 | 0.0 | 0.25 | 0.0 | 1.0 |


## Hypothesis Testing

**Two-/Three-way ANOVA**

- Check whether there are statistically significant differences in reshare rates between the treatment and control groups (by evidence level and post-code).
- To simplify, limit the analysis to affirmations (the most important reshare type for the rumour framework).

The analysis follows [this ANOVA tutorial](https://www.reneshbedre.com/blog/anova.html).

In [3]:
from pingouin import mixed_anova

reshare_rates.mixed_anova(dv='Affirms',
                          between='condition',
                          within='evidence',
                          subject='user_id')

|  | Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
|---|---|---|---|---|---|---|---|---|---|
| 0 | condition | 0.013077 | 1 | 84 | 0.013077 | 0.292477 | 0.590069 | 0.00347 |  |
| 1 | evidence | 0.079484 | 1 | 84 | 0.079484 | 7.80041 | 0.006467 | 0.084971 | 1.0 |
| 2 | Interaction | 0.004994 | 1 | 84 | 0.004994 | 0.490089 | 0.485822 | 0.005801 |  |


In [4]:
reshare_rates.mixed_anova(dv='rumour_proportion',
                          between='condition',
                          within='evidence',
                          subject='user_id')

|  | Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
|---|---|---|---|---|---|---|---|---|---|
| 0 | condition | 0.20114 | 1 | 74 | 0.20114 | 1.298453 | 0.2581711 | 0.017244 |  |
| 1 | evidence | 4.921391 | 1 | 74 | 4.921391 | 64.327434 | 1.185034e-11 | 0.465037 | 1.0 |
| 2 | Interaction | 0.026244 | 1 | 74 | 0.026244 | 0.343029 | 0.5598679 | 0.004614 |  |
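
The `np2` column reports partial eta squared, which can be recovered from the `F`, `DF1` and `DF2` columns through the identity $\eta_p^2 = \frac{F \cdot DF_1}{F \cdot DF_1 + DF_2}$. The helper below is a hypothetical convenience function (not part of pingouin) that applies this identity to the `evidence` rows of the two tables as a sanity check.

```python
def partial_eta_squared(f_value, df1, df2):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df1) / (f_value * df1 + df2)

# 'evidence' row of the Affirms table: F = 7.80041, DF1 = 1, DF2 = 84
np2_affirms = partial_eta_squared(7.80041, 1, 84)

# 'evidence' row of the rumour_proportion table: F = 64.327434, DF1 = 1, DF2 = 74
np2_proportion = partial_eta_squared(64.327434, 1, 74)

print(round(np2_affirms, 4), round(np2_proportion, 4))
```

Both values agree with the reported `np2` entries to rounding: evidence level accounts for a far larger share of variance in rumour proportion than in the affirmation rate.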


### Regression Framework

Regress the dependent variable (reshare rate or rumour proportion) on the treatment ($x_t$), the rumours' evidence level ($x_e$), and their interaction ($x_tx_e$). Since there are two repeated measures for each participant, add a random intercept for the participant ID ($u_j$).

$y = \beta_0 + \beta_1x_t + \beta_2x_e + \beta_3x_tx_e + u_j + \epsilon$

$u_j \thicksim N(0,\sigma^2_u)$

$\epsilon \thicksim N(0,\sigma^2_e)$
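
To make the specification concrete, here is a minimal sketch that simulates data from the model above and fits it with a random intercept per participant. It assumes `statsmodels` is used for fitting, which this notebook has not shown so far; the parameter values, the 0/1 coding of $x_t$ and $x_e$, and the names `sim`, `pid`, `x_t`, `x_e` are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical "true" parameters, chosen only for illustration
n_participants = 200
b0, b_t, b_e, b_te = 0.3, 0.0, 0.1, 0.05
sigma_u, sigma_e = 0.15, 0.10

# Two rows per participant: one per evidence level (repeated measure)
pid = np.repeat(np.arange(n_participants), 2)
x_t = np.repeat(rng.integers(0, 2, n_participants), 2)    # 1 = Treatment (between-subjects)
x_e = np.tile([0, 1], n_participants)                      # 1 = High evidence (within-subjects)
u = np.repeat(rng.normal(0, sigma_u, n_participants), 2)   # random intercept u_j
eps = rng.normal(0, sigma_e, 2 * n_participants)           # residual noise

y = b0 + b_t * x_t + b_e * x_e + b_te * x_t * x_e + u + eps
sim = pd.DataFrame({"pid": pid, "x_t": x_t, "x_e": x_e, "y": y})

# "x_t * x_e" expands to both main effects plus the interaction;
# groups=... adds a random intercept for each participant.
fit = smf.mixedlm("y ~ x_t * x_e", data=sim, groups=sim["pid"]).fit()
print(fit.fe_params)   # estimates of beta_0 .. beta_3
print(fit.cov_re)      # estimated variance of u_j
```

With this coding, $\beta_1$ is the treatment effect for low-evidence rumours and $\beta_1 + \beta_3$ is the treatment effect for high-evidence rumours, so the interaction coefficient captures whether the indicators act differently across evidence levels.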

In [5]:
# Control variables: wrap each dummy-coded column in C(...) for the model formula,
# skipping the "none" columns
affil_columns = " + ".join([f"C({i})" for i in reshare_rates.columns if "affiliatedMovements" in i and "none" not in i])
sm_cols = " + ".join([f"C({i})" for i in reshare_rates.columns if "socialMedias" in i and "none" not in i])
gender_cols = " + ".join([f"C({i})" for i in reshare_rates.columns if "gender" in i and "none" not in i])
controls = f"age + C(educationLevel) + {affil_columns} + {sm_cols} + {gender_cols}"
# Preview the start of the controls string
controls[:80]+"..."

'age + C(educationLevel) + C(affiliatedMovements_climate) + C(affiliatedMovements...'