# System Level Analysis

Validate the assumptions of the mixed-effects model by following [this guide](https://ademos.people.uic.edu/Chapter18.html).

The presence of credibility indicators in the system will:

- $H_1$: *decrease* the *affirmation rate* for rumours with lower evidence levels.
- $H_2$: *increase* the *denial rate* for rumours with lower evidence levels.
- $H_3$: *increase* the *affirmation rate* for rumours with higher evidence levels.
- $H_4$: *decrease* the *denial rate* for rumours with higher evidence levels.

**Or, if focusing only on rumour proportions:**

The presence of credibility indicators in the system will:

- $H_{1b}$: *increase* the *rumour proportion* for rumours with higher evidence levels.
- $H_{2b}$: *decrease* the *rumour proportion* for rumours with lower evidence levels.
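Each hypothesis is a statement about the sign of a treatment-minus-control difference within one evidence level. The sketch below shows one way to tabulate those differences with pandas. The `toy` data frame and its values are hypothetical and only borrow the column names `condition`, `evidence`, `Affirms` and `Denies` from the study data.

```python
import pandas as pd

# Hypothetical toy data: two observations per condition x evidence cell.
toy = pd.DataFrame({
    "condition": ["Control", "Control", "Treatment", "Treatment"] * 2,
    "evidence":  ["High", "Low"] * 4,
    "Affirms":   [0.2, 0.3, 0.4, 0.1, 0.3, 0.2, 0.5, 0.1],
    "Denies":    [0.1, 0.1, 0.0, 0.3, 0.2, 0.0, 0.1, 0.2],
})

# Mean reshare rate in each evidence x condition cell.
cell_means = toy.groupby(["evidence", "condition"])[["Affirms", "Denies"]].mean()

# Treatment minus control, separately for each evidence level.
effect = (cell_means.xs("Treatment", level="condition")
          - cell_means.xs("Control", level="condition"))
print(effect)
```

In this made-up example the signs agree with all four hypotheses: the `Low` row has a negative `Affirms` difference ($H_1$) and a positive `Denies` difference ($H_2$), and the `High` row shows the opposite pattern ($H_3$, $H_4$).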

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

sns.set_style("whitegrid",{'axes.spines.left' : False,
                           'axes.spines.right': False,
                           'axes.spines.top': False,
                           'grid.linestyle': ':'})
sns.set_context("talk")
participants = pd.read_csv("../data/processed/607d6b3929befce813fe5ba2-participants.csv",index_col=0)
posts = pd.read_csv("../data/processed/607d6b3929befce813fe5ba2-rumour-results.csv",index_col=0)
reshare_rates = pd.read_csv("../data/processed/607d6b3929befce813fe5ba2-reshare_rates.csv")
print("Shape: {}x{}".format(*reshare_rates.shape))
reshare_rates.head()

Shape: 16x32


| | user_id | consent | timeSubmitted | educationLevel | attendsProtests | politicalAffiliation | age | gender_man | gender_woman | affiliatedMovements_climate | ... | socialMedias_snapchat | socialMedias_tiktok | socialMedias_twitter | condition | evidence | Affirms | Denies | Neutral | Questions | rumour_proportion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -7159046693702464628 | 1 | 2021-05-24 11:30:25.884000+00:00 | highSchool | False | centreRight | 28 | 1 | 0 | 1 | ... | 0 | 0 | 0 | Control | High | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 1 | -7159046693702464628 | 1 | 2021-05-24 11:30:25.884000+00:00 | highSchool | False | centreRight | 28 | 1 | 0 | 1 | ... | 0 | 0 | 0 | Control | Low | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 2 | -6619931360310195064 | 1 | 2021-05-24 12:17:00.336000+00:00 | highSchool | 0 | centreLeft | 19 | 1 | 0 | 1 | ... | 0 | 0 | 1 | Treatment | High | 0.1 | 0.0 | 0.0 | 0.5 | 1.0 |
| 3 | -6619931360310195064 | 1 | 2021-05-24 12:17:00.336000+00:00 | highSchool | 0 | centreLeft | 19 | 1 | 0 | 1 | ... | 0 | 0 | 1 | Treatment | Low | 0.1 | 0.1 | 0.0 | 0.0 | 0.5 |
| 4 | -6058563518878514163 | 1 | 2021-05-24 11:19:59.690000+00:00 | communityCollege | False | centre | 50 | 1 | 0 | 0 | ... | 1 | 1 | 0 | Control | High | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 |


## Hypothesis Testing

**Two-/Three-way ANOVA**

- Check whether there are statistically significant differences in reshare rates between the treatment and control groups (by evidence level and post code).
- To simplify, limit the analysis to affirmations, which are the most important reshare type for the rumour framework.

The analysis follows [this tutorial](https://www.reneshbedre.com/blog/anova.html).
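Because the within-subject factor (evidence) has only two levels, the interaction term of a mixed ANOVA can be cross-checked by hand: in a balanced design with complete data, its F statistic equals the squared pooled-variance t statistic comparing each participant's High-minus-Low difference between conditions. The sketch below uses only NumPy and pandas. The `toy` data and the `pooled_t` helper are hypothetical and are not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in long format: one row per participant and evidence level.
toy = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "condition": ["Control"] * 6 + ["Treatment"] * 6,
    "evidence":  ["High", "Low"] * 6,
    "Affirms":   [0.2, 0.3, 0.1, 0.3, 0.3, 0.2, 0.4, 0.1, 0.3, 0.1, 0.5, 0.1],
})

# One row per participant with the High-minus-Low difference score.
wide = toy.pivot(index="user_id", columns="evidence", values="Affirms")
wide["diff"] = wide["High"] - wide["Low"]
wide["condition"] = toy.drop_duplicates("user_id").set_index("user_id")["condition"]


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, plus its degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    dof = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / dof
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return t, dof


t_stat, dof = pooled_t(
    wide.loc[wide["condition"] == "Treatment", "diff"],
    wide.loc[wide["condition"] == "Control", "diff"],
)
f_interaction = t_stat ** 2  # comparable to the "Interaction" row's F, with DF1=1 and DF2=dof
print(f"t = {t_stat:.3f}, F = {f_interaction:.3f}, DF2 = {dof}")
```

A disagreement between this value and the ANOVA table would most likely point to missing data or an unbalanced design rather than to an error in either calculation.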

In [3]:
# Importing pingouin registers mixed_anova as a pandas DataFrame method
import pingouin as pg

reshare_rates.mixed_anova(dv='Affirms',
                          between='condition',
                          within='evidence',
                          subject='user_id')

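| | Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
|---|---|---|---|---|---|---|---|---|---|
| 0 | condition | 0.0375 | 1 | 6 | 0.0375 | 1.25 | 0.306309 | 0.172414 | NaN |
| 1 | evidence | 0.0225 | 1 | 6 | 0.0225 | 0.733696 | 0.424571 | 0.108959 | 1.0 |
| 2 | Interaction | 0.0135 | 1 | 6 | 0.0135 | 0.440217 | 0.531668 | 0.068354 | NaN |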


In [4]:
reshare_rates.mixed_anova(dv='rumour_proportion',
                          between='condition',
                          within='evidence',
                          subject='user_id')

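| | Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
|---|---|---|---|---|---|---|---|---|---|
| 0 | condition | 0.163333 | 1 | 4 | 0.163333 | 2.35514 | 0.19966 | 0.370588 | NaN |
| 1 | evidence | 0.717037 | 1 | 4 | 0.717037 | 4.453134 | 0.102462 | 0.526803 | 1.0 |
| 2 | Interaction | 0.013333 | 1 | 4 | 0.013333 | 0.082806 | 0.787823 | 0.020282 | NaN |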
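The error degrees of freedom (DF2) drop from 6 for `Affirms` to 4 for `rumour_proportion`. This is consistent with participants who lack a `rumour_proportion` value being excluded from the second test. The sketch below shows how that count could be checked, assuming a participant is dropped whenever either of their two measurements is missing. The `toy` data and the `complete_subjects` helper are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: participant 1 has no values, participant 4 has only one.
toy = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "condition": ["Control"] * 6 + ["Treatment"] * 6,
    "evidence":  ["High", "Low"] * 6,
    "rumour_proportion": [np.nan, np.nan, 1.0, 0.5, 0.0, 0.25,
                          0.5, np.nan, 1.0, 0.0, 0.75, 0.5],
})


def complete_subjects(df, dv, subject="user_id"):
    """Return the subjects whose measurements of `dv` are all present."""
    has_all = df.groupby(subject)[dv].apply(lambda s: s.notna().all())
    return has_all[has_all].index.tolist()


kept = complete_subjects(toy, "rumour_proportion")
n_groups = toy.loc[toy["user_id"].isin(kept), "condition"].nunique()

# Error degrees of freedom for the between-subject test: subjects minus groups.
df2 = len(kept) - n_groups
print(kept, df2)
```

Applied to `reshare_rates`, the same count would show whether the two ANOVA tables are based on the same participants.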


### Regression Framework

Regress the dependent variable (reshare rate or rumour proportion) on the treatment ($x_t$), the rumours' evidence level ($x_e$), and their interaction ($x_tx_e$). Since each participant contributes two repeated measures (one per evidence level), add a random intercept for the participant ID ($u_j$).

$y = \beta_0 + \beta_1x_t + \beta_2x_e + \beta_3x_tx_e + u_j + \epsilon$

$u_j \thicksim N(0,\sigma^2_u)$

$\epsilon \thicksim N(0,\sigma^2_\epsilon)$
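To connect the coefficients to the hypotheses, the fixed part of the model can be written out for each cell of the design. The dummy coding used here is an assumption, since the source does not state it: $x_t = 1$ for the treatment condition and $x_e = 1$ for high-evidence rumours, with control and low evidence as the reference levels. Because $u_j$ and $\epsilon$ have mean zero, the expected values are:

$$
\begin{aligned}
E[y \mid \text{control, low}] &= \beta_0 \\
E[y \mid \text{treatment, low}] &= \beta_0 + \beta_1 \\
E[y \mid \text{control, high}] &= \beta_0 + \beta_2 \\
E[y \mid \text{treatment, high}] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
$$

Under this coding the treatment effect is $\beta_1$ for low-evidence rumours and $\beta_1 + \beta_3$ for high-evidence rumours. With the affirmation rate as $y$, $H_1$ corresponds to $\beta_1 < 0$ and $H_3$ to $\beta_1 + \beta_3 > 0$. With the denial rate as $y$, $H_2$ corresponds to $\beta_1 > 0$ and $H_4$ to $\beta_1 + \beta_3 < 0$.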

In [5]:
# Control variables: one categorical term per one-hot column, skipping the "none" columns
affil_columns = " + ".join([f"C({i})" for i in reshare_rates.columns if "affiliatedMovements" in i and "none" not in i])
gender_cols = " + ".join([f"C({i})" for i in reshare_rates.columns if "gender" in i and "none" not in i])
sm_cols = " + ".join([f"C({i})" for i in reshare_rates.columns if "socialMedias" in i and "none" not in i])
controls = f"age + C(educationLevel) + {affil_columns} + {gender_cols} + {sm_cols}"
controls[:80]+"..."

'age + C(educationLevel) + C(affiliatedMovements_climate) + C(affiliatedMovements...'
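A control string like this can be appended to the fixed-effects part of a model formula. The sketch below shows one possible way to do that and to fit a random-intercept model with `statsmodels`. It is not taken from the original notebook: the helpers `build_formula` and `fit_random_intercept` are hypothetical, and the choice of `statsmodels.formula.api.mixedlm` with `user_id` as the grouping variable is an assumption.

```python
def build_formula(dv, controls=""):
    """Fixed-effects formula: treatment, evidence, their interaction, plus optional controls."""
    formula = f"{dv} ~ C(condition) * C(evidence)"
    if controls:
        formula += f" + {controls}"
    return formula


def fit_random_intercept(data, dv, controls=""):
    """Fit the model with a random intercept per participant (the u_j term)."""
    # Imported lazily so that build_formula works without statsmodels installed.
    import statsmodels.formula.api as smf

    # Drop rows with a missing dependent variable (e.g. NaN rumour_proportion).
    data = data.dropna(subset=[dv])
    model = smf.mixedlm(build_formula(dv, controls), data=data, groups=data["user_id"])
    return model.fit()


print(build_formula("Affirms", "age + C(educationLevel)"))
# Affirms ~ C(condition) * C(evidence) + age + C(educationLevel)
```

With only a handful of participants, adding every control at once may leave too few degrees of freedom to estimate the model, so a fit without controls (`fit_random_intercept(reshare_rates, "Affirms")`) is a reasonable first step.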