# System Level Analysis

The presence of credibility indicators in the system will:

- $H_1$: *decrease* the *affirmation rate* for rumours with lower evidence levels.
- $H_2$: *increase* the *denial rate* for rumours with lower evidence levels.
- $H_3$: *increase* the *affirmation rate* for rumours with higher evidence levels.
- $H_4$: *decrease* the *denial rate* for rumours with higher evidence levels.

**Alternatively, focusing only on rumour proportions:**

The presence of credibility indicators in the system will:

- $H_{1b}$: *increase* the *rumour proportion* for high-evidence rumours.
- $H_{2b}$: *decrease* the *rumour proportion* for low-evidence rumours.
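Taken together, these directional hypotheses predict a treatment-by-evidence interaction: the treatment effect should be positive for high-evidence rumours and negative for low-evidence ones. The sketch below illustrates this reading on a tiny, made-up table. The column names match the dataset loaded below, but every number is invented for illustration.

```python
import pandas as pd

# Invented affirmation rates (NOT study data): two participants per group,
# each measured once per evidence level.
toy = pd.DataFrame({
    "treatment": ["Control"] * 4 + ["Treatment"] * 4,
    "evidence": ["High", "Low"] * 4,
    "Affirms": [0.50, 0.40, 0.60, 0.50, 0.70, 0.20, 0.80, 0.30],
})

# Mean affirmation rate in each treatment x evidence cell.
cell_means = (
    toy.groupby(["treatment", "evidence"])["Affirms"].mean().unstack("evidence")
)

# Treatment effect within each evidence level (Treatment minus Control).
effect = cell_means.loc["Treatment"] - cell_means.loc["Control"]

# Difference of the two effects: the interaction contrast.
interaction = effect["High"] - effect["Low"]

print(cell_means)
print(effect)
print(f"interaction contrast: {interaction:.2f}")
```

In this toy table the effect is positive for `High` (in line with $H_3$) and negative for `Low` (in line with $H_1$), so the interaction contrast is non-zero even though the two effects would cancel in a pooled treatment comparison.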

In [3]:
import pandas as pd
import numpy as np
import seaborn as sns

sns.set_style("whitegrid",{'axes.spines.left' : False,
                           'axes.spines.right': False,
                           'axes.spines.top': False,
                           'grid.linestyle': ':'})
sns.set_context("talk")
participants = pd.read_csv("../data/processed/mock_data/participant-schema.csv",index_col=0)
posts = pd.read_csv("../data/processed/mock_data/posts-schema.csv",index_col=0)
reshare_rates = pd.read_csv("../data/processed/mock_data/reshare_rates.csv")
print("Shape: {}x{}".format(*reshare_rates.shape))
reshare_rates.head()

Shape: 790x34


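| Unnamed: 0 | PROLIFIC_ID | STUDY_ID | SESSION_ID | age | educationLevel | politicalAffiliation | attendsProtests | gender_female | gender_intersex | gender_male | ... | socialMedias_tiktok | socialMedias_twitter | PROLIFIC_ID.1 | treatment | evidence | Affirms | Denies | Neutral | Questions | rumour_proportion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 79 | 146.0 | | right | 0.0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | Treatment | High | 0.8 | 0.5 | 0.0 | 0.5 | 0.8 |
| 1 | 1 | 1 | 79 | 146.0 | | right | 0.0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | Treatment | Low | 0.25 | 1.0 | 0.0 | 0.333333 | 0.25 |
| 2 | 2 | 1 | 28 | 100.0 | | centreRight | 0.0 | 1 | 0 | 0 | ... | 0 | 1 | 2 | Treatment | High | 0.333333 | 0.75 | 0.0 | 0.0 | 0.25 |
| 3 | 2 | 1 | 28 | 100.0 | | centreRight | 0.0 | 1 | 0 | 0 | ... | 0 | 1 | 2 | Treatment | Low | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 4 | 3 | 1 | 32 | 64.0 | communityCollege | centreLeft | 1.0 | 0 | 1 | 1 | ... | 0 | 1 | 3 | Treatment | High | 0.5 | 0.0 | 0.0 | 1.0 | 1.0 |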


## Hypothesis Testing

**Two-/Three-way ANOVA**

- Check whether there are statistically significant differences in reshare rates between the treatment and control groups (by evidence level and post-code).
- To simplify, limit the analysis to affirmations (most important for the rumour framework).

This analysis follows [this tutorial](https://www.reneshbedre.com/blog/anova.html).

In [5]:
from pingouin import mixed_anova

reshare_rates.mixed_anova(dv='Affirms',
                          between='treatment',
                          within='evidence',
                          subject='PROLIFIC_ID')

| | Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
|---|---|---|---|---|---|---|---|---|---|
| 0 | treatment | 0.001603 | 1 | 393 | 0.001603 | 0.017718 | 0.8941745 | 4.5e-05 | |
| 1 | evidence | 8.570643 | 1 | 393 | 8.570643 | 105.88687 | 3.782567e-22 | 0.212246 | 1.0 |
| 2 | Interaction | 2.612034 | 1 | 393 | 2.612034 | 32.27063 | 2.613915e-08 | 0.075883 | |


In [6]:
reshare_rates.mixed_anova(dv='rumour_proportion',
                          between='treatment',
                          within='evidence',
                          subject='PROLIFIC_ID')

| | Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
|---|---|---|---|---|---|---|---|---|---|
| 0 | treatment | 0.001112 | 1 | 360 | 0.001112 | 0.015491 | 0.9010174 | 4.3e-05 | |
| 1 | evidence | 7.723283 | 1 | 360 | 7.723283 | 103.713471 | 1.426024e-21 | 0.223659 | 1.0 |
| 2 | Interaction | 2.120741 | 1 | 360 | 2.120741 | 28.47875 | 1.679656e-07 | 0.073308 | |
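As a quick sanity check on the `np2` column (partial eta squared) in the two ANOVA outputs above, the effect size can be recovered from the reported `F`, `DF1` and `DF2` using the standard identity $\eta_p^2 = \frac{F \cdot df_1}{F \cdot df_1 + df_2}$. The helper name `partial_eta_squared` below is introduced only for this illustration; it is not part of pingouin.

```python
def partial_eta_squared(f_value: float, df1: int, df2: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df1) / (f_value * df1 + df2)


# (F, DF1, DF2, reported np2) copied from the ANOVA outputs above.
reported = {
    "Affirms / evidence": (105.88687, 1, 393, 0.212246),
    "Affirms / Interaction": (32.27063, 1, 393, 0.075883),
    "rumour_proportion / evidence": (103.713471, 1, 360, 0.223659),
    "rumour_proportion / Interaction": (28.47875, 1, 360, 0.073308),
}

for name, (f_value, df1, df2, np2) in reported.items():
    recomputed = partial_eta_squared(f_value, df1, df2)
    print(f"{name}: recomputed={recomputed:.6f}, reported={np2}")
```

The recomputed values agree with the reported `np2` column up to rounding, which confirms that `np2` is expressed relative to the error term of each effect and not to the total variance.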


### Regression Framework

Regress the dependent variable (reshare rate or rumour proportion) on the treatment ($x_t$), the rumour's evidence level ($x_e$), and their interaction ($x_tx_e$). Since there are two repeated measures per participant, add a random intercept for the participant ID ($u_j$):

$y = \beta_0 + \beta_1x_t + \beta_2x_e + \beta_3x_tx_e + u_j + \epsilon$

$u_j \thicksim N(0,\sigma^2_u)$

$\epsilon \thicksim N(0,\sigma^2_\epsilon)$
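A minimal sketch of fitting this kind of model, assuming `statsmodels` is available and using simulated data in place of the study's. The coefficients, sample size, the `Control` and `High` reference levels, and the names `sim`, `subject` and `y` are all assumptions made for the illustration. The random intercept $u_j$ is introduced by grouping on the participant column.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200  # hypothetical number of participants

# Two rows per participant: one per evidence level (the repeated measure).
subject = np.repeat(np.arange(n), 2)
treatment = np.repeat(np.where(np.arange(n) % 2 == 0, "Control", "Treatment"), 2)
evidence = np.tile(["High", "Low"], n)

# Dummy coding assumed here: x_t = 1 for Treatment, x_e = 1 for Low evidence.
x_t = (treatment == "Treatment").astype(float)
x_e = (evidence == "Low").astype(float)

u = np.repeat(rng.normal(0.0, 0.1, n), 2)  # participant random intercept u_j
eps = rng.normal(0.0, 0.1, 2 * n)          # residual noise

# Simulated outcome with made-up betas (beta_3 = -0.2 is the interaction).
y = 0.6 + 0.1 * x_t - 0.1 * x_e - 0.2 * x_t * x_e + u + eps

sim = pd.DataFrame(
    {"subject": subject, "treatment": treatment, "evidence": evidence, "y": y}
)

# "treatment * evidence" expands to both main effects plus their interaction.
model = smf.mixedlm("y ~ treatment * evidence", data=sim, groups=sim["subject"])
fit = model.fit()
print(fit.params)
```

With this coding, the coefficient labelled `treatment[T.Treatment]:evidence[T.Low]` plays the role of $\beta_3$: it measures how much the treatment effect differs between low- and high-evidence rumours.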

In [13]:
# Control variables
# Wrap each dummy column in C(...) so the formula treats it as categorical;
# the "none" dummies are skipped.
affil_columns = " + ".join([f"C({i})" for i in reshare_rates.columns if "affiliatedMovements" in i and "none" not in i])
gender_cols = " + ".join([f"C({i})" for i in reshare_rates.columns if "gender" in i and "none" not in i])
sm_cols = " + ".join([f"C({i})" for i in reshare_rates.columns if "socialMedias" in i and "none" not in i])
controls = f"age + C(educationLevel) + {affil_columns} + {gender_cols} + {sm_cols}"
controls[:80]+"..."

'age + C(educationLevel) + C(affiliatedMovements_climate) + C(affiliatedMovements...'
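The same construction can be sketched on a toy list of column names to show how the control terms might be appended to a model formula. The helper `control_terms`, the column `affiliatedMovements_none`, and the final formula are assumptions made for illustration; the notebook does not show how `controls` is used at this point.

```python
def control_terms(columns, prefix):
    """Wrap each column containing `prefix` in C(...), skipping the 'none' dummies."""
    return [f"C({c})" for c in columns if prefix in c and "none" not in c]


# Toy column names; only a few of the dataset's dummy columns are represented.
columns = [
    "age",
    "educationLevel",
    "affiliatedMovements_climate",
    "affiliatedMovements_none",  # hypothetical 'none' dummy, filtered out below
    "gender_female",
    "gender_male",
    "socialMedias_twitter",
]

terms = ["age", "C(educationLevel)"]
for prefix in ("affiliatedMovements", "gender", "socialMedias"):
    terms += control_terms(columns, prefix)

controls = " + ".join(terms)

# One possible way to combine the controls with the treatment x evidence terms.
formula = f"Affirms ~ C(treatment) * C(evidence) + {controls}"
print(formula)
```

Building the terms as a list and joining once avoids stray ` + ` separators when a prefix matches no columns, which the f-string version would otherwise produce.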