# Exploratory Analysis

* $RQ_1$: How does temporal proximity affect sharing behaviour during protest demonstrations?
* $RQ_2$: How do source attributes like the poster's gender affect sharing behaviour during protest demonstrations?
* $RQ_3$: Is there evidence of a "backfire effect", where misleading posts that *don't* have a credibility indicator are assumed to be true and are more likely to be reshared?

In [29]:
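library(lme4)        # glmer() for mixed-effects logistic regression
library(e1071)
library(parameters)

data <- read.csv("../../data/processed/60b37265a9f60881975de69e-rumour-results.csv")
# Recode the outcome from "True"/"False" strings to 1/0
data$reshared <- as.integer(as.logical(data$reshared))
# Set the reference levels: "neutral" for the post coding, "high" for the evidence level
data$code <- relevel(as.factor(data$code), ref = "neutral")
data$evidence <- relevel(as.factor(data$evidence), ref = "high")
head(data)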

| | user_id | condition | timeSubmitted | secondsTaken | id | rumour | code | evidence | warning | reshared | clickedWarning | timestamp | posterGender | posterId | misleading | untagged |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<chr>` | `<chr>` | `<int>` | `<dbl>` | `<chr>` | `<fct>` | `<fct>` | `<chr>` | `<int>` | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | 6.444755e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283849e+18 | R1 | neutral | high | False | 0 | False | 3 | male | RnJlZGVyaWNrIFdvb2RodHRwczovL3JhbmRvbXVzZXIubWUvYXBpL3BvcnRyYWl0cy90aHVtYi9tZW4vMTkuanBn | False | False |
| 2 | 6.444755e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283767e+18 | R1 | neutral | high | False | 0 | False | 4 | female | S2F0aWUgTXVycGh5aHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvd29tZW4vNjQuanBn | False | False |
| 3 | 6.444755e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283508e+18 | R1 | denies | high | True | 0 | False | 11 | male | R2FyeSBIb3BraW5zaHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvbWVuLzc4LmpwZw== | True | False |
| 4 | 6.444755e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283371e+18 | R1 | affirms | high | False | 0 | False | 11 | female | V2lsbGllIENsYXJraHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvd29tZW4vNy5qcGc= | False | False |
| 5 | 6.444755e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283538e+18 | R1 | affirms | high | False | 1 | False | 13 | female | TWVsaW5kYSBCYXJyZXR0aHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvd29tZW4vNjIuanBn | False | False |
| 6 | 6.444755e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283492e+18 | R1 | questions | high | False | 1 | False | 14 | male | TG9ubmllIE1pbGVzaHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvbWVuLzEwLmpwZw== | False | False |
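
The preview shows `user_id` and `id` parsed as `<dbl>`. Values of this magnitude (around 1e18) exceed 2^53, the limit up to which a double represents every integer exactly, so two distinct IDs can in principle collapse into one grouping level. Whether that happens in this dataset cannot be told from the preview. The sketch below uses made-up IDs and an inline CSV (both hypothetical, not taken from the data file) to show the effect and one possible guard, reading the ID columns as character via `colClasses`.

```r
# Hypothetical two-row CSV: the two post IDs differ only in the last digit.
csv_text <- paste(
  "user_id,id,reshared",
  "6444755000000000001,1283849000000000001,False",
  "6444755000000000001,1283849000000000002,True",
  sep = "\n"
)

# Default parsing: large integers fall back to double and lose precision.
as_dbl <- read.csv(text = csv_text)

# Guard: keep the ID columns as text so every distinct ID stays distinct.
as_chr <- read.csv(
  text = csv_text,
  colClasses = c(user_id = "character", id = "character")
)

# Number of distinct post IDs under each parsing strategy
n_ids_dbl <- length(unique(as_dbl$id))
n_ids_chr <- length(unique(as_chr$id))
c(double = n_ids_dbl, character = n_ids_chr)
```

Character IDs work as grouping factors in `(1 | user_id)` and `(1 | id)`, so a change like this would not alter the model formula; it would only change the column types shown in the preview.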


### Binomial Logistic Regression

Alternatively, add an interaction between warning and evidence, if we expect a credibility indicator to have a different effect on a post denying a high-evidence rumour than on a post affirming a low-evidence rumour.
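
A minimal sketch of what that alternative specification could look like, assuming the same predictors and random effects as the model fitted later in this section. The names `base_formula`, `alt_formula` and `fit_blr` are hypothetical and do not appear in the notebook; only the formulas are evaluated here, so the block runs without the data.

```r
# Baseline specification, copied from the glmer() call used in this notebook
base_formula <- reshared ~ posterGender + timestamp + warning + code * evidence +
  untagged + (1 | user_id) + (1 | id)

# Alternative: additionally let the warning effect differ by evidence level
alt_formula <- reshared ~ posterGender + timestamp + warning + code * evidence +
  warning:evidence + untagged + (1 | user_id) + (1 | id)

# Terms present in the alternative but not in the baseline
new_terms <- setdiff(
  attr(terms(alt_formula), "term.labels"),
  attr(terms(base_formula), "term.labels")
)
new_terms

# Hypothetical helper: fit either formula with the optimizer settings used here
fit_blr <- function(formula, data) {
  lme4::glmer(
    formula,
    data = data,
    family = binomial,
    control = lme4::glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
  )
}
```

With both models fitted, `anova(fit_blr(base_formula, data), fit_blr(alt_formula, data))` would give a likelihood-ratio comparison, which is one way to judge whether the extra interaction is worth keeping.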

> Our outcome variable was the intent to share the article corresponding to a headline with friends on social media. Since the outcome variable is a binary choice between the intent to share or not share, we employed Binomial Logistic Regression (BLR). The BLR incorporated participant ID and headline ID as random effects to account for repeated measures for varying items [5, 68]. Our independent variables included the treatment, headline category, political affiliation, social media use, and common demographic factors, such as age and gender. (Yaqub)


What I write:

> The outcome variable was the decision to share a post discussing one of the two rumours in the mock social media environment. Since the outcome is a binary choice between sharing and not sharing (56 observations per participant), I employed Binomial Logistic Regression (BLR). The BLR incorporated the participant ID and post ID (the unique ID from the Twitter API) as random effects to account for repeated measures for varying items and to measure the proportion of variance explained by post attributes. The independent variables included the post's evidence level (high/low), the post's coding (affirms/denies/neutral/questions) and their interaction. In addition, I included the "source gender" and timestamp randomly assigned to each post. Similar to Pennycook (implied truth), I also included whether the post had a credibility indicator attached to it and whether the post went "untagged", indicating that it was misleading but did not have a warning.
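
One common way to express the "proportion of variance" attributable to each random effect is a latent-scale intraclass correlation, which assumes a residual variance of $\pi^2/3$ for the logistic link. The sketch below applies that formula to the random-intercept variances printed in the `summary(md)` output in this section. The helper `icc_latent` is hypothetical, and the result is conditional on the fixed effects, so it is only a rough guide to how much variation sits at the post and participant levels.

```r
# Latent-variable ICC for a logistic mixed model with random intercepts.
# Assumption: the level-1 residual variance is pi^2 / 3 (standard logistic).
icc_latent <- function(var_target, var_all) {
  var_target / (sum(var_all) + pi^2 / 3)
}

# Random-intercept variances as printed by summary(md) in this notebook
re_var <- c(id = 0.6139, user_id = 1.9114)

icc_post <- icc_latent(re_var[["id"]], re_var)
icc_participant <- icc_latent(re_var[["user_id"]], re_var)

# Roughly 0.106 for posts and 0.329 for participants
round(c(post = icc_post, participant = icc_participant), 3)
```

With the fitted model in memory, the same variances could be pulled from `as.data.frame(VarCorr(md))` instead of being typed in by hand.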


In [30]:
# Mixed-effects logistic regression with random intercepts for participant (user_id) and post (id)
md <- glmer(reshared ~ posterGender + timestamp + warning + code * evidence + untagged + (1 | user_id) + (1 | id),
            data = data,
            family = binomial,
            control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
           )
summary(md)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: reshared ~ posterGender + timestamp + warning + code * evidence +
    untagged + (1 | user_id) + (1 | id)
   Data: data
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  4938.0   5033.7  -2455.0   4910.0     6874 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8314 -0.3941 -0.2286 -0.1063 11.6068 

Random effects:
 Groups  Name        Variance Std.Dev.
 id      (Intercept) 0.6139   0.7835  
 user_id (Intercept) 1.9114   1.3825  
Number of obs: 6888, groups:  id, 167; user_id, 125

Fixed effects:
                           Estimate Std. Error z value Pr(>|z|)    
(Intercept)               -1.744585   0.274022  -6.367 1.93e-10 ***
posterGendermale           0.014123   0.076206   0.185  0.85297    
timestamp                 -0.023806   0.002306 -10.324  < 2e-16 ***
codeaffirms               -0.065352   0.298341  -0.219  0.82661    
codedenies