# Exploratory Analysis

* $RQ_1$: How does temporal proximity affect sharing behaviour during protest demonstrations?
* $RQ_2$: How do source attributes like the poster's gender affect sharing behaviour during protest demonstrations?
* $RQ_3$: Is there evidence of a "backfire effect", where misleading posts that *don't* have a credibility indicator are assumed to be true and are more likely to be reshared?
* $RQ_4$: Does the treatment effectiveness vary by political affiliation?

In [1]:
library(lme4)
library(AER)
library(dplyr)
library(purrr)
library(magrittr)
library(ggplot2)
require(ggiraph)
require(plyr)
require(moonBook)   # for use of data radial
require(GGally)
require(reshape2)
require(compiler)
require(parallel)
require(boot)
require(lattice)

data <- read.csv("../../data/processed/60b37265a9f60881975de69e-rumour-results.csv")
data$reshared <- as.integer(as.logical(data$reshared))
head(data)

| | user_id | condition | timeSubmitted | secondsTaken | id | rumour | code | evidence | warning | reshared | clickedWarning | timestamp | posterGender | posterId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<chr>` | `<chr>` | `<int>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<int>` | `<chr>` | `<chr>` |
| 1 | 1.525195e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283849e+18 | R1 | neutral | high | False | 0 | False | 3 | male | RnJlZGVyaWNrIFdvb2RodHRwczovL3JhbmRvbXVzZXIubWUvYXBpL3BvcnRyYWl0cy90aHVtYi9tZW4vMTkuanBn |
| 2 | 1.525195e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283767e+18 | R1 | neutral | high | False | 0 | False | 4 | female | S2F0aWUgTXVycGh5aHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvd29tZW4vNjQuanBn |
| 3 | 1.525195e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283508e+18 | R1 | denies | high | True | 0 | False | 11 | male | R2FyeSBIb3BraW5zaHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvbWVuLzc4LmpwZw== |
| 4 | 1.525195e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283371e+18 | R1 | affirms | high | False | 0 | False | 11 | female | V2lsbGllIENsYXJraHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvd29tZW4vNy5qcGc= |
| 5 | 1.525195e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283538e+18 | R1 | affirms | high | False | 1 | False | 13 | female | TWVsaW5kYSBCYXJyZXR0aHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvd29tZW4vNjIuanBn |
| 6 | 1.525195e+18 | treatment | 2021-06-03 09:50:15.953000+00:00 | 120 | 1.283492e+18 | R1 | questions | high | False | 1 | False | 14 | male | TG9ubmllIE1pbGVzaHR0cHM6Ly9yYW5kb211c2VyLm1lL2FwaS9wb3J0cmFpdHMvdGh1bWIvbWVuLzEwLmpwZw== |


### Logistic regression

Alternatively, add an interaction between warning and evidence, if we expect a credibility indicator to have a different effect on a post denying a high-evidence rumour than on a post affirming a low-evidence rumour.

* $RQ_1$: How does temporal proximity affect sharing behaviour during protest demonstrations?
* $RQ_2$: How do source attributes like the poster's gender affect sharing behaviour during protest demonstrations?

> Our outcome variable was the intent to share the article corresponding to a headline with friends on social media. Since the outcome variable is a binary choice between the intent to share or not share, we employed Binomial Logistic Regression (BLR). The BLR incorporated participant ID and headline ID as random effects to account for repeated measures for varying items [5, 68]. Our independent variables included the treatment, headline category, political affiliation, social media use, and common demographic factors, such as age and gender. (Yaqub)

Following the approach quoted above, I fit the same kind of model, but include participant ID, source (poster) ID, and post ID as random effects, both to account for repeated measures and to estimate the proportion of variance attributable to source and post attributes.

In [2]:
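# Mixed-effects logistic regression of resharing.
# Random effects: per-participant intercept and condition slope,
# plus random intercepts for poster (source) and post.
md <- glmer(
  reshared ~ posterGender + timestamp + warning + code * evidence +
    (1 + condition | user_id) + (1 | posterId) + (1 | id),
  data = data,
  family = binomial,
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
)
summary(md)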

boundary (singular) fit: see ?isSingular



Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
    (1 + condition | user_id) + (1 | posterId) + (1 | id)
   Data: data
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  3925.2   4031.0  -1946.6   3893.2     5500 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9846 -0.3900 -0.2348 -0.1226  9.7808 

Random effects:
 Groups   Name               Variance Std.Dev. Corr 
 posterId (Intercept)        0.0000   0.0000        
 id       (Intercept)        0.6686   0.8177        
 user_id  (Intercept)        1.5291   1.2366        
          conditiontreatment 1.0985   1.0481   -0.49
Number of obs: 5516, groups:  posterId, 5515; id, 167; user_id, 100

Fixed effects:
                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)               -1.80900    0.25237  -7.168 7.61e-13 ***
posterGendermale           0.07714
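
The note above mentions measuring the proportion of variance attributable to source and post attributes. A minimal sketch of one way to do that from the "Random effects" table is shown below. It assumes the latent-variable formulation of the logit model, in which the residual variance is fixed at $\pi^2/3$, and it uses only the intercept variances, so the `user_id` share applies to the control condition (the reference level of `condition`). The names `var_components` and `variance_share` are illustrative and do not appear in the original analysis.

```r
# Intercept variances copied from the "Random effects" table of summary(md)
var_components <- c(posterId = 0.0000, id = 0.6686, user_id = 1.5291)

# Assumption: latent-scale residual variance of a logistic model is pi^2 / 3
var_residual <- pi^2 / 3

# Share of the total latent-scale variance attributed to each component
total_var <- sum(var_components) + var_residual
variance_share <- c(var_components, residual = var_residual) / total_var

print(round(variance_share, 3))
```

Under these assumptions, `posterId` contributes nothing to the total. The output reports 5,515 `posterId` groups for 5,516 observations, so that grouping factor is almost unique per row, which may be why its variance is estimated at zero and the fit is flagged as singular.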

### $RQ_4$

In [3]:
library(lme4)

reshare_rates <- read.csv("../../data/processed/60b37265a9f60881975de69e-reshare_rates.csv")
md.affirms <- lmer(Affirms ~ politicalAffiliation * condition * evidence + (1 | user_id), data = reshare_rates)
summary(md.affirms)

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient


Correlation matrix not shown by default, as p = 23 > 12.
Use print(obj, correlation=TRUE)  or
    vcov(obj)        if you need it




Linear mixed model fit by REML ['lmerMod']
Formula: Affirms ~ politicalAffiliation * condition * evidence + (1 |  
    user_id)
   Data: reshare_rates

REML criterion at convergence: -87.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.59088 -0.50191 -0.08972  0.30773  2.31990 

Random effects:
 Groups   Name        Variance Std.Dev.
 user_id  (Intercept) 0.01551  0.1245  
 Residual             0.01294  0.1138  
Number of obs: 165, groups:  user_id, 89

Fixed effects:
                                                               Estimate
(Intercept)                                                     0.14088
politicalAffiliationcentreLeft                                  0.01530
politicalAffiliationcentreRight                                 0.13412
politicalAffiliationleft                                        0.01626
politicalAffiliationnone                                        0.13412
politicalAffiliationright                                      -0.14088
c

In [4]:
# Standard lme4 coefficient table (estimate, standard error, t value)
coefs <- data.frame(coef(summary(md.affirms)))
# lme4 alone does not report df or p-values; take them from lmerTest
satt <- coef(summary(lmerTest::as_lmerModLmerTest(md.affirms)))
# get Satterthwaite-approximated degrees of freedom
coefs$df.Satt <- satt[, "df"]
# get approximate p-values
coefs$p.Satt <- satt[, "Pr(>|t|)"]
coefs

| | Estimate | Std..Error | t.value |
|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` |
| (Intercept) | 0.14088446 | 0.1100623 | 1.2800433 |
| politicalAffiliationcentreLeft | 0.01529923 | 0.1161102 | 0.1317647 |
| politicalAffiliationcentreRight | 0.13411554 | 0.1386634 | 0.9672022 |
| politicalAffiliationleft | 0.01625839 | 0.1189379 | 0.1366965 |
| politicalAffiliationnone | 0.13411554 | 0.1386634 | 0.9672022 |
| politicalAffiliationright | -0.14088446 | 0.2014176 | -0.6994646 |
| conditionTreatment | 0.13725719 | 0.129054 | 1.0635643 |
| evidenceLow | -0.06588446 | 0.1071058 | -0.6151342 |
| politicalAffiliationcentreLeft:conditionTreatment | -0.19003975 | 0.1452287 | -1.3085549 |
| politicalAffiliationcentreRight:conditionTreatment | -0.25774457 | 0.1792385 | -1.4379982 |


In [5]:
library(effects)
plot(allEffects(md.affirms), multiline=TRUE, ci.style="bars")

Use the command
    lattice::trellis.par.set(effectsTheme())
  to customize lattice options for effects plots.
See ?effectsTheme for details.



ERROR: Error in mod.matrix %*% scoef: non-conformable arguments
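
The "rank deficient" message from `lmer` and the "non-conformable arguments" error from `allEffects` may share a cause: a combination of factor levels that has no observations, so that one interaction column carries no information. The sketch below shows how such an empty cell can be found and how it reduces the rank of the design matrix. It uses a small made-up data frame (`toy`, not the study data) and only two of the three factors.

```r
# Hypothetical toy data: the "right" group never appears under "Treatment"
toy <- data.frame(
  politicalAffiliation = factor(c("left", "left", "right", "right",
                                  "centre", "centre", "left", "centre")),
  condition = factor(c("Control", "Treatment", "Control", "Control",
                       "Control", "Treatment", "Treatment", "Control"))
)

# Cross-tabulate the factors and locate cells with zero observations
cells <- table(toy$politicalAffiliation, toy$condition)
empty_cells <- which(cells == 0, arr.ind = TRUE)
print(cells)

# Compare the number of design-matrix columns with its numerical rank;
# the difference is the number of coefficients that cannot be estimated
X <- model.matrix(~ politicalAffiliation * condition, data = toy)
n_dropped <- ncol(X) - qr(X)$rank
print(n_dropped)
```

On the real data, the equivalent check would be something like `xtabs(~ politicalAffiliation + condition + evidence, data = reshare_rates)`. Any zero count there is a candidate for the dropped coefficient.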


In [None]:
md.denies <- lmer(Denies ~ politicalAffiliation * condition * evidence + (1 | user_id), data = reshare_rates)
summary(md.denies)

In [None]:
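# Standard lme4 coefficient table (estimate, standard error, t value)
coefs <- data.frame(coef(summary(md.denies)))
# lme4 alone does not report df or p-values; take them from lmerTest
satt <- coef(summary(lmerTest::as_lmerModLmerTest(md.denies)))
# get Satterthwaite-approximated degrees of freedom
coefs$df.Satt <- satt[, "df"]
# get approximate p-values
coefs$p.Satt <- satt[, "Pr(>|t|)"]
coefs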

In [None]:
plot(allEffects(md.denies), multiline=TRUE, ci.style="bars")