The Geopolitics of Deplatforming: A Study of Suspensions of Politically-Interested Iranian Accounts on Twitter
This repository contains the replication material for the paper "The Geopolitics of Deplatforming: A Study of Suspensions of Politically-Interested Iranian Accounts on Twitter", by Andreu Casas, to be published in Political Communication.
The ./data/ directory contains the data necessary to replicate the analytical figures and tables of the paper. Below, I describe each of the datasets in this directory:
accuracy-5fold-hateful.csv
: contains data about the performance of the multilingual BERT model fine-tuned for predicting hateful tweets. In this and the following two files, epoch reports the training epoch with the lowest training loss, precision the % of the model's positive predictions that were correct, recall the % of truly positive cases that the model correctly predicted, fscore the harmonic mean of precision and recall, accuracy the overall % of correct predictions, and fold the training fold.
accuracy-5fold-political.csv
: contains data about the performance of the multilingual BERT model fine-tuned for predicting political tweets.
accuracy-5fold-proirangov.csv
: contains data about the performance of the multilingual BERT model fine-tuned for predicting tweets in favor of the Iranian government.
elite-twitter-handles.csv
: contains information about the Iranian elites used in the paper. "Name of official/organization" provides the name of the politician or media organization, "Twitter handle" contains the handle for those with a Twitter account (blank otherwise), "Official position" reports the official position for politicians (or indicates whether this is a media organization/account), "Faction" reports the political faction of the politician (blank if unknown or if media), and "Political affiliation" reports the higher-level political affiliation of the politician (blank if unknown or if media). This dataset contains 179 elites for which a Twitter handle was identified. However, three of them were excluded from the analysis because their accounts were protected and key information, such as their list of followers, was inaccessible.
elite-accounts-ideo-scores.csv
: contains ideology estimates (pe) and 95% confidence intervals (lwr & upr) for the elite accounts included in the analysis (the twitter column contains the Twitter handle of each elite).
elite-freq-diff-suspended-nonsuspended.csv
: contains information about the proportion of suspended and non-suspended users that follow each of the elite accounts used in the paper. elite is the Twitter handle of the elite, nonsuspeded is the proportion of non-suspended users that follow that elite, suspended is the proportion of suspended users that follow that elite, and diff is the difference between the suspended and non-suspended proportions.
hash-freq-diff-suspended-nonsuspended.csv
: contains information about the proportion of suspended and non-suspended users that used each unique hashtag in the dataset in at least one of their tweets in 2020. hashtag is the hashtag, prop_nonsuspended is the proportion of non-suspended users that used that hashtag at least once, prop_suspended is the proportion of suspended users that used that hashtag at least once, and dif is the difference between prop_suspended and prop_nonsuspended.
stopped-existing-LABELED.csv
: contains information about when we detected that users were no longer active. user_id_anon is a new id given to each user for pseudonymization purposes, stop_existing is a variable in the original MySQL table indicating non-active accounts (it is constant in this dataset: all rows = 1), tstamp is the exact date-time at which we identified an account as no longer active, and status indicates whether the account is inactive because it was suspended by Twitter (suspended), whether we only know that it no longer exists and so cannot tell whether it was deleted by Twitter or by the user (no exists), whether the account is active again (exists), or whether the account has been made private (restricted).
model-data-anon.csv
: this is the dataset used to estimate the statistical models reported in the paper. A detailed description of these variables is available in Appendix B of the paper.
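The performance metrics described above for the accuracy-5fold-*.csv files can be illustrated with a short R sketch. It assumes that fscore is the standard F1 score (the harmonic mean of precision and recall); the function name classification_metrics and the confusion counts are hypothetical and are not taken from the paper or its data.

```r
# Minimal sketch: compute precision, recall, fscore, and accuracy from the
# four cells of a binary confusion matrix.
# ASSUMPTION: fscore is the usual F1 (harmonic mean of precision and recall).
classification_metrics <- function(tp, fp, fn, tn) {
  precision <- tp / (tp + fp)  # share of positive predictions that are correct
  recall    <- tp / (tp + fn)  # share of truly positive cases that are found
  fscore    <- 2 * precision * recall / (precision + recall)
  accuracy  <- (tp + tn) / (tp + fp + fn + tn)
  c(precision = precision, recall = recall,
    fscore = fscore, accuracy = accuracy)
}

# Hypothetical counts for a single fold (illustration only).
metrics <- classification_metrics(tp = 40, fp = 10, fn = 20, tn = 130)
print(round(metrics, 3))
```

With one row of such metrics per fold, a summary across the five folds could then be obtained by averaging each column.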
The ./code/ directory contains separate scripts to replicate each analytical figure and table in the article. The ./figures/ directory contains a copy of each of the figures generated by these scripts. Here is a list and explanation of the few tables for which replication material is not provided:
Tables C1, C2, and C3 in Appendix C
: in these I provide some examples of tweets manually coded as true positives and true negatives for each of the 3 machine learning classifiers used in the paper. These are not the result of any analysis: I simply picked a few illustrative examples from the population of manually annotated tweets.
Table D1 in Appendix D
: in this table I report the list of keywords I used to generate an initial sample of tweets discussing COVID-19. I assembled this list myself after a non-systematic manual exploration of the collected messages; it is not the result of a systematic analysis for which data/code can be reported here.
Tables F1-F6
: in these I provide information about the hashtags/ngrams most associated with positive/negative predictions from each of the 3 machine learning classifiers used in the paper. Unfortunately, I am unable to share replication code for this because it directly uses the original text of the collected tweets, and it would be a violation of Twitter's Terms of Service to share the original tweets.
The replication R code in this repository was developed in the following environment:
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.2
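To compare a local setup against the environment listed above, a check along the following lines may help. This is only a sketch: the function name check_r_version is hypothetical, and the assumption that R releases newer than 4.3.1 reproduce the results has not been verified here.

```r
# Minimal sketch: report whether the running R version is at least the one
# the replication code was developed with (4.3.1, as listed in this README).
# ASSUMPTION: newer R releases behave the same for these scripts.
check_r_version <- function(current = getRversion(), developed_with = "4.3.1") {
  ok <- current >= developed_with
  if (!ok) {
    message("R ", as.character(current), " is older than R ", developed_with,
            "; results may differ from those in the paper.")
  }
  ok
}

check_r_version()
```

Calling sessionInfo() additionally prints the platform and operating system for comparison with the lines above.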
01-table01.R
: replicates Table 1 of the paper, where I provide information on the performance of the machine learning models used to classify political, pro-Iranian-government, and hateful tweets. The exact same table is reported again in Table C4 in Appendix C, so the same replication code/data applies to that table.
02-figure01.R
: replicates Figure 1 of the paper, where I show the cumulative number of suspensions over the period under analysis.
03-table02.R
: replicates Table 2 of the paper, where I show simple descriptives for the covariates of interest, comparing suspended and non-suspended users.
04-figure02.R
: replicates Figure 2 of the paper, where I show suspension rates by ideological bin and by level of support for the Iranian Government.
05-figure03.R
: replicates Figure 3 of the paper, where I show the marginal effects from a logistic regression predicting account suspension as a function of many covariates, plus the two key variables of interest (ideology and support for the Iranian Government).
06-figure04.R
: replicates Figure 4 of the paper, where I show the hashtags and elite accounts used/followed at a higher or lower rate by (non)suspended accounts.
App01-figureA1.R
: replicates Figure A1 in Appendix A, where I show the average ideology score attributed to Reformist, Independent, and Principlist politicians.
App02-tableB1-B2.R
: replicates Tables B1 and B2 in Appendix B, where I report coefficient tables for the main model in Figure 3, as well as five additional model specifications.
App03-figureB1-B2.R
: replicates Figures B1 and B2 in Appendix B, where I show the distribution of the count/continuous variables in order to identify the skewed ones to log-transform in the regression analyses.
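The scripts listed above could be run in sequence with a small driver such as the one below. This is a sketch under stated assumptions, not part of the repository: the helper names list_replication_scripts and run_replication are hypothetical, and it assumes the scripts live in ./code/ and resolve their data paths relative to the current working directory, which should be checked against the scripts themselves.

```r
# Minimal sketch: collect the replication scripts in the order used in this
# README (numbered main-text scripts first, then the "App" appendix scripts).
# ASSUMPTION: all scripts sit in a single directory and end in ".R".
list_replication_scripts <- function(code_dir = "code") {
  scripts <- list.files(code_dir, pattern = "\\.R$", full.names = TRUE)
  is_appendix <- grepl("^App", basename(scripts))
  c(sort(scripts[!is_appendix]), sort(scripts[is_appendix]))
}

# Run each script in its own environment so that objects created by one
# script do not leak into the next.
# ASSUMPTION: the scripts expect the working directory this is called from.
run_replication <- function(code_dir = "code") {
  for (script in list_replication_scripts(code_dir)) {
    message("Running ", script)
    source(script, local = new.env())
  }
  invisible(NULL)
}
```

Calling run_replication() from the repository root would then execute every script in turn; individual scripts can still be sourced on their own.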