This repository contains the replication material for the paper "The Geopolitics of Deplatforming: A Study of Suspensions of Politically-Interested Iranian Accounts on Twitter", by Mehdi Zamani and Andreu Casas, to be published at Political Communication.

CasAndreu/twitter-iran-moderation


The Geopolitics of Deplatforming: A Study of Suspensions of Politically-Interested Iranian Accounts on Twitter

This repository contains the replication material for the paper "The Geopolitics of Deplatforming: A Study of Suspensions of Politically-Interested Iranian Accounts on Twitter", by Andreu Casas, to be published in Political Communication.

Data

The ./data/ directory contains the necessary data to replicate the analytical figures and tables of the paper. Below, I describe each of the datasets in this directory:

  • accuracy-5fold-hateful.csv: contains data about the performance of the multilingual BERT model fine-tuned for predicting hateful tweets. In this and the following two files, epoch reports the training epoch with the lowest training loss, precision the % of the model's positive predictions that were correct, recall the % of actual positives that the model correctly predicted, fscore the harmonic mean of precision and recall, accuracy the overall % of correct predictions, and fold the cross-validation fold.
  • accuracy-5fold-political.csv: contains data about the performance of the multilingual BERT model fine-tuned for predicting political tweets.
  • accuracy-5fold-proirangov.csv: contains data about the performance of the multilingual BERT model fine-tuned for predicting tweets in favor of the Iranian government.
  • elite-twitter-handles.csv: contains information about the Iranian elites used in the paper. Name of official/organization provides the name of the politician or media organization, Twitter handle contains the handle for those with a Twitter account (blank otherwise), Official position reports the official position for politicians (or indicates whether this is a media organization/account), Faction reports the political faction of the politician (blank if unknown or if media), and Political affiliation reports the higher-level political affiliation of the politician (blank if unknown or if media). This dataset contains 179 elites for which a Twitter handle was identified. However, three of them were excluded from the analysis because their accounts were protected and key information, such as their list of followers, was inaccessible.
  • elite-accounts-ideo-scores.csv: contains ideology estimates (pe) and 95% confidence interval (lwr & upr) for the elite accounts included in the analysis (twitter column contains the Twitter handle for the elites).
  • elite-freq-diff-suspended-nonsuspended.csv: contains information about the proportion of suspended and non-suspended users that follow each of the elite accounts used in the paper. elite is the Twitter handle of the elite, nonsuspeded is the proportion of non-suspended users that follow that elite, suspended is the proportion of suspended users that follow that elite, diff is the difference between the suspended and non-suspended proportions.
  • hash-freq-diff-suspended-nonsuspended.csv: contains information about the proportion of suspended and non-suspended users that used each unique hashtag in the dataset in at least one of their tweets in 2020. hashtag is the hashtag, prop_nonsuspended is the proportion of non-suspended users that used that hashtag at least once, prop_suspended is the proportion of suspended users that used that hashtag at least once, and dif is the difference between prop_suspended and prop_nonsuspended.
  • stopped-existing-LABELED.csv: contains information about when we detected that users were no longer active. user_id_anon is a new id given to each user for pseudonymization purposes; stop_existing is a variable from the original MySQL table flagging non-active accounts (it is constant in this dataset: all rows = 1); tstamp is the exact date-time at which we identified an account as no longer active; and status indicates the state of the account: suspended by Twitter (suspended), no longer existing, in which case we cannot tell whether it was deleted by Twitter or by the user (no exists), active again (exists), or made private (restricted).
  • model-data-anon.csv: this is the dataset used to estimate the statistical models reported in the paper. A detailed description of these variables is available in Appendix B of the paper.
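
The performance columns described above (precision, recall, fscore, accuracy) can be illustrated with a small worked example. The repository's own scripts are in R; the sketch below uses Python only to show the arithmetic. It assumes fscore is the standard F1 score (harmonic mean of precision and recall) and returns proportions rather than percentages. The function name and the confusion-matrix counts are hypothetical and not taken from the paper or its data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-score, and accuracy from confusion counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Values are returned as proportions.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # Assumption: fscore is F1, the harmonic mean of precision and recall.
    fscore = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "fscore": fscore,
        "accuracy": accuracy,
    }


# Made-up counts for one fold, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, precision is 40/50, recall is 40/60, and accuracy is 170/200.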

Code

The ./code/ directory contains separate scripts to replicate each analytical figure and table in the article. The ./figures/ directory contains a copy of each of the figures generated by these scripts. Below is a list of the few tables for which replication material is not provided, with an explanation for each:

  • Tables C1, C2, and C3 in Appendix C: in these I provide some examples of tweets manually coded as true positives and true negatives for each of the 3 machine learning classifiers used in the paper. These are not the result of any analysis -- I simply picked a few illustrative examples from the population of manually annotated tweets.
  • Table D1 in Appendix D: in this table I report the list of keywords I used to generate an initial sample of tweets discussing COVID-19. This list was self-assembled after a non-systematic manual exploration of the collected messages, and not the result of a systematic analysis for which data/code can be reported here.
  • Tables F1-F6: in these I provide information about the hashtags/ngrams most associated with positive/negative predictions from each of the 3 machine learning classifiers used in the paper. Unfortunately, I am unable to share replication code for this because it directly uses the original text of the collected tweets, and it would be a violation of Twitter's Terms of Service to share the original tweets.

Environment

The replication R code in this repository was developed in the following environment:

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.2

Tables/Figures Main Paper


  • 01-table01.R: replicates Table 1 of the paper, where I provide information on the performance of the machine learning models used to classify political, pro-Iranian-government, and hateful tweets. The exact same table is reported again as Table C4 in Appendix C -- so the same replication code/data applies to that table.



  • 02-figure01.R: replicates Figure 1 of the paper, where I show the cumulative number of suspensions for the period under analysis.



  • 03-table02.R: replicates Table 2 of the paper, where I show simple descriptives for the covariates of interest, comparing suspended and non-suspended users.



  • 04-figure02.R: replicates Figure 2 of the paper, where I show suspension rates by ideological bins, and levels of support for the Iranian Government.



  • 05-figure03.R: replicates Figure 3 of the paper, where I show the marginal effects from a logistic regression predicting account suspension as a function of many covariates, plus the two key variables of interest (ideology and support for the Iranian Government).



  • 06-figure04.R: replicates Figure 4 of the paper, where I show the hashtags and elite accounts used/followed at higher or lower rates by suspended versus non-suspended accounts.
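
The quantities behind Figure 4 (the proportion columns and their difference in the two freq-diff datasets) can be sketched as follows. This is an illustration in Python, not the repository's R code. It assumes the difference is computed as the suspended proportion minus the non-suspended one, and that each user counts at most once per hashtag or elite. The function name, user ids, and hashtags are made up.

```python
from collections import Counter


def proportion_diffs(items_by_user, suspended_ids):
    """For each item (hashtag or elite handle), compute the share of
    suspended and non-suspended users associated with it, and the difference.

    items_by_user maps a user id to the items that user used/followed.
    suspended_ids is the set of user ids that were suspended.
    """
    n_susp = sum(1 for user in items_by_user if user in suspended_ids)
    n_non = len(items_by_user) - n_susp
    susp_counts, non_counts = Counter(), Counter()
    for user, items in items_by_user.items():
        target = susp_counts if user in suspended_ids else non_counts
        target.update(set(items))  # count each user at most once per item
    rows = []
    for item in sorted(set(susp_counts) | set(non_counts)):
        p_susp = susp_counts[item] / n_susp if n_susp else 0.0
        p_non = non_counts[item] / n_non if n_non else 0.0
        rows.append({
            "item": item,
            "prop_suspended": p_susp,
            "prop_nonsuspended": p_non,
            # Assumption: difference = suspended minus non-suspended.
            "diff": p_susp - p_non,
        })
    return rows


# Made-up toy data, for illustration only.
toy_users = {
    "u1": ["#a", "#b"],
    "u2": ["#a"],
    "u3": ["#b", "#b"],
    "u4": ["#c"],
}
rows = proportion_diffs(toy_users, suspended_ids={"u1", "u2"})
for row in rows:
    print(row)
```

Under this sign convention, positive values mark items more common among suspended users and negative values mark items more common among non-suspended users.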



Tables/Figures Appendix


  • App01-figureA1.R: replicates Figure A1 in Appendix A, where I show the average ideology score attributed to Reformist, Independent, and Principlist politicians.



  • App02-tableB1-B2.R: replicates Tables B1 and B2 in Appendix B, where I report coefficient tables for the main model in Figure 3, as well as five additional model specifications.



  • App03-figureB1-B2.R: replicates Figures B1 and B2 in Appendix B, where I show the distribution of the count/continuous variables to identify the skewed ones that are log-transformed in the regression analyses.
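
The idea behind log-transforming skewed variables can be sketched in a few lines. This is an illustration in Python rather than the repository's R code. It assumes a log(x + 1) transform so that zero counts stay defined; the paper may use a different variant. The skewness function and the counts are hypothetical.

```python
import math
import statistics


def skewness(values):
    """Population skewness (third standardized moment) of a list of numbers."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Made-up, right-skewed count variable (e.g. a per-user count).
counts = [0, 1, 1, 2, 3, 5, 8, 40, 250]

# Assumption: log(x + 1), which is defined for zero counts.
logged = [math.log1p(v) for v in counts]

print(f"skewness before: {skewness(counts):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

For this toy variable the transform pulls in the long right tail, so the skewness after the transform is lower than before.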
