
GitHub repository for the paper [Political Partisanship and Anti-Science Attitudes in Online Discussions about COVID-19]. This repository contains code that performs pre-processing, embedding generation and classification of users' multi-dimensional ideological polarity.


ashwinshreyas96/Multidimensional-Ideological-Polarization


Multidimensional Ideological Polarization

GitHub repository for the paper: Political Partisanship and Anti-Science Attitudes in Online Discussions about COVID-19. This repository contains code that performs pre-processing, embedding generation and classification of users' multi-dimensional ideological polarity.

The COVID-19 pandemic has laid bare our differences and exposed crevices in national unity. Ideological fissures on social media have fragmented the nation into seemingly distinct politico-scientific groups. In this repository, we publish the code to identify such groups on Twitter data and analyze the interplay between them.

Usage

FastText Model

Tweet-ids corresponding to COVID-19 tweets collected from January 21, 2020 to July 31, 2020 have been sourced from COVID-19-TweetIDs. Owing to Twitter's policy, we are restricted to sharing tweet-ids; users can rehydrate this dataset using hydrator. Once the content is rehydrated, pre-process the tweets to compute ground-truth domain scores. Ensure that the folder structure of the rehydrated tweets matches the folder structure of the tweet-ids.

Refer to twitter-locations-us-state to filter rehydrated tweets by location.

To extract domains and users from rehydrated tweets:

python extract_domains.py <path_to_rehydrated_tweets> #extracts domains and users from rehydrated tweets
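As a rough illustration of this step (not the code in extract_domains.py), the sketch below collects the domains each user shared. It assumes rehydrated tweets are dicts shaped like Twitter API v1.1 tweet objects, with `user.id_str` and `entities.urls[].expanded_url`. The helper names `extract_domain` and `domains_by_user` are hypothetical, and stripping a leading `www.` is only a crude stand-in for true pay-level domain extraction, which would need a public suffix list.

```python
from collections import defaultdict
from urllib.parse import urlparse


def extract_domain(url):
    """Return the lowercased host of a URL without a leading 'www.'.

    Assumption: this approximates a pay-level domain; it does not consult
    a public suffix list.
    """
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domains_by_user(tweets):
    """Map each user id to the list of domains that user shared."""
    shared = defaultdict(list)
    for tweet in tweets:
        user = tweet["user"]["id_str"]
        for entry in tweet.get("entities", {}).get("urls", []):
            # Prefer the expanded URL over the shortened t.co link.
            url = entry.get("expanded_url") or entry.get("url")
            if not url:
                continue
            domain = extract_domain(url)
            if domain:
                shared[user].append(domain)
    return dict(shared)
```

Users who shared no URLs are absent from the returned mapping.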

Next, compute ideological domain scores for each of the Science, Political and Moderacy dimensions.

python Domain-Score-Calc.py
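One plausible way to read a per-dimension domain score is sketched below; Domain-Score-Calc.py may compute it differently. The sketch assumes each annotated domain carries a numeric label on a single dimension (for example +1 and -1 for the two poles) and scores a user as the mean label over the annotated domains they shared. The function name, label encoding and example domains are all hypothetical.

```python
from statistics import mean


def user_dimension_score(shared_domains, domain_labels):
    """Average the labels of the annotated domains a user shared.

    shared_domains: list of domains shared by one user (repeats count).
    domain_labels: dict mapping annotated domains to numeric labels on one
        dimension (hypothetical encoding: +1 and -1 for the two poles).
    Returns None when the user shared no annotated domain.
    """
    labeled = [domain_labels[d] for d in shared_domains if d in domain_labels]
    return mean(labeled) if labeled else None
```

Under these assumptions the same function would be applied once per dimension (Science, Political, Moderacy), each time with that dimension's label dictionary.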

Concatenate the extracted tweets over time into one .csv file and run the following command to generate user-specific tweet embeddings. To generate embeddings, you need to download the FastText Twitter bi-gram model from Sent2Vec and place it in the models directory.

python mbfc-covid-fasttext.py <path to concatenated .csv file consisting of users and tweets>
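The concatenation step can be done in many ways; the sketch below is one minimal option using only the standard library. It assumes the per-period extracts are .csv files in a single directory that all share the same header row. The function name `concatenate_csvs` and the directory layout are hypothetical, and the column names mbfc-covid-fasttext.py expects should be checked against the script itself.

```python
import csv
from pathlib import Path


def concatenate_csvs(input_dir, output_path):
    """Append every *.csv in input_dir (sorted by name) into output_path.

    The header is written once; a file with a different header raises
    ValueError. Returns the number of data rows written.
    """
    output_path = Path(output_path)
    paths = [p for p in sorted(Path(input_dir).glob("*.csv"))
             if p.resolve() != output_path.resolve()]
    header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header: {file_header}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Sorting by file name keeps the rows in chronological order only if the file names sort chronologically, which is an assumption about how the extracts are named.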

You can then run the prediction model as follows; the final prediction results are saved in the results folder.

python FastText-Pred.py

LDA Model

Prepare a dataset consisting of users and the hashtags they generate over time, then use the following command to run the LDA analysis.

python LDA-Analysis.py <path_to_.csv containing user-hashtag data>
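A user-hashtag dataset of this kind could be assembled along the following lines. This is a sketch under assumptions, not the repository's own preparation code: it again assumes Twitter API v1.1 tweet objects, where hashtags appear under `entities.hashtags[].text`, and the helper name `hashtags_by_user` is hypothetical. The column layout LDA-Analysis.py expects is not specified here and should be taken from the script.

```python
from collections import defaultdict


def hashtags_by_user(tweets):
    """Map each user id to the lowercased hashtags they used, in tweet order."""
    docs = defaultdict(list)
    for tweet in tweets:
        user = tweet["user"]["id_str"]
        for tag in tweet.get("entities", {}).get("hashtags", []):
            docs[user].append(tag["text"].lower())
    return dict(docs)
```

Each user's hashtag list can then be treated as one document for topic modelling. Users who never used a hashtag are absent from the result.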

LPA Model

Create a .csv file consisting of the edges of a retweet network extracted from the rehydrated data and run the LPA analysis as shown below. Ensure that the column names match those expected by the code.

python LPA-Analysis.py <path_to_.csv containing edges from retweet network>
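To make the idea of label propagation from seeds concrete, here is a deliberately simple variant: seed labels stay fixed, and every other node repeatedly adopts the most common label among its already-labeled neighbors. LPA-Analysis.py may implement a different variant; the function name, the undirected treatment of retweet edges, the alphabetical tie-break and the example labels are all assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def propagate_labels(edges, seeds, max_iter=100):
    """Spread seed labels over a retweet network by neighbor majority vote.

    edges: iterable of (source_user, target_user) pairs, treated as undirected.
    seeds: dict mapping seed users to labels; seed labels are never changed.
    Returns a dict of labels for every node that a label reached.
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    labels = dict(seeds)
    for _ in range(max_iter):
        updates = {}
        for node in neighbors:
            if node in seeds:
                continue
            counts = Counter(labels[n] for n in neighbors[node] if n in labels)
            if not counts:
                continue  # no labeled neighbor yet
            top = max(counts.values())
            # Break ties deterministically by taking the smallest label.
            best = min(label for label, c in counts.items() if c == top)
            if labels.get(node) != best:
                updates[node] = best
        if not updates:
            break  # converged
        labels.update(updates)  # synchronous update after each full pass
    return labels
```

Nodes in components without any seed remain unlabeled, and `max_iter` bounds the run in case synchronous updates oscillate.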

Annotated Pay-Level Domain Data

The data folder consists of seeds (Twitter handles) for Label Propagation analysis and ideology annotated Pay-Level Domains (URLs) for domain score computation, Latent Dirichlet Allocation and our SOTA embedding model.

Citation

If you find this code useful, please cite Political Partisanship and Anti-Science Attitudes in Online Discussions about COVID-19 as follows:

@misc{rao2020political,
      title={Political Partisanship and Anti-Science Attitudes in Online Discussions about Covid-19}, 
      author={Ashwin Rao and Fred Morstatter and Minda Hu and Emily Chen and Keith Burghardt and Emilio Ferrara and Kristina Lerman},
      year={2020},
      eprint={2011.08498},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}
