alexdrk14/RussoUkrainianWar_Dataset

Russo Ukrainian War Collection of Tweet IDs

This repository contains an ongoing collection of Tweet IDs associated with the current war between Russia and Ukraine. We began collecting on February 24, 2022, and used Twitter's search API to extract historical tweets, so the dataset contains tweets from February 22, 2022 onward. We use Twitter's streaming API to collect tweets that match a selected set of popular hashtags related to the topic. The list of selected hashtags is provided in the "hashtags.txt" file. To comply with Twitter's Terms of Service, we publicly release only the Tweet IDs of the collected tweets. The data is released for non-commercial research use.

The paper associated with this repository is: Twitter Dataset on the Russo-Ukrainian War (arXiv:2204.08530).

Data Organization

The Tweet-IDs are organized as follows:

  • Tweet-ID files are stored in folders that indicate the year and month of the collection (YEAR-MONTH).
  • Individual Tweet-ID files contain a collection of Tweet IDs, and the file names all follow the same structure, with a prefix “tweet_ids_day_” followed by the YEAR_MONTH_DATE.
  • Note that Twitter returns Tweets in UTC, so all Tweet-ID folder and file names are in UTC as well.
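
The naming scheme above can be sketched in Python as follows. This is only an illustration under stated assumptions: the zero-padding of month and day, the absence of a file extension, and the helper names `tweet_id_path` and `utc_day` are not taken from the repository, so check them against the actual folder contents before relying on them.

```python
from datetime import date, datetime, timedelta, timezone
from pathlib import Path


def utc_day(ts: datetime) -> date:
    """Return the UTC calendar day of a timezone-aware timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).date()


def tweet_id_path(root: Path, day: date) -> Path:
    """Build the expected path of the Tweet-ID file for one UTC day.

    Assumptions (not confirmed by the README): month and day are
    zero-padded, and the file name carries no extension.
    """
    folder = f"{day.year:04d}-{day.month:02d}"  # YEAR-MONTH
    name = f"tweet_ids_day_{day.year:04d}_{day.month:02d}_{day.day:02d}"
    return root / folder / name


if __name__ == "__main__":
    # 01:30 on February 25 at UTC+2 is still February 24 in UTC.
    local = datetime(2022, 2, 25, 1, 30, tzinfo=timezone(timedelta(hours=2)))
    print(tweet_id_path(Path("."), utc_day(local)))
```

Converting to UTC before building the path matters near midnight, because a local timestamp can fall on a different calendar day than the UTC-named file that holds it.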

Data Statistics and Analysis

We compute several statistical measurements over the dataset on a daily basis, including:

  • User Activity (Traffic volume)
  • Active users
  • Volume of suspended and deactivated accounts
  • Traffic volume based on text language
  • Traffic of hashtags
  • Sentiment analysis between entities of Russia and Ukraine
  • Sentiment analysis between entities of Putin and Zelensky

All of these analytics are published on the Parasecurity Group webpage.

Anonymized Data Sharing

Additionally, we have shared the collected text data sorted by creation date. User IDs, tweet IDs, and user mentions have all been anonymized for privacy. You can access the data via the following link: Zenodo repository.

Data Usage Agreement / How to Cite

By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

BibTeX:

@misc{shevtsov2022twitter,
  title={Twitter Dataset on the Russo-Ukrainian War},
  author={Shevtsov, Alexander and Tzagkarakis, Christos and Antonakaki, Despoina and Pratikakis, Polyvios and Ioannidis, Sotiris},
  journal={arXiv preprint arXiv:2204.08530},
  year={2022}
}

Statistics Summary (v1.0)

Number of Tweets : 127,275,386

Inquiries

Please first read through the README and the closed issues to see whether your question has already been addressed.

If you have any questions about this dataset/analysis, please contact Alexander Shevtsov at asevtsov[at]tuc.gr.
