This script suite collects mentions of a Twitter user posted after a specific tweet and writes them to a CSV (comma-separated values) file.
It was created in response to an uptick in Twitter-based harassment of organizers by white supremacist and neo-Nazi organizations, and specifically in response to a Nov-Dec 2016 influx of hate tweets against two organizers in Minneapolis (see the originating target tweet in the last comment of grab_last_week_of_tweets.py).
The data produced is intended to:
- Maintain copies of tweets that may be deleted later, since Twitter's policies limit its own ability to reproduce deleted tweets.
- Back up or replace screenshots with more forensically sound data (i.e. "real evidence" vs. "direct evidence"), because IDs and timestamps are taken directly from Twitter's own database. Ideally this helps to protect targeted organizers from accusations of fabricated evidence.
- Protect targeted organizers from having to repeatedly view triggering photos or disrupt the work day to take screenshots.
- Produce a spreadsheet that makes it easier to search the tweets for key phrases or words and to run statistical analysis on them.
Two scripts exist for grabbing tweets and dumping them to csv. They draw on the same data set but offer two different presentations of it:
- grab_last_week_of_tweets.py
  Uses the Twitter search API and the tweepy module to get the last week or so of tweets (a Twitter API limitation) matching a search query. It prompts the user to enter the Twitter handle of the person experiencing harassment and the ID of a tweet posted on the first day of the harassment (this may or may not be the tweet whose content started a response thread).
- parse_advanced_search_results.py
  Uses lxml to scrape results from the HTML generated by an advanced search on Twitter's site. Needs debugging to capture cleaner text fields, especially for image-based tweets.
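The selection rule described for grab_last_week_of_tweets.py (mentions of a handle, posted after a given tweet ID) can be sketched offline as shown here. This is only an illustration under stated assumptions, not the script's actual code: it works on plain dictionaries instead of calling the Twitter API, the function name `mentions_after` and the `id`/`text` keys are hypothetical, and it assumes that a larger tweet ID means a later tweet.

```python
import re


def mentions_after(tweets, handle, start_tweet_id):
    """Return tweets that mention `handle` and have an ID above `start_tweet_id`.

    `tweets` is a list of dicts with hypothetical "id" and "text" keys.
    Assumes tweet IDs increase over time, so a larger ID means a later tweet.
    """
    name = handle.lstrip("@")
    # \b stops "@organizer" from matching a longer handle like "@organizer2".
    pattern = re.compile(r"@" + re.escape(name) + r"\b", re.IGNORECASE)
    return [
        tweet
        for tweet in tweets
        if int(tweet["id"]) > int(start_tweet_id) and pattern.search(tweet["text"])
    ]


# Made-up sample data for illustration only.
sample_tweets = [
    {"id": 100, "text": "hello @Organizer"},
    {"id": 90, "text": "@organizer earlier tweet"},
    {"id": 120, "text": "@organizer2 is a different account"},
    {"id": 130, "text": "no mention here"},
]

selected = mentions_after(sample_tweets, "@organizer", 95)
print([tweet["id"] for tweet in selected])
```

In the real script the equivalent filtering is presumably delegated to the search query sent to Twitter; the sketch only makes the intended rule explicit.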
Both scripts output:

Tweet_id
: Generated from Twitter's data

Direct response to original thread?
: Places 'Yes' or blank. Used to help filter out unrelated Tweets. This is not as accurate as it could be and needs debugging.

Sent when?
: Time stamp generated from Twitter's created_at field

Sender
: Twitter handle of author

Sender's name
: Name (as entered by user onto profile) of author

Tweet text

Data source
: Confirms which Twitter data source was used. The Twitter search API is used for the last week of Tweets, and the Advanced Search web server is used for older tweets.
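
As a rough illustration of the output format, the columns listed above could be written with Python's standard csv module as sketched here. The function name `write_rows`, the sample row, and its values are hypothetical and are not taken from either script; the column order simply follows the list above.

```python
import csv
import io

# Column names as listed in this README.
COLUMNS = [
    "Tweet_id",
    "Direct response to original thread?",
    "Sent when?",
    "Sender",
    "Sender's name",
    "Tweet text",
    "Data source",
]


def write_rows(rows, out):
    """Write a header line and one CSV line per row dict to the file-like `out`."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        # Missing fields are written as blanks rather than raising an error.
        writer.writerow({column: row.get(column, "") for column in COLUMNS})


# Made-up example row; a real run would write to an open file instead of a buffer.
example_rows = [
    {
        "Tweet_id": "123456789",
        "Direct response to original thread?": "Yes",
        "Sent when?": "2016-12-01 15:04:05",
        "Sender": "example_handle",
        "Sender's name": "Example Name",
        "Tweet text": "An example, with a comma",
        "Data source": "Twitter search API",
    }
]

buffer = io.StringIO()
write_rows(example_rows, buffer)
print(buffer.getvalue())
```

Keeping the tweet ID as a string is a deliberate choice in this sketch, since spreadsheet programs may round very long numbers.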