Twitter hate speech collector

This suite of scripts collects mentions of a Twitter user posted after a specific tweet and writes them to a CSV (comma-separated values) file.

It was created in response to an uptick in Twitter-based harassment of organizers by white supremacist and neo-Nazi organizations. It was prompted specifically by a November-December 2016 influx of hate tweets against two organizers in Minneapolis. The originating target tweet is noted in the last comment of grab_last_week_of_tweets.py.

The data produced is intended to:

Preserve copies of tweets that may later be deleted, since Twitter's own policies limit its ability to reproduce deleted tweets.
Back up or replace screenshots with more forensically sound data (i.e. "real evidence" vs. "direct evidence"), because IDs and timestamps are taken directly from Twitter's own database. Ideally this helps protect targeted organizers from accusations of fabricating evidence.
Spare targeted organizers from having to repeatedly view triggering photos or interrupt their work day to take screenshots.
Produce a spreadsheet that makes it easier to search for key words and phrases and to run statistical analysis.
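Tweet IDs issued since late 2010 are "Snowflake" IDs. Everything above the low 22 bits of the ID encodes the creation time in milliseconds since Twitter's epoch. A stored ID can therefore be cross-checked against the stored timestamp. The sketch below assumes that standard Snowflake layout. It is not part of the original scripts. The helper names `snowflake_to_datetime` and `id_matches_timestamp` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Twitter's Snowflake epoch (2010-11-04 01:42:54.657 UTC), in Unix milliseconds.
TWITTER_EPOCH_MS = 1288834974657
_UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Return the creation time encoded in a Snowflake tweet ID (UTC)."""
    # Shift away the low 22 bits (worker/sequence data) to get ms since the epoch.
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return _UNIX_EPOCH + timedelta(milliseconds=ms)


def id_matches_timestamp(tweet_id: int, created_at: datetime,
                         tolerance_s: float = 1.0) -> bool:
    """Check that a tweet's ID and its created_at agree within tolerance_s.

    created_at from the API has one-second resolution, so an exact match
    is not expected. The default one-second tolerance is an assumption.
    """
    delta = snowflake_to_datetime(tweet_id) - created_at
    return abs(delta.total_seconds()) <= tolerance_s
```

A mismatch between the two values would suggest that a row was edited after collection.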

Two scripts grab tweets and dump them to CSV. Both draw on the same data set, but each uses a different presentation of it:

grab_last_week_of_tweets.py: Uses the Twitter search API and the tweepy module to get roughly the last week of tweets (a Twitter API limitation) matching a search query. It prompts the user for two inputs: the Twitter handle of the person experiencing harassment, and the ID of a tweet posted on the first day of the harassment. That tweet may or may not be the one whose content started a response thread.
parse_advanced_search_results.py: Uses lxml to scrape results from the HTML generated by an advanced search on Twitter's site. It needs debugging to capture cleaner text fields, especially for image-based tweets.
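The two prompts in grab_last_week_of_tweets.py map naturally onto search parameters. The handle becomes the query, and the first-day tweet ID becomes `since_id`, which the standard search API documents as returning results with IDs greater than (more recent than) the given ID. The sketch below only builds those parameters. It assumes that mentions are found by querying `@handle`, which may differ from what the script actually sends. The function name `build_mention_search_params` is hypothetical.

```python
def build_mention_search_params(handle: str, since_tweet_id, count: int = 100) -> dict:
    """Build standard-search parameters for mentions posted after a given tweet.

    Assumptions (not taken from the original script):
    - mentions are matched with the query "@handle";
    - since_id limits results to tweets newer than since_tweet_id.
    """
    handle = handle.strip().lstrip("@")
    if not handle:
        raise ValueError("handle must not be empty")
    return {
        "q": f"@{handle}",
        # Values typed at an input() prompt arrive as strings; normalize to int.
        "since_id": int(str(since_tweet_id).strip()),
        # Standard search returns at most 100 tweets per page.
        "count": min(count, 100),
    }
```

With tweepy 4.x, these parameters could be passed to `tweepy.Cursor(api.search_tweets, **params).items()` to page through results. In tweepy 3.x the method was `api.search`. The exact call the original script makes is not reproduced here.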
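One way to get cleaner text fields from saved advanced-search HTML is to join every text node inside the tweet body, collapse whitespace, and substitute each inline image's `alt` text. Emoji are often rendered as images, so this recovers text that a plain text-node scrape would drop. The sketch below swaps the standard library's `html.parser` in for lxml so that it runs without dependencies. The markup it expects is an assumption: a result element carrying a `data-tweet-id` attribute, with the text in a `<p>` whose classes include `tweet-text`. Twitter's HTML has changed over time, so these selectors are illustrative only. `AdvancedSearchParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class AdvancedSearchParser(HTMLParser):
    """Collect (tweet_id, text) pairs from saved advanced-search HTML.

    Assumed (hypothetical) markup: <div data-tweet-id="..."> ... <p class="tweet-text">.
    """

    def __init__(self):
        super().__init__()
        self.tweets = []         # list of {"tweet_id": str, "text": str}
        self._current_id = None  # ID of the result currently being read
        self._in_text = False    # True while inside a tweet-text <p>
        self._parts = []         # text fragments of the current tweet body

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("data-tweet-id"):
            self._current_id = attrs["data-tweet-id"]
        classes = (attrs.get("class") or "").split()
        if tag == "p" and "tweet-text" in classes and self._current_id:
            self._in_text = True
            self._parts = []
        elif tag == "img" and self._in_text:
            # Inline images (e.g. emoji) carry their meaning in the alt text.
            self._parts.append(attrs.get("alt") or "")

    def handle_endtag(self, tag):
        if tag == "p" and self._in_text:
            # Collapse runs of whitespace and newlines into single spaces.
            text = " ".join("".join(self._parts).split())
            self.tweets.append({"tweet_id": self._current_id, "text": text})
            self._in_text = False
            self._current_id = None

    def handle_data(self, data):
        if self._in_text:
            self._parts.append(data)
```

To use it, feed the saved page to `parser.feed(html)`, call `parser.close()`, and then read `parser.tweets`.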

Both scripts output:

Tweet_id: generated from Twitter's data
Direct response to original thread?: Contains 'Yes' or is left blank. Used to help filter out unrelated tweets. This flag is not as accurate as it could be and needs debugging.
Sent when?: Timestamp taken from Twitter's created_at field
Sender: Twitter handle of author
Sender's name: Name (as entered by user onto profile) of author
Tweet text
Data source: Confirms which Twitter data source was used. The Twitter search API is used for the last week of tweets, and the Advanced Search web server is used for older tweets.
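The columns above can be produced with `csv.DictWriter`. The sketch below assumes v1.1-style tweet JSON (`id_str`, `created_at`, `in_reply_to_status_id`, `user.screen_name`, `user.name`, and `full_text` or `text`). It fills "Direct response to original thread?" with a simplified heuristic: 'Yes' only when the tweet replies directly to the original tweet ID. This heuristic is an assumption, not necessarily the scripts' logic. The function names `tweet_to_row` and `write_csv` are hypothetical.

```python
import csv
import io
from datetime import datetime

# Column headers as listed above.
FIELDNAMES = [
    "Tweet_id",
    "Direct response to original thread?",
    "Sent when?",
    "Sender",
    "Sender's name",
    "Tweet text",
    "Data source",
]


def tweet_to_row(tweet: dict, original_tweet_id: int, source: str) -> dict:
    """Map one v1.1-style tweet dict to a CSV row keyed by FIELDNAMES."""
    # created_at looks like "Wed Nov 30 18:04:12 +0000 2016".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    # Assumed heuristic: only direct replies to the original tweet get "Yes".
    is_reply = tweet.get("in_reply_to_status_id") == original_tweet_id
    return {
        "Tweet_id": tweet["id_str"],
        "Direct response to original thread?": "Yes" if is_reply else "",
        "Sent when?": created.isoformat(),
        "Sender": tweet["user"]["screen_name"],
        "Sender's name": tweet["user"]["name"],
        "Tweet text": tweet.get("full_text", tweet.get("text", "")),
        "Data source": source,
    }


def write_csv(tweets, original_tweet_id: int, source: str, out) -> None:
    """Write a header plus one row per tweet to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet_to_row(tweet, original_tweet_id, source))
```

For a real file, open it with `open(path, "w", newline="", encoding="utf-8")`, as the `csv` module recommends, and pass it as `out`. An `io.StringIO` also works for quick checks.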

WellstoneAction/twitter_hate_speech_collector

Folders and files

Latest commit

History

Repository files navigation

Twitter hate speech collector

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages