TweetUtils is a set of modules and command-line utilities for common use cases involving tweet datasets, as well as for collecting data from Twitter. A Twitter Developer Account is required for the data collection features.
streaming_gathering.py:
Gather real-time streaming data from Twitter with keyword filters.
search_gathering.py:
Gather a collection of tweets matching a specific Twitter query.
profile_gathering.py:
Gather a collection of tweets from a specific user profile.
sanitize_tweets.py:
Remove stopwords, emoji, and other elements from tweet text, with customizable options.
quick_report.py:
Print a summary of a dataset's contents, such as top tweets, date range, most frequent words, hashtags, tag cloud sizes, and more.
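As a rough illustration of the kind of summary quick_report.py produces, the sketch below finds the date range and counts the most frequent words and hashtags in a small in-memory dataset. It is not the script's actual code: the `summarize` function is hypothetical, and it assumes each tweet is a dict with a `text` field and an ISO 8601 `created_at` timestamp, which is simpler than real Twitter API payloads.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input format (hypothetical): each tweet is a dict with a "text"
# field and an ISO 8601 "created_at" string. Real API payloads differ.
HASHTAG_RE = re.compile(r"#\w+")
WORD_RE = re.compile(r"[a-z']+")  # ASCII letters only, for simplicity


def summarize(tweets, top_n=3):
    """Return the date range plus the top words and hashtags."""
    dates = [datetime.fromisoformat(t["created_at"]) for t in tweets]
    hashtags = Counter()
    words = Counter()
    for tweet in tweets:
        text = tweet["text"].lower()
        hashtags.update(HASHTAG_RE.findall(text))
        # Strip hashtags before counting plain words so they are not counted twice.
        words.update(WORD_RE.findall(HASHTAG_RE.sub(" ", text)))
    return {
        "date_range": (min(dates), max(dates)),
        "top_words": words.most_common(top_n),
        "top_hashtags": hashtags.most_common(top_n),
    }
```

Timestamps from the Twitter v1.1 API look like `Wed Oct 10 20:19:24 +0000 2018`, so a real report would parse them with `datetime.strptime` rather than `fromisoformat`.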
Additionally, the following modules can be imported and used as a programming interface:
token_manager.py:
Handles OAuth authentication with the Twitter API using Tweepy.
sanitizer.py:
Contains all the text-cleaning functions, such as stopword, symbol, and emoji removal.
io.py:
Reads and writes JSON and CSV files.
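To show what the text-cleaning steps described for sanitizer.py might look like, here is a minimal sketch using only the standard library. The function names, the tiny stopword set, and the emoji ranges are all assumptions for illustration, not TweetUtils' own; the emoji pattern covers only common Unicode blocks and is not exhaustive.

```python
import re

# Illustrative stopword set (hypothetical); a real setup would likely load a
# fuller list, e.g. from NLTK.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "to", "of", "in"}

# Approximate emoji matcher for common blocks; not exhaustive.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, etc.
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "]+"
)
# Any character that is not a word character, whitespace, '#' or '@'.
SYMBOL_RE = re.compile(r"[^\w\s#@]")


def remove_emoji(text):
    return EMOJI_RE.sub("", text)


def remove_symbols(text):
    return SYMBOL_RE.sub("", text)


def remove_stopwords(text, stopwords=STOPWORDS):
    return " ".join(w for w in text.split() if w.lower() not in stopwords)


def sanitize(text, emoji=True, symbols=True, stopwords=True):
    """Apply the selected cleaning steps in a fixed order."""
    if emoji:
        text = remove_emoji(text)
    if symbols:
        text = remove_symbols(text)
    if stopwords:
        text = remove_stopwords(text)
    return " ".join(text.split())  # collapse leftover whitespace
```

Keeping each step as its own function, with boolean switches in `sanitize`, is one way to make the options customizable as described for sanitize_tweets.py.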
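The JSON and CSV handling attributed to io.py can be approximated with the standard `json` and `csv` modules. The sketch below is hypothetical: the helper names are invented, and it assumes a JSON file holds a single list of tweet objects and that only selected top-level fields go into the CSV.

```python
import csv
import json


def read_json(path):
    """Load a JSON file assumed to hold a list of tweet objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_json(tweets, path):
    # ensure_ascii=False keeps non-English text readable in the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tweets, f, ensure_ascii=False, indent=2)


def write_csv(tweets, path, fields):
    """Write the chosen top-level fields of each tweet as CSV rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" silently skips keys not listed in fields.
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(tweets)


def read_csv(path):
    """Read CSV rows back as dicts; every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```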
Install the dependencies, then run the scripts or import the modules:
$ cd TweetUtils
$ pip install -r requirements.txt