A final project for CS 108: Ethics of Intelligent Systems at Harvard.

Usage

  1. Get Twitter API keys.
    • Create a file called APIKeys.json and store your API keys in it; you can use APIkeyexample.txt as a reference (a hedged loading sketch appears after this list).
    • Note that this .json file will not be pushed to git unless you change the .gitignore.
  2. Generate tweets for a user or set of users.
    • Navigate to the src directory.
    • Run python main.py --names <NAME1> <NAME2> ..., where each NAMEi is a Twitter handle.
    • The code will pull tweets and save them to the data directory.
    • It will also print the generated tweets to the console.
  3. Determine sentence similarity.
    • Navigate to the src directory.
    • Run python model_test.py <tweet_file> <K>, where <tweet_file> is the relative path to a file in the data folder (for example, ../data/Harvard.csv) and K sets the size of the K-mers used by the model. K must be at least 2.
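
The exact field names expected in APIKeys.json are defined by APIkeyexample.txt; the snippet below is only a minimal sketch of how the keys could be loaded and used to authenticate with Tweepy, assuming the standard four Twitter credential fields (consumer_key, consumer_secret, access_token, access_token_secret).

```python
import json
import tweepy

# Load credentials from APIKeys.json. The field names below are assumptions;
# check APIkeyexample.txt for the names this project actually uses.
with open("APIKeys.json") as f:
    keys = json.load(f)

auth = tweepy.OAuthHandler(keys["consumer_key"], keys["consumer_secret"])
auth.set_access_token(keys["access_token"], keys["access_token_secret"])
api = tweepy.API(auth)

# Quick sanity check: fetch a few recent tweets for a handle.
for status in api.user_timeline(screen_name="Harvard", count=5):
    print(status.text)
```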

Important files

  • main.py: Contains code to generate sentences given a list of Twitter handles supplied at the command line.
  • model_generator.py: Contains functions to generate the Markov model for a user. This includes getting tweets from a file, extracting K-mers, forming the model, and determining the next word given the current K-1 words (a toy sketch of this approach appears below).
  • model_test.py: Contains functions to generate sentences from a model and test their similarity to the original tweets. Note that when run as a driver program, this file defaults to determining sentence accuracy.
  • twitter_extractor.py: Contains functions to connect to the Twitter API and extract tweets for one or more users.
  • comparison.py: Contains functions to compare words/sentences for quantitative analysis (see the word-overlap sketch below).
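
To illustrate the K-mer Markov approach that model_generator.py is described as implementing, here is a minimal, self-contained sketch. It is not the project's actual code: it simply maps each (K-1)-word prefix to the words observed after it and samples from those observations to generate text.

```python
import random
from collections import defaultdict

def build_model(tweets, k):
    """Map each (k-1)-word prefix to the words observed after it."""
    model = defaultdict(list)
    for tweet in tweets:
        words = tweet.split()
        for i in range(len(words) - k + 1):
            prefix = tuple(words[i:i + k - 1])
            model[prefix].append(words[i + k - 1])
    return model

def generate(model, length=15):
    """Start from a random prefix and repeatedly sample the next word."""
    prefix = random.choice(list(model.keys()))
    out = list(prefix)
    for _ in range(length):
        choices = model.get(tuple(out[-len(prefix):]))
        if not choices:
            break
        out.append(random.choice(choices))
    return " ".join(out)

tweets = ["the crimson team won the game", "the crimson band played the game"]
model = build_model(tweets, k=3)
print(generate(model))
```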
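
The README does not say which similarity metric comparison.py and model_test.py use; as a hedged stand-in, the sketch below scores a generated sentence against the original tweets with a simple word-overlap (Jaccard) measure.

```python
def jaccard(sentence_a, sentence_b):
    """Word-level Jaccard similarity between two sentences."""
    a, b = set(sentence_a.lower().split()), set(sentence_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def best_match(generated, originals):
    """Return the original tweet most similar to the generated sentence."""
    return max(originals, key=lambda t: jaccard(generated, t))

originals = ["the crimson team won the game", "the crimson band played the game"]
print(best_match("the crimson team played the game", originals))
```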

Dependencies

numpy, scipy, Tweepy, NLTK