
Trump Tweet classifier

Uses a Naive Bayes classifier to detect whether Trump has authored a given @RealDonaldTrump tweet.

Background

Before becoming president, Donald Trump personally used a Samsung Galaxy, which runs the Android OS. Clever people quickly realized that tweets sent from an Android phone appeared "Trumpier" — a bit more angry and direct — than tweets sent from an iPhone.

This led folks to conclude that the Android tweets were probably written by Trump himself, while the iPhone tweets came from staffers.

Alas! Now that he's president, Trump is using his Android phone less and less. So we have to find a different way to tell if it's him.

This classifier was featured in The Atlantic story "A Bot That Can Tell When It's Really Donald Trump Who's Tweeting."

The classifier

This script uses tweets from 2016 and part of 2017 (before Trump began switching over to using the iPhone almost exclusively) to train a bag-of-words Naive Bayes classifier. It sorts each tweet into one of two categories: trump (similar to previous tweets posted from the Android) and staff (similar to tweets posted from another device).
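
To make the idea concrete, here is a minimal, dependency-free sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the code in train.py (which may use a library such as NLTK or scikit-learn); the tokenizer, the NaiveBayes class, and the toy training strings are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?:;\"'()") for w in text.lower().split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # label -> number of tweets
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # skip words never seen in training
                    score += math.log((counts[token] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up training strings (not real tweets).
clf = NaiveBayes().fit(
    ["sad total disaster", "very dishonest media sad",
     "join us tonight tickets", "watch live tonight thank you"],
    ["trump", "trump", "staff", "staff"],
)
print(clf.predict("dishonest media sad"))  # trump
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.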

It discards common words and narrows the feature list down to the top 500 words to prevent overfitting. Even so, I fear this model is biased toward Campaign Trump rather than Current Trump: it ranks terms like bernie and cruz highly, and they aren't quite as relevant today. (Update: I've now added 2016 campaign-y terms to a custom list of stopwords.)
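
A rough sketch of that feature-selection step, assuming "top 500 words" means the 500 most frequent non-stopword tokens across the training tweets. The STOPWORDS set below is a hypothetical stand-in (a few common words plus campaign terms like the ones named above), and the presence-based feature dict is one common choice, not necessarily what train.py does.

```python
from collections import Counter

# Hypothetical stopword list: a few common English words plus 2016 campaign
# terms like the ones the README mentions. The real list in train.py may differ.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "for", "on", "rt",
             "bernie", "cruz"}
PUNCTUATION = ".,!?:;\"'()"


def tokens(text):
    """Lowercase, split on whitespace, strip punctuation, drop empty tokens."""
    return [t.strip(PUNCTUATION) for t in text.lower().split() if t.strip(PUNCTUATION)]


def top_features(texts, n=500, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens across all texts."""
    counts = Counter(t for text in texts for t in tokens(text) if t not in stopwords)
    return [word for word, _ in counts.most_common(n)]


def to_features(text, vocabulary):
    """Represent a tweet as {word: present?} over the chosen vocabulary only."""
    present = set(tokens(text))
    return {word: word in present for word in vocabulary}


vocab = top_features(["Bernie is weak", "Cruz is weak and sad", "Sad day"], n=2)
print(vocab)                          # ['weak', 'sad']
print(to_features("So sad!", vocab))  # {'weak': False, 'sad': True}
```

Capping the vocabulary keeps rare, tweet-specific words from dominating the model, which is the overfitting concern described above.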

Installation

Clone the repository. Then run:

pip install -r requirements.txt
python train.py

It'll output the current accuracy and save the classifier in two pickle files.
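
The README doesn't say how accuracy is measured or what the two pickle files contain. One plausible arrangement, sketched below under that assumption, is a held-out test split for accuracy and separate pickles for the trained classifier and its feature list. All function names, parameters, and file paths here are hypothetical.

```python
import pickle
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle (text, label) pairs reproducibly and hold out a test fraction."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, test_examples):
    """Fraction of held-out (text, label) pairs the classifier labels correctly."""
    correct = sum(predict(text) == label for text, label in test_examples)
    return correct / len(test_examples)


def save_model(classifier, features, classifier_path, features_path):
    """Write the trained classifier and its feature list to two pickle files."""
    with open(classifier_path, "wb") as f:
        pickle.dump(classifier, f)
    with open(features_path, "wb") as f:
        pickle.dump(features, f)


def load_model(classifier_path, features_path):
    """Read both pickles back so the classifier can label new tweets."""
    with open(classifier_path, "rb") as f:
        classifier = pickle.load(f)
    with open(features_path, "rb") as f:
        features = pickle.load(f)
    return classifier, features


# Stand-in classifier for illustration: labels any tweet containing "sad" as trump.
def toy_predict(text):
    return "trump" if "sad" in text else "staff"


held_out = [("sad", "trump"), ("tickets", "staff"), ("sad day", "staff"), ("rally", "staff")]
print(accuracy(toy_predict, held_out))  # 0.75
```

Only unpickle files you created yourself: pickle.load can execute arbitrary code from untrusted input.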

The data

The data comes courtesy of the excellent Trump Twitter Archive by Brendan Brown. Raw CSVs are included in the data folder, but the definitive source is Brown's GitHub repository.
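
A sketch of how the CSVs might be turned into labeled training pairs using the device heuristic from the Background section. It assumes each file has "source" and "text" columns and that Android tweets carry the source string "Twitter for Android"; both are assumptions to check against the files in the data folder, and load_labeled_tweets is a hypothetical helper.

```python
import csv
import io

# Assumption: the archive CSVs have "source" and "text" columns and Android
# tweets carry this exact source string. Check the files in data/ to confirm.
ANDROID_SOURCE = "Twitter for Android"


def load_labeled_tweets(csv_file):
    """Yield (text, label): 'trump' for Android tweets, 'staff' for any other device."""
    for row in csv.DictReader(csv_file):
        label = "trump" if row["source"] == ANDROID_SOURCE else "staff"
        yield row["text"], label


# Tiny in-memory CSV standing in for a real file opened with newline="".
sample = io.StringIO(
    "source,text\n"
    "Twitter for Android,sad!\n"
    "Twitter for iPhone,join us tonight\n"
)
pairs = list(load_labeled_tweets(sample))
print(pairs)  # [('sad!', 'trump'), ('join us tonight', 'staff')]
```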