Author: Julian Griggs (jgriggs@princeton.edu)
Advisor: Andrea LaPaugh (aslp@cs.princeton.edu)
#### 5/6/2014
Stylometric analysis is becoming a powerful tool for de-anonymizing written texts on the web. Despite the rapid growth of social-media-based text, authorship attribution studies focusing on this domain are relatively scarce. In this paper, I analyze the effectiveness of some of the most commonly used linguistic features and machine learning algorithms to quantitatively determine the best combination for authorship attribution in the Twitter domain. The empirical results suggest that, across the combinations of features and analysis methods tested, pairing character 2-grams with linear Support Vector Machines yields the best overall performance.
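
To make the character 2-gram feature concrete, here is a minimal sketch of how such features could be extracted from a single tweet. It is not the pipeline used in this paper: the function names `char_ngrams` and `relative_frequencies`, the sample tweet, and the choice to normalize counts into relative frequencies are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams (hypothetical helper, n=2 by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_frequencies(counts):
    """Scale raw counts to sum to 1 so texts of different lengths are comparable.

    Assumption: the paper may use raw counts or another weighting instead.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical sample text; spaces and punctuation are kept as characters.
tweet = "so good"
features = relative_frequencies(char_ngrams(tweet))
```

Feature vectors built this way, one per tweet or per author sample, are the kind of input a linear classifier such as a linear Support Vector Machine would then be trained on (for example, scikit-learn's `LinearSVC`, if that library is available).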