Authorship Attribution on Twitter: A Comparative Methods Study

Author: Julian Griggs (jgriggs@princeton.edu)

Advisor: Andrea LaPaugh (aslp@cs.princeton.edu)

#### 5/6/2014

Stylometric analysis is becoming an increasingly powerful tool for de-anonymizing written text on the web. Despite the rapid growth of social-media text, authorship attribution studies focusing on this domain remain relatively scarce. In this paper, I evaluate some of the most commonly used linguistic features and machine learning algorithms to determine quantitatively which combination works best for authorship attribution on Twitter. The empirical results suggest that, among the combinations of features and analysis methods tested, pairing character 2-grams with linear support vector machines (SVMs) yields the best overall performance.
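As a rough illustration of the character 2-gram feature mentioned above, the following sketch counts overlapping two-character sequences in a tweet using only the Python standard library. The function names and the example text are hypothetical and are not taken from this repository's scripts. The sketch covers feature extraction only and does not reproduce the study's classification step.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in `text` (n=2 gives 2-grams)."""
    # Slide a window of width n across the string, one character at a time.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_frequencies(counts):
    """Scale raw counts to proportions so texts of different lengths compare."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical example text, not drawn from the study's corpus.
tweet = "so so good"
profile = char_ngrams(tweet)
features = relative_frequencies(profile)
```

Spaces and punctuation are kept as ordinary characters here, so a 2-gram such as `"o "` records a word boundary. Whether the study normalized case or stripped such characters is not stated in this README.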

Final Paper
scripts
src
BuildCorpusFile.py
CountCorrect.py
GetCategories.py
Griggs_Julian_Final.pdf
Independent Work Presentation.pptx
README.md
Spring_Poster.pptx
