Supplementary material for the paper ``Higher Criticism for Discriminating Word-Frequency Tables and Testing Authorship''

adonoho/HCAuthorship

HCAuthorship

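This repository contains the code and the datasets (or dataset descriptions) used to obtain the results reported in the paper ``Higher Criticism for Discriminating Word-Frequency Tables and Testing Authorship''.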

Content:

  • AuthAttLib -- library to facilitate the use of the Higher Criticism (HC) based similarity measure in authorship attribution challenges. See the project https://github.com/alonkipnis/AuthorshipAttribution for more details.
  • AuthorshipChallenge -- contains data and code (IPython notebook) for using HC-based similarity in the ``PAN 2018 Cross-domain authorship attribution'' challenge.
  • Federalists -- data and code (IPython notebook) for using HC to attribute authorship in the Federalist Papers.
  • Gutenberg -- code for attributing authorship using HC on a collection of more than 11,000 titles from the Gutenberg project. Also included is the list of titles and authors in this collection, and the file containing the result of the attribution procedure.
  • var_analysis -- code (R notebook) and data for analyzing the variation of word frequencies within a corpus and the degree to which this variation affects the HC calculation.
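
For orientation, here is a minimal sketch of how an HC-based discrepancy between two word-frequency tables could be computed. It is an illustration only and does not reproduce the AuthAttLib API: the function names (`two_sided_binom_pvalue`, `higher_criticism`, `hc_discrepancy`), the choice of null proportion, and the default `gamma=0.3` are all assumptions made for this example. The sketch gives each word an exact two-sided binomial p-value and then applies one common form of the Donoho-Jin Higher Criticism statistic; the notebooks in this repository may use a different variant.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes under Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binom_pvalue(x, n, p):
    """Exact two-sided p-value: total mass of outcomes no more likely than x."""
    px = binom_pmf(x, n, p)
    # Small relative tolerance so symmetric outcomes are not lost to rounding.
    total = sum(q for q in (binom_pmf(k, n, p) for k in range(n + 1))
                if q <= px * (1 + 1e-9))
    return min(1.0, total)


def higher_criticism(pvalues, gamma=0.3):
    """One common form of the HC statistic (hypothetical helper).

    Maximizes sqrt(N) * (i/N - p_(i)) / sqrt(i/N * (1 - i/N)) over the
    smallest gamma-fraction of the sorted p-values.
    """
    p_sorted = sorted(pvalues)
    n = len(p_sorted)
    if n < 2:
        raise ValueError("need at least two p-values")
    # Never include i = N, where the denominator would be zero.
    k = min(n - 1, max(1, int(gamma * n)))
    best = -math.inf
    for i, p in enumerate(p_sorted[:k], start=1):
        u = i / n
        best = max(best, math.sqrt(n) * (u - p) / math.sqrt(u * (1 - u)))
    return best


def hc_discrepancy(counts_a, counts_b, gamma=0.3):
    """HC discrepancy between two word-count dictionaries (illustrative)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # Assumed null: a word's occurrences split in proportion to table sizes.
    p0 = total_a / (total_a + total_b)
    pvalues = []
    for word in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(word, 0)
        b = counts_b.get(word, 0)
        pvalues.append(two_sided_binom_pvalue(a, a + b, p0))
    return higher_criticism(pvalues, gamma)


if __name__ == "__main__":
    doc_a = {"the": 50, "of": 30, "upon": 20}
    doc_b = {"the": 50, "of": 30, "upon": 20}
    doc_c = {"the": 50, "of": 30, "whilst": 20}
    print("same usage:     ", hc_discrepancy(doc_a, doc_b))
    print("different usage:", hc_discrepancy(doc_a, doc_c))
```

In this toy example the pair with different word usage yields the larger HC value, which is the behavior a similarity measure of this kind relies on. The exact binomial sum is fine for small counts; for a large corpus a statistics library would be the more practical choice.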
