alonkipnis/HCAuthorship

HCAuthorship

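This repository contains the code and the datasets (or dataset descriptions) used to obtain the results reported in the paper "Higher Criticism for Discriminating Word-Frequency Tables and Testing Authorship".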

Content:

  • AuthAttLib -- library to facilitate the use of the Higher Criticism (HC)-based similarity measure in authorship attribution challenges. See the project at https://github.com/alonkipnis/AuthorshipAttribution for more details.
  • AuthorshipChallenge -- data and code (IPython notebook) for using HC-based similarity in the "PAN 2018 Cross-domain authorship attribution" challenge.
  • Federalists -- data and code (IPython notebook) for using HC to attribute authorship in the Federalist Papers.
  • Gutenberg -- code for attributing authorship using HC in a collection of more than 11,000 titles from the Gutenberg project. This folder contains the list of titles and authors in this collection, code for downloading the titles, and the results of the attribution procedure obtained via several cluster computations.
  • var_analysis -- code (R notebook) and data for conducting an analysis of the variation of words within a corpus and the degree to which it affects the HC calculation.
  • compare_HC_types -- code (IPython notebook) for comparing two variants of HC.
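
For readers unfamiliar with HC, the following is a minimal sketch of one common form of the Higher Criticism statistic computed from a list of per-word P-values. It is an illustration only and is not taken from AuthAttLib: the function name `higher_criticism`, the parameters `gamma` and `eps`, and the choice of normalization (dividing by sqrt(p(1 - p))) are assumptions, and the variants used in this repository may differ.

```python
import math


def higher_criticism(pvalues, gamma=0.5, eps=1e-12):
    """Illustrative HC statistic (hypothetical helper, not the AuthAttLib API).

    Computes max over i <= gamma * n of
        sqrt(n) * (i / n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted P-values.
    Returns the HC value and the P-value at which the maximum is attained.
    """
    if not pvalues:
        raise ValueError("pvalues must be non-empty")

    # Clip to (0, 1) so the denominator never vanishes (an assumption of this sketch).
    p = sorted(min(max(v, eps), 1.0 - eps) for v in pvalues)
    n = len(p)

    # Only the smallest gamma * n P-values are searched (at least one).
    k = max(1, int(gamma * n))

    best_hc, best_p = -math.inf, None
    for i in range(1, k + 1):
        p_i = p[i - 1]
        z = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1.0 - p_i))
        if z > best_hc:
            best_hc, best_p = z, p_i
    return best_hc, best_p


if __name__ == "__main__":
    n = 100
    # Evenly spread P-values, mimicking two texts with similar word frequencies.
    null_like = [(i - 0.5) / n for i in range(1, n + 1)]
    # Same list, but a few words show very small P-values.
    with_signal = null_like[:-5] + [1e-6] * 5

    print(higher_criticism(null_like))
    print(higher_criticism(with_signal))
```

In this sketch, a large HC value indicates that a small set of words has P-values far smaller than expected under identical word-frequency distributions; the returned P-value acts as a threshold separating those words from the rest.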

About

Supplementary material for the paper "Higher Criticism for Discriminating Word-Frequency Tables and Testing Authorship"
