- AuthAttrib.py -- two models for authorship attribution:
  - AuthorshipAttributionMulti -- comparison of the disputed text to each author
  - AuthorshipAttributionMultiBinary -- head-to-head comparison of each author against another
- DocTermHC -- model for constructing a large-scale word-frequency table and running HC tests against it.
- HC_aux.py -- auxiliary functions to evaluate Higher Criticism tests
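For orientation, the Higher Criticism (HC) statistic that these helpers evaluate can be sketched in a few lines. The version below is the textbook form computed from a list of P-values; it is an illustration only and is not taken from HC_aux.py. The function name `higher_criticism` and the `gamma` parameter (the fraction of smallest P-values to scan) are assumptions made for this sketch.

```python
import math


def higher_criticism(pvalues, gamma=0.5):
    """Textbook HC statistic (illustrative; not the HC_aux.py implementation).

    Scans the smallest gamma-fraction of the sorted P-values and returns the
    largest standardized gap between i/n and the i-th smallest P-value.
    """
    p_sorted = sorted(pvalues)
    n = len(p_sorted)
    if n == 0:
        raise ValueError("need at least one P-value")

    limit = max(1, int(gamma * n))  # always scan at least one P-value
    best = -math.inf
    for i, p in enumerate(p_sorted[:limit], start=1):
        # Clip away from 0 and 1 so the denominator is never zero.
        p = min(max(p, 1e-12), 1 - 1e-12)
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1 - p))
        best = max(best, z)
    return best


# Hypothetical P-values: a few very small ones push the statistic up.
with_signal = higher_criticism([0.001, 0.002, 0.5, 0.7, 0.9])
without_signal = higher_criticism([0.2, 0.4, 0.5, 0.7, 0.9])
print(with_signal > without_signal)
```

A large value suggests that a small number of features (here, words) depart from the null hypothesis, which is the setting HC is designed for.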
To use AuthorshipAttributionMulti and AuthorshipAttributionMultiBinary, arrange your dataset in a pandas DataFrame with the columns author, doc_id, and text:
- author is the name of the class the document is associated with.
- doc_id is a unique document identifier.
- text is a string representing the content of the document.
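The column layout described above can be sketched as follows. The author names, document IDs, and texts are made up for illustration, and the `check_corpus` helper is hypothetical (it is not part of this repository); it only verifies the three requirements listed above before the data is handed to one of the models.

```python
import pandas as pd

# Hypothetical corpus: one row per document.
records = [
    {"author": "Author A", "doc_id": "A-001", "text": "It was the best of times."},
    {"author": "Author A", "doc_id": "A-002", "text": "It was the worst of times."},
    {"author": "Author B", "doc_id": "B-001", "text": "Call me when the ship returns."},
]
data = pd.DataFrame(records, columns=["author", "doc_id", "text"])


def check_corpus(df):
    """Hypothetical helper: verify the layout described in this README."""
    required = ["author", "doc_id", "text"]
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not df["doc_id"].is_unique:
        raise ValueError("doc_id values must be unique")
    if not df["text"].map(lambda t: isinstance(t, str)).all():
        raise ValueError("every text entry must be a string")
    return True


check_corpus(data)
```

How the resulting DataFrame is passed to the models is shown in the example notebook; the constructor arguments are deliberately not reproduced here.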
See AuthorshipAttribution_example.ipynb for a use case in authorship attribution challenges. Here is the Binder link:
This code was used to get the results and figures reported in the paper:
Alon Kipnis, "Higher Criticism for Discriminating Word-Frequency Tables and Testing Authorship", 2019