Learning Stylometric Representation for Authorship Analysis

StyloMatrix

Mining stylometric representation for authorship analysis. This repository contains several multi-language NLP utilities for text processing and several models for authorship analysis. The ca.mcgill.sis.dmas.nlp.model.astyle package contains implementations of the following models:

  • Joint Topical-Lexical Modality
  • Character Modality
  • Syntactic Modality
  • LDA, LSA
  • N-grams, static features, typed N-grams
  • Two baselines from PAN2016
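To make the N-gram family above more concrete, the following is a minimal sketch of a character n-gram frequency profile compared with cosine similarity, a common stylometric baseline. It is an illustration only, not the implementation in ca.mcgill.sis.dmas.nlp.model.astyle; the class name CharNGramProfile and its methods are hypothetical, and n-grams are taken over UTF-16 code units for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only (hypothetical names, not the StyloMatrix API):
 * builds character n-gram count profiles and compares them with cosine similarity.
 */
public class CharNGramProfile {

    /** Returns a map from each overlapping character n-gram of text to its count. */
    public static Map<String, Integer> profile(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Map<String, Integer> counts = new HashMap<>();
        // Slide a window of length n over the text one position at a time.
        // Note: positions are UTF-16 code units, not Unicode code points.
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity between two count profiles; returns 0 if either profile is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double v = e.getValue();
            normA += v * v;
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += v * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> a = profile("the cat sat on the mat", 3);
        Map<String, Integer> b = profile("the dog sat on the log", 3);
        System.out.println("count of trigram \"the\": " + a.get("the"));
        System.out.printf("cosine similarity: %.3f%n", cosine(a, b));
    }
}
```

Raw counts are used here to keep the sketch short; a real baseline would typically normalize frequencies and restrict the vocabulary to the most frequent n-grams.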

Example runs are included in the ca.mcgill.sis.dmas.nlp.exp package. You can refer to the source code for API usage.

StyloMatrix was developed by Steven H. H. Ding under the supervision of Benjamin C. M. Fung of the Data Mining and Security Lab at McGill University in Canada. If you find StyloMatrix useful, please cite our paper:

  • S. H. H. Ding, B. C. M. Fung, F. Iqbal, and W. K. Cheung. Learning stylometric representations for authorship analysis. IEEE Transactions on Cybernetics (CYB), in press. IEEE Systems, Man, and Cybernetics Society.

Compilation

This project is written purely in Java and built with Maven. You need the following dependencies:

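  • [Required] The latest x64 8.x/9.x JDK distribution from Oracle. A JRE alone is not sufficient, because compiling the project requires the Java compiler.
  • [Required] The latest Maven distribution. Its 'bin' folder must be included in your system's 'Path' (PATH) environment variable.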
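Before building, it can help to confirm that the installed Java matches these requirements. The following is a minimal sketch with a hypothetical class name, EnvCheck; it relies only on standard system properties and on javax.tools.ToolProvider.getSystemJavaCompiler(), which returns null when no system Java compiler is available (typically a bare JRE).

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/**
 * Hypothetical helper: reports the Java version, CPU architecture, and whether a
 * JDK compiler is available, to compare against the README's requirements.
 */
public class EnvCheck {

    /** Extracts the major version from java.version, e.g. "1.8.0_181" -> 8, "9.0.4" -> 9. */
    static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Java 8 and earlier report "1.x..."; Java 9 and later report the major version first.
        return first == 1 ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        // x64 JVMs usually report "amd64" (Windows/Linux) or "x86_64" (macOS).
        String arch = System.getProperty("os.arch");
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

        System.out.println("Java version: " + version + " (major " + majorVersion(version) + ")");
        System.out.println("Architecture: " + arch);
        System.out.println("JDK compiler available: " + (compiler != null));
    }
}
```

If the last line reports false, Maven is most likely running on a JRE, and the JAVA_HOME or Path settings should point to a JDK instead.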

Run the following commands from the root directory of the source code to compile this project:

pushd lib/

# Install the POS tagger for Greek and its resources.
mvn install:install-file -Dfile=GreekTagger-0.0.1.jar -DgroupId=local -DartifactId=greek-tagger -Dversion=0.0.1 -Dpackaging=jar
# Install the hunspell spell checking package.
mvn install:install-file -Dfile=hunspell.jar -DgroupId=local -DartifactId=hunspell -Dversion=0.0.1 -Dpackaging=jar
# Install the AUROC calculation package.
mvn install:install-file -Dfile=auc.jar -DgroupId=local -DartifactId=auc -Dversion=0.0.1 -Dpackaging=jar

popd 

# Build the final jar with all dependencies:
mvn package
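# The compiled jar file target/authorship-0.0.1-SNAPSHOT-jar-with-dependencies.jar contains all the dependencies.
# We suggest appending this jar file to your system's 'CLASSPATH' environment variable for the current session (Windows):
SET CLASSPATH=%CLASSPATH%;absolute_path_of_the_authorship-0.0.1-SNAPSHOT-jar-with-dependencies.jar
# On Linux/macOS, the equivalent is:
# export CLASSPATH=$CLASSPATH:absolute_path_of_the_authorship-0.0.1-SNAPSHOT-jar-with-dependencies.jar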
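To confirm that a JVM actually picks up the jar, the following minimal sketch (hypothetical class name ClasspathCheck) splits the java.class.path system property on the platform's path separator and looks for an entry whose file name matches the jar produced by mvn package.

```java
import java.io.File;

/** Hypothetical helper: reports whether the jar-with-dependencies is on the JVM's class path. */
public class ClasspathCheck {

    static final String JAR_NAME = "authorship-0.0.1-SNAPSHOT-jar-with-dependencies.jar";

    /** Returns true if any entry of the class path string has exactly the given file name. */
    static boolean containsJar(String classPath, String jarName) {
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        for (String entry : classPath.split(File.pathSeparator)) {
            if (new File(entry).getName().equals(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        System.out.println("Jar on class path: " + containsJar(cp, JAR_NAME));
    }
}
```

Note that once CLASSPATH is set, it replaces the default class path (the current directory), so to run this check from a directory of compiled classes, include "." in the class path as well.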

Setting up the development project

This project was developed with Eclipse. You can import it into Eclipse as an existing Maven project; other Java IDEs that support Maven projects also work. Please refer to your IDE's documentation for import instructions. You also need to run the following Maven commands from the project root in your IDE to install the local dependencies:

# Install the POS tagger for Greek and its resources.
mvn install:install-file -Dfile=lib/GreekTagger-0.0.1.jar -DgroupId=local -DartifactId=greek-tagger -Dversion=0.0.1 -Dpackaging=jar
# Install the hunspell spell checking package.
mvn install:install-file -Dfile=lib/hunspell.jar -DgroupId=local -DartifactId=hunspell -Dversion=0.0.1 -Dpackaging=jar
# Install the AUROC calculation package.
mvn install:install-file -Dfile=lib/auc.jar -DgroupId=local -DartifactId=auc -Dversion=0.0.1 -Dpackaging=jar
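After running these commands, the three artifacts should appear in the local Maven repository under groupId/artifactId/version/artifactId-version.jar. The following minimal sketch (hypothetical class name LocalRepoCheck) checks for them, assuming the default repository location ~/.m2/repository; adjust the path if your settings.xml defines a different localRepository.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: checks that the locally installed jars exist in the Maven
 * local repository. Assumes the default location ~/.m2/repository.
 */
public class LocalRepoCheck {

    /** Builds repo/groupId(as dirs)/artifactId/version/artifactId-version.jar. */
    static Path artifactPath(Path repo, String groupId, String artifactId, String version) {
        Path dir = repo;
        // Each dot-separated segment of the groupId becomes one directory level.
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part);
        }
        return dir.resolve(artifactId).resolve(version)
                  .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        // groupId, artifactIds, and version match the install:install-file commands above.
        String[] artifactIds = {"greek-tagger", "hunspell", "auc"};
        for (String artifactId : artifactIds) {
            Path jar = artifactPath(repo, "local", artifactId, "0.0.1");
            String status = Files.isRegularFile(jar) ? "installed" : "missing";
            System.out.println(artifactId + ": " + status + " (" + jar + ")");
        }
    }
}
```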

Licensing

The software was developed by Steven H. H. Ding under the supervision of Benjamin C. M. Fung at the McGill Data Mining and Security Lab. It is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License. Please refer to the LICENSE file for details.

Copyright 2017 McGill University. All rights reserved.