BMMM

The Bayesian Multinomial Mixture Model code from my 2011 paper (and thesis)

Requirements

  1. Java 1.7
  2. Maven (http://maven.apache.org/download.cgi)

Running BMMM

After cloning the project or downloading the zip, open the bmmm folder on the command line and run:

mvn package
mvn dependency:copy-dependencies

If the build succeeds, run the following to see the available runtime configuration options:

java -cp target/bmmm-2.0.11.jar:target/dependency/* tagInducer.Inducer

The main requirement is a CoNLL-style input file with UPOS annotation (9 columns in total). If the input file contains dependencies (column 8), the deps feature can also be used. To use morphology (Morfessor) and PARG-based features, you will need the appropriate files. You can convert a raw tokenised corpus to CoNLL format using the following command:

java -cp target/bmmm-2.0.11.jar tagInducer.utils.RawToCoNLL corpus.txt

You can also use a JSON file format with the following fields (one sentence per line):

{
    "words":[{"word":"more","pos":"qn","upos":"DET","cluster":"48"},
        {"word":"juice","pos":"n","upos":"NOUN","cluster":"48"},
        {"word":"?","pos":"?","upos":".","cluster":"-1"}]
}
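As a rough illustration of this layout, the sketch below serialises one sentence onto a single line with the four fields shown above. `JsonSentenceWriter` and its methods are hypothetical helpers, not part of BMMM, and the sketch assumes that every field is written as a string and that the example above is wrapped only for readability.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of BMMM): writes one sentence as one JSON line. */
public class JsonSentenceWriter {

    /** Escapes backslashes and double quotes only; control characters are not handled. */
    public static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Each token is {word, pos, upos, cluster}; the result contains no line breaks. */
    public static String toJsonLine(List<String[]> tokens) {
        String[] keys = {"word", "pos", "upos", "cluster"};
        StringBuilder sb = new StringBuilder("{\"words\":[");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            for (int k = 0; k < keys.length; k++) {
                if (k > 0) sb.append(',');
                sb.append('"').append(keys[k]).append("\":\"")
                  .append(escape(tokens.get(i)[k])).append('"');
            }
            sb.append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Tokens taken from the example above.
        List<String[]> sentence = new ArrayList<>();
        sentence.add(new String[] {"more", "qn", "DET", "48"});
        sentence.add(new String[] {"juice", "n", "NOUN", "48"});
        sentence.add(new String[] {"?", "?", ".", "-1"});
        System.out.println(toJsonLine(sentence));
    }
}
```

Writing each sentence with `toJsonLine` and a trailing newline should give a file with one sentence per line.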

Evaluating BMMM

To evaluate the output of the Inducer use:

java -cp target/bmmm-2.0.11.jar:target/dependency/* tagInducer.Evaluator

The input can be a CoNLL-style file in which the clusters are contained in column 5 (4th 0-index-based column). The same file needs to contain either fine-grained tags (3rd 0-index column), UPOS (5th column) or CCG categories (6th column).
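To check that a file has the columns in the expected places before running the Evaluator, a small sketch like the following can pull out the cluster and gold-tag pairs. `ConllColumns` is a hypothetical helper, not part of BMMM. It assumes tab-separated columns, blank lines between sentences, clusters at 0-based index 4 and fine-grained tags at 0-based index 3. The sample lines are invented for illustration, with `_` as a placeholder in the other columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of BMMM): reads cluster/gold pairs from CoNLL-style lines. */
public class ConllColumns {
    public static final int CLUSTER = 4;   // 0-based index of the induced cluster
    public static final int FINE_TAG = 3;  // 0-based index of the fine-grained tag

    /** Returns {cluster, gold} for each token line; blank lines are skipped. */
    public static List<String[]> pairs(List<String> lines, int goldColumn) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;   // sentence boundary
            String[] cols = line.split("\t");      // assumes tab-separated columns
            if (cols.length <= Math.max(CLUSTER, goldColumn)) {
                throw new IllegalArgumentException("Too few columns: " + line);
            }
            out.add(new String[] {cols[CLUSTER], cols[goldColumn]});
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample lines with 9 columns each.
        List<String> lines = Arrays.asList(
            "1\tmore\t_\tqn\t48\tDET\t_\t_\t_",
            "2\tjuice\t_\tn\t48\tNOUN\t_\t_\t_",
            "");
        for (String[] p : pairs(lines, FINE_TAG)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

Passing a different `goldColumn` selects another annotation layer, provided the file actually stores it at that index.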
