BLLIP Reranking Parser

Copyright Mark Johnson, Eugene Charniak, 24th November 2005 --- August 2006

We request acknowledgement in any publications that make use of this software and any code derived from this software. Please report the release date of the software that you are using, as this will enable others to compare their results to yours.

Overview

BLLIP Parser is a statistical natural language parser that includes a generative constituent parser (first-stage) and a discriminative maximum entropy reranker (second-stage). The latest version can be found on GitHub. This document describes how to build and run the reranking parser from the command line. There are also Python and Java interfaces; the Python interface is described in README-python.rst.
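
To make the two-stage idea concrete, here is a toy sketch of reranking in Python. It is not the parser's implementation: the candidate parses, the feature names (`first_stage_logprob`, `right_branching`), and the weights are all hypothetical, chosen only to show how a log-linear second stage can overrule the first stage's preferred parse.

```python
# Toy illustration of two-stage parsing; NOT the BLLIP implementation.
# All feature names, weights, and candidate parses here are hypothetical.

def rerank(nbest, weights):
    """Return the candidate with the highest log-linear score.

    nbest   -- list of (tree_string, feature_dict) pairs from a first stage
    weights -- dict mapping feature name to a learned weight
    """
    def score(features):
        # A maximum entropy (log-linear) model scores a parse as w . f(parse).
        # Normalisation is unnecessary when only the best parse is needed.
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())

    return max(nbest, key=lambda candidate: score(candidate[1]))


# Two candidates, as an n-best first stage might propose them.
nbest = [
    ("(S (NP I) (VP saw (NP her duck)))",
     {"first_stage_logprob": -10.2, "right_branching": 3.0}),
    ("(S (NP I) (VP saw (S (NP her) (VP duck))))",
     {"first_stage_logprob": -10.5, "right_branching": 4.0}),
]
weights = {"first_stage_logprob": 1.0, "right_branching": 0.5}

best_tree, _ = rerank(nbest, weights)
print(best_tree)
```

With these made-up numbers the first stage prefers the first candidate (higher log probability), but the reranker selects the second because the extra feature outweighs the difference.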

Compiling the parser

(optional) For optimal speed, you may want to define $GCCFLAGS specifically for your machine. However, this step can be safely skipped as the defaults are usually fine. With csh or tcsh, try something like:
```
shell> setenv GCCFLAGS "-march=pentium4 -mfpmath=sse -msse2 -mmmx"
```
or:
```
shell> setenv GCCFLAGS "-march=opteron -m64"
```
Build the parser with:
```
shell> make
```
- Sidenote on compiling on OS X
  
  OS X uses the clang compiler by default, which cannot currently compile the parser. Before building, try setting this environment variable to change the default C++ compiler:
```
shell> setenv CXX g++
```
  Recent versions of OS X may have additional issues. See issues 60, 19, and 13 for more information.

Obtaining parser models

The GitHub repository includes parsing and reranker models, though these are kept mostly for historical purposes. See the page on BLLIP Parser models (MODELS.rst) for information about obtaining newer and more accurate parsing models.

Running the parser

After it has been built, the parser can be run with:

shell> parse.sh <sourcefile.txt>

For example:

shell> parse.sh sample-text/sample-data.txt

The input text must already be segmented into sentences, with each sentence wrapped in an <s> tag:

<s> Sentence 1 </s>
<s> Sentence 2 </s>
...

Note that there must be a space between the sentence and each tag, that is, after <s> and before </s>.
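
If your text is already split into sentences, a small script can produce this format. The following is a minimal sketch; the function name `to_parser_input` is hypothetical and not part of the parser distribution, and sentence segmentation itself is assumed to have been done beforehand.

```python
def to_parser_input(sentences):
    """Wrap each already-segmented sentence in <s> ... </s>, one per line."""
    lines = []
    for sentence in sentences:
        # Collapse internal whitespace and newlines so each sentence
        # stays on a single line.
        text = " ".join(sentence.split())
        if text:  # skip empty sentences
            # Note the space after <s> and before </s>.
            lines.append("<s> " + text + " </s>")
    return "".join(line + "\n" for line in lines)


print(to_parser_input(["This is a sentence.", "  Here is\nanother one. "]), end="")
```

The output can be written to a file and passed to parse.sh as the source file.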

The parser distribution currently includes a basic Penn Treebank Wall Street Journal parsing model, which parse.sh uses by default. The Python interface to the parser includes a mechanism for listing and downloading additional parsing models (some of which are more accurate, depending on what you're parsing).

The script parse-and-fuse.sh demonstrates how to run syntactic parse fusion. Fusion can also be run via the Python bindings.

The script parse-eval.sh takes a list of treebank files as arguments, extracts the terminal strings from them, runs the two-stage parser on those terminal strings, and then evaluates the parsing accuracy with Sparseval. For example, if the Penn Treebank 3 is installed at /usr/local/data/Penn3/, the following command evaluates the two-stage parser on section 24:

shell> parse-eval.sh /usr/local/data/Penn3/parsed/mrg/wsj/24/wsj*.mrg
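
To illustrate what "extracting the terminal strings" means, here is a rough Python sketch that pulls the words out of one bracketed treebank tree. It is not how parse-eval.sh does it internally; the regular expression, the function name `terminals`, and the choice to skip `-NONE-` empty elements are assumptions made for this example.

```python
import re

# Matches a preterminal such as "(NN dog)": a tag followed by a single word.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def terminals(tree):
    """Return the words of a bracketed tree, skipping -NONE- empty elements."""
    return [word for tag, word in PRETERMINAL.findall(tree) if tag != "-NONE-"]


tree = "(S (NP-SBJ (DT The) (NN dog)) (VP (VBD barked) (NP (-NONE- *T*-1))) (. .))"

# Emit the words in the <s> ... </s> input format described above.
print("<s> " + " ".join(terminals(tree)) + " </s>")
```

For the sample tree this prints `<s> The dog barked . </s>`, which is the form the parser expects as input.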

The Makefile will attempt to automatically download and build Sparseval for you if you run make sparseval.

For more information on Sparseval see this paper:

@inproceedings{roark2006sparseval,
    title={SParseval: Evaluation metrics for parsing speech},
    author={Roark, Brian and Harper, Mary and Charniak, Eugene and 
            Dorr, Bonnie and Johnson, Mark and Kahn, Jeremy G and 
            Liu, Yang and Ostendorf, Mari and Hale, John and
            Krasnyanskaya, Anna and others},
    booktitle={Proceedings of LREC},
    year={2006}
}

We no longer distribute evalb with the parser since it sometimes skips sentences unnecessarily. Sparseval does not have these issues.
