

BLLIP Reranking Parser

Copyright Mark Johnson, Eugene Charniak, 24th November 2005 --- August 2006

We request acknowledgement in any publications that make use of this software and any code derived from this software. Please report the release date of the software that you are using, as this will enable others to compare their results to yours.


BLLIP Parser is a statistical natural language parser including a generative constituent parser (first-stage) and discriminative maximum entropy reranker (second-stage). The latest version can be found on GitHub. This document describes how to build and run the reranking parser from the command line. There are also Python and Java interfaces; the Python interface is described in README-python.rst.

Compiling the parser

  1. (optional) For optimal speed, you may want to define $GCCFLAGS specifically for your machine. This step can be safely skipped, as the defaults are usually fine. With csh or tcsh, try something like:

    shell> setenv GCCFLAGS "-march=pentium4 -mfpmath=sse -msse2 -mmmx"

    or:

    shell> setenv GCCFLAGS "-march=opteron -m64"
  2. Build the parser with:

    shell> make
    • Sidenote on compiling on OS X

      OS X uses the clang compiler by default, which cannot currently compile the parser. To change the default C++ compiler, try setting this environment variable before building:

      shell> setenv CXX g++

      Recent versions of OS X may have additional issues. See issues 19 and 13 for more information.

Obtaining parser models

The GitHub repository includes parsing and reranker models, though these are kept mostly for historical purposes. See this page on BLLIP Parser models for information about obtaining newer and more accurate parsing models.

Running the parser

After it has been built, the parser can be run with:

shell> <sourcefile.txt>

For example:

shell> sample-text/sample-data.txt

The input text must already be segmented into sentences, with each sentence enclosed in an <s> tag:

<s> Sentence 1 </s>
<s> Sentence 2 </s>

Note that there must be a space between each tag and the sentence text.
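
If your text is already split into sentences, producing this format is mechanical. The following is a minimal sketch in plain Python; the function name `wrap_sentences` is hypothetical and is not part of the parser distribution or its Python interface. It assumes sentence segmentation has been done elsewhere and only adds the tags and the required spaces.

```python
def wrap_sentences(sentences):
    """Return parser input text: one '<s> ... </s>' line per sentence.

    Hypothetical helper, not part of BLLIP Parser. The input is assumed
    to be an iterable of strings that are already sentence-segmented.
    """
    lines = []
    for sentence in sentences:
        # Collapse runs of whitespace (including newlines) to single spaces.
        text = " ".join(sentence.split())
        if text:  # skip empty or whitespace-only entries
            # A space separates each tag from the sentence text.
            lines.append("<s> " + text + " </s>")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(wrap_sentences(["Sentence 1", "Sentence 2"]), end="")
```

The output can be written to a file and passed to the parser as the source file.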

The parser distribution currently includes a basic Penn Treebank Wall Street Journal parsing model, which is used by default. The Python interface to the parser includes a mechanism for listing and downloading additional parsing models (some of which are more accurate, depending on what you're parsing).

The script demonstrates how to run syntactic parse fusion. Fusion can also be run via the Python bindings.

The script takes a list of treebank files as arguments and extracts the terminal strings from them, runs the two-stage parser on those terminal strings and then evaluates the parsing accuracy with Sparseval. For example, if the Penn Treebank 3 is installed at /usr/local/data/Penn3/, the following code evaluates the two-stage parser on section 24:

shell> /usr/local/data/Penn3/parsed/mrg/wsj/24/wsj*.mrg
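
To illustrate what "extracting the terminal strings" means, here is a rough Python sketch that pulls the words out of one bracketed Penn Treebank tree. It is not the evaluation script's actual implementation; the function name `terminals` is hypothetical, and the sketch assumes standard bracketing in which each word sits directly under its part-of-speech tag and empty elements are tagged `-NONE-`.

```python
import re

# A token is an opening paren, a closing paren, or a run of other
# non-whitespace characters (a label or a word).
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def terminals(tree):
    """Return the words of a bracketed tree, skipping -NONE- elements.

    Hypothetical helper for illustration only.
    """
    words = []
    previous = None  # kind of the previous token: "(", ")", "label", "word"
    tag = None       # most recently seen node label
    for token in TOKEN.findall(tree):
        if token in ("(", ")"):
            previous = token
        elif previous == "(":
            # The first symbol after "(" is a node label or POS tag.
            tag = token
            previous = "label"
        else:
            # Any other symbol is a terminal under the current tag.
            if tag != "-NONE-":
                words.append(token)
            previous = "word"
    return words


if __name__ == "__main__":
    example = ("( (S (NP-SBJ (DT The) (NN parser)) "
               "(VP (VBZ works) (NP (-NONE- *T*-1))) (. .)) )")
    print(" ".join(terminals(example)))
```

Joining the returned words with spaces gives the sentence text, which could then be wrapped in `<s>` tags as described earlier before being handed to the parser.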

The Makefile will attempt to automatically download and build Sparseval for you if you run make sparseval.

For more information on Sparseval see this paper:

    title={SParseval: Evaluation metrics for parsing speech},
    author={Roark, Brian and Harper, Mary and Charniak, Eugene and
            Dorr, Bonnie and Johnson, Mark and Kahn, Jeremy G and
            Liu, Yang and Ostendorf, Mari and Hale, John and
            Krasnyanskaya, Anna and others},
    booktitle={Proceedings of LREC},

We no longer distribute evalb with the parser since it sometimes skips sentences unnecessarily. Sparseval does not have these issues.

More questions?

There is more information about different components of the parser spread across README files in this distribution (see below). BLLIP Parser is maintained by David McClosky.

Parser details

For details on running the parser, see first-stage/README.rst. For help retraining the parser, see first-stage/TRAIN/README.rst (which also includes some information about the parser model file formats).

Reranker details

See second-stage/README for an overview. second-stage/README-retrain.rst details how to retrain the reranker. The second-stage/programs/*/README files include additional notes about different reranker components.

Other versions of the parser

We haven't tested all of these and can't support them, but they may be useful if you're working on other platforms or languages.


Parser and reranker:


Syntactic fusion: