GitHub - jpate/ShakesEM: A library for Expectation Maximization with Probabilistic Context Free Grammars using Actors for parallelization. Also implements Pereira and Schabes (1992) modification for partially bracketed corpora.

homepages.inf.ed.ac.uk/s0930006/

Author: John K Pate
Release date: Jan 25 2010
E-mail: j.k.pate@sms.ed.ac.uk

    This program is free software: you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the Free
    Software Foundation, either version 3 of the License, or (at your option)
    any later version.

    This program is distributed in the hope that it will be useful, but WITHOUT
    ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
    more details.

    You should have received a copy of the GNU General Public License along with
    this program.  If not, see <http://www.gnu.org/licenses/>.

This is the first release of the ShakesEM library, which performs
Expectation-Maximization for Probabilistic Context Free Grammars. The library
can be compiled with a single command:

$ scalac ShakesEM.scala

The name of the library, ShakesEM, is a reference to William Shakespeare due to
the library's use of Scala Actors for distributed processing.

The ``example'' directory shows a basic use of the library. It contains an
example grammar file, an example lexicon, a corpus of 10 (mostly nonsense)
sentences, and a directory that stores resulting grammars. The rest of the files
were generated with:

$ scala shakesEMExample toyGrammar.txt toyLexicon.txt testSentences.txt 2 \
  0.001 exampleOutput/exampleRun &> exampleRun.log

The number following ``testSentences.txt'' in the above example corresponds to
the number of parsers that are started. You can start as many parsers as you
like, up to (and including) the number of sentences in your corpus. If you start
fewer parsers than you have processor cores, you will use as many cores as you
have parsers. If you start more parsers than you have processor cores, you will
use all your cores and the parsers will share computing resources transparently.
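One way to choose the parser count is sketched below. This is only an
illustration, not part of ShakesEM: the function name ``pick_parsers'' is
hypothetical, and starting one parser per core is an assumption. The only hard
limit stated above is the number of sentences in the corpus.

```shell
# Hypothetical helper, not part of ShakesEM.
# Usage: pick_parsers <cores> <sentences>
# Prints the smaller of the two numbers, so the parser count never
# exceeds the number of sentences in the corpus.
pick_parsers() {
  cores=$1
  sentences=$2
  if [ "$cores" -lt "$sentences" ]; then
    echo "$cores"
  else
    echo "$sentences"
  fi
}
```

For a machine with 4 cores and the 10-sentence example corpus,
``pick_parsers 4 10'' prints 4, which could then be given as the argument
following ``testSentences.txt''.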

Note that both scalac and scala accept the '-d' flag: scalac uses it to decide
where to place JVM bytecode, and scala uses it to decide where to search for it.
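A minimal sketch of compiling into a separate bytecode directory follows. The
directory name ``classes'' is a hypothetical choice, and the guard around
scalac is only there so the snippet does nothing harmful when the compiler or
the source file is missing.

```shell
# Assumption: "classes" is an arbitrary name for the bytecode directory.
OUT_DIR=classes

# Create the directory up front; scalac generally expects the target
# of -d to exist already.
mkdir -p "$OUT_DIR"

# Compile only if scalac is on the PATH and the source file is present.
if command -v scalac >/dev/null 2>&1 && [ -f ShakesEM.scala ]; then
  scalac -d "$OUT_DIR" ShakesEM.scala
fi
```

The same directory is the one scala then needs to search when running the
compiled classes.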

The ``scaladoc'' directory contains documentation generated by scaladoc (similar
to javadoc).