language-model-server

Inspired by Madnani (2009), this project aims to provide a queryable server for an English language model of variable n-gram order.
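SRILM writes its models in the ARPA text format. Each n-gram line holds a log10 probability, the n-gram itself, and an optional log10 backoff weight. A query for a word given a history of any length is answered by backing off to shorter histories until a stored n-gram is found. The sketch below shows that lookup on a tiny hand-written model. The helper names (parse_arpa, log10_prob) and the toy probabilities are hypothetical and do not come from this project, which loads its n-grams into a database rather than reading the .lm file at query time.

```python
from typing import Dict, Tuple

NGram = Tuple[str, ...]

# Hypothetical toy model in ARPA format (not real data from this project).
TOY_ARPA = """
\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-1.0 the -0.3
-1.2 cat -0.2
-1.5 dog

\\2-grams:
-0.4 the cat
-0.6 the dog

\\end\\
"""

def parse_arpa(text: str) -> Tuple[Dict[NGram, float], Dict[NGram, float]]:
    """Parse ARPA text into log10 probability and log10 backoff tables (hypothetical helper)."""
    logprob: Dict[NGram, float] = {}
    backoff: Dict[NGram, float] = {}
    order = 0  # 0 while outside an "\N-grams:" section
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section markers: "\data\", "\1-grams:", ..., "\end\"
            order = int(line[1:line.index("-")]) if line.endswith("-grams:") else 0
            continue
        if order == 0:
            continue  # skip the "ngram N=count" header lines
        fields = line.split()
        words = tuple(fields[1:1 + order])
        logprob[words] = float(fields[0])
        if len(fields) > 1 + order:
            backoff[words] = float(fields[1 + order])
    return logprob, backoff

def log10_prob(word: str, history: NGram,
               logprob: Dict[NGram, float], backoff: Dict[NGram, float]) -> float:
    """Return log10 P(word | history), backing off ARPA-style to shorter histories."""
    ngram = history + (word,)
    if ngram in logprob:
        return logprob[ngram]
    if not history:
        # This sketch does no <unk> mapping; unknown words simply fail.
        raise KeyError(f"unknown word: {word!r}")
    # Add the history's backoff weight (0.0 if none is stored), drop the oldest word, retry.
    return backoff.get(history, 0.0) + log10_prob(word, history[1:], logprob, backoff)

logprob, backoff = parse_arpa(TOY_ARPA)
print(log10_prob("cat", ("the",), logprob, backoff))            # -0.4 (bigram stored)
print(round(log10_prob("dog", ("cat",), logprob, backoff), 6))  # -1.7 (bow(cat) + P(dog))
print(log10_prob("cat", ("dog", "the"), logprob, backoff))      # -0.4 (trigram backs off to bigram)
```

Because the history can have any length, the same function serves every n-gram order stored in the model; higher orders simply fall through to shorter histories when they are not present.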

Requirements

SRILM toolkit
nltk

Installation

The most complicated part of the installation is compiling SRILM. Once the toolkit is built and its binaries are on your $PATH, run these commands:

1. pip install --user -r requirements.txt to install the Python dependencies
2. ./bootstrap.sh to build the language model and load the database
3. python manage.py runserver [::]:8000 to run the API server for n-gram queries
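Since bootstrap.sh depends on SRILM being on $PATH, a quick pre-flight check can save a failed run. This is a minimal sketch using the standard library's shutil.which. ngram-count and ngram are SRILM's standard programs for building and evaluating models, but which ones bootstrap.sh actually invokes is an assumption here, and missing_tools is a hypothetical helper name.

```python
import shutil
from typing import Iterable, List

# SRILM programs assumed to be needed (an assumption; check bootstrap.sh).
SRILM_TOOLS = ("ngram-count", "ngram")

def missing_tools(tools: Iterable[str] = SRILM_TOOLS) -> List[str]:
    """Return the names of tools that shutil.which cannot find on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling missing_tools() returns an empty list when every listed program is found; otherwise it names the ones to fix before running ./bootstrap.sh.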

Step 2 above can take a few minutes, depending on hardware. A sample run, using a batch size of 1000 n-grams per database commit:

$ time ./bootstrap.sh

Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)
Generating countfile from corpus 8/8 ('combined')...            
Building language model (/home/conor/gits/language-model-server/corpus/nltk-combined.lm)... done.
Number of 1-grams committed to database: 223000
Number of 2-grams committed to database: 1795000
Number of 3-grams committed to database: 445000
Number of 4-grams committed to database: 287000
Number of 5-grams committed to database: 176000
Finished loading database.

real    2m36.424s
user    2m31.558s
sys     0m3.919s
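Committing in batches rather than once per row keeps the number of transactions small, which matters when loading millions of n-grams. The sketch below shows that pattern with the standard-library sqlite3 module and a batch size of 1000. The ngram table schema, the row shape, and the batches and load_ngrams names are hypothetical; the project itself appears to use Django (manage.py and the syncdb-style output above), so its real loader likely goes through Django's database layer instead.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical row shape: (n-gram text, order, log10 prob, log10 backoff or None)
Row = Tuple[str, int, float, Optional[float]]

def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def load_ngrams(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 1000) -> int:
    """Insert rows, committing once per batch; return the number of rows committed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ngram ("
        " text TEXT PRIMARY KEY, n INTEGER, logprob REAL, backoff REAL)"
    )
    total = 0
    for chunk in batches(rows, batch_size):
        conn.executemany("INSERT INTO ngram VALUES (?, ?, ?, ?)", chunk)
        conn.commit()  # one transaction per batch rather than per row
        total += len(chunk)
    return total

conn = sqlite3.connect(":memory:")
rows = ((f"w{i}", 1, -2.0, None) for i in range(2500))
print(load_ngrams(conn, rows))  # 2500 (committed as batches of 1000, 1000, 500)
```

Larger batches mean fewer commits but more work lost if a batch fails; 1000 is simply the value used in the run above.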

TODO

write tests (damn it)
expand API with parameters for querying
investigate open-source LMs
- KenLM for ngramModel?
investigate a factored language model setup
- SRILM has built-in support for this
upgrade SRILM to the newest version
consider breaking out the custom LM as a Python library

conorsch/language-model-server