cltk/lapos

About

This is an unofficial fork of the Lapos tagger, based on version 0.1.2. Official source available here.

The goal of this fork is to add Unicode support for use in the Classical Language Toolkit. Once Unicode support is working, the CLTK hopes that these changes will be merged upstream.

Build

There are two branches: master for Linux and apple for Mac OS (some changes were made for Clang; see below).

Use

For full instructions, see README. The CLTK's Latin model (based on Perseus treebanks) was made with the following command:

$ ./lapos-learn -m ./model latin_training_set.pos

Note: You can get this training set with curl -O https://raw.githubusercontent.com/cltk/latin_treebank_perseus/master/latin_training_set.pos.

To run the tagger, use echo to pass one sentence at a time:

$ echo "He opened the window." | ./lapos -t -m ./model_wsj02-21
He/PRP opened/VBD the/DT window/NN ./.
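
To tag more than one sentence, the echo pattern above can be wrapped in a loop. The following is a minimal sketch, not part of Lapos itself: the function names tag_file and split_tags and the variables LAPOS_BIN and MODEL_DIR are hypothetical, and it assumes the input file holds one sentence per line. It only uses the -t and -m options shown above.

```shell
#!/bin/sh
# Hypothetical helper: tag every non-empty line of a file, one sentence per call.
# LAPOS_BIN and MODEL_DIR are assumed names; point them at your build and model.
tag_file() {
    input_file=$1
    lapos_bin=${LAPOS_BIN:-./lapos}
    model_dir=${MODEL_DIR:-./model}
    while IFS= read -r sentence || [ -n "$sentence" ]; do
        # Skip blank lines so the tagger never receives an empty sentence.
        [ -z "$sentence" ] && continue
        printf '%s\n' "$sentence" | "$lapos_bin" -t -m "$model_dir"
    done < "$input_file"
}

# Hypothetical helper: turn "word/TAG word/TAG ..." into one "word<TAB>TAG" per line.
# The split is made at the last slash, so a token such as "./." is handled.
split_tags() {
    tr ' ' '\n' | awk '{
        i = match($0, "/[^/]*$")
        if (i > 0) print substr($0, 1, i - 1) "\t" substr($0, i + 1)
    }'
}
```

For example, tag_file sentences.txt | split_tags would print one token and its tag per line, where sentences.txt is a placeholder file name.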

Changes

To compile with Clang, a few changes need to be made, namely removing tr1 from, e.g., <tr1/unordered_map> and std::tr1::unordered_map.
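
As an illustration of the kind of substitution involved, the sketch below rewrites TR1 references read from standard input. It is an assumption about how such a change could be automated, not a record of how this fork was edited; the function name strip_tr1 is hypothetical, and it presumes the standard (non-TR1) headers are available to your compiler.

```shell
#!/bin/sh
# Hypothetical filter: drop "tr1" from include paths and from the std::tr1 namespace.
# Reads source text on stdin and writes the rewritten text to stdout.
strip_tr1() {
    sed -e 's|<tr1/\([a-z_]*\)>|<\1>|g' \
        -e 's|std::tr1::|std::|g'
}
```

Running strip_tr1 < crf.h and reviewing the output before replacing any file keeps the change easy to inspect.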

We also increased the maximum number of tags from 50 to 2000 (in crf.h, by commenting out enum { MAX_LABEL_TYPES = 50 }; and uncommenting const static int MAX_LABEL_TYPES = 2000;). We also removed the unnecessary empty-input-line warning in crf.cpp ("warning: empty sentence").
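
To check whether a training set fits under the tag limit, the distinct tags can be counted before training. This is a rough sketch under an assumption: that the training file uses the same word/TAG token format as the tagger output shown above. The function name count_tags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the number of distinct tags in a word/TAG file.
# Assumes whitespace-separated tokens, with the tag after the last slash.
count_tags() {
    tr ' \t' '\n\n' < "$1" | awk '
        {
            i = match($0, "/[^/]*$")
            if (i > 0) tags[substr($0, i + 1)] = 1
        }
        END {
            n = 0
            for (t in tags) n++
            print n
        }'
}
```

For example, count_tags latin_training_set.pos prints a single number that can be compared against MAX_LABEL_TYPES.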

License

Lapos was created by Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Kazama. For all technical details, see README; for the license, see LICENSE.
