A fast and accurate POS and morphological tagging toolkit
Latest commit 65bd199 (Nov 4, 2017) by datquocnguyen: Add missing symbols into .DICT files in UniPOS models

RDRPOSTagger

RDRPOSTagger is a robust, easy-to-use and language-independent toolkit for POS and morphological tagging. It employs an error-driven approach to automatically construct tagging rules in the form of a binary tree. The main properties of RDRPOSTagger are as follows:

  • RDRPOSTagger tags very quickly and achieves accuracy competitive with state-of-the-art results. See our AI Communications article for experimental results, including tagging speed and accuracy, on 13 languages.

  • RDRPOSTagger supports pre-trained models for fine-grained POS and morphological tagging for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese.

  • RDRPOSTagger also supports pre-trained universal POS tagging models for 40+ languages. These models are learned from the training data of Universal Dependencies (UD) v2.0. See the universal POS tagging accuracies on the UD v2.0 test sets here.
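
To make the "tagging rules in the form of a binary tree" idea concrete, here is a minimal sketch of how such a tree could be evaluated. It is an illustration only, not RDRPOSTagger's actual code: the names `Context`, `RuleNode` and `classify`, the toy rules, and the tag labels are all hypothetical, and the error-driven learning step that builds the tree is not shown. Each node holds a condition and a tag. If the condition holds, its tag becomes the current answer and the "exception" child is tried; otherwise the "alternative" child is tried. The tag of the last node whose condition held wins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Context:
    """Hypothetical tagging context: a word, its neighbours, and an initial tag."""
    prev_word: str
    word: str
    next_word: str
    initial_tag: str


@dataclass
class RuleNode:
    """One rule in the binary tree (illustrative structure, not the toolkit's)."""
    condition: Callable[[Context], bool]
    tag: str
    exception: Optional["RuleNode"] = None    # tried when the condition holds
    alternative: Optional["RuleNode"] = None  # tried when the condition fails


def classify(root: Optional[RuleNode], ctx: Context) -> str:
    """Walk the tree and return the tag of the last rule that fired.

    Falls back to the initial tag if no rule fires.
    """
    tag = ctx.initial_tag
    node = root
    while node is not None:
        if node.condition(ctx):
            tag = node.tag
            node = node.exception
        else:
            node = node.alternative
    return tag


# Toy tree with made-up rules: refine an initial verb/noun guess using the previous word.
tree = RuleNode(
    condition=lambda c: c.initial_tag == "VB",
    tag="VB",
    exception=RuleNode(
        condition=lambda c: c.prev_word in {"the", "a"},
        tag="NN",
    ),
    alternative=RuleNode(
        condition=lambda c: c.initial_tag == "NN",
        tag="NN",
        exception=RuleNode(
            condition=lambda c: c.prev_word == "to",
            tag="VB",
        ),
    ),
)

print(classify(tree, Context("the", "run", "was", "VB")))
print(classify(tree, Context("to", "run", "fast", "NN")))
```

In this sketch a new rule is only ever attached as an exception or alternative to an existing node, so adding a correction for one context does not change the tree's behaviour in contexts that the earlier rules already handle.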

The general architecture and experimental results of RDRPOSTagger can be found in our following papers:

Please cite either the EACL or the AICom paper whenever RDRPOSTagger is used to produce published results or incorporated into other software.

Current release v1.2.4 is available to download (11MB .zip file including all pre-trained models) at: https://github.com/datquocnguyen/RDRPOSTagger/archive/master.zip

Find more information about RDRPOSTagger at its website: http://rdrpostagger.sourceforge.net/

You might also find my new toolkit jPTDP interesting: jPTDP - A Novel Neural Network Model for Joint POS Tagging and Dependency Parsing. Like RDRPOSTagger, jPTDP provides pre-trained models for 40+ languages from UD v2.0.