A Python-based Hidden Markov Model part-of-speech tagger for Catalan that adds tags to a tokenized corpus.

amjha/HMM-POS-Tagger

HMM-POS-Tagger

The corpus has been adapted from the Catalan portion of WikiCorpus v. 1.0, as follows:

  • The corpus contains only a selection (< 1.2M words) from the original set.
  • The corpus contains only tokens and parts of speech, not lemmas and word senses.
  • The part-of-speech tags have been simplified from the original, resulting in 29 tags.
  • The format has been changed to the word/TAG format, with each sentence on a separate line.
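
As an illustration of the word/TAG format described above, here is a minimal sketch of how one sentence line could be read into (word, tag) pairs. It is not taken from this repository's code: the function name `parse_line`, the sample sentence, and the tag names in it are assumptions made for the example (the actual 29 tags are not listed here), and skipping tokens without a slash is an assumed policy.

```python
from collections import Counter


def parse_line(line):
    """Split one sentence line in word/TAG format into (word, tag) pairs.

    Hypothetical helper for illustration; not part of the repository.
    """
    pairs = []
    for token in line.split():
        # Split on the LAST slash so a word that itself contains "/"
        # (for example "1/2") keeps its slash.
        word, sep, tag = token.rpartition("/")
        if not sep:
            # Assumption: a token with no slash is malformed and is skipped.
            continue
        pairs.append((word, tag))
    return pairs


# Sample line with made-up tags; the real tagset may differ.
sample = "El/DET gat/NOUN dorm/VERB ./PUNCT"
sentence = parse_line(sample)

# Tag frequencies for this one sentence, the kind of count an HMM
# tagger would accumulate over the whole corpus.
tag_counts = Counter(tag for _, tag in sentence)
```

Since each sentence sits on its own line, applying `parse_line` to every line of the corpus file would yield one list of pairs per sentence.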

The corpus is licensed under the same terms as the original, that is, the GNU Free Documentation License (FDL; http://www.fsf.org/licensing/licenses/fdl.html). That means that you are allowed to use and redistribute the texts, provided the derived works keep the same license.
