airenas/lt-pos-tagger

LT Part of Speech tagging service

Lithuanian Part of Speech Tagger: an easy-to-use wrapper for lex and morph. The repository implements a service wrapper for the semantikadocker.vdu.lt/v2/morph and semantikadocker.vdu.lt/lex services. Both services have fairly complex APIs. This service makes the POS tag output simple to use and understand.

It also fixes some issues with lex segmentation.

Deploy

A sample deployment using Docker is provided in example/docker-compose.yml. On Linux, start the service locally with:

   cd example 
   make up

That's it. You can start using the service:

   curl -X POST http://localhost:8092/tag -d 'Mama su kasa kasa smėlį.'

The expected output is a list of tagged tokens:

[
  {
    "type": "WORD",
    "string": "Mama",
    "mi": "Ncfsnn-",
    "lemma": "mama"
  },
  {
    "type": "SPACE",
    "string": " "
  },
  {
    "type": "WORD",
    "string": "su",
    "mi": "Sgi",
    "lemma": "su"
  },
  {
    "type": "SPACE",
    "string": " "
  },
  {
    "type": "WORD",
    "string": "kasa",
    "mi": "Ncfsin-",
    "lemma": "kasa"
  },
  {
    "type": "SPACE",
    "string": " "
  },
  {
    "type": "WORD",
    "string": "kasa",
    "mi": "Vgmp3---n--ni-",
    "lemma": "kasti"
  },
...
]

The values of the mi property are described at http://corpus.vdu.lt/en/morph. The type field takes one of the following values: SPACE, SEPARATOR, SENTENCE_END, NUMBER, WORD.
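
The same request can be made from code. Below is a minimal Python sketch, not part of this repository, that posts text to the /tag endpoint and keeps only the WORD tokens. It assumes the service is running at http://localhost:8092 as in the example above and that the response has the shape shown in the sample output. The names tag_text and words are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request

# Assumption: the service from example/docker-compose.yml listens here.
TAG_URL = "http://localhost:8092/tag"


def tag_text(text, url=TAG_URL):
    """POST raw text to the tagger and return the decoded JSON token list."""
    request = urllib.request.Request(url, data=text.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def words(tokens):
    """Keep only WORD tokens as (surface form, lemma, mi tag) tuples."""
    return [
        (t["string"], t.get("lemma"), t.get("mi"))
        for t in tokens
        if t.get("type") == "WORD"
    ]
```

With the service running, words(tag_text('Mama su kasa kasa smėlį.')) would return one tuple per word and skip the SPACE and other non-word tokens. The fields are read with .get() because only WORD tokens in the sample output carry mi and lemma.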


Author

Airenas Vaičiūnas


License

Copyright © 2021, Airenas Vaičiūnas. Released under the 3-Clause BSD License.

