Lithuanian Part of Speech Tagger - an easy-to-use wrapper for lex and morph
This repository implements a service wrapper for the semantikadocker.vdu.lt/v2/morph and semantikadocker.vdu.lt/lex services. Both services have fairly complex APIs; this wrapper makes the POS tag output simple to use and understand.
It also fixes some issues with lex segmentation.
A sample deployment is provided with Docker: example/docker-compose.yml. On Linux, start the service locally with:
cd example
make up
That's it. You can start using the service:
curl -X POST http://localhost:8092/tag -d 'Mama su kasa kasa smėlį.'
The expected output is a list of tagged tokens:
[
{
"type": "WORD",
"string": "Mama",
"mi": "Ncfsnn-",
"lemma": "mama"
},
{
"type": "SPACE",
"string": " "
},
{
"type": "WORD",
"string": "su",
"mi": "Sgi",
"lemma": "su"
},
{
"type": "SPACE",
"string": " "
},
{
"type": "WORD",
"string": "kasa",
"mi": "Ncfsin-",
"lemma": "kasa"
},
{
"type": "SPACE",
"string": " "
},
{
"type": "WORD",
"string": "kasa",
"mi": "Vgmp3---n--ni-",
"lemma": "kasti"
},
...
]
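The same call can be made from code. The following is a minimal Python sketch, not part of the repository: it assumes the service runs at the address used in the curl example and accepts the raw text as the request body. The names TAG_URL, build_request, and tag are hypothetical helpers introduced only for illustration.

```python
import json
import urllib.request

# Assumed default: the address used by the curl example in this README.
TAG_URL = "http://localhost:8092/tag"


def build_request(text, url=TAG_URL):
    """Build a POST request carrying the raw text as UTF-8 bytes."""
    # Like `curl -d`, urllib sends Content-Type
    # application/x-www-form-urlencoded when none is set explicitly.
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def tag(text, url=TAG_URL, timeout=10.0):
    """Send text to the tagger and return the decoded list of token dicts."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running locally, tag("Mama su kasa kasa smėlį.") should return the same token list as the curl call. Whether the service expects a particular Content-Type header is not stated here, so the sketch simply mirrors the curl default.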
Information about the values of the mi property can be found at http://corpus.vdu.lt/en/morph. The possible values of the type field are SPACE, SEPARATOR, SENTENCE_END, NUMBER, and WORD.
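Because every token carries a type, a client can filter the response by that field. The sketch below is illustrative only: words and surface_text are hypothetical helper names, and the sample data is copied from the first tokens of the example output. That concatenating the string fields reproduces the input text is an assumption suggested by the SPACE tokens, not something the service documents.

```python
def words(tokens):
    """Return (string, lemma, mi) for every WORD token, in order."""
    return [
        (t["string"], t.get("lemma"), t.get("mi"))
        for t in tokens
        if t.get("type") == "WORD"
    ]


def surface_text(tokens):
    """Join the string fields of all tokens (assumed to rebuild the input)."""
    return "".join(t.get("string", "") for t in tokens)


# First tokens of the example response shown in this README.
sample = [
    {"type": "WORD", "string": "Mama", "mi": "Ncfsnn-", "lemma": "mama"},
    {"type": "SPACE", "string": " "},
    {"type": "WORD", "string": "su", "mi": "Sgi", "lemma": "su"},
    {"type": "SPACE", "string": " "},
    {"type": "WORD", "string": "kasa", "mi": "Ncfsin-", "lemma": "kasa"},
    {"type": "SPACE", "string": " "},
    {"type": "WORD", "string": "kasa", "mi": "Vgmp3---n--ni-", "lemma": "kasti"},
]

for string, lemma, mi in words(sample):
    print(f"{string}\t{lemma}\t{mi}")
```

The two occurrences of "kasa" come back with different lemmas and mi tags (noun "kasa" and verb "kasti"), which is the disambiguation the tagger provides.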
Airenas Vaičiūnas
Copyright © 2021, Airenas Vaičiūnas. Released under the 3-Clause BSD License.