The largest open lexical database for Latvian
wordlists Version: Summer 2018 Jun 25, 2018
Tēzaurs

Open data from http://tezaurs.lv - an extensive dictionary and thesaurus of Latvian, comprising more than 295,000 lexical entries.

Available datasets

  1. Wordlists with metadata.
  2. Synonym sets (upcoming).
  3. Glosses (upcoming).

Wordlists

See entries.txt and references.txt under wordlists. entries.txt is a list of main headwords; references.txt is a list of derivatives of the main headwords. Named entities, acronyms, abbreviations, prefixes, etc. are not included.

Data format: tab-separated records consisting of 9 fields:

  1. Headword.
  2. Homonym / homograph index (0..N).
  3. Universal POS tag, or NULL.
  4. Inflectional paradigm [1] (0..N), or NULL.
  5. Infinitive stem [1] (if the paradigm is 15 or 18), or NULL.
  6. Comma-separated present stems [1] (if the paradigm is 15 or 18), or NULL.
  7. Comma-separated past stems [1] (if the paradigm is 15 or 18), or NULL.
  8. Verb prefix [2] (if the paradigm is 15 or 18), or NULL.
  9. Comma-separated list of sources, or NULL, or REF in case of references.

[1] Used by Tēzaurs' inflection service with the following parameters:

[2] To be used by http://api.tezaurs.lv/v1/transcriptions/{word}
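
The 9-field record layout described above could be read with a short Python sketch like the one below. It is an illustration only, not part of the Tēzaurs tooling: the `Record` type, the `parse_record` function and the field names are hypothetical, and the code assumes that missing values are written as the literal string NULL, that the files are UTF-8 encoded and have no header row, and that the homonym index and paradigm are integers. The sample row is made up and is not taken from the dataset.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    """One wordlist row; field names are hypothetical, order follows the README."""
    headword: str
    homonym_index: int
    pos: Optional[str]
    paradigm: Optional[int]
    infinitive_stem: Optional[str]
    present_stems: List[str]
    past_stems: List[str]
    verb_prefix: Optional[str]
    sources: List[str]
    is_reference: bool


def _nullable(value: str) -> Optional[str]:
    # Assumption: a missing value is the literal string "NULL".
    return None if value == "NULL" else value


def _split(value: str) -> List[str]:
    # Comma-separated field; NULL becomes an empty list.
    if value == "NULL":
        return []
    return [part.strip() for part in value.split(",")]


def parse_record(line: str) -> Record:
    """Parse one tab-separated line into a Record."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    headword, homonym, pos, paradigm, inf, present, past, prefix, sources = fields
    is_ref = sources == "REF"  # references.txt rows carry REF instead of sources
    return Record(
        headword=headword,
        homonym_index=int(homonym),
        pos=_nullable(pos),
        paradigm=None if paradigm == "NULL" else int(paradigm),
        infinitive_stem=_nullable(inf),
        present_stems=_split(present),
        past_stems=_split(past),
        verb_prefix=_nullable(prefix),
        sources=[] if is_ref else _split(sources),
        is_reference=is_ref,
    )


# Made-up sample row, for illustration only (not copied from entries.txt).
sample = "vārds\t0\tNOUN\t1\tNULL\tNULL\tNULL\tNULL\tSRC1,SRC2"
record = parse_record(sample)
print(record.headword, record.pos, record.sources)
```

To process a whole file, one could open it with `open(path, encoding="utf-8")` and call `parse_record` on each non-empty line. Rows whose paradigm is 15 or 18 are the ones expected to carry stem and prefix values.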

Publications

Spektors, A., Auziņa, I., Darģis, R., Grūzītis, N., Paikens, P., Pretkalniņa, L., Rituma, L. and Saulīte, B. Tezaurs.lv: the largest open lexical database for Latvian. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC), 2016, pp. 2568-2571.

Acknowledgements

This work is partially supported by the Latvian State research programmes: Letonika (Project No. 3), NexIT (Project No. 1) and SOPHIS (Project No. 2). The latest development is supported by the European Regional Development Fund under the grant agreement No. 1.1.1.1/16/A/219 (Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian).

Licence

Tēzaurs by AiLab is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.