
Open data from Tēzaurs, an extensive dictionary and thesaurus of Latvian, comprising more than 315,000 lexical entries.

Available datasets

  1. Wordlists with metadata (under wordlists).
  2. Synonymic references (under wordlists).
  3. Glosses, etc. (under entries).

Additional datasets

  1. Multi-word expressions (extracted from a balanced 10M text corpus of Latvian) - under mwe.
  2. Mapping of Tēzaurs entries to core WordNet synsets (experimental) - under wordnet.


Morphology and other metadata

See entries.txt and references.txt under wordlists. entries.txt lists the main headwords; references.txt lists derivatives of the main headwords. Named entities, acronyms, abbreviations, prefixes, etc. are not included.

Data format: tab-separated records consisting of 9 fields:

  1. Headword.
  2. Homonym / homograph index (0..N).
  3. Universal POS tag, or NULL.
  4. Inflectional paradigm [1] (0..N), or NULL.
  5. Infinitive stem [1] (if the paradigm is 15 or 18), or NULL.
  6. Comma-separated present stems [1] (if the paradigm is 15 or 18), or NULL.
  7. Comma-separated past stems [1] (if the paradigm is 15 or 18), or NULL.
  8. Verb prefix [2] (if the paradigm is 15 or 18), or NULL.
  9. Comma-separated list of sources, or NULL, or REF in case of references.

[1] Used by the Tēzaurs inflection service with the following parameters:

[2] To be used by{word}
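
The 9-field record layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the Tēzaurs tooling: it assumes that missing values are written as the literal string NULL, that the files have no header row, and that fields 2 and 4 are plain integers. The names Record and parse_record are hypothetical, and the sample line is a made-up placeholder, not real data from the wordlists.

```python
from typing import List, NamedTuple, Optional

NULL = "NULL"  # assumption: missing values appear as this literal string


class Record(NamedTuple):
    headword: str
    homonym_index: int
    pos: Optional[str]
    paradigm: Optional[int]
    infinitive_stem: Optional[str]
    present_stems: List[str]
    past_stems: List[str]
    verb_prefix: Optional[str]
    sources: List[str]  # ["REF"] for lines taken from references.txt


def _optional(value: str) -> Optional[str]:
    """Map the NULL marker to None, keep everything else."""
    return None if value == NULL else value


def _as_list(value: str) -> List[str]:
    """Split a comma-separated field; NULL becomes an empty list."""
    if value == NULL:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_record(line: str) -> Record:
    """Parse one tab-separated line into a Record."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    paradigm = _optional(fields[3])
    return Record(
        headword=fields[0],
        homonym_index=int(fields[1]),
        pos=_optional(fields[2]),
        paradigm=int(paradigm) if paradigm is not None else None,
        infinitive_stem=_optional(fields[4]),
        present_stems=_as_list(fields[5]),
        past_stems=_as_list(fields[6]),
        verb_prefix=_optional(fields[7]),
        sources=_as_list(fields[8]),
    )


# Placeholder line for illustration only (not taken from entries.txt).
sample = "word\t0\tNOUN\t1\tNULL\tNULL\tNULL\tNULL\tsrcA,srcB"
record = parse_record(sample)
print(record.headword, record.paradigm, record.sources)
```

To process a whole file, one option is to open it with encoding="utf-8" and call parse_record on each non-empty line. Stem fields are returned as lists so that verbs with several present or past stems need no special handling.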


See synonyms.txt under wordlists.

Data format: tab-separated records consisting of 2 fields:

  1. Headword.
  2. Comma-separated synonymic references.
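
The two-field synonym records can be loaded into a dictionary in a similar way. The sketch below is illustrative only: the function name parse_synonyms is hypothetical, the input lines are made-up placeholders, and it assumes that a headword appearing on several lines should have its references merged.

```python
from typing import Dict, Iterable, List


def parse_synonyms(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each headword to its list of synonymic references."""
    synonyms: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        headword, sep, refs = line.partition("\t")
        if not sep:
            raise ValueError(f"missing tab separator: {line!r}")
        items = [r.strip() for r in refs.split(",") if r.strip()]
        # Assumption: repeated headwords are merged rather than overwritten.
        synonyms.setdefault(headword, []).extend(items)
    return synonyms


# Placeholder lines for illustration only (not taken from synonyms.txt).
example = ["alpha\tbeta, gamma\n", "delta\tepsilon\n"]
table = parse_synonyms(example)
print(table)
```

Because the function accepts any iterable of strings, an open file object (for example, one opened with encoding="utf-8") can be passed in directly.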


Spektors, A., Auziņa, I., Darģis, R., Grūzītis, N., Paikens, P., Pretkalniņa, L., Rituma, L., Saulīte, B. Tezaurs.lv: the largest open lexical database for Latvian. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC), 2016

Pretkalniņa, L., Paikens, P. Extending Tēzaurs.lv Online Dictionary into a Morphological Lexicon. Human Language Technologies - The Baltic Perspective. Frontiers in Artificial Intelligence and Applications, vol. 307, IOS Press, 2018

Paikens, P., Grūzītis, N., Rituma, L., Nešpore, G., Lipskis, V., Pretkalniņa, L., Spektors, A. Enriching an Explanatory Dictionary with FrameNet and PropBank Corpus Examples. Proceedings of the 6th Biennial Conference on Electronic Lexicography (eLex), 2019

Related work


This work is partially supported by the Latvian State research programmes Letonika (Project No. 3), NexIT (Project No. 1) and SOPHIS (Project No. 2). The latest development is supported by the European Regional Development Fund under the grant agreement No. (Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian) and by the Latvian State research programme Latvian Language (VPP-IZM-2018/2-0002).


Tēzaurs data sets by AiLab are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite the relevant publications if you use Tēzaurs data or API in your research, and let us know if you use Tēzaurs data or API in your products or services. Your citations and feedback are important for securing funding for the further development of Tēzaurs data sets and API.


Project coordinator: Andrejs Spektors

Team members: Ilze Auziņa, Guntis Bārzdiņš, Roberts Darģis, Mikus Grasmanis, Normunds Grūzītis, Gunta Nešpore-Bērzkalne, Pēteris Paikens, Ilmārs Poikāns, Lauma Pretkalniņa, Laura Rituma, Baiba Valkovska (Saulīte), Artūrs Znotiņš

