Version: Winter 2017 (latest commit 611e6bb, Jan 3, 2017, by @normundsg)


Tēzaurs

Open data from http://tezaurs.lv -- an extensive dictionary and thesaurus of Latvian, comprising more than 275,000 lexical entries.

Available datasets

  1. Wordlists with metadata.
  2. Synonym sets (upcoming).
  3. Full machine-readable entries (upcoming).

Wordlists

See entries.txt and references.txt in the wordlists directory. entries.txt lists the main headwords; references.txt lists derivatives of the main headwords. Acronyms, abbreviations, prefixes, etc. are not yet included; they will be added later as a separate wordlist.

Data format: one record per line, each consisting of 9 tab-separated fields:

  1. Headword.
  2. Homonym / homograph index (0..N).
  3. Universal POS tag, or NULL.
  4. Inflectional paradigm* (0..N), or NULL.
  5. Infinitive stem* (if the paradigm is 15 or 18), or NULL.
  6. Comma-separated present stems* (if the paradigm is 15 or 18), or NULL.
  7. Comma-separated past stems* (if the paradigm is 15 or 18), or NULL.
  8. Verb prefix** (if the paradigm is 15 or 18), or NULL.
  9. Comma-separated list of sources, or NULL, or REF in case of references.

* Used by http://api.tezaurs.lv/v1/inflections/{word}

** To be used by http://api.tezaurs.lv/v1/transcriptions/{word}
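
The nine-field record format above can be read with a few lines of Python. This is only a minimal sketch, not part of the Tēzaurs tooling: the names Record and parse_record are hypothetical, and it assumes that missing values appear as the literal string NULL, that references carry the literal string REF in the last field, and that the homonym index and paradigm are integers. The sample line is invented for illustration and is not taken from the dataset.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    """One wordlist record (hypothetical container, field order as in the README)."""
    headword: str
    homonym_index: int
    pos: Optional[str]              # Universal POS tag, or None
    paradigm: Optional[int]         # inflectional paradigm, or None
    infinitive_stem: Optional[str]
    present_stems: List[str]
    past_stems: List[str]
    verb_prefix: Optional[str]
    sources: List[str]
    is_reference: bool              # True when the last field is REF


def _optional(value: str) -> Optional[str]:
    # Assumption: a missing value is written as the literal string NULL.
    return None if value == "NULL" else value


def _comma_list(value: str) -> List[str]:
    # NULL becomes an empty list; otherwise split on commas and trim spaces.
    if value == "NULL":
        return []
    return [item.strip() for item in value.split(",")]


def parse_record(line: str) -> Record:
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    paradigm = _optional(fields[3])
    last = fields[8]
    is_reference = last == "REF"
    return Record(
        headword=fields[0],
        homonym_index=int(fields[1]),
        pos=_optional(fields[2]),
        paradigm=int(paradigm) if paradigm is not None else None,
        infinitive_stem=_optional(fields[4]),
        present_stems=_comma_list(fields[5]),
        past_stems=_comma_list(fields[6]),
        verb_prefix=_optional(fields[7]),
        sources=[] if is_reference else _comma_list(last),
        is_reference=is_reference,
    )


# Invented sample line (not from the dataset), for illustration only.
SAMPLE = "abc\t1\tVERB\t15\tab\tx,y\tz\tNULL\tS1,S2"
record = parse_record(SAMPLE)
```

To load a whole file, open it with encoding="utf-8" and call parse_record on each non-empty line. Keeping the stem and source fields as lists avoids a second splitting step later, at the cost of not distinguishing NULL from an empty list.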

Publications

Spektors, A., Auziņa, I., Darģis, R., Grūzītis, N., Paikens, P., Pretkalniņa, L., Rituma, L. and Saulīte, B. Tezaurs.lv: the largest open lexical database for Latvian. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC), 2016, pp. 2568-2571.

Acknowledgements

This work has been partially supported by the Latvian State Research Programmes Letonika (Project No. 3), NexIT (Project No. 1) and SOPHIS (Project No. 2).

Licence

Tēzaurs by AiLab is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.