Estonian resource grammar for the Grammatical Framework (GF) Resource Grammar Library (RGL)
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
api
chunk
data
docs/talks
estonian
other
tests
tools
translator
Makefile
README.md
log.txt

README.md

Estonian resource grammar for the GF RGL

Estonian resource grammar for the Grammatical Framework (GF) Resource Grammar Library (RGL).

This is work in progress!

Downloading

Tagged versions and zip-files with external resources are available at https://github.com/GF-Estonian/GF-Estonian/releases

To download only the GF files relevant for the resource grammar, execute:

mkdir -p api estonian
curl https://raw.githubusercontent.com/GF-Estonian/GF-Estonian/master/tools/download-gf-estonian.py | python

Building

To build the PGF run:

make pgf_lang

If the building fails with an error message, then compile ResEst.gf first:

make clean
gf estonian/ResEst.gf
make pgf_lang

Testing

Running morphology tests:

cd tools/
bash test-mk.bash N ../data/nouns.6forms.csv
bash test-mk.bash N ../data/adj.6forms.csv
bash test-mk.bash V ../data/verbs.8forms.csv

Running syntax tests:

make clean
cd tests/
sh run_tests.sh

Status

We started out with a fork from the Finnish resource grammar with the general plan to:

  • translate the demo lexicon from Finnish to Estonian (LexiconEst)
  • implement Estonian smart paradigms to replace the Finnish ones
  • tweak the syntax to be Estonian

This has been mostly accomplished.

Nouns

2 numbers and 14 cases.

2 * 14 forms (excluding parallel forms)

  • 1-arg (sg nom): 91% correct
  • 2-arg (sg nom gen): 95% correct
  • 3-arg (sg nom gen part): 97% correct
  • 4-arg (sg nom gen part, pl part): 98% correct
  • 6-arg paradigm: 100% correct
  • no parallel forms

Adjectives

Constructed in the same way as nouns, but have 3 degree forms.

3 * 2 * 14 forms + 2 (?) * 1 adverb forms

  • 1-arg : 90% correct
  • 2-arg : 93% correct
  • 3-arg : 95% correct
  • 4-arg : 97% correct
  • comparative guessed: TODO: determine correctness
  • TODO: superlative (not urgent)

Verbs

Forms: 40 in total

  • Present : Person * Number
  • Imperfect : Person * Number
  • Conditional : Person * Number
  • Imperative : P2 Sg, P2 Pl, P3, P1 Pl, NegPl
  • Passive : {Present, Past} * {Pos, Neg}
  • Quotative : Active, Passive
  • Past participle : Active, Passive
  • Present participle: Active, Passive
  • Infinitive : da * 2 cases ; ma * 5 cases

Smart paradigm constructors

  • 1-arg paradigm (ma): 90% correct
  • 2-arg paradigm (ma, da): 96% correct
  • 3-arg paradigm (ma, da, b): 98% correct
  • 4-arg paradigm (ma, da, b, takse): 99% correct

Large lexicon (DictEst)

See: https://github.com/GF-Estonian/lexicons

Syntax

  • most functions ported to Estonian
  • tested using the MOLTO Phrasebook and ACE-in-GF applications

Papers about GF-Estonian

  • Inari Listenmaa and Kaarel Kaljurand. Computational Estonian Grammar in Grammatical Framework. 9th SaLTMiL Workshop on "Free/open-Source Language Resources for the Machine Translation of Less-Resourced Languages", LREC 2014, Reykjavík, Iceland, 27 May 2014. PDF

See also the talks.

Projects that use GF-Estonian