
SrgFreeling

Olga Zamaraeva edited this page May 29, 2023 · 4 revisions

This page describes the interface between the Spanish Resource Grammar and the morphophonological analyzer Freeling that it relies on.

Why does a grammar need an external morphophonological analyzer?

See e.g. Bender and Good 2005 and also some discussion here: https://delphinqa.ling.washington.edu/t/reparsing-and-updating-a-treebank-keeping-previous-decisions/873/8

How does the Spanish Resource Grammar morphological interface work?

Freeling is a complex tool with many functions. The SRG relies on it so that the lexicon needs to contain only a single form of each word (rather than all of its inflected forms). For example, the lexicon contains an entry for the infinitive dormir (to sleep):

dormir_v := v_-_native_le &
  [ STEM < "dormir" >,
    SYNSEM.LKEYS.KEYREL.PRED "_dormir_v_rel" ].

In contrast, the form duerme (3rd person singular, present tense, indicative mood) will not be found in the lexicon.

Instead, the grammar has a lexical rule:

vmip3s0 :=
%suffix (vmip3s vmip3s)
pres-ind_ilr &
  [ SYNSEM.LOCAL [ CAT.HEAD.AUX -,
                   AGR.PNG.PN 3sg ] ].

The above lexical rule is not associated with any orthographic change. This may be confusing because in DELPH-IN, an inflectional lexical rule usually implies an orthographic change. But that holds only in the absence of an external morphophonological analyzer. Here, Freeling provides the analysis for a given word form, so the grammar must not apply any further orthographic changes on top of what Freeling has already analyzed. Below is Freeling's output for the Spanish sentence El gato duerme (The cat sleeps).

el gato duerme.
el el DA0MS0 1
gato gato NCMS000 1
duerme dormir VMIP3S0 0.989241
. . Fp 1

The above output was obtained with Freeling's own tool, the analyze binary, which is installed on your computer when you install Freeling 4.1. The SRG does not use this binary. Instead, it uses the Freeling Python API, which can also be found in the location where Freeling was installed (such as /usr/share/freeling). The API includes the files pyfreeling_api.py and _pyfreeling.so, as well as a sample program, sample.py, which gives a few examples of how to use it. Important: this API is not available through PyPI. Misleadingly, there is a package named pyfreeling that you can install from PyPI and import; that is not the one you want.
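For quick inspection, the column layout of the analyze output shown above can be read into Python records. This is a minimal sketch, not part of the SRG (which uses the Freeling Python API rather than this text format). It assumes exactly four whitespace-separated columns per token (form, lemma, tag, probability), as in the sample; other Freeling configurations may print additional analyses per line. The names FreelingToken and parse_analyze_output are hypothetical.

```python
from typing import List, NamedTuple


class FreelingToken(NamedTuple):
    form: str    # surface form as it appears in the sentence
    lemma: str   # lemma, which is what the SRG lexicon stores
    tag: str     # morphological tag, e.g. VMIP3S0
    prob: float  # probability Freeling assigns to this analysis


def parse_analyze_output(text: str) -> List[FreelingToken]:
    """Parse lines of the form 'form lemma tag prob' (assumed layout)."""
    parsed = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and lines with a different layout
        form, lemma, tag, prob = fields
        try:
            parsed.append(FreelingToken(form, lemma, tag, float(prob)))
        except ValueError:
            continue  # last column is not a number, so not a token line
    return parsed


SAMPLE = """\
el el DA0MS0 1
gato gato NCMS000 1
duerme dormir VMIP3S0 0.989241
. . Fp 1
"""

analyses = parse_analyze_output(SAMPLE)
for token in analyses:
    print(token.form, "->", token.lemma, token.tag)
```

The record for duerme carries the lemma dormir and the tag VMIP3S0, which are the two pieces of information the grammar needs from Freeling.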

The goal is to map Freeling output to the YY input format, which the ACE parser can process. The example below is for the sentence Mi perro duerme (scroll to the right to see the whole line):

(42, 0, 1, <0:2>, 1, "mi" "mi", 0, "dp1css") (43, 1, 2, <4:8>, 1, "perro" "perro", 0, "ncms000") (44, 2, 3, <9:15>, 1, "dormir" "duerme", 0, "vmip3s0")

To work with the above input, ACE must be called with the -y and --yy-rules options. ACE can then find the lemma dormir in lexicon.tdl even though the surface form it receives as input is duerme. Furthermore, it will instantiate the lexical rule vmip3s0 and include it in the chain.

As a result, the lexical chart will contain edges for the lexical entry associated with the verb dormir and for the appropriate lexical rule which will provide the person, number, and tense information. These edges should then be successfully combined into something the parser will be able to use for the subsequent syntactic parsing stage.
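The shape of the YY token line shown earlier can be illustrated with a short sketch. This is not the SRG's actual conversion code; the function name freeling_to_yy, the starting token id, and the way character offsets are computed (a simple left-to-right search) are assumptions, so the offsets it produces are not guaranteed to match those emitted by the real interface. Following the example, the lemma goes into the first quoted string, the surface form into the second, and the lowercased Freeling tag into the last field.

```python
from typing import List, Tuple


def freeling_to_yy(sentence: str,
                   analyses: List[Tuple[str, str, str]],
                   first_id: int = 42) -> str:
    """Build a YY token string from (form, lemma, tag) triples.

    Hypothetical helper: ids, vertices and offsets follow the layout of
    the example line, not a documented SRG convention.
    """
    tokens = []
    cursor = 0
    for i, (form, lemma, tag) in enumerate(analyses):
        start = sentence.find(form, cursor)
        if start < 0:
            raise ValueError(f"form {form!r} not found in sentence")
        end = start + len(form)
        cursor = end
        tokens.append(
            f'({first_id + i}, {i}, {i + 1}, <{start}:{end}>, 1, '
            f'"{lemma}" "{form}", 0, "{tag.lower()}")'
        )
    return " ".join(tokens)


yy_line = freeling_to_yy(
    "mi perro duerme",
    [("mi", "mi", "DP1CSS"),
     ("perro", "perro", "NCMS000"),
     ("duerme", "dormir", "VMIP3S0")],
)
print(yy_line)
```

The key point is the last token: the lexicon lookup uses dormir, while the tag vmip3s0 names the lexical rule that ACE instantiates when run with --yy-rules.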

Python SRG-to-Freeling interface

The interface between Freeling and the grammar is implemented by several Python modules in the folder util/. util/populate_tokens.py can be given a folder of tsdb profiles. It will call the Freeling API and populate the i-tokens field of each item file with (hopefully) appropriate YY input. ACE can then be called through the pydelphin library, selecting the i-tokens field for parsing:

delphin process --options="-y --yy-rules" -g ~/delphin/srg/ace/srg.dat --full-forest --select i-tokens path-to-test-suite
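If the same command needs to be run from a Python script (for example, over several test suites), it can be assembled as an argument list and passed to subprocess. This is a minimal sketch that uses only the flags shown in the command above; the function name build_process_command is hypothetical, and the grammar and test suite paths are the placeholders from the example.

```python
import os
import shlex
import subprocess
from typing import List


def build_process_command(grammar: str, testsuite: str) -> List[str]:
    """Assemble the 'delphin process' call shown above as an argv list."""
    return [
        "delphin", "process",
        "--options=-y --yy-rules",       # passed through to ACE
        "-g", os.path.expanduser(grammar),  # expand ~ since no shell is used
        "--full-forest",
        "--select", "i-tokens",          # parse the YY tokens, not the raw text
        testsuite,
    ]


command = build_process_command("~/delphin/srg/ace/srg.dat", "path-to-test-suite")
print(shlex.join(command))

# To actually run it (requires pydelphin, ACE and a compiled grammar):
# subprocess.run(command, check=True)
```

Passing a list rather than a single shell string keeps the quoted ACE options together as one argument.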

Overriding Freeling output

In some cases, Freeling output can be overridden. In the old version of the SRG, this was done with the file sppp.dat and a C++ program which acted as an interface between that file, Freeling, and the SRG.

In the current version, this is done with the files freeling_api/srg-freeling.dat, util/override_freeling.py, util/parse_sppp_dat.py, and srg_freeling2yy.py.

Common problems

  • Check the SRG GitHub issues.

  • Missing Freeling tag: Freeling provides a tag that is not in inflr.tdl. A lexical rule edge is not instantiated, so there is no parse. Solution: Assuming the tag is generally useful, a new lexical rule can be added to inflr.tdl. If the tag essentially duplicates another tag, it can instead be added to the TAGS dictionary in util/override_freeling.py.

  • Lexical rule not instantiated: A lexical rule supertype inherits from basic-lex-rule. The type basic-lex-rule does not implement token mapping, so the lexical rule will not be instantiated for the given token. Solution: Have the specific lexical rule (as it appears in Freeling's output) inherit from tmt-lex-rule instead. The type tmt-lex-rule is the same as basic-lex-rule but respects token mapping.

  • Nonsensical Freeling output (wrong tags): This may happen if there is a typo in the sentence. Freeling is statistical, so it will try to output something even if the probability of a tag sequence is low.

  • Sequence of tags: Freeling sometimes outputs multiple tags for one word form. This can be desirable (as in the case with clitics) but it is possible that not all required mappings or postprocessing routines were implemented in the SRG-to-Freeling interface. Each such case should be investigated separately.
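The "Missing Freeling tag" problem above can be checked mechanically before parsing. The sketch below is an illustration under stated assumptions, not the SRG's own tooling: it assumes that each lexical rule in inflr.tdl is defined on a line starting with the rule name followed by :=, and that a tag override is a plain mapping from one tag to another. The TAGS dictionary here is a hypothetical stand-in (with an invented key) for the one in util/override_freeling.py, whose real structure may differ.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for TAGS in util/override_freeling.py.
# The key is invented for illustration; it is not a real Freeling tag.
TAGS: Dict[str, str] = {"vmip3s0_dup": "vmip3s0"}

# A rule definition is assumed to start a line as "name :=".
RULE_NAME = re.compile(r"^([\w+-]+)\s*:=", re.MULTILINE)


def rule_names(tdl_text: str) -> Set[str]:
    """Collect the names of all types/rules defined in a TDL snippet."""
    return {name.lower() for name in RULE_NAME.findall(tdl_text)}


def missing_tags(freeling_tags: Iterable[str],
                 tdl_text: str,
                 overrides: Dict[str, str]) -> List[str]:
    """Return tags (after overrides) that have no rule in the TDL text."""
    known = rule_names(tdl_text)
    missing: List[str] = []
    for tag in freeling_tags:
        tag = tag.lower()
        tag = overrides.get(tag, tag)  # map duplicate tags to their target
        if tag not in known and tag not in missing:
            missing.append(tag)
    return missing


INFLR_SNIPPET = """\
vmip3s0 :=
%suffix (vmip3s vmip3s)
pres-ind_ilr &
  [ SYNSEM.LOCAL [ CAT.HEAD.AUX -,
                   AGR.PNG.PN 3sg ] ].
"""

print(missing_tags(["VMIP3S0", "NCMS000", "VMIP3S0_DUP"], INFLR_SNIPPET, TAGS))
```

Run against the full inflr.tdl and the tags collected from a test suite, a check like this would list the tags that need either a new lexical rule or an entry in the override dictionary.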
