Version 0.9.7
The aim of this project is to make existing FrameNet (FN) resources computationally accessible for multilingual natural language generation and controlled semantic parsing via a shared semantico-syntactic grammar and lexicon API.
We provide a currently bilingual but potentially multilingual FN-based grammar and lexicon library implemented in Grammatical Framework (GF) on top of the GF Resource Grammar Library (RGL). The API of the FN-based library represents a shared set of semantico-syntactic verb valence patterns, automatically extracted from 66,918 annotated sentences in Berkeley FrameNet (BFN 1.5) and 4,267 sentences in Swedish FrameNet (SweFN, a snapshot taken on December 3, 2014). The concise set of 869 patterns covers 483 shared frames (using BFN frames as interlingua) and 77.5% of sentences evoking the shared frames in both BFN and SweFN (44,645 and 2,596 sentences respectively).
Based on the FN-annotated sentences covered by the shared valence patterns, and the GF RGL type system for verbs, we have extracted 3,432 lexical entries (subcategorized lexical units, LUs) from BFN, and 1,899 entries from SweFN. LUs in BFN and SweFN are not directly aligned; therefore, a specific lexicon is generated for each language. However, a partial shared lexicon has been automatically derived on top of the language-specific lexicons, currently providing a mapping between 703 LUs in BFN and 900 LUs in SweFN. The shared lexicon covers 25.1% (11,223) of the BFN sentences and 35.8% (928) of the SweFN sentences that are represented by the shared valence patterns.
All numbers are indicative and subject to change if more corpus examples, translation equivalents or improved heuristics are provided.
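As a quick sanity check on how the shared-lexicon percentages relate to the sentence counts, the following Python sketch recomputes them from the figures quoted above. It assumes that the denominators are the sentences covered by the shared valence patterns (44,645 for BFN and 2,596 for SweFN), not all annotated sentences. The helper name `coverage_pct` is illustrative and not part of the library.

```python
# Recompute the shared-lexicon coverage from the counts stated in this README.
# Assumption: percentages are relative to the sentences covered by the shared
# valence patterns, not to all annotated sentences.

def coverage_pct(covered: int, total: int) -> float:
    """Return covered/total as a percentage."""
    return 100.0 * covered / total

# name -> (sentences covered by the shared lexicon,
#          sentences covered by the shared valence patterns)
counts = {
    "BFN": (11_223, 44_645),
    "SweFN": (928, 2_596),
}

for name, (by_lexicon, by_patterns) in counts.items():
    print(f"{name}: {coverage_pct(by_lexicon, by_patterns):.1f}%")
```

Under this assumption the BFN figure comes out at 25.1%, matching the text. The SweFN figure comes out at about 35.7%, a tenth of a point below the stated 35.8%, which is consistent with the numbers being indicative.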
As a side result, a unified method for comparing and mapping semantic and syntactic valence patterns and lexical units across framenets is proposed. Thus, from the perspective of developers of FN-annotated corpora, this can be seen as a tool providing cross-lingual hints on how to improve the coverage.
All modules of the grammar and lexicon have been automatically generated based on the automatically extracted semantico-syntactic valence patterns.
Elements.gf – 541 core frame elements (FE) declared as semantic categories that are subcategorized by the syntactic RGL types.
Patterns.gf – 869 valence patterns declared as functions that take one or more core FEs and a target verb as arguments, and return a clause. For each frame, the set of core FEs is often split into several alternative functions according to the corpus evidence.
TargetsEngAbs.gf – 3,432 lexical units (LU) from BFN (subcategorized by RGL verb types).
TargetsSweAbs.gf – 1,899 LUs from SweFN (subcategorized by RGL verb types).
Targets.gf – 703 LUs from BFN for which a mapping to one or more LUs in SweFN has been found.
ElementsI.gf – mapping from the semantic types to the syntactic types, shared for all languages.
ElementsEng.gf and ElementsSwe.gf – language-specific instantiations of ElementsI.gf.
PatternsEng.gf and PatternsSwe.gf – language-specific implementation of the shared valence patterns (frame building functions).
TargetsEngCnc.gf – implementation of 3,350 BFN-specific LUs (reusing DictEng.gf, DictionaryEng.gf, LexiconEng.gf, IrregEng.gf, StructuralEng.gf).
TargetsSweCnc.gf – implementation of 1,789 SweFN-specific LUs (reusing DictSwe.gf, DictionarySwe.gf, LexiconSwe.gf, IrregSwe.gf, StructuralSwe.gf).
TargetsEng.gf and TargetsSwe.gf – language-specific implementation of the shared LUs (the mapping from English to Swedish is based on potential translation equivalents extracted from the multilingual DictionaryEng.gf / DictionarySwe.gf and LexiconEng.gf / LexiconSwe.gf).
Note: The RGL modules DictL, DictionaryL, LexiconL, IrregL and StructuralL (where L stands for the language suffix, e.g. Eng or Swe) are subject to change independently of this library. We have used an RGL snapshot of December 15, 2014.
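The module counts listed above also imply how much of each language-specific lexicon currently has a concrete implementation, and how many valence patterns a frame has on average. The following Python sketch derives these ratios from the stated counts only. It assumes that the LUs implemented in TargetsEngCnc.gf and TargetsSweCnc.gf are subsets of those declared in TargetsEngAbs.gf and TargetsSweAbs.gf. The variable names are illustrative, and the derived figures are not numbers reported by the authors.

```python
# Derived ratios from the module counts listed in this README.
# These are back-of-the-envelope figures, not numbers reported by the library.

# LUs declared in TargetsEngAbs.gf / TargetsSweAbs.gf
declared = {"BFN": 3_432, "SweFN": 1_899}
# LUs implemented in TargetsEngCnc.gf / TargetsSweCnc.gf
implemented = {"BFN": 3_350, "SweFN": 1_789}

# Share of declared LUs that have a concrete-syntax implementation
# (assumes the implemented LUs are a subset of the declared ones).
implemented_share = {
    name: 100.0 * implemented[name] / declared[name] for name in declared
}

# Average number of valence patterns (Patterns.gf) per shared frame.
patterns, frames = 869, 483
patterns_per_frame = patterns / frames

for name, share in implemented_share.items():
    print(f"{name}: {share:.1f}% of declared LUs implemented")
print(f"{patterns_per_frame:.2f} valence patterns per frame on average")
```

With the counts given here, roughly 97.6% of the BFN entries and 94.2% of the SweFN entries are implemented, and a frame has about 1.8 alternative valence patterns on average.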
The API specification and a change log are available at http://grammaticalframework.org/framenet/
Requires GF 3.6 installed from recent developer source code (2014-07-02 or later), or a later version.
To illustrate the use of the FrameNet-based API, a re-engineered version of the Phrasebook grammar is included, preserving the original functionality and making no changes in the abstract syntax. Changes affect only the concrete syntax.
Search for "FrameNet API" in WordsEng.gf, PhrasebookEng.gf and WordsSwe.gf, PhrasebookSwe.gf under examples/phrasebook to see the added and modified code.
- Normunds Grūzītis and Dana Dannélls. A multilingual FrameNet-based grammar and lexicon for controlled natural language. Journal of Language Resources and Evaluation, 2015 (preprint)
- Dana Dannélls and Normunds Grūzītis. Controlled natural language generation from a multilingual FrameNet-based grammar. In: Proceedings of the 4th Workshop on Controlled Natural Language (CNL), LNCS 8625, Springer, 2014, pp. 155-166 (preprint, slides, video)
- Dana Dannélls and Normunds Grūzītis. Extracting a bilingual semantic grammar from FrameNet-annotated corpora. In: Proceedings of the 9th International Language Resources and Evaluation Conference (LREC), 2014, pp. 2466-2473
- Normunds Grūzītis, Pēteris Paikens and Guntis Bārzdiņš. FrameNet Resource Grammar Library for GF. In: Proceedings of the 3rd Workshop on Controlled Natural Language (CNL), LNCS 7427, Springer, 2012, pp. 121-137 (preprint)
This work has been supported by the Swedish Research Council under Grant No. 2012-5746 (Reliable Multilingual Digital Communication: Methods and Applications) and by the Centre for Language Technology in Gothenburg. The research leading to these results has received funding also from the Latvian State Research Programme NexIT.
This library is licensed under the GNU Lesser General Public License, version 3.