FrameNet-based API to Grammatical Framework

Version 0.9.7

Introduction

The aim of this project is to make existing FrameNet (FN) resources computationally accessible for multilingual natural language generation and controlled semantic parsing via a shared semantico-syntactic grammar and lexicon API.

We provide a currently bilingual but potentially multilingual FN-based grammar and lexicon library implemented in Grammatical Framework (GF) on top of the GF Resource Grammar Library (RGL). The API of the FN-based library represents a shared set of semantico-syntactic verb valence patterns automatically extracted from 66,918 annotated sentences in Berkeley FrameNet (BFN 1.5) and 4,267 sentences in Swedish FrameNet (SweFN, a snapshot taken on December 3, 2014). This concise set of 869 patterns covers 483 shared frames (using BFN frames as an interlingua) and 77.5% of the sentences evoking the shared frames in both BFN and SweFN (44,645 and 2,596 sentences, respectively).

Based on the FN-annotated sentences covered by the shared valence patterns and on the GF RGL type system for verbs, we have extracted 3,432 lexical entries (subcategorized lexical units, LUs) from BFN and 1,899 entries from SweFN. Since LUs in BFN and SweFN are not directly aligned, a separate lexicon is generated for each language. However, a partial shared lexicon has been derived automatically on top of the language-specific lexicons, currently providing a mapping between 703 LUs in BFN and 900 LUs in SweFN. Of the sentences represented by the shared valence patterns, the shared lexicon covers 25.1% (11,223) in BFN and 35.8% (928) in SweFN.

All numbers are indicative and subject to change if more corpus examples, translation equivalents or improved heuristics are provided.

As a side result, we propose a unified method for comparing and mapping semantic and syntactic valence patterns and lexical units across framenets. From the perspective of developers of FN-annotated corpora, this method can also serve as a tool that provides cross-lingual hints on how to improve coverage.

Structure

All modules of the grammar and lexicon are generated automatically from the extracted semantico-syntactic valence patterns.

Abstract syntax

  • Elements.gf – 541 core frame elements (FEs) declared as semantic categories that are subcategorized by syntactic RGL types.
  • Patterns.gf – 869 valence patterns declared as functions that take one or more core FEs and a target verb as arguments and return a clause. For each frame, the set of core FEs is often split across several alternative functions, according to the corpus evidence.
  • TargetsEngAbs.gf – 3,432 lexical units (LUs) from BFN (subcategorized by RGL verb types).
  • TargetsSweAbs.gf – 1,899 LUs from SweFN (subcategorized by RGL verb types).
  • Targets.gf – 703 LUs from BFN for which a mapping to one or more LUs in SweFN has been found.
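To make the layering of the abstract syntax more tangible, here is a toy Python model of how an FE category, a valence pattern and a target LU fit together, and how an entry in the shared lexicon could be sanity-checked. This is a sketch under stated assumptions, not the library's code: the class and function names (`FrameElement`, `LexicalUnit`, `ValencePattern`, `consistent`), the `Clause` return-type label, the Ingestion example, and the rule that a target must match a pattern's frame and verb type (and that mapped LUs share both) are all illustrative assumptions. The real .gf modules are generated automatically and use their own naming.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Toy model of the abstract-syntax layers described above.
# All names here are hypothetical; they are not taken from the generated .gf files.


@dataclass(frozen=True)
class FrameElement:
    """A core FE subcategorized by an RGL type (cf. Elements.gf)."""
    name: str       # FE name, e.g. "Ingestor"
    rgl_type: str   # syntactic RGL type, e.g. "NP"

    @property
    def category(self) -> str:
        # Semantic category name combining the FE and its RGL type
        return f"{self.name}_{self.rgl_type}"


@dataclass(frozen=True)
class LexicalUnit:
    """A target verb subcategorized by an RGL verb type (cf. TargetsEngAbs.gf, TargetsSweAbs.gf)."""
    lemma: str
    frame: str
    verb_type: str  # e.g. "V2" for a transitive verb


@dataclass(frozen=True)
class ValencePattern:
    """Core FEs plus a target verb that together yield a clause (cf. Patterns.gf)."""
    frame: str
    elements: Tuple[FrameElement, ...]
    verb_type: str

    def signature(self) -> str:
        # Render the pattern as a GF-style function type
        args = [fe.category for fe in self.elements] + [self.verb_type]
        return " -> ".join(args + ["Clause"])

    def accepts(self, lu: LexicalUnit) -> bool:
        # Assumption: a target fits a pattern if it evokes the same frame
        # and has the same RGL verb type
        return lu.frame == self.frame and lu.verb_type == self.verb_type


# Shared lexicon (cf. Targets.gf): a BFN LU mapped to one or more SweFN LUs
SharedLexicon = Dict[LexicalUnit, List[LexicalUnit]]


def consistent(shared: SharedLexicon) -> bool:
    # Assumption: every mapped SweFN LU evokes the same frame and has the
    # same verb type as its BFN counterpart
    return all(
        swe.frame == eng.frame and swe.verb_type == eng.verb_type
        for eng, swes in shared.items()
        for swe in swes
    )


ingestion = ValencePattern(
    "Ingestion",
    (FrameElement("Ingestor", "NP"), FrameElement("Ingestibles", "NP")),
    "V2",
)
eat = LexicalUnit("eat", "Ingestion", "V2")
ata = LexicalUnit("äta", "Ingestion", "V2")

print(ingestion.signature())       # Ingestor_NP -> Ingestibles_NP -> V2 -> Clause
print(ingestion.accepts(eat))      # True
print(consistent({eat: [ata]}))    # True
```

The printed signature only mimics the shape of a `fun` declaration in Patterns.gf, where FE arguments and a target verb combine into a clause; the argument order and naming in the actual library may differ.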

Concrete syntax

Note: The RGL modules DictL, DictionaryL, LexiconL, IrregL and StructuralL are subject to change independently of this library. We have used an RGL snapshot from December 15, 2014.

Documentation

The API specification and a change log are available at http://grammaticalframework.org/framenet/

Requirements

GF 3.6 built from recent developer sources (dated 2014-07-02 or later), or any later GF version.

Usage example

To illustrate the use of the FrameNet-based API, a re-engineered version of the Phrasebook grammar is included. It preserves the original functionality and leaves the abstract syntax unchanged; only the concrete syntax has been modified.

Search for "FrameNet API" in WordsEng.gf, PhrasebookEng.gf, WordsSwe.gf and PhrasebookSwe.gf under examples/phrasebook to find the added and modified code.

Publications

Related work

Acknowledgements

This work has been supported by the Swedish Research Council under Grant No. 2012-5746 (Reliable Multilingual Digital Communication: Methods and Applications) and by the Centre for Language Technology in Gothenburg. The research leading to these results has also received funding from the Latvian State Research Programme NexIT.

Licence

This library is licensed under the GNU Lesser General Public License, version 3.