ERDWErgSemantics_Essence
Missing: events
In a nutshell, the English Resource Grammar (ERG) captures sentence meaning in an abstract representation to allow manipulation by software. The abstract representation is a collection of terms (with arguments) and information about how they fit together. The terms are akin to functions in a programming language or predicates in predicate logic. The representation is most naturally manipulated using a logic-based approach.
The meaning represented is ‘sentence meaning’, which is wholly determined by the words used and their grammatical structure. This is in contrast to what is sometimes called ‘speaker’ or ‘occasion meaning’: what an expression means given the entire context of what is going on. In a sentence like "The bank with the shortest ATM lines is near the river bank", the ERG will represent "bank" the same way for both words since they are used in the same grammatical way. (On ‘speaker’ vs. ‘sentence’ meaning, see Quine 1960 and Grice 1968; for a discussion of these ideas and how they relate to the ERG, see Bender et al. 2015.)
Part of what makes the ERG valuable is that it generates rich and detailed information about the sentence in an easy-to-process form and converts sentences that mean the same thing into the same abstract representation, ignoring semantically irrelevant surface variation. Furthermore, the ERG represents all important semantic information from a sentence, allowing software to do so-called "deep natural language parsing".
The following sections describe, at a high level, the abstract representation generated by the ERG. Throughout this document, attempts are made to give the linguistics background and reasoning behind certain behaviors, like this:
Linguistics Background: The linguistics background would go here...
Understanding these explanations is not important for understanding and using the ERG. They are there to help those with a linguistics background understand some of the subtleties of the grammar.
The heart of the ERG representation is its predicate–argument structure. This is a collection of predicate-logic-like terms with arguments (or functions with arguments in a classic programming language) called "predications" or sometimes, "relations". The collection produced for a given sentence is called a "Minimal Recursion Semantics (MRS)" document (or simply "the MRS").
The terms generated go well beyond common representations in, for example, a semantic role labeling (SRL) system or a machine-learning classifier. The ERG accounts for the contributions of all words in the phrase, including information that is mostly syntactic. For example, "garden dog" is represented by a term for "garden", a term for "dog", and an additional term to indicate that it is a compound word. "If it is sunny, I'll go" has an extra term to indicate that this is a conditional sentence, etc.
The predications in an MRS document have (often very blandly) named arguments such as ARG0 and ARG1. Each argument contains a typed variable, and the type is indicated by the letter that begins the variable's name. Thus, in this predication from an MRS document:
_dog_n_1 ARG0: x4
There is one argument, ARG0, that contains a variable x4. The x indicates that an "instance" (i.e. a thing) is being represented. The variable has a 4 appended to distinguish it from other instances being discussed in the rest of the document. The name of the predication, _dog_n_1, indicates that the instance held in x4 is a dog (_dog), is a noun (_n), and is the first meaning of dog (_1).
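The naming pattern just described can be sketched in a few lines of Python. This is only an illustration: the helper name `split_predicate` is hypothetical, and it assumes that word-derived predicates always follow the `_lemma_pos` or `_lemma_pos_sense` shape seen in this document.

```python
from typing import NamedTuple, Optional


class PredName(NamedTuple):
    lemma: str            # the word itself, e.g. "dog"
    pos: str              # part-of-speech letter, e.g. "n" for noun
    sense: Optional[str]  # sense tag, e.g. "1"; None when absent


def split_predicate(name: str) -> PredName:
    """Split a predicate name such as '_dog_n_1' into lemma, pos and sense.

    Assumption: the name starts with '_' and has two or three
    underscore-separated parts, as in the examples in this document.
    """
    if not name.startswith("_"):
        raise ValueError(f"{name!r} does not look like a word-derived predicate")
    parts = name[1:].split("_")
    if len(parts) == 2:
        return PredName(parts[0], parts[1], None)
    if len(parts) == 3:
        return PredName(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected number of parts in {name!r}")


dog = split_predicate("_dog_n_1")
every = split_predicate("_every_q")
```

Here `dog` carries the lemma "dog", the part of speech "n" and the sense "1", while `every` has no sense tag.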
Variable sharing is used to link together the predications in an MRS document. To indicate that the mentioned dog is also yellow, add:
_yellow_a_1 ARG0: e9 ARG1: x4
The fact that ARG1 uses the same x4 indicates that it further refines what x4 represents: a "yellow dog". Its first argument (ARG0) is of type "event", which is introduced in ERG Basics and is not used here. Argument sharing can happen in many situations throughout the ERG.
Linguistics background: Some examples of argument sharing are in non-scopal modification, control constructions, coordinate structures, as well as others (like relative clauses and certain types of comparatives).
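As a rough illustration of variable sharing, the sketch below indexes predications by the variables they mention. The list-of-tuples layout and the function name `predications_by_variable` are assumptions made for this example, not an ERG data format.

```python
from collections import defaultdict

# The two predications from the text, as (name, {argument: variable}) pairs.
predications = [
    ("_dog_n_1", {"ARG0": "x4"}),
    ("_yellow_a_1", {"ARG0": "e9", "ARG1": "x4"}),
]


def predications_by_variable(preds):
    """Map each variable to the (predication, argument) pairs that use it."""
    index = defaultdict(list)
    for name, args in preds:
        for arg_name, variable in args.items():
            index[variable].append((name, arg_name))
    return dict(index)


shared = predications_by_variable(predications)
print(shared["x4"])
```

Both predications show up under x4, which is how software can tell that "yellow" and "dog" describe the same instance.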
Each predication in an MRS document is also assigned to a variable of type "Handle", so that the entire predication can be passed as an argument to another predication. This is very similar to how lambda functions in some programming languages allow functions to be passed to other functions. This is represented in the MRS document by an extra first argument named LBL which is always included, like this:
_yellow_a_1 LBL: h8 ARG0: e9 ARG1: x4
_dog_n_1 LBL: h8 ARG0: x4
Now, we have a way to represent the fact that x4 should represent a dog that is not yellow:
_dog_n_1 LBL: h7 ARG0: x4
neg LBL: h1 ARG0: e8 ARG1: h10
_yellow_a_1 LBL: h10 ARG0: e2 ARG1: x4
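The way a handle lets one predication take another as an argument can be sketched as a label lookup. This is a minimal illustration under assumed names (`by_label`, `render`) and an assumed dictionary layout; it simply follows any argument value that happens to be the LBL of another predication.

```python
# The three predications above, one dictionary per predication.
preds = [
    {"pred": "_dog_n_1", "LBL": "h7", "ARG0": "x4"},
    {"pred": "neg", "LBL": "h1", "ARG0": "e8", "ARG1": "h10"},
    {"pred": "_yellow_a_1", "LBL": "h10", "ARG0": "e2", "ARG1": "x4"},
]

# Index predications by label; several predications may share one label.
by_label = {}
for p in preds:
    by_label.setdefault(p["LBL"], []).append(p)


def render(p):
    """Render a predication, expanding handle arguments into nested terms."""
    parts = []
    for arg_name, value in p.items():
        if arg_name in ("pred", "LBL"):
            continue
        if value in by_label:
            # The argument is a handle: substitute the labeled predication(s).
            parts.append(" & ".join(render(q) for q in by_label[value]))
        else:
            parts.append(value)
    return f"{p['pred']}({', '.join(parts)})"


rendered = render(by_label["h1"][0])
print(rendered)
```

The printed term nests the _yellow_a_1 predication inside neg, mirroring how h10 links the two.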
neg has the entire _yellow_a_1 predication as its ARG1 argument by virtue of using h10, indicating that "not yellow" instances must be in x4.
"Underspecification" allows a single MRS document to represent the many potential meanings of a sentence. Let's use the sentence "every dog chases a cat" as an example. One interpretation is "Every dog chases a (possibly different) cat". Another is "Every dog is chasing the same cat". A big reason sentences are ambiguous is words that express "quantification", like "every, some, the, a, all, etc". In the example, "every" and "a" both quantify parts of the sentence and, because language does not often have ways to specify "which is on top", you get multiple possible meanings.
The MRS document represents all of the meanings with one set of predications by "underspecifying" which predicates "are on top" of or "have scope over" other predicates. It does this by leaving empty handle variables (i.e. "holes" like h5 and h6 in the example below) in predication arguments. The "holes" represent the places where alternatives could be filled in. It then provides constraints for plugging the predications together "legally" (i.e. maintaining the semantics of the sentence).
You can see this at work in the MRS document fragment for "every dog chases a cat":
_every_q LBL: h4 ARG0: x3 RSTR: h5 BODY: h6
_dog_n_1 LBL: h7 ARG0: x3
_a_q LBL: h9 ARG0: x8 RSTR: h10 BODY: h11
_cat_n_1 LBL: h12 ARG0: x8
_chase_v_1 LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8
HCONS: < h0 qeq h1 h5 qeq h7 h10 qeq h12 >
Note that _every_q has two arguments, RSTR and BODY, that refer to handle variables that are not the label (LBL) of any predication in the document. These represent the "holes". A new section at the bottom called "HCONS" represents the "Handle CONStraints" that must be followed when putting predications into these holes.
To figure out the meanings of an MRS document, you need to build a tree by assigning actual predications from the MRS into these "holes". The rules for building the tree are (basically): Every hole must be filled, every predication must be used, predications can only be in one hole, and the HCONS constraints must be followed (described next).
For _every_q, the RSTR argument h5 has an HCONS constraint that refers to it: h5 qeq h7. This means that the predication labeled h7 (_dog_n_1) must be somewhere in the subtree that gets put in RSTR (not necessarily at the top). The BODY argument has no HCONS constraints, so anything can go there.
If you follow the rules, there are only two trees you can build with the above MRS:
a) "Every dog chases a (possibly different) cat"
            ┌_dog_n_1:x3
_every_q:x3,h5,h6
            │            ┌_cat_n_1:x8
            └_a_q:x8,h10,h11
                         └_chase_v_1:e2,x3,x8
b) "Every dog is chasing the same cat"
          ┌_cat_n_1:x8
_a_q:x8,h10,h11
          │               ┌_dog_n_1:x3
          └_every_q:x3,h5,h6
                          └_chase_v_1:e2,x3,x8
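The tree-building rules can be tried out with a brute-force sketch that plugs every predication into every hole and keeps only the legal combinations. Several things here are assumptions rather than the ERG's own algorithm: the data layout and function names are invented for the example, qeq is checked in the simplified "somewhere in the subtree" form used in this document, and one extra condition is added that the rules above only hint at ("maintaining the semantics"), namely that a predication using x3 or x8 must sit underneath the quantifier that introduces that variable.

```python
from itertools import permutations

# label -> (predicate name, variables used, {argument name: hole})
PREDS = {
    "h4": ("_every_q", ["x3"], {"RSTR": "h5", "BODY": "h6"}),
    "h7": ("_dog_n_1", ["x3"], {}),
    "h9": ("_a_q", ["x8"], {"RSTR": "h10", "BODY": "h11"}),
    "h12": ("_cat_n_1", ["x8"], {}),
    "h1": ("_chase_v_1", ["e2", "x3", "x8"], {}),
}
TOP = "h0"
QEQ = [("h0", "h1"), ("h5", "h7"), ("h10", "h12")]
BINDER = {"x3": "h4", "x8": "h9"}  # assumption: quantifier binding each x variable


def labels_under(hole, plug):
    """Labels in the subtree plugged into `hole`; empty set if a cycle is hit."""
    seen, stack = set(), [plug[hole]]
    while stack:
        label = stack.pop()
        if label in seen:
            return set()
        seen.add(label)
        stack.extend(plug[h] for h in PREDS[label][2].values())
    return seen


def is_valid(plug):
    # Every predication must be used, in a single tree rooted at TOP.
    if labels_under(TOP, plug) != set(PREDS):
        return False
    # Simplified qeq: the label must appear somewhere under the hole.
    if any(label not in labels_under(hole, plug) for hole, label in QEQ):
        return False
    # Added assumption: variables must be used under their quantifier.
    for label, (_, variables, _) in PREDS.items():
        for var in variables:
            binder = BINDER.get(var)
            if binder is None or binder == label:
                continue
            holes = PREDS[binder][2].values()
            if not any(label in labels_under(h, plug) for h in holes):
                return False
    return True


def render(label, plug):
    name, _, holes = PREDS[label]
    if not holes:
        return name
    return f"{name}({', '.join(render(plug[h], plug) for h in holes.values())})"


def resolve():
    holes = [TOP] + [h for _, _, args in PREDS.values() for h in args.values()]
    for labels in permutations(PREDS):
        plug = dict(zip(holes, labels))  # each predication fills exactly one hole
        if is_valid(plug):
            yield render(plug[TOP], plug)


readings = sorted(resolve())
for reading in readings:
    print(reading)
```

Under these assumptions the search yields exactly two pluggings, matching trees (a) and (b). Trying all 120 assignments is fine for a five-predication example but would not scale to long sentences.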
Using logic, we can represent these meanings in this way:
(1) Every dog chases a cat.
(a) ∀x dog(x): ∃y cat(y): chase(x,y)
(b) ∃y cat(y): ∀x dog(x): chase(x,y)
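To see that (a) and (b) really are different meanings, they can be evaluated against a small made-up world. The dog and cat names and the chase relation below are invented purely for illustration.

```python
# A tiny hypothetical world: two dogs, two cats, each dog chasing a different cat.
dogs = {"rex", "fido"}
cats = {"tom", "felix"}
chases = {("rex", "tom"), ("fido", "felix")}

# (a) for every dog there is some cat that it chases
reading_a = all(any((d, c) in chases for c in cats) for d in dogs)

# (b) there is one cat that every dog chases
reading_b = any(all((d, c) in chases for d in dogs) for c in cats)

print(reading_a, reading_b)
```

In this world reading (a) holds and reading (b) does not, so the two scopings cannot be collapsed into one.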
Note that this ambiguity and the resulting underspecification can happen with other ‘operator-like’ predications as well; see below.
Understanding how to read these trees is covered in section TBD. For now, it is enough to understand that this is what is going on.
There are only 3 more parts to an MRS document: Variable Properties, Top and Index.
Non-label variables can be ‘refined’ with what are called variable properties. These further narrow down what the variable represents: for example, whether the variable contains a plural or singular instance, whether it is first or second person, etc. Variable properties range over a fixed inventory of possible values, organized in a multiple-inheritance hierarchy to allow underspecification. More is described in TBD.
Top and Index do something that I still can't describe...
An actual MRS document is shown below. Top, Index and HCONS are represented by different sections. The RELS section contains the list of predications. Variable properties are represented next to the variable they modify, enclosed in []:
[ TOP: h0
INDEX: e2
RELS: <
[ _the_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ _dog_n_1 LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _yellow_a_1 LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 > ]
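For readers who want to poke at such a document in code, here is a small hand-rolled reader for the layout shown above. It is a sketch, not a general MRS parser: the function name `parse_mrs` and the returned dictionary shape are assumptions, and it relies on each predication sitting on its own line exactly as printed here.

```python
import re

MRS_TEXT = """[ TOP: h0
INDEX: e2
RELS: <
[ _the_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ _dog_n_1 LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _yellow_a_1 LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 > ]"""

PROPS = r"\[ \w ([^\[\]]*)\]"  # a "[ x PERS: 3 ... ]" variable-property block


def pairs(tokens):
    """Turn ['PERS:', '3', 'NUM:', 'sg'] into {'PERS': '3', 'NUM': 'sg'}."""
    return dict(zip((t.rstrip(":") for t in tokens[0::2]), tokens[1::2]))


def parse_mrs(text):
    top = re.search(r"TOP:\s*(\w+)", text).group(1)
    index = re.search(r"INDEX:\s*(\w+)", text).group(1)
    rels_text = re.search(r"RELS:\s*<(.*?)>\s*HCONS", text, re.S).group(1)
    hcons_tokens = re.search(r"HCONS:\s*<(.*?)>", text, re.S).group(1).split()

    rels, properties = [], {}
    for line in rels_text.strip().splitlines():
        # Collect the property block that follows a variable, e.g. "x3 [ x ... ]".
        for var, props in re.findall(r"(\w+) " + PROPS, line):
            properties[var] = pairs(props.split())
        # Drop the property blocks, then read "NAME ROLE: value ROLE: value ...".
        tokens = re.sub(PROPS, "", line).strip("[] ").split()
        rels.append({"pred": tokens[0], "args": pairs(tokens[1:])})

    hcons = [tuple(hcons_tokens[i:i + 3]) for i in range(0, len(hcons_tokens), 3)]
    return {"top": top, "index": index, "rels": rels,
            "properties": properties, "hcons": hcons}


mrs = parse_mrs(MRS_TEXT)
print(mrs["top"], mrs["index"], [r["pred"] for r in mrs["rels"]])
```

The result keeps the four parts separate: the TOP and INDEX values, the list of predications with their arguments, the properties of each variable, and the HCONS triples.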
There are many syntactically distinct ways of expressing the same underlying MRS document. This section walks through some examples to show how very different sentences can result in the same MRS document. It isn't important to understand the linguistic terms and reasoning outlined below to see that this kind of abstraction can be very helpful in simplifying the logic used to process the meaning of various sentences.
All of the examples (2a)–(2h) below are analyzed as semantically equivalent and all produce the same MRS document. This abstraction is one of the properties of the ERG that makes it well-suited as the interface to parsing and generation of natural language. Downstream processing can be independent of (language-specific) syntax.
(2a) Kim gave Sandy the book.
(2b) Kim gave the book to Sandy.
(2c) Sandy was given the book by Kim.
(2d) The book was given to Sandy by Kim.
(2e) The book, Kim gave Sandy.
(2f) Sandy, Kim gave the book to.
(2g) The book, Sandy was given by Kim.
(2h) To Sandy, the book was given by Kim.
...
[ TOP: h0
INDEX: e2
RELS: <
[ _the_q LBL: h10 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ]
[ _book_n_of LBL: h13 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] ARG1: i14 ]
[ proper_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ named LBL: h7 CARG: "Kim" ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _give_v_1 LBL: h1 ARG0: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]
Linguistics Background: So-called diathesis alternations like the dative shift (2b), passivization ((2c) and (2d)), or focus movement ((2e) and (2f)) can lead to stark differences in syntactic structure but have no observable effect on the MRS document produced.
Many types of normalization like this happen in the ERG. Another example is given below:
(3a) This technique is impossible to apply.
(3b) It is impossible to apply this technique.
[ TOP: h0
INDEX: e2
RELS: <
[ _impossible_a_for LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: h8 ARG2: i9 ]
[ _this_q_dem LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ]
[ _technique_n_1 LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg ] ]
[ _apply_v_2 LBL: h10 ARG0: e11 [ e SF: prop-or-ques TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: i9 ARG2: x3 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h8 qeq h10 > ]
Linguistics Background: In example (3) above, lexical knowledge in the ERG enables the distinction between so-called referential vs. expletive usages of some pronouns, of which only the former will correspond to semantic arguments. While technique in (3a) is the syntactic subject of the predicative copula, the paraphrase invoking so-called (expletive) it extraposition in (3b) demonstrates that technique is not a semantic argument of impossible: Intuitively, there is no lack of possibility attributed to the technique instance. Instead, there is a long-distance dependency with the unexpressed syntactic complement of apply in (3a), which is made explicit by variable x3 in the MRS.
Another frequent variation in syntactic structure that is normalized by the ERG is this:
(4a) The barking dog scared me.
(4b) The dog that was barking scared me.
(4c) The dog barking scared me. [i.e. when the dog is behind the fence]
(4d) The dog I think Kim told to bark scared me.
Examples 4a - 4c:
[ TOP: h0
INDEX: e2
RELS: <
[ pronoun_q LBL: h11 ARG0: x9 [ x PERS: 1 NUM: sg IND: + PT: std ] RSTR: h12 BODY: h13 ]
[ pron LBL: h10 ARG0: x9 [ x PERS: 1 NUM: sg IND: + PT: std ] ]
[ _the_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ _dog_n_1 LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _bark_v_1 LBL: h7 ARG0: e8 [ e SF: prop TENSE: untensed MOOD: indicative PROG: + PERF: - ] ARG1: x3 ]
[ _scare_v_1 LBL: h1 ARG0: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h12 qeq h10 > ]
Example 4d:
[ TOP: h0
INDEX: e2
RELS: <
[ pronoun_q LBL: h10 ARG0: x9 [ x PERS: 1 NUM: sg IND: + PT: std ] RSTR: h11 BODY: h12 ]
[ pron LBL: h8 ARG0: x9 [ x PERS: 1 NUM: sg IND: + PT: std ] ]
[ proper_q LBL: h16 ARG0: x17 [ x PERS: 3 NUM: sg IND: + ] RSTR: h18 BODY: h19 ]
[ named LBL: h20 CARG: "Kim" ARG0: x17 [ x PERS: 3 NUM: sg IND: + ] ]
[ pronoun_q LBL: h29 ARG0: x27 [ x PERS: 1 NUM: sg IND: + PT: std ] RSTR: h30 BODY: h31 ]
[ pron LBL: h28 ARG0: x27 [ x PERS: 1 NUM: sg IND: + PT: std ] ]
[ _the_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ _dog_n_1 LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _think_v_1 LBL: h7 ARG0: e13 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x9 ARG2: h14 ARG3: i15 ]
[ _tell_v_1 LBL: h22 ARG0: e23 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ] ARG1: x17 ARG2: x3 ARG3: h24 ]
[ _bark_v_1 LBL: h25 ARG0: e26 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x3 ]
[ _scare_v_1 LBL: h1 ARG0: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x27 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h8 h14 qeq h22 h18 qeq h20 h24 qeq h25 h30 qeq h28 > ]
Linguistics Background: These examples pertain to what at times are called restrictive modifiers, which can take the form of pre- or post-nominal attributive adjectives or relative clauses (i.e. non-local dependencies).
In the ERS analyses for (4a) through (4d), there will always be an instance of the _bark_v_1 relation, where the dog instance (x3) serves as its first argument. Only the tense properties on its event variable differ (e8 in (4a) through (4c), e26 in (4d)); events are described in TBD.
As described above, words used in the same grammatical way will produce the same predications. For example, in "The bank with the shortest ATM lines is near the river bank", both instances of "bank" will be represented by _bank_n_of in the MRS. The ERG only produces different predications when it is clearly a different word based solely on the sentence grammar. On the other hand, if the same word is used in a grammatically different way, it will produce different predications. Predications are uniquely identified by the combination of their predication name and the number and names of their arguments. The combination represents a unique meaning in the ERG and will be used whenever that meaning is encountered.
The arguments to a given predication use bland names like ARG1 and ARG2 because their "role" (i.e. what they represent) changes based on the predication's identity. This design avoids an explosion of vague names to represent the different roles they play. The examples below illustrate these ideas.
Consider how "look" is represented in the MRS for these examples:
(1a) Kim looked up the answer.
(1b) Kim looked the answer up.
(1c) Kim looked up the chimney.
(1a) "Kim looked up the answer."
(1b) "Kim looked the answer up."
[ TOP: h0
INDEX: e2
RELS: <
[ _the_q LBL: h10 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ]
[ _answer_n_to LBL: h13 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] ARG1: i14 ]
[ proper_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ named LBL: h7 CARG: "Kim" ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _look_v_up LBL: h1 ARG0: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]
(1c) "Kim looked up the chimney."
[ TOP: h0
INDEX: e2
RELS: <
[ _the_q LBL: h11 ARG0: x10 [ x PERS: 3 NUM: sg IND: + ] RSTR: h12 BODY: h13 ]
[ _chimney_n_1 LBL: h14 ARG0: x10 [ x PERS: 3 NUM: sg IND: + ] ]
[ proper_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ named LBL: h7 CARG: "Kim" ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _up_p_dir LBL: h1 ARG0: e9 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: e2 ARG2: x10 ]
[ _look_v_1 LBL: h1 ARG0: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ] ARG1: x3 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h12 qeq h14 > ]
The same MRS is generated for 1a and 1b since their meaning is the same (as described in the previous section). They use the predication _look_v_up ARG0: e2 ARG1: x3 ARG2: x9, a three-argument predication that represents "looking up" in a "locate" sense.
Linguistics Background: This example of English verb–particle construction allows greater flexibility in placement of the up particle, and the different placements result in the same MRS (as shown above).
For 1c, _look_v_1 ARG0: e2 ARG1: x3 is used: a predication with two arguments, a different name, and a corresponding predication to indicate where: _up_p_dir ARG0: e9 ARG1: e2 ARG2: x10. This use of "look", along with the "standard" use of up as a directional preposition, leads to a different MRS representation.
Another example:
(2a) Kim broke the window.
(2b) The window broke.
(2a) "Kim broke the window."
[ TOP: h0
INDEX: e2
RELS: <
[ _the_q LBL: h10 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ]
[ _window_n_1 LBL: h13 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] ]
[ proper_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ named LBL: h7 CARG: "Kim" ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _break_v_cause LBL: h1 ARG0: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]
(2b) "The window broke."
[ TOP: h0
INDEX: e2
RELS: <
[ _the_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ _window_n_1 LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _break_v_1 LBL: h1 ARG0: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ] ARG1: x3 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 > ]
Linguistics Background: This is an example of the so-called "causative–inchoative alternation" and the different usages produce different predications. This can happen for roughly 400 other verbs in the ERG lexicon such as accumulate, age, break, and burn.
In this example, the "role" of ARG1 in _break_v_cause (roughly, ‘agent’) is different from the role of ARG1 of _break_v_1 (roughly, ‘theme’). The "role" of ARG2 in _break_v_cause is roughly 'target'. Note that a "class" of predications such as "locative prepositions" will have the same arguments and roles.
These examples show how different grammatical usage of a word can generate different unique predication identities (predication name and the number and names of their arguments) and how the "role" of arguments such as ARG1, ARG2, etc. changes based on that identity.
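One way to picture a predication identity in code is as a key built from the name and the argument names. The sketch below does that for the two "break" predications. The helper names and the `ROLES` table are hypothetical, and the role labels are only the rough glosses given above, not an official ERG inventory.

```python
def predication_identity(name, args):
    """Identity = predicate name plus the names (and so the number) of arguments."""
    return (name, tuple(sorted(args)))


break_cause = predication_identity(
    "_break_v_cause", {"ARG0": "e2", "ARG1": "x3", "ARG2": "x9"})
break_plain = predication_identity(
    "_break_v_1", {"ARG0": "e2", "ARG1": "x3"})

# Rough role glosses from the text, keyed by identity (illustrative only).
ROLES = {
    break_cause: {"ARG1": "agent", "ARG2": "target"},
    break_plain: {"ARG1": "theme"},
}


def role_of(name, args, arg_name):
    """Look up what an argument means for this particular predication."""
    return ROLES[predication_identity(name, args)].get(arg_name, "unknown")


print(role_of("_break_v_cause", {"ARG0": "e2", "ARG1": "x3", "ARG2": "x9"}, "ARG1"))
print(role_of("_break_v_1", {"ARG0": "e2", "ARG1": "x3"}, "ARG1"))
```

The same argument name, ARG1, resolves to a different role depending on which identity it belongs to, which is why downstream code should key its handling on the full identity rather than on the argument name alone.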
TBD
TBD
TBD