Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
397 lines (211 sloc) 12.1 KB
word_grammar - definition of word level grammar
This file contains the definition of the grammar at word
level. It declares the morphemes of the language and
defines the composition of words in terms of those mor-
phemes. It further declares the word functions and their
values and formulates the rules by which the word functions
are calculated from the morphemes.
It consists of four sections, word, forms, functions and
The word section defines how a word is constructed from its
morphemes. It consists of two parts. The first, introduced
by the keyword inflection, is dedicated to the inflectional
morphology of the language and describes the word as a lex-
eme surrounded by inflectional morphemes. The second part,
introduced by the keyword derivation, describes the deriva-
tional morphology, or how the lexeme is composed of roots
and derivational morphemes.
For each type of morpheme that can occur in a word, a mor-
pheme type declaration is given, which consists of an iden-
tifier, a pair of markers and a descriptive string. The
identifier is used to refer to the morpheme in the other
sections of the file, the markers define the strings used to
encode this type of morpheme in the morphologically analysed
text at(5). The descriptive string holds the common name of
this type of morpheme.
Both in the inflectional and in the derivational part, the
morpheme types are grouped in five classes that identify
what kind of satellite this morpheme type is with respect to
the core element of lexeme or root. Those five classes are:
prefix, core, infix, suffix and pattern. The classes `pre-
fix', `core', `infix' and `suffix' represent the concatena-
tive morphemes, as opposed to the class `pattern', which
represents the non-concatenative morphemes.
prefixes Morphemes that always occur before a lexeme or
root. It is allowed to declare only one marker
instead of a pair. This would mean that this pre-
fix uses a closing marker only, which marks the
end of the prefix.
cores The class of lexemes (inflection) and roots
(derivation). No markers are allowed here, so the
pair of markers should be declared as the empty
Werkgroep Informatica Last change: 01/08/31 1
infixes Morphemes that invariably occur inside a lexeme or
root. They are always encoded by putting them in
between markers and therefore require both an
opening and a closing marker.
suffixes Morphemes that always occur after a lexeme or
root. It is allowed to declare only one marker
instead of a pair. This would mean that this suf-
fix uses an opening marker only, which marks the
beginning of the suffix.
patterns Morphemes that are non-concatenative and share
their realisation with a (lexeme or) root. They
are recorded after the last suffix of the lexeme
or root to which they belong. As with the suf-
fixes, the declaration of a closing marker is
The forms section defines the individual morphemes for each
morpheme type declared in the previous section. Every mor-
pheme type declared in the word section has to have an entry
in the forms section. Each entry consists of the identifier
of the morpheme type followed by an enumeration of the mor-
phemes known to this type. This enumeration can either be
done explicitly, or in the form of the declaration of an
identifier that serves as a handle to locate a database that
contains the morphemes. With the explicit enumeration,
every morpheme is denoted by a string representing the
`paradigmatic form' of the morpheme, which serves as a
mnemonic for this morpheme throughout the file. This
mnemonic is not to be confused with one of the grapheme
strings by which the morpheme can be realised.
The mnemonic itself consists of a grapheme string of letters
and diacritics, followed by zero or more repetitions of the
homograph marker. The homograph marker is a character
string that is used to distinguish morphemes that share the
same `paradigmatic form'. This marker is one of the
metasymbols defined by the meta keyword at the start of the
forms section.
The metasymbols defined by the grammar are the addition
marker, the elision marker, the root separator, and the
homograph marker. The definition consists of the keyword
meta followed by the four strings that constitute the addi-
tion and elision marker, the root separator, and the homo-
graph marker. The meaning of the strings is determined by
their position. The addition marker comes first, the eli-
sion marker comes second, the root separator comes third,
and the homograph marker comes fourth. The choice of mor-
pheme markers should not coincide with the the definition of
Werkgroep Informatica Last change: 01/08/31 2
the metasymbols, so none of the metasymbols may occur as a
morpheme marker. In parsing the encoded word, morpheme
markers have precedence over metasymbols, which have gra-
pheme representations. Ambiguities that might otherwise
arise are resolved in accordance with this rule.
The special nature of the non-concatenative morphemes of
type `pattern' implies that their values are not listed as
strings but as identifiers.
The functions section declares the word functions and their
possible values. A function is declared by stating its
mnemonic (for further reference) and its common name, fol-
lowed by a comma separated list of allowed values. The
value declaration also consists of the definition of a
mnemonic and a full name.
The rules section describes the rules by which the word
functions are derived from the morphemes. Like the word
section, the rules section consists of two parts, one stat-
ing the rules for inflection and one stating the rules for
A rule is a pair of a morphological condition and an action.
The condition is phrased as a Boolean expression yielding
true or false indicating whether the condition is met or
not. If the condition is met, the listed actions are under-
taken. An action is usually the assignment of a value to a
word function, but can also involve accepting or rejecting a
form, or jumping to a rule further down. The special opera-
tors `-' and `+' are used to exclude a function from a word,
or to add a function with an undetermined value. The rules
are normally executed in order of appearance, but this can
be interrupted by an accept, reject or goto.
After an accept no further actions are performed within the
inflectional or derivational part of the rule section in
which it occurred. A reject immediately ceases further
Rules can be combined into a block of rules governed by a
shared condition. Only when a word matches the shared con-
dition of a block, the block is entered. Then the shared
actions, if present, are executed and subsequently every
rule within the block is tried.
grammar = word forms functions rules.
word = `word' word_inflection word_derivation.
Werkgroep Informatica Last change: 01/08/31 3
forms = `forms' metasymbols { forms_declaration }.
functions = `functions' { function_declaration }.
rules = `rules' rules_inflection rules_derivation.
word_inflection = `inflection' prefix core infix suffix pattern.
word_derivation = `derivation' prefix core infix suffix pattern.
metasymbols = `meta' string string string string.
forms_declaration = identifier forms_enumeration.
= wf_declaration `=' fv_declaration { `,' fv_declaration }.
rules_inflection = <empty> | `inflection' { rule }.
rules_derivation = <empty> | `derivation' { rule }.
prefix = <empty> | `prefix' `=' { mt_declaration }.
core = `core' `=' mt_declaration.
infix = <empty> | `infix' `=' { mt_declaration }.
suffix = <empty> | `suffix' `=' { mt_declaration }.
pattern = <empty> | `pattern' `=' { mt_declaration }.
forms_enumeration = explicit_enum | implicit_enum.
explicit_enum = `=' mv_declaration { `,' mv_declaration }.
implicit_enum = `<' identifier.
wf_declaration = identifier `:' string.
fv_declaration = identifier `:' string.
rule = label rule_definition.
mt_declaration = identifier `:' marker_set string.
label = <empty> | `label' identifier.
rule_definition = simple_rule | block.
marker_set = `{' `}' | `{' string `}' | `{' string `,' string `}'.
simple_rule = expression `::' action { `,' action }.
Werkgroep Informatica Last change: 01/08/31 4
block = shared_rule rule { rule } `end'.
shared_rule = `shared' `{' expression shared_actions `}'.
expression = term { `||' term }.
shared_actions = <empty> | `::' action { `,' action }.
action = attribution | shift.
attribution = assignment | exclusion | inclusion.
shift = jump | `accept' | `reject'.
exclusion = `-' identifier.
inclusion = `+' identifier.
term = factor { `&&' factor }.
assignment = identifier `=' identifier.
jump = `goto' identifier.
factor = simple_factor | negated_factor | grouped_factor.
simple_factor = comparison | existence.
negated_factor = `not' factor.
grouped_factor = `(' expression `)'.
comparison = identifier relational_operator value.
existence = `exist' `(' identifier `)'.
relational_operator = `==' | `!='.
value = single_value | value_set.
value_set = `{' single_value { `,' single_value } `}'.
mv_declaration = string | identifier.
single_value = string | identifier.
identifier = <alpha> { <alpha> | <digit> }.
string = `"' { <printable> } `"'.
accept, core, derivation, end, exist, forms, functions,
Werkgroep Informatica Last change: 01/08/31 5
goto, infix, inflection, label, meta, not, pattern, prefix,
reject, rules, shared, suffix, word.
Library file name of the word grammar a language direc-
alphabet(5), at(5), at2ps(1), isalpha(3C), isdigit(3C),
isprint(3C), lexicon(3), vmex(3), wrdgrm(3).
Werkgroep Informatica Last change: 01/08/31 6