---

# 1. Language and Syntax
**[Emil Sekerinski](http://www.cas.mcmaster.ca/~emil/), McMaster University, January 2026**

---

### Language and Grammar

Every language is based on a _vocabulary_. Its elements are called _symbols_, whose internal structure is of no further interest. The _syntax_ determines which sequences of symbols, called *sentences*, belong to the language.

| language                | symbols                                             | sentences                          |
|:------------------------|:----------------------------------------------------|:-----------------------------------|
| English                 | `eats`, `runs`, `Dave`, `Kevin`, `a`, `banana`, ... | `Dave runs`, `Kevin eats a banana` |
| Roman numerals          | `I`, `V`, `X`, `L`, `C`, `D`, `M`                   | `IV`, `LXI`                        |
| identifiers in programs | `A`, `B`, ..., `a`, `b`, ..., `0`, `1`, ..., `_`    | `result`, `x14`, `PI`              |
| arithmetic expressions  | `dist`, `rot`, `24`, `+`, `–`, `×`, `/`, ...        | `rot × 24`, `dist + rot × 24`      |

**Question:** What are other non-spoken languages?

_Answer:_
- Chemical formulae, e.g. <code>H<sub>3</sub>O<sup>+</sup></code> for hydronium.
- Musical scores, with vocabulary 🎼, ♭, ♮, ♯, ♩, ♪, ♫, ♬, etc.
- Morse code, with vocabulary "●" (short), "━━━" (long), " " (pause).

<div style="float:right;background-color:lightgrey;border-left:20px solid white">

**Example:** if `V = {a, b}`, then  
`V⁺ = {a, b, a a, a b, b a, b b, a a a, … }`  
`V* = {ε, a, b, a a, a b, b a, b b, a a a, … }`  
The sentences of the language  
`L = {σ a τ | σ, τ ∈ V*}`  
are those sequences that contain at least one `a`.
</div>

Formally, a vocabulary `V` is a finite, non-empty set of (atomic) symbols. The set `V*` of all _finite sequences_ or _strings_ over `V` consists of

- the empty string `ε`,
- any symbol `x ∈ V`,
- the _concatenation_ `στ` of strings `σ, τ ∈ V*`.

The empty string is both the left and right identity of concatenation. Concatenation is associative, meaning that parentheses can be left out. Formally, for any `σ, τ, ω ∈ V*`:

- `σ ε = σ = ε σ`
- `(σ τ) ω = σ (τ ω)`

The set of all _non-empty strings_ over `V` is denoted by `V⁺`, formally `V⁺ = V* – {ε}`. The _length_ of string `σ` is written as `|σ|`:

- `|ε| = 0`,
- `|x| = 1` for any `x ∈ V`,
- `|σ τ| = |σ| + |τ|` for any `σ, τ ∈ V*`.
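
The laws for concatenation and length can be checked on small cases. The following is a minimal sketch, assuming that strings over `V` are modelled as Python tuples of symbols; the names `EPS` and `concat` are illustrative and not part of these notes.

```python
# Strings over a vocabulary V, modelled as tuples of symbols (an assumption
# of this sketch); concatenation is then tuple concatenation.
EPS = ()  # the empty string ε

def concat(s: tuple, t: tuple) -> tuple:
    """Concatenation σ τ of two strings."""
    return s + t

s, t, w = ('a', 'b'), ('b',), ('a', 'a')

# ε is the left and right identity of concatenation
assert concat(s, EPS) == s == concat(EPS, s)
# concatenation is associative
assert concat(concat(s, t), w) == concat(s, concat(t, w))
# length laws: |ε| = 0, |x| = 1, |σ τ| = |σ| + |τ|
assert len(EPS) == 0 and len(('a',)) == 1
assert len(concat(s, t)) == len(s) + len(t)
```

Such checks only test individual instances; they do not replace the formal definitions.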

<img style="width:16em;height:auto;float:right;border-left:10px solid white" src="./img/NLexample.svg">

A *grammar* not only determines unambiguously which strings are sentences and which are not, but also provides sentences with a *structure*. The structure is instrumental in recognizing the *meaning* of a sentence, which is our ultimate goal.

The theory of formal languages originates in linguistics. A basic rule of English is that sentences (`S`) consist of a noun phrase (`NP`) followed by a verb phrase (`VP`). A noun phrase is either a proper name (`PN`) or a determiner (`D`) followed by a noun (`N`). A verb phrase is either a verb (`V`) or a verb followed by a noun phrase. Determiners are `a` and `the`. The hierarchical composition of an English sentence by a *parse tree* is given to the right; below are the corresponding rules. Grammars of this form are called *generative*, and the rules are called *productions*, as they determine how all sentences of a language are generated.

<div style="float:right;background-color:lightgrey;margin-left:18pt">

`S → NP VP`  
`NP → PN`  
`NP → D N`  
`VP → V`  
`VP → V NP`  
`PN → Kevin`  
`PN → Dave`  
`D → a`  
`D → the`  
`N → banana`  
`N → apple`  
`V → eats`  
`V → runs`

</div>

Formally, grammar `G = (T, N, P, S)` is specified by

- a finite set `T` of *terminal symbols*,
- a finite set `N` of *nonterminal symbols*,
- a finite set `P` of *productions*,
- a symbol `S ∈ N`, the *start symbol*

where `N ∩ T = {}` and `V = T ∪ N` is its *vocabulary*. Productions are pairs of strings `σ ∈ V⁺`, `τ ∈ V*`, written `σ → τ`.

**Example.** `G₀ = (T, N, P, S)` with `T = {Kevin, Dave, a, the, banana, apple, eats, runs}`, `N = {S, NP, VP, PN, D, N, V}`, and the productions to the right is a grammar.

<div style="float:right;background-color:lightgrey;margin-left:18pt">

    `S`  
`⇒ NP VP`  
`⇒ PN VP`  
`⇒ Kevin VP`  
`⇒ Kevin V NP`  
`⇒ Kevin eats NP`  
`⇒ Kevin eats D N`  
`⇒ Kevin eats a N`  
`⇒ Kevin eats a banana`
</div>

Given grammar `G = (T, N, P, S)`, sequence `χ ∈ V*` is _directly derivable_ from `π ∈ V⁺`, written `π ⇒ χ`, if there exist sequences `σ`, `τ`, `μ`, `ν` such that `π = μ σ ν`, `χ = μ τ ν`, and `σ → τ ∈ P`.

If `χ` is _derivable in `n` steps_ from `π`, this is written as `π ⇒ⁿ χ`. Formally, the relation `⇒ⁿ` is defined for `n ≥ 0` by:
- `π ⇒⁰ π`
- `π ⇒ⁿ⁺¹ χ` if `π ⇒ ρ` and `ρ ⇒ⁿ χ` for some `ρ`

We write

- `π ⇒* χ` if `χ` is _derivable in zero or more steps_ from `π`,
- `π ⇒⁺ χ` if `χ` is _derivable in one or more steps_ from `π`.

This implies that `⇒*` is the transitive and reflexive closure of relation `⇒` and `⇒⁺` is the transitive closure of `⇒`.

The derivation to the right allows us to conclude that `S ⇒⁺ Kevin eats a banana` with grammar `G₀`. More precisely, we can state `S ⇒⁸ Kevin eats a banana`.
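
The relations `⇒` and `⇒ⁿ` can be made concrete with a small sketch. It assumes that sequences are tuples of symbols and productions are pairs of such tuples; the names `P0`, `step`, and `steps` are hypothetical helpers introduced only for this illustration.

```python
# Productions of G₀ as pairs (left-hand side, right-hand side) of symbol tuples.
P0 = {
    (('S',), ('NP', 'VP')), (('NP',), ('PN',)), (('NP',), ('D', 'N')),
    (('VP',), ('V',)), (('VP',), ('V', 'NP')),
    (('PN',), ('Kevin',)), (('PN',), ('Dave',)),
    (('D',), ('a',)), (('D',), ('the',)),
    (('N',), ('banana',)), (('N',), ('apple',)),
    (('V',), ('eats',)), (('V',), ('runs',)),
}

def step(pi: tuple, P: set) -> set:
    """All χ with π ⇒ χ: replace one occurrence of σ in π by τ, for σ → τ in P."""
    result = set()
    for sigma, tau in P:
        for i in range(len(pi) - len(sigma) + 1):
            if pi[i:i + len(sigma)] == sigma:
                result.add(pi[:i] + tau + pi[i + len(sigma):])
    return result

def steps(pi: tuple, P: set, n: int) -> set:
    """All χ with π ⇒ⁿ χ, obtained by applying ⇒ exactly n times."""
    forms = {pi}
    for _ in range(n):
        forms = {chi for rho in forms for chi in step(rho, P)}
    return forms

sentence = ('Kevin', 'eats', 'a', 'banana')
assert sentence in steps(('S',), P0, 8)      # S ⇒⁸ Kevin eats a banana
assert sentence not in steps(('S',), P0, 7)  # but not in 7 steps
```

Working on tuples of symbols rather than on character strings avoids accidentally matching a symbol inside a longer symbol name, such as `N` inside `NP`.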

The _language_ `L(G)` generated by grammar `G = (T, N, P, S)` is the set of all sequences of terminal symbols which can be derived from the start symbol:

    L(G) = {χ ∈ T* | S ⇒⁺ χ}

Grammars `G` and `G'` are _equivalent_ if they generate the same language, `L(G) = L(G')`.

**Example.** Given `G₁ = (T, N, P, S)`, where `T = {a, b, c, d}`, `N = {S, X}`, `P = {S → a X, S → b X, X → c, X → d}`, the sequence `a c` is derivable from `S`, formally `S ⇒⁺ a c`,

    S ⇒ a X ⇒ a c

as are `a d`, `b c`, `b d`. The language generated by `G₁` is:

    L(G₁) = {a c, a d, b c, b d}

**Question.** What are other equivalent grammars? 

_Answer._
- `G₁' = (T, N', P', S)`, where `N' = {S, X, Y}`, `P' = {S → X Y, X → a, X → b, Y → c, Y → d}`, is equivalent to `G₁`.
- Renaming the nonterminals also gives an equivalent grammar. In that sense, nonterminals "carry no meaning".
- Adding nonterminal `X₁` to `G₁'` and replacing `X → a` with `X → X₁`, `X₁ → a` also gives an equivalent grammar. Repeating this, infinitely many equivalent grammars can be obtained.
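
For grammars with finitely many derivable sequences, equivalence can be checked by enumerating both languages. The sketch below does this for the grammar `G₁` and the first alternative from the answer; the function `language` and the names `P1`, `P1x` are assumptions of this illustration, and the search would not terminate for an infinite language.

```python
def language(P: set, S: str, T: set) -> set:
    """Terminal strings derivable from S by exhaustive search.

    Terminates only if finitely many sequences are derivable from S.
    """
    seen, todo, sentences = set(), [(S,)], set()
    while todo:
        pi = todo.pop()
        if pi in seen:
            continue
        seen.add(pi)
        if all(x in T for x in pi):
            sentences.add(' '.join(pi))
        for sigma, tau in P:  # apply σ → τ at every position of π
            for i in range(len(pi) - len(sigma) + 1):
                if pi[i:i + len(sigma)] == sigma:
                    todo.append(pi[:i] + tau + pi[i + len(sigma):])
    return sentences

T = {'a', 'b', 'c', 'd'}
P1 = {(('S',), ('a', 'X')), (('S',), ('b', 'X')), (('X',), ('c',)), (('X',), ('d',))}
P1x = {(('S',), ('X', 'Y')), (('X',), ('a',)), (('X',), ('b',)),
       (('Y',), ('c',)), (('Y',), ('d',))}

assert language(P1, 'S', T) == language(P1x, 'S', T) == {'a c', 'a d', 'b c', 'b d'}
```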

Languages generated by a grammar can be _finite_ or _infinite_. Infinite languages are expressed through recursion with a finite set of productions.

**Example.** Let `G₂ = (T, N, P, S)`, where `T = {a}`, `N = {S}` and let the productions `P` be:

    S → ε
    S → a S

For a string `σ`, the term `σⁿ` stands for `σ` repeated `n` times, formally `σ⁰ = ε` and `σⁿ⁺¹ = σ σⁿ`. For example, `{aⁿ | n ≥ 0}` is `{ε, a, a a, a a a, a a a a, …}`.

**Theorem.** The language of `G₂` is that of sequences over `a` of arbitrary length:

    L(G₂) = {aⁿ | n ≥ 0}

*Proof.* This is formally proved by inclusion in both directions. By definition of `L(G₂)`,

    {χ ∈ T* | S ⇒⁺ χ} ⊆ {aⁿ | n ≥ 0}

means that for every `χ ∈ T*` derivable from `S`, there exists an `n ≥ 0` such that `χ = aⁿ`. This is shown by induction over the length of derivations from `S`.

- _Base._ A derivation of `χ ∈ T*` of length `1` from `S` can only derive `χ = ε` by the first production. As `ε = a⁰`, the base case holds.
- _Step._ We need to show that each `χ ∈ T*` derivable from `S` in `n + 1` steps, `S ⇒ⁿ⁺¹ χ`, is `aⁱ` for some `i ≥ 0`, under the hypothesis that each `χ ∈ T*` derivable from `S` in `n` steps, `S ⇒ⁿ χ`, is `aⁱ` for some `i ≥ 0`. If `χ` is derivable in `n + 1` steps with `n ≥ 1`, the first step must use the second production, as nothing can be derived from `ε`. Hence `S ⇒ a S ⇒ⁿ χ` and `χ` is `a ω` for some `ω`. Since `ω` is derived from `S` in `n` steps, `ω` is `aⁱ` for some `i ≥ 0`, hence `χ = a ω` is `aⁱ⁺¹`.

The inclusion in the other direction means that every `aⁿ` for `n ≥ 0` can be derived from `S`:

    {aⁿ | n ≥ 0} ⊆ {χ ∈ T* | S ⇒⁺ χ}

This is shown by induction over `n`.

- _Base._ For `n = 0`, the string `a⁰ = ε` can be generated by the first production, `S ⇒⁺ ε`.
- _Step._ Suppose `aⁿ` can be generated, `S ⇒⁺ aⁿ`. We need to show that `aⁿ⁺¹` can also be generated. This follows from `S ⇒ a S ⇒⁺ a aⁿ = aⁿ⁺¹`.

Thus we can conclude `L(G₂) = {aⁿ | n ≥ 0}`.

Recursion also allows the expression of arbitrarily deep *nested structures*.

**Example.** Let `G₃ = (T, N, P, S)`, where `T = {a, b, c}`, `N = {S}`, and the productions `P` are:

    S → b
    S → a S c

The sequence `aabcc` is derivable from `S`:

    S ⇒ a S c ⇒ a a S c c ⇒ a a b c c

The generated language is:

    L(G₃) = {b, a b c, a a b c c, a a a b c c c, …} = {aⁿ b cⁿ | n ≥ 0}
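
The nesting expressed by the recursive production can be mirrored by a recursive function. The following is a minimal sketch of a membership test for `{aⁿ b cⁿ | n ≥ 0}`, assuming the input is a list of symbols; the function name `in_L_G3` is hypothetical.

```python
def in_L_G3(s: list[str]) -> bool:
    """Decide whether s is in {aⁿ b cⁿ | n ≥ 0}, mirroring the two productions."""
    if s == ['b']:                                    # S → b
        return True
    if len(s) >= 3 and s[0] == 'a' and s[-1] == 'c':  # S → a S c
        return in_L_G3(s[1:-1])
    return False

assert in_L_G3('a a b c c'.split())
assert not in_L_G3('a a b c'.split())   # one c is missing
assert not in_L_G3([])                  # ε is not in the language
```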

Since a generated language can be infinite, a procedure returning the generated language cannot be expressed. However, a *generator* (asymmetric coroutine) can be used. The `yield` statement returns a value and, when the procedure is called again, continues the computation where it was suspended:
    
    generator G.L(): G.T* 
        dd, d := ∅, {G.S}
        while d ≠ ∅ do
            dd, d := dd ∪ d, ∅
            for σ → τ ∈ G.P do
                for μ σ ν ∈ dd do
                    χ := μ τ ν
                    if χ ∉ dd and χ ∉ d then
                        if χ ∈ G.T* then yield χ
                        d := d ∪ {χ}

The set `dd` contains the sequences derived from `G.S` so far, and the set `d` contains the sequences newly derived from `dd`, which are added to `dd` in the next iteration. If no new sequences can be derived from `dd`, the generator stops. The two `for`-loops consider all applications of the productions of `G.P` to all elements of `dd`. Each sequence `χ` obtained by applying a production is added to `d`, provided it was not derived before; it is additionally yielded if it consists only of terminals, i.e. is in `G.T*`.

### Grammars in Python

The Python class below represents the components of a grammar by fields `T`, `N`, `P`, and `S`. Terminals and nonterminals are strings without spaces. Productions are pairs of strings with terminals and nonterminals separated by spaces. Method `L()` computes the generated language. As that can be infinite, it cannot be returned as a set. Python allows the use of generators for this purpose. Starting from the start symbol, repeatedly, all productions are applied to the set of already derived sequences. If a derived sequence was already derived, it is ignored. If a derived sequence consists only of terminals, it is returned with a `yield` statement, and the computation continues. The method `s.find(t, i)` returns the index of the first occurrence of string `t` in string `s` starting at index `i`, or `-1` if no such occurrence exists. The method `s.split()` returns a list of substrings separated by space; it is used to determine if a derived sequence consists only of terminals. For each production and derived sequence, the innermost loop finds all positions at which the production can be applied:

In [79]:
from collections.abc import Iterator

class Grammar:
    def __init__(self, T: set[str], N: set[str], P: set[tuple[str, str]], S: str):
        self.T, self.N, self.P, self.S = T, N, P, S
    def L(self, log = False, stats = False) -> Iterator[str]:
        dd, d = set(), {self.S} # dd: derived so far, d: newly derived
        if log: print('    ', self.S)
        while d != set():
            if stats: print('# added derivations:', len(d))
            if log: print()
            dd.update(d); d = set()
            for π in sorted(dd, key = len):
                for σ, τ in self.P: # production σ → τ
                    i = π.find(σ, 0)
                    while i != - 1: # π == π[0:i] + σ + π[i + len(σ):]
                        χ = π[0:i] + τ + π[i + len(σ):]; χ = χ.replace('  ', ' ')
                        if (χ not in dd) and (χ not in d):
                            if all(a in self.T for a in χ.split()): yield χ.strip()
                            if log: print('    ', π, '⇒', χ)
                            d.add(χ)
                        i = π.find(σ, i + 1)

The method `s.strip()` removes leading and trailing spaces. The method `s.replace('  ', ' ')` is used to eliminate duplicate spaces; these occur if, say, in `X Y Z`, the production `Y → ε`, represented by `('Y', '')`, is applied.

Grammar `G₁` is represented by:

In [80]:
G1 = Grammar({'a', 'b', 'c', 'd'}, {'S', 'X'},
             {('S', 'a X'), ('S', 'b X'), ('X', 'c'), ('X', 'd')}, 'S')

The call `G1.L()` returns a generator object, `g`; subsequent calls to `next(g)` yield a value. The order of the four values does not always have to be the same:

In [81]:
g = G1.L(); next(g), next(g), next(g), next(g)

('b c', 'b d', 'a c', 'a d')

A further call `next(g)` would raise a `StopIteration` exception. The `for` loop and comprehensions with `for` iterate exhaustively:

In [82]:
assert {d for d in G1.L()} == {'a c', 'a d', 'b c', 'b d'}

Setting the `log` parameter of method `L` to `True` prints, for each iteration of the outer `while`-loop, the newly derived sequences:

In [83]:
assert set(G1.L(True)) == {'a c', 'a d', 'b c', 'b d'}

     S

     S ⇒ a X
     S ⇒ b X

     b X ⇒ b c
     b X ⇒ b d
     a X ⇒ a c
     a X ⇒ a d



Grammar `G₂` is represented by:

In [84]:
G2 = Grammar({'a'}, {'S'}, {('S', ''), ('S', 'a S')}, 'S')

As the language is infinite, `next(g)` can be called arbitrarily often:

In [85]:
g = G2.L(); next(g), next(g), next(g), next(g), next(g)

('', 'a', 'a a', 'a a a', 'a a a a')

Iterating over `range(n)` can be used to construct a set with `n` elements:

In [86]:
g = G2.L(); {next(g) for _ in range(9)}

{'',
 'a',
 'a a',
 'a a a',
 'a a a a',
 'a a a a a',
 'a a a a a a',
 'a a a a a a a',
 'a a a a a a a a'}

Grammar `G₃` is represented by:

In [87]:
G3 = Grammar({'a', 'b', 'c'}, {'S'}, {('S', 'b'), ('S', 'a S c')}, 'S')

Python prints the elements of a set in arbitrary order, not necessarily in the order in which they were inserted; if that is desired, a list can be used instead:

In [88]:
g = G3.L(); [next(g) for _ in range(5)]

['b', 'a b c', 'a a b c c', 'a a a b c c c', 'a a a a b c c c c']

Alternatively, the `L()` generator can be combined with `range()` using `zip()` to limit the number of sentences in a `for`-loop or `for`-comprehension, eliminating the need for naming the `L()` generator:

In [89]:
{d for d, _ in zip(G3.L(), range(5))}

{'a a a a b c c c c', 'a a a b c c c', 'a a b c c', 'a b c', 'b'}
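
Another option from the standard library is `itertools.islice`, which takes a bounded prefix of any iterator. The sketch below is self-contained and therefore uses a stand-in generator `nested` (a hypothetical name) that yields the same sentences as `G3.L()`; with the class above, `islice(G3.L(), 5)` could presumably be used in the same way.

```python
from itertools import islice
from collections.abc import Iterator

def nested() -> Iterator[str]:
    """Stand-in for G3.L(): yields b, a b c, a a b c c, … without end."""
    n = 0
    while True:
        yield ' '.join(['a'] * n + ['b'] + ['c'] * n)
        n += 1

# islice stops after three values although the generator is infinite
first = list(islice(nested(), 3))
assert first == ['b', 'a b c', 'a a b c c']
```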

### Chomsky Hierarchy

Languages can be classified according to restrictions on their grammar. The following classification is known as the _Chomsky Hierarchy_ [(Chomsky 1956)](#Chomsky56). For grammar `G = (T, N, P, S)`, let `V = T ∪ N` be its vocabulary, and assume `a ∈ T`, `A, B ∈ N`, `μ, ν, τ ∈ V*`, `σ ∈ V⁺`:

- A grammar is _monotonic_ (or _context-sensitive_) if productions are of the form

    `σ → τ` and `|σ| ≤ |τ|`

    Additionally, `S → ε` is allowed, provided that `S` does not occur on the right-hand side of another production.


- A grammar is _context-free_ if productions are of the form

    `A → τ`

- A grammar is _regular_ if productions are of the form

    `A → ε`  
    `A → a`  
    `A → a B`

Chomsky defines a grammar to be _context-sensitive_ if productions are of the form

    μ A ν → μ σ ν

Although monotonic grammars are less restrictive than Chomsky's context-sensitive grammars, both generate the same class of languages. For our purposes, monotonic grammars are more convenient; we continue to call them context-sensitive.

**Question.** Which of the grammars `G₀`, `G₁`, `G₂`, `G₃` are regular or context-free?

_Answer._
- `G₀` is not regular, but is context-free
- `G₁` is regular (and therefore context-free)
- `G₂` is regular (and therefore context-free)
- `G₃` is not regular, but is context-free
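
The restrictions on the form of productions can be checked mechanically. The following sketch assumes productions are pairs of symbol tuples with non-empty left-hand sides; the function `classify` and its return strings are hypothetical names chosen for this illustration. It returns the most restrictive class whose production forms are satisfied by all productions.

```python
def classify(P: set, T: set, N: set, S: str) -> str:
    """Classify a grammar by the form of its productions (σ, τ)."""
    def regular(sigma, tau):        # A → ε, A → a, A → a B
        return (len(sigma) == 1 and sigma[0] in N and
                (tau == () or
                 (len(tau) == 1 and tau[0] in T) or
                 (len(tau) == 2 and tau[0] in T and tau[1] in N)))
    def context_free(sigma, tau):   # A → τ
        return len(sigma) == 1 and sigma[0] in N
    s_on_rhs = any(S in tau for _, tau in P)
    def monotonic(sigma, tau):      # |σ| ≤ |τ|, or S → ε if S is on no right-hand side
        return len(sigma) <= len(tau) or (sigma == (S,) and tau == () and not s_on_rhs)
    for name, test in (('regular', regular), ('context-free', context_free),
                       ('context-sensitive', monotonic)):
        if all(test(sigma, tau) for sigma, tau in P):
            return name
    return 'unrestricted'

P1 = {(('S',), ('a', 'X')), (('S',), ('b', 'X')), (('X',), ('c',)), (('X',), ('d',))}
P2 = {(('S',), ()), (('S',), ('a', 'S'))}
P3 = {(('S',), ('b',)), (('S',), ('a', 'S', 'c'))}

assert classify(P1, {'a', 'b', 'c', 'd'}, {'S', 'X'}, 'S') == 'regular'
assert classify(P2, {'a'}, {'S'}, 'S') == 'regular'
assert classify(P3, {'a', 'b', 'c'}, {'S'}, 'S') == 'context-free'
```

Note that the result describes the grammar, not the language: a language may have a regular grammar even if a given grammar for it is only context-free.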

<div style="float:right;background-color:lightgrey;margin-left:2em;margin-top:1em">

`S → NP VP`  
`NP → D Nₛ`  
`NP → D Nₚ`  
`Nₛ VP → Nₛ Vₛ`  
`Nₚ VP → Nₚ Vₚ`  
`D → the`  
`Nₛ → child`  
`Nₚ → children`  
`Vₛ → runs`  
`Vₚ → run`

</div>

Context-sensitive grammars allow the expression of subject-verb agreement with respect to singular vs. plural in natural languages.

**Example.** Consider the grammar `G₄` to the right with terminals in lower case and nonterminals starting with an upper case letter. Then

  `the child runs`  
  `the children run`  

are sentences, but `the child run` is not.  


**Question.** What is a derivation of `the child runs`?

*Answer.*

       S
    ⇒ NP VP
    ⇒ D Nₛ VP
    ⇒ D Nₛ Vₛ
    ⇒ the Nₛ Vₛ
    ⇒ the child Vₛ
    ⇒ the child runs

Here are some fundamental results from formal language theory. Regular grammars can express repetition, but not nesting:

**Theorem.** No regular grammar for `L(G₃)` exists.

**Example.** Let `G₅ = (T, N, P, S)`, where `T = {a, b, c}`, `N = {S, B}`, and let the productions `P` be:

    S → a b c
    S → a B S c
    B a → a B
    B b → b b

This grammar is not context-free but is context-sensitive. The generated language is:

    L(G₅) = {a b c, a a b b c c, a a a b b b c c c, …} = {aⁿ bⁿ cⁿ | n ≥ 1}

**Question.** What is a derivation of `a a a b b b c c c` in `G₅`? Explain how the grammar works!

_Answer._

      S
    ⇒ a B S c
    ⇒ a B a B S c c
    ⇒ a B a B a b c c c
    ⇒ a B a a B b c c c
    ⇒ a a B a B b c c c
    ⇒ a a a B B b c c c
    ⇒ a a a B b b c c c
    ⇒ a a a b b b c c c

The grammar works by first producing the same number of `a`, `B`, `c`, with all `c` in the correct position at the end but `a` and `B` alternating. The production `B a → a B` moves all `a` to the left and all `B` to the middle. Once a `B` is in its correct position, it is converted to a `b`.
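
A derivation like the one above can be checked step by step against the definition of direct derivability. The sketch below assumes sequences are tuples of symbols; `is_step` and `P5` are hypothetical names for this illustration.

```python
def is_step(pi: tuple, chi: tuple, P: set) -> bool:
    """True if π ⇒ χ, i.e. π = μ σ ν and χ = μ τ ν for some σ → τ in P."""
    return any(pi[i:i + len(sigma)] == sigma and
               pi[:i] + tau + pi[i + len(sigma):] == chi
               for sigma, tau in P
               for i in range(len(pi) - len(sigma) + 1))

# Productions of G₅ as pairs of symbol tuples
P5 = {(('S',), ('a', 'b', 'c')), (('S',), ('a', 'B', 'S', 'c')),
      (('B', 'a'), ('a', 'B')), (('B', 'b'), ('b', 'b'))}

forms = [tuple(f.split()) for f in (
    'S', 'a B S c', 'a B a B S c c', 'a B a B a b c c c', 'a B a a B b c c c',
    'a a B a B b c c c', 'a a a B B b c c c', 'a a a B b b c c c',
    'a a a b b b c c c')]

# every consecutive pair of the derivation is a direct derivation
assert all(is_step(p, c, P5) for p, c in zip(forms, forms[1:]))
```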

In [90]:
G5 = Grammar({'a', 'b', 'c'}, {'S', 'B'},
             {('S', 'a b c'), ('S', 'a B S c'), ('B a', 'a B'), ('B b', 'b b')}, 'S')

The log of derived sequences would be too long; instead, by setting the `stats` parameter to `True`, only the number of newly added derivations for each iteration of the outer `while`-loop is printed:

In [91]:
{d for d, _ in zip(G5.L(stats = True), range(4))}

# added derivations: 1
# added derivations: 2
# added derivations: 2
# added derivations: 4
# added derivations: 7
# added derivations: 11
# added derivations: 20
# added derivations: 34
# added derivations: 59
# added derivations: 102
# added derivations: 177
# added derivations: 307
# added derivations: 532
# added derivations: 924
# added derivations: 1602
# added derivations: 2781
# added derivations: 4826
# added derivations: 8375
# added derivations: 14536


{'a a a a b b b b c c c c', 'a a a b b b c c c', 'a a b b c c', 'a b c'}

**Theorem.** No context-free grammar for `L(G₅)` exists.

**Example.** Let `G₆ = (T, N, P, S)`, where `T = {a, b}`, `N = {A, B, S, T}`, and productions `P` are as follows; the nonterminal `T` in the productions is a symbol distinct from the set `T` of terminals:  

  `S → a A S`  
  `S → b B S`  
  `A a → a A`  
  `A b → b A`  
  `B a → a B`  
  `B b → b B`  
  `S → T`  
  `A T → T a`  
  `B T → T b`  
  `T → ε`  

The grammar is not context-free. The language generated is the *copy language*:

    L(G₆) = {w w | w ∈ T*}

The first two productions produce an arbitrary sequence of pairs of `a A` and `b B` ending with `S`. The following four productions move all `A` and `B` to the right without "overtaking" each other. The final four productions convert `A` to `a` and `B` to `b` from right to left.

**Question.** What is a derivation of `a b a b` in `G₆`?

_Answer._

```
  S
⇒ a A S
⇒ a A b B S
⇒ a b A B S
⇒ a b A B T
⇒ a b A T b
⇒ a b T a b
⇒ a b a b
```

In [92]:
G6 = Grammar({'a', 'b'}, {'A', 'B', 'S', 'T'},
             {('S', 'a A S'), ('S', 'b B S'), ('A a', 'a A'), ('A b', 'b A'), ('B a', 'a B'),
              ('B b', 'b B'), ('S', 'T'), ('A T', 'T a'), ('B T', 'T b'), ('T', '')},
             'S')

In [93]:
{d for d, _ in zip(G6.L(stats = True), range(8))}

# added derivations: 1
# added derivations: 3
# added derivations: 7
# added derivations: 20
# added derivations: 54
# added derivations: 148
# added derivations: 416
# added derivations: 1160
# added derivations: 3232
# added derivations: 9024
# added derivations: 25192


{'', 'a a', 'a a a a', 'a b a b', 'a b b a b b', 'b a b a', 'b b', 'b b b b'}

**Theorem.** No context-free grammar for `L(G₆)` exists.

Languages generated by context-sensitive, context-free, and regular grammars are called *context-sensitive*, *context-free*, and *regular languages*, respectively.

**Theorem.** Every regular language is also context-free. Every context-free language is also context-sensitive.

Note that the inclusion does not quite hold for grammars, as `A → ε` is allowed in regular and context-free grammars, but not in context-sensitive grammars.

For brevity, we write

    σ → τ₀ | τ₁ | …

for the set of productions

    σ → τ₀
    σ → τ₁
    …

### Concrete and Abstract Syntax Trees

We continue with context-free languages. For those, the _parse tree_ or _concrete syntax tree_ is a visual representation of a derivation, which abstracts from the order of independent applications of productions. In the example, `E` and `id` stand for expressions and identifiers in programs.

<img style="width:6em;float:right;border-left:10px solid white" src="./img/idplusid.svg">

**Example.** Let `G₇ = (T, N, P, E)` where `T = {id, +}`, `N = {E}`, and the productions `P` are:

    E → id | E + E

There are two derivations of `id + id`:

    E ⇒ E + E ⇒ id + E ⇒ id + id
    E ⇒ E + E ⇒ E + id ⇒ id + id

<img style="width:19em;float:right;border-left:10px solid white" src="./img/idplusidplusid.svg"></img>
Continuing with `G₇`, there are two parse trees for `id + id + id`. A sentence with more than one parse tree is an _ambiguous sentence_ and a grammar allowing that is an *ambiguous grammar*. Syntactically ambiguous sentences may have an ambiguous meaning. In natural languages, this may be resolved through the context; in programming languages, syntactic ambiguity is generally avoided.

<img style="width:9em;float:right;border-left:10px solid white" src="./img/idplusleft.svg">

Changing the productions to a _left-recursive_ form eliminates ambiguity and makes `+` associate to the left.

    E → id | E + id

<img style="width:9em;float:right;border-left:10px solid white" src="./img/idplusright.svg"></img>
Changing the productions to a _right-recursive_ form eliminates ambiguity and makes `+` associate to the right.

    E → id | id + E

**Question.** For which operators in programming languages does associativity matter, and for which does it not?

_Answer._
- For integer division, associativity matters.
- For integer addition, associativity matters in bounded arithmetic (overflow is an error) and saturating arithmetic (overflow results in the maximal number).
- For integer addition, associativity does not matter in modulo arithmetic, e.g. modulo the word size, and in arbitrary-precision arithmetic.
- For bitwise `and` and bitwise `or`, associativity does not matter.
- For string concatenation, associativity does not matter.
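
The difference between grouping to the left and grouping to the right can be observed directly. The sketch below uses hypothetical helpers `fold_left` and `fold_right` and an 8-bit saturating addition `sat_add`, all of which are assumptions of this illustration.

```python
from functools import reduce
from operator import add, floordiv

def fold_left(op, xs):
    """((x0 op x1) op x2) …, the grouping of a left-recursive production."""
    return reduce(op, xs)

def fold_right(op, xs):
    """x0 op (x1 op (x2 …)), the grouping of a right-recursive production."""
    return reduce(lambda acc, x: op(x, acc), reversed(xs))

def sat_add(x: int, y: int) -> int:
    """8-bit saturating addition: results are clamped to -128 .. 127."""
    return max(-128, min(127, x + y))

# integer division: the grouping matters
assert fold_left(floordiv, [100, 10, 5]) == 2      # (100 // 10) // 5
assert fold_right(floordiv, [100, 10, 5]) == 50    # 100 // (10 // 5)
# saturating addition: the grouping matters
assert fold_left(sat_add, [100, 100, -100]) == 27
assert fold_right(sat_add, [100, 100, -100]) == 100
# string concatenation: the grouping does not matter
assert fold_left(add, ['a', 'b', 'c']) == fold_right(add, ['a', 'b', 'c']) == 'abc'
```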

The next example illustrates operator _precedence_.

<img style="width:19em;float:right;border-left:10px solid white" src="./img/idplusidtimesid.svg"></img>
**Example.** Let `G₈ = (T, N, P, E)` where `T = {id, +, ×}`, `N = {E}`, and the productions `P` are:

    E → id | E + id | E × id

In `id + id × id`, operator `+` binds tighter; in `id × id + id`, operator `×` binds tighter. That is, `+` and `×` bind equally tightly and associate to the left.

<img style="width:19em;float:right;border-left:10px solid white" src="./img/idplustimestimesplus.svg">

To have proper operator precedence, nonterminal `T` for terms is introduced and the productions are changed to:  

    E → T | E + T
    T → id | T × id

Parentheses are needed to allow `+` to bind tighter than `×`. For this, an additional nonterminal, `F` for factor, is introduced.

**Example.** Let `G₉ = (T, N, P, E)` where `T = {id, +, ×, (, )}`, `N = {E, T, F}`, and the productions `P` are:  

    E → T | E + T
    T → F | T × F
    F → id | ( E )

**Question.** What are the parse trees for `id + id × id`, for `id × id + id`, and for `(id + id) × id`?

_Answer._

<img style="width:30em" src="./img/idparen.svg"></img>

A _structural tree_ or _abstract syntax tree_ is a simplified parse tree with only the relevant structure information:
- Productions whose sole purpose is to define precedence (like bracketing) are left out.
- Chains of derivations are left out.
- Nodes are labelled with the construct in question rather than a nonterminal.

For example, for `id + id × id`, for `id × id + id`, and for `(id + id) × id`:

<img style="width:28em" src="./img/idast.svg">
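
Abstract syntax trees can be represented as nested tuples `(operator, left, right)`. The following is one possible sketch that builds such trees for the expression grammar with `E`, `T`, `F` above, with the left recursion written as loops; the function `parse` and the tuple representation are assumptions of this illustration, not the construction used in these notes.

```python
def parse(tokens: list[str]):
    """Return an abstract syntax tree as nested tuples; raise SyntaxError otherwise."""
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def take(expected: str):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f'expected {expected!r} at position {pos}')
        pos += 1
    def expression():   # E → T | E + T, written as a loop: T { + T }
        tree = term()
        while peek() == '+':
            take('+'); tree = ('+', tree, term())
        return tree
    def term():         # T → F | T × F, written as a loop: F { × F }
        tree = factor()
        while peek() == '×':
            take('×'); tree = ('×', tree, factor())
        return tree
    def factor():       # F → id | ( E )
        if peek() == '(':
            take('('); tree = expression(); take(')')
            return tree  # parentheses leave no node in the tree
        take('id')
        return 'id'
    tree = expression()
    if peek() is not None:
        raise SyntaxError(f'unexpected {peek()!r} at position {pos}')
    return tree

assert parse('id + id × id'.split()) == ('+', 'id', ('×', 'id', 'id'))
assert parse('id × id + id'.split()) == ('+', ('×', 'id', 'id'), 'id')
assert parse('( id + id ) × id'.split()) == ('×', ('+', 'id', 'id'), 'id')
```

The nonterminals `E`, `T`, `F` and the parentheses do not appear in the result; only the operators and operands remain.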

### Backus-Naur Form

Context-free grammars are more conveniently written in _Backus-Naur Form_ (*BNF*):
- The left-hand side of the first production is the start symbol.
- Terminals are enclosed in `'quotes'`; all other symbols are nonterminals.
- Productions for the same nonterminal are grouped into one, separated by `|`.
- The empty string `ε` is written as `''`.

For example, here is a BNF grammar for expressions like `– 3 × a + b`:

    expression → term | '+' term | '–' term | expression '+' term | expression '–' term
    term → factor | term '×' factor | term '/' factor
    factor → number | identifier | '(' expression ')'

and one for statements like `if b then x := 3 else (x := y ; y := 5)`:

    statement → assignment | compoundStatement | ifStatement | whileStatement
    assignment → identifier ':=' expression
    compoundStatement → '(' statementSequence ')'
    statementSequence → statement | statementSequence ';' statement
    ifStatement → 'if' expression 'then' statement | 'if' expression 'then' statement 'else' statement
    whileStatement → 'while' expression 'do' statement

Let us define BNF in BNF! The terminals are characters written in quotes. The newline character is written as `\n`, and the quote character is `\'`:

    grammar  →  production | grammar '\n' production
    production  →  identifier '→' expression
    expression  →  term | expression '|' term
    term  →  factor | term ' ' factor
    factor  →  identifier | string
    identifier  →  letter | identifier letter | identifier digit
    letter  →  'A' | … | 'Z' | 'a' | … | 'z'
    digit  →  '0' | … | '9'
    string  →  '\'' characters '\''
    characters  →  characters char | ''
    char  →  letter | '(' | ')' | ':' | '=' | ';' | …

Numerous variations of BNF exist. For example, the grammar of C uses different fonts for terminals and nonterminals, enumerates the terms of a production indented on subsequent lines, and uses <code>A<sub>opt</sub></code> if `A` is optional [(Kernighan and Ritchie 1988)](#KernighandRitchie88). Formally, using <code>A<sub>opt</sub></code> amounts to adding a production <code>A<sub>opt</sub> → A | ε</code>. Here is a simplified fragment:

<!-- <code style="font-family:monospace"> -->
<code style="font:Noto Sans Mono">
<i>statement:</i>
        <i>compound-statement</i>
        <i>expression-statement</i>
        <i>selection-statement</i>
        <i>iteration-statement</i>
<i>compound-statement:</i>
        { <i>statement-list<sub>opt</sub></i> }
<i>statement-list:</i>
        <i>statement</i>
        <i>statement-list</i> <i>statement</i>
<i>selection-statement:</i>
        if ( <i>expression</i> ) <i>statement</i>
        if ( <i>expression</i> ) <i>statement</i> else <i>statement</i>
        switch ( <i>expression</i> ) <i>statement</i>
<i>iteration-statement:</i>
        while ( <i>expression</i> ) <i>statement</i>
        for ( <i>expression<sub>opt</sub></i> ; <i>expression<sub>opt</sub></i> ; <i>expression<sub>opt</sub></i> ) <i>statement</i>
</code>

EBNF is an extension of BNF that allows simple repetitions to be formulated more naturally and avoids inflation of nonterminals:
- `(A)` allows precedence to be expressed. Formally, `(A)` stands for a new nonterminal `X` with the production `X → A` added.
- `[A]` means that `A` is optional. Formally, `[A]` stands for a new nonterminal `X` with the production `X → A | ε` added.
- `{A}` means repeating `A` an arbitrary number of times. Formally, `{A}` stands for a new nonterminal `X` with the production `X → X A | ε` added.

For example, here is an EBNF grammar for arithmetic expressions,

    expression → [ '+' | '–' ] term { ( '+' | '–' ) term }
    term → factor { ( '×' | '/' ) factor }
    factor → number | identifier | '(' expression ')'

and one for statements:

    statement → assignment | compoundStatement | ifStatement | whileStatement
    assignment → identifier ':=' expression
    compoundStatement → '(' statement { ';' statement } ')'
    ifStatement → 'if' expression 'then' statement ['else' statement]
    whileStatement → 'while' expression 'do' statement

**Question.** First, eliminate `(...)`, `[...]` in the expression grammar, then eliminate `{...}`. How can the grammar be made more readable?

*Answer.*  

For eliminating `(...)` and `[...]`, we introduce `unaryop`, `addop`, and `multop`:

    expression → unaryop term { addop term }
    unaryop → '+' | '–' | ε
    addop → '+' | '–'
    term → factor { multop factor }
    multop → '×' | '/'
    factor → number | identifier | '(' expression ')'

For eliminating `{...}` in the production for `term`, that production can be replaced by:

    term → factor morefactor
    morefactor → morefactor multop factor | ε

The introduction of the nonterminal `morefactor` and the use of `ε` for the repetition can be avoided here by making `term` itself recursive; the repetition in `expression` is treated likewise with a new nonterminal `primary`:

    expression → unaryop primary
    primary → term | primary addop term
    unaryop → '+' | '–' | ε
    addop → '+' | '–'
    term → factor | term multop factor
    multop → '×' | '/'
    factor → number | identifier | '(' expression ')'

Let us define EBNF in EBNF!

    grammar  →  production { '\n' production }
    production  →  identifier '→' expression
    expression  →  term { '|' term }
    term  →  factor { ' ' factor }
    factor  →  identifier | string | '(' expression ')' | '[' expression ']' | '{' expression '}'
    identifier  →  letter { letter | digit }
    letter  →  'A' | … | 'Z' | 'a' | … | 'z'
    digit  →  '0' | … | '9'
    string  →  '\'' { char } '\''
    char  →  letter | '(' | ')' | ':' | '=' | ';' | …

Sometimes, `=` or `::=` is used instead of `→` and productions are terminated with a dot. For example, here is a fragment of the [Go Grammar](https://golang.org/ref/spec):

    Block = "{" StatementList "}" .
    StatementList = { Statement ";" } .
    
    Statement =
        Declaration | Assignment | Block | IfStmt | SwitchStmt | SelectStmt | ForStmt .

    Assignment = ExpressionList assign_op ExpressionList .
    assign_op = [ add_op | mul_op ] "=" .

    ExpressionList = Expression { "," Expression } .

Productions using `=` are also called *syntactic equations*; however, care has to be taken as `A = B` is not the same as `B = A`!

More variations of EBNF exist:

- Zero or more repetitions of `E` are also written as `E*`.
- One or more repetitions of `E` are written as `E⁺`, which stands for `E E*`.
- An optional occurrence of `E` is also written as `E?`, which stands for `E | ε`.

Here is a fragment of the Python [grammar in the language reference](https://docs.python.org/3/reference/compound_stmts.html) (which differs slightly from the [grammar used by parsers](https://docs.python.org/3/reference/grammar.html)). In Python, the indentation of statements matters; this is expressed in the grammar by symbols that indicate indentation:

<pre style="font-family:monospace;color:royalblue">
compound_stmt ::=  if_stmt
                   | while_stmt
                   | for_stmt
suite         ::=  stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement     ::=  stmt_list NEWLINE | compound_stmt
stmt_list     ::=  simple_stmt (";" simple_stmt)* [";"]

if_stmt ::=  "if" expression ":" suite
             ("elif" expression ":" suite)*
             ["else" ":" suite]
</pre>
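
The symbols `NEWLINE`, `INDENT`, and `DEDENT` are not typed by the programmer; Python's tokenizer produces them from line breaks and changes in indentation. A small sketch with the standard `tokenize` module makes them visible. The helper name `token_names` and the sample source are ours:

```python
import io
import token
import tokenize

def token_names(source: str) -> list[str]:
    """Return the names of the token types that the tokenizer produces for source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [token.tok_name[t.type] for t in tokens]

source = "if x:\n    y = 1\nelse:\n    y = 2\n"
names = token_names(source)
print(names)
```

Each of the two indented suites in the sample is bracketed by one `INDENT` and one `DEDENT`, which corresponds to the second alternative of `suite` in the grammar fragment.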

EBNF not only allows a compact definition of a grammar but is also essential for the construction of a specific kind of recognizer. That motivates our preference for EBNF.

### Syntax Diagrams

An EBNF grammar can be equivalently represented by _syntax diagrams_ (*railroad diagrams*). These are constructed recursively over the structure of EBNF grammars. Let `'a'` stand for a string (terminal), `A` for an identifier (nonterminal), and `E`, `E₁`, `E₂`, … for expressions (right-hand sides of productions):

| EBNF            | syntax diagram                                     |
|:----------------|:---------------------------------------------------|
| `A ‚Üí E`         |<img style="width:10em" src="./img/production.svg"> |
| `'a'`           |<img style="width:10em" src="./img/terminal.svg">   |
| `A`             |<img style="width:10em" src="./img/nonterminal.svg">|
| `E₁ │ E₂ │ …`   |<img style="width:10em" src="./img/choice.svg">     |
| `E₁ E₂ …`       |<img style="width:14em" src="./img/sequence.svg">   |
| `(E)`           |<img style="width:10em" src="./img/parenthesis.svg">|
| `[E]`           |<img style="width:10em" src="./img/option.svg">     |
| `{E}`           |<img style="width:10em" src="./img/repetition.svg"> |

For example, for

    A → 'x' | '(' A { '+' A } ')'

the syntax diagram is:

<img style="width:28em" src="./img/railroad.svg">
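
Each path through a syntax diagram can be read as control flow: a choice becomes a conditional, a repetition becomes a loop, and a nonterminal becomes a call. The following is a minimal sketch of a recognizer for the example grammar written in that way. The function name `accepts` is ours, input is assumed to contain no whitespace, and the technique itself is only hinted at here:

```python
def accepts(s: str) -> bool:
    """Recognizer sketch for A → 'x' | '(' A { '+' A } ')'."""
    pos = 0  # index of the next unread character of s

    def A() -> bool:
        nonlocal pos
        if pos < len(s) and s[pos] == 'x':          # first alternative: 'x'
            pos += 1
            return True
        if pos < len(s) and s[pos] == '(':          # second alternative
            pos += 1
            if not A():
                return False
            while pos < len(s) and s[pos] == '+':   # repetition { '+' A }
                pos += 1
                if not A():
                    return False
            if pos < len(s) and s[pos] == ')':
                pos += 1
                return True
        return False

    # Accept only if A matches and the whole input has been consumed.
    return A() and pos == len(s)

print(accepts('(x+(x+x))'), accepts('(x+)'))
```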

**Question.** What is the syntax diagram for EBNF?

### Recognizers

A _recognizer_ for a language is a program that takes a sequence as input and _accepts_ it if the sequence is a sentence of the language and _rejects_ it otherwise. For regular, context-free, and context-sensitive languages, a _universal recognizer_ exists, i.e. a program that, given a grammar `G` and a sequence `ω`, determines whether `ω ∈ L(G)`. For an unrestricted grammar `G`, `ω ∈ L(G)` is, in general, undecidable.

For context-sensitive grammars, the property that applying a production to a derived sequence never shrinks the sequence allows the construction of a universal recognizer. The algorithm below keeps a set, `dd`, of all previously derived sequences, starting with the start symbol. On each iteration, all productions are applied to all sequences of `dd`; if one of the results matches `ω`, true is returned. Each new derivation that is no longer than `ω` is placed in a set, `d`. If all derivations of length at most `|ω|` have been explored, i.e., `d` remains empty, and `ω` is not among them, false is returned. Otherwise, `d` is added to `dd` and a new iteration begins.

    procedure G.derivable(ω): boolean
        dd, d := ∅, {G.S}
        while d ≠ ∅ do
            dd, d := dd ∪ d, ∅
            for μ σ ν ∈ dd do
                for σ → τ ∈ G.P do
                    χ := μ τ ν
                    if χ = ω then return true
                    else if |χ| ≤ |ω| ∧ χ ∉ dd then d := d ∪ {χ}
        return false

If `n` is the number of iterations of the outermost loop, `dd` contains all sequences that are derived from `G.S` in `n` steps and are no longer than `ω`. That is, the invariant is:

    ∀ π ∈ dd · G.S ⇒ⁿ π ∧ |π| ≤ |ω|

This algorithm always terminates, and the memory it uses is bounded. Since the set `dd` may become large, it is not a practical universal recognizer, but a constructive proof that membership in a context-sensitive language is decidable.
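
To see why the memory is bounded, measure sequences in symbols and let `k` be the total number of terminals and nonterminals. There are then at most `k + k² + … + kᵐ` non-empty sequences of length at most `m = |ω|`, so `dd` can never hold more than that. The sketch below evaluates this bound; the function name is ours, and the result is a crude upper limit, not an estimate of how many sequences are actually derived:

```python
def max_sequences(k: int, m: int) -> int:
    """Number of non-empty sequences of length at most m over k symbols."""
    return sum(k ** i for i in range(1, m + 1))

# Four symbols (a, b, c, S) and a target of seven symbols:
print(max_sequences(4, 7))  # 21844
```

The bound grows exponentially with the length of the input, which is why the algorithm serves as a decidability argument rather than as a practical recognizer.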

The Python implementation is a modification of the method `L()`:

In [94]:
def derivable(G: Grammar, ω: str, log = False, stats = False) -> bool: # G must be context-sensitive
    dd, d, ω = set(), {G.S}, ω.strip()
    if log: print('    ', G.S)
    while d != set():
        if stats: print('# added derivations:', len(d))
        if log: print()
        dd.update(d); d = set()
        for π in sorted(dd, key = len):
            for σ, τ in G.P: # production σ → τ
                i = π.find(σ, 0)
                while i != - 1: # π == π[0:i] + σ + π[i + len(σ):]
                    χ = π[0:i] + τ + π[i + len(σ):]; χ = χ.replace('  ', ' ')
                    if (χ not in dd) and (χ not in d):
                        if χ.strip() == ω: return True
                        elif len(χ.strip()) <= len(ω):
                            if log: print('    ', π, '⇒', χ)
                            d.add(χ)
                    i = π.find(σ, i + 1)
    return False

setattr(Grammar, 'derivable', derivable)

Consider again grammar `G3`:

In [95]:
G3 = Grammar({'a', 'b', 'c'}, {'S'}, {('S', 'b'), ('S', 'a S c')}, 'S')

In [96]:
assert G3.derivable('a a a b c c c')

If `log` is set to `True`, the sets of all derivations of length `1`, `2`, `3`, etc., are printed:

In [97]:
assert G3.derivable('a a a b c c c', True)

     S

     S ⇒ a S c
     S ⇒ b

     a S c ⇒ a a S c c
     a S c ⇒ a b c

     a a S c c ⇒ a a a S c c c
     a a S c c ⇒ a a b c c



Grammar `G₄` is expressed as:

In [98]:
G4 = Grammar({'the', 'child', 'children', 'runs', 'run'},
             {'S', 'NP', 'VP', 'D', 'Nₛ', 'Vₛ', 'Nₚ', 'Vₚ'},
             {('S', 'NP VP'), ('NP', 'D Nₛ'), ('NP', 'D Nₚ'), ('Nₛ VP', 'Nₛ Vₛ'), ('Nₚ VP', 'Nₚ Vₚ'),
              ('D', 'the'), ('Nₛ', 'child'), ('Nₚ', 'children'), ('Vₛ', 'runs'), ('Vₚ', 'run')},
             'S')

In this particular grammar, spaces around the terminals and nonterminals in productions are not needed. As expected, we have:

In [99]:
assert G4.derivable('the children run', True) and G4.derivable('the child runs')

     S

     S ⇒ NP VP

     NP VP ⇒ D Nₛ VP
     NP VP ⇒ D Nₚ VP

     D Nₛ VP ⇒ the Nₛ VP
     D Nₛ VP ⇒ D Nₛ Vₛ
     D Nₛ VP ⇒ D child VP
     D Nₚ VP ⇒ the Nₚ VP
     D Nₚ VP ⇒ D Nₚ Vₚ
     D Nₚ VP ⇒ D children VP

     D Nₛ Vₛ ⇒ the Nₛ Vₛ
     D Nₛ Vₛ ⇒ D child Vₛ
     D Nₛ Vₛ ⇒ D Nₛ runs
     D Nₚ Vₚ ⇒ the Nₚ Vₚ
     D Nₚ Vₚ ⇒ D Nₚ run
     D Nₚ Vₚ ⇒ D children Vₚ
     the Nₛ VP ⇒ the child VP
     the Nₚ VP ⇒ the children VP

     D Nₚ run ⇒ the Nₚ run
     D Nₚ run ⇒ D children run
     D Nₛ runs ⇒ the Nₛ runs
     D Nₛ runs ⇒ D child runs
     the Nₚ Vₚ ⇒ the children Vₚ
     the Nₛ Vₛ ⇒ the child Vₛ



In [100]:
assert not G4.derivable('the children runs') and not G4.derivable('the child run')
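
The string-based implementation relies on spaces to keep symbols apart. As an alternative sketch that does not depend on the `Grammar` class, sequences can be represented as tuples of symbols, so that a production only ever matches whole symbols. The names `prod` and `derivable_tuples` are ours, nonterminals are written with plain letters (`Ns`, `Np`, `Vs`, `Vp`), and, unlike the implementation above, only the sequences added in the previous iteration are expanded again:

```python
Symbols = tuple[str, ...]
Production = tuple[Symbols, Symbols]

def prod(lhs: str, rhs: str) -> Production:
    """Build a production from two space-separated symbol sequences."""
    return tuple(lhs.split()), tuple(rhs.split())

def derivable_tuples(productions: list[Production], start: str, sentence: str) -> bool:
    """Decide whether sentence is derivable, assuming no production shrinks a sequence."""
    target = tuple(sentence.split())
    seen: set[Symbols] = set()
    frontier: set[Symbols] = {(start,)}
    while frontier:
        seen |= frontier
        new: set[Symbols] = set()
        for form in frontier:
            for lhs, rhs in productions:
                for i in range(len(form) - len(lhs) + 1):
                    if form[i:i + len(lhs)] == lhs:  # lhs occurs at position i
                        derived = form[:i] + rhs + form[i + len(lhs):]
                        if derived == target:
                            return True
                        if len(derived) <= len(target) and derived not in seen:
                            new.add(derived)
        frontier = new
    return False

P3 = [prod('S', 'b'), prod('S', 'a S c')]
P4 = [prod('S', 'NP VP'), prod('NP', 'D Ns'), prod('NP', 'D Np'),
      prod('Ns VP', 'Ns Vs'), prod('Np VP', 'Np Vp'), prod('D', 'the'),
      prod('Ns', 'child'), prod('Np', 'children'), prod('Vs', 'runs'), prod('Vp', 'run')]

print(derivable_tuples(P3, 'S', 'a a a b c c c'))
print(derivable_tuples(P4, 'S', 'the children runs'))
```

Expanding only the newest sequences avoids reapplying productions to sequences that were already fully explored; the set `seen` still grows as described for `dd`.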

### Historic Notes and Further Reading

The Backus-Naur Form was first proposed by John Backus and then adopted by Peter Naur for the definition of Algol-60. Donald Knuth suggested the name [(Knuth 1964)](#Knuth64). EBNF was proposed by Niklaus Wirth [(Wirth 1977)](#Wirth77).

The original motivation for the classification of grammars came from the study of natural languages. The following examples illustrate the potential use of regular, context-free, and context-sensitive languages (credit for examples: [C. Chesi, Univ. of Siena](http://www.ciscl.unisi.it/master/chesi/lingcomp-2017_18-03_04-formal_grammar.pdf)):

- _Right recursion_ (*tail recursion*) of the form `a bⁿ`:


    [the dog bit [the cat [that chased [the mouse [that ran]]]]]


- _Center embedding_ (*true recursion*) of the form `aⁿ bⁿ`:


    [the mouse [(that) the cat [(that) the dog bit] chased] ran]


- _Cross-serial dependencies_ (*identity recursion*) of the form `w w`:


    John, Mary, and David are a widower, a widow, and a widower, respectively
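
The three forms differ in how much a recognizer has to remember. The sketch below checks each form over single-character symbols; the function names are ours, and we assume `n ≥ 0` and that `w` may be any string:

```python
import re

def is_a_bn(s: str) -> bool:
    """a bⁿ: regular, so a regular expression suffices."""
    return re.fullmatch(r'ab*', s) is not None

def is_an_bn(s: str) -> bool:
    """aⁿ bⁿ: the number of a's has to be remembered and matched by the b's."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == 'a' * n + 'b' * n

def is_ww(s: str) -> bool:
    """w w: the whole first half has to be remembered and compared in order."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]

print(is_a_bn('abbb'), is_an_bn('aabb'), is_ww('abab'))
```

The first check needs no memory beyond the current state, the second needs a counter, and the third needs the first half of the input in its original order, which is what places `w w` outside the context-free languages.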

There is an ongoing discussion on using regular, context-free, and context-sensitive languages for natural languages. The male-female correspondence of the last example can also be seen as a semantic issue rather than a syntactic issue. If one takes the limits of human comprehension into account, the full generality of context-sensitive, context-free, and even regular languages is not needed. As a consequence, further classes of grammars have emerged, e.g. [(Kallmeyer 2010)](#Kallmeyer10).

Grammars can be used for the translation of natural languages: first, the input sentence is parsed according to the grammar of the source language, and then a sentence is generated that satisfies the grammar of the target language. That was, for decades, the dominant approach until computers became fast enough for neural networks, which perform better than grammar-based translation [(Wu et al. 2016)](#WuEtAl16), [(Le and Schuster 2016)](#LeSchuster16).

On the other hand, Chomsky's Hierarchy profoundly impacted computing: for each class of languages, equivalent recognizers are known. Calling languages of unrestricted grammars *recursively enumerable*, we have:

|type   |language              |recognizer              |
|:------|:---------------------|:-----------------------|
|type 0 |recursively enumerable|Turing machine          |
|type 1 |context-sensitive     |linear bounded automaton|
|type 2 |context-free          |pushdown automaton      |
|type 3 |regular               |finite state automaton  |

Regular and context-free languages are ubiquitous because recognizers for them can be constructed efficiently and the recognizers themselves are, in a certain sense, efficient. The following chapters discuss their use for scanning and parsing.

Even the above examples show the difficulty of writing context-sensitive grammars. After Algol 60 introduced the use of context-free grammars for syntax definition, with Algol 68, an attempt was made to go beyond context-free grammars by using a dedicated "two-level grammar" [(Wijngaarden et al. 1976)](#WijngaardenEtAl76); that kind of grammar was not used for another language. Around the same time, Knuth proposed _attribute grammars_ as a way of associating computation to recognition of a context-free language [(Knuth 1968)](#Knuth68). Since then, it has become common to define a programming language with regular and context-free grammars and to use attribute grammars for compilation. Type systems, which can be thought of as context-sensitive grammars, are also used in the definition of some languages [(Cardelli 1996)](#Cardelli96).

The Pascal language and its successors Modula-2 and Oberon have compact EBNF grammars. The [syntax diagrams of Apple Pascal](https://web.archive.org/web/20211028072457/http://www.pascal-central.com/pascal-syntax.html) can fit on a poster. It was common to hang that poster on a wall next to an Apple Macintosh computer!

### Bibliography

<div class="csl-bib-body" style="line-height: 1.35; margin-left: 2em; text-indent:-2em;">
<a id='Cardelli96'></a> <div class="csl-entry">Cardelli, Luca. 1996. “Type Systems.” <i>ACM Comput. Surv.</i> 28 (1): 263–64. doi:<a href="https://doi.org/10.1145/234313.234418">10.1145/234313.234418</a>.</div>
<a id='Chomsky56'></a> <div class="csl-entry">Chomsky, N. 1956. “Three Models for the Description of Language.” <i>IRE Transactions on Information Theory</i> 2 (3): 113–24. doi:<a href="https://doi.org/10.1109/TIT.1956.1056813">10.1109/TIT.1956.1056813</a>.</div>
<a id='Kallmeyer10'></a> <div class="csl-entry">Kallmeyer, Laura. 2010. <i>Parsing Beyond Context-Free Grammars</i>. Springer-Verlag Berlin Heidelberg. doi:<a href="https://doi.org/10.1007/978-3-642-14846-0">10.1007/978-3-642-14846-0</a>.</div>
<a id='KernighanRitchie88'></a> <div class="csl-entry">Kernighan, Brian W., and Dennis M. Ritchie. 1988. <i>The C Programming Language</i>. 2nd ed. Prentice Hall Professional Technical Reference.</div>
<a id='Knuth64'></a> <div class="csl-entry">Knuth, Donald E. 1964. “Backus Normal Form vs. Backus Naur Form.” Letter to the editor. <i>Communications of the ACM</i> 7 (12): 735–36. doi:<a href="https://doi.org/10.1145/355588.365140">10.1145/355588.365140</a>.</div>
<a id='Knuth68'></a> <div class="csl-entry">Knuth, Donald E. 1968. “Semantics of Context-Free Languages.” <i>Mathematical Systems Theory</i> 2 (2): 127–45. doi:<a href="https://doi.org/10.1007/BF01692511">10.1007/BF01692511</a>.</div>
<a id='LeSchuster16'></a> <div class="csl-entry">Le, Quoc V., and Mike Schuster. 2016. “A Neural Network for Machine Translation, at Production Scale.” <i>Google AI Blog</i> (blog). September 27, 2016. <a href="https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html">https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html</a>.</div>
<a id='WijngaardenEtAl76'></a> <div class="csl-entry">Wijngaarden, A. van, B. J. Mailloux, J. E. L. Peck, C. H. A. Koster, C. H. Lindsey, M. Sintzoff, L. G. L. T. Meertens, and R. G. Fisker, eds. 1976. <i>Revised Report on the Algorithmic Language Algol 68</i>. Berlin Heidelberg: Springer-Verlag. doi:<a href="https://doi.org/10.1007/978-3-642-95279-1">10.1007/978-3-642-95279-1</a>.</div>
<a id='Wirth77'></a> <div class="csl-entry">Wirth, Niklaus. 1977. “What Can We Do about the Unnecessary Diversity of Notation for Syntactic Definitions?” <i>Communications of the ACM</i> 20 (11): 822–23. doi:<a href="https://doi.org/10.1145/359863.359883">10.1145/359863.359883</a>.</div>
<a id='WuEtAl16'></a> <div class="csl-entry">Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, et al. 2016. “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.” arXiv:<a href="http://arxiv.org/abs/1609.08144">1609.08144</a>.</div>
</div>