# syntactic parsing depends on grammar 

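Expressive power follows the Chomsky hierarchy: regular < context-free < context-sensitive < recursively enumerable. Projective dependency grammars are weakly equivalent to context-free grammars, so dependency parsing is no more expressive than constituency parsing over the richer grammar classes.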


<table>
  <tr>
    <th rowspan="2">Syntactic Parsing type</th>
    <th colspan="3">Grammar</th>
    <th rowspan="2" colspan="2">Parsing Algorithm</th>
    <th rowspan="2">Complexity</th>
  </tr>
  <tr>
    <td>Type</td>
    <td>Chomsky Hierarchy</td>
    <td>Subtype</td>
  </tr>
  <tr>
    <td rowspan='8'>Constituency Parsing</td>
    <td rowspan='8'>Formal grammar</td>
    <td colspan="2">Regular grammar<br>(Type 3)</td>
    <td colspan="2">Finite State Automata</td>
    <td>O(n)</td>
  </tr>
  <tr>
    <td rowspan="3">Context-Free grammar<br>(Type 2)</td>
    <td>X-bar theory</td>
    <td rowspan="3">Push-down Automata</td>
    <td>Earley, CYK, Recursive Descent, Shift-Reduce</td>
    <td>O(n^3)</td>
  </tr>
  <tr>
    <td>Lexicalized (HPSG)</td>
    <td>Earley, CYK</td>
    <td>O(n^5)</td>
  </tr>
  <tr>
    <td>Probabilistic</td>
    <td>Earley, CYK</td>
    <td>O(n^3)</td>
  </tr>
  <tr>
    <td rowspan="2">Mildly Context-Sensitive grammar <br>(unification grammar)</td>
    <td>Tree Adjoining Grammar</td>
    <td colspan="2">Earley</td>
    <td>O(n^6)</td>
  </tr>
  <tr>
    <td>Combinatory Categorial Grammar</td>
    <td colspan="2">CYK</td>
    <td>O(n^6)</td>
  </tr>
  <tr>
    <td colspan="2">Context-Sensitive grammar<br>(Type 1)</td>
    <td colspan="2">Linear Bounded Automaton</td>
    <td>PSPACE-complete</td>
  </tr>
  <tr>
    <td colspan="2">Recursively Enumerable grammar<br>(Type 0)</td>
    <td colspan="2">Turing machines</td>
    <td>Undecidable</td>
  </tr>
  <tr>
    <td rowspan='2'>Dependency Parsing</td>
    <td rowspan="2" >Dependency Grammar</td>
    <td colspan='2'>Projective</td>
    <td colspan="2">MaltParser (transition-based), Eisner, Neural network</td>
    <td>O(n) transition-based; O(n^3) Eisner</td>
  </tr>
  <tr>
    <td colspan='2'>Non-projective</td>
    <td colspan="2">Maximum Spanning Tree, Neural network</td>
    <td>O(n^3) or better</td>
  </tr>
</table>


## definition

parsing: given a **grammar**, assign a tree structure to a sentence

## dataset

| Dataset                  | Short Description                               | Data Source                                          | Data Size     |
|--------------------------|-------------------------------------------------|------------------------------------------------------|--------------|
| Penn Treebank            | Annotated corpus of English                     | Wall Street Journal, Brown Corpus, Switchboard      | ~4.5M words  |
| WSJ subset               | Subset of Penn Treebank with WSJ articles       | Wall Street Journal                                  | ~1M words    |
| Brown Corpus             | Annotated corpus of English from various genres | Novels, Newspapers, Academic texts, etc.            | ~1M words    |
| Universal Dependencies   | Treebanks for various languages                 | Various sources (e.g., UD English EWT: English Web Treebank) | Varies       |
| Chinese Treebank         | Annotated corpus of Chinese                     | People's Daily, Xinhua News Agency, etc.            | ~500K words  |
| Prague Dependency Treebank | Multilayer annotated corpus of Czech          | Czech National Corpus, Czech Academic Corpus, etc.  | ~1.5M words  |


## application

- Grammar checking

- Question answering

- Machine translation

- Information extraction

- Speech generation

- Speech understanding

## challenges

Complexity and Ambiguity:

- Compared with programming languages, human language has no explicit types on words and no brackets to mark phrase boundaries.


- Ambiguity exists at multiple levels, including word meanings, parses, and implied information.


Parse Tree Structures:

- A sentence can have zero, one, or several valid parse trees.

- Context-free grammars (CFGs) are **declarative**: they do not specify how to construct parse trees, so a separate parsing algorithm is required.

## analysis of algorithm

- Growth property: how the size of the parse tree grows with respect to the length of the sentence.

- Parsing complexity: computational resources (time and space) required to parse a sentence.

## evaluation metrics



### constituency parsing

Parseval precision and recall: 

- Precision = (Number of correct constituents in the parsed output) / (Total number of constituents in the parsed output)

- Recall = (Number of correct constituents in the parsed output) / (Total number of constituents in the gold standard parse)

Labeled Precision and Recall: a constituent counts as correct only if both its span and its **non-terminal label** match the gold standard.

- Labeled Precision = (Number of constituents with correct span and non-terminal label in the parsed output) / (Total number of constituents in the parsed output)

- Labeled Recall = (Number of constituents with correct span and non-terminal label in the parsed output) / (Total number of constituents in the gold standard parse)

F1 score: harmonic mean of precision and recall:

- F1 = 2 * (precision * recall) / (precision + recall)
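
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming each constituent is represented as a `(label, start, end)` tuple over word offsets; the function name `parseval` and the example spans are made up for this note and are not taken from any evaluation tool.

```python
from collections import Counter

def parseval(predicted, gold):
    """Labeled precision, recall and F1 over (label, start, end) constituents."""
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection: a constituent is correct only as many times
    # as it appears in both the parsed output and the gold parse.
    correct = sum((pred_counts & gold_counts).values())
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# "the cat sat on the mat"; spans are [start, end) word offsets.
gold = [("S", 0, 6), ("NP", 0, 2), ("VP", 2, 6), ("PP", 3, 6), ("NP", 4, 6)]
pred = [("S", 0, 6), ("NP", 0, 2), ("VP", 2, 6), ("NP", 3, 6), ("NP", 4, 6)]
precision, recall, f1 = parseval(pred, gold)
```

Dropping the label from each tuple before calling the function would give the unlabeled variant.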

Crossing Brackets: a measure of the structural difference between two parses. 

- \# constituents in the parsed output whose bracketing crosses (overlaps without nesting) a bracketing in the gold standard parse. 

    e.g., the crossing bracket count of "(A (B C))" and "((A B) C)" is 1.
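
A rough sketch of the crossing-bracket count, assuming constituents are given as half-open `(start, end)` word spans. The helper names `crosses` and `crossing_brackets` are hypothetical.

```python
def crosses(a, b):
    """True if spans a and b overlap without one containing the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j

def crossing_brackets(predicted, gold):
    """Count predicted spans that cross at least one gold span."""
    return sum(any(crosses(p, g) for g in gold) for p in predicted)

# Words A B C sit at positions 0, 1, 2; spans are [start, end).
predicted = [(0, 3), (1, 3)]   # (A (B C))
gold = [(0, 3), (0, 2)]        # ((A B) C)
count = crossing_brackets(predicted, gold)
```

Here only the predicted span `(1, 3)` for "(B C)" crosses the gold span `(0, 2)` for "(A B)", which matches the count of 1 in the example.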

### Dependency Parsing

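Attachment Score
- \# correct deps / \# deps (a dependency is correct if the word is attached to the right head)
- Unlabeled attachment score (UAS): only the head must be correct
- Labeled attachment score (LAS): the head and the dependency label must both be correct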
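
As a small sketch of the two scores, assume each parse is a list with one `(head_index, label)` pair per word, where index 0 stands for the root. The function name and the example labels are illustrative assumptions.

```python
def attachment_scores(predicted, gold):
    """Return (UAS, LAS) for parallel lists of (head, label), one per word."""
    if len(predicted) != len(gold):
        raise ValueError("parses must cover the same words")
    if not gold:
        return 0.0, 0.0
    total = len(gold)
    # UAS: only the head has to match.
    correct_heads = sum(p[0] == g[0] for p, g in zip(predicted, gold))
    # LAS: head and label both have to match.
    correct_both = sum(p == g for p, g in zip(predicted, gold))
    return correct_heads / total, correct_both / total

# "She saw stars": head index per word (0 = root) plus a relation label.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(pred, gold)
```

In this example every head is right but one label is wrong, so UAS is 1 while LAS is 2/3.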

# Automata
Automata theory deals with abstract machines and their ability to recognize patterns or solve problems. Important types of automata include:

- Deterministic Finite-State Automata (DFA): A simple computational model with a finite number of states and transitions between those states based on input symbols. DFAs have a unique transition for each state and input symbol.

- Non-deterministic Finite-State Automata (NFA): Similar to DFAs, but they allow multiple transitions for the same state and input symbol. NFAs can lead to multiple possible states for a given input.

- Push-down Automata (PDA): An extension of finite-state automata that includes a stack, allowing them to recognize context-free languages.
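
To make the difference concrete, here is a minimal sketch of a DFA simulator next to a stack-based recognizer in the spirit of a PDA. The example machine (even number of `a`) and the balanced-parentheses language are assumptions chosen for illustration.

```python
def dfa_accepts(transitions, start, accepting, string):
    """Run a DFA: exactly one next state per (state, symbol) pair."""
    state = start
    for symbol in string:
        key = (state, symbol)
        if key not in transitions:
            return False  # symbol outside the alphabet: reject
        state = transitions[key]
    return state in accepting

# Hypothetical DFA over {a, b} that accepts an even number of 'a'.
transitions = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}

def balanced(string):
    """PDA-style recognizer: push on '(' and pop on ')'."""
    stack = []
    for symbol in string:
        if symbol == "(":
            stack.append(symbol)
        elif symbol == ")":
            if not stack:
                return False  # closing bracket with nothing to match
            stack.pop()
        else:
            return False
    return not stack  # accept only if every '(' was closed
```

Balanced parentheses need unbounded counting, which a fixed set of states cannot provide; the stack is what lets the second recognizer handle a context-free language.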


# formal grammar

Formalism (formal framework): a mathematical framework used to describe a grammar and to represent the structure and properties of a language.

A formal grammar defines rules for generating sentences of a language, consisting of a finite set of nonterminal symbols, terminal symbols, a start symbol, and a set of production rules. 

The types of formal grammar, as defined by the Chomsky hierarchy, are:

- Type 3: Regular grammar (finite-state automata)

- Type 2: Context-Free grammar (pushdown automata)

- Type 1: Context-Sensitive grammar (linear-bounded automaton)

- Type 0: Recursively Enumerable grammar (Turing machines)

# context-free grammar (CFG)

## definition

4-tuple $(N, \Sigma, R, S)$

- $N$: non-terminal symbols

- $\Sigma$: terminal symbols

- $R: A \to \beta$ context-free rules

    - $A$ is non-terminal symbol

    - $\beta$ is a string from $(\Sigma \cup N)^*$

- $S$: start symbol from $N$

production rule: a grammar rule that specifies how non-terminal symbols can be replaced by other non-terminal or terminal symbols to generate valid sentences in the language defined by the grammar

- S → NP VP | CP VP

    A sentence (S) can be formed by combining a noun phrase (NP) and a verb phrase (VP) or a complementizer phrase (CP) and a verb phrase (VP).

- NP → (DT) (JJ*) N (CP) (PP*) 

    A noun phrase (NP) can be formed with an optional determiner (DT), zero or more adjectives (JJ), a noun (N), and optional complementizer phrase (CP) and/or prepositional phrase(s) (PP).

- VP → V (NP) (NP) (PP) | V (NP) (CP) (PP*)

    A verb phrase (VP) can be formed with a verb (V) followed by optional noun phrases (NP) and/or prepositional phrase(s) (PP), or a verb (V) followed by an optional noun phrase (NP), complementizer phrase (CP), and/or prepositional phrase(s) (PP).

- PP → P NP 

    A prepositional phrase (PP) is formed by combining a preposition (P) and a noun phrase (NP).

- CP → C S

    A complementizer phrase (CP) is formed by combining a complementizer (C) and a sentence (S).
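
A grammar like this only says which trees are valid; a parsing algorithm is still needed to find them. The following is a minimal CYK recognizer sketch over a toy grammar in Chomsky normal form. The grammar and lexicon are assumptions made up for this note (only `PP → P NP` mirrors a rule above); rules with optional or repeated elements would first have to be expanded and binarized.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: (B, C) -> {A} for A -> B C.
binary_rules = {
    ("NP", "VP"): {"S"},
    ("DT", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}
# Hypothetical lexicon: word -> possible preterminal categories.
lexicon = {
    "the": {"DT"}, "a": {"DT"},
    "dog": {"N"}, "cat": {"N"}, "park": {"N"},
    "saw": {"V"},
    "in": {"P"},
}

def cyk_recognize(words, start="S"):
    """Return True if the toy grammar derives the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the non-terminals that derive words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(word, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((left, right), set())
    return start in chart[0][n]

ok = cyk_recognize("the dog saw a cat in the park".split())
```

The three nested loops over span length, start, and split point are where the cubic dependence on sentence length comes from.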

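each non-terminal symbol represents a syntactic category; terminal symbols are the words themselves

- **non-terminal symbol**: a syntactic category that can be expanded further (see syntax notebook), such as noun phrases (NP), verb phrases (VP), adjectives (JJ)

- **terminal symbol**: an actual word of the language. Conjunctions (e.g., and, or, but), which connect phrases and clauses, can appear directly in a rule as terminals: VP -> VP and VP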

## limitation

Context-free grammars (CFGs) have limitations in capturing agreement features, subcategorization frames, and accounting for context-dependent rules. 


- agreement (e.g., subject-verb agreement)

    Agreement features (number, person, tense, case, and gender) have to be encoded explicitly as separate rules in the grammar, which leads to a large number of rules and increased complexity.

- Subcategorization frames (verb subcategorization): 

    Different verbs take different types of complements (e.g., direct objects, no complement (intransitive verbs), prepositional phrases, predicative adjectives, bare or to-infinitives, participial phrases, that-clauses, question-form clauses), and capturing these distinctions results in a large number of rules.

- context-dependent rules: CFGs, in their probabilistic form, assume that the probability of applying a rule is independent of its context. However, this assumption does not always hold.

    When considering all noun phrases (NPs) in general, 11% are followed by a prepositional phrase (PP), 9% are determiner-noun (DT NN) structures, and 6% are pronouns (PRP).

    When NPs appear in the subject position (under S), 9% are followed by a PP, 9% are DT NN structures, and 21% are pronouns.

    When NPs appear in the object position (under VP), 23% are followed by a PP, 7% are DT NN structures, and 4% are pronouns.

## Probabilistic context-free grammar

Definition: A 4-tuple $(N, \Sigma, R, S)$

- N: non-terminal symbols
- $\Sigma$: terminal symbols (disjoint from N)
- R: rules $(A \to \beta) [p]$
  - A is a non-terminal symbol
  - $\beta$ is a string from $(\Sigma \cup N)^*$
  - p is the probability $P(\beta|A)$; for each non-terminal $A$, the probabilities of all rules $A \to \beta$ sum to 1
- S: start symbol (from N)
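
Under this definition, the probability of a parse tree is the product of the probabilities of the rules used in it. A minimal sketch, assuming a made-up rule table and trees encoded as nested tuples `(label, child, ...)` with words as string leaves:

```python
# Hypothetical rule probabilities; for each left-hand side they sum to 1.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "N")): 0.7,
    ("NP", ("she",)): 0.3,
    ("VP", ("V", "NP")): 1.0,
    ("DT", ("the",)): 1.0,
    ("N", ("dog",)): 1.0,
    ("V", ("saw",)): 1.0,
}

def tree_probability(tree):
    """Multiply the probabilities of all rules used in the tree."""
    if isinstance(tree, str):
        return 1.0  # a word contributes no rule of its own
    label, *children = tree
    # The right-hand side is the sequence of child labels (or words).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = rules[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob

tree = ("S",
        ("NP", "she"),
        ("VP", ("V", "saw"), ("NP", ("DT", "the"), ("N", "dog"))))
p = tree_probability(tree)
```

The only rules with probability below 1 in this tree are `NP → she` (0.3) and `NP → DT N` (0.7), so the tree probability is their product. A probabilistic parser would compare such products across competing trees for the same sentence.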

## Lexicalized Grammar

- basis: X-bar theory and HPSG (Head-driven Phrase Structure Grammar)


- Use the **head** of a phrase as additional information: label each phrase with its head word

    head: the word in a phrase that determines the syntactic and semantic properties of that phrase. 

    child: the other words or constituents in a phrase, which depend on the head. 

    Each constituent inherits its head word from its head child.
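
Head inheritance can be sketched as a bottom-up pass over the tree. The head rules below are a simplified assumption for illustration, not a published head table; trees are nested tuples `(label, child, ...)` and a preterminal is `(tag, word)`.

```python
# Hypothetical head rules: for each phrase label, the child categories to
# try, in priority order, when choosing the head child.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
    "PP": ["P"],
}

def lexicalize(tree):
    """Return (label, head_word, children) with heads passed up the tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0], [])  # preterminal: the word is the head
    lexicalized = [lexicalize(child) for child in children]
    for wanted in HEAD_RULES.get(label, []):
        for child in lexicalized:
            if child[0] == wanted:
                # The phrase inherits the head word of its head child.
                return (label, child[1], lexicalized)
    # Fallback assumption: use the leftmost child as the head.
    return (label, lexicalized[0][1], lexicalized)

tree = ("S",
        ("NP", ("DT", "the"), ("N", "dog")),
        ("VP", ("V", "saw"), ("NP", ("DT", "a"), ("N", "cat"))))
label, head, children = lexicalize(tree)
```

With these rules the sentence is labeled `S(saw)`, its subject `NP(dog)`, and its object `NP(cat)`.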


### Head-driven Phrase Structure Grammar (HPSG)

Head-driven Phrase Structure Grammar (HPSG) describes linguistic objects such as words, phrases, and sentences as bundles of features with values.

<table>
  <tr>
    <th>Eat pizza with pepperoni</th>
    <th>Eat pizza with fork</th>
  </tr>
  <tr>
    <td>
      <pre>
[eat
 CATEGORY = verb,
 ARGS = < [pizza
           CATEGORY = noun,
           MODIFIER = < [with
                         CATEGORY = preposition,
                         ARGS = < [pepperoni
                                   CATEGORY = noun]>]>]>]
      </pre>
    </td>
    <td>
      <pre>
[eat
 CATEGORY = verb,
 ARGS = < [pizza
           CATEGORY = noun,
           MODIFIER = < [with
                         CATEGORY = preposition,
                         ARGS = < [<span style="color:red">fork</span>
                                   CATEGORY = noun]>]>]>]
      </pre>
    </td>
  </tr>
</table>
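
Feature bundles of this kind are typically combined by unification. The sketch below treats a feature structure as a nested Python dict and is deliberately simplified: it is an assumption-laden illustration without the typed structures or structure sharing that full HPSG relies on, and the feature names `SUBJ`, `NUMBER`, and `PERSON` are made up for the example.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts); return None on conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values somewhere below
                result[feature] = merged
            else:
                result[feature] = value  # new information is simply added
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None

# A verb that requires a singular noun as its subject.
verb = {"CATEGORY": "verb", "SUBJ": {"CATEGORY": "noun", "NUMBER": "sg"}}
subject_sg = {"SUBJ": {"NUMBER": "sg", "PERSON": "3"}}
subject_pl = {"SUBJ": {"NUMBER": "pl"}}

agrees = unify(verb, subject_sg)
clashes = unify(verb, subject_pl)
```

A single feature constraint like `NUMBER` stands in for what would otherwise be separate singular and plural rules in a plain CFG.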


<h5>HPSG vs. X-bar Theory</h5>
<table>
  <tr>
    <th>Aspect</th>
    <th>HPSG</th>
    <th>X-bar Theory</th>
  </tr>
  <tr>
    <td>Formalism</td>
    <td>Constraint-based</td>
    <td>Transformational-generative</td>
  </tr>
  <tr>
    <td>Representation</td>
    <td>Feature structures</td>
    <td>Tree structures</td>
  </tr>
  <tr>
    <td>Expressiveness</td>
    <td>More expressive, can capture complex linguistic phenomena</td>
    <td>Less expressive, mainly focused on phrase structure</td>
  </tr>
  <tr>
    <td>Integration of syntax, semantics, and morphology</td>
    <td>Integrated within a single formalism</td>
    <td>Separate components in the theory</td>
  </tr>
</table>
