## How to Describe the Linguistic Structure of Sentences?
To describe the structure of sentences, linguists have proposed two views:

1. Phrase structure (constituency), described with context-free grammars

2. Dependency structure

### Phrase Structure
Phrase structure organises words into nested constituents: words combine into phrases of different types (noun phrases, prepositional phrases, etc.), and phrases combine into larger phrases. For example, given a lexicon such as dog, cat, the, a, sit, on, table, cuddly, by, door..., we can write the following grammar rules:

1. NP → Det (AdjP*) N (PP*)
2. PP → P NP
3. VP → V (NP) (PP*)
4. S  → NP VP
5. AdjP → Adj
6. etc.
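
The rules above can be written down as data and used generatively, i.e., expanded top-down into sentences. Below is a minimal Python sketch under stated assumptions: the names `RULES`, `LEXICON` and `generate` are hypothetical, the lexicon is a subset of the one above (with "sits" standing in for "sit", since agreement is not modeled), and `*` repetitions are capped at two plus a depth limit so that generation always terminates.

```python
import random

# Hypothetical encoding of rules 1-5: each right-hand-side item is
# (symbol, quantifier), where "1" = exactly once, "?" = optional "(X)",
# and "*" = zero or more "X*".
RULES = {
    "S":    [("NP", "1"), ("VP", "1")],
    "NP":   [("Det", "1"), ("AdjP", "*"), ("N", "1"), ("PP", "*")],
    "PP":   [("P", "1"), ("NP", "1")],
    "VP":   [("V", "1"), ("NP", "?"), ("PP", "*")],
    "AdjP": [("Adj", "1")],
}
# Part-of-speech tag -> words (a subset of the lexicon above).
LEXICON = {
    "Det": ["the", "a"],
    "N":   ["dog", "cat", "table", "door"],
    "Adj": ["cuddly"],
    "P":   ["on", "by"],
    "V":   ["sits"],  # inflected "sit"; agreement is not modeled
}


def generate(symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` top-down into a list of words."""
    if symbol in LEXICON:                 # part-of-speech tag: emit a word
        return [rng.choice(LEXICON[symbol])]
    words = []
    for child, q in RULES[symbol]:
        if q == "1":
            count = 1
        elif depth >= max_depth:          # too deep: skip optional items
            count = 0
        elif q == "?":
            count = rng.randint(0, 1)
        else:                             # "*": capped at 2 repeats here
            count = rng.randint(0, 2)
        for _ in range(count):
            words.extend(generate(child, rng, depth + 1, max_depth))
    return words


rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate("S", rng)))
```

Every output is well-formed according to this toy grammar, but many will be semantically odd (a table sitting on a dog, say), because a context-free grammar constrains structure, not meaning.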

With these phrase structure rules, we can parse the linguistic structure of sentences. For example, "A cuddly cat sits on the table by the door":
1. NP1 → "the door" → Det + N
2. PP1 → "by" + NP1
3. NP2 → "the table" → Det + N
4. NP3 → NP2 + PP1
5. PP2 → "on" + NP3
6. VP1 → "sits" + PP2
7. NP4 → "a cuddly cat" → Det + Adj + N

Finally, NP4 and VP1 form the S(entence) by rule 4 (S → NP VP).
<img src="./phrase_structure_eg.jpg" width="400" height="300">
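
The derivation above can also be found mechanically. A minimal sketch, assuming the toy grammar and lexicon of this section (the names `RULES`, `LEXICON`, `parse`, `parse_seq`, `bracket` and `full_parses` are hypothetical): a backtracking recursive-descent parser that enumerates every parse of the sentence. It is only a toy; practical constituency parsers use chart-based algorithms such as CKY.

```python
# Toy grammar: each right-hand-side item is (symbol, quantifier) with
# "1" = exactly once, "?" = optional "(X)", "*" = zero or more "X*".
RULES = {
    "S":    [("NP", "1"), ("VP", "1")],
    "NP":   [("Det", "1"), ("AdjP", "*"), ("N", "1"), ("PP", "*")],
    "PP":   [("P", "1"), ("NP", "1")],
    "VP":   [("V", "1"), ("NP", "?"), ("PP", "*")],
    "AdjP": [("Adj", "1")],
}
LEXICON = {
    "Det": {"the", "a"},
    "N":   {"dog", "cat", "table", "door"},
    "Adj": {"cuddly"},
    "P":   {"on", "by"},
    "V":   {"sits"},
}


def parse(symbol, words, i):
    """Yield (tree, j) for every way `symbol` can cover words[i:j]."""
    if symbol in LEXICON:                      # part-of-speech tag
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    for children, j in parse_seq(RULES[symbol], words, i):
        yield (symbol, children), j


def parse_seq(items, words, i):
    """Yield (children, j) for every way the item sequence covers words[i:j]."""
    if not items:
        yield [], i
        return
    (sym, q), rest = items[0], items[1:]
    if q in ("?", "*"):                        # zero occurrences
        yield from parse_seq(rest, words, i)
    # One occurrence; for "*" the starred item stays in front so it can
    # repeat. Each occurrence consumes at least one word, so this ends.
    after = items if q == "*" else rest
    for tree, j in parse(sym, words, i):
        for children, k in parse_seq(after, words, j):
            yield [tree] + children, k


def bracket(tree):
    """Render a tree as a bracketed string, e.g. (NP (Det the) (N door))."""
    label, body = tree
    if isinstance(body, str):
        return f"({label} {body})"
    return f"({label} " + " ".join(bracket(c) for c in body) + ")"


def full_parses(sentence):
    """All S trees that cover the whole (lowercased) sentence."""
    words = sentence.lower().split()
    return [tree for tree, j in parse("S", words, 0) if j == len(words)]


for tree in full_parses("A cuddly cat sits on the table by the door"):
    print(bracket(tree))
```

It finds two parses: the one derived above, where "by the door" modifies "the table" (NP3 → NP2 + PP1), and one where "by the door" attaches to the verb phrase. The grammar alone cannot decide between these prepositional-phrase attachments.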

### Dependency Structure
Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words. Every word in a sentence, with one exception, depends on another word, its **"head"**; the word that depends on the head is its **"dependent"**. We can represent this relation with directed arrows, but there are two conventions for their direction: **Prague direction**/**UD (Universal Dependencies) style** (dependent→head) and **Tesnière direction** (head→dependent). CS224n adopts the latter, but I prefer the former, so that is what I use here.

To parse the dependency structure of a sentence, we don't classify words as **"heads"** or **"dependents"** in isolation. Instead, we find all dependency relations, i.e., labeled head-dependent pairs. Common dependency relations are listed as follows:

1. nsubj: is the nominal subject of the head
2. obj: is the direct object of the head
3. iobj: is the indirect object of the head
4. det: is the determiner of the head
5. amod: is the adjectival modifier of the head
6. advmod: is the adverbial modifier of the head
7. mark: is the clause marker of the head, including subordinating conjunctions as well as the infinitive marker "to", "for", etc.
8. case: is the case marker of the head; e.g., in German, *mit* marks the dative case and *für* marks the accusative case
9. ROOT: depends on the (fake) ROOT

...

For example, "Look in the crate in the kitchen by the door", we can do the following analysis:

1. the $\overset{\text{det}}{\longrightarrow}$ crate
2. in $\overset{\text{case}}{\longrightarrow}$ crate
3. the $\overset{\text{det}}{\longrightarrow}$ kitchen
4. in $\overset{\text{case}}{\longrightarrow}$ kitchen
5. the $\overset{\text{det}}{\longrightarrow}$ door
6. by $\overset{\text{case}}{\longrightarrow}$ door
7. door $\overset{\text{nmod}}{\longrightarrow}$ kitchen
8. kitchen $\overset{\text{nmod}}{\longrightarrow}$ crate
9. crate $\overset{\text{obl}}{\longrightarrow}$ look
10. look $\overset{\text{ROOT}}{\longrightarrow}$ ROOT
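
The analysis above can be stored as a table recording, for each word, the index of its head (0 for the fake ROOT) and the relation label, similar in spirit to the CoNLL-U format used by UD treebanks. A minimal sketch: `SENTENCE`, `arcs` and `dependents` are hypothetical names, and the labels follow the UD guidelines (`obl` for a nominal attached to a verb, `nmod` for a nominal attached to a noun, and lowercase `root`).

```python
# (id, form, head, deprel); head 0 is the fake ROOT.
SENTENCE = [
    (1, "Look", 0, "root"),
    (2, "in", 4, "case"),
    (3, "the", 4, "det"),
    (4, "crate", 1, "obl"),
    (5, "in", 7, "case"),
    (6, "the", 7, "det"),
    (7, "kitchen", 4, "nmod"),
    (8, "by", 10, "case"),
    (9, "the", 10, "det"),
    (10, "door", 7, "nmod"),
]


def arcs(sentence):
    """Return (dependent, relation, head) triples, i.e. dependent -> head."""
    forms = {0: "ROOT"}
    forms.update({i: form for i, form, _, _ in sentence})
    return [(form, rel, forms[head]) for _, form, head, rel in sentence]


def dependents(sentence, head_id):
    """Forms of all words whose head is `head_id`, in sentence order."""
    return [form for _, form, head, _ in sentence if head == head_id]


for dep, rel, head in arcs(SENTENCE):
    print(f"{dep} --{rel}--> {head}")
```

Storing one head index per word, rather than a list of arcs, guarantees by construction that every word has exactly one head.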

In the UD-style convention I adopt here, a word with no outgoing arrow, i.e., a word that depends on no other word, points to a fake ROOT. Here, that word is "look". Why do we need a fake ROOT? A sentence may have multiple core words; to draw a strictly defined tree rather than a forest, we need a ROOT that connects the different core words.
<center>
<img src="./dependency_structure_eg.jpeg" width="400" height="300">
</center>
We can then draw the analysis as a tree: a graph that is connected, acyclic, and has a single root. This gives us the dependency tree analysis. (I missed the ROOT node in the figure below; it should be the destination of "look".)
<center>
<img src="./dependency_tree_graph.jpg" width="200" height="100">
</center>
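
These tree conditions can be checked directly from the head indices. A minimal sketch with hypothetical names `is_tree` and `root_dependents`: since every word has exactly one head, the graph is a tree rooted at ROOT exactly when following head links from every word reaches ROOT without revisiting a word. UD additionally requires exactly one word to attach to ROOT, which `root_dependents` lets you check.

```python
def is_tree(heads):
    """heads[k - 1] is the head of word k (1-based); 0 denotes ROOT.

    With one head per word, n words give n arcs over n + 1 nodes, so the
    graph is a tree rooted at ROOT iff every word reaches ROOT by
    following head links (connected) without looping (acyclic)."""
    n = len(heads)
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if not 1 <= node <= n or node in seen:
                return False                  # bad head index or a cycle
            seen.add(node)
            node = heads[node - 1]
    return True


def root_dependents(heads):
    """Words attached directly to ROOT (UD expects exactly one)."""
    return [k for k, h in enumerate(heads, start=1) if h == 0]


# "Look in the crate in the kitchen by the door"
heads = [0, 4, 4, 1, 7, 7, 4, 10, 10, 7]
print(is_tree(heads), root_dependents(heads))   # True [1]
print(is_tree([0, 3, 2]))                        # False: words 2 and 3 form a cycle
```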

### From Grammar Rules to Annotated Data
Dependency structure has proved more suitable for data-driven NLP. Here's a teeny introduction to the history:

In the early days, linguists would hand-write specific dependency grammars just to build one particular parser. Once the parser was finished, people evaluated it very subjectively: you typed in a sentence, looked at what the parser output, stared at it, and concluded "Umm, it looks fairly good" or "Well, it's a piece of sh*t".

The advent of treebanks fundamentally changed this process. If we build a treebank of sentences annotated with their dependency parses, we can reuse it to build parsers, part-of-speech taggers, etc. In addition, a treebank can serve as an evaluation benchmark, providing a more quantitative perspective, much like a test set in machine learning.
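
As a concrete example of such quantitative evaluation, parsers are commonly scored against treebank annotations with attachment scores: UAS (unlabeled attachment score) is the fraction of words assigned the correct head, and LAS (labeled attachment score) additionally requires the correct relation label. A minimal sketch: `attachment_scores` is a hypothetical name, the gold tree is the analysis of "Look in the crate in the kitchen by the door" with UD labels, and the prediction is invented purely to illustrate two kinds of error.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one sentence.

    gold, predicted: lists of (head, deprel) pairs, one per word, in the
    same word order (head 0 = ROOT)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same words")
    n = len(gold)
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return correct_heads / n, correct_both / n


# Gold: Look in the crate in the kitchen by the door
gold = [(0, "root"), (4, "case"), (4, "det"), (1, "obl"), (7, "case"),
        (7, "det"), (4, "nmod"), (10, "case"), (10, "det"), (7, "nmod")]
# Hypothetical prediction: "door" attached to "crate" (wrong head) and
# the second "in" labeled "mark" (right head, wrong label).
predicted = list(gold)
predicted[9] = (4, "nmod")
predicted[4] = (7, "mark")

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")   # UAS = 0.9, LAS = 0.8
```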

### Learning to Parse with a Treebank
Today we have powerful treebanks such as [UD](https://universaldependencies.org). Yet, directly using only the features provided by the treebank annotations doesn't give satisfying performance. Naturally, we want to utilise linguistic prior knowledge, i.e., the following sources of information about which dependencies are likely:

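1. Bilexical affinities: how plausible a dependency between two particular words is; e.g., a dependency between *discussion* (head) and *issues* (dependent) is plausible.
2. Dependency distance: most dependencies are between nearby words.
3. Intervening material: dependencies rarely span intervening verbs or punctuation.
4. Valency of heads: how many dependents, and on which side, are usual for a given head.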
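
Several of these sources can be read straight off a parsed sentence. A minimal sketch, assuming a hypothetical `(id, form, upos, head)` encoding with UD part-of-speech tags (`arc_features` and `valency` are hypothetical names): it extracts the (head, dependent) word pair used for bilexical affinities, the dependency distance, whether an arc spans a verb or punctuation, and per-head counts of left and right dependents (valency). A real parser would aggregate such statistics over a whole treebank; here there is only the one example sentence.

```python
from collections import Counter

# (id, form, upos, head); head 0 is the fake ROOT.
SENTENCE = [
    (1, "Look", "VERB", 0), (2, "in", "ADP", 4), (3, "the", "DET", 4),
    (4, "crate", "NOUN", 1), (5, "in", "ADP", 7), (6, "the", "DET", 7),
    (7, "kitchen", "NOUN", 4), (8, "by", "ADP", 10), (9, "the", "DET", 10),
    (10, "door", "NOUN", 7),
]


def arc_features(sentence):
    """One feature dict per arc, in word order (the arc to ROOT is skipped)."""
    by_id = {tok[0]: tok for tok in sentence}
    feats = []
    for i, form, _, head in sentence:
        if head == 0:
            continue
        lo, hi = sorted((i, head))
        between = [by_id[k][2] for k in range(lo + 1, hi)]  # tags strictly between
        feats.append({
            "pair": (by_id[head][1].lower(), form.lower()),  # (head, dependent)
            "distance": hi - lo,
            "spans_verb_or_punct": any(t in ("VERB", "PUNCT") for t in between),
        })
    return feats


def valency(sentence):
    """Map each head's id to (#left dependents, #right dependents)."""
    left, right = Counter(), Counter()
    for i, _, _, head in sentence:
        if head != 0:
            (left if i < head else right)[head] += 1
    return {h: (left[h], right[h]) for h in sorted(set(left) | set(right))}


feats = arc_features(SENTENCE)
pair_counts = Counter(f["pair"] for f in feats)      # bilexical counts
print(pair_counts[("crate", "the")])                 # 1
print([f["distance"] for f in feats])                # [2, 1, 3, 2, 1, 3, 2, 1, 3]
print(any(f["spans_verb_or_punct"] for f in feats))  # False
print(valency(SENTENCE))  # {1: (0, 1), 4: (2, 1), 7: (2, 1), 10: (2, 0)}
```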