## Syntactic Processing

![](https://wisdomml.in/wp-content/uploads/2023/03/synt1.png)

- Syntactic processing refers to the analysis of the structure of sentences according to the rules of grammar.
- It involves understanding how words in a sentence relate to each other and how they combine to form meaningful phrases and sentences.

Two common approaches to syntactic processing are:
1. Parsing with dependency grammar
2. Constituency parsing

### 1. Parsing with Dependency Grammar:

- **Dependency grammar** is a type of syntactic structure analysis that focuses on the relationships between words in a sentence.
- It represents the *grammatical structure of a sentence as a set of binary asymmetric relationships between words*, where one word is considered the 'head' of the relationship and the other is the 'dependent'.

![](https://raw.githubusercontent.com/ashishpatel26/LLM-Engineering-Crash-Course/main/images/1234.png)

In this dependency parse, the verb "eats" is the head of both relations: "John" is its subject and "pizza" is its object.
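As a minimal sketch, the "John eats pizza" parse can be written down by hand as a list of (head, relation, dependent) triples. Each triple is binary (two words) and asymmetric (swapping head and dependent changes its meaning). The labels `nsubj` and `dobj` follow spaCy's English label scheme and are an assumption here; `Arc`, `dependents_of` and `head_of` are hypothetical helpers, not part of any library.

```python
from typing import List, NamedTuple, Optional, Tuple

class Arc(NamedTuple):
    """One binary, asymmetric relation: `dependent` attaches to `head` with a label."""
    head: str
    relation: str
    dependent: str

# Hand-written parse of "John eats pizza" (spaCy-style labels, assumed for illustration)
arcs: List[Arc] = [
    Arc(head="eats", relation="nsubj", dependent="John"),
    Arc(head="eats", relation="dobj", dependent="pizza"),
]

def dependents_of(word: str, arcs: List[Arc]) -> List[Tuple[str, str]]:
    """Return (relation, dependent) pairs for every arc headed by `word`."""
    return [(a.relation, a.dependent) for a in arcs if a.head == word]

def head_of(word: str, arcs: List[Arc]) -> Optional[str]:
    """Return the head of `word`, or None if no arc points at it (i.e. it is the root)."""
    for a in arcs:
        if a.dependent == word:
            return a.head
    return None

print(dependents_of("eats", arcs))  # [('nsubj', 'John'), ('dobj', 'pizza')]
print(head_of("John", arcs))        # eats
print(head_of("eats", arcs))        # None
```

Keying arcs by the word string only works for toy sentences; real parsers identify tokens by position so that repeated words such as "the" stay distinct.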

Parsing with dependency grammar involves identifying these relationships between words and representing them in a tree-like structure called a dependency parse tree.

```python
import spacy
from spacy import displacy
from IPython.display import display, HTML

# Load the English language model
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Define the text to process
text = "The quick brown fox jumps over the lazy dog."

# ========================  Step 1: Tokenization
doc = nlp(text)
print("Tokenization:")
for token in doc:
    print(token.text)

# ========================  Step 2: Part-of-speech tagging
print("\nPart-of-speech tagging:")
for token in doc:
    print(f"{token.text}: {token.pos_}")

# ========================  Step 3: Dependency parsing
print("\nDependency parsing:")
for token in doc:
    print(f"{token.text}: {token.dep_} -> {token.head.text}")

# ========================  Visualization
# Render the dependency parse tree; jupyter=False makes render() return the markup as a string
html = displacy.render(doc, style="dep", jupyter=False)

# Display the result in the Jupyter notebook
display(HTML(html))
```

```text
Tokenization:
The
quick
brown
fox
jumps
over
the
lazy
dog
.

Part-of-speech tagging:
The: DET
quick: ADJ
brown: ADJ
fox: NOUN
jumps: VERB
over: ADP
the: DET
lazy: ADJ
dog: NOUN
.: PUNCT

Dependency parsing:
The: det -> fox
quick: amod -> fox
brown: amod -> fox
fox: nsubj -> jumps
jumps: ROOT -> jumps
over: prep -> jumps
the: det -> dog
lazy: amod -> dog
dog: pobj -> over
.: punct -> jumps
```
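The head/label pairs printed in the dependency parsing output above are enough to rebuild the dependency parse tree. The sketch below copies them by hand into a list of 1-based head indices (0 marks the root, a CoNLL-style convention assumed here), checks that they form a tree, and prints it with indentation. The helper names are hypothetical and no spaCy calls are involved.

```python
from typing import Dict, List

# Tokens, labels and 1-based head indices copied from the dependency output above (0 = root)
tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]
labels = ["det", "amod", "amod", "nsubj", "ROOT", "prep", "det", "amod", "pobj", "punct"]
heads = [4, 4, 4, 5, 0, 5, 9, 9, 6, 5]

def build_children(heads: List[int]) -> Dict[int, List[int]]:
    """Map every index (0 = virtual root) to the 1-based indices of its dependents, in order."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(heads) + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)
    return children

def is_tree(heads: List[int]) -> bool:
    """True if exactly one token hangs off the root and walking up from any token never cycles."""
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def render(node: int, children: Dict[int, List[int]], depth: int = 0) -> List[str]:
    """Depth-first, indented 'word (label)' lines for `node` and its subtree."""
    lines = [f"{'  ' * depth}{tokens[node - 1]} ({labels[node - 1]})"]
    for child in children[node]:
        lines.extend(render(child, children, depth + 1))
    return lines

if is_tree(heads):
    children = build_children(heads)
    print("\n".join(render(children[0][0], children)))
```

With these heads the printout starts at `jumps (ROOT)`, with `fox`, `over` and `.` indented one level below it, which is the same hierarchy the displaCy arcs draw.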


### 2. Constituency Parsing:

- **Constituency parsing**, also known as ***phrase structure parsing***, focuses on identifying the constituents or phrases in a sentence and the hierarchical relationships between them.
- It breaks down a *sentence into its constituent parts based on the rules of a formal grammar.*

![](https://github.com/ashishpatel26/LLM-Engineering-Crash-Course/blob/main/images/contparse.png?raw=true)

In this constituency parse, the sentence is divided into noun phrases (NP) and verb phrases (VP), and further into their constituent parts such as determiners (Det), nouns (Noun), and verbs (Verb).
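To make the hierarchy concrete, here is a minimal sketch that stores a hand-written constituency tree as nested `(label, children)` tuples, using the labels from the description above (S, NP, VP, Det, Noun, Verb). The sentence and its tree are assumptions chosen for illustration, and `leaves`, `constituents` and `bracketed` are hypothetical helpers.

```python
from typing import List, Tuple, Union

# A tree is either a word (str) or a (label, [children]) pair
Tree = Union[str, Tuple[str, list]]

# Hand-written constituency parse of "The cat chased the mouse" (assumed for illustration)
tree: Tree = ("S", [
    ("NP", [("Det", ["The"]), ("Noun", ["cat"])]),
    ("VP", [
        ("Verb", ["chased"]),
        ("NP", [("Det", ["the"]), ("Noun", ["mouse"])]),
    ]),
])

def leaves(t: Tree) -> List[str]:
    """Return the words under `t`, left to right."""
    if isinstance(t, str):
        return [t]
    _, children = t
    return [w for c in children for w in leaves(c)]

def constituents(t: Tree, labels=("NP", "VP")) -> List[Tuple[str, str]]:
    """Collect (label, phrase text) for every subtree whose label is in `labels`, top-down."""
    if isinstance(t, str):
        return []
    label, children = t
    found = [(label, " ".join(leaves(t)))] if label in labels else []
    for c in children:
        found.extend(constituents(c, labels))
    return found

def bracketed(t: Tree) -> str:
    """Serialize the tree in the common (LABEL child child ...) bracket notation."""
    if isinstance(t, str):
        return t
    label, children = t
    return f"({label} " + " ".join(bracketed(c) for c in children) + ")"

print(bracketed(tree))
# (S (NP (Det The) (Noun cat)) (VP (Verb chased) (NP (Det the) (Noun mouse))))
print(constituents(tree))
# [('NP', 'The cat'), ('VP', 'chased the mouse'), ('NP', 'the mouse')]
```

Note that the object NP ("the mouse") sits inside the VP, which is exactly the nesting a flat list of word-to-word arcs does not show directly.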

Constituency parsing generates parse trees that represent the hierarchical structure of sentences according to the rules of a given grammar.
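One classic way to build such trees from a grammar is the CKY algorithm, which fills a chart of spans bottom-up using rules in Chomsky normal form. The sketch below assumes a tiny hand-written grammar (`LEXICON` and `BINARY_RULES` are hypothetical and cover only a few words); practical parsers use large probabilistic or neural models instead, so treat this as an illustration of the idea rather than a usable English parser.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy grammar in Chomsky normal form (an assumption for illustration only)
LEXICON: Dict[str, List[str]] = {
    "the": ["Det"], "cat": ["Noun"], "mouse": ["Noun"], "chased": ["Verb"],
}
BINARY_RULES: List[Tuple[str, str, str]] = [  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("NP", "Det", "Noun"),
    ("VP", "Verb", "NP"),
]

def cky_parse(words: List[str], start: str = "S"):
    """Return one (label, children) parse tree for `words`, or None if the grammar cannot derive them."""
    n = len(words)
    # chart[(i, j)] maps a label to a backpointer for the span words[i:j]
    chart: Dict[Tuple[int, int], Dict[str, object]] = defaultdict(dict)
    for i, w in enumerate(words):
        for tag in LEXICON.get(w.lower(), []):
            chart[(i, i + 1)][tag] = w  # preterminal: backpointer is the word itself
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    if left in chart[(i, k)] and right in chart[(k, j)] and parent not in chart[(i, j)]:
                        chart[(i, j)][parent] = (k, left, right)

    def build(label: str, i: int, j: int):
        back = chart[(i, j)][label]
        if isinstance(back, str):
            return (label, [back])
        k, left, right = back
        return (label, [build(left, i, k), build(right, k, j)])

    return build(start, 0, n) if start in chart[(0, n)] else None

print(cky_parse("The cat chased the mouse".split()))
print(cky_parse("cat the chased".split()))  # not derivable under this grammar
```

The chart keeps only the first derivation found for each label and span, so for an ambiguous grammar this returns one tree rather than all of them.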

```python
import spacy
from spacy import displacy
from IPython.display import display, HTML

# Load the English language model.
# Note: spaCy's pretrained pipelines include a dependency parser but no
# constituency parser, so the structures below are dependency-based.
nlp = spacy.load("en_core_web_sm")

# Sample sentence
sentence = "The cat chased the mouse."

# Process the sentence using spaCy
doc = nlp(sentence)

# Build a CoNLL-U style table from the dependency parse:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD (0 = root), DEPREL, DEPS, MISC
conll_format = "\n".join(
    f"{token.i + 1}\t{token.text}\t{token.lemma_}\t{token.pos_}\t{token.tag_}\t_\t"
    f"{0 if token.head.i == token.i else token.head.i + 1}\t{token.dep_}\t_\t_"
    for token in doc
)
print(conll_format)

# Render the dependency parse as HTML (jupyter=False makes render() return the markup)
html = displacy.render(doc, style="dep", jupyter=False,
                       options={"compact": True, "bg": "#09a3d5", "color": "white"})

# Display the HTML
display(HTML(html))
```