
## **2. What's the Use of Syntax?**



- Syntax helps us understand and create meaningful sentences.
  - For example, in the sentence "The cat chased the mouse," syntax tells us that "the cat" is performing the action and "the mouse" is receiving it.
- It provides a structure for organizing words to make sense.
  - Without syntax, word arrangements like "mouse the cat chased" would be confusing or meaningless.
- Syntax helps reduce confusion and makes language easier to understand.
  - It clarifies the roles of different words in a sentence.



### **2.1 Beyond n-grams**



#### **Syntax and Its Role in Language Understanding**
  - Syntax defines rules that help us understand relationships between words.
  - For example, "The cat chased the mouse" uses syntax to tell us who did the action (the cat) and what was acted upon (the mouse).
  - Without clear syntax, the relationships between words can become unclear, leading to confusion.



#### **Syntax as a System for Structuring Words**
  - Syntax arranges words in a way that makes meaning clear.
  - For example, changing word order can change the meaning completely:
    - "The dog bites the man" is very different from "The man bites the dog."
    - In both cases, syntax helps us understand who is doing the action.



#### **Limitations of n-gram Models**

N-gram models predict the next word based on a fixed number of preceding words, which leads to several limitations in understanding complex grammar and relationships in sentences:



##### **1. Limited Context and Lack of Grammatical Understanding**

- N-gram models consider only a fixed number of preceding words, lacking an understanding of grammatical rules such as subject-verb agreement or sentence structure.
- **Example:**
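  - A bigram model (n=2) can assign high probability to common sequences like "the cat" or "a dog" because they frequently appear together.
  - It has no view of the sentence as a whole, though. A string like "the cat chased the dog ran away" is built entirely from plausible word pairs ("the cat," "cat chased," "chased the," "the dog," "dog ran," "ran away"), so a bigram model may score it highly even though it is not a grammatical sentence.
  - The reverse also happens: "the cat the dog chased ran away" is grammatical (it means "the cat that the dog chased ran away"), yet a bigram model is likely to penalize it because the pair "cat the" is rare.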
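A minimal sketch of this effect, using a three-sentence toy corpus invented for illustration. The corpus and the helper names `bigrams_of` and `all_bigrams_seen` are assumptions, not part of NLTK. The check only asks whether every adjacent word pair was seen in training, which is a crude stand-in for a bigram model assigning non-zero probability:

```python
from collections import Counter

# Toy training corpus (hypothetical, chosen only for illustration).
corpus = [
    "the cat chased the dog",
    "the dog ran away",
    "the cat ran away",
]

def bigrams_of(tokens):
    """Return the list of adjacent word pairs in a token list."""
    return list(zip(tokens, tokens[1:]))

# Count every bigram that occurs in the training sentences.
seen = Counter()
for line in corpus:
    seen.update(bigrams_of(line.split()))

def all_bigrams_seen(sentence):
    """True if every adjacent word pair in the sentence occurred in training."""
    return all(pair in seen for pair in bigrams_of(sentence.split()))

# Every pair is familiar, yet the string as a whole is not a sentence.
print(all_bigrams_seen("the cat chased the dog ran away"))  # True

# A scrambled string fails, so the check is not vacuous.
print(all_bigrams_seen("cat the chased dog"))  # False
```

The first string passes because each pair is attested somewhere in the corpus; nothing in the model checks that the pairs fit together into one well-formed clause.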



##### **2. Inability to Capture Long-Distance Dependencies**

- N-gram models struggle with long-distance relationships between words, which are crucial for complex grammatical structures.
- **Example:**
  - Consider the sentence: "The dog that runs fast is happy."
  - The subject "dog" is linked to the verb "is," but an n-gram model that only uses fixed word sequences (e.g., trigrams) cannot easily establish this connection due to the intervening phrase "that runs fast."
  
  - **Demonstration in Python:**

```python
# `bigrams` yields each pair of consecutive items in a sequence.
from nltk import bigrams

# Split the sentence on whitespace into a list of word tokens:
# ['The', 'dog', 'that', 'runs', 'fast', 'is', 'happy']
sentence = "The dog that runs fast is happy".split()

# `bigrams` returns a lazy iterator, so wrap it in list() to print it.
bi_grams = list(bigrams(sentence))

print(bi_grams)
```

Output:

```
[('The', 'dog'), ('dog', 'that'), ('that', 'runs'), ('runs', 'fast'), ('fast', 'is'), ('is', 'happy')]
```

No pair contains both "dog" and "is," so a bigram model never sees the subject and its verb together.
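The same question can be asked for larger windows. The sketch below is plain Python, and the helper names `ngrams_of` and `windows_linking` are hypothetical. It lists the n-grams that contain both the subject "dog" and the verb "is":

```python
def ngrams_of(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def windows_linking(tokens, n, first, second):
    """Return the n-grams that contain both `first` and `second`."""
    return [w for w in ngrams_of(tokens, n) if first in w and second in w]

tokens = "The dog that runs fast is happy".split()

for n in (2, 3, 4, 5):
    print(n, windows_linking(tokens, n, "dog", "is"))
# 2 []
# 3 []
# 4 []
# 5 [('dog', 'that', 'runs', 'fast', 'is')]
```

Only a five-word window covers both words here, and a longer relative clause would push the required window size up further.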



##### **3. Lack of Generalization**

- N-gram models rely heavily on statistical frequencies, making them poor at generalizing beyond frequently occurring word sequences.
- **Example:**
  - A trigram model may predict a common sequence like "I am going" but assign little or no probability to a less common but grammatically valid phrase like "I am contemplating."
  - This is because the model's predictions come from counts of word sequences seen in training. It has no notion that "going" and "contemplating" are both verbs that fit the same slot, so it cannot generalize from one to the other.
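As a rough illustration, the sketch below estimates trigram probabilities by relative frequency on a tiny invented corpus. The corpus and the function name `mle_prob` are assumptions made for this example. An unseen continuation receives probability zero even though it is grammatical:

```python
from collections import Counter

# Toy corpus (hypothetical): "contemplating" occurs, but never after "i am".
corpus = [
    "i am going home",
    "i am going out",
    "she is contemplating life",
]

trigram_counts = Counter()
context_counts = Counter()
for line in corpus:
    tokens = line.split()
    for i in range(len(tokens) - 2):
        context = (tokens[i], tokens[i + 1])
        trigram_counts[context + (tokens[i + 2],)] += 1
        context_counts[context] += 1

def mle_prob(w1, w2, w3):
    """Estimate P(w3 | w1, w2) by relative frequency; 0.0 for an unseen context."""
    context = (w1, w2)
    if context_counts[context] == 0:
        return 0.0
    return trigram_counts[(w1, w2, w3)] / context_counts[context]

print(mle_prob("i", "am", "going"))          # 1.0
print(mle_prob("i", "am", "contemplating"))  # 0.0
```

Smoothing techniques move some probability mass to unseen sequences, but they do so without any knowledge of grammar.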


## **3. Context-Free Grammar**

- Context-Free Grammar (CFG) is a way to describe the structure of sentences using rules.
- CFG is widely used in natural language processing to represent how sentences are built.



### **3.1 A Simple Grammar**



#### **Introduction to CFG**
  - CFG uses a set of rules to create valid sentences.
  - These rules break down sentences into smaller parts like phrases and words.



#### **Components of Context-Free Grammar (CFG)**

A CFG is a formalism for describing the syntax of a language. It consists of a set of terminals, a set of non-terminals, a set of production rules, and a designated start symbol (conventionally `S`, for sentence) from which every derivation begins. Together these define how a sentence can be constructed.



##### **1. Terminals**
- **Definition**: Terminals are the basic symbols or words that appear in a sentence. These are the actual words of the language and cannot be broken down further.
- **Examples**:
  - Words like "cat," "runs," "the," and "dog" are terminals.
  - When `NP -> Det N` is expanded further with `Det -> 'the'` and `N -> 'cat'`, the words "the" and "cat" are the terminals that appear in the final generated sentence; `Det` and `N` themselves are not terminals.



##### **2. Non-terminals**
- **Definition**: Non-terminals are symbols used to represent parts of a sentence. These symbols are placeholders that can be expanded using production rules until they are replaced by terminals.
- **Examples**:
  - Symbols like `S`, `NP`, `VP`, `Det`, and `N` are non-terminals.
  - `NP` represents a noun phrase, `VP` represents a verb phrase, and `S` represents a complete sentence.
  - These symbols are used to break down the sentence into its constituent parts during parsing.



##### **3. Rules/Productions**
- **Definition**: Production rules define how non-terminals can be expanded into terminals or other non-terminals. Each production rule describes a possible structure or combination of components that can form a valid part of the sentence.
- **Examples of Production Rules**:
  - `S -> NP VP`: This rule states that a sentence (`S`) consists of a noun phrase (`NP`) followed by a verb phrase (`VP`).
  - `NP -> Det N`: This rule indicates that a noun phrase (`NP`) consists of a determiner (`Det`) followed by a noun (`N`).
  - `VP -> V NP`: This rule defines that a verb phrase (`VP`) consists of a verb (`V`) followed by a noun phrase (`NP`).
- **Explanation**:
  - Production rules are applied repeatedly, each step replacing one non-terminal with the right-hand side of one of its rules, until only terminals remain. The resulting sequence of terminals is a sentence generated by the grammar.
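One way to make the three components concrete is to store the rules in a plain Python dictionary and derive the symbol sets from it. This is only a sketch: the dictionary layout and the names `rules` and `start_symbol` are choices made for this example, and the lexical rules for `Det`, `N`, and `V` are added so that the grammar bottoms out in words.

```python
# Each key is a non-terminal; each value lists its alternative right-hand sides.
rules = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}
start_symbol = "S"

# Non-terminals are exactly the symbols that have rules of their own.
nonterminals = set(rules)

# Terminals are the right-hand-side symbols that cannot be expanded further.
terminals = {
    symbol
    for alternatives in rules.values()
    for rhs in alternatives
    for symbol in rhs
    if symbol not in nonterminals
}

print(sorted(nonterminals))  # ['Det', 'N', 'NP', 'S', 'V', 'VP']
print(sorted(terminals))     # ['cat', 'chased', 'dog', 'the']
```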



##### **4. Example Sentence Generation**
- **Using the Production Rules**:
  - Consider the following production rules:
    - `S -> NP VP`
    - `NP -> Det N`
    - `VP -> V NP`
    - `Det -> 'the'`
    - `N -> 'dog' | 'cat'`
    - `V -> 'chased'`
  - By applying these rules, we can generate the sentence "the dog chased the cat" (the grammar produces lowercase tokens; capitalization and punctuation are added afterwards).
    - Step-by-step expansion:
      1. Start with `S`.
      2. Apply `S -> NP VP` to get `NP VP`.
      3. Expand `NP` using `NP -> Det N` to get `Det N`.
      4. Expand `Det` to `the` and `N` to `dog`, resulting in `the dog`.
      5. Expand `VP` using `VP -> V NP` to get `V NP`.
      6. Expand `V` to `chased` and `NP` to `Det N` again.
      7. Expand `Det` to `the` and `N` to `cat`, resulting in `the cat`.
      8. The final sentence is "the dog chased the cat."
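The expansion steps above can be traced mechanically. The sketch below is plain Python; the function name `leftmost_derivation` and the `choices` list are hypothetical and exist only for this illustration. It always expands the leftmost non-terminal and uses `choices` to pick which alternative of a rule to apply:

```python
rules = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}

def leftmost_derivation(start, choices):
    """Expand the leftmost non-terminal until only terminals remain.

    `choices` supplies, in order, the index of the alternative to use
    at each expansion step.
    """
    form = [start]
    steps = [list(form)]
    picks = iter(choices)
    while any(symbol in rules for symbol in form):
        # Position of the leftmost symbol that still has rules.
        i = next(idx for idx, symbol in enumerate(form) if symbol in rules)
        rhs = rules[form[i]][next(picks)]
        form = form[:i] + rhs + form[i + 1:]
        steps.append(list(form))
    return steps

# Every pick is alternative 0 except the last, which selects 'cat' for N.
steps = leftmost_derivation("S", [0, 0, 0, 0, 0, 0, 0, 0, 1])
for step in steps:
    print(" ".join(step))
```

The last line printed is `the dog chased the cat`. Changing the fourth choice to `1` would produce `the cat chased the cat` instead.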


### **3.2 Writing Your Own Grammars**



#### **Techniques for Writing CFGs in NLTK**
  - NLTK is a popular Python library for natural language processing. Its `CFG.fromstring` method builds a grammar from rules written as text.
  - The grammar below can generate sentences like "the dog chased the cat" or "the cat saw the dog."


```python
from nltk import CFG

# Define a simple CFG using the 'fromstring' method.
# The grammar rules are specified inside a triple-quoted string.
# Avoid trailing comments on rule lines inside the string;
# put any comments on separate lines.
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased' | 'saw'
""")
```

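To see what this grammar covers, the sketch below (assuming NLTK is installed) enumerates every sentence it generates with `nltk.parse.generate.generate` and then parses one of them with `nltk.ChartParser`. The choice of a chart parser and the variable names are assumptions made for this example.

```python
from nltk import CFG, ChartParser
from nltk.parse.generate import generate

grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased' | 'saw'
""")

# Enumerate all sentences: 2 subject nouns x 2 verbs x 2 object nouns.
sentences = [" ".join(tokens) for tokens in generate(grammar)]
print(len(sentences))
for sentence in sentences:
    print(sentence)

# Parse one sentence back into its tree structure.
parser = ChartParser(grammar)
trees = list(parser.parse("the cat saw the dog".split()))
for tree in trees:
    print(tree)
```

The input is lowercase because terminals are matched as exact strings; a token such as "The" is not covered by this grammar.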

#### **Practical Considerations in Grammar Creation**
  - Aim for a grammar that covers the sentences you care about without also accepting ungrammatical ones.
  - For example, the rule `S -> NP VP` on its own accepts any noun phrase followed by any verb phrase. How permissive the grammar is depends on how narrowly `NP` and `VP` are defined by the remaining rules.
  - CFGs can sometimes produce multiple valid structures for the same sentence, leading to ambiguity.
    - For instance, "I saw the man with the telescope" can mean either that you used the telescope to see the man (the prepositional phrase attaches to the verb phrase) or that the man had a telescope (it attaches to the noun phrase). Carefully designed rules can reduce such ambiguity, though they rarely eliminate it.
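The telescope example can be reproduced with a small grammar. The rules below are an assumption written for this illustration (they are not taken from the notes above), and the sketch assumes NLTK is installed. A chart parser is used because the rules `NP -> NP PP` and `VP -> VP PP` are left-recursive.

```python
from nltk import CFG, ChartParser

# Hypothetical grammar in which a PP may attach to a VP or to an NP.
grammar = CFG.fromstring("""
S -> NP VP
NP -> 'I' | Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = ChartParser(grammar)
tokens = "I saw the man with the telescope".split()
trees = list(parser.parse(tokens))

print(len(trees))
for tree in trees:
    print(tree)
```

The parser returns two trees: one attaches "with the telescope" to the verb phrase (the seeing was done with the telescope) and the other attaches it to the noun phrase "the man."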



### **3.3 Recursion in Syntactic Structure**



#### **Understanding Recursion in CFG**
  - Recursion allows CFGs to represent complex and nested sentence structures.
  - For example:
    - `NP -> NP PP`
    - `PP -> P NP`
    - `P -> 'with'`
    - `N -> 'telescope'`
  - Together with `NP -> Det N` and nouns such as `'man'` and `'lens'`, these rules allow for noun phrases like "the man with the telescope with the lens," where phrases are nested within each other.
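A recursive rule maps naturally onto a recursive function. The sketch below is plain Python, and the names `build_np`, `leaves`, and `depth` are hypothetical. It applies `NP -> NP PP` once per extra noun and bottoms out in `NP -> Det N`, representing each tree as nested tuples of the form `(label, child, ...)`:

```python
def build_np(nouns):
    """Build a noun phrase tree, nesting one PP per noun after the first."""
    if len(nouns) == 1:
        # Base case: NP -> Det N
        return ("NP", ("Det", "the"), ("N", nouns[0]))
    inner = build_np(nouns[:-1])                       # the nested NP
    pp = ("PP", ("P", "with"), build_np(nouns[-1:]))   # PP -> P NP
    return ("NP", inner, pp)                           # NP -> NP PP

def leaves(tree):
    """Collect the words at the bottom of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def depth(tree):
    """Number of labelled levels above the deepest word."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

for nouns in (["man"], ["man", "telescope"], ["man", "telescope", "lens"]):
    tree = build_np(nouns)
    print(" ".join(leaves(tree)), "| depth:", depth(tree))
```

Each additional noun wraps the previous noun phrase inside a new one, so the tree grows deeper while the rule set stays the same size.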



#### **Direct and Indirect Recursion**
  - **Direct Recursion**: A rule refers to itself directly.
    - Example: `NP -> NP PP`.
    - This allows phrases to expand repeatedly, such as "The book on the table in the room."
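  - **Indirect Recursion**: A rule refers back to itself through other rules.
    - Example: `S -> NP VP` and `VP -> V S`. Expanding `S` can produce a `VP`, which in turn can contain another `S`.
    - This allows sentences to have embedded clauses, like "He said she knew the answer." (Covering "He said that she knew the answer" would need an additional rule for the complementizer "that.")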
  - Recursion is important for capturing how language can include phrases within phrases, such as clauses in sentences or prepositional phrases within noun phrases.
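Whether a non-terminal is directly or indirectly recursive can be checked by following its rules like edges in a graph. The sketch below is plain Python; the rule set combines the examples above, word-level rules are omitted for brevity, and the function names `children`, `reachable`, and `classify` are hypothetical.

```python
# Phrase-level rules only; Det, N, V, and P have no entries here.
rules = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["NP", "PP"]],
    "VP": [["V", "NP"], ["V", "S"]],
    "PP": [["P", "NP"]],
}

def children(symbol):
    """Non-terminals appearing on the right-hand side of any rule for `symbol`."""
    return {s for rhs in rules.get(symbol, []) for s in rhs if s in rules}

def reachable(symbol):
    """Non-terminals reachable from `symbol` in one or more expansion steps."""
    seen, stack = set(), list(children(symbol))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children(current))
    return seen

def classify(symbol):
    """Label a non-terminal as 'direct', 'indirect', or 'none'."""
    if symbol in children(symbol):
        return "direct"
    if symbol in reachable(symbol):
        return "indirect"
    return "none"

for symbol in rules:
    print(symbol, classify(symbol))
# S indirect
# NP direct
# VP indirect
# PP indirect
```

`PP` comes out as indirectly recursive because `PP -> P NP` leads to `NP -> NP PP`, which leads back to `PP`.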

Syntax and Context-Free Grammars give us an explicit account of sentence structure. CFGs provide clear rules for generating and analyzing sentences, and they capture features such as recursion that n-gram models miss, which makes them a core tool for natural language processing.
    