
**Introduction to Probabilistic Context-Free Grammars (PCFG)**

Aim:
To develop Probabilistic Context-Free Grammars (PCFGs) that handle the ambiguity of natural language by assigning probabilities to the possible parse trees of a sentence and selecting the most likely one.
1. Introduction to Probabilistic Context-Free Grammars (PCFG) in Natural Language Processing:
Probabilistic Context-Free Grammars (PCFGs) extend Context-Free Grammars (CFGs) by attaching a probability to each production rule. This lets them handle the inherent ambiguity of natural language: when a sentence has several possible parse trees, each tree receives a probability and the most likely one can be chosen. PCFGs are widely used in natural language processing (NLP) for tasks such as syntactic parsing, where the goal is to determine the grammatical structure of a sentence.

1.1 Context-Free Grammar (CFG):
A CFG consists of a set of production rules that define how symbols in the language can be rewritten as other symbols.
A CFG is defined by a tuple (𝑁, Σ, 𝑃, 𝑆):
𝑁: a set of non-terminal symbols (syntactic categories such as S, NP, VP).
Σ: a set of terminal symbols (the actual words).
𝑃: a set of production rules of the form 𝐴 → β, where 𝐴 ∈ 𝑁 and β is a sequence of symbols from 𝑁 ∪ Σ.
𝑆: the start symbol, with 𝑆 ∈ 𝑁.
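As a minimal sketch in plain Python (no NLTK), the four components of the tuple can be written out for the grammar used in Section 2, together with a check that the tuple describes a valid CFG. The names `N`, `SIGMA`, `P`, `S`, `is_well_formed` and the `(lhs, rhs)` rule encoding are illustrative assumptions, not an NLTK API.

```python
# A CFG as the tuple (N, Sigma, P, S), written as plain Python data.
# This encoding is an illustrative assumption, not how NLTK stores grammars.
N = {"S", "NP", "VP", "V"}                  # non-terminal symbols
SIGMA = {"John", "Mary", "eats", "sleeps"}  # terminal symbols (the words)
P = [                                       # production rules as (lhs, rhs)
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("V", ("eats",)),
    ("V", ("sleeps",)),
    ("NP", ("John",)),
    ("NP", ("Mary",)),
]
S = "S"                                     # start symbol


def is_well_formed(nonterminals, terminals, rules, start):
    """Return True if the tuple describes a valid CFG: N and Sigma are
    disjoint, the start symbol is a non-terminal, every left-hand side is a
    non-terminal, and every right-hand-side symbol is in N or Sigma."""
    if not nonterminals.isdisjoint(terminals) or start not in nonterminals:
        return False
    symbols = nonterminals | terminals
    return all(
        lhs in nonterminals and all(sym in symbols for sym in rhs)
        for lhs, rhs in rules
    )


print(is_well_formed(N, SIGMA, P, S))  # True
```

A rule whose left-hand side is not in `N`, or a start symbol outside `N`, makes the check return `False`.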


1.2 Probabilistic Extension:
In a PCFG, each production rule is assigned a probability.
For each non-terminal symbol, the probabilities of all rules with that symbol on the left-hand side sum to 1.
The probability of a parse tree is the product of the probabilities of all the rules used to generate it.
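In symbols, writing $q(A \to \beta)$ for the probability attached to the rule $A \to \beta$ (the letter $q$ is notation introduced here) and $r_1, \dots, r_n$ for the rules used to build a parse tree $T$, the two conditions above read:

$$
\sum_{\beta \,:\, (A \to \beta) \in P} q(A \to \beta) = 1 \quad \text{for every } A \in N,
\qquad
p(T) = \prod_{i=1}^{n} q(r_i)
$$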

2. Example of a PCFG
Consider the following PCFG:
  1. S -> NP VP [1.0]
  2. VP -> V NP [0.7] | V [0.3]
  3. V -> 'eats' [0.5] | 'sleeps' [0.5]
  4. NP -> 'John' [0.5] | 'Mary' [0.5]
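This grammar generates simple sentences consisting of a subject noun phrase (NP) followed by a verb phrase (VP), where the VP is either a verb alone or a verb followed by an object NP.
Each production rule has an associated probability, and the probabilities for each left-hand side (S, VP, V, NP) sum to 1.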
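A quick way to confirm the sum-to-1 condition for this grammar is to group the rules by their left-hand side and add up the probabilities. The sketch below does that in plain Python; the triple encoding and the name `totals_by_lhs` are illustrative, not part of NLTK.

```python
import math
from collections import defaultdict

# The example PCFG above, written as (lhs, rhs, probability) triples.
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("VP", ("V", "NP"), 0.7),
    ("VP", ("V",), 0.3),
    ("V", ("eats",), 0.5),
    ("V", ("sleeps",), 0.5),
    ("NP", ("John",), 0.5),
    ("NP", ("Mary",), 0.5),
]


def totals_by_lhs(rules):
    """Sum the probabilities of all rules that share a left-hand side."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    return dict(totals)


totals = totals_by_lhs(RULES)
print(totals)
# Every total must be 1; math.isclose tolerates floating-point rounding.
print(all(math.isclose(t, 1.0) for t in totals.values()))  # True
```

Adding a rule such as NP -> 'Bob' [0.2] without lowering the other NP probabilities would push the NP total to 1.2 and fail the check.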

3. Parsing with PCFGs
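Conceptually, parsing with a PCFG means considering every possible parse tree for a sentence and computing the probability of each; the tree with the highest probability is chosen as the best parse. In practice, chart parsers (such as the NLTK parsers used below) use dynamic programming to share work between trees that have common subtrees, rather than building every tree from scratch.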
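The grammar in Section 2 gives each of its example sentences exactly one parse, so it never has to choose between trees. To show the selection step on a genuinely ambiguous sentence, here is a brute-force sketch in plain Python using a small hypothetical grammar (its rules and probabilities are made up for illustration and do not come from this notebook) in which "John saw Mary with Bob" has two parses: "with Bob" can attach to the verb phrase or to the noun phrase "Mary". The hypothetical functions `parses` and `parse_sequence` enumerate every tree with its probability; they assume no empty rules and no unary cycles. Unlike a chart parser, this sketch recomputes shared subtrees and can take exponential time, so it only suits toy grammars.

```python
# Hypothetical toy PCFG (made-up rules and probabilities, not from this
# notebook) in which "John saw Mary with Bob" is ambiguous.
# Keys are non-terminals; any symbol that is not a key is a terminal word.
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "VP": [(("V", "NP"), 0.6), (("VP", "PP"), 0.4)],
    "NP": [(("NP", "PP"), 0.2), (("John",), 0.3), (("Mary",), 0.3), (("Bob",), 0.2)],
    "PP": [(("P", "NP"), 1.0)],
    "V": [(("saw",), 1.0)],
    "P": [(("with",), 1.0)],
}


def parses(symbol, words, i, j):
    """Yield (tree, probability) for every way `symbol` derives words[i:j].
    Trees are nested tuples (label, child, ...). Assumes no empty rules and
    no unary cycles, which guarantees that the recursion terminates."""
    if symbol not in GRAMMAR:  # terminal: must match exactly one word
        if j == i + 1 and words[i] == symbol:
            yield symbol, 1.0
        return
    for rhs, rule_prob in GRAMMAR[symbol]:
        for children, prob in parse_sequence(rhs, words, i, j):
            yield (symbol, *children), rule_prob * prob


def parse_sequence(rhs, words, i, j):
    """Yield (children, probability) for every way the symbol sequence
    `rhs` derives words[i:j], trying every split point."""
    if not rhs:
        if i == j:
            yield [], 1.0
        return
    first, rest = rhs[0], rhs[1:]
    # Each remaining symbol needs at least one word.
    for k in range(i + 1, j - len(rest) + 1):
        for left, p_left in parses(first, words, i, k):
            for right, p_right in parse_sequence(rest, words, k, j):
                yield [left, *right], p_left * p_right


words = "John saw Mary with Bob".split()
all_parses = list(parses("S", words, 0, len(words)))
for tree, prob in all_parses:
    print(f"{prob:.5f}  {tree}")

# The PCFG's answer to the ambiguity: keep the most probable tree.
best_tree, best_prob = max(all_parses, key=lambda pair: pair[1])
print("most likely:", best_tree)
```

With these made-up numbers the verb-phrase attachment (0.00432) beats the noun-phrase attachment (0.00216): the two trees share every rule except that one uses VP -> VP PP [0.4] where the other uses NP -> NP PP [0.2].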

4. Python Implementation Using NLTK

4.1 Importing Libraries:
```python
import nltk
from nltk import PCFG
from nltk.parse import pchart
```
These lines import the necessary modules from the NLTK library: `PCFG` is used to define a probabilistic context-free grammar, and `pchart` provides probabilistic chart parsers such as `InsideChartParser`.

4.2 Defining a PCFG:
```python
pcfg = PCFG.fromstring("""
  S -> NP VP [1.0]
  VP -> V NP [0.7] | V [0.3]
  V -> 'eats' [0.5] | 'sleeps' [0.5]
  NP -> 'John' [0.5] | 'Mary' [0.5]
""")
```

A probabilistic context-free grammar (PCFG) is defined using `PCFG.fromstring`.
Each alternative on the right of `->` (alternatives are separated by `|`) carries its own probability in square brackets, and quoted symbols such as `'eats'` are terminal words. For example, `VP -> V NP [0.7]` means that a verb phrase (VP) can consist of a verb (V) followed by a noun phrase (NP) with a probability of 0.7.

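To make the notation concrete, the sketch below is a hypothetical helper, `parse_rules`, that reads rule strings in this format into `(lhs, rhs, probability)` triples and collects the quoted terminals. It is not how NLTK's `PCFG.fromstring` is implemented, and it only handles the simple cases used here (no quoted terminals containing spaces, no comments).

```python
import re


def parse_rules(text):
    """Hypothetical helper (not NLTK's implementation): read rules written as
    "LHS -> alt1 [p1] | alt2 [p2]" into (lhs, rhs, probability) triples.
    Quoted symbols are terminals; returns (rules, terminals)."""
    rules, terminals = [], set()
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        lhs, alternatives = map(str.strip, line.split("->", 1))
        for alt in alternatives.split("|"):
            # Each alternative is "<symbols> [<probability>]".
            match = re.fullmatch(r"(.*?)\s*\[([0-9.]+)\]", alt.strip())
            if match is None:
                raise ValueError(f"no [probability] in {alt!r}")
            rhs = []
            for sym in match.group(1).split():
                if len(sym) >= 2 and sym[0] == sym[-1] == "'":
                    sym = sym[1:-1]  # quoted symbol: a terminal word
                    terminals.add(sym)
                rhs.append(sym)
            rules.append((lhs, tuple(rhs), float(match.group(2))))
    return rules, terminals


rules, terminals = parse_rules("""
  S -> NP VP [1.0]
  VP -> V NP [0.7] | V [0.3]
  V -> 'eats' [0.5] | 'sleeps' [0.5]
  NP -> 'John' [0.5] | 'Mary' [0.5]
""")
for rule in rules:
    print(rule)  # e.g. ('VP', ('V', 'NP'), 0.7)
print(sorted(terminals))
```

The four lines of grammar text expand to seven separate rules, one per `|`-separated alternative.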
4.3 Creating a Parser:
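```python
parser = pchart.InsideChartParser(pcfg)
```
An `InsideChartParser` is created from the defined PCFG. It is a bottom-up chart parser that tries edges in descending order of the inside probabilities of their trees (the probability of each partial tree, ignoring its context). This parser is used to parse sentences according to the grammar.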

4.4 Defining a Sentence to Parse:
```python
sentence = "John eats".split()
```
The sentence "John eats" is split into a list of words `['John', 'eats']` for parsing.

4.5 Parsing the Sentence and Printing Parse Trees with Probabilities:
```python
for tree in parser.parse(sentence):
    print(tree)
    print("Probability:", tree.prob())
```
Output:
```text
(S (NP John) (VP (V eats))) (p=0.075)
Probability: 0.075
```
The sentence is parsed with the `parser.parse` method, which yields one tree for each valid parse.
For each parse tree, the tree structure and its probability are printed. Under this grammar, "John eats" has exactly one parse.
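The printed probability can be reproduced by hand: the tree (S (NP John) (VP (V eats))) uses the rules S -> NP VP [1.0], NP -> 'John' [0.5], VP -> V [0.3] and V -> 'eats' [0.5], and 1.0 × 0.5 × 0.3 × 0.5 = 0.075. The sketch below does the same multiplication in plain Python; the nested-tuple tree encoding and the helper names `rules_of` and `tree_prob` are illustrative assumptions, not NLTK API.

```python
import math

# Rule probabilities from the PCFG above, keyed by (lhs, rhs).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("eats",)): 0.5,
    ("V", ("sleeps",)): 0.5,
    ("NP", ("John",)): 0.5,
    ("NP", ("Mary",)): 0.5,
}


def rules_of(tree):
    """Yield the (lhs, rhs) rule applied at each internal node of a tree
    written as nested tuples (label, child, ...); leaves are word strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules_of(child)


def tree_prob(tree):
    """Probability of a tree: the product of the probabilities of its rules."""
    return math.prod(RULE_PROB[rule] for rule in rules_of(tree))


john_eats = ("S", ("NP", "John"), ("VP", ("V", "eats")))
print(tree_prob(john_eats))  # 0.075
```

`math.prod` requires Python 3.8 or later.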

4.6 Additional Function to Compute and Print Probabilities of All Parse Trees:
```python
def compute_parse_probabilities(sentence, parser):
    print(f"\nParsing sentence: {' '.join(sentence)}")
    for tree in parser.parse(sentence):
        tree.pretty_print()
        print(f"Probability: {tree.prob()}\n")
```
This function takes a tokenized sentence (a list of words) and a parser as input.
It parses the sentence and draws each parse tree with `tree.pretty_print()`.
The probability of each parse tree is printed below its drawing.

4.7 Example Usage with Multiple Sentences:
```python
sentences = ["John eats", "Mary eats", "John sleeps", "Mary sleeps"]
for sentence in sentences:
    compute_parse_probabilities(sentence.split(), parser)
```
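The four sentences above all use the intransitive rule VP -> V [0.3], so each receives the same probability (0.075), and the transitive rule VP -> V NP [0.7] is never exercised. Because this grammar is not recursive, it generates only finitely many sentences, so they can all be listed. The sketch below (plain Python, independent of NLTK; `expand` is a hypothetical helper name) enumerates every derivation from S together with its probability.

```python
import math
from itertools import product

# The example PCFG from this notebook: lhs -> list of (rhs, probability).
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "VP": [(("V", "NP"), 0.7), (("V",), 0.3)],
    "V": [(("eats",), 0.5), (("sleeps",), 0.5)],
    "NP": [(("John",), 0.5), (("Mary",), 0.5)],
}


def expand(symbol):
    """Yield (words, probability) for every derivation of `symbol`.
    This terminates only because the grammar is non-recursive."""
    if symbol not in GRAMMAR:  # terminal: derives itself with probability 1
        yield (symbol,), 1.0
        return
    for rhs, rule_prob in GRAMMAR[symbol]:
        # Pick one derivation for each child, in every combination.
        for parts in product(*(list(expand(child)) for child in rhs)):
            words = tuple(w for child_words, _ in parts for w in child_words)
            prob = rule_prob
            for _, child_prob in parts:
                prob *= child_prob
            yield words, prob


# Sum over derivations in case a sentence has more than one parse.
language = {}
for words, prob in expand("S"):
    sentence = " ".join(words)
    language[sentence] = language.get(sentence, 0.0) + prob

for sentence, prob in sorted(language.items()):
    print(f"{sentence:16} {prob:.4f}")
print("probabilities sum to 1:", math.isclose(sum(language.values()), 1.0))
```

It lists 12 sentences: the 4 intransitive ones at 0.075 each (0.3 in total) and 8 transitive ones such as "John eats Mary" at 1.0 × 0.5 × 0.7 × 0.5 × 0.5 = 0.0875 each (0.7 in total), so the probabilities add up to 1.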


```python
import nltk
from nltk import PCFG
from nltk.parse import pchart
```

```python
# Define a simple PCFG
pcfg = PCFG.fromstring("""
  S -> NP VP [1.0]
  VP -> V NP [0.7] | V [0.3]
  V -> 'eats' [0.5] | 'sleeps' [0.5]
  NP -> 'John' [0.5] | 'Mary' [0.5]
""")

# Create a parser using the PCFG
parser = pchart.InsideChartParser(pcfg)

# Define a sentence to parse
sentence = "John eats".split()

# Parse the sentence and print the parse trees with their probabilities
for tree in parser.parse(sentence):
    print(tree)
    print("Probability:", tree.prob())


# Additional function to compute and print probabilities of all parse trees
def compute_parse_probabilities(sentence, parser):
    print(f"\nParsing sentence: {' '.join(sentence)}")
    for tree in parser.parse(sentence):
        tree.pretty_print()
        print(f"Probability: {tree.prob()}\n")


# Example usage with multiple sentences
sentences = ["John eats", "Mary eats", "John sleeps", "Mary sleeps"]
for sentence in sentences:
    compute_parse_probabilities(sentence.split(), parser)
```


```text
(S (NP John) (VP (V eats))) (p=0.075)
Probability: 0.075

Parsing sentence: John eats
      S
  ____|___
 |        VP
 |        |
 NP       V
 |        |
John     eats

Probability: 0.075


Parsing sentence: Mary eats
      S
  ____|___
 |        VP
 |        |
 NP       V
 |        |
Mary     eats

Probability: 0.075


Parsing sentence: John sleeps
      S
  ____|____
 |         VP
 |         |
 NP        V
 |         |
John     sleeps

Probability: 0.075


Parsing sentence: Mary sleeps
      S
  ____|____
 |         VP
 |         |
 NP        V
 |         |
Mary     sleeps

Probability: 0.075
```

