
In [5]:
import nltk
from nltk import word_tokenize, pos_tag
from nltk.chunk import RegexpParser

# Download required resources (run once)
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker')
nltk.download('words')

def demo_phrase_chunking():
    """
    Demonstrates basic noun phrase chunking using regular expressions
    """
    sentences = [
        "The quick brown fox jumps over the lazy dog.",
        "A beautiful butterfly landed on the colorful flower.",
        "The experienced teacher explained the complex concept clearly."
    ]

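    # Define the chunk grammar (rules are applied in order):
    # NP: optional determiner (DT) or possessive pronoun (PRP$),
    #     zero or more adjectives (JJ), one or more nouns (NN, NNS, NNP, NNPS)
    # PP: a preposition (IN) followed by an NP
    # VP: a verb (VB, VBD, VBZ, ...) followed by one or more NP/PP chunks
    #     (CLAUSE is not defined by this grammar, so it never matches)
    # pos_tag emits Penn Treebank tags, so the possessive tag is PRP$, not PP$.
    grammar = r"""
    NP: {<DT|PRP\$>?<JJ>*<NN.*>+}
    PP: {<IN><NP>}
    VP: {<VB.*><NP|PP|CLAUSE>+}
    """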
    # Create parser
    cp = RegexpParser(grammar)

    for sentence in sentences:
        print(f"\nSentence: {sentence}")
        # Tokenize and POS tag
        tokens = word_tokenize(sentence)
        pos_tags = pos_tag(tokens)
        print(f"POS Tags: {pos_tags}")

        # Parse
        tree = cp.parse(pos_tags)
        print(f"Parse Tree:\n{tree}")

        # Extract noun phrases
        noun_phrases = []
        for subtree in tree.subtrees():
            if subtree.label() == 'NP':
                noun_phrases.append(' '.join(word for word, tag in subtree.leaves()))
        print(f"Extracted Noun Phrases: {noun_phrases}")

# Run demo
demo_phrase_chunking()

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.



Sentence: The quick brown fox jumps over the lazy dog.
POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]
Parse Tree:
(S
  (NP The/DT quick/JJ brown/NN fox/NN)
  (VP jumps/VBZ (PP over/IN (NP the/DT lazy/JJ dog/NN)))
  ./.)
Extracted Noun Phrases: ['The quick brown fox', 'the lazy dog']

Sentence: A beautiful butterfly landed on the colorful flower.
POS Tags: [('A', 'DT'), ('beautiful', 'JJ'), ('butterfly', 'NN'), ('landed', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('colorful', 'JJ'), ('flower', 'NN'), ('.', '.')]
Parse Tree:
(S
  (NP A/DT beautiful/JJ butterfly/NN)
  (VP landed/VBD (PP on/IN (NP the/DT colorful/JJ flower/NN)))
  ./.)
Extracted Noun Phrases: ['A beautiful butterfly', 'the colorful flower']

Sentence: The experienced teacher explained the complex concept clearly.
POS Tags: [('The', 'DT'), ('experienced', 'JJ'), ('teacher', 'NN'), ('explained', 'VBD'), ('the',

## Phrase Chunking Demonstration

This code cell demonstrates **Phrase Chunking**, a fundamental technique in Natural Language Processing (NLP) for identifying and grouping grammatically related words into meaningful phrases.

### What is Phrase Chunking?
Phrase chunking (also known as shallow parsing) involves breaking down a sentence into non-overlapping, syntactically correlated parts (chunks). These chunks are typically Noun Phrases (NP), Verb Phrases (VP), and Prepositional Phrases (PP).

### How it works (in this code):
1.  **Tokenization**: The sentence is first split into individual tokens (words and punctuation marks).
2.  **Part-of-Speech (POS) Tagging**: Each word is assigned a grammatical category (e.g., noun, verb, adjective, determiner).
3.  **Regular Expression Parser**: A set of regular expression rules (the `grammar` variable) defines patterns for how POS tags combine to form specific types of phrases.
4.  **Parsing and Extraction**: The parser applies these rules to the POS-tagged sentence to identify and extract the defined phrases.
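
The four steps can be sketched on a hand-tagged sentence, which isolates the chunking step. This is a minimal sketch rather than the notebook's own cell. The tags are written out by hand (an assumption, standing in for `word_tokenize` and `pos_tag`), and the helper name `extract_chunks` is hypothetical.

```python
from nltk.chunk import RegexpParser

# Steps 1-2 (tokenization and POS tagging) are written out by hand here,
# so the sketch does not depend on any downloaded NLTK resources.
tagged = [("The", "DT"), ("lazy", "JJ"), ("dog", "NN"),
          ("sleeps", "VBZ"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

# Step 3: one rule per chunk type, written over POS tags.
parser = RegexpParser(r"""
    NP: {<DT>?<JJ>*<NN.*>+}
    PP: {<IN><NP>}
""")


def extract_chunks(tree, label):
    """Return the text of every subtree carrying the given label."""
    return [" ".join(word for word, _ in subtree.leaves())
            for subtree in tree.subtrees() if subtree.label() == label]


# Step 4: parse the tagged tokens, then read the chunks off the tree.
tree = parser.parse(tagged)
noun_phrases = extract_chunks(tree, "NP")
prep_phrases = extract_chunks(tree, "PP")
print(noun_phrases)
print(prep_phrases)
```

Because the `PP` rule is listed after the `NP` rule, it can refer to the `NP` chunks built by the earlier rule.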

### Why is Phrase Chunking Useful?
Phrase chunking is a crucial step in NLP pipelines for several reasons:
*   **Information Extraction**: Helps in identifying entities (like people, organizations, locations) and their relationships by isolating noun phrases.
*   **Text Summarization**: Aids in understanding the core components of sentences, which can be used to select important information for summaries.
*   **Question Answering**: Facilitates finding answers by matching phrases from questions to phrases in documents.
*   **Machine Translation**: Provides structural information that can help in reordering words and phrases for better translation quality across languages.
*   **Feature Engineering**: The extracted chunks can serve as powerful features for machine learning models in various NLP tasks like sentiment analysis or document classification.
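
To illustrate the feature-engineering point, one common option is to flatten a chunk tree into per-token IOB labels (B = begins a chunk, I = inside a chunk, O = outside any chunk). The sketch below uses NLTK's `tree2conlltags` on a hand-tagged sentence. The tags and the feature dictionary layout are illustrative assumptions, not something the notebook itself does.

```python
from nltk.chunk import RegexpParser, tree2conlltags

# Hand-written POS tags (an assumption standing in for pos_tag output).
tagged = [("A", "DT"), ("beautiful", "JJ"), ("butterfly", "NN"),
          ("landed", "VBD")]

parser = RegexpParser(r"NP: {<DT>?<JJ>*<NN.*>+}")
tree = parser.parse(tagged)

# Flatten the tree into (word, pos, iob_chunk_tag) triples.
iob = tree2conlltags(tree)

# One possible per-token feature layout for a downstream classifier.
features = [{"word": word, "pos": pos, "chunk": chunk}
            for word, pos, chunk in iob]

for row in features:
    print(row)
```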

This demonstration provides a hands-on look at how these basic grammatical units can be automatically identified from raw text.

In [7]:
import nltk
from nltk import word_tokenize, pos_tag
from nltk.chunk import RegexpParser

# Note: NLTK resources are assumed to be downloaded from previous cells.

def demo_phrase_chunking_original_sentences_vp_prp():
    """
    Demonstrates noun, verb, and pronoun phrase chunking using regular expressions
    with the *original* set of sentences.
    """
    sentences = [
        "The quick brown fox jumps over the lazy dog.",
        "A beautiful butterfly landed on the colorful flower.",
        "The experienced teacher explained the complex concept clearly."
    ]

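    # Define grammar for NP, PRP_PH, PP, and VP chunking.
    # Rules run in order: NP first absorbs a possessive pronoun (PRP$) that
    # precedes a noun; PRP_PH then groups any pronouns left unchunked.
    # pos_tag emits Penn Treebank tags, so the possessive tag is PRP$, not PP$.
    # CLAUSE is not defined by this grammar, so it never matches.
    grammar = r"""
    NP: {<DT|PRP\$>?<JJ>*<NN.*>+}
    PRP_PH: {<PRP|PRP\$>+}
    PP: {<IN><NP>}
    VP: {<VB.*><NP|PP|CLAUSE>+}
    """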
    # Create parser
    cp = RegexpParser(grammar)

    for sentence in sentences:
        print(f"\nSentence: {sentence}")
        # Tokenize and POS tag
        tokens = word_tokenize(sentence)
        pos_tags = pos_tag(tokens)
        print(f"POS Tags: {pos_tags}")

        # Parse
        tree = cp.parse(pos_tags)
        print(f"Parse Tree:\n{tree}")

        # Extract noun, verb, and pronoun phrases
        noun_phrases = []
        verb_phrases = []
        pronoun_phrases = []
        for subtree in tree.subtrees():
            if subtree.label() == 'NP':
                noun_phrases.append(' '.join(word for word, tag in subtree.leaves()))
            elif subtree.label() == 'VP':
                verb_phrases.append(' '.join(word for word, tag in subtree.leaves()))
            elif subtree.label() == 'PRP_PH': # Check for the new Pronoun Phrase label
                pronoun_phrases.append(' '.join(word for word, tag in subtree.leaves()))

        print(f"Extracted Noun Phrases: {noun_phrases}")
        print(f"Extracted Verb Phrases: {verb_phrases}")
        print(f"Extracted Pronoun Phrases: {pronoun_phrases}")

# Run the updated demo with original sentences
demo_phrase_chunking_original_sentences_vp_prp()


Sentence: The quick brown fox jumps over the lazy dog.
POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]
Parse Tree:
(S
  (NP The/DT quick/JJ brown/NN fox/NN)
  (VP jumps/VBZ (PP over/IN (NP the/DT lazy/JJ dog/NN)))
  ./.)
Extracted Noun Phrases: ['The quick brown fox', 'the lazy dog']
Extracted Verb Phrases: ['jumps over the lazy dog']
Extracted Pronoun Phrases: []

Sentence: A beautiful butterfly landed on the colorful flower.
POS Tags: [('A', 'DT'), ('beautiful', 'JJ'), ('butterfly', 'NN'), ('landed', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('colorful', 'JJ'), ('flower', 'NN'), ('.', '.')]
Parse Tree:
(S
  (NP A/DT beautiful/JJ butterfly/NN)
  (VP landed/VBD (PP on/IN (NP the/DT colorful/JJ flower/NN)))
  ./.)
Extracted Noun Phrases: ['A beautiful butterfly', 'the colorful flower']
Extracted Verb Phrases: ['landed on the colorful flower']
Extracted Pronoun Phrases: []



**The code above prints the verb phrases and pronoun phrases alongside the noun phrases. The sentences shown contain no pronouns, so the pronoun list is empty.**

In [9]:
#Dependent and Independent Clause Demo
import spacy
# Load English model
nlp = spacy.load("en_core_web_sm")
def identify_clauses(text):
    """
    Identify independent and dependent clauses in a sentence.
    """
    doc = nlp(text)

    print(f"\nSentence: {text}")
    print("="*70)

    # Find the root (main verb of independent clause)
    root = [token for token in doc if token.dep_ == "ROOT"][0]

    # Independent clause
    print("\nINDEPENDENT CLAUSE (can stand alone):")
    print(f" Main verb: {root.text}")

    # Get subject and object of main clause
    independent_words = [root.text]

    for child in root.children:
        if child.dep_ in ["nsubj", "nsubjpass"]:
            print(f" Subject: {child.text}")
            independent_words.insert(0, child.text)
        elif child.dep_ in ["dobj", "attr"]:
            print(f" Object: {child.text}")
            independent_words.append(child.text)

    print(f" Clause: {' '.join(independent_words)}")

    # Dependent clauses
    dependent_labels = {
        'advcl': 'Adverbial Clause (tells when, why, how)',
        'ccomp': 'Complement Clause (completes the meaning)',
        'xcomp': 'Complement Clause',
        'relcl': 'Relative Clause (describes a noun)',
        'acl': 'Clausal Modifier'
    }

    print("\nDEPENDENT CLAUSES (cannot stand alone):")
    found_dependent = False

    for token in doc:
        if token.dep_ in dependent_labels:
            found_dependent = True
            clause_words = [t.text for t in token.subtree]

            # Find the subordinating word
            marker = ""
            for t in token.subtree:
                if t.dep_ == "mark":
                    marker = t.text
                    break

            print(f"\n Type: {dependent_labels[token.dep_]}")
            if marker:
                print(f" Marker: {marker}")
            print(f" Verb: {token.text}")
            print(f" Clause: {' '.join(clause_words)}")

    if not found_dependent:
        print(" None found - This is a simple sentence")

    print()
# Demo sentences
print("="*70)
print("DEPENDENT vs INDEPENDENT CLAUSE DEMONSTRATION")
print("="*70)
sentences = [
    # Simple sentence - only independent clause
    "The dog barks.",

    # Adverbial clause (dependent)
    "I stayed home because it was raining.",

    # Complement clause (dependent)
    "I believe that she is right.",

    # Relative clause (dependent)
    "The student who studied hard passed the exam.",

    # Multiple dependent clauses
    "She said that she would come when she finished her work.",

    # Complex sentence
    "Although it was difficult, we completed the project."
]
for sentence in sentences:
    identify_clauses(sentence)
    print("-"*70)

DEPENDENT vs INDEPENDENT CLAUSE DEMONSTRATION

Sentence: The dog barks.

INDEPENDENT CLAUSE (can stand alone):
 Main verb: barks
 Subject: dog
 Clause: dog barks

DEPENDENT CLAUSES (cannot stand alone):
 None found - This is a simple sentence

----------------------------------------------------------------------

Sentence: I stayed home because it was raining.

INDEPENDENT CLAUSE (can stand alone):
 Main verb: stayed
 Subject: I
 Clause: I stayed

DEPENDENT CLAUSES (cannot stand alone):

 Type: Adverbial Clause (tells when, why, how)
 Marker: because
 Verb: raining
 Clause: because it was raining

----------------------------------------------------------------------

Sentence: I believe that she is right.

INDEPENDENT CLAUSE (can stand alone):
 Main verb: believe
 Subject: I
 Clause: I believe

DEPENDENT CLAUSES (cannot stand alone):

 Type: Complement Clause (completes the meaning)
 Marker: that
 Verb: is
 Clause: that she is right

--------------------------------------------------

# **Summary of the code's output**

Breaking sentences into clauses gives a deeper, more structured view of text than individual words or simple phrases can provide, which makes it an important step in many NLP pipelines.


---
Both phrases and clauses are groups of words, but their key difference lies in whether they contain a subject-verb pair:

**Phrase**: A group of words that functions as a single unit in a sentence but does not contain a subject-verb pair.

**Clause**: A group of words that contains both a subject and a verb.

**In summary**: The presence of a subject-verb pair is the defining characteristic that elevates a group of words from a phrase to a clause. A phrase is a building block of a clause, while a clause is a building block of a sentence.
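
The subject-verb test can be approximated on POS-tagged words. The sketch below is a rough heuristic only: it treats any noun or pronoun followed later by a finite verb as a subject-verb pair, which a real dependency parse (as in the spaCy cell above) handles far more reliably. The function name `has_subject_verb_pair` and the hand-written tags are assumptions made for illustration.

```python
def has_subject_verb_pair(tagged_words):
    """Rough check: a noun or pronoun followed later by a finite verb."""
    nominal_prefixes = ("NN", "PRP")          # NN, NNS, NNP, NNPS, PRP
    finite_verb_tags = {"VBZ", "VBP", "VBD", "MD"}
    seen_nominal = False
    for _, tag in tagged_words:
        # PRP$ (possessive pronoun) cannot act as a subject, so skip it.
        if tag.startswith(nominal_prefixes) and tag != "PRP$":
            seen_nominal = True
        elif tag in finite_verb_tags and seen_nominal:
            return True
    return False


# Hand-tagged examples using Penn Treebank tags.
noun_phrase = [("the", "DT"), ("quick", "JJ"), ("fox", "NN")]
prep_phrase = [("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
clause = [("the", "DT"), ("fox", "NN"), ("jumps", "VBZ")]

for label, words in [("noun phrase", noun_phrase),
                     ("prepositional phrase", prep_phrase),
                     ("clause", clause)]:
    print(f"{label}: {has_subject_verb_pair(words)}")
```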

In [12]:
# Hierarchical Syntax Tree
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, word_tokenize, RegexpParser
# Example text
sample_text = "The quick brown fox jumps over the lazy dog"
# Tokenize and tag parts of speech
tagged = pos_tag(word_tokenize(sample_text))
# Define chunking patterns
chunker = RegexpParser("""
NP: {<DT>?<JJ>*<NN.*>+}  # Noun Phrases
P: {<IN>}                # Prepositions
V: {<VB.*>}              # Verbs
PP: {<P><NP>}            # Prepositional Phrases
VP: {<V><NP|PP>*}        # Verb Phrases
""")
# Parse and extract phrases
output = chunker.parse(tagged)
# Display results
print("POS Tags:")
for word, tag in tagged:
    print(f" {word:10} -> {tag}")
print("\nParsed Output:")
print(output)
print("\nTree Structure:")
output.pretty_print()
print("\nExtracted Phrases:")
for subtree in output.subtrees():
    if subtree.label() != 'S':
        phrase_text = ' '.join(word for word, tag in subtree.leaves())
        print(f"{subtree.label()}: {phrase_text}")


POS Tags:
 The        -> DT
 quick      -> JJ
 brown      -> NN
 fox        -> NN
 jumps      -> VBZ
 over       -> IN
 the        -> DT
 lazy       -> JJ
 dog        -> NN

Parsed Output:
(S
  (NP The/DT quick/JJ brown/NN fox/NN)
  (VP (V jumps/VBZ) (PP (P over/IN) (NP the/DT lazy/JJ dog/NN))))

Tree Structure:
                                    S                                      
           _________________________|_______________                        
          |                                         VP                     
          |                          _______________|_____                  
          |                         |                     PP               
          |                         |         ____________|_____            
          NP                        V        P                  NP         
   _______|________________         |        |       ___________|______     
The/DT quick/JJ brown/NN fox/NN jumps/VBZ over/IN the/DT     lazy/JJ dog/N

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


# **Key differences and summary of the code output**

**Phrases vs. Clauses**:

A phrase is a group of words without a subject-verb pair (e.g., "the quick fox"), while a clause contains both a subject and a verb (e.g., "the fox jumps"). Clauses are fundamental in NLP for understanding complete thoughts and complex sentence structures.

**Constituency vs. Dependency Trees**:

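*   Constituency trees group words into nested grammatical phrases (such as Noun Phrases and Verb Phrases). The NLTK chunk tree printed above is a shallow tree of this kind.
*   Dependency trees show direct grammatical relationships between individual words (e.g., which word acts as the subject of a verb). The spaCy `dep_` labels used in the clause demo come from a tree of this kind.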
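
The contrast can be made concrete by writing both representations for one short sentence. This is a hand-built sketch: the bracketed tree and the `(word, relation, head)` triples are written out manually as assumptions, not produced by a parser, and the relation names simply follow the labels used in the spaCy cell.

```python
from nltk import Tree

# Constituency view: words are grouped into nested, labelled phrases.
constituency = Tree.fromstring(
    "(S (NP (DT The) (NN fox)) (VP (VBZ jumps)))")

# Subtrees of height 2 are single POS-tagged words; taller ones are phrases.
phrase_labels = [subtree.label() for subtree in constituency.subtrees()
                 if subtree.height() > 2]

# Dependency view: every word points at its head with a labelled relation.
dependencies = [
    ("jumps", "ROOT", "jumps"),
    ("fox", "nsubj", "jumps"),
    ("The", "det", "fox"),
]
subjects = [word for word, relation, head in dependencies
            if relation == "nsubj" and head == "jumps"]

print(phrase_labels)
print(subjects)
```

The constituency tree answers "which words form a phrase?", while the dependency triples answer "which word is the subject of `jumps`?" directly.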

**Importance of Trees in NLP**:

Both types of hierarchical syntax trees are crucial because they provide the structural and relational understanding of sentences needed for advanced NLP tasks such as information extraction, machine translation, and question answering. They allow systems to interpret meaning beyond simple word sequences.