## What is Chunking?
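Chunking (also called shallow parsing) groups the words of a sentence into meaningful "chunks", most commonly noun phrases (NPs). For example, in the sentence:

“The quick brown fox jumps over the lazy dog.”

the noun phrases are "The quick brown fox" and "the lazy dog".

Chunking works on POS-tagged tokens: pattern rules, written in a regular-expression-like syntax over the tags, describe which tag sequences form a chunk.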
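To make the idea concrete before the full pipeline, here is a minimal sketch that chunks the example sentence above. The POS tags are written by hand (an assumption; a real tagger may label some words differently), so no NLTK data download is needed.

```python
from nltk import RegexpParser

# Hand-written POS tags for the example sentence (assumed, not tagger output).
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"), (".", "."),
]

# One rule: optional determiner, any number of adjectives, then a noun.
parser = RegexpParser("NP: {<DT>?<JJ>*<NN.*>}")
tree = parser.parse(tagged)

# Collect the words of every NP subtree as a plain string.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)
```

The rule matches tags, not words, so the same pattern applies to any sentence once it has been tagged.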



In [3]:
import nltk
from nltk import word_tokenize, pos_tag, RegexpParser

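# Download required NLTK data (uncomment on first run)
#nltk.download('punkt')
#nltk.download('averaged_perceptron_tagger')
# Note: recent NLTK releases may ask for 'punkt_tab' and
# 'averaged_perceptron_tagger_eng' instead.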


In [4]:
sentence = "Shivi plays football in New York during summer."


In [5]:
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
print("POS Tags:", pos_tags)


POS Tags: [('Shivi', 'NNP'), ('plays', 'VBZ'), ('football', 'NN'), ('in', 'IN'), ('New', 'NNP'), ('York', 'NNP'), ('during', 'IN'), ('summer', 'NN'), ('.', '.')]


In [6]:
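# Define a simple noun phrase chunk grammar.
# Rules are applied in order. <NN.*> also matches NNP, so rule 1 chunks every
# proper noun on its own and rule 2 finds nothing left to group.
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>}   # optional determiner, any adjectives, then one noun
      {<NNP>+}             # one or more proper nouns
"""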

chunk_parser = RegexpParser(grammar)
chunk_tree = chunk_parser.parse(pos_tags)


In [7]:
print("\nChunk Tree:")
print(chunk_tree)

# Visualize tree (opens a separate Tk window, so it needs a display)
chunk_tree.draw()



Chunk Tree:
(S
  (NP Shivi/NNP)
  plays/VBZ
  (NP football/NN)
  in/IN
  (NP New/NNP)
  (NP York/NNP)
  during/IN
  (NP summer/NN)
  ./.)
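The tree printed above can also be flattened into one row per token, which is often easier to inspect or store. The sketch below uses `tree2conlltags` from `nltk.chunk` to produce IOB labels (`B-NP` starts a chunk, `I-NP` continues it, `O` is outside any chunk). The POS tags are copied from the tagger output shown earlier so the snippet runs without the tagger model.

```python
from nltk import RegexpParser
from nltk.chunk import tree2conlltags

# POS tags copied from the notebook's tagger output.
pos_tags = [
    ("Shivi", "NNP"), ("plays", "VBZ"), ("football", "NN"), ("in", "IN"),
    ("New", "NNP"), ("York", "NNP"), ("during", "IN"), ("summer", "NN"),
    (".", "."),
]

# Same two-rule NP grammar as in the cell above.
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>}
      {<NNP>+}
"""
tree = RegexpParser(grammar).parse(pos_tags)

# Each row is (word, POS tag, IOB chunk label).
iob_rows = tree2conlltags(tree)
for word, tag, iob in iob_rows:
    print(f"{word:10} {tag:5} {iob}")
```

Both "New" and "York" come out as `B-NP`, which confirms that this grammar treats them as two separate one-word chunks.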


| Word     | POS Tag | Chunk                          |
| -------- | ------- | ------------------------------ |
| Shivi    | NNP     | NP (proper noun)               |
| plays    | VBZ     | –                              |
| football | NN      | NP                             |
| in       | IN      | –                              |
| New      | NNP     | NP                             |
| York     | NNP     | NP (separate chunk from "New") |
| during   | IN      | –                              |
| summer   | NN      | NP                             |
| .        | .       | –                              |
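In the chunk tree above, "New" and "York" end up in separate chunks because the first rule accepts exactly one noun tag and has already claimed each proper noun before the second rule runs. One possible adjustment, shown here as a sketch and not as the notebook's own grammar, is to let a single rule absorb a run of adjacent nouns by adding `+` to the noun tag. The name `merged_parser` is illustrative.

```python
from nltk import RegexpParser

# POS tags copied from the notebook's tagger output.
pos_tags = [
    ("Shivi", "NNP"), ("plays", "VBZ"), ("football", "NN"), ("in", "IN"),
    ("New", "NNP"), ("York", "NNP"), ("during", "IN"), ("summer", "NN"),
    (".", "."),
]

# Assumed variant: the trailing "+" allows one or more consecutive noun tags.
merged_parser = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
merged_tree = merged_parser.parse(pos_tags)

# Join the words of each NP chunk into a string.
merged_chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in merged_tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(merged_chunks)
```

The trade-off is that any run of adjacent nouns is merged, including a common noun followed by a proper noun, so the pattern may need tightening for other text.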


## Sample Sentences for Practice 

sentences = [
    "The tall man played football.",
    "Kron visited beautiful Delhi in July.",
    "A shiny red car passed quickly.",
    "Shivi and Kron are good friends."
]

### Chunking example to include Verb Phrases (VP)
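A grammar can contain several stages, and `RegexpParser` applies them in the order they are written, so a later stage can refer to the chunk labels produced by an earlier one. The following is a minimal sketch of that cascade on a hand-tagged sentence; the grammar is an illustrative assumption and is deliberately simpler than the one in the cell below.

```python
from nltk import RegexpParser

# Hand-written tags for "The tall man played football."
tagged = [
    ("The", "DT"), ("tall", "JJ"), ("man", "NN"),
    ("played", "VBD"), ("football", "NN"), (".", "."),
]

cascade = r"""
  NP: {<DT>?<JJ>*<NN.*>+}   # stage 1: noun phrases
  VP: {<VB.*><NP>+}         # stage 2: a verb followed by one or more NP chunks
"""
tree = RegexpParser(cascade).parse(tagged)
print(tree)

# The VP chunk wraps the verb together with the NP that follows it.
verb_phrases = [t for t in tree.subtrees() if t.label() == "VP"]
print(verb_phrases[0].leaves())
```

Because the VP stage sees the sentence after NP chunking, `<NP>` in its pattern matches a whole noun-phrase chunk instead of a single token.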

In [12]:
import nltk
from nltk import word_tokenize, pos_tag, RegexpParser

# Download NLTK resources (uncomment on first run)
#nltk.download('punkt')
#nltk.download('averaged_perceptron_tagger')

# Sample sentences
sentences = [
    "The tall man played football.",
    "Kron visited beautiful Delhi in July.",
    "A shiny red car passed quickly.",
    "Shivi and Kron are good friends.",
    "Shivi plays football in New York during summer."
]

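# Grammar with NP and VP stages (NP runs first, then VP).
# Notes on how these rules behave on the sample sentences:
#  - <NN.*> also matches NNP, so NP rule 1 chunks each proper noun on its own
#    and NP rule 2 has nothing left to group.
#  - "$" in VP rule 1 requires the match to reach the end of the tagged
#    sentence; the final "." token prevents that, so only VP rule 2 fires.
#  - PP and CLAUSE are not defined in this grammar.
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>}          # Noun Phrase
      {<NNP><NNP>*}               # Proper noun phrase
  VP: {<VB.*><NP|PP|CLAUSE>+$}    # Verb followed by NP/PP/CLAUSE chunks at sentence end
      {<VB.*><RB.*>?}             # Verb + optional adverb
"""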

chunk_parser = RegexpParser(grammar)

# Process each sentence
def process_sentence(sentence):
    """Tokenize, POS-tag, and chunk one sentence, then print and draw the tree."""
    print(f"\nSentence: {sentence}")
    tokens = word_tokenize(sentence)
    pos_tags = pos_tag(tokens)
    print("POS Tags:", pos_tags)

    chunk_tree = chunk_parser.parse(pos_tags)
    print("🌳 Chunk Tree:")
    print(chunk_tree)
    chunk_tree.draw()  # Opens a Tk window per sentence; close it to continue

# Apply to all sample sentences
for sent in sentences:
    process_sentence(sent)

    

Sentence: The tall man played football.
POS Tags: [('The', 'DT'), ('tall', 'JJ'), ('man', 'NN'), ('played', 'VBD'), ('football', 'NN'), ('.', '.')]
🌳 Chunk Tree:
(S (NP The/DT tall/JJ man/NN) (VP played/VBD) (NP football/NN) ./.)

Sentence: Kron visited beautiful Delhi in July.
POS Tags: [('Kron', 'NNP'), ('visited', 'VBD'), ('beautiful', 'JJ'), ('Delhi', 'NNP'), ('in', 'IN'), ('July', 'NNP'), ('.', '.')]
🌳 Chunk Tree:
(S
  (NP Kron/NNP)
  (VP visited/VBD)
  (NP beautiful/JJ Delhi/NNP)
  in/IN
  (NP July/NNP)
  ./.)

Sentence: A shiny red car passed quickly.
POS Tags: [('A', 'DT'), ('shiny', 'JJ'), ('red', 'JJ'), ('car', 'NN'), ('passed', 'VBD'), ('quickly', 'RB'), ('.', '.')]
🌳 Chunk Tree:
(S (NP A/DT shiny/JJ red/JJ car/NN) (VP passed/VBD quickly/RB) ./.)

Sentence: Shivi and Kron are good friends.
POS Tags: [('Shivi', 'NNP'), ('and', 'CC'), ('Kron', 'NNP'), ('are', 'VBP'), ('good', 'JJ'), ('friends', 'NNS'), ('.', '.')]
🌳 Chunk Tree:
(S
  (NP Shivi/NNP)
  and/CC
  (NP Kron/NNP)
  (