<center>
    <h1>Text Chunking (Shallow Parsing)</h1>
</center>
Chunking (shallow parsing) groups the POS-tagged tokens of a sentence into short
phrases (noun phrases, verb phrases, etc.) without building a full parse tree.

This is a **classical NLP technique**, often implemented with rule-based grammars written over POS tags, as in this notebook.

In [1]:
import nltk
from nltk import word_tokenize, pos_tag, RegexpParser

In [2]:
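nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# Note: NLTK 3.9+ looks for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead;
# download those as well if word_tokenize or pos_tag raises a LookupError.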

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Asus\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Asus\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [3]:
sentence = "The quick brown fox jumps over the lazy dog"
tokens = word_tokenize(sentence)
tokens

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [4]:
# POS Tagging
pos_tags = pos_tag(tokens)
pos_tags

[('The', 'DT'),
 ('quick', 'JJ'),
 ('brown', 'NN'),
 ('fox', 'NN'),
 ('jumps', 'VBZ'),
 ('over', 'IN'),
 ('the', 'DT'),
 ('lazy', 'JJ'),
 ('dog', 'NN')]

In [5]:
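# Define Chunk Grammar (Noun Phrase)
# Optional determiner, any number of adjectives, then one or more nouns.
# The trailing + keeps noun-noun sequences such as brown/NN fox/NN in one chunk.
grammar = r"""
NP: {<DT>?<JJ>*<NN|NNS>+}   # Noun Phrase
"""
chunk_parser = RegexpParser(grammar)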
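
To make the tag-pattern syntax less opaque, here is a rough pure-Python sketch of the idea behind a pattern such as `<DT>?<JJ>*<NN|NNS>`: the tag sequence is written out as one string and an ordinary regular expression is run over it. This illustrates the principle only and is not NLTK's actual implementation; the names `tag_string`, `pattern`, and `chunks` are made up for this example.

```python
import re

# Tagged tokens copied from the pos_tag output above
tagged = [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'),
          ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
          ('dog', 'NN')]

# Write the tag sequence as a single string: "<DT><JJ><NN><NN><VBZ>..."
tag_string = "".join(f"<{tag}>" for _word, tag in tagged)

# Hand-translated regular expression for the tag pattern <DT>?<JJ>*<NN|NNS>
pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*<(?:NN|NNS)>")

chunks = []
for match in pattern.finditer(tag_string):
    # Counting "<" before a string offset converts it back to a token index
    start = tag_string.count("<", 0, match.start())
    end = tag_string.count("<", 0, match.end())
    chunks.append(" ".join(word for word, _tag in tagged[start:end]))

print(chunks)  # ['The quick brown', 'fox', 'the lazy dog']
```

Because this pattern accepts exactly one noun, "fox" ends up in a chunk of its own.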

In [6]:
# svgling lets Jupyter render nltk Tree objects inline when a cell ends with a tree
!pip install svgling



In [7]:
# Apply Chunking
chunk_tree = chunk_parser.parse(pos_tags)

In [None]:
chunk_tree.draw()
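
`draw()` opens a separate window, which is not always available (for example on a remote server). As an alternative, the chunks can be pulled out of the tree as plain strings. The sketch below hard-codes the tagged tokens so that it runs without any NLTK data downloads; `np_chunks` is a helper name introduced here, not part of NLTK. It also compares a rule that accepts a single noun with one that accepts one or more nouns.

```python
from nltk import RegexpParser

# Tagged tokens copied from the pos_tag output above
tagged = [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'),
          ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
          ('dog', 'NN')]

def np_chunks(grammar, tagged_tokens):
    """Return the text of every NP chunk that the grammar finds."""
    tree = RegexpParser(grammar).parse(tagged_tokens)
    return [" ".join(word for word, _tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]

single_noun = np_chunks(r"NP: {<DT>?<JJ>*<NN|NNS>}", tagged)
noun_run = np_chunks(r"NP: {<DT>?<JJ>*<NN|NNS>+}", tagged)

print(single_noun)  # ['The quick brown', 'fox', 'the lazy dog']
print(noun_run)     # ['The quick brown fox', 'the lazy dog']
```

Since the tagger labels "brown" as `NN`, only the rule with `+` keeps "The quick brown fox" together.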

In [None]:
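# Verb Phrase (VP) Chunking
# Matched directly against POS tags, because this grammar defines no NP or PP
# chunks that a pattern such as <NP|PP> could refer to.
grammar_vp = r"""
VP: {<VB.*><IN>?<DT>?<JJ>*<NN|NNS>+}   # verb + optional preposition + noun group
"""
vp_parser = RegexpParser(grammar_vp)
vp_tree = vp_parser.parse(pos_tags)
vp_tree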

In [None]:
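# Multiple Chunk Rules Together
# Stages run top to bottom, so NP and PP must be built before VP can refer to them.
grammar_all = r"""
NP: {<DT>?<JJ>*<NN|NNS>+}   # noun phrase
PP: {<IN><NP>}              # preposition + noun phrase
VP: {<VB.*><NP|PP>+}        # verb + following NP/PP chunks
"""
parser = RegexpParser(grammar_all)
tree = parser.parse(pos_tags)
tree.draw()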
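
Rule order matters in a multi-rule grammar: each stage sees the chunks built by earlier stages as single units such as `<NP>`, so a rule that refers to `<NP>` or `<PP>` can only match once those chunks exist. The sketch below runs two hypothetical grammars (`vp_first` and `vp_last`, written for this illustration) over hard-coded tags and reports which chunk labels each one produces.

```python
from nltk import RegexpParser

# Tagged tokens copied from the pos_tag output above
tagged = [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'),
          ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
          ('dog', 'NN')]

# VP is tried first, before any NP or PP chunk exists
vp_first = r"""
VP: {<VB.*><NP|PP>+}
NP: {<DT>?<JJ>*<NN|NNS>+}
PP: {<IN><NP>}
"""

# VP is tried last, after NP and PP have been built
vp_last = r"""
NP: {<DT>?<JJ>*<NN|NNS>+}
PP: {<IN><NP>}
VP: {<VB.*><NP|PP>+}
"""

def chunk_labels(grammar):
    """Return the sorted chunk labels found, excluding the root label S."""
    tree = RegexpParser(grammar).parse(tagged)
    return sorted({subtree.label() for subtree in tree.subtrees()} - {"S"})

print(chunk_labels(vp_first))  # ['NP', 'PP']
print(chunk_labels(vp_last))   # ['NP', 'PP', 'VP']
```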


## Why Chunking?

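- Helps in **information extraction**: noun and verb phrases are natural candidates for entities and relations
- Used in **NER preprocessing**: noun-phrase chunks narrow down candidate entity spans
- Improves **syntactic understanding** without the cost of a full parse
- Lightweight & fast (rule-based): a hand-written grammar needs no training data

Chunking ≠ Parsing: chunks are shallow phrase groups, not a complete syntax tree  
Chunking ≠ NER: chunks are syntactic phrases, while NER assigns semantic types such as PERSON or ORGANIZATION  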
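
One way to see how shallow chunk output is: a single-level chunk tree can be flattened to one IOB tag per token (`B-` begins a chunk, `I-` continues it, `O` is outside any chunk). A minimal sketch using NLTK's `tree2conlltags`, with the tagged tokens hard-coded and an NP-only grammar assumed for illustration:

```python
from nltk import RegexpParser
from nltk.chunk import tree2conlltags

# Tagged tokens copied from the pos_tag output above
tagged = [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'),
          ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
          ('dog', 'NN')]

# NP-only grammar, so the resulting tree has a single level of chunks
tree = RegexpParser(r"NP: {<DT>?<JJ>*<NN|NNS>+}").parse(tagged)

# Each token becomes a (word, POS tag, IOB chunk tag) triple
iob = tree2conlltags(tree)
for word, pos, chunk_tag in iob:
    print(f"{word:6} {pos:4} {chunk_tag}")
```

Tokens such as "jumps" and "over" receive the tag `O` because no NP rule covers them. This per-token format is also the usual input and output shape for sequence-labeling models.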