1. Chunking

Chunking is the process of grouping related words into meaningful phrases (called “chunks”).

It is typically done after POS (Part-of-Speech) tagging, because chunk patterns are written in terms of POS tags rather than the words themselves.

Helps identify structures like:

Noun Phrases (NP): “The quick brown fox”

Verb Phrases (VP): “is running fast”
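As a rough illustration of the two phrase types listed above, the sketch below chunks a hand-tagged sentence with NLTK's `RegexpParser`. The POS tags are typed in by hand (an assumption, so that no tagger download is needed), and the `VP` pattern is an illustrative choice rather than a standard definition.

```python
from nltk import RegexpParser

# Hand-tagged tokens (assumed tags for illustration; no tagger is run here)
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("is", "VBZ"), ("running", "VBG"), ("fast", "RB"),
]

# Two-stage grammar: noun phrases are chunked first, then verb phrases
grammar = r"""
    NP: {<DT>?<JJ>*<NN.*>+}   # optional determiner, adjectives, nouns
    VP: {<VB.*>+<RB.*>*}      # one or more verbs, optional adverbs
"""

tree = RegexpParser(grammar).parse(tagged)

# Collect (label, phrase) pairs for every chunk below the root node "S"
phrases = [
    (st.label(), " ".join(word for word, _ in st.leaves()))
    for st in tree.subtrees()
    if st.label() != "S"
]
print(phrases)  # [('NP', 'The quick brown fox'), ('VP', 'is running fast')]
```

Each labelled line in the grammar starts a new stage, so the `VP` rule runs on the output of the `NP` rule.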

2. Chinking

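Chinking is the inverse of chunking: instead of describing what to include in a chunk, it describes what to exclude.

It removes certain words from an existing chunk, leaving only the relevant part. Removing words from the middle of a chunk splits it into two.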

Often used to refine chunks by excluding unwanted POS tags.
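To make the refinement idea concrete, here is a minimal sketch that first chunks a noun phrase and then chinks the determiner back out of it. The sentence, its hand-written tags, and the helper name `np_phrases` are all assumptions made for illustration.

```python
from nltk import RegexpParser
from nltk.tree import Tree

# Hand-tagged tokens (assumed tags for illustration)
tagged = [("The", "DT"), ("lazy", "JJ"), ("dog", "NN"), ("sleeps", "VBZ")]

def np_phrases(grammar):
    """Parse the hand-tagged tokens and return the words of each NP chunk."""
    tree = RegexpParser(grammar).parse(tagged)
    return [
        " ".join(word for word, _ in st.leaves())
        for st in tree
        if isinstance(st, Tree) and st.label() == "NP"
    ]

chunk_only = r"""
    NP: {<DT>?<JJ>*<NN.*>+}   # chunk determiner, adjectives, nouns
"""
chunk_then_chink = r"""
    NP: {<DT>?<JJ>*<NN.*>+}   # chunk determiner, adjectives, nouns
    }<DT>{                    # chink the determiner back out
"""

before = np_phrases(chunk_only)
after = np_phrases(chunk_then_chink)
print(before)  # ['The lazy dog']
print(after)   # ['lazy dog']
```

A chunk rule is written as `{...}` and a chink rule as `}...{`. Rules under the same label are applied in order, so the chink only ever sees what the chunk rule has already grouped.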

In [27]:
import nltk
from nltk import pos_tag, word_tokenize, RegexpParser
from nltk.tree import Tree

# Download the required NLTK resources (tokenizer and POS tagger models)
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

# Step 1: sample sentence
text = "The quick brown fox jumps over the lazy dog near the river bank"

# Step 2: tokenize the sentence and tag each token with its part of speech
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)

print("POS Tagged Sentence:\n",pos_tags)
print("-" * 80)

# Step 3: define the chunk grammar
chunk_grammar = r"""
      NP: {<DT>?<JJ>*<NN.*>+}  # optional determiner, any adjectives, one or more nouns
"""

# Step 4: create the chunk parser and apply it to the tagged tokens
chunk_parser = RegexpParser(chunk_grammar)

chunk_tree = chunk_parser.parse(pos_tags)

# Step 5: print the words of each NP chunk
print("chunking result(Noun Phrase):")
for subtree in chunk_tree:
    if isinstance(subtree, Tree) and subtree.label() == 'NP':
        print(" ".join(word for word, pos in subtree.leaves()))
print("-" * 80)

# Step 6: define the chinking grammar (remove verbs and prepositions)
chink_grammar = r"""
     NP: {<.*>+}     # chunk everything
     }<VB.*|IN>+{    # chink out verbs and prepositions
"""

chink_parser=RegexpParser(chink_grammar)
chink_tree=chink_parser.parse(pos_tags)

print("Chinking Result(After Removing Verbs & prep):")
for subtree in chink_tree:
    if isinstance(subtree, Tree) and subtree.label() == 'NP':
        print(" ".join(word for word, pos in subtree.leaves()))
print("-" * 80)


# Print both parse trees for comparison
print("chunk")
print(chunk_tree.pformat(margin=70))
print("\n" + "="*80)
print("chink")
print(chink_tree.pformat(margin=70))

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


POS Tagged Sentence:
 [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('near', 'IN'), ('the', 'DT'), ('river', 'NN'), ('bank', 'NN')]
--------------------------------------------------------------------------------
chunking result(Noun Phrase):
The quick brown fox
the lazy dog
the river bank
--------------------------------------------------------------------------------
Chinking Result(After Removing Verbs & prep):
The quick brown fox
the lazy dog
the river bank
--------------------------------------------------------------------------------
chunk
(S
  (NP The/DT quick/JJ brown/NN fox/NN)
  jumps/VBZ
  over/IN
  (NP the/DT lazy/JJ dog/NN)
  near/IN
  (NP the/DT river/NN bank/NN))

================================================================================
chink
(S
  (NP The/DT quick/JJ brown/NN fox/NN)
  jumps/VBZ
  over/IN
  (NP the/DT lazy/JJ dog/NN)
  near/IN
  (NP the/DT river/NN bank/NN))
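The two grammars give identical trees for this sentence only because every token outside a noun phrase happens to be a verb or a preposition. The sketch below checks this using the POS tags copied from the output above, then tries a second, hand-tagged sentence (an assumption for illustration) where the results diverge.

```python
from nltk import RegexpParser

# POS tags copied from the tagger output above
pos_tags = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), ("near", "IN"), ("the", "DT"), ("river", "NN"),
    ("bank", "NN"),
]

chunk_grammar = r"""
    NP: {<DT>?<JJ>*<NN.*>+}   # build NPs up from their parts
"""
chink_grammar = r"""
    NP: {<.*>+}               # chunk everything
    }<VB.*|IN>+{              # chink out verbs and prepositions
"""

chunk_parser = RegexpParser(chunk_grammar)
chink_parser = RegexpParser(chink_grammar)

# Tree equality compares labels and children recursively
same = chunk_parser.parse(pos_tags) == chink_parser.parse(pos_tags)
print(same)  # True

# Hypothetical second sentence with an adverb, tagged by hand
other = [("The", "DT"), ("fox", "NN"), ("runs", "VBZ"), ("quickly", "RB")]
chink_other = chink_parser.parse(other)
differs = not (chunk_parser.parse(other) == chink_other)
print(differs)  # True
print(chink_other.pformat(margin=70))
```

In the second sentence the chinking grammar leaves `quickly/RB` inside an `NP` of its own, because the chink rule lists only verbs and prepositions. A chink-based grammar therefore has to exclude every unwanted tag explicitly, while the chunk-based grammar names only the tags it wants.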
