## Chunking
##### Chunking is the process of extracting phrases from unstructured text. Single tokens may not capture the meaning of the text, so it is advisable to treat a phrase such as “South Africa” as one unit rather than as the separate words “South” and “Africa”.
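As a minimal sketch of the difference between single tokens and multi-word units, the snippet below uses NLTK's `MWETokenizer` to merge a known phrase back into one token. This is not chunking itself: the phrase list is supplied by hand and the example sentence is made up for illustration. It only shows why treating “South Africa” as one unit can be preferable.

```python
from nltk.tokenize import MWETokenizer

# Illustrative sentence (an assumption, not taken from the text above).
tokens = "She moved to South Africa last year".split()

# Register the multi-word expression; separator=" " keeps a readable space
# between the merged words.
mwe = MWETokenizer([("South", "Africa")], separator=" ")
merged = mwe.tokenize(tokens)

print(tokens)   # "South" and "Africa" are separate tokens
print(merged)   # "South Africa" is a single token
```

A chunker reaches a similar result without a fixed phrase list, because it groups tokens by their POS-tag pattern instead.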


##### Chunking works on top of POS tagging: it takes POS tags as input and produces chunks as output. As with POS tags, there is a standard set of chunk tags, such as Noun Phrase (NP) and Verb Phrase (VP). Chunking is important when you want to extract information such as locations and person names from text, a task known in NLP as Named Entity Extraction.
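To make the "POS tags in, chunks out" idea concrete, here is a small sketch that feeds a hand-tagged sentence to a simple noun-phrase pattern and prints the result in IOB form, where `B-NP` marks the first word of a noun phrase, `I-NP` a word inside one, and `O` a word outside any chunk. The tags are written by hand (an assumption made so that no tagger model has to be downloaded).

```python
import nltk
from nltk.chunk import tree2conlltags

# Hand-tagged input: (word, POS tag) pairs.
tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# A simple NP pattern: optional determiner, any adjectives, then a noun.
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = parser.parse(tagged)

# Convert the chunk tree into (word, POS tag, chunk tag) triples.
iob = tree2conlltags(tree)
for word, pos, chunk in iob:
    print(word, pos, chunk)
```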

##### The rule used in the example below states that whenever the chunker finds an optional determiner (DT) followed by any number of adjectives (JJ) and then a noun (NN), a Noun Phrase (NP) chunk should be formed.

In [5]:
# We will consider noun phrase chunking and search for the chunks that correspond to individual noun phrases.
# To create NP chunks, we define the chunk grammar in terms of POS tags, using a regular expression.

# The rule states that whenever the chunker finds an optional determiner (DT) followed by any number of adjectives (JJ)
# and then a noun (NN), a Noun Phrase (NP) chunk should be formed.

import nltk

sentence = "the little yellow dog barked at the cat"

# Chunk grammar: an NP is an optional DT, any number of JJ, then an NN
grammar = "NP: {<DT>?<JJ>*<NN>}"

chunkparser = nltk.RegexpParser(grammar)
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

tagged

tree = chunkparser.parse(tagged)

for subtree in tree:
    print(subtree)
    
tree.draw()



(NP the/DT little/JJ yellow/JJ dog/NN)
('barked', 'VBD')
('at', 'IN')
(NP the/DT cat/NN)


![1.JPG](attachment:1.JPG)
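The optional and repeated parts of the rule can be checked on a few short inputs. The sketch below is an illustration under stated assumptions: the helper `np_chunks` and the test phrases are hypothetical, and the tags are written by hand so that no tagger model is needed.

```python
import nltk

parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")

def np_chunks(tagged):
    """Return the text of every NP chunk found in a tagged sentence."""
    tree = parser.parse(tagged)
    return [" ".join(word for word, _ in st.leaves())
            for st in tree.subtrees() if st.label() == "NP"]

# A bare noun matches: the determiner and adjectives are optional.
bare = np_chunks([("dog", "NN")])
# Determiner plus noun.
with_dt = np_chunks([("the", "DT"), ("dog", "NN")])
# Several adjectives and no determiner.
with_jj = np_chunks([("big", "JJ"), ("old", "JJ"), ("dog", "NN")])
# No noun at all: the rule requires an NN, so nothing is chunked.
no_noun = np_chunks([("the", "DT"), ("big", "JJ")])

print(bare, with_dt, with_jj, no_noun)
```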

In [6]:
# Example of a simple regular expression based NP chunker.
import nltk

sentence = "the little yellow dog barked at the cat"

#Define your grammar using regular expressions
grammar = (''' NP: {<DT>?<JJ>*<NN>} # NP ''')

chunkParser = nltk.RegexpParser(grammar)

tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

tagged

[('the', 'DT'),
 ('little', 'JJ'),
 ('yellow', 'JJ'),
 ('dog', 'NN'),
 ('barked', 'VBD'),
 ('at', 'IN'),
 ('the', 'DT'),
 ('cat', 'NN')]

In [7]:
tree = chunkParser.parse(tagged)

In [8]:
for subtree in tree.subtrees():
    print(subtree)

(S
  (NP the/DT little/JJ yellow/JJ dog/NN)
  barked/VBD
  at/IN
  (NP the/DT cat/NN))
(NP the/DT little/JJ yellow/JJ dog/NN)
(NP the/DT cat/NN)
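Iterating over the tree directly and calling `subtrees()` give different views: the first yields the top-level children (chunks mixed with unchunked word/tag tuples), while the second yields every `Tree` node, including the root `S`. A minimal sketch, assuming hand-written tags in place of `nltk.pos_tag`:

```python
import nltk

tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
tree = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}").parse(tagged)

# Direct iteration: top-level children only.
children = list(tree)
chunks = [c for c in children if isinstance(c, nltk.Tree)]      # NP chunks
loose = [c for c in children if not isinstance(c, nltk.Tree)]   # plain tuples

# subtrees(): every Tree node, starting with the root S.
all_subtrees = list(tree.subtrees())

# The filter argument keeps only the nodes of interest.
np_only = list(tree.subtrees(filter=lambda t: t.label() == "NP"))

print(len(children), len(chunks), len(loose), len(all_subtrees), len(np_only))
```

Filtering by label is usually the more convenient way to collect only the noun phrases.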


In [9]:
# Opens the parse tree in a separate Tk window (requires a GUI display)
tree.draw()

![2.JPG](attachment:2.JPG)
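`tree.draw()` needs a graphical display. Where none is available (for example on a remote server), a text rendering can stand in for the picture. The sketch below uses hand-written tags as an assumption and shows two text-only options from NLTK's `Tree` class.

```python
import nltk

tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
tree = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}").parse(tagged)

# Bracketed form as a string (the same layout that print(tree) produces).
text = tree.pformat()
print(text)

# ASCII-art drawing written to stdout; no window is opened.
tree.pretty_print()
```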

In [10]:
import nltk

# Four nouns in a row, tagged by hand
noun1 = [("financial", "NN"), ("year", "NN"), ("account", "NN"), ("summary", "NN")]

# One or more consecutive nouns form a single NP chunk
gram = "NP: {<NN>+}"
find = nltk.RegexpParser(gram)

x = find.parse(noun1)
print(x)
x.draw()

(S (NP financial/NN year/NN account/NN summary/NN))


![3.JPG](attachment:3.JPG)
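The pattern `<NN>+` matches only the exact tag `NN`, so plural or proper nouns (`NNS`, `NNP`) would fall outside the chunk. A common variation is the wildcard pattern `<NN.*>+`, which matches any tag that starts with `NN`. The sketch below compares the two on a hand-tagged phrase that is made up for illustration.

```python
import nltk

# "accounts" is a plural noun, tagged NNS.
tagged = [("financial", "NN"), ("year", "NN"), ("accounts", "NNS")]

strict = nltk.RegexpParser("NP: {<NN>+}").parse(tagged)
loose = nltk.RegexpParser("NP: {<NN.*>+}").parse(tagged)

def first_np_words(tree):
    """Words of the first NP chunk in the tree."""
    np = next(tree.subtrees(filter=lambda t: t.label() == "NP"))
    return [word for word, _ in np.leaves()]

strict_words = first_np_words(strict)
loose_words = first_np_words(loose)

print(strict_words)  # the NNS word is left out
print(loose_words)   # the NNS word is included
```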

In [11]:
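import nltk

# Hand-tagged sentence. Note that "village" is tagged "N", which is not the
# Penn Treebank noun tag "NN", so the <NN> pattern cannot match it.
sent = [("A", "DT"), ("wise", "JJ"), ("small", "JJ"), ("girl", "NN"),
        ("of", "IN"), ("village", "N"), ("became", "VBD"), ("leader", "NN")]

# NP: optional DT, any adjectives, a noun, an optional preposition, then any further nouns
grammar = "NP: {<DT>?<JJ>*<NN><IN>?<NN>*}"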

find = nltk.RegexpParser(grammar)

res = find.parse(sent)

print(res)

res.draw()

(S
  (NP A/DT wise/JJ small/JJ girl/NN of/IN)
  village/N
  became/VBD
  (NP leader/NN))


![4.JPG](attachment:4.JPG)
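In the output above, `village/N` stays outside the chunk and the first NP ends with a dangling `of/IN`, because the tag `N` does not match `<NN>`. The sketch below assumes that `NN` was the intended tag and reruns the same grammar, to show how the trailing `<IN>?<NN>*` part of the pattern behaves when the following noun is matched.

```python
import nltk

# Same sentence as above, but with "village" tagged NN (an assumption).
sent = [("A", "DT"), ("wise", "JJ"), ("small", "JJ"), ("girl", "NN"),
        ("of", "IN"), ("village", "NN"), ("became", "VBD"), ("leader", "NN")]

grammar = "NP: {<DT>?<JJ>*<NN><IN>?<NN>*}"
res = nltk.RegexpParser(grammar).parse(sent)

# Collect the text of each NP chunk.
phrases = [" ".join(word for word, _ in st.leaves())
           for st in res.subtrees(filter=lambda t: t.label() == "NP")]

print(res)
print(phrases)
```

With the tag corrected, the preposition and the noun after it are absorbed into the first noun phrase instead of being split.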