**Chunking** allows you to identify phrases.

A **phrase** is a word or group of words that works as a single unit to perform a grammatical function.



- Chunking uses part-of-speech (POS) tags to group words and then applies chunk tags to those groups.

- Chunks don’t overlap, so one instance of a word can be in only one chunk at a time.

- Before chunking, make sure the words in your text are tagged with their parts of speech. This means starting from a string, tokenizing it into words, and POS-tagging those words.

- In order to chunk, you first need to define a chunk grammar.

- A **chunk grammar** is a set of rules that specify how sentences should be chunked. The rules are usually written as regular expressions (regexes) over POS tags rather than over the words themselves.
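Chunk tags are often written in IOB format: `B-NP` marks the first word of a noun-phrase chunk, `I-NP` a word inside it, and `O` a word outside every chunk. Because chunks never overlap, each token receives exactly one such tag. The sketch below uses a hypothetical helper `spans_to_iob` and a hand-tagged example sentence, both assumptions for illustration; NLTK itself can produce the same kind of triples from a chunk tree with `nltk.chunk.tree2conlltags`.

```python
from typing import List, Tuple

def spans_to_iob(tagged: List[Tuple[str, str]],
                 spans: List[Tuple[int, int]],
                 label: str = "NP") -> List[Tuple[str, str, str]]:
    """Hypothetical helper: turn non-overlapping chunk spans into IOB chunk tags.

    Each span is (start, end) with `end` exclusive, indexing into `tagged`.
    """
    iob = ["O"] * len(tagged)
    for start, end in spans:
        for i in range(start, end):
            # Chunks never overlap, so a token must not be assigned twice.
            if iob[i] != "O":
                raise ValueError(f"token {i} is already in a chunk")
            iob[i] = ("B-" if i == start else "I-") + label
    return [(word, tag, chunk) for (word, tag), chunk in zip(tagged, iob)]

# Hand-tagged example sentence (assumed tags, not tagger output).
tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

for row in spans_to_iob(tagged, [(0, 4), (6, 8)]):
    print(row)
```

Passing overlapping spans, such as `[(0, 4), (3, 5)]`, raises `ValueError`, which mirrors the rule that one instance of a word belongs to at most one chunk.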

### Importing libraries and dependencies

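```python
import nltk
from nltk.tokenize import word_tokenize

# word_tokenize and nltk.pos_tag need model data that is downloaded once, e.g.
# nltk.download("punkt") and nltk.download("averaged_perceptron_tagger");
# newer NLTK releases use "punkt_tab" and "averaged_perceptron_tagger_eng".
```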

### Specifying Sentences

```python
sentence = "Artificial intelligence is intelligence demonstrated by machines as opposed to natural intelligence displayed by animals including humans"
```

### Word Tokenization

```python
words_in_sentence = word_tokenize(sentence)
```
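`word_tokenize` returns a plain list of strings. Because this particular sentence contains no punctuation or contractions, the result happens to coincide with a whitespace split, which gives a quick sanity check that needs no NLTK data. In general the two differ: `word_tokenize` splits off punctuation and breaks contractions such as "don't" into "do" and "n't".

```python
sentence = "Artificial intelligence is intelligence demonstrated by machines as opposed to natural intelligence displayed by animals including humans"

# Only valid as a stand-in for word_tokenize on punctuation-free text like this.
tokens = sentence.split()
print(len(tokens))
# 17
print(tokens[:3])
# ['Artificial', 'intelligence', 'is']
```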

### POS Tagging

```python
pos_tag = nltk.pos_tag(words_in_sentence)
```

```python
pos_tag
```

```
[('Artificial', 'JJ'),
 ('intelligence', 'NN'),
 ('is', 'VBZ'),
 ('intelligence', 'RB'),
 ('demonstrated', 'VBN'),
 ('by', 'IN'),
 ('machines', 'NNS'),
 ('as', 'IN'),
 ('opposed', 'VBN'),
 ('to', 'TO'),
 ('natural', 'JJ'),
 ('intelligence', 'NN'),
 ('displayed', 'VBN'),
 ('by', 'IN'),
 ('animals', 'NNS'),
 ('including', 'VBG'),
 ('humans', 'NNS')]
```
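`nltk.pos_tag` returns a list of `(word, tag)` tuples using Penn Treebank tags, and a chunk grammar is matched against the sequence of tags rather than the words. The sketch below copies the output above into a literal so it runs without NLTK, extracts the tag sequence, and selects nouns by the `NN` prefix shared by all Penn Treebank noun tags (`NN`, `NNS`, `NNP`, `NNPS`). Note that the tagger labelled the second "intelligence" as `RB` (adverb), which is a tagging error; anything built on these tags inherits it.

```python
# The tagged sentence exactly as printed above: a list of (word, tag) tuples.
tagged = [
    ("Artificial", "JJ"), ("intelligence", "NN"), ("is", "VBZ"),
    ("intelligence", "RB"), ("demonstrated", "VBN"), ("by", "IN"),
    ("machines", "NNS"), ("as", "IN"), ("opposed", "VBN"), ("to", "TO"),
    ("natural", "JJ"), ("intelligence", "NN"), ("displayed", "VBN"),
    ("by", "IN"), ("animals", "NNS"), ("including", "VBG"), ("humans", "NNS"),
]

# The tag sequence is what a chunk grammar is matched against.
tags = [tag for _, tag in tagged]
print(" ".join(tags))
# JJ NN VBZ RB VBN IN NNS IN VBN TO JJ NN VBN IN NNS VBG NNS

# All Penn Treebank noun tags start with "NN"; the RB-tagged word is missed.
nouns = [word for word, tag in tagged if tag.startswith("NN")]
print(nouns)
# ['intelligence', 'machines', 'intelligence', 'animals', 'humans']
```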

### Creating a chunk grammar

```python
grammar = "NP: {<DT>?<JJ>*<NN>}"
```

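- `NP` is the chunk label; it stands for noun phrase.
- `<DT>?`: the chunk starts with an optional (`?`) determiner (`DT`).
- `<JJ>*`: it can then have any number (`*`) of adjectives (`JJ`), including none.
- `<NN>`: it ends with a singular or mass noun (`NN`).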
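A tag pattern such as `<DT>?<JJ>*<NN>` is a regular expression in which each `<...>` stands for one POS tag. The hypothetical helper `tag_pattern_to_regex` below is a simplified sketch of that translation (NLTK performs a similar conversion internally, but this is not its code): each `<TAG>` becomes a group that matches that tag inside a string like `<JJ><NN>`, and `.` inside the angle brackets is prevented from crossing tag boundaries. It also shows a practical consequence of this grammar: `<NN>` matches only the tag `NN`, so plural nouns tagged `NNS` are not chunked, whereas `<NN.*>` would accept `NN`, `NNS`, `NNP` and `NNPS`.

```python
import re

def tag_pattern_to_regex(pattern):
    """Hypothetical helper: convert a tag pattern into a regex over '<TAG>' strings."""
    def convert(match):
        # Keep '.' from matching '<' or '>' so it stays within a single tag.
        body = match.group(1).replace(".", "[^<>]")
        return f"(?:<{body}>)"
    return re.sub(r"<([^<>]+)>", convert, pattern)

def matches(pattern, tags):
    """True if the whole tag sequence matches the tag pattern."""
    tag_string = "".join(f"<{tag}>" for tag in tags)
    return re.fullmatch(tag_pattern_to_regex(pattern), tag_string) is not None

print(tag_pattern_to_regex("<DT>?<JJ>*<NN>"))
# (?:<DT>)?(?:<JJ>)*(?:<NN>)
print(matches("<DT>?<JJ>*<NN>", ["JJ", "NN"]))    # "Artificial intelligence"
# True
print(matches("<DT>?<JJ>*<NN>", ["NNS"]))         # "machines"
# False
print(matches("<DT>?<JJ>*<NN.*>", ["NNS"]))
# True
```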

### Chunk Parser

```python
chunk_parser = nltk.RegexpParser(grammar)
tree = chunk_parser.parse(pos_tag)
```
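`chunk_parser.parse(pos_tag)` returns an `nltk.Tree` whose root is labelled `S`; each matched noun phrase becomes an `NP` subtree, and every other token stays a plain `(word, tag)` tuple. The hypothetical `chunk_np` function below is a simplified pure-Python stand-in for that behaviour (a single hand-translated rule, and nested tuples instead of `Tree` objects), not NLTK's implementation. On this sentence it finds two chunks, "Artificial intelligence" and "natural intelligence": the `NNS` nouns and the `RB`-tagged "intelligence" do not match `<NN>`.

```python
import re

tagged = [
    ("Artificial", "JJ"), ("intelligence", "NN"), ("is", "VBZ"),
    ("intelligence", "RB"), ("demonstrated", "VBN"), ("by", "IN"),
    ("machines", "NNS"), ("as", "IN"), ("opposed", "VBN"), ("to", "TO"),
    ("natural", "JJ"), ("intelligence", "NN"), ("displayed", "VBN"),
    ("by", "IN"), ("animals", "NNS"), ("including", "VBG"), ("humans", "NNS"),
]

# Hand-translated regex for the grammar "NP: {<DT>?<JJ>*<NN>}", matched against
# a tag string such as "<JJ><NN><VBZ>...". It cannot match an empty string.
NP_REGEX = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN>)")

def chunk_np(tagged):
    """Hypothetical one-rule chunker: each NP chunk becomes ("NP", [(word, tag), ...]);
    every other token is kept as a bare (word, tag) tuple."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    result, next_token = [], 0
    # finditer returns non-overlapping matches, scanning left to right.
    for match in NP_REGEX.finditer(tag_string):
        # Each tag contributes exactly one "<", so counting them converts
        # character offsets back into token indices.
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        result.extend(tagged[next_token:start])   # tokens outside any chunk
        result.append(("NP", tagged[start:end]))  # the chunk itself
        next_token = end
    result.extend(tagged[next_token:])
    return result

result = chunk_np(tagged)
for item in result:
    print(item)
```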

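```python
# draw() opens a Tkinter window; in a notebook or headless session,
# tree.pretty_print() or print(tree) shows the tree as text instead.
tree.draw()
```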