# Tokenization, Tagging, Chunking - Example: Multiple Parts of Speech

Find all parts of speech assigned to the words `over`, `spoke`, and `answer` in *Alice's Adventures in Wonderland* (`carroll-alice.txt` in the NLTK Gutenberg corpus).

In [1]:
import nltk
# If the corpus data is missing, fetch it once with nltk.download("gutenberg")

The same word can be tagged with different parts of speech depending on how it is used.

In [5]:
alice = nltk.corpus.gutenberg.words("carroll-alice.txt")
alice[:10]

['[',
 'Alice',
 "'",
 's',
 'Adventures',
 'in',
 'Wonderland',
 'by',
 'Lewis',
 'Carroll']

In [6]:
# Normalize: keep only purely alphabetic tokens (drops punctuation)
# and convert them to lowercase
alice_norm = [word.lower() for word in alice if word.isalpha()]
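As a quick check of what this filter does, the sketch below applies the same comprehension to the ten tokens shown in the earlier output. The names `sample` and `sample_norm` are illustrative and not part of the notebook.

```python
# First ten tokens of the corpus, copied from the output of alice[:10]
sample = ['[', 'Alice', "'", 's', 'Adventures', 'in',
          'Wonderland', 'by', 'Lewis', 'Carroll']

# Same filter as alice_norm: keep purely alphabetic tokens, lowercased
sample_norm = [word.lower() for word in sample if word.isalpha()]
print(sample_norm)
# ['alice', 's', 'adventures', 'in', 'wonderland', 'by', 'lewis', 'carroll']
```

Note that the stray `s` left over from `Alice's` is alphabetic, so it survives the filter and is later tagged as if it were a word.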

In [4]:
# Find the POS tags of the normalized tokens, mapped to the universal tagset
alice_tags = nltk.pos_tag(alice_norm, tagset="universal")
alice_tags[:5]

[('alice', 'NOUN'),
 ('s', 'NOUN'),
 ('adventures', 'NOUN'),
 ('in', 'ADP'),
 ('wonderland', 'NOUN')]

In [13]:
# Conditional frequency distribution: condition = word, samples = its POS tags
alice_cfd = nltk.ConditionalFreqDist(alice_tags)
alice_cfd.items()
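To make the grouping concrete, here is a minimal standard-library sketch of what a conditional frequency distribution computes from `(condition, sample)` pairs. It is not NLTK's implementation: `build_cfd` is a hypothetical helper and `toy_tags` is made-up data used only for illustration.

```python
from collections import Counter, defaultdict

def build_cfd(pairs):
    """Group (condition, sample) pairs into one Counter per condition."""
    cfd = defaultdict(Counter)
    for condition, sample in pairs:
        cfd[condition][sample] += 1
    return cfd

# Made-up (word, tag) pairs, for illustration only
toy_tags = [("over", "ADP"), ("over", "ADV"), ("over", "ADP"),
            ("answer", "NOUN"), ("answer", "VERB")]

toy_cfd = build_cfd(toy_tags)
print(toy_cfd["over"])    # Counter({'ADP': 2, 'ADV': 1})
print(toy_cfd["answer"])  # Counter({'NOUN': 1, 'VERB': 1})
```

Indexing by a word returns the tag counts for that word alone, which is the same access pattern used with `alice_cfd`.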



**Frequency Distribution of POS tags of specific words**

In [15]:
alice_cfd['over']

FreqDist({'ADP': 31, 'PRT': 5, 'ADV': 4})

In [16]:
alice_cfd['spoke']

FreqDist({'VERB': 16, 'NOUN': 1})

In [18]:
alice_cfd['answer']

FreqDist({'NOUN': 5, 'VERB': 3, 'ADP': 1})

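**The conditional frequency distribution thus shows which POS tags a given word receives and how often each one occurs.**

Very rare tags, such as the single `ADP` count for `answer`, are more likely tagger errors than genuine usages.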
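The counts above can also be read as proportions. The following sketch copies the three `FreqDist` outputs into plain `Counter` objects (so that it runs without NLTK) and reports the dominant tag for each word; `tag_counts` and `summary` are illustrative names, not part of the notebook.

```python
from collections import Counter

# Counts copied from the FreqDist outputs shown above
tag_counts = {
    "over": Counter({"ADP": 31, "PRT": 5, "ADV": 4}),
    "spoke": Counter({"VERB": 16, "NOUN": 1}),
    "answer": Counter({"NOUN": 5, "VERB": 3, "ADP": 1}),
}

summary = {}
for word, counts in tag_counts.items():
    total = sum(counts.values())                    # all occurrences of the word
    top_tag, top_count = counts.most_common(1)[0]   # dominant tag and its count
    summary[word] = (top_tag, top_count / total)
    print(f"{word}: {len(counts)} tags, most common {top_tag} "
          f"({top_count / total:.1%} of {total} occurrences)")
# over: 3 tags, most common ADP (77.5% of 40 occurrences)
# spoke: 2 tags, most common VERB (94.1% of 17 occurrences)
# answer: 3 tags, most common NOUN (55.6% of 9 occurrences)
```

On the real `alice_cfd`, the same ratio should be available directly, since `FreqDist` provides a `freq()` method, for example `alice_cfd['over'].freq('ADP')`.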