# Part Of Speech Tagging

In this notebook, we perform part-of-speech (POS) tagging on a sample text with three libraries: `nltk`, spaCy, and TextBlob.

In [6]:
import pandas as pd
import nltk


In [7]:
spam = pd.read_csv('../assets/spam.csv')

In [8]:
spam.head()

Unnamed: 0,text,target
0,"Go until jurong point, crazy.. Available only ...",ham
1,Ok lar... Joking wif u oni...,ham
2,Free entry in 2 a wkly comp to win FA Cup fina...,spam
3,U dun say so early hor... U c already then say...,ham
4,"Nah I don't think he goes to usf, he lives aro...",ham


In [9]:
text1 = spam["text"][7]

In [10]:
text1

"As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callertune for all Callers. Press *9 to copy your friends Callertune"

In [11]:
tokens = nltk.word_tokenize(text1)

In [18]:
# Note: the output below comes from a run in which text1 held the first message (row 0), not row 7
tokens

['Go',
 'until',
 'jurong',
 'point',
 ',',
 'crazy',
 '..',
 'Available',
 'only',
 'in',
 'bugis',
 'n',
 'great',
 'world',
 'la',
 'e',
 'buffet',
 '...',
 'Cine',
 'there',
 'got',
 'amore',
 'wat',
 '...']

In [19]:
nltk.pos_tag(tokens)

[('Go', 'NNP'),
 ('until', 'IN'),
 ('jurong', 'JJ'),
 ('point', 'NN'),
 (',', ','),
 ('crazy', 'JJ'),
 ('..', 'NN'),
 ('Available', 'NNP'),
 ('only', 'RB'),
 ('in', 'IN'),
 ('bugis', 'NN'),
 ('n', 'RB'),
 ('great', 'JJ'),
 ('world', 'NN'),
 ('la', 'NN'),
 ('e', 'FW'),
 ('buffet', 'NN'),
 ('...', ':'),
 ('Cine', 'NNP'),
 ('there', 'EX'),
 ('got', 'VBD'),
 ('amore', 'RB'),
 ('wat', 'NN'),
 ('...', ':')]
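
A long list of pairs is hard to scan. Grouping the tokens by tag makes questionable assignments, such as `'..'` tagged `NN`, easier to spot. The following is a minimal sketch using only the standard library; `group_by_tag` is a hypothetical helper, and `tagged` is an excerpt copied by hand from the output above rather than the result of a fresh `nltk.pos_tag` call.

```python
from collections import defaultdict

# Excerpt of the (token, tag) pairs printed above, copied by hand (assumption:
# on real data this would be the return value of nltk.pos_tag(tokens)).
tagged = [
    ("Go", "NNP"), ("until", "IN"), ("jurong", "JJ"), ("point", "NN"),
    (",", ","), ("crazy", "JJ"), ("..", "NN"), ("Available", "NNP"),
    ("only", "RB"), ("in", "IN"), ("bugis", "NN"), ("n", "RB"),
]


def group_by_tag(pairs):
    """Map each tag to the list of tokens that received it, in input order."""
    groups = defaultdict(list)
    for token, tag in pairs:
        groups[tag].append(token)
    return dict(groups)


by_tag = group_by_tag(tagged)
for tag in sorted(by_tag):
    print(tag, by_tag[tag])
```

On real data, the list returned by `nltk.pos_tag(tokens)` can be passed to the helper directly, since it has the same list-of-pairs shape.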

In [20]:
nltk.help.upenn_tagset()

$: dollar
    $ -$ --$ A$ C$ HK$ M$ NZ$ S$ U.S.$ US$
'': closing quotation mark
    ' ''
(: opening parenthesis
    ( [ {
): closing parenthesis
    ) ] }
,: comma
    ,
--: dash
    --
.: sentence terminator
    . ! ?
:: colon or ellipsis
    : ; ...
CC: conjunction, coordinating
    & 'n and both but either et for less minus neither nor or plus so
    therefore times v. versus vs. whether yet
CD: numeral, cardinal
    mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-
    seven 1987 twenty '79 zero two 78-degrees eighty-four IX '60s .025
    fifteen 271,124 dozen quintillion DM2,000 ...
DT: determiner
    all an another any both del each either every half la many much nary
    neither no some such that the them these this those
EX: existential there
    there
FW: foreign word
    gemeinschaft hund ich jeux habeas Haementeria Herr K'ang-si vous
    lutihaw alai je jour objets salutaris fille quibusdam pas trop Monte
    terram fiche oui corporis ...
IN: preposition or

## Part-of-speech tagging using spaCy

In [2]:
import spacy

In [4]:
nlp = spacy.load('en_core_web_sm')

In [12]:
doc = nlp(text1)

In [13]:
for token in doc:
    print(token.text, token.pos_)

As ADP
per ADP
your PRON
request NOUN
' PUNCT
Melle PROPN
Melle PROPN
( PUNCT
Oru PROPN
Minnaminunginte PROPN
Nurungu PROPN
Vettam PROPN
) PUNCT
' PUNCT
has AUX
been AUX
set VERB
as ADP
your PRON
callertune NOUN
for ADP
all DET
Callers PROPN
. PUNCT
Press PROPN
* PUNCT
9 NUM
to PART
copy VERB
your PRON
friends NOUN
Callertune PROPN


In [20]:
# Look up the meaning of a POS tag
spacy.explain('PROPN')

'proper noun'

## POS tagging using TextBlob

In [16]:
from textblob import TextBlob

In [17]:
blob = TextBlob(text1)

In [18]:
# POS tagging
blob.tags

[('As', 'IN'),
 ('per', 'IN'),
 ('your', 'PRP$'),
 ('request', 'NN'),
 ("'Melle", 'POS'),
 ('Melle', 'NNP'),
 ('Oru', 'NNP'),
 ('Minnaminunginte', 'NNP'),
 ('Nurungu', 'NNP'),
 ('Vettam', 'NNP'),
 ("'", 'POS'),
 ('has', 'VBZ'),
 ('been', 'VBN'),
 ('set', 'VBN'),
 ('as', 'IN'),
 ('your', 'PRP$'),
 ('callertune', 'NN'),
 ('for', 'IN'),
 ('all', 'DT'),
 ('Callers', 'NNP'),
 ('Press', 'NNP'),
 ('*', 'VBD'),
 ('9', 'CD'),
 ('to', 'TO'),
 ('copy', 'VB'),
 ('your', 'PRP$'),
 ('friends', 'NNS'),
 ('Callertune', 'NNP')]

## Exercises

1. Perform part-of-speech tagging on each text in the spam dataset, and store the results in a new column called `pos`
2. Find the number of nouns in each text
3. Find the number of verbs in each text
4. Find the number of adjectives in each text
5. Find the number of adverbs in each text
6. Find the number of pronouns in each text
7. Find the number of prepositions in each text
8. Find the number of conjunctions in each text
9. Find the number of interjections in each text
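
As a starting point for exercises 2 to 9, the counts can be derived from a list of `(token, tag)` pairs by matching Penn Treebank tag prefixes. This is a sketch under stated assumptions, not a reference solution: `CATEGORY_PREFIXES` and `count_categories` are hypothetical names, the prefix mapping is one possible choice (for example, `IN` also covers subordinating conjunctions), and `tagged` is the TextBlob output from above copied by hand.

```python
from collections import Counter

# Assumed mapping from exercise categories to Penn Treebank tag prefixes.
CATEGORY_PREFIXES = {
    "nouns": ("NN",),          # NN, NNS, NNP, NNPS
    "verbs": ("VB",),          # VB, VBD, VBG, VBN, VBP, VBZ
    "adjectives": ("JJ",),     # JJ, JJR, JJS
    "adverbs": ("RB",),        # RB, RBR, RBS
    "pronouns": ("PRP", "WP"), # PRP, PRP$, WP, WP$
    "prepositions": ("IN",),
    "conjunctions": ("CC",),
    "interjections": ("UH",),
}


def count_categories(pairs):
    """Count how many tokens fall into each category, based on tag prefix."""
    counts = Counter()
    for _token, tag in pairs:
        for category, prefixes in CATEGORY_PREFIXES.items():
            if tag.startswith(prefixes):  # str.startswith accepts a tuple
                counts[category] += 1
    # Return every category, including those with a count of zero.
    return {category: counts[category] for category in CATEGORY_PREFIXES}


# The TextBlob tags shown earlier in this notebook, copied by hand.
tagged = [
    ("As", "IN"), ("per", "IN"), ("your", "PRP$"), ("request", "NN"),
    ("'Melle", "POS"), ("Melle", "NNP"), ("Oru", "NNP"),
    ("Minnaminunginte", "NNP"), ("Nurungu", "NNP"), ("Vettam", "NNP"),
    ("'", "POS"), ("has", "VBZ"), ("been", "VBN"), ("set", "VBN"),
    ("as", "IN"), ("your", "PRP$"), ("callertune", "NN"), ("for", "IN"),
    ("all", "DT"), ("Callers", "NNP"), ("Press", "NNP"), ("*", "VBD"),
    ("9", "CD"), ("to", "TO"), ("copy", "VB"), ("your", "PRP$"),
    ("friends", "NNS"), ("Callertune", "NNP"),
]

counts = count_categories(tagged)
print(counts)
```

Once exercise 1 has produced a `pos` column holding such lists, something like `spam["pos"].apply(count_categories)` should yield one dictionary of counts per message. Note that the counts inherit any tagging errors, such as `*` being tagged as a verb in the output above.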