# Part of Speech Tagging

Part-of-speech (POS) tagging labels each word in a sentence with its grammatical category (noun, verb, adjective, and so on) based on the context in which the word appears. A fixed word-to-tag lookup is not enough, because some words are ambiguous: the same word can take different tags depending on the structure of the sentence.

The text is first converted into a list of words (tokens). The list is then traversed and a tag is assigned to each word.
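
The two steps above can be sketched without any library. This is a toy illustration only, not how NLTK's tagger works: the `LEXICON` table, the `tokenize` and `toy_tag` helpers, and the single context rule are all hypothetical names and assumptions made for this example. It shows why context matters: "book" is tagged as a noun after a determiner but as a verb after "to".

```python
import re

# Hypothetical mini-lexicon: each word maps to the tags it may take,
# with the default tag listed first.
LEXICON = {
    "to": ["TO"],
    "a": ["DT"],
    "the": ["DT"],
    "read": ["VB"],
    "book": ["NN", "VB"],  # ambiguous: noun or verb
    "flight": ["NN"],
}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def toy_tag(tokens):
    """Traverse the token list and assign one tag per token."""
    tagged = []
    for token in tokens:
        # Unknown words fall back to NN in this sketch.
        candidates = LEXICON.get(token.lower(), ["NN"])
        previous_tag = tagged[-1][1] if tagged else None
        # Single context rule: an ambiguous word right after "to" is a verb.
        if len(candidates) > 1 and previous_tag == "TO" and "VB" in candidates:
            tag = "VB"
        else:
            tag = candidates[0]
        tagged.append((token, tag))
    return tagged


print(toy_tag(tokenize("read the book")))
print(toy_tag(tokenize("to book a flight")))
```

A real tagger replaces the hand-written rule with a model trained on annotated text, but the shape of the pipeline (tokenize, then tag each token in context) is the same.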



In [1]:
# Install NLTK if it is not already installed: uncomment the next line and run this cell.
#! pip install nltk

In [2]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\soharab.hossain\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [3]:
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\soharab.hossain\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.


True

In [4]:
nltk.download('maxent_ne_chunker')

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\soharab.hossain\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping chunkers\maxent_ne_chunker.zip.


True

In [5]:
nltk.download('words')

[nltk_data] Downloading package words to
[nltk_data]     C:\Users\soharab.hossain\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\words.zip.


True

## POS Tagging

In [6]:

text = "A quick brown fox jumps over the lazy dogs."

tokens_tag = nltk.pos_tag(nltk.word_tokenize(text))

print("After Token:",tokens_tag)


After Token: [('A', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dogs', 'NNS'), ('.', '.')]


In [7]:
text = "everybody should learn machine learning with python"
tokens = nltk.word_tokenize(text)
print(tokens)


['everybody', 'should', 'learn', 'machine', 'learning', 'with', 'python']


In [8]:
tag = nltk.pos_tag(tokens)
print(tag)


[('everybody', 'NN'), ('should', 'MD'), ('learn', 'VB'), ('machine', 'NN'), ('learning', 'VBG'), ('with', 'IN'), ('python', 'NN')]
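
The tags in the outputs above are Penn Treebank abbreviations. As a reading aid, here is a small glossary for the tags seen so far; the `TAG_GLOSSARY` dictionary and the `explain_tags` helper are hypothetical names introduced for this sketch and are not part of NLTK.

```python
# Glossary for the Penn Treebank tags that appear in the outputs above.
TAG_GLOSSARY = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "VB": "verb, base form",
    "VBG": "verb, gerund or present participle",
    "VBZ": "verb, 3rd person singular present",
    "MD": "modal",
    "IN": "preposition or subordinating conjunction",
    ".": "sentence-final punctuation",
}


def explain_tags(tagged):
    """Attach a human-readable description to each (word, tag) pair."""
    return [(word, tag, TAG_GLOSSARY.get(tag, "unknown tag")) for word, tag in tagged]


# Tagged output copied from the cell above.
tagged = [('everybody', 'NN'), ('should', 'MD'), ('learn', 'VB'),
          ('machine', 'NN'), ('learning', 'VBG'), ('with', 'IN'), ('python', 'NN')]

for word, tag, meaning in explain_tags(tagged):
    print(f"{word:10} {tag:4} {meaning}")
```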


# Named Entity Recognition (NER)

Named entity recognition (NER) is often the first step in information extraction. It locates named entities in text and classifies them into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, and percentages. NER is used in many areas of Natural Language Processing (NLP).


## Chunking

**Noun phrase chunking groups tagged words into noun phrases, which are candidates for named entities. The chunker is driven by a regular-expression grammar whose rules indicate how sentences should be chunked.**

In this case the chunk pattern consists of one rule: a noun phrase, NP, is formed whenever the chunker finds an optional determiner, DT, followed by any number of adjectives, JJ, and then a noun, NN.
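
To make the rule concrete, here is a plain-Python sketch of the same pattern applied to an already tagged sentence. It is an illustration of the rule only, not NLTK's implementation, and `chunk_noun_phrases` is a hypothetical helper name. The input is the tagged output of the "quick brown fox" sentence from the POS tagging section.

```python
def chunk_noun_phrases(tagged):
    """Collect runs of (word, tag) pairs matching: optional DT, any number of JJ, then NN."""
    chunks = []
    i = 0
    while i < len(tagged):
        j = i
        # Optional determiner.
        if j < len(tagged) and tagged[j][1] == "DT":
            j += 1
        # Any number of adjectives.
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        # The chunk is only formed if a singular noun (exactly NN) follows.
        if j < len(tagged) and tagged[j][1] == "NN":
            chunks.append(tagged[i:j + 1])
            i = j + 1
        else:
            i += 1
    return chunks


tagged = [('A', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'),
          ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
          ('dogs', 'NNS'), ('.', '.')]

for chunk in chunk_noun_phrases(tagged):
    print(" ".join(word for word, _ in chunk))
```

In this sketch "the lazy dogs" is not chunked, because "dogs" carries the tag NNS and the rule names NN exactly. Covering plural nouns would need a broader pattern for the final tag.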

In [9]:
# One rule: optional determiner, any number of adjectives, then a singular noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)



text = "Jack and Jill went up the hill, to fetch a pale of water."

# Tokenize words
tokens = nltk.word_tokenize(text)
print(tokens)

# POS tag the words
tag = nltk.pos_tag(tokens)
print(tag)

# Chunking
result = cp.parse(tag)
print(result)


['Jack', 'and', 'Jill', 'went', 'up', 'the', 'hill', ',', 'to', 'fetch', 'a', 'pale', 'of', 'water', '.']
[('Jack', 'NNP'), ('and', 'CC'), ('Jill', 'NNP'), ('went', 'VBD'), ('up', 'RP'), ('the', 'DT'), ('hill', 'NN'), (',', ','), ('to', 'TO'), ('fetch', 'VB'), ('a', 'DT'), ('pale', 'NN'), ('of', 'IN'), ('water', 'NN'), ('.', '.')]
(S
  Jack/NNP
  and/CC
  Jill/NNP
  went/VBD
  up/RP
  (NP the/DT hill/NN)
  ,/,
  to/TO
  fetch/VB
  (NP a/DT pale/NN)
  of/IN
  (NP water/NN)
  ./.)


### Another Example

In [10]:
sentence = """Alice and Bob went to America to see the President at Washington D.C"""


In [11]:
tokens = nltk.word_tokenize(sentence)
print(tokens)

['Alice', 'and', 'Bob', 'went', 'to', 'America', 'to', 'see', 'the', 'President', 'at', 'Washington', 'D.C']


In [12]:
tagged = nltk.pos_tag(tokens)
print(tagged)

[('Alice', 'NNP'), ('and', 'CC'), ('Bob', 'NNP'), ('went', 'VBD'), ('to', 'TO'), ('America', 'NNP'), ('to', 'TO'), ('see', 'VB'), ('the', 'DT'), ('President', 'NNP'), ('at', 'IN'), ('Washington', 'NNP'), ('D.C', 'NNP')]


In [14]:
entities = nltk.chunk.ne_chunk(tagged)
print(entities)

(S
  Alice/NNP
  and/CC
  (PERSON Bob/NNP)
  went/VBD
  to/TO
  (GPE America/NNP)
  to/TO
  see/VB
  the/DT
  President/NNP
  at/IN
  (ORGANIZATION Washington/NNP)
  D.C/NNP)
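
The printed tree is convenient to read, but downstream code usually wants a flat list of entities. The following is a minimal sketch, assuming NLTK is installed: it rebuilds the tree from the printed output above with `Tree.fromstring` (so no model download is needed) and collects each labelled subtree. `extract_entities` is a hypothetical helper name, and the labels are reproduced exactly as the chunker produced them.

```python
from nltk.tag import str2tuple
from nltk.tree import Tree

# Tree text copied from the output above; leaves are word/TAG strings.
tree_text = """(S
  Alice/NNP
  and/CC
  (PERSON Bob/NNP)
  went/VBD
  to/TO
  (GPE America/NNP)
  to/TO
  see/VB
  the/DT
  President/NNP
  at/IN
  (ORGANIZATION Washington/NNP)
  D.C/NNP)"""

# str2tuple turns "Bob/NNP" into ("Bob", "NNP"), matching ne_chunk's leaf format.
tree = Tree.fromstring(tree_text, read_leaf=str2tuple)


def extract_entities(tree):
    """Return (entity text, label) for every labelled chunk directly under the root."""
    entities = []
    for child in tree:
        # Plain tokens are tuples; named entities are nested Tree objects.
        if isinstance(child, Tree):
            name = " ".join(word for word, _ in child.leaves())
            entities.append((name, child.label()))
    return entities


print(extract_entities(tree))
```

The same function can be applied directly to the `entities` object returned by `nltk.chunk.ne_chunk`, since that object is also a `Tree`.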


### Another Example

In [15]:
text = 'European authorities fined Google a record $5.1 billion on Wednesday for abusing its power in the mobile phone market and ordered the company to alter its practices'

In [16]:
tokens = nltk.word_tokenize(text)
print(tokens)

['European', 'authorities', 'fined', 'Google', 'a', 'record', '$', '5.1', 'billion', 'on', 'Wednesday', 'for', 'abusing', 'its', 'power', 'in', 'the', 'mobile', 'phone', 'market', 'and', 'ordered', 'the', 'company', 'to', 'alter', 'its', 'practices']


In [17]:
tagged = nltk.pos_tag(tokens)
print(tagged)

[('European', 'JJ'), ('authorities', 'NNS'), ('fined', 'VBD'), ('Google', 'NNP'), ('a', 'DT'), ('record', 'NN'), ('$', '$'), ('5.1', 'CD'), ('billion', 'CD'), ('on', 'IN'), ('Wednesday', 'NNP'), ('for', 'IN'), ('abusing', 'VBG'), ('its', 'PRP$'), ('power', 'NN'), ('in', 'IN'), ('the', 'DT'), ('mobile', 'JJ'), ('phone', 'NN'), ('market', 'NN'), ('and', 'CC'), ('ordered', 'VBD'), ('the', 'DT'), ('company', 'NN'), ('to', 'TO'), ('alter', 'VB'), ('its', 'PRP$'), ('practices', 'NNS')]


In [18]:
entities = nltk.chunk.ne_chunk(tagged)
print(entities)

(S
  (GPE European/JJ)
  authorities/NNS
  fined/VBD
  (PERSON Google/NNP)
  a/DT
  record/NN
  $/$
  5.1/CD
  billion/CD
  on/IN
  Wednesday/NNP
  for/IN
  abusing/VBG
  its/PRP$
  power/NN
  in/IN
  the/DT
  mobile/JJ
  phone/NN
  market/NN
  and/CC
  ordered/VBD
  the/DT
  company/NN
  to/TO
  alter/VB
  its/PRP$
  practices/NNS)
