# <b style="color:green">POS Tagging : Part of Speech Tagging</b>
- POS tagging works as a ___text preprocessing technique___.
- This notebook uses the `spacy` library.

- Plan
  1. What is POS?
  2. Applications
  3. spaCy code demo
  4. HMM (Hidden Markov Model) and the Viterbi Algorithm

1. What is POS?
   - Put simply, POS tagging is the task of labelling each word in a sentence with its appropriate part of speech. In traditional grammar, a part of speech is a category of words that have similar grammatical properties.
   - <pre>
        Why     not     tell     someone      ?
       adverb  adverb   verb      noun       punctuation mark,
                                             sentence closer
   </pre>

2. Application of POS Tagging
   1. Named Entity Recognition : name recognition system
   2. Question Answering System
   3. Word sense disambiguation : the same word can need a different tag depending on context, e.g. "left" in
      1. I left the room.
      2. Left of the room.
   4. Chatbots
3. spaCy Library Code
   - Coarse-grained POS in the spaCy model (`pos_`)
   - Fine-grained POS in the spaCy model (`tag_`)

In [1]:
# !pip install spacy
# !python -m spacy download en_core_web_sm

In [2]:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u"I will google about facebook")
doc.text

'I will google about facebook'

In [3]:
for word in doc:
    print(word)

I
will
google
about
facebook


In [4]:
# check which POS is assigned to each word
for word in doc:
    print(word, ":>----->>>", word.pos_, "=", spacy.explain(word.pos_)) # coarse-grained POS

I :>----->>> PRON = pronoun
will :>----->>> AUX = auxiliary
google :>----->>> VERB = verb
about :>----->>> ADP = adposition
facebook :>----->>> NOUN = noun


In [5]:
for word in doc:
    print(word, ":>---->>>", word.tag_, "=", spacy.explain(word.tag_))

I :>---->>> PRP = pronoun, personal
will :>---->>> MD = verb, modal auxiliary
google :>---->>> VB = verb, base form
about :>---->>> IN = conjunction, subordinating or preposition
facebook :>---->>> NN = noun, singular or mass


In [6]:
spacy.explain('PRP')

'pronoun, personal'

In [7]:
doc2 = nlp(u"I left the room")
# check which POS is assigned to each word
print(doc2.text)
for word in doc2:
    print(word, ":>----->>>", word.pos_, "=", spacy.explain(word.pos_)) # coarse-grained POS
print()

# check tagging
for word in doc2:
    print(word, ":>---->>>", word.tag_, "=", spacy.explain(word.tag_)) # fine-grained pos

print()
print("-----------------------------------------------------------------------")
print()

doc3 = nlp(u"to the left of the room")
print(doc3.text)
# check which POS is assigned to each word
for word in doc3:
    print(word, ":>----->>>", word.pos_, "=", spacy.explain(word.pos_)) # coarse-grained POS
print()

# check tagging
for word in doc3:
    print(word, ":>---->>>", word.tag_, "=", spacy.explain(word.tag_)) # fine-grained pos

I left the room
I :>----->>> PRON = pronoun
left :>----->>> VERB = verb
the :>----->>> DET = determiner
room :>----->>> NOUN = noun

I :>---->>> PRP = pronoun, personal
left :>---->>> VBD = verb, past tense
the :>---->>> DT = determiner
room :>---->>> NN = noun, singular or mass

-----------------------------------------------------------------------

to the left of the room
to :>----->>> ADP = adposition
the :>----->>> DET = determiner
left :>----->>> NOUN = noun
of :>----->>> ADP = adposition
the :>----->>> DET = determiner
room :>----->>> NOUN = noun

to :>---->>> IN = conjunction, subordinating or preposition
the :>---->>> DT = determiner
left :>---->>> NN = noun, singular or mass
of :>---->>> IN = conjunction, subordinating or preposition
the :>---->>> DT = determiner
room :>---->>> NN = noun, singular or mass


In [8]:
doc4 = nlp(u"I read books on history")
print(doc4.text)
for word in doc4:
    print(word.text, ":>----->>>", 
          word.pos_, "=", spacy.explain(word.pos_),
          " | ",
          word.tag_, "=", spacy.explain(word.tag_))
print()

print()
print("---------------------------------------------------------------")
print()

doc5 = nlp(u"I have read a book on history")
print(doc5.text)
for word in doc5:
    print(word.text, ":>----->>>", 
          word.pos_, "=", spacy.explain(word.pos_),
          " | ",
          word.tag_, "=", spacy.explain(word.tag_))
print()


I read books on history
I :>----->>> PRON = pronoun  |  PRP = pronoun, personal
read :>----->>> VERB = verb  |  VBP = verb, non-3rd person singular present
books :>----->>> NOUN = noun  |  NNS = noun, plural
on :>----->>> ADP = adposition  |  IN = conjunction, subordinating or preposition
history :>----->>> NOUN = noun  |  NN = noun, singular or mass


---------------------------------------------------------------

I have read a book on history
I :>----->>> PRON = pronoun  |  PRP = pronoun, personal
have :>----->>> AUX = auxiliary  |  VBP = verb, non-3rd person singular present
read :>----->>> VERB = verb  |  VBN = verb, past participle
a :>----->>> DET = determiner  |  DT = determiner
book :>----->>> NOUN = noun  |  NN = noun, singular or mass
on :>----->>> ADP = adposition  |  IN = conjunction, subordinating or preposition
history :>----->>> NOUN = noun  |  NN = noun, singular or mass



In [9]:
doc6 = nlp(u"The quick brown fox jumped over the lazy dog")
print(doc6.text)
for word in doc6:
    print(word.text, ":>----->>>", 
          word.pos_, "=", spacy.explain(word.pos_),
          " | ",
          word.tag_, "=", spacy.explain(word.tag_))
print()

The quick brown fox jumped over the lazy dog
The :>----->>> DET = determiner  |  DT = determiner
quick :>----->>> ADJ = adjective  |  JJ = adjective (English), other noun-modifier (Chinese)
brown :>----->>> ADJ = adjective  |  JJ = adjective (English), other noun-modifier (Chinese)
fox :>----->>> NOUN = noun  |  NN = noun, singular or mass
jumped :>----->>> VERB = verb  |  VBD = verb, past tense
over :>----->>> ADP = adposition  |  IN = conjunction, subordinating or preposition
the :>----->>> DET = determiner  |  DT = determiner
lazy :>----->>> ADJ = adjective  |  JJ = adjective (English), other noun-modifier (Chinese)
dog :>----->>> NOUN = noun  |  NN = noun, singular or mass



In [10]:
from spacy import displacy
displacy.render(doc6, style='dep', jupyter=True)

In [11]:
options={
    'distance':80,
    'compact':True,
    'color':'#fff',
    'bg':'#00a65a'
}

displacy.render(doc6, style='dep', jupyter=True, options=options)

### **How Does POS Tagging Work?**
- HMM (Hidden Markov Model)
<pre>
    Test Data : [Will will google campusx]
    Training Data : [[Nitish loves campusx],
                    [Can Nitish google campusx],
                    [Will Ankita google campusx],
                    [Ankita loves Will],
                    [Will loves google]]
    Steps : 1. POS-tag the training sentences.
            2. Train the HMM model.
            3. POS-tag the test sentence.

    HMM tags : N(noun), V(verb), M(modal)
    N(noun) = [[Nitish, campusx],        V(verb) = [[loves],       M(modal) = [[],
               [Nitish, campusx],                   [google],                  [Can],
               [Ankita, campusx],                   [google],                  [Will],
               [Ankita, Will],                      [loves],                   [],
               [Will, google]]                      [loves]]                   []]
    Total no. of N(noun) = 10,          Total no. of V(verb) = 5,  Total no. of M(modal) = 2
     ______________________Emission_____________________
    |     word     |  N(noun)  |  M(modal)  |  V(verb)  |
    |______________|___________|____________|___________|
    |  Nitish      |   2/10    |     0      |     0     |
    |______________|___________|____________|___________|
    |  loves       |    0      |     0      |    3/5    |
    |______________|___________|____________|___________|
    |  campusx     |   3/10    |     0      |     0     |
    |______________|___________|____________|___________|
    |  google      |   1/10    |     0      |    2/5    |
    |______________|___________|____________|___________|
    |  will        |   2/10    |    1/2     |     0     |
    |______________|___________|____________|___________|
    |  Ankita      |   2/10    |     0      |     0     |
    |______________|___________|____________|___________|
    |  can         |    0      |    1/2     |     0     |
    |______________|___________|____________|___________|

    Rows: current tag; columns: next tag.
     ______________________Transition____________________________
    |            |  N(noun)  |  M(modal)  |  V(verb)  |  E(end)  |
    |____________|___________|____________|___________|__________|
    |  S(start)  |    3/5    |    2/5     |     0     |    0     |
    |____________|___________|____________|___________|__________|
    |  N(noun)   |     0     |     0      |    5/10   |   5/10   |
    |____________|___________|____________|___________|__________|
    |  M(modal)  |    2/2    |     0      |     0     |    0     |
    |____________|___________|____________|___________|__________|
    |  V(verb)   |    5/5    |     0      |     0     |    0     |
    |____________|___________|____________|___________|__________|
    
</pre>
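
The counting behind the two tables can be sketched in a few lines of plain Python. This is only an illustration, not part of the spaCy demo: the `TRAIN` list and the `estimate_hmm` helper are hypothetical names introduced here, and the words are lowercased on the assumption that "Will" and "will" share one table row, as they do in the emission table.

```python
from collections import Counter
from fractions import Fraction

# Tagged training sentences (N = noun, V = verb, M = modal).
# Assumption: words are lowercased so "Will" and "will" count as one word.
TRAIN = [
    [("nitish", "N"), ("loves", "V"), ("campusx", "N")],
    [("can", "M"), ("nitish", "N"), ("google", "V"), ("campusx", "N")],
    [("will", "M"), ("ankita", "N"), ("google", "V"), ("campusx", "N")],
    [("ankita", "N"), ("loves", "V"), ("will", "N")],
    [("will", "N"), ("loves", "V"), ("google", "N")],
]


def estimate_hmm(sentences):
    """Return (emission, transition) tables as dicts of exact fractions."""
    tag_counts = Counter()    # how often each tag occurs
    emit_counts = Counter()   # (tag, word) pairs
    trans_counts = Counter()  # (previous tag, next tag) pairs, incl. S and E
    prev_counts = Counter()   # how often each tag (or S) is left

    for sentence in sentences:
        prev = "S"  # every sentence starts in the start state
        for word, tag in sentence:
            tag_counts[tag] += 1
            emit_counts[(tag, word)] += 1
            trans_counts[(prev, tag)] += 1
            prev_counts[prev] += 1
            prev = tag
        # the last tag of the sentence transitions to the end state
        trans_counts[(prev, "E")] += 1
        prev_counts[prev] += 1

    emission = {k: Fraction(c, tag_counts[k[0]]) for k, c in emit_counts.items()}
    transition = {k: Fraction(c, prev_counts[k[0]]) for k, c in trans_counts.items()}
    return emission, transition


emission, transition = estimate_hmm(TRAIN)
print(emission[("N", "campusx")])   # 3/10
print(transition[("S", "M")])       # 2/5
print(transition[("N", "V")])       # 1/2
```

Pairs that never occur in the training data are simply absent from the dictionaries, which corresponds to the 0 entries in the tables.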
- <b>Transition Probability</b> \
        ![Transition probability](../img/transitionProb2.jpeg)
    
- <b>Transition + Emission Probability</b> \
        ![Transition and emission probability](../img/emissionProb.jpeg)
    

- Test Dataset : [Will will google campusx]
  <pre>
                     Will          will         google         campusx
             /------- N             N            N               N ---------\  
            /                                                                \
      start --------- M             M            M               M ---------- end
            \                                                                /
             \------- V             V            V               V ---------/
      The lattice is fully connected: try every possible combination of tags and calculate its <em><b>emission x transition</b></em> value.
      The combination with the highest <em><b>emission x transition</b></em> value is the answer.
      
      start------------M------------N--------------V---------------N-----------end
               2/5   [1/2]    1   [2/10]   1/2   [2/5]     1     [3/10]   1/2      = 0.0012 (> 0)
      Values in [ ] are emission probabilities; the others are transition probabilities.
  </pre>
- The number of combinations can become too large to enumerate: with 3 tags and 4 words there are already 3^4 = 81 of them, and the count grows exponentially with sentence length. To overcome this problem we use the ___Viterbi Algorithm___. It drops paths whose __emission x transition__ value is zero, and when multiple paths lead to the same node it keeps only the path with the highest __emission x transition__ value.
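
As a rough sketch of both ideas, the code below scores the test sentence once by trying every tag combination and once with a small Viterbi pass. The dictionaries hard-code the nonzero entries of the emission and transition tables above. The names `path_score`, `brute_force` and `viterbi` are hypothetical helpers written for this illustration, and lowercasing the words is an assumption that matches the emission table.

```python
from fractions import Fraction as F
from itertools import product

TAGS = ["N", "M", "V"]

# Nonzero table entries only; a missing key means probability 0.
EMISSION = {
    ("N", "nitish"): F(2, 10), ("N", "campusx"): F(3, 10),
    ("N", "google"): F(1, 10), ("N", "will"): F(2, 10),
    ("N", "ankita"): F(2, 10),
    ("M", "will"): F(1, 2), ("M", "can"): F(1, 2),
    ("V", "loves"): F(3, 5), ("V", "google"): F(2, 5),
}
TRANSITION = {
    ("S", "N"): F(3, 5), ("S", "M"): F(2, 5),
    ("N", "V"): F(5, 10), ("N", "E"): F(5, 10),
    ("M", "N"): F(1), ("V", "N"): F(1),
}


def path_score(words, tags):
    """Product of all transition and emission probabilities along one path."""
    score, prev = F(1), "S"
    for word, tag in zip(words, tags):
        score *= TRANSITION.get((prev, tag), F(0)) * EMISSION.get((tag, word), F(0))
        prev = tag
    return score * TRANSITION.get((prev, "E"), F(0))


def brute_force(words):
    """Score every one of the len(TAGS) ** len(words) tag sequences."""
    return max((path_score(words, tags), list(tags))
               for tags in product(TAGS, repeat=len(words)))


def viterbi(words):
    """Keep only the best-scoring path into each tag at each position."""
    # best[tag] = (score of the best path ending in tag, that path)
    best = {"S": (F(1), [])}
    for word in words:
        step = {}
        for tag in TAGS:
            emit = EMISSION.get((tag, word), F(0))
            candidates = [
                (score * TRANSITION.get((prev, tag), F(0)) * emit, path + [tag])
                for prev, (score, path) in best.items()
            ]
            top = max(candidates, default=(F(0), []))
            if top[0] > 0:  # drop nodes that no nonzero path reaches
                step[tag] = top
        best = step
    finals = [(score * TRANSITION.get((prev, "E"), F(0)), path)
              for prev, (score, path) in best.items()]
    return max(finals, default=(F(0), []))


words = "Will will google campusx".lower().split()
score, tags = viterbi(words)
print(tags, score, float(score))   # ['M', 'N', 'V', 'N'] 3/2500 0.0012
```

On this sentence both functions return the same path. The brute-force version evaluates all 81 combinations, while the Viterbi pass carries at most one surviving path per tag from one word to the next.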