# POS tagging

1. Universal (coarse-grained) tags can be accessed via "pos_".
2. Fine-grained tags can be accessed via "tag_".
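
As a rough illustration of how the two tag levels relate, the sketch below collapses a few fine-grained tags into their universal counterparts. The `FINE_TO_COARSE` table and the `coarse_tag` helper are hypothetical names written by hand for this note, not part of spaCy, and the table covers only a handful of tags; in spaCy the coarse tag is simply read from `token.pos_`.

```python
# Hand-written, illustrative subset: fine-grained tag -> universal tag.
# Assumption: this is a simplified lookup for demonstration only; the real
# pipeline may apply additional rules when it assigns pos_.
FINE_TO_COARSE = {
    "NN": "NOUN", "NNS": "NOUN",
    "NNP": "PROPN", "NNPS": "PROPN",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "CD": "NUM",
    "DT": "DET",
}


def coarse_tag(fine_tag: str) -> str:
    """Return the universal tag for a fine-grained tag, or "X" if unknown here."""
    return FINE_TO_COARSE.get(fine_tag, "X")


# Several fine-grained tags share one universal tag.
for fine in ["NN", "NNS", "JJ", "JJS", "DT"]:
    print(f"{fine:<6s}{coarse_tag(fine)}")
```

The point of the sketch is only that "tag_" carries more detail (number, degree of comparison) than "pos_", so several fine-grained tags map onto the same universal tag.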

In [1]:
import spacy 
nlp = spacy.load("en_core_web_md")

In [2]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [4]:
tag_list = nlp.pipe_labels["tagger"]
len(tag_list)

50

In [6]:
for tag in tag_list:
    print(f"{tag:<15s}{spacy.explain(tag)}")

$              symbol, currency
''             closing quotation mark
,              punctuation mark, comma
-LRB-          left round bracket
-RRB-          right round bracket
.              punctuation mark, sentence closer
:              punctuation mark, colon or ellipsis
ADD            email
AFX            affix
CC             conjunction, coordinating
CD             cardinal number
DT             determiner
EX             existential there
FW             foreign word
HYPH           punctuation mark, hyphen
IN             conjunction, subordinating or preposition
JJ             adjective (English), other noun-modifier (Chinese)
JJR            adjective, comparative
JJS            adjective, superlative
LS             list item marker
MD             verb, modal auxiliary
NFP            superfluous punctuation
NN             noun, singular or mass
NNP            noun, proper singular
NNPS           noun, proper plural
NNS            noun, plural
PDT            predeterminer
POS    

In [7]:
doc = nlp("I will ship the package tomorrow.")
for token in doc:
    print(f"{token.text:<15s}{token.tag_:<5s}{spacy.explain(token.tag_)}")

I              PRP  pronoun, personal
will           MD   verb, modal auxiliary
ship           VB   verb, base form
the            DT   determiner
package        NN   noun, singular or mass
tomorrow       NN   noun, singular or mass
.              .    punctuation mark, sentence closer


Here "ship" is tagged as a verb (VB, base form).

In [8]:
doc = nlp("I saw a red ship.")
for token in doc:
    print(f"{token.text:<15s}{token.tag_:<5s}{spacy.explain(token.tag_)}")

I              PRP  pronoun, personal
saw            VBD  verb, past tense
a              DT   determiner
red            JJ   adjective (English), other noun-modifier (Chinese)
ship           NN   noun, singular or mass
.              .    punctuation mark, sentence closer


In [9]:
doc = nlp("My cat will fish for a fish tomorrow in a fishy way.")
for token in doc:
    print(f"{token.text:<15s}{token.tag_:<5s}{spacy.explain(token.tag_)}")

My             PRP$ pronoun, possessive
cat            NN   noun, singular or mass
will           MD   verb, modal auxiliary
fish           VB   verb, base form
for            IN   conjunction, subordinating or preposition
a              DT   determiner
fish           NN   noun, singular or mass
tomorrow       NN   noun, singular or mass
in             IN   conjunction, subordinating or preposition
a              DT   determiner
fishy          JJ   adjective (English), other noun-modifier (Chinese)
way            NN   noun, singular or mass
.              .    punctuation mark, sentence closer
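
The output above shows the same surface form receiving different tags depending on context. A minimal sketch of how such words could be picked out is given below; it works on (text, tag) pairs copied by hand from that output rather than on a live spaCy `Doc`, and the variable names are chosen only for this illustration.

```python
from collections import defaultdict

# (text, tag) pairs copied from the tagger output shown above.
tagged = [
    ("My", "PRP$"), ("cat", "NN"), ("will", "MD"), ("fish", "VB"),
    ("for", "IN"), ("a", "DT"), ("fish", "NN"), ("tomorrow", "NN"),
    ("in", "IN"), ("a", "DT"), ("fishy", "JJ"), ("way", "NN"), (".", "."),
]

# Collect every tag seen for each lower-cased word form.
tags_by_word = defaultdict(set)
for text, tag in tagged:
    tags_by_word[text.lower()].add(tag)

# Keep only the words that were tagged in more than one way.
ambiguous = {word: sorted(tags) for word, tags in tags_by_word.items() if len(tags) > 1}
print(ambiguous)  # {'fish': ['NN', 'VB']}
```

With real spaCy tokens, the same loop could read `token.text` and `token.tag_` instead of the hand-copied pairs.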


# Why we need POS tagging

1. POS tags help with word sense disambiguation (WSD).
2. They cannot solve the problem completely, but they do tell us how a word is being used in a sentence.

Example:
Several sentences may contain the action "fly", but only some of them express an intent to book a ticket.

In [11]:
doc1 = nlp("I flew to Rome.")
doc2 = nlp("I have flown to Rome.")
doc3 = nlp("I'm flying to Rome")
doc4 = nlp("I need to fly to Rome")
doc5 = nlp("I will fly to Rome")

From these sentences we filter out the ones in which "fly" appears in the past tense, as a past participle, or as a present participle; those do not express an intent to book a flight.

In [14]:
for doc in [doc1, doc2, doc3, doc4, doc5]:
    print([(token, token.tag_, token.lemma_) for token in doc if token.tag_ == "VB" and token.lemma_ == "fly"])

[]
[]
[]
[(fly, 'VB', 'fly')]
[(fly, 'VB', 'fly')]
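
The filter above can be wrapped in a small reusable check. The sketch below is one possible way to do it: `has_verb_intent` and `fake_doc` are hypothetical helpers, and the stand-in tokens (built with `SimpleNamespace`) only mimic the `tag_` and `lemma_` attributes of spaCy tokens so the example runs without loading a model. The tags given to the stand-ins are assumptions chosen to match the results shown above.

```python
from types import SimpleNamespace


def has_verb_intent(tokens, lemma="fly"):
    """Return True if any token is a base-form verb (tag VB) with the given lemma."""
    return any(t.tag_ == "VB" and t.lemma_ == lemma for t in tokens)


def fake_doc(*triples):
    """Build stand-in tokens from (text, tag, lemma) triples (assumption: not real spaCy tokens)."""
    return [SimpleNamespace(text=text, tag_=tag, lemma_=lem) for text, tag, lem in triples]


# Only the verb of each sentence is modelled; the tags are illustrative stand-ins.
samples = {
    "I flew to Rome.": fake_doc(("flew", "VBD", "fly")),
    "I have flown to Rome.": fake_doc(("flown", "VBN", "fly")),
    "I'm flying to Rome": fake_doc(("flying", "VBG", "fly")),
    "I need to fly to Rome": fake_doc(("fly", "VB", "fly")),
    "I will fly to Rome": fake_doc(("fly", "VB", "fly")),
}

# Keep the sentences whose verb form suggests a future or intended flight.
intents = [sentence for sentence, tokens in samples.items() if has_verb_intent(tokens)]
print(intents)  # ['I need to fly to Rome', 'I will fly to Rome']
```

Because the function only reads `tag_` and `lemma_`, it should also accept a real `Doc` such as `doc4` or `doc5` from the cells above.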
