# Part-of-Speech Tagging

This notebook uses the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902).

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

In [2]:
with open('peterrabbit.txt') as f:
    doc=nlp(f.read())

In [5]:
sentences=[s for s in doc.sents]
print(sentences[2])

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.




**For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained tag, and the description of the fine-grained tag.**

In [6]:
for token in sentences[2]:
    print(f'{token.text :10}{token.pos_ :10}{token.tag_ :10}{spacy.explain(token.tag_)}')

They      PRON      PRP       pronoun, personal
lived     VERB      VBD       verb, past tense
with      ADP       IN        conjunction, subordinating or preposition
their     DET       PRP$      pronoun, possessive
Mother    PROPN     NNP       noun, proper singular
in        ADP       IN        conjunction, subordinating or preposition
a         DET       DT        determiner
sand      NOUN      NN        noun, singular or mass
-         PUNCT     HYPH      punctuation mark, hyphen
bank      NOUN      NN        noun, singular or mass
,         PUNCT     ,         punctuation mark, comma
underneathADP       IN        conjunction, subordinating or preposition
the       DET       DT        determiner
root      NOUN      NN        noun, singular or mass
of        ADP       IN        conjunction, subordinating or preposition
a         DET       DT        determiner

         SPACE     _SP       None
very      ADV       RB        adverb
big       ADJ       JJ        adjective
fir       NO
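
In the output above, `underneath` fills the whole 10-character column and runs into its POS tag, and the newline token splits its row across two lines. A minimal sketch of one way to avoid both, assuming a few rows copied from that output stand in for the live spaCy `Doc`; the helper name `format_rows` and the `gap` parameter are hypothetical, not part of spaCy:

```python
# Rows copied from the output above; a live spaCy Doc is not needed for this sketch.
rows = [
    ("They", "PRON", "PRP"),
    ("lived", "VERB", "VBD"),
    ("underneath", "ADP", "IN"),
    ("\n", "SPACE", "_SP"),
    ("very", "ADV", "RB"),
]


def format_rows(rows, gap=2):
    """Return aligned lines, sizing each column to its longest entry plus a gap."""
    # Escape newlines so whitespace tokens stay on a single output line.
    cleaned = [(text.replace("\n", "\\n"), pos, tag) for text, pos, tag in rows]
    # One width per column: longest cell in that column plus the gap.
    widths = [max(len(row[i]) for row in cleaned) + gap for i in range(3)]
    return [
        "".join(f"{cell:{width}}" for cell, width in zip(row, widths)).rstrip()
        for row in cleaned
    ]


lines = format_rows(rows)
print("\n".join(lines))
```

With a real `Doc`, the same idea would apply to tuples built from `token.text`, `token.pos_`, and `token.tag_`.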

**Frequency list of POS tags from the entire document**

In [7]:
POS_counts = doc.count_by(spacy.attrs.POS)
for k,v in sorted(POS_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')

84. ADJ  : 49
85. ADP  : 122
86. ADV  : 67
87. AUX  : 48
89. CCONJ: 61
90. DET  : 117
92. NOUN : 169
93. NUM  : 8
94. PART : 28
95. PRON : 82
96. PROPN: 75
97. PUNCT: 174
98. SCONJ: 20
100. VERB : 139
103. SPACE: 99
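
The list above is ordered by spaCy's internal attribute IDs, which makes the most common tags hard to spot. A minimal sketch of ranking by frequency instead, assuming the counts are copied by hand from the output above and keyed by tag name (the variable name `pos_counts` is illustrative):

```python
from collections import Counter

# Counts copied from the frequency list above, keyed by tag name.
pos_counts = Counter({
    "ADJ": 49, "ADP": 122, "ADV": 67, "AUX": 48, "CCONJ": 61,
    "DET": 117, "NOUN": 169, "NUM": 8, "PART": 28, "PRON": 82,
    "PROPN": 75, "PUNCT": 174, "SCONJ": 20, "VERB": 139, "SPACE": 99,
})

# most_common() returns (tag, count) pairs sorted by count, highest first.
top_five = pos_counts.most_common(5)
for tag, count in top_five:
    print(f"{tag:<6}{count:>4}")
```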


**What percentage of tokens are nouns?**

In [8]:
total=0
nouns=0
for k,v in sorted(POS_counts.items()):
    if doc.vocab[k].text=='NOUN':
        nouns=v
    total+=v

print(f'{nouns}/{total} = {round(((nouns/total)*100),2)}%')

169/1258 = 13.43%
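
The total of 1258 includes the 99 `SPACE` tokens (and the 174 `PUNCT` tokens), so the noun share depends on what is counted as a token. A minimal sketch of the same calculation with an optional exclusion list, assuming the counts are copied from the frequency list above; the helper `tag_share` is hypothetical:

```python
# Counts copied from the frequency list above, keyed by tag name.
counts = {
    "ADJ": 49, "ADP": 122, "ADV": 67, "AUX": 48, "CCONJ": 61,
    "DET": 117, "NOUN": 169, "NUM": 8, "PART": 28, "PRON": 82,
    "PROPN": 75, "PUNCT": 174, "SCONJ": 20, "VERB": 139, "SPACE": 99,
}


def tag_share(counts, tag, exclude=()):
    """Percentage of tokens carrying `tag`, ignoring any tags listed in `exclude`."""
    total = sum(n for name, n in counts.items() if name not in exclude)
    return round(counts[tag] / total * 100, 2)


print(tag_share(counts, "NOUN"))                               # all tokens
print(tag_share(counts, "NOUN", exclude=("SPACE",)))           # without whitespace
print(tag_share(counts, "NOUN", exclude=("SPACE", "PUNCT")))   # words only
```

Which denominator is appropriate depends on the question being asked; the notebook's own figure uses all tokens.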


**Show the first two named entities from Beatrix Potter's 'The Tale of Peter Rabbit'**

In [9]:
doc.ents[0:2]

(Peter Rabbit, Beatrix Potter)

In [10]:
for ent in doc.ents[:2]:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))

Peter Rabbit - PERSON - People, including fictional
Beatrix Potter - PERSON - People, including fictional


**How many sentences contain named entities?**

In [11]:
list_of_sents=[]
for s in sentences:
    if s.ents:
        list_of_sents.append(s)
len(list_of_sents)

34

**Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [12]:
displacy.render(list_of_sents[0],style='ent',jupyter=True)