# Part of Speech Assessment
For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# Import spaCy and load the English language model
import spacy
from spacy import displacy
import en_core_web_sm

nlp = en_core_web_sm.load()


In [46]:
# Create a doc object
with open("../TextFiles/peterrabbit.txt","r") as f:
    doc=nlp(f.read().replace("\n"," "))

In [47]:
sents=list(doc.sents)

For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained tag, and a description of the fine-grained tag.

In [26]:

for token in sents[2]:
    print(f"{token.text:10} {token.pos_:10} {token.tag_:10} {spacy.explain(token.tag_)}")

           SPACE      _SP        None
They       PRON       PRP        pronoun, personal
lived      VERB       VBD        verb, past tense
with       ADP        IN         conjunction, subordinating or preposition
their      PRON       PRP$       pronoun, possessive
Mother     NOUN       NN         noun, singular or mass
in         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
sand       NOUN       NN         noun, singular or mass
-          PUNCT      HYPH       punctuation mark, hyphen
bank       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
underneath ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
root       NOUN       NN         noun, singular or mass
of         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
very       ADV        RB         adverb
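
The aligned columns in this output come from the width field of the f-string format spec. As a minimal sketch that needs no spaCy model, the snippet below reproduces the layout for a few rows copied from the output above. `rows` and `format_row` are hypothetical names introduced only for illustration.

```python
# Hypothetical (text, pos, tag) rows copied from the printed output; no spaCy needed.
rows = [
    ("They", "PRON", "PRP"),
    ("lived", "VERB", "VBD"),
    ("underneath", "ADP", "IN"),
]

def format_row(text, pos, tag, width=10):
    # {text:{width}} pads the value to `width` characters;
    # strings are left-aligned by default, so the columns line up.
    # rstrip() drops the padding after the last column.
    return f"{text:{width}} {pos:{width}} {tag:{width}}".rstrip()

for row in rows:
    print(format_row(*row))
```

A fixed width can also be written directly as `{text:10}`; the nested braces are only needed when the width comes from a variable.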

**Provide a frequency list of POS tags from the entire document**

In [48]:
POS_counts=doc.count_by(spacy.attrs.POS)

In [49]:
POS_counts

{90: 95,
 96: 71,
 85: 126,
 97: 173,
 93: 9,
 103: 39,
 86: 65,
 98: 16,
 92: 171,
 95: 105,
 87: 42,
 84: 56,
 89: 61,
 100: 140,
 94: 29}

In [50]:
for k, v in sorted(POS_counts.items()):
    print(f"{k:3}. {doc.vocab[k].text:10}: {v}")

 84. ADJ       : 56
 85. ADP       : 126
 86. ADV       : 65
 87. AUX       : 42
 89. CCONJ     : 61
 90. DET       : 95
 92. NOUN      : 171
 93. NUM       : 9
 94. PART      : 29
 95. PRON      : 105
 96. PROPN     : 71
 97. PUNCT     : 173
 98. SCONJ     : 16
100. VERB      : 140
103. SPACE     : 39
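
The listing above is ordered by tag ID. To read it as a frequency list, it can help to sort by count instead. The sketch below is a plain-Python illustration that hard-codes the counts printed above, keyed by tag name; `pos_counts` and `ranked` are names assumed here, not part of the notebook.

```python
# Counts copied from the printed output above, keyed by tag name instead of ID.
pos_counts = {
    "ADJ": 56, "ADP": 126, "ADV": 65, "AUX": 42, "CCONJ": 61,
    "DET": 95, "NOUN": 171, "NUM": 9, "PART": 29, "PRON": 105,
    "PROPN": 71, "PUNCT": 173, "SCONJ": 16, "VERB": 140, "SPACE": 39,
}

# Sort the (name, count) pairs by count, most frequent first.
ranked = sorted(pos_counts.items(), key=lambda kv: kv[1], reverse=True)

for name, count in ranked:
    print(f"{name:10}: {count}")
```

In the notebook itself, the same ordering should follow from applying the same `key` and `reverse` arguments to `sorted(POS_counts.items())`.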


**CHALLENGE: What percentage of tokens are nouns?**

In [55]:
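# Look up the NOUN ID by name instead of hard-coding 92
noun_id = doc.vocab.strings["NOUN"]
per_noun = 100 * POS_counts[noun_id] / len(doc)
print(f"Percentage of Nouns in doc: {per_noun:.2f}%")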

Percentage of Nouns in doc: 14.27%
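
This figure can be checked by hand from the frequency list: 171 NOUN tokens out of 1198 tokens in total. Note that `len(doc)` also counts punctuation and whitespace tokens. The sketch below redoes the arithmetic with the printed counts and, as an assumption about what "tokens" might be taken to mean, also shows the share once PUNCT and SPACE tokens are excluded. The variable names are illustrative.

```python
# Counts copied from the POS frequency list above.
noun_count = 171
total_tokens = 1198  # sum of all POS counts, which equals len(doc)
punct_count = 173
space_count = 39

# Share of nouns among all tokens (the notebook's definition).
share_all = 100 * noun_count / total_tokens

# Alternative (assumed) definition: ignore punctuation and whitespace tokens.
word_tokens = total_tokens - punct_count - space_count
share_words = 100 * noun_count / word_tokens

print(f"Nouns as a share of all tokens: {share_all:.2f}%")
print(f"Nouns as a share of non-punctuation, non-space tokens: {share_words:.2f}%")
```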


**Display the Dependency Parse for the third sentence**

In [58]:
displacy.render(sents[2],style="dep",jupyter=True,options={'distance':100})

**Show the first two named entities from Beatrix Potter's 'The Tale of Peter Rabbit'**

In [65]:
for ent in doc.ents[:2]:
    print(f"{ent.text}: {ent.label_} - {spacy.explain(ent.label_)}")
    
    

The Tale of Peter Rabbit: WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter: ORG - Companies, agencies, institutions, etc.


In [66]:
# Number of sentences in the document
len(sents)

57

In [74]:
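# Number of sentences that contain named entities
# (each sentence is re-parsed on its own, so its entities can differ from those in doc.ents)
list_of_sents = [nlp(sent.text) for sent in doc.sents]
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]
len(list_of_ners)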

27

In [75]:
len(doc.ents)

48

**Display entities**

In [76]:
sents[0]

The Tale of Peter Rabbit, by Beatrix Potter (1902).

In [77]:
displacy.render(sents[0],style='ent',jupyter=True)