# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [0]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [5]:
with open('drive/My Drive/Pytorch_DataSet/TextFiles/peterrabbit.txt', 'r') as f:
    doc = nlp(f.read())

doc[:50]


The Tale of Peter Rabbit, by Beatrix Potter (1902).

Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.

They lived with their Mother

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [13]:
# doc.sents is a generator, so materialize it before indexing.
# In this run the sentence the question asks for sits at index 3.
sen = list(doc.sents)[3]

print(sen)

for token in sen:
  print(f'{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}')

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.


They       PRON       PRP        pronoun, personal
lived      VERB       VBD        verb, past tense
with       ADP        IN         conjunction, subordinating or preposition
their      DET        PRP$       pronoun, possessive
Mother     PROPN      NNP        noun, proper singular
in         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
sand       NOUN       NN         noun, singular or mass
-          PUNCT      HYPH       punctuation mark, hyphen
bank       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
underneath ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
root       NOUN       NN         noun, singular or mass
of         ADP        IN         conjunction, subordinating or preposition
a          DET        DT     
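
The column alignment in the loop above comes from the nested format spec `{token.text:{10}}`: the inner `{10}` is evaluated first and becomes the field width, so it behaves like `{token.text:10}`. A minimal sketch with plain strings follows. The rows are copied from the output above rather than produced by spaCy, and `format_row` and `width` are hypothetical names introduced for illustration.

```python
def format_row(text, pos, tag, description, width=10):
    """Pad the first three fields to `width` characters, like the loop above."""
    return f'{text:{width}} {pos:{width}} {tag:{width}} {description}'

# Rows copied from the printed output above, not produced by spaCy here.
rows = [
    ('They', 'PRON', 'PRP', 'pronoun, personal'),
    ('underneath', 'ADP', 'IN', 'conjunction, subordinating or preposition'),
]

for row in rows:
    print(format_row(*row))
```

Strings are left-aligned by default, and a field longer than `width` is not truncated; it simply pushes the later columns to the right.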

**3. Provide a frequency list of POS tags from the entire document**

In [16]:
POS_counts = doc.count_by(spacy.attrs.POS)

for k,v in sorted(POS_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')

84. ADJ  : 50
85. ADP  : 123
86. ADV  : 67
87. AUX  : 48
89. CCONJ: 61
90. DET  : 118
92. NOUN : 171
93. NUM  : 8
94. PART : 29
95. PRON : 81
96. PROPN: 73
97. PUNCT: 174
98. SCONJ: 20
100. VERB : 136
103. SPACE: 99
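
`doc.count_by(spacy.attrs.POS)` keys its result by integer IDs, which is why the loop above looks each one up in `doc.vocab`. As a sketch of an alternative, counting the string attribute `token.pos_` with `collections.Counter` gives name-keyed counts directly. The helper `pos_frequencies` is hypothetical and accepts any iterable of tag strings so that the block runs without a model; with the `doc` above one would pass `(token.pos_ for token in doc)`.

```python
from collections import Counter

def pos_frequencies(tags):
    """Return (tag, count) pairs, most frequent first, ties broken alphabetically."""
    counts = Counter(tags)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Coarse POS tags of the first eight tokens of the sentence printed in question 2.
sample_tags = ['PRON', 'VERB', 'ADP', 'DET', 'PROPN', 'ADP', 'DET', 'NOUN']

for tag, count in pos_frequencies(sample_tags):
    print(f'{tag:{5}}: {count}')
```

Sorting by count rather than by ID puts the most common tags at the top, which is usually what a frequency list is read for.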


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92 in this run, as the frequency list above shows (the ID can differ between spaCy versions)

In [19]:
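# Look up the ID by name instead of hardcoding it (92 in this run).
noun_id = doc.vocab.strings['NOUN']
percent = 100*POS_counts[noun_id]/len(doc)

print(f'{POS_counts[noun_id]}/{len(doc)} = {percent:.4}%')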

171/1258 = 13.59%
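
The result can be cross-checked against the frequency list in question 3: those counts sum to 1258, the same as `len(doc)`. That total includes `SPACE` and `PUNCT` tokens, so the sketch below also computes the noun share among the remaining tokens. This second figure goes beyond what the question asks and is shown only as an illustration; the counts are copied from the output above rather than recomputed, and the variable names are assumptions.

```python
# POS counts copied from the frequency list printed for question 3.
pos_counts = {
    'ADJ': 50, 'ADP': 123, 'ADV': 67, 'AUX': 48, 'CCONJ': 61,
    'DET': 118, 'NOUN': 171, 'NUM': 8, 'PART': 29, 'PRON': 81,
    'PROPN': 73, 'PUNCT': 174, 'SCONJ': 20, 'VERB': 136, 'SPACE': 99,
}

total = sum(pos_counts.values())  # every token, matching len(doc)
words = total - pos_counts['PUNCT'] - pos_counts['SPACE']  # drop punctuation and whitespace

pct_all = 100 * pos_counts['NOUN'] / total
pct_words = 100 * pos_counts['NOUN'] / words

print(f"{pos_counts['NOUN']}/{total} = {pct_all:.2f}%")
print(f"{pos_counts['NOUN']}/{words} = {pct_words:.2f}%")
```

Which denominator is appropriate depends on whether whitespace and punctuation should count as tokens for the question at hand.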


**5. Display the Dependency Parse for the third sentence**

In [22]:
displacy.render(sen,style='dep',jupyter=True)

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [24]:
for ent in doc.ents[:2]:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))

Peter Rabbit - PERSON - People, including fictional
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [27]:
len(list(doc.sents))

68

**8. CHALLENGE: How many sentences contain named entities?**

In [28]:
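# Re-parse each sentence as its own Doc so it can be rendered separately later.
list_of_sents = [nlp(sent.text) for sent in doc.sents]
# Keep the sentence Docs that contain at least one named entity.
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]
len(list_of_ners)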

40
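
The cell above re-parses every sentence with `nlp`, which is convenient for question 9 but costs a second pass over the text. As a hedged alternative, each sentence `Span` yielded by `doc.sents` exposes its own `.ents`, so the count can be taken from the original parse. The helper `count_with_entities` below is a hypothetical sketch that relies only on an `.ents` attribute, and the stand-in objects are invented for illustration so the block runs without a model. With the `doc` above one would call `count_with_entities(doc.sents)`; the result may differ from 40, because entities predicted for a sentence parsed in isolation can differ from those predicted with the full document as context.

```python
from types import SimpleNamespace

def count_with_entities(sentences):
    """Count the items whose `.ents` sequence is non-empty."""
    return sum(1 for sent in sentences if sent.ents)

# Invented stand-ins for spaCy Span objects; only `.ents` matters here.
fake_sents = [
    SimpleNamespace(text='Peter lived in a sand-bank.', ents=('Peter',)),
    SimpleNamespace(text='It was very big.', ents=()),
    SimpleNamespace(text='Flopsy and Mopsy went out.', ents=('Flopsy', 'Mopsy')),
]

print(count_with_entities(fake_sents))
```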

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [30]:
displacy.render(list_of_sents[0],style='ent',jupyter=True)

### Great Job!