___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [7]:
with open('../TextFiles/peterrabbit.txt') as f:
    doc = nlp(f.read())
    print(doc)


The Tale of Peter Rabbit, by Beatrix Potter (1902).

Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.

'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
the fields or down the lane, but don't go into Mr. McGregor's garden:
your Father had an accident there; he was put in a pie by Mrs.
McGregor.'

'Now run along, and don't get into mischief. I am going out.'

Then old Mrs. Rabbit took a basket and her umbrella, and went through
the wood to the baker's. She bought a loaf of brown bread and five
currant buns.

Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
down the lane to gather blackberries:

But Peter, who was very naughty, ran straight away to Mr. McGregor's
garden, and squeezed under the gate!

First he ate some lettuces and some French beans; and then he ate
some radishes;

And

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [10]:
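# Enter your code here:
# List indices are zero-based. In this parse, index 3 selects the sentence that
# begins "They lived..."; the exact index depends on how the model splits sentences.
for token in list(doc.sents)[3]: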
    print(f'{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}')



They       PRON       PRP        pronoun, personal
lived      VERB       VBD        verb, past tense
with       ADP        IN         conjunction, subordinating or preposition
their      DET        PRP$       pronoun, possessive
Mother     PROPN      NNP        noun, proper singular
in         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
sand       NOUN       NN         noun, singular or mass
-          PUNCT      HYPH       punctuation mark, hyphen
bank       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
underneath ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
root       NOUN       NN         noun, singular or mass
of         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner

          SPACE      _SP        None
very       ADV        RB         adverb
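The `{token.text:{10}}` fields in the loop above pad each value to a minimum width of 10 characters, which is what lines the columns up. Below is a minimal sketch of the same formatting, using a few rows copied from the output above as plain tuples. The `rows` list and the `format_row` helper are illustrative names, not part of spaCy.

```python
# Rows copied from the output above; plain tuples stand in for spaCy tokens.
rows = [
    ("They", "PRON", "PRP", "pronoun, personal"),
    ("lived", "VERB", "VBD", "verb, past tense"),
    ("underneath", "ADP", "IN", "conjunction, subordinating or preposition"),
]

def format_row(text, pos, tag, description, width=10):
    """Left-align the first three fields in fixed-width columns (hypothetical helper)."""
    return f"{text:{width}} {pos:{width}} {tag:{width}} {description}"

for row in rows:
    print(format_row(*row))
```

A width in a format spec is a minimum, not a maximum: a token longer than 10 characters is printed in full and pushes the rest of its row to the right.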

**3. Provide a frequency list of POS tags from the entire document**

In [15]:
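# Count tokens per coarse-grained POS tag; the keys are integer attribute IDs.
POS_counts = doc.count_by(spacy.attrs.POS)

# Look up each ID in the vocab to recover the tag name.
for k, v in sorted(POS_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{10}}: {v}')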

84. ADJ       : 57
85. ADP       : 129
86. ADV       : 75
89. CCONJ     : 61
90. DET       : 118
92. NOUN      : 166
93. NUM       : 8
94. PART      : 34
95. PRON      : 78
96. PROPN     : 75
97. PUNCT     : 173
100. VERB      : 185
103. SPACE     : 99
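The list above is ordered by attribute ID. Ordering by count is often easier to read. The sketch below copies the counts from the output above into a plain dictionary keyed by tag name, so it runs without loading a model; the `pos_counts` and `by_frequency` names are illustrative.

```python
# Counts copied from the output above, keyed by tag name instead of attribute ID.
pos_counts = {
    "ADJ": 57, "ADP": 129, "ADV": 75, "CCONJ": 61, "DET": 118,
    "NOUN": 166, "NUM": 8, "PART": 34, "PRON": 78, "PROPN": 75,
    "PUNCT": 173, "VERB": 185, "SPACE": 99,
}

# Sort by count, largest first. sorted() is stable, so tags with equal
# counts keep the order they have in the dictionary.
by_frequency = sorted(pos_counts.items(), key=lambda item: item[1], reverse=True)

for tag, count in by_frequency:
    print(f"{tag:{10}}: {count}")
```

With the real `Doc`, a name-keyed dictionary like this could be built from `POS_counts` by looking up each key with `doc.vocab[k].text`, as the loop above already does.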


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92

In [75]:
# Teacher Solution
print(len(doc)) # total tokens in the document (punctuation and whitespace included)
print(POS_counts[92]) # tokens tagged NOUN

1258
166


In [72]:
# POS_counts maps each part-of-speech attribute ID (NOUN, VERB, NUM, ...) to its token count
result = 100 * POS_counts[92] / len(doc)
print(f'{round(result, 2)}%')

13.2%


In [23]:
# MY SOLUTION
all_tokens = 0
noun_tokens = 0
for k, v in sorted(POS_counts.items()):
    if k == 92:  # compare the attribute ID directly; 92 is 'NOUN'
        noun_tokens = v
        print(f'Noun Token: {k}, {v}')
    all_tokens += v

result = (noun_tokens * 100) / all_tokens

print(f'{noun_tokens}/{all_tokens} = {round(result, 2)}%')

Noun Token: 92, 166
166/1258 = 13.2%
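Both solutions divide by every token in the document, and the frequency list in question 3 shows that 173 of those are punctuation and 99 are whitespace. As a rough sketch of how much the denominator matters, the snippet below recomputes the share with those tokens left out. The numbers are copied from the outputs above, and `share` is a hypothetical helper.

```python
# Token counts taken from the frequency list in question 3.
total_tokens = 1258
noun_tokens = 166
punct_tokens = 173
space_tokens = 99

def share(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)

# Nouns as a share of all tokens (the figure computed above).
print(f"{share(noun_tokens, total_tokens)}%")

# Nouns as a share of tokens that are neither punctuation nor whitespace.
word_like_tokens = total_tokens - punct_tokens - space_tokens
print(f"{share(noun_tokens, word_like_tokens)}%")
```

Which denominator is appropriate depends on the question being asked. The assessment asks about tokens, so the first figure is the one it expects.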


**5. Display the Dependency Parse for the third sentence**

In [76]:
# Teacher Solution
displacy.render(list(doc.sents)[3], style='dep', jupyter=True)

In [40]:
# MY SOLUTION
# enumerate() supplies the running index, so no manual counter is needed
for index, sent in enumerate(doc.sents):
    if index == 3:
        displacy.render(sent, style='dep', jupyter=True, options={'distance': 150})
        break

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit*.**

In [87]:
# Teacher Solution
for ent in doc.ents[:2]:
    print(ent.text + ' - ' + ent.label_ + ' - ' + str(spacy.explain(ent.label_)))

Peter Rabbit - PERSON - People, including fictional
Beatrix Potter - PERSON - People, including fictional


In [85]:
# My solution
# Take the entities of the first sentence and keep only the first two
for ent in list(doc.sents)[0].ents[:2]:
    print(ent.text + ' - ' + ent.label_ + ' - ' + str(spacy.explain(ent.label_)))


Peter Rabbit - PERSON - People, including fictional
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [46]:
len(list(doc.sents))

60

**8. CHALLENGE: How many sentences contain named entities?**

In [92]:
# Teacher Solution
# Re-parse each sentence as its own Doc, then keep the Docs that contain at least one entity
list_of_sents = [nlp(sent.text) for sent in doc.sents]
list_of_nent = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]
len(list_of_nent)

39

In [51]:
# My Solution
def count_sent_named_entities(doc):
    """Print the number of sentences that contain at least one named entity."""
    counter = 0
    for sent in doc.sents:
        if sent.ents:
            counter += 1
    print(counter)

In [52]:
count_sent_named_entities(doc)

39
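The loop-and-counter pattern above can also be written as a single `sum` over a generator expression. The sketch below uses stand-in objects so it runs without a model. The `FakeSentence` class, the sample data, and the function name are hypothetical, and the empty entity lists are assumed for illustration rather than taken from a real parse.

```python
from dataclasses import dataclass, field

@dataclass
class FakeSentence:
    """Hypothetical stand-in for a spaCy sentence span; only `ents` is modelled."""
    text: str
    ents: list = field(default_factory=list)

# Sample data: the first entry uses the entities shown in question 6;
# the other two are assumed (not parsed) to have no entities.
sentences = [
    FakeSentence("The Tale of Peter Rabbit, by Beatrix Potter (1902).",
                 ["Peter Rabbit", "Beatrix Potter", "1902"]),
    FakeSentence("'Now run along, and don't get into mischief.'"),
    FakeSentence("I am going out."),
]

def count_sents_with_ents(sents):
    """Count sentences whose `ents` collection is non-empty."""
    return sum(1 for sent in sents if sent.ents)

print(count_sents_with_ents(sentences))
```

Returning the count instead of printing it makes the result reusable. With a real `Doc`, the same expression would iterate over `doc.sents`.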


**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [93]:
# Teacher Solution
displacy.render(list_of_sents[0], style='ent', jupyter=True)

In [94]:
# MY SOLUTION
displacy.render(nlp(list(doc.sents)[0].text), style='ent', jupyter=True)

### Great Job!