___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [2]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [3]:
with open('peterrabbit.txt') as f:
    lines = f.readlines()

In [4]:
# Join the list of lines back into a single string
whole_text = ''.join(lines)
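
A more compact alternative, offered as a sketch: reading the file in one call with `read()` should give the same string as stitching together the lines from `readlines()`. The temporary file and its two lines of text are stand-ins so the snippet runs anywhere, and the explicit `encoding='utf-8'` is an assumption about how the story file is stored.

```python
import os
import tempfile

# Stand-in file so the example is self-contained (not the full story text)
sample = "Once upon a time there were four little Rabbits,\nand their names were--\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.write(sample)
    path = tmp.name

# Approach used above: read a list of lines, then stitch them together
with open(path, encoding='utf-8') as f:
    joined = ''.join(f.readlines())

# One-call alternative: read() returns the full contents as a single string
with open(path, encoding='utf-8') as f:
    whole = f.read()

os.remove(path)
print(joined == whole)
```

Either string can be passed to `nlp(...)` unchanged; `read()` simply skips the intermediate list.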

In [5]:
print(whole_text)

The Tale of Peter Rabbit, by Beatrix Potter (1902).

Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.

'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
the fields or down the lane, but don't go into Mr. McGregor's garden:
your Father had an accident there; he was put in a pie by Mrs.
McGregor.'

'Now run along, and don't get into mischief. I am going out.'

Then old Mrs. Rabbit took a basket and her umbrella, and went through
the wood to the baker's. She bought a loaf of brown bread and five
currant buns.

Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
down the lane to gather blackberries:

But Peter, who was very naughty, ran straight away to Mr. McGregor's
garden, and squeezed under the gate!

First he ate some lettuces and some French beans; and then he ate
some radishes;

And

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [6]:
doc = nlp(whole_text)
sentences = [sent for sent in doc.sents]

In [7]:
# Enter your code here:
# token.tag_ is the fine-grained tag the question asks for (token.dep_ would be the dependency label)
for token in sentences[2]:
    print(f'{token.text:{15}} {token.pos_:{5}} {token.tag_:{12}} {spacy.explain(token.tag_)}')


They            PRON  PRP          pronoun, personal
lived           VERB  VBD          verb, past tense
with            ADP   IN           conjunction, subordinating or preposition
their           DET   PRP$         pronoun, possessive
Mother          PROPN NNP          noun, proper singular
in              ADP   IN           conjunction, subordinating or preposition
a               DET   DT           determiner
sand            NOUN  NN           noun, singular or mass
-               PUNCT HYPH         punctuation mark, hyphen
bank            NOUN  NN           noun, singular or mass
,               PUNCT ,            punctuation mark, comma
underneath      ADP   IN           conjunction, subordinating or preposition
the             DET   DT           determiner
root            NOUN  NN           noun, singular or mass
of              ADP   IN           conjunction, subordinating or preposition
a               DET   DT           determiner

               SPACE _SP          None
very
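
The `{token.text:{15}}` syntax in the cell above is a nested f-string format spec: the inner braces supply the minimum field width, so each column is padded to a fixed size. A small standalone illustration, using three rows copied from the output (no spaCy needed); the `width` variable is introduced here only for the example.

```python
# (text, coarse POS, description) rows copied from the output above
rows = [
    ('They', 'PRON', 'pronoun, personal'),
    ('lived', 'VERB', 'verb, past tense'),
    ('Mother', 'PROPN', 'noun, proper singular'),
]

width = 15  # the inner {...} of a format spec can hold any expression
lines = [f'{text:{width}} {pos:{5}} {desc}' for text, pos, desc in rows]
for line in lines:
    print(line)
```

Strings are left-aligned by default, which is why the padding appears to the right of each value.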

**3. Provide a frequency list of POS tags from the entire document**

In [14]:
POS_counts = doc.count_by(spacy.attrs.POS)
for k,v in sorted(POS_counts.items()):
    print(f'{k}, {doc.vocab[k].text:{5}}: {v}')


84, ADJ  : 49
85, ADP  : 122
86, ADV  : 67
87, AUX  : 48
89, CCONJ: 61
90, DET  : 117
92, NOUN : 169
93, NUM  : 8
94, PART : 28
95, PRON : 82
96, PROPN: 75
97, PUNCT: 174
98, SCONJ: 20
100, VERB : 139
103, SPACE: 99
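
The listing above is ordered by attribute ID. To rank the tags by frequency instead, one option is `collections.Counter.most_common`. The sketch below hard-codes the counts printed above, keyed by label rather than ID, so that it runs without spaCy; with a live `doc` the same idea would presumably apply to `POS_counts.items()`.

```python
from collections import Counter

# Counts copied from the output above, keyed by POS label instead of ID
pos_counts = Counter({
    'ADJ': 49, 'ADP': 122, 'ADV': 67, 'AUX': 48, 'CCONJ': 61,
    'DET': 117, 'NOUN': 169, 'NUM': 8, 'PART': 28, 'PRON': 82,
    'PROPN': 75, 'PUNCT': 174, 'SCONJ': 20, 'VERB': 139, 'SPACE': 99,
})

# most_common() returns (label, count) pairs, highest count first
ranked = pos_counts.most_common()
for label, count in ranked[:5]:
    print(f'{label:{5}}: {count}')
```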


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92

In [19]:
noun_count = POS_counts[92]        # tokens tagged NOUN
total = sum(POS_counts.values())   # all tokens: sum the counts, not the attribute IDs
print(f'{noun_count}/{total} = {100 * noun_count / total:.2f}%')

169/1258 = 13.43%

**5. Display the Dependency Parse for the third sentence**

In [21]:
displacy.render(list(doc.sents)[2], style='dep', jupyter=True, options={'distance': 50})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [27]:
for ent in doc.ents[:2]:
    print(f'{ent.text:{20}} {ent.label_:{30}} {spacy.explain(ent.label_)}')

Peter Rabbit         PERSON                         People, including fictional
Beatrix Potter       PERSON                         People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [28]:
len([sent for sent in doc.sents])

62

**8. CHALLENGE: How many sentences contain named entities?**

In [34]:
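# Keep the sentences that contain at least one named entity, then count them
list_of_sents = [sent for sent in doc.sents if sent.ents]
print(len(list_of_sents))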
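
`sent.ents` yields the entities that fall completely inside a sentence span, so the filter above keeps only sentences with at least one entity. The same containment logic can be sketched in plain Python over token offsets. The offsets and the `ents_in` helper below are invented for illustration and do not come from the story or from spaCy.

```python
# Hypothetical (start, end) token offsets, end exclusive; not taken from the real Doc
sentence_spans = [(0, 12), (12, 30), (30, 41)]
entity_spans = [(3, 5), (7, 8), (33, 34)]

def ents_in(sentence, entities):
    """Return the entities lying entirely inside the sentence span."""
    s_start, s_end = sentence
    return [(a, b) for a, b in entities if s_start <= a and b <= s_end]

# Keep only the sentences that contain at least one entity
with_ents = [s for s in sentence_spans if ents_in(s, entity_spans)]
print(len(with_ents))
```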

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [56]:
displacy.render(list_of_sents[0], style='ent', jupyter=True)

### Great Job!