
# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:
with open('./peterrabbit.txt') as f:
    doc = nlp(f.read())

print(doc)

The Tale of Peter Rabbit, by Beatrix Potter (1902).

Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.

'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
the fields or down the lane, but don't go into Mr. McGregor's garden:
your Father had an accident there; he was put in a pie by Mrs.
McGregor.'

'Now run along, and don't get into mischief. I am going out.'

Then old Mrs. Rabbit took a basket and her umbrella, and went through
the wood to the baker's. She bought a loaf of brown bread and five
currant buns.

Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
down the lane to gather blackberries:

But Peter, who was very naughty, ran straight away to Mr. McGregor's
garden, and squeezed under the gate!

First he ate some lettuces and some French beans; and then he ate
some radishes;

And

**2. For every token in the third sentence, print the token text, the coarse POS tag, the fine-grained tag, and the description of the fine-grained tag.**

In [3]:
doc_sents = list(doc.sents)

In [4]:
doc_sents[2]

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.


In [5]:
for token in doc_sents[2]:
    print(f"{token.text:{13}}{token.pos_:{7}}{token.tag_:{7}} {spacy.explain(token.tag_)}")

They         PRON   PRP     pronoun, personal
lived        VERB   VBD     verb, past tense
with         ADP    IN      conjunction, subordinating or preposition
their        PRON   PRP$    pronoun, possessive
Mother       NOUN   NN      noun, singular or mass
in           ADP    IN      conjunction, subordinating or preposition
a            DET    DT      determiner
sand         NOUN   NN      noun, singular or mass
-            PUNCT  HYPH    punctuation mark, hyphen
bank         NOUN   NN      noun, singular or mass
,            PUNCT  ,       punctuation mark, comma
underneath   ADP    IN      conjunction, subordinating or preposition
the          DET    DT      determiner
root         NOUN   NN      noun, singular or mass
of           ADP    IN      conjunction, subordinating or preposition
a            DET    DT      determiner

            SPACE  _SP     whitespace
very         ADV    RB      adverb
big          ADJ    JJ      adjective (English), other noun-modifier (Chinese)
fi
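
The newline token in the output above breaks the table across two lines. One possible workaround, sketched here on a few rows copied from that output rather than on live spaCy tokens, is to print whitespace with `repr()`. The helper name `format_row` is an assumption made for illustration; in the real loop the same test could be written with `token.is_space`.

```python
# Rows copied from the output above: (token text, coarse POS, fine-grained tag).
rows = [
    ("of", "ADP", "IN"),
    ("a", "DET", "DT"),
    ("\n", "SPACE", "_SP"),
    ("very", "ADV", "RB"),
]


def format_row(text, pos, tag):
    """Pad each field to a fixed width; repr() keeps whitespace on one line."""
    shown = repr(text) if text.isspace() else text
    return f"{shown:{13}}{pos:{7}}{tag:{7}}"


lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

With this change the whitespace token is shown as an escaped string in the first column, so each token stays on a single row.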

**3. Provide a frequency list of POS tags from the entire document**

In [6]:
POS_count = doc.count_by(spacy.attrs.POS)

In [7]:
for k, v in sorted(POS_count.items()):
    print(f"{k}. {doc.vocab[k].text}: {v}")

84. ADJ: 53
85. ADP: 125
86. ADV: 63
87. AUX: 49
89. CCONJ: 61
90. DET: 90
92. NOUN: 172
93. NUM: 9
94. PART: 28
95. PRON: 110
96. PROPN: 74
97. PUNCT: 171
98. SCONJ: 19
100. VERB: 135
103. SPACE: 99
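
The list above is ordered by attribute ID. If a ranking by frequency is more useful, a minimal sketch follows. It assumes the counts are copied by hand from the output above into a plain dictionary (`pos_counts` is a name chosen for illustration), so it runs without spaCy.

```python
# Counts copied from the frequency list above (label -> number of tokens).
pos_counts = {
    "ADJ": 53, "ADP": 125, "ADV": 63, "AUX": 49, "CCONJ": 61,
    "DET": 90, "NOUN": 172, "NUM": 9, "PART": 28, "PRON": 110,
    "PROPN": 74, "PUNCT": 171, "SCONJ": 19, "VERB": 135, "SPACE": 99,
}

# Sort by count, largest first; ties fall back to alphabetical order.
ranked = sorted(pos_counts.items(), key=lambda item: (-item[1], item[0]))

for label, count in ranked:
    print(f"{label:{6}} {count:>4}")

# Summing the counts gives the total number of tokens in the Doc.
total_tokens = sum(pos_counts.values())
print(f"total  {total_tokens:>4}")
```

The same `sorted(...)` call should work on `POS_count.items()` directly, with `doc.vocab[k].text` supplying the label for each integer key.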


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92

In [8]:
noun_count = POS_count[92]  # 92 is the attribute ID for NOUN
print(f"{noun_count}/{len(doc)} = {noun_count / len(doc) * 100:.2f}%")

172/1258 = 13.67%
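
The denominator here includes punctuation and whitespace tokens. As a hedged variation, the sketch below recomputes the share using only the figures printed above and compares it with the share among tokens that are neither `PUNCT` nor `SPACE`. The variable names and the `percentage` helper are assumptions made for illustration.

```python
# Figures copied from the outputs above.
total_tokens = 1258
noun_count = 172
punct_count = 171
space_count = 99


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Share of nouns among all tokens, as in the cell above.
share_of_all = percentage(noun_count, total_tokens)

# Share of nouns once punctuation and whitespace tokens are excluded.
word_like_tokens = total_tokens - punct_count - space_count
share_of_words = percentage(noun_count, word_like_tokens)

print(f"{noun_count}/{total_tokens} = {share_of_all:.2f}%")
print(f"{noun_count}/{word_like_tokens} = {share_of_words:.2f}%")
```

Which denominator is appropriate depends on what "tokens" should mean for the question being asked; the assessment itself uses all tokens.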


**5. Display the Dependency Parse for the third sentence**

In [9]:
from spacy import displacy

In [10]:
displacy.render(doc_sents[2], style='dep')

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit*.**

In [11]:
doc_ent = [ent for ent in doc.ents]
for ent in doc_ent[:2]:
    print(f"{ent.text} - {ent.label_} - {spacy.explain(ent.label_)}")

The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [12]:
len(doc_sents)

**8. CHALLENGE: How many sentences contain named entities?**

In [14]:
len([sent for sent in doc_sents if sent.ents])

35
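
The filter above relies on `sent.ents`, which is empty when no entity falls inside the sentence. The idea can be sketched as a containment check on token-index ranges. The ranges below are hypothetical values invented for illustration, not taken from the story, and the helper `contains` is likewise an assumption rather than part of spaCy.

```python
# Hypothetical token-index ranges (start inclusive, end exclusive);
# these are NOT taken from the story.
sentences = [(0, 12), (12, 30), (30, 41)]
entities = [(0, 5), (7, 9), (33, 35)]


def contains(sentence, entity):
    """True if the entity range lies entirely inside the sentence range."""
    sent_start, sent_end = sentence
    ent_start, ent_end = entity
    return sent_start <= ent_start and ent_end <= sent_end


# Keep each sentence that holds at least one entity, counting it only once.
sentences_with_entities = [
    sent for sent in sentences
    if any(contains(sent, ent) for ent in entities)
]

print(len(sentences_with_entities))
```

A sentence holding several entities is still counted once, which is why the answer to this question can be smaller than the total number of entities.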

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [21]:
list_of_sents = [sent for sent in doc_sents if sent.ents]
displacy.render(list_of_sents[0], style='ent')

### Great Job!