___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:
# Adjust the path if the file lives elsewhere (the hint above uses '../TextFiles/peterrabbit.txt')
with open('peterrabbit.txt') as f:
    my_val = f.read()
    
doc = nlp(my_val)

type(doc)

spacy.tokens.doc.Doc

In [33]:
for sent in doc.sents:
    print(sent)

The Tale of Peter Rabbit, by Beatrix Potter (1902).


Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.


They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.


'
Now my dears,' said old Mrs. Rabbit one morning, 'you may go into

the fields or down the lane, but don't go into Mr. McGregor's garden:
your Father had an accident there; he was put in a pie by Mrs.
McGregor.'


'Now run along, and don't get into mischief.
I am going out.'


Then old Mrs. Rabbit took a basket and her umbrella, and went through
the wood to the baker's.
She bought a loaf of brown bread and five
currant buns.


Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
down the lane to gather blackberries:


But Peter, who was very naughty, ran straight away to Mr. McGregor's
garden, and squeezed under the gate!


First he ate some lettuces and some French beans; and then he ate
some radi

**2. For every token in the third sentence, print the token text, the coarse POS tag, the fine-grained tag (TAG), and the description of the fine-grained tag.**

In [8]:
sents = [sent for sent in doc.sents]
len(sents)

62

In [9]:
print(sents[2].text)

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.




In [12]:
doc_sent_2 = sents[2]

In [13]:
# Enter your code here:

for token in doc_sent_2:
    print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}")


They       PRON       PRP        pronoun, personal
lived      VERB       VBD        verb, past tense
with       ADP        IN         conjunction, subordinating or preposition
their      DET        PRP$       pronoun, possessive
Mother     PROPN      NNP        noun, proper singular
in         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
sand       NOUN       NN         noun, singular or mass
-          PUNCT      HYPH       punctuation mark, hyphen
bank       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
underneath ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
root       NOUN       NN         noun, singular or mass
of         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner

          SPACE      _SP        None
very       ADV        RB         adver
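
The column alignment in the output above comes from the nested width in the f-string (`{token.text:{10}}`). As a minimal sketch of that padding behaviour, the block below formats a few rows copied from the output without needing spaCy; the helper name `format_row` and its `width` parameter are hypothetical, introduced only for illustration.

```python
# Rows copied from the output above: (text, coarse POS, fine-grained tag).
rows = [
    ("They", "PRON", "PRP"),
    ("lived", "VERB", "VBD"),
    ("underneath", "ADP", "IN"),
]


def format_row(text, pos, tag, width=10):
    """Left-align each field in a column `width` characters wide.

    Hypothetical helper: values longer than `width` are not truncated,
    they simply push the following columns to the right.
    """
    return f"{text:<{width}} {pos:<{width}} {tag:<{width}}".rstrip()


for row in rows:
    print(format_row(*row))
```

Strings are left-aligned by default, so `{text:{10}}` and `{text:<{10}}` pad the same way; the explicit `<` only makes the intent visible.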

**3. Provide a frequency list of POS tags from the entire document**

In [29]:
from collections import Counter
counts = Counter()
for token in doc:
    counts[token.pos, token.pos_] += 1  # key is (integer POS ID, POS name)
print(counts)

Counter({(97, 'PUNCT'): 174, (92, 'NOUN'): 169, (100, 'VERB'): 139, (85, 'ADP'): 122, (90, 'DET'): 117, (103, 'SPACE'): 99, (95, 'PRON'): 82, (96, 'PROPN'): 75, (86, 'ADV'): 67, (89, 'CCONJ'): 61, (84, 'ADJ'): 49, (87, 'AUX'): 48, (94, 'PART'): 28, (98, 'SCONJ'): 20, (93, 'NUM'): 8})


In [35]:
pos_counts = doc.count_by(spacy.attrs.POS)

for k,v in sorted(pos_counts.items()):
    print(f"{k} {doc.vocab[k].text:{5}}:{v}")

84 ADJ  :49
85 ADP  :122
86 ADV  :67
87 AUX  :48
89 CCONJ:61
90 DET  :117
92 NOUN :169
93 NUM  :8
94 PART :28
95 PRON :82
96 PROPN:75
97 PUNCT:174
98 SCONJ:20
100 VERB :139
103 SPACE:99
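
The list above is ordered by attribute ID. If a ranking by frequency is more useful, one possible approach is `Counter.most_common`. The sketch below hard-codes the counts printed above so that it runs without spaCy or the text file; in the notebook the same counter could be built from `token.pos_` instead.

```python
from collections import Counter

# POS counts copied from the output above (keyed by tag name).
pos_counts = Counter({
    "ADJ": 49, "ADP": 122, "ADV": 67, "AUX": 48, "CCONJ": 61,
    "DET": 117, "NOUN": 169, "NUM": 8, "PART": 28, "PRON": 82,
    "PROPN": 75, "PUNCT": 174, "SCONJ": 20, "VERB": 139, "SPACE": 99,
})

# most_common() returns (tag, count) pairs sorted from most to least frequent.
for tag, count in pos_counts.most_common():
    print(f"{tag:{5}}: {count}")
```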


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92 (see the frequency list above)

In [42]:
49+122+67+48+61+117+169+8+28+82+75+174+20+139+99

1258

In [43]:
1258-169

1089

In [64]:
(169/1258)*100

13.43402225755167

In [63]:
len(doc)

1258

In [67]:
pos_counts[92]

169

In [65]:
(pos_counts[92]/len(doc))*100

13.43402225755167
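
The same percentage can be derived without typing any number by hand, which avoids transcription slips in long sums. This is a minimal sketch that assumes the ID-keyed counts returned by `doc.count_by(spacy.attrs.POS)` above; they are copied here as a plain dict so the block runs on its own, and `NOUN_ID` is a name introduced only for illustration.

```python
# Counts copied from the doc.count_by(spacy.attrs.POS) output above.
pos_counts = {
    84: 49, 85: 122, 86: 67, 87: 48, 89: 61, 90: 117, 92: 169, 93: 8,
    94: 28, 95: 82, 96: 75, 97: 174, 98: 20, 100: 139, 103: 99,
}

NOUN_ID = 92  # attribute ID shown for NOUN in the output above

# Every token has exactly one POS tag, so the counts add up to len(doc).
total_tokens = sum(pos_counts.values())

noun_share = pos_counts[NOUN_ID] / total_tokens * 100
print(f"{pos_counts[NOUN_ID]}/{total_tokens} = {noun_share:.2f}%")
```

In the notebook itself, `len(doc)` gives the same total directly.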

**5. Display the Dependency Parse for the third sentence**

In [48]:
type(doc_sent_2)

spacy.tokens.span.Span

In [58]:
displacy.render(list(doc.sents)[2], style='dep', jupyter=True, options={'distance': 110})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [59]:
for ent in doc.ents[:2]:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))

Peter Rabbit - PERSON - People, including fictional
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [68]:
len(list(doc.sents))

62

In [60]:
sents = [sent for sent in doc.sents]
len(sents)

62

**8. CHALLENGE: How many sentences contain named entities?**

In [61]:
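# Re-parse each sentence as its own Doc so that .ents can be checked per sentence
list_of_sents = [nlp(sent.text) for sent in doc.sents]
# Use a distinct loop variable to avoid shadowing the full-document `doc`
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]
len(list_of_ners)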

36
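
An alternative that avoids running the pipeline a second time is to check the entities on each sentence span directly, since spaCy spans expose an `ents` attribute. The sketch below shows only the counting logic, using stand-in objects so it runs without a model; `count_sentences_with_entities` and `fake_doc` are hypothetical names. The count on the real `doc` may differ from the result above, because a sentence parsed in isolation can receive different entity predictions than the same sentence parsed in context.

```python
from types import SimpleNamespace


def count_sentences_with_entities(doc):
    """Count sentences that contain at least one named entity.

    Works on any object whose `.sents` yields items with an `.ents`
    sequence, which is the shape a spaCy Doc and its Span objects have.
    """
    return sum(1 for sent in doc.sents if sent.ents)


# Stand-in data for illustration only (not real spaCy output).
fake_doc = SimpleNamespace(sents=[
    SimpleNamespace(ents=("Peter",)),
    SimpleNamespace(ents=()),
    SimpleNamespace(ents=("Mr. McGregor",)),
])

print(count_sentences_with_entities(fake_doc))
```

With the notebook's objects, the call would be `count_sentences_with_entities(doc)`.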

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [62]:
displacy.render(list_of_sents[0], style='ent', jupyter=True)

### Great Job!