___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:
with open('../TextFiles/peterrabbit.txt') as f:
    doc = nlp(f.read())


**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [19]:
[token.text for token in doc]

['The',
 'Tale',
 'of',
 'Peter',
 'Rabbit',
 ',',
 'by',
 'Beatrix',
 'Potter',
 '(',
 '1902',
 ')',
 '.',
 '\n\n',
 'Once',
 'upon',
 'a',
 'time',
 'there',
 'were',
 'four',
 'little',
 'Rabbits',
 ',',
 'and',
 'their',
 'names',
 '\n',
 'were--',
 '\n\n          ',
 'Flopsy',
 ',',
 '\n       ',
 'Mopsy',
 ',',
 '\n   ',
 'Cotton',
 '-',
 'tail',
 ',',
 '\n',
 'and',
 'Peter',
 '.',
 '\n\n',
 'They',
 'lived',
 'with',
 'their',
 'Mother',
 'in',
 'a',
 'sand',
 '-',
 'bank',
 ',',
 'underneath',
 'the',
 'root',
 'of',
 'a',
 '\n',
 'very',
 'big',
 'fir',
 '-',
 'tree',
 '.',
 '\n\n',
 "'",
 'Now',
 'my',
 'dears',
 ',',
 "'",
 'said',
 'old',
 'Mrs.',
 'Rabbit',
 'one',
 'morning',
 ',',
 "'",
 'you',
 'may',
 'go',
 'into',
 '\n',
 'the',
 'fields',
 'or',
 'down',
 'the',
 'lane',
 ',',
 'but',
 'do',
 "n't",
 'go',
 'into',
 'Mr.',
 'McGregor',
 "'s",
 'garden',
 ':',
 '\n',
 'your',
 'Father',
 'had',
 'an',
 'accident',
 'there',
 ';',
 'he',
 'was',
 'put',
 'in',
 'a',


In [27]:
span = doc[2:4]
span

of Peter

In [25]:
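# The model splits the opening lines into more pieces than a reader would,
# so the story's third sentence ("They lived ...") sits at index 4 here.
third_sentence = [sent.text for sent in doc.sents][4]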

In [26]:
third_sentence

'They lived with their Mother in a sand-bank, underneath the root of a\nvery big fir-tree.\n\n'
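
The hard-coded index depends on how the loaded model happens to segment the opening lines. Below is a minimal sketch of a less brittle lookup that searches by the sentence's opening words. The `sentences` list holds shortened stand-ins for `[sent.text for sent in doc.sents]` (an assumed segmentation), and `find_sentence` is a hypothetical helper, not part of spaCy.

```python
# Shortened stand-ins for the first five sentence texts (assumed segmentation).
sentences = [
    "The Tale of Peter Rabbit, by Beatrix Potter (1902).",
    "Once upon a time there were four little Rabbits, and their names",
    "were--",
    "Flopsy, Mopsy, Cotton-tail, and Peter.",
    "They lived with their Mother in a sand-bank, ...",
]


def find_sentence(texts, prefix):
    """Return the index of the first text that starts with prefix, or -1."""
    for index, text in enumerate(texts):
        if text.lstrip().startswith(prefix):
            return index
    return -1


position = find_sentence(sentences, "They lived")
print(position, sentences[position])
```

With a real `Doc`, the same helper could be fed the list of `sent.text` values, and the returned index used with `list(doc.sents)`.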

In [31]:
third_sentence.split()[0]

'They'

In [35]:
for token in list(doc.sents)[4]:
    print(f'{token.text:{12}} {token.pos_:{6}} {token.tag_:{6}} {spacy.explain(token.tag_)}')

They         PRON   PRP    pronoun, personal
lived        VERB   VBD    verb, past tense
with         ADP    IN     conjunction, subordinating or preposition
their        PRON   PRP$   pronoun, possessive
Mother       PROPN  NNP    noun, proper singular
in           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner
sand         NOUN   NN     noun, singular or mass
-            PUNCT  HYPH   punctuation mark, hyphen
bank         NOUN   NN     noun, singular or mass
,            PUNCT  ,      punctuation mark, comma
underneath   ADP    IN     conjunction, subordinating or preposition
the          DET    DT     determiner
root         NOUN   NN     noun, singular or mass
of           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner

            SPACE  _SP    None
very         ADV    RB     adverb
big          ADJ    JJ     adjective
fir          NOUN   NN     noun, singular or mass
-            PUNCT 
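
The newline token in the middle of the sentence prints as a real line break, which splits its table row in two. One possible fix, sketched here on a few rows copied from the output above, is to format the token text with `!r` so that whitespace appears escaped. `format_row` is a hypothetical helper name.

```python
# Rows copied from the output above: (text, coarse POS, fine-grained tag).
rows = [
    ("of", "ADP", "IN"),
    ("a", "DET", "DT"),
    ("\n", "SPACE", "_SP"),
    ("very", "ADV", "RB"),
]


def format_row(text, pos, tag):
    """Return one aligned table row, escaping whitespace in the token text."""
    # !r applies repr(), so a newline is shown as '\n' instead of breaking the row.
    return f"{text!r:{12}} {pos:{6}} {tag:{6}}"


lines = [format_row(*row) for row in rows]
print("\n".join(lines))
```

In the notebook loop, the same idea would mean writing `{token.text!r:{12}}` in place of `{token.text:{12}}`.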

**3. Provide a frequency list of POS tags from the entire document**

In [36]:
POS_counts = doc.count_by(spacy.attrs.POS)

In [37]:
POS_counts

{90: 91,
 96: 75,
 85: 123,
 97: 173,
 93: 8,
 103: 99,
 86: 66,
 98: 20,
 92: 172,
 95: 108,
 87: 55,
 84: 49,
 89: 61,
 100: 129,
 94: 29}

In [38]:
POS_counts.items()

dict_items([(90, 91), (96, 75), (85, 123), (97, 173), (93, 8), (103, 99), (86, 66), (98, 20), (92, 172), (95, 108), (87, 55), (84, 49), (89, 61), (100, 129), (94, 29)])

In [39]:
# doc.vocab[k].text maps each integer attribute ID back to its POS name
for k, v in POS_counts.items():
    print(f"{k}. {doc.vocab[k].text:{5}} : {v}")

90. DET   : 91
96. PROPN : 75
85. ADP   : 123
97. PUNCT : 173
93. NUM   : 8
103. SPACE : 99
86. ADV   : 66
98. SCONJ : 20
92. NOUN  : 172
95. PRON  : 108
87. AUX   : 55
84. ADJ   : 49
89. CCONJ : 61
100. VERB  : 129
94. PART  : 29


In [40]:
for k, v in sorted(POS_counts.items()):
    print(f"{k}. {doc.vocab[k].text:{5}} : {v}")

84. ADJ   : 49
85. ADP   : 123
86. ADV   : 66
87. AUX   : 55
89. CCONJ : 61
90. DET   : 91
92. NOUN  : 172
93. NUM   : 8
94. PART  : 29
95. PRON  : 108
96. PROPN : 75
97. PUNCT : 173
98. SCONJ : 20
100. VERB  : 129
103. SPACE : 99
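
Sorting the items directly orders them by attribute ID, which says little about frequency. Here is a small sketch of ranking the same counts from most to least common, using only the numbers printed above. The dictionary is keyed by POS name for readability, and `ranked` is an illustrative name.

```python
# Counts copied from the output above, keyed by POS name rather than ID.
pos_counts_by_name = {
    "ADJ": 49, "ADP": 123, "ADV": 66, "AUX": 55, "CCONJ": 61,
    "DET": 91, "NOUN": 172, "NUM": 8, "PART": 29, "PRON": 108,
    "PROPN": 75, "PUNCT": 173, "SCONJ": 20, "VERB": 129, "SPACE": 99,
}

# Sort by count, largest first; ties fall back to alphabetical order.
ranked = sorted(pos_counts_by_name.items(), key=lambda kv: (-kv[1], kv[0]))

for name, count in ranked:
    print(f"{name:{5}} : {count}")
```

In the notebook, the equivalent would be `sorted(POS_counts.items(), key=lambda kv: -kv[1])` on the ID-keyed dictionary.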


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92

In [50]:
# Share of all tokens (whitespace and punctuation included) tagged NOUN
percent = 100 * POS_counts[92] / len(doc)

In [46]:
percent

13.672496025437201

In [48]:
print(f'{POS_counts[92]}/{len(doc)} = {percent:.4}%')

172/1258 = 13.67%
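
The denominator here is every token, including the 99 whitespace tokens and 173 punctuation marks. As a rough sketch, the share can be recomputed with those categories left out. The counts are copied from the frequency list above, and `noun_share` is a hypothetical helper.

```python
# Counts copied from the frequency list above (POS name -> token count).
pos_counts = {
    "ADJ": 49, "ADP": 123, "ADV": 66, "AUX": 55, "CCONJ": 61,
    "DET": 91, "NOUN": 172, "NUM": 8, "PART": 29, "PRON": 108,
    "PROPN": 75, "PUNCT": 173, "SCONJ": 20, "VERB": 129, "SPACE": 99,
}

total = sum(pos_counts.values())  # 1258, the same as len(doc)


def noun_share(counts, excluded=()):
    """Return the percentage of NOUN tokens among the non-excluded tokens."""
    kept = sum(n for name, n in counts.items() if name not in excluded)
    return 100 * counts["NOUN"] / kept


share_all = noun_share(pos_counts)
share_words = noun_share(pos_counts, excluded={"SPACE", "PUNCT"})

print(f"{share_all:.2f}% of all tokens")
print(f"{share_words:.2f}% excluding whitespace and punctuation")
```

Which denominator is appropriate depends on the question being asked; the assessment itself counts every token.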


**5. Display the Dependency Parse for the third sentence**

In [54]:
# Use the same sentence as in question 2 (index 4 under this model's segmentation)
displacy.render(list(doc.sents)[4], style='dep', jupyter=True, options={'distance': 110})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [55]:
for ent in doc.ents[:2]:
    print(f'{ent.text} - {ent.label_} - {spacy.explain(ent.label_)}')

Peter Rabbit - PERSON - People, including fictional
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [57]:
len(list(doc.sents))

62

**8. CHALLENGE: How many sentences contain named entities?**

In [60]:
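# Re-parse each sentence as its own Doc; entities found this way can differ
# from those found when the whole story is parsed at once.
list_of_sents = [nlp(sent.text) for sent in doc.sents]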

In [61]:
list_of_sents

[The Tale of Peter Rabbit, by Beatrix Potter (1902).
 , Once upon a time there were four little Rabbits, and their names, were--
 
           , Flopsy,
        Mopsy,
    Cotton-tail,
 and Peter.
 , They lived with their Mother in a sand-bank, underneath the root of a
 very big fir-tree.
 , ', Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
 the fields or down the lane, but don't go into Mr. McGregor's garden:, your Father had an accident there; he was put in a pie by Mrs.
 McGregor.'
 , 'Now run along, and don't get into mischief., I am going out.'
 , Then old Mrs. Rabbit took a basket and her umbrella, and went through
 the wood to the baker's., She bought a loaf of brown bread and five
 currant buns.
 , Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
 down the lane to gather blackberries:
 
 But Peter, who was very naughty, ran straight away to Mr. McGregor's
 garden, and squeezed under the gate!
 , First he ate some lettuces and some French beans

In [62]:
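# Keep only the sentence Docs with at least one entity (loop variable renamed to avoid shadowing `doc`)
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]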

In [63]:
list_of_ners

[The Tale of Peter Rabbit, by Beatrix Potter (1902).
 , Once upon a time there were four little Rabbits, and their names, Flopsy,
        Mopsy,
    Cotton-tail,
 and Peter.
 , Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
 the fields or down the lane, but don't go into Mr. McGregor's garden:, Then old Mrs. Rabbit took a basket and her umbrella, and went through
 the wood to the baker's., She bought a loaf of brown bread and five
 currant buns.
 , Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
 down the lane to gather blackberries:
 
 But Peter, who was very naughty, ran straight away to Mr. McGregor's
 garden, and squeezed under the gate!
 , First he ate some lettuces and some French beans; and then he ate
 some radishes;
 , Mr. McGregor was on his hands and knees planting out young cabbages,
 but he jumped up and ran after Peter, waving a rake and calling out,
 'Stop thief!'
 , Peter was most dreadfully frightened; he rushed all over the garden,

In [64]:
len(list_of_ners)

35

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [65]:
displacy.render(list_of_sents[0], style='ent', jupyter=True)

### Great Job!