___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

from spacy import displacy

In [2]:
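import os
# Absolute path of the current working directory, normalized to forward slashes with a trailing slash
data_folder = os.path.abspath(os.getcwd()).replace("\\", "/") + "/"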

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:` and adjust the path to wherever your copy of the file lives.

In [3]:
with open(data_folder + 'peterrabbit.txt') as f:
    doc = nlp(f.read())

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [4]:
# Enter your code here:
for token in list(doc.sents)[3]:  # index 3, i.e. the 4th sentence yielded by doc.sents
    print(f'{token.text:{12}} {token.pos_:{8}} {token.tag_:{8}} {spacy.explain(token.tag_)}')

They         PRON     PRP      pronoun, personal
lived        VERB     VBD      verb, past tense
with         ADP      IN       conjunction, subordinating or preposition
their        DET      PRP$     pronoun, possessive
Mother       PROPN    NNP      noun, proper singular
in           ADP      IN       conjunction, subordinating or preposition
a            DET      DT       determiner
sand         NOUN     NN       noun, singular or mass
-            PUNCT    HYPH     punctuation mark, hyphen
bank         NOUN     NN       noun, singular or mass
,            PUNCT    ,        punctuation mark, comma
underneath   ADP      IN       conjunction, subordinating or preposition
the          DET      DT       determiner
root         NOUN     NN       noun, singular or mass
of           ADP      IN       conjunction, subordinating or preposition
a            DET      DT       determiner

            SPACE    _SP      None
very         ADV      RB       adverb
big          ADJ      JJ       adj
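
The format specs in the loop above, such as `{token.text:{12}}`, use a nested replacement field for the column width. The inner `{12}` evaluates to the integer 12, so it behaves like the plain spec `:12`. A minimal sketch of that padding, using a hypothetical helper `format_row` (not part of spaCy) and one row copied from the output above:

```python
def format_row(text, pos, tag, description):
    """Pad the first three fields to fixed widths, as the loop above does."""
    return f'{text:{12}} {pos:{8}} {tag:{8}} {description}'

# Values copied from the first row of the output above.
row = format_row('They', 'PRON', 'PRP', 'pronoun, personal')
print(row)

# The nested field {12} is just the integer 12, so both spellings agree.
assert f"{'They':{12}}" == f"{'They':12}"
```

Strings are left-aligned by default, which is why the columns line up without an explicit `<` in the spec.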

**3. Provide a frequency list of POS tags from the entire document**

In [5]:
POS_counts = doc.count_by(spacy.attrs.POS)

sorted_POS_counts = sorted(POS_counts.items(), key=lambda x: x[1], reverse=True)
print(sorted_POS_counts, "\n")

for pos, count in sorted_POS_counts:
    print(f"{pos:{6}}. {doc.vocab[pos].text:{5}} : {count:{4}}")    

[(97, 174), (92, 171), (100, 136), (85, 123), (90, 118), (103, 99), (95, 81), (96, 73), (86, 67), (89, 61), (84, 50), (87, 48), (94, 29), (98, 20), (93, 8)] 

    97. PUNCT :  174
    92. NOUN  :  171
   100. VERB  :  136
    85. ADP   :  123
    90. DET   :  118
   103. SPACE :   99
    95. PRON  :   81
    96. PROPN :   73
    86. ADV   :   67
    89. CCONJ :   61
    84. ADJ   :   50
    87. AUX   :   48
    94. PART  :   29
    98. SCONJ :   20
    93. NUM   :    8
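
The same kind of frequency list can be built without attribute IDs by counting tag names directly. The sketch below is an illustration in plain Python rather than the spaCy call used above: it applies `collections.Counter` to the coarse POS tags copied from the sentence printed in the previous exercise, so the numbers describe that one sentence, not the whole document.

```python
from collections import Counter

# Coarse POS tags copied, in order, from the sentence output shown earlier.
sentence_pos = [
    'PRON', 'VERB', 'ADP', 'DET', 'PROPN', 'ADP', 'DET', 'NOUN', 'PUNCT',
    'NOUN', 'PUNCT', 'ADP', 'DET', 'NOUN', 'ADP', 'DET', 'SPACE', 'ADV', 'ADJ',
]

pos_counts = Counter(sentence_pos)

# most_common() sorts by descending count, like the sorted(...) call above.
for pos, count in pos_counts.most_common():
    print(f'{pos:5} : {count:4}')
```

With a real `Doc`, the same idea would be `Counter(token.pos_ for token in doc)`, which yields readable tag names as keys instead of integer IDs.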


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92 (see the frequency table above)

In [6]:
print(f'{POS_counts[92]/len(doc)*100: {0.3}}%')

 13.6%
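
As a cross-check, the percentage can be reproduced from the frequency table alone. The sketch below assumes the table lists every tag in the document, so that the counts sum to `len(doc)`. The values are copied from the output above.

```python
# Counts copied from the frequency table printed above (tag name -> count).
pos_counts = {
    'PUNCT': 174, 'NOUN': 171, 'VERB': 136, 'ADP': 123, 'DET': 118,
    'SPACE': 99, 'PRON': 81, 'PROPN': 73, 'ADV': 67, 'CCONJ': 61,
    'ADJ': 50, 'AUX': 48, 'PART': 29, 'SCONJ': 20, 'NUM': 8,
}

# Under the assumption above, this total plays the role of len(doc).
total_tokens = sum(pos_counts.values())
noun_share = pos_counts['NOUN'] / total_tokens * 100

print(total_tokens)
print(f'{noun_share:.1f}%')
```

The spec `.1f` is a more conventional way to get one decimal place than the nested `{0.3}` field used in the cell above, which in effect asks for a leading space and three significant digits.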


**5. Display the Dependency Parse for the third sentence**

In [7]:
from spacy import displacy

# Render the dependency parse immediately inside Jupyter:
displacy.render(list(doc.sents)[3], style='dep', jupyter=True, options={'distance': 110})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit*.**

In [8]:
for ent in doc.ents[:2]:
    print(ent.text + ' - ' + ent.label_ + ' - ' + str(spacy.explain(ent.label_)))

Peter Rabbit - PERSON - People, including fictional
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [9]:
len(list(doc.sents))

68

**8. CHALLENGE: How many sentences contain named entities?**

In [10]:
%%time
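# Re-parse each sentence as its own Doc, then keep the ones with at least one entity
sents_list = [nlp(sent.text) for sent in doc.sents]
ners_list = [sent_doc for sent_doc in sents_list if sent_doc.ents]
len(ners_list)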

Wall time: 976 ms


40

In [11]:
%%time
# Alternatively, count with an explicit loop:
counting_ners = 0
for sent in doc.sents:
    if nlp(sent.text).ents:
        counting_ners += 1

counting_ners

Wall time: 956 ms


40
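
Both cells above run `nlp` again on every sentence, which is likely where most of the wall time goes. The parsed `doc` already records where each sentence and each entity starts and ends, and spaCy exposes the entities inside a sentence span as `Span.ents`, so the count can also be framed as an interval check. The sketch below shows that logic in plain Python; the function name and the offsets are hypothetical and are not taken from the story. Entities found this way may differ from those found by re-parsing sentences in isolation.

```python
def count_sentences_with_entities(sentence_spans, entity_spans):
    """Count sentences whose token range fully contains at least one entity."""
    return sum(
        any(s_start <= e_start and e_end <= s_end for e_start, e_end in entity_spans)
        for s_start, s_end in sentence_spans
    )

# Hypothetical (start, end) token offsets, end-exclusive; toy data only.
toy_sentences = [(0, 5), (5, 12), (12, 20)]
toy_entities = [(1, 3), (14, 15), (16, 18)]

print(count_sentences_with_entities(toy_sentences, toy_entities))
```

A sentence holding several entities is still counted once, which matches the `if ... .ents` test in the cells above.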

**9. CHALLENGE: Display the named entity visualization for `sents_list[0]` from the previous problem**

In [12]:
displacy.render(sents_list[0], style='ent', jupyter=True)

### Great Job!