
# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:
with open('../TextFiles/peterrabbit.txt') as f:
    doc = nlp(f.read())

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained tag, and the description of the fine-grained tag.**

In [3]:
# Materialize the sentence generator so that sentences can be indexed
doc_sents = list(doc.sents)
third_sents = doc_sents[2]  # index 2 is the third sentence

In [4]:
for token in third_sents:
    print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {str(spacy.explain(token.tag_)):{10}}")

They       PRON       PRP        pronoun, personal
lived      VERB       VBD        verb, past tense
with       ADP        IN         conjunction, subordinating or preposition
their      ADJ        PRP$       pronoun, possessive
Mother     PROPN      NNP        noun, proper singular
in         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
sand       NOUN       NN         noun, singular or mass
-          PUNCT      HYPH       punctuation mark, hyphen
bank       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
underneath ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
root       NOUN       NN         noun, singular or mass
of         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner

          SPACE                 None      
very       ADV        RB        
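
The newline token in the output above breaks the table because its text is printed literally. A minimal sketch of one way around this, assuming a hypothetical helper `format_row` (not part of spaCy) and a few rows copied from the output above: formatting the text with `!r` shows a whitespace token as `'\n'` on a single line.

```python
def format_row(text, pos, tag, description):
    """Return one aligned table row; repr() keeps whitespace tokens on one line."""
    # !r applies repr(), so a newline is shown as the escape sequence '\n'
    # instead of starting a new line in the middle of the table
    return f"{text!r:<14} {pos:<10} {tag:<10} {description}"

# Rows copied from the output above: (text, POS, fine-grained tag, description)
rows = [
    ("of", "ADP", "IN", "conjunction, subordinating or preposition"),
    ("a", "DET", "DT", "determiner"),
    ("\n", "SPACE", "", None),
]

for row in rows:
    print(format_row(*row))
```

In the notebook, the same helper could be called as `format_row(token.text, token.pos_, token.tag_, spacy.explain(token.tag_))` inside the loop over `third_sents`.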

**3. Provide a frequency list of POS tags from the entire document**

In [6]:
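# count_by returns a dict of {attribute ID: count}
POS_counts = doc.count_by(spacy.attrs.POS)
for k, v in sorted(POS_counts.items()):
    # doc.vocab[k].text maps the numeric ID back to its tag name
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')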

83. ADJ  : 83
84. ADP  : 127
85. ADV  : 75
88. CCONJ: 61
89. DET  : 90
91. NOUN : 176
92. NUM  : 8
93. PART : 36
94. PRON : 72
95. PROPN: 75
96. PUNCT: 174
99. VERB : 182
102. SPACE: 99
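
As an alternative to `count_by`, which returns numeric attribute IDs, the string labels can be tallied directly with `collections.Counter`. A minimal sketch, assuming the POS labels of the third sentence as printed elsewhere in this notebook (hard-coded here so that the block runs without a model); in the notebook the same idea would be `Counter(token.pos_ for token in doc)`.

```python
from collections import Counter

# POS labels of the third sentence, copied from the notebook's printed output
third_sentence_pos = [
    "PRON", "VERB", "ADP", "ADJ", "PROPN", "ADP", "DET", "NOUN", "PUNCT",
    "NOUN", "PUNCT", "ADP", "DET", "NOUN", "ADP", "DET", "SPACE", "ADV",
    "ADJ", "NOUN", "PUNCT", "NOUN", "PUNCT", "SPACE",
]

# Counter maps each label to its number of occurrences
pos_counts = Counter(third_sentence_pos)

# most_common() yields (label, count) pairs, highest count first
for label, count in pos_counts.most_common():
    print(f"{label:<6}: {count}")
```

Because the keys are already strings, no lookup through `doc.vocab` is needed to print readable labels.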


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
> HINT: In the output above, the attribute ID for 'NOUN' is 91 (IDs can differ between spaCy versions)

In [7]:
num_tokens = sum(POS_counts.values())  # total number of tokens across all POS tags
num_nouns = POS_counts[91]  # 91 is the NOUN attribute ID shown in the frequency list
print(f'{num_nouns}/{num_tokens} = {num_nouns / num_tokens:.2%}')

176/1258 = 13.99%
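
The same calculation can be wrapped in a small function so that any tag's share can be checked. A minimal sketch: `pos_share` is a hypothetical helper, and the dictionary is copied from the frequency list printed for question 3 rather than computed from a live `Doc`.

```python
def pos_share(counts, tag_id):
    """Return the fraction of all tokens whose POS attribute ID is tag_id."""
    total = sum(counts.values())
    # .get() returns 0 for a tag that never occurs instead of raising KeyError
    return counts.get(tag_id, 0) / total

# {attribute ID: count}, copied from the frequency list in question 3
pos_counts = {83: 83, 84: 127, 85: 75, 88: 61, 89: 90, 91: 176, 92: 8,
              93: 36, 94: 72, 95: 75, 96: 174, 99: 182, 102: 99}

print(f"NOUN: {pos_share(pos_counts, 91):.2%}")
print(f"VERB: {pos_share(pos_counts, 99):.2%}")
```

In the notebook, `POS_counts` could be passed in directly. Hard-coding 91 ties the code to the IDs shown in this output; looking the ID up by name, for example with `doc.vocab.strings['NOUN']`, is one way to avoid that.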


**5. Display the Dependency Parse for the third sentence**

In [8]:
for token in third_sents:
    print(token.text, token.pos_, end=' ')

They PRON lived VERB with ADP their ADJ Mother PROPN in ADP a DET sand NOUN - PUNCT bank NOUN , PUNCT underneath ADP the DET root NOUN of ADP a DET 
 SPACE very ADV big ADJ fir NOUN - PUNCT tree NOUN . PUNCT 

 SPACE 

In [9]:
displacy.render(doc_sents[2], style='dep', jupyter=True, options={'distance': 80})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [10]:
# Print text, label, and label description for the first two entities
for ent in doc.ents[:2]:
    print(ent.text + ' - ' + ent.label_ + ' - ' + str(spacy.explain(ent.label_)))

The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [11]:
len(doc_sents)

56

**8. CHALLENGE: How many sentences contain named entities?**

In [12]:
# Re-parse each sentence as its own Doc so that each one has its own .ents
list_of_sents = [nlp(sentence.text) for sentence in doc.sents]

In [13]:
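# Keep only the re-parsed sentences that contain at least one entity
# (the loop variable is named sent_doc to avoid shadowing the main `doc`)
ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]
len(ners)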

49

In [14]:
count = 0
for sentence in list_of_sents:
    if len(sentence.ents) > 0:
        count += 1
print(count)

49
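
Both cells above re-parse every sentence, which costs a second pass through the pipeline. A minimal sketch of an alternative, assuming a spaCy version in which sentence spans expose an `.ents` attribute (`Span.ents`): count directly over `doc.sents`. `count_with_ents` is a hypothetical helper, and the stand-in objects below only mimic that attribute so that the block runs without a model.

```python
from types import SimpleNamespace

def count_with_ents(sents):
    """Count the items in sents whose .ents collection is non-empty."""
    return sum(1 for sent in sents if sent.ents)

# Stand-in sentences (hypothetical data): each only needs an .ents attribute
fake_sents = [
    SimpleNamespace(ents=("The Tale of Peter Rabbit", "Beatrix Potter")),
    SimpleNamespace(ents=()),
    SimpleNamespace(ents=("Peter",)),
]

print(count_with_ents(fake_sents))  # two of the three stand-ins have entities
```

In the notebook this would be called as `count_with_ents(doc.sents)`. The result may differ from the count above, since entities predicted for a sentence in isolation need not match those predicted with the full document as context.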


In [15]:
list_of_sents[0]

The Tale of Peter Rabbit, by Beatrix Potter (1902).


**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [16]:
displacy.render(list_of_sents[0], style='ent', jupyter=True)

### Great Job!