# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>


In [2]:
# Enter your code here:

with open("peterrabbit.txt", "r", encoding="utf-8") as txt:
    doc = nlp(txt.read())

**2. For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained tag (`tag_`), and the description of the fine-grained tag.**

In [3]:
# Enter your code here:

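# Materialize the sentence spans once so they can be indexed and reused in later cells
sentences = list(doc.sents)

# Third sentence: token text, coarse POS, fine-grained tag, and tag description
for token in sentences[2]:
    print(f'{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {(spacy.explain(token.tag_) or ""):{10}}')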

'          VERB       VBP        verb, non-3rd person singular present
Now        ADV        RB         adverb    
my         PRON       PRP$       pronoun, possessive
dears      NOUN       NNS        noun, plural
,          PUNCT      ,          punctuation mark, comma
'          PUNCT      ''         closing quotation mark
said       VERB       VBD        verb, past tense
old        ADJ        JJ         adjective (English), other noun-modifier (Chinese)
Mrs.       PROPN      NNP        noun, proper singular
Rabbit     PROPN      NNP        noun, proper singular
one        NUM        CD         cardinal number
morning    NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
'          PUNCT      ``         opening quotation mark
you        PRON       PRP        pronoun, personal
may        AUX        MD         verb, modal auxiliary
go         VERB       VB         verb, base form
into       ADP        IN         conjunction, subordinat
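
The f-string in the cell above pads each field to a fixed width. One caveat: `spacy.explain` can return `None` for a label that is missing from its glossary, and `None` does not accept a width specifier. Below is a minimal sketch of a None-safe row formatter. The helper name `format_row` is hypothetical, and plain strings stand in for spaCy token attributes.

```python
def format_row(text, pos, tag, description, width=10):
    """Return one fixed-width table row; a missing description becomes ''."""
    description = description or ""
    return f"{text:{width}} {pos:{width}} {tag:{width}} {description:{width}}"


# Values taken from the output above
print(format_row("Now", "ADV", "RB", "adverb"))

# A label without a description (None) no longer raises TypeError
print(format_row("foo", "X", "XX", None))
```

In the notebook itself, the same guard would be applied to the value returned by `spacy.explain(token.tag_)`.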

**3. Provide a frequency list of POS tags from the entire document**

In [4]:
# Enter your code here:

# count_by returns a dict mapping each POS attribute ID (an integer) to its frequency
counts_dict = doc.count_by(spacy.attrs.IDS['POS'])

# Look up each ID's string name in the vocab, then print ID, name, and count
for pos, count in counts_dict.items():
    tag_name = doc.vocab[pos].text
    print(f'{pos:{5}} {tag_name:{8}} {count}')

   86 ADV      111
   98 SCONJ    80
   90 DET      432
   92 NOUN     833
   95 PRON     259
  100 VERB     493
   93 NUM      45
   84 ADJ      240
   97 PUNCT    550
   89 CCONJ    205
  103 SPACE    420
   96 PROPN    485
   85 ADP      520
   87 AUX      177
   94 PART     100
  101 X        55
   99 SYM      6
   91 INTJ     2


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 91

In [5]:
# Note: the hint gives 91, but in the frequency list from the previous cell
# NOUN has ID 92 (91 is INTJ), so 92 is used here.

percent_of_noun = (counts_dict[92] * 100) / len(doc)
percent_of_noun

16.616796329543188
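
As a cross-check that does not depend on the numeric attribute ID, the percentage can be recomputed from the frequency list printed in question 3. This is only a sketch: the counts are copied by hand from that output, and it assumes every token carries exactly one POS value, so the counts sum to `len(doc)`.

```python
# POS counts copied from the frequency list printed in question 3
pos_counts = {
    "ADV": 111, "SCONJ": 80, "DET": 432, "NOUN": 833, "PRON": 259,
    "VERB": 493, "NUM": 45, "ADJ": 240, "PUNCT": 550, "CCONJ": 205,
    "SPACE": 420, "PROPN": 485, "ADP": 520, "AUX": 177, "PART": 100,
    "X": 55, "SYM": 6, "INTJ": 2,
}

total_tokens = sum(pos_counts.values())  # expected to equal len(doc)
noun_share = pos_counts["NOUN"] * 100 / total_tokens

# Frequency list sorted from most to least common, with each tag's share
for tag, count in sorted(pos_counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag:{8}} {count:{5}} {count * 100 / total_tokens:6.2f}%")

print(f"NOUN share: {noun_share:.2f}%")
```

The noun share agrees with the value computed from `counts_dict` above.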

**5. Display the Dependency Parse for the third sentence**

In [6]:
displacy.render(sentences[2], style='dep', jupyter=True, options={'distance': 120})

**6. Show the first two named entities from Beatrix Potter's The Tale of Peter Rabbit**

In [7]:
print(f'{doc.ents[0]}, {doc.ents[0].label_}, {spacy.explain(doc.ents[0].label_)}', "\n")
print(f'{doc.ents[1]}, {doc.ents[1].label_}, {spacy.explain(doc.ents[1].label_)}')

four, CARDINAL, Numerals that do not fall under another type 

were--

          Flopsy, ORG, Companies, agencies, institutions, etc.
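
The second entity's text spans a line break in the source file, which is why its output is spread over several lines. A small sketch of collapsing that whitespace before printing follows. The helper name `one_line` is hypothetical, and the sample string is reconstructed from the output above rather than read from the `Doc`.

```python
def one_line(text):
    """Collapse any run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


# Entity text as it appears in the output above, including the line breaks
raw_entity_text = "were--\n\n          Flopsy"

print(one_line(raw_entity_text))
```

In the notebook, the same helper could be applied to `doc.ents[1].text` before formatting it.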


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [8]:
len(sentences)

177

**8. CHALLENGE: How many sentences contain named entities?**

In [9]:
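# Count the sentences that contain at least one named entity
num_of_ner_sents = 0

for sent in sentences:
    if sent.ents:  # truthiness works whether ents is a list or a tuple
        num_of_ner_sents += 1

num_of_ner_sents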

123
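
One detail matters when counting non-empty entity collections: comparing against `[]` gives the right answer only if the collection is a list, because an empty tuple is not equal to an empty list. A truthiness check avoids that dependence. The sketch below uses stand-in data, not real spaCy output, and the helper name `count_non_empty` is hypothetical.

```python
def count_non_empty(collections):
    """Count how many collections contain at least one item."""
    return sum(1 for items in collections if items)


# Stand-in data: each inner sequence plays the role of one sentence's entities
fake_sentence_ents = [[], ("four",), (), ["Flopsy"]]

truthy_count = count_non_empty(fake_sentence_ents)
print(truthy_count)  # → 2

# Comparing against [] also counts the empty tuple as non-empty
naive_count = sum(1 for items in fake_sentence_ents if items != [])
print(naive_count)  # → 3
```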

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [10]:
list_of_sents = [sent for sent in sentences if sent.ents]  # sentences with at least one entity
displacy.render(list_of_sents[0], style="ent", jupyter=True)