# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [3]:
# Read the raw text; nlp() turns it into a Doc object in the next task
with open('peterrabbit.txt', encoding='utf-8') as txt:
    doc = txt.read()


In [None]:
print(doc)

**2. For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained TAG, and the description of the fine-grained tag.**

In [None]:
Doc = nlp(doc)          # run the pipeline on the raw text
sen = list(Doc.sents)   # sentence Spans in document order
sen

In [8]:
sen3 = sen[2]
sen3

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.


In [12]:
for token in sen3:
    print(f'{token.text:{10}} {token.pos_:{5}} {token.tag_:{8}} {spacy.explain(token.tag_)}')

They       PRON  PRP      pronoun, personal
lived      VERB  VBD      verb, past tense
with       ADP   IN       conjunction, subordinating or preposition
their      PRON  PRP$     pronoun, possessive
Mother     PROPN NNP      noun, proper singular
in         ADP   IN       conjunction, subordinating or preposition
a          DET   DT       determiner
sand       NOUN  NN       noun, singular or mass
-          PUNCT HYPH     punctuation mark, hyphen
bank       NOUN  NN       noun, singular or mass
,          PUNCT ,        punctuation mark, comma
underneath ADP   IN       conjunction, subordinating or preposition
the        DET   DT       determiner
root       NOUN  NN       noun, singular or mass
of         ADP   IN       conjunction, subordinating or preposition
a          DET   DT       determiner

          SPACE _SP      whitespace
very       ADV   RB       adverb
big        ADJ   JJ       adjective (English), other noun-modifier (Chinese)
fir        NOUN  NN       noun, singular or mass
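The loop above uses nested replacement fields such as `{token.text:{10}}`: the inner `{10}` is evaluated first and used as the field width, so every column is padded to a fixed size. A minimal stdlib sketch of the same formatting, using two rows copied from the output above (`format_row` is a hypothetical helper, not part of spaCy):

```python
# Column widths taken from the loop above: text=10, POS=5, TAG=8
WIDTHS = (10, 5, 8)

def format_row(text, pos, tag, widths=WIDTHS):
    """Pad each field to its column width; strings are left-aligned by default."""
    w_text, w_pos, w_tag = widths
    # The inner {w_text}, {w_pos}, {w_tag} are evaluated and used as field widths
    return f"{text:{w_text}} {pos:{w_pos}} {tag:{w_tag}}"

# Two rows copied from the printed output above
rows = [("They", "PRON", "PRP"), ("lived", "VERB", "VBD")]
for row in rows:
    print(format_row(*row))
```

Strings default to left alignment, which is why the labels line up in columns without an explicit `<`.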

**3. Provide a frequency list of POS tags from the entire document**

In [18]:
tt = Doc.count_by(spacy.attrs.POS)
tt

{90: 90,
 96: 76,
 85: 122,
 97: 173,
 93: 8,
 103: 99,
 86: 67,
 98: 20,
 92: 166,
 95: 109,
 100: 135,
 84: 54,
 89: 61,
 87: 49,
 94: 29}

In [26]:
keys=sorted(list(tt.keys()))
keys

[84, 85, 86, 87, 89, 90, 92, 93, 94, 95, 96, 97, 98, 100, 103]

In [27]:

# Print each POS ID, its label, and its count, in ID order
for key in keys:
    print(f'{key:{5}} {Doc.vocab[key].text:{5}} : {tt[key]}')


   84 ADJ   : 54
   85 ADP   : 122
   86 ADV   : 67
   87 AUX   : 49
   89 CCONJ : 61
   90 DET   : 90
   92 NOUN  : 166
   93 NUM   : 8
   94 PART  : 29
   95 PRON  : 109
   96 PROPN : 76
   97 PUNCT : 173
   98 SCONJ : 20
  100 VERB  : 135
  103 SPACE : 99
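The list above is ordered by POS ID. To read it as a frequency list, sort by count instead. A minimal sketch in plain Python, using the label/count pairs copied from the output above (`by_frequency` is a hypothetical helper); in the live notebook the same dict could be built with `{Doc.vocab[k].text: v for k, v in tt.items()}`.

```python
# POS counts copied from the output above (label -> count)
pos_counts = {
    "ADJ": 54, "ADP": 122, "ADV": 67, "AUX": 49, "CCONJ": 61,
    "DET": 90, "NOUN": 166, "NUM": 8, "PART": 29, "PRON": 109,
    "PROPN": 76, "PUNCT": 173, "SCONJ": 20, "VERB": 135, "SPACE": 99,
}

def by_frequency(counts):
    """Return (label, count) pairs, most frequent first; ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for label, count in by_frequency(pos_counts):
    print(f"{label:6} {count:4}")
```

Sorting on `(-count, label)` keeps the order deterministic if two tags ever share a count.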


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
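HINT: the attribute ID for 'NOUN' depends on the spaCy version: the hint's value of 91 comes from an older release, while the frequency list above shows 92. `spacy.symbols.NOUN` gives the ID for the installed version.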

In [30]:
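from spacy.symbols import NOUN  # POS ID for NOUN in the installed spaCy (92 here)

total = sum(tt.values())  # all tokens, including punctuation and whitespace
nouns = tt.get(NOUN, 0)   # 166 in the frequency list above
print(round(nouns / total * 100, 2), '%')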

13.2 %
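As a worked check of the arithmetic: the fifteen counts in the frequency list sum to 1258 tokens, and 166 / 1258 * 100 rounds to 13.2. That total includes PUNCT and SPACE tokens; whether they belong in the denominator is a choice the task leaves open, so this plain-Python sketch (counts copied from the output above, `noun_share` a hypothetical helper) takes the labels to exclude as a parameter.

```python
# POS counts copied from the frequency list above (label -> count)
pos_counts = {
    "ADJ": 54, "ADP": 122, "ADV": 67, "AUX": 49, "CCONJ": 61,
    "DET": 90, "NOUN": 166, "NUM": 8, "PART": 29, "PRON": 109,
    "PROPN": 76, "PUNCT": 173, "SCONJ": 20, "VERB": 135, "SPACE": 99,
}

def noun_share(counts, exclude=()):
    """Percentage of NOUN tokens among all tokens whose label is not in `exclude`."""
    total = sum(n for label, n in counts.items() if label not in exclude)
    return round(counts["NOUN"] / total * 100, 2)

print(sum(pos_counts.values()))                         # total tokens
print(noun_share(pos_counts), '%')                      # all tokens, as in the cell above
print(noun_share(pos_counts, exclude={"SPACE"}), '%')   # ignoring whitespace tokens
```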


**5. Display the Dependency Parse for the third sentence**

In [31]:
# Import the displaCy library
from spacy import displacy

displacy.render(sen3, style='dep', jupyter=True)

**6. Show the first two named entities from Beatrix Potter's _The Tale of Peter Rabbit_**

In [51]:
d = sen[0]

In [36]:
def show_ents(doc):
    if doc.ents:
        for ent in doc.ents:
            print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))
    else:
        print('No named entities found.')

In [52]:
show_ents(d)

The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional
1902 - DATE - Absolute or relative dates or periods
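The cell above prints every entity in the first sentence, which here is three. The task asks for the first two in the document. Since a spaCy `Doc.ents` is a tuple of entity spans in document order, `Doc.ents[:2]` gets them directly. Below is a stdlib sketch of the same slicing. `Ent` is a hypothetical stand-in for spaCy's `Span` that holds only `text` and `label_`, filled with the entities printed above.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for a spaCy entity Span: only .text and .label_ are modeled
Ent = namedtuple("Ent", ["text", "label_"])

# Entities copied from the output above, in document order
ents = (
    Ent("The Tale of Peter Rabbit", "WORK_OF_ART"),
    Ent("Beatrix Potter", "PERSON"),
    Ent("1902", "DATE"),
)

def first_n_entities(entities, n=2):
    """Return the first n entities from any iterable (tuple, list, or generator)."""
    return list(islice(entities, n))

for ent in first_n_entities(ents):
    print(f"{ent.text} - {ent.label_}")
```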


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [38]:
sen=[sent for sent in Doc.sents] 
print(len(sen))

55


**8. CHALLENGE: How many sentences contain named entities?**

In [46]:
# A sentence counts if its own .ents is non-empty
len([sent for sent in Doc.sents if sent.ents])
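The question counts sentences, not entities, so the test belongs on each sentence. In spaCy a sentence `Span` has its own `.ents`, and a sentence qualifies if that is non-empty. If numeric types such as DATE or CARDINAL should not count, use a label membership check with `not in`. Comparing a label string to a list with `!=` is always `True`, so it filters nothing. Here is a plain-Python sketch of that logic. Each inner list stands in for one sentence's entity labels: the first list comes from the output above, and the other two are hypothetical. `sentences_with_entities` is likewise a hypothetical helper.

```python
NUMERIC_LABELS = {"DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL"}

def sentences_with_entities(sent_labels, exclude=frozenset()):
    """Count sentences with at least one entity whose label is not in `exclude`.

    sent_labels: element i is the list of entity labels found in sentence i.
    """
    return sum(1 for labels in sent_labels
               if any(label not in exclude for label in labels))

sent_labels = [
    ["WORK_OF_ART", "PERSON", "DATE"],  # first sentence, from the output above
    [],                                 # hypothetical sentence with no entities
    ["DATE"],                           # hypothetical sentence with only a date
]

print(sentences_with_entities(sent_labels))                          # any entity
print(sentences_with_entities(sent_labels, exclude=NUMERIC_LABELS))  # non-numeric only
print("DATE" != ["DATE", "TIME"])  # str vs list: always True, so no filtering happens
```

With the notebook's objects the same idea reads `sum(1 for sent in Doc.sents if any(ent.label_ not in NUMERIC_LABELS for ent in sent.ents))`.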

**9. CHALLENGE: Display the named entity visualization for the first sentence, `sen[0]`**

In [53]:
s0=sen[0]
s0

The Tale of Peter Rabbit, by Beatrix Potter (1902).


In [54]:
# Import the displaCy library
from spacy import displacy
displacy.render(s0, style='ent', jupyter=True)

### Great Job!