# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:
with open('/content/peterrabbit.txt') as f:
  doc = nlp(f.read())


**2. For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained tag, and the description of the fine-grained tag.**

In [3]:
# Enter your code here:
for token in list(doc.sents)[2]:
  print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {str(spacy.explain(token.tag_))}")





         SPACE      _SP        whitespace
They       PRON       PRP        pronoun, personal
lived      VERB       VBD        verb, past tense
with       ADP        IN         conjunction, subordinating or preposition
their      PRON       PRP$       pronoun, possessive
Mother     PROPN      NNP        noun, proper singular
in         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
sand       NOUN       NN         noun, singular or mass
-          PUNCT      HYPH       punctuation mark, hyphen
bank       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
underneath ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
root       NOUN       NN         noun, singular or mass
of         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner

          SPACE      _SP       

**3. Provide a frequency list of POS tags from the entire document**

In [6]:
POS_Counts = doc.count_by(spacy.attrs.POS)

for k,v in sorted(POS_Counts.items()):
  print(f"id:{k} POS:{doc.vocab[k].text} {v} counts")




id:84 POS:ADJ 56 counts
id:85 POS:ADP 124 counts
id:86 POS:ADV 63 counts
id:87 POS:AUX 49 counts
id:89 POS:CCONJ 61 counts
id:90 POS:DET 91 counts
id:92 POS:NOUN 170 counts
id:93 POS:NUM 8 counts
id:94 POS:PART 30 counts
id:95 POS:PRON 108 counts
id:96 POS:PROPN 73 counts
id:97 POS:PUNCT 171 counts
id:98 POS:SCONJ 20 counts
id:100 POS:VERB 135 counts
id:103 POS:SPACE 99 counts


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
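HINT: the attribute ID for 'NOUN' is 92 in the frequency list above (the value may differ between spaCy versions, so check your own output)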

In [7]:
len(doc)



1258

In [8]:
100* POS_Counts[92]/len(doc)

13.513513513513514
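
Because the numeric ID for `NOUN` depends on the installed spaCy version, a lookup keyed by tag name can be less brittle than a hard-coded integer. The sketch below is a minimal pure-Python illustration, not part of the original solution: it copies the counts printed in step 3 into a dict keyed by tag name and computes a tag's share of all tokens. The names `pos_counts` and `pos_share` are hypothetical.

```python
# Counts copied from the frequency list printed in step 3, keyed by tag name
# instead of numeric ID. In a live notebook, a dict like this could be built
# from POS_Counts by using doc.vocab[k].text as the key.
pos_counts = {
    "ADJ": 56, "ADP": 124, "ADV": 63, "AUX": 49, "CCONJ": 61,
    "DET": 91, "NOUN": 170, "NUM": 8, "PART": 30, "PRON": 108,
    "PROPN": 73, "PUNCT": 171, "SCONJ": 20, "VERB": 135, "SPACE": 99,
}


def pos_share(counts, tag):
    """Return the percentage of all tokens that carry the given POS tag."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty document
    return 100 * counts.get(tag, 0) / total


total_tokens = sum(pos_counts.values())
noun_pct = pos_share(pos_counts, "NOUN")
print(f"{total_tokens} tokens, {noun_pct:.2f}% nouns")
```

The per-tag counts sum to 1258, which matches `len(doc)` above, so the share computed here agrees with the result of the hard-coded lookup.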

**5. Display the Dependency Parse for the third sentence**

In [9]:
displacy.render(list(doc.sents)[2],style='dep', jupyter=True)

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [12]:
for ent in doc.ents[:2]:
  print(ent.text + '  '+ ent.label_ + '  '+ str(spacy.explain(ent.label_)))  


Beatrix Potter  PERSON  People, including fictional
1902  DATE  Absolute or relative dates or periods


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [14]:
len(list(doc.sents))

54

**8. CHALLENGE: How many sentences contain named entities?**

In [15]:
# Re-parse each sentence as its own Doc, then keep those with at least one entity
list_of_sents = [nlp(sent.text) for sent in doc.sents]
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]

len(list_of_ners)


24
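
Re-parsing each sentence with `nlp(sent.text)` asks the model to label entities without the surrounding context, so the count may differ from what the original `doc` contains. One alternative is to check each sentence of the original `doc` for overlap with the entities already found there. The sketch below shows only that overlap logic on plain `(start, end)` token offsets. The offsets are made-up illustration values, not taken from the story, and `count_sentences_with_entities` is a hypothetical helper. With spaCy, the offsets would come from the `start` and `end` attributes of each sentence and entity span.

```python
def count_sentences_with_entities(sentences, entities):
    """Count sentences that overlap at least one entity.

    Both arguments are lists of half-open (start, end) token offsets.
    """
    count = 0
    for s_start, s_end in sentences:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(e_start < s_end and s_start < e_end for e_start, e_end in entities):
            count += 1
    return count


# Made-up offsets for illustration only (not from the story).
sentences = [(0, 5), (5, 12), (12, 20)]
entities = [(1, 3), (14, 15), (17, 19)]

n_with_entities = count_sentences_with_entities(sentences, entities)
print(n_with_entities)
```

In this toy input the third sentence holds two entities but is still counted once, which is the behaviour the question asks for.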

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [16]:
displacy.render(list_of_sents[0], style='ent', jupyter=True)

### Great Job!