# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:
with open('../UPDATED_NLP_COURSE/TextFiles/peterrabbit.txt') as file:
    doc = nlp(file.read())


**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained tag (`tag_`), and a description of the fine-grained tag.**

In [34]:
def print_tags(span):
    """Print text, POS tag, fine-grained tag and tag description for each token."""
    print("{:20}{:20}{:20}{:20}\n".format("Token", "POS-tag", "FG-tag", "description"))
    for token in span:
        # spacy.explain returns None for tags it does not recognize, so fall back to ""
        explain = spacy.explain(token.tag_) or ""
        print(f"{token.text:20}{token.pos_:20}{token.tag_:20}{explain:20}")

In [35]:
sentences = list(doc.sents)
third_s = sentences[2]
print_tags(third_s)

Token               POS-tag             FG-tag              description         

They                PRON                PRP                 pronoun, personal   
lived               VERB                VBD                 verb, past tense    
with                ADP                 IN                  conjunction, subordinating or preposition
their               ADJ                 PRP$                pronoun, possessive 
Mother              PROPN               NNP                 noun, proper singular
in                  ADP                 IN                  conjunction, subordinating or preposition
a                   DET                 DT                  determiner          
sand                NOUN                NN                  noun, singular or mass
-                   PUNCT               HYPH                punctuation mark, hyphen
bank                NOUN                NN                  noun, singular or mass
,                   PUNCT               ,                
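
The ragged right edge of the table above comes from the format spec: a width of `20` pads short strings but never truncates long ones. The following is a minimal sketch using two rows copied by hand from that output. The `20.20` variant is one optional way to clip a column; it is not something the original cell does.

```python
# Two (token, POS, tag, description) rows copied from the printed table.
rows = [
    ("They", "PRON", "PRP", "pronoun, personal"),
    ("with", "ADP", "IN", "conjunction, subordinating or preposition"),
]

for text, pos, tag, description in rows:
    # Width 20 pads each field to at least 20 characters but does not shorten it.
    padded = f"{text:20}{pos:20}{tag:20}{description:20}"
    # Width 20 with precision 20 pads and also clips the field to 20 characters.
    clipped = f"{text:20}{pos:20}{tag:20}{description:20.20}"
    print(padded)
    print(clipped)
```

Clipping keeps the columns aligned at the cost of cutting off long descriptions, so the padded form is usually the better choice when the descriptions matter.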

**3. Provide a frequency list of POS tags from the entire document**

In [40]:
POS_count = doc.count_by(spacy.attrs.POS)
doc.vocab[96].text

'PUNCT'

In [60]:
from typing import Dict, Tuple

def get_pos_counts(doc: spacy.tokens.Doc) -> Dict[int, Tuple[str, int]]:
    """Map each POS attribute ID to (tag name, count), ordered by ID."""
    # count_by returns {attribute ID: count}; doc.vocab[ID].text recovers the tag name
    pos_count = doc.count_by(spacy.attrs.POS)
    return {key: (doc.vocab[key].text, count) for key, count in sorted(pos_count.items())}

pos_counts = get_pos_counts(doc)
for key, (name, count) in pos_counts.items():
    print(f"{key}. {name} : {count}")

83. ADJ : 83
84. ADP : 127
85. ADV : 75
88. CCONJ : 61
89. DET : 90
91. NOUN : 176
92. NUM : 8
93. PART : 36
94. PRON : 72
95. PROPN : 75
96. PUNCT : 174
99. VERB : 182
102. SPACE : 99
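
To see what a frequency list of this kind amounts to, the same tally can be sketched with `collections.Counter` over plain tag strings. The tag list below is copied by hand from the third-sentence output in question 2 rather than produced by spaCy, and the names `third_sentence_pos` and `tag_counts` are illustrative only.

```python
from collections import Counter

# Coarse POS tags of the third sentence, copied by hand from the printed table
# in question 2 (illustrative data, not computed by spaCy here).
third_sentence_pos = [
    "PRON", "VERB", "ADP", "ADJ", "PROPN", "ADP",
    "DET", "NOUN", "PUNCT", "NOUN", "PUNCT",
]

# Counter maps each tag name to the number of times it occurs.
tag_counts = Counter(third_sentence_pos)

# Print the tally sorted by tag name.
for tag, count in sorted(tag_counts.items()):
    print(f"{tag} : {count}")
```

The difference from `doc.count_by(spacy.attrs.POS)` is that `count_by` keys its result by integer attribute IDs, which is why the cell above looks each ID up in `doc.vocab` to recover the tag name.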


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 91

In [87]:
nouns = pos_counts[91][1]
# Each value is a (tag name, count) tuple, so sum the counts only
total_pos = sum(count for _, count in pos_counts.values())
noun_percentage = (nouns/total_pos) * 100
print(f"{nouns}/{total_pos} = {noun_percentage:.2f}%")

176/1258 = 13.99%
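
The arithmetic can be checked without spaCy by summing the counts printed in question 3. The dictionary below is copied by hand from that output, and `reported_counts` is a hypothetical name used only for this sketch.

```python
# POS counts copied from the frequency list printed for question 3.
reported_counts = {
    "ADJ": 83, "ADP": 127, "ADV": 75, "CCONJ": 61, "DET": 90,
    "NOUN": 176, "NUM": 8, "PART": 36, "PRON": 72, "PROPN": 75,
    "PUNCT": 174, "VERB": 182, "SPACE": 99,
}

# The denominator is every token, including whitespace and punctuation tokens.
total_tokens = sum(reported_counts.values())
noun_share = 100 * reported_counts["NOUN"] / total_tokens

print(f"{reported_counts['NOUN']}/{total_tokens} = {noun_share:.2f}%")
```

Note that the total of 1258 includes the 99 `SPACE` tokens. Leaving them out would shrink the denominator to 1159 and raise the noun share accordingly.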


**5. Display the Dependency Parse for the third sentence**

In [91]:
# displacy was already imported in the first cell
displacy.render(sentences[2], style='dep', jupyter=True)

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [93]:
for entity in doc.ents[:2]:
    print(f"{entity.text} - {entity.label_} - {spacy.explain(entity.label_)}")

The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [94]:
len(sentences)

56

**8. CHALLENGE: How many sentences contain named entities?**

In [101]:
s_with_ents = [sentence for sentence in sentences if sentence.ents]
len(s_with_ents)

51

**9. CHALLENGE: Display the named entity visualization for the first sentence that contains named entities (`s_with_ents[0]` from the previous problem)**

In [105]:
displacy.render(s_with_ents[0], style='ent', jupyter=True)

### Great Job!