___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load(r'en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:
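# The hint's path is '../TextFiles/peterrabbit.txt'; here the file is assumed to sit next to the notebook.
with open('peterrabbit.txt') as f: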
    text = f.read()
doc = nlp(text)

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [3]:
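# Zero-based index 3 is the fourth span in doc.sents; with this model's
# segmentation it is the "They lived ..." sentence printed below.
for token in list(doc.sents)[3]: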
    print(f'{token.text:15} {token.pos_:7} {token.tag_:7}  {spacy.explain(token.tag_)}')

They            PRON    PRP      pronoun, personal
lived           VERB    VBD      verb, past tense
with            ADP     IN       conjunction, subordinating or preposition
their           PRON    PRP$     pronoun, possessive
Mother          PROPN   NNP      noun, proper singular
in              ADP     IN       conjunction, subordinating or preposition
a               DET     DT       determiner
sand            NOUN    NN       noun, singular or mass
-               PUNCT   HYPH     punctuation mark, hyphen
bank            NOUN    NN       noun, singular or mass
,               PUNCT   ,        punctuation mark, comma
underneath      ADP     IN       conjunction, subordinating or preposition
the             DET     DT       determiner
root            NOUN    NN       noun, singular or mass
of              ADP     IN       conjunction, subordinating or preposition
a               DET     DT       determiner

               SPACE   _SP      None
very            ADV     RB       adverb

**3. Provide a frequency list of POS tags from the entire document**

In [4]:
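# count_by returns a dict mapping each POS attribute ID to its frequency
POS_COUNTS = doc.count_by(spacy.attrs.POS)
for pos_id, count in POS_COUNTS.items():
    # doc.vocab[pos_id].text resolves the numeric ID back to its label
    print(f'{pos_id:6}. {doc.vocab[pos_id].text:7} : {count}')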

    90. DET     : 95
    96. PROPN   : 75
    85. ADP     : 124
    97. PUNCT   : 173
    93. NUM     : 9
   103. SPACE   : 99
    86. ADV     : 65
    98. SCONJ   : 16
    92. NOUN    : 171
    95. PRON    : 105
    87. AUX     : 43
    84. ADJ     : 53
    89. CCONJ   : 61
   100. VERB    : 138
    94. PART    : 31


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92 (as shown in the frequency list above)

In [5]:
total_tokens = sum(POS_COUNTS.values())
print(POS_COUNTS[92],'/',total_tokens,"  =  ",100*POS_COUNTS[92]/total_tokens)

171 / 1258   =   13.593004769475357
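The arithmetic can be double-checked without spaCy by plugging in the counts from the frequency list printed for question 3. The snippet below is only a sketch: the dictionary is copied by hand from that output, and the names `pos_counts`, `noun_share`, and `noun_share_no_space` are illustrative, not part of the assessment.

```python
# POS counts copied by hand from the frequency list printed for question 3
pos_counts = {
    "DET": 95, "PROPN": 75, "ADP": 124, "PUNCT": 173, "NUM": 9,
    "SPACE": 99, "ADV": 65, "SCONJ": 16, "NOUN": 171, "PRON": 105,
    "AUX": 43, "ADJ": 53, "CCONJ": 61, "VERB": 138, "PART": 31,
}

total = sum(pos_counts.values())

# Share of NOUN tokens among all tokens, as computed in the cell above
noun_share = 100 * pos_counts["NOUN"] / total
print(f"{pos_counts['NOUN']} / {total} = {noun_share:.2f}%")

# Variant (an assumption about what "tokens" should mean): ignore whitespace tokens
total_no_space = total - pos_counts["SPACE"]
noun_share_no_space = 100 * pos_counts["NOUN"] / total_no_space
print(f"{pos_counts['NOUN']} / {total_no_space} = {noun_share_no_space:.2f}%")
```

Whether whitespace tokens belong in the denominator is a judgment call; the assessment's answer counts every token, including `SPACE`.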


**5. Display the Dependency Parse for the third sentence**

In [6]:
# Same sentence as in question 2 (zero-based index 3)
displacy.render(list(doc.sents)[3],style='dep',jupyter=True)

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [7]:
for ent in doc.ents[:2]:
    print(f'{ent.text:30} {ent.label_:14} {spacy.explain(ent.label_)}')
    

The Tale of Peter Rabbit       WORK_OF_ART    Titles of books, songs, etc.
Beatrix Potter                 PERSON         People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [8]:
len(list(doc.sents))

74

**8. CHALLENGE: How many sentences contain named entities?**

In [9]:
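list_of_sents = list(doc.sents)
list_of_ents = list(doc.ents)
senti = 0
for sent in list_of_sents:
    for ent in list_of_ents:
        # Caveat: this is a substring test against every entity in the document,
        # so a sentence is counted whenever any entity's text appears in it,
        # even if that occurrence was not itself tagged as an entity.
        if str(ent) in str(sent):
            senti += 1
            break
senti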

35

In [10]:
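# Alternative: re-parse each sentence on its own and keep those with entities.
# Parsing in isolation can change NER predictions, so the count need not match the one above.
los = [nlp(sent.text) for sent in doc.sents]
lon = [sent_doc for sent_doc in los if sent_doc.ents]
len(lon)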

25
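The two cells above disagree (35 versus 25) because they measure slightly different things. A third option is to ask each sentence span for the entities that fall inside it via `Span.ents`, which avoids both substring matching and re-parsing. The sketch below assumes spaCy v3 is installed. To run without a trained model it uses a blank English pipeline, a rule-based sentencizer, a made-up three-sentence text, and hand-set `PERSON` entities, all of which are illustrative assumptions.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, so no trained model is required (assumption for the demo)
nlp_blank = spacy.blank("en")
nlp_blank.add_pipe("sentencizer")  # rule-based sentence boundaries

toy = nlp_blank("Peter ran into the garden. He lost both shoes. Flopsy stayed at home.")

# Hand-set entities stand in for what a trained NER component would predict
toy.ents = [
    Span(toy, tok.i, tok.i + 1, label="PERSON")
    for tok in toy
    if tok.text in {"Peter", "Flopsy"}
]

# Span.ents lists only the entities that lie within that sentence
sents_with_ents = [sent for sent in toy.sents if sent.ents]
print(len(sents_with_ents), "of", len(list(toy.sents)), "sentences contain an entity")
```

On the notebook's own `doc`, the equivalent one-liner would be `sum(1 for sent in doc.sents if sent.ents)`; its result is not shown here because it was not run against the story.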

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [11]:
displacy.render(list(doc.sents)[0],style='ent',jupyter=True)

### Great Job!