# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>

In [9]:
# Read the whole story and run it through the spaCy pipeline
with open('C:\\Users\\PC\\Desktop\\nlp\\UPDATED_NLP_COURSE\\UPDATED_NLP_COURSE\\TextFiles\\peterrabbit.txt') as f:
    doc = nlp(f.read())
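
The absolute Windows path above only works on the author's machine. A minimal sketch of a more portable variant follows, assuming the notebook runs from a folder that contains `TextFiles/peterrabbit.txt`. Both that relative location and the helper name `read_story` are assumptions for illustration, not part of the course material.

```python
from pathlib import Path


def read_story(path):
    """Return the full text of a UTF-8 encoded file."""
    return Path(path).read_text(encoding="utf-8")


# Hypothetical relative location; adjust to wherever peterrabbit.txt lives.
STORY_PATH = Path("TextFiles") / "peterrabbit.txt"

if STORY_PATH.exists():
    text = read_story(STORY_PATH)
    # With `nlp` loaded as in the imports cell: doc = nlp(text)
```

Passing the encoding explicitly avoids relying on the platform default, which on Windows is often not UTF-8.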

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [10]:
# Sentences are zero-indexed, so the third sentence is at index 2
for token in list(doc.sents)[2]:
    print(f'{token.text:{12}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}')





           SPACE      _SP        whitespace
They         PRON       PRP        pronoun, personal
lived        VERB       VBD        verb, past tense
with         ADP        IN         conjunction, subordinating or preposition
their        PRON       PRP$       pronoun, possessive
Mother       PROPN      NNP        noun, proper singular
in           ADP        IN         conjunction, subordinating or preposition
a            DET        DT         determiner
sand         NOUN       NN         noun, singular or mass
-            PUNCT      HYPH       punctuation mark, hyphen
bank         NOUN       NN         noun, singular or mass
,            PUNCT      ,          punctuation mark, comma
underneath   ADP        IN         conjunction, subordinating or preposition
the          DET        DT         determiner
root         NOUN       NN         noun, singular or mass
of           ADP        IN         conjunction, subordinating or preposition
a            DET        DT         determiner
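
The aligned columns in the output above come from the nested width fields in the f-string: `{token.text:{12}}` pads the text to at least 12 characters. The sketch below reproduces that formatting on plain strings so it can be tried without a parsed document. The helper name `format_row` is hypothetical.

```python
def format_row(text, pos, tag, description):
    """Pad the first three fields to fixed widths, mirroring the loop above."""
    # `{text:{12}}` is equivalent to `{text:12}`: left-align in 12 columns.
    return f'{text:{12}} {pos:{10}} {tag:{10}} {description}'


row = format_row('They', 'PRON', 'PRP', 'pronoun, personal')
print(row)

# Values longer than the width are not truncated; they push later columns right.
long_row = format_row('a-very-long-token', 'NOUN', 'NN', 'noun, singular or mass')
print(long_row)
```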

**3. Provide a frequency list of POS tags from the entire document**

In [11]:
POS_count = doc.count_by(spacy.attrs.POS)


for k,v in sorted(POS_count.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')

84. ADJ  : 56
85. ADP  : 124
86. ADV  : 63
87. AUX  : 49
89. CCONJ: 61
90. DET  : 91
92. NOUN : 170
93. NUM  : 8
94. PART : 30
95. PRON : 108
96. PROPN: 73
97. PUNCT: 171
98. SCONJ: 20
100. VERB : 135
103. SPACE: 99
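
`doc.count_by` keys its result by integer attribute IDs, which is why the loop above looks each ID up in `doc.vocab`. As an alternative sketch, the same tally can be built from the string labels with `collections.Counter`. The helper `pos_frequencies` is a hypothetical name, and the stand-in tokens below only imitate the `.pos_` attribute so the block runs without a loaded model.

```python
from collections import Counter
from types import SimpleNamespace


def pos_frequencies(tokens):
    """Count coarse POS labels for any iterable of objects exposing `.pos_`."""
    return Counter(token.pos_ for token in tokens)


# Stand-in tokens for illustration only; with spaCy, pass the Doc itself.
sample = [SimpleNamespace(pos_=p) for p in ("PRON", "VERB", "ADP", "PRON", "NOUN")]
counts = pos_frequencies(sample)

for label, count in counts.most_common():
    print(f'{label:{5}}: {count}')
```

In the notebook, `pos_frequencies(doc)` should give the same counts as the cell above, listed by label and ordered from most to least frequent.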


**4. CHALLENGE: What percentage of tokens are nouns?**<br>


In [12]:
# Look up the NOUN ID by name instead of hard-coding 92
percent = 100*POS_count[doc.vocab.strings['NOUN']]/len(doc)
percent



13.513513513513514
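
As a sanity check, the figure above can be reproduced by hand from the frequency list in problem 3: the counts sum to the document length, and nouns make up 170 of those tokens. The sketch below copies the counts from that output. The second figure, which leaves out `SPACE` and `PUNCT` tokens, is an additional variant and not something the exercise asks for.

```python
# Counts copied from the POS frequency list printed in problem 3.
pos_counts = {
    "ADJ": 56, "ADP": 124, "ADV": 63, "AUX": 49, "CCONJ": 61,
    "DET": 91, "NOUN": 170, "NUM": 8, "PART": 30, "PRON": 108,
    "PROPN": 73, "PUNCT": 171, "SCONJ": 20, "VERB": 135, "SPACE": 99,
}

# Every token has exactly one POS tag, so the counts sum to len(doc).
total_tokens = sum(pos_counts.values())
noun_share = 100 * pos_counts["NOUN"] / total_tokens

# Variant: ignore whitespace and punctuation tokens in the denominator.
word_tokens = total_tokens - pos_counts["SPACE"] - pos_counts["PUNCT"]
noun_share_words = 100 * pos_counts["NOUN"] / word_tokens

print(f"{noun_share:.2f}% of all tokens")
print(f"{noun_share_words:.2f}% of non-space, non-punctuation tokens")
```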

**5. Display the Dependency Parse for the third sentence**

In [15]:
displacy.render(list(doc.sents)[2], style='dep', jupyter=True, options={'distance': 110})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit*.**

In [19]:
for ent in doc.ents[:2]:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))


Beatrix Potter - PERSON - People, including fictional
1902 - DATE - Absolute or relative dates or periods


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [25]:
len([sent for sent in doc.sents])


54

**8. CHALLENGE: How many sentences contain named entities?**

In [29]:
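# Parse each sentence on its own so it can be rendered separately in problem 9
list_of_sents = [nlp(sent.text) for sent in doc.sents]
# Keep only the sentences in which spaCy found at least one named entity
sents_with_ents = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]

len(sents_with_ents)

24

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [32]:
displacy.render(list_of_sents[0], style='ent', jupyter=True)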
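
Returning to problem 8: running `nlp` again on every sentence parses the story a second time. A lighter sketch is to read the entities already attached to each sentence span through `Span.ents`. The helper name `count_sentences_with_entities` is hypothetical, and the stand-in objects below only imitate `.sents` and `.ents` so the block runs without a loaded model.

```python
from types import SimpleNamespace


def count_sentences_with_entities(doc):
    """Count sentences whose span holds at least one named entity."""
    return sum(1 for sent in doc.sents if sent.ents)


# Stand-in document for illustration only: three sentences, two with entities.
fake_doc = SimpleNamespace(sents=[
    SimpleNamespace(ents=("Beatrix Potter", "1902")),
    SimpleNamespace(ents=()),
    SimpleNamespace(ents=("four",)),
])
n_with_ents = count_sentences_with_entities(fake_doc)
print(n_with_ents)
```

In the notebook, `count_sentences_with_entities(doc)` may not return exactly 24, because entities predicted with the whole document as context can differ from those predicted when each sentence is parsed in isolation.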

### Thank you!