# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:

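file_path = '../TextFiles/peterrabbit.txt'

# Read the whole story; the Project Gutenberg file is UTF-8 encoded
with open(file_path, 'r', encoding='utf-8') as f:
    text = f.read()

# Build the Doc once the file has been read and closed
doc = nlp(text)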

**2. For every token in the third sentence, print the token text, the coarse POS tag, the fine-grained tag, and the description of the fine-grained tag.**

In [3]:
sentences = list(doc.sents)

In [4]:
sentence = sentences[2].text
print(sentence)

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.




In [5]:
# Enter your code here:
sentence = sentences[2]
for token in sentence:
    print(f"Token: {token.text}")
    print(f"POS Tag: {token.pos_}")
    print(f"Fine-grained TAG: {token.tag_}")
    print(f"Description: {spacy.explain(token.tag_)}")
    print('---')


Token: They
POS Tag: PRON
Fine-grained TAG: PRP
Description: pronoun, personal
---
Token: lived
POS Tag: VERB
Fine-grained TAG: VBD
Description: verb, past tense
---
Token: with
POS Tag: ADP
Fine-grained TAG: IN
Description: conjunction, subordinating or preposition
---
Token: their
POS Tag: PRON
Fine-grained TAG: PRP$
Description: pronoun, possessive
---
Token: Mother
POS Tag: NOUN
Fine-grained TAG: NN
Description: noun, singular or mass
---
Token: in
POS Tag: ADP
Fine-grained TAG: IN
Description: conjunction, subordinating or preposition
---
Token: a
POS Tag: DET
Fine-grained TAG: DT
Description: determiner
---
Token: sand
POS Tag: NOUN
Fine-grained TAG: NN
Description: noun, singular or mass
---
Token: -
POS Tag: PUNCT
Fine-grained TAG: HYPH
Description: punctuation mark, hyphen
---
Token: bank
POS Tag: NOUN
Fine-grained TAG: NN
Description: noun, singular or mass
---
Token: ,
POS Tag: PUNCT
Fine-grained TAG: ,
Description: punctuation mark, comma
---
Token: underneath
POS Tag: ADP
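
The same information can also be laid out one token per row, which is easier to scan than a block per token. The sketch below is only an illustration: it hard-codes the first three rows from the output above instead of calling spaCy, and the column widths (12, 8, 6) and the helper name `format_row` are arbitrary choices made for this example.

```python
# Rows copied from the output above: (text, POS, fine-grained tag, description).
rows = [
    ("They", "PRON", "PRP", "pronoun, personal"),
    ("lived", "VERB", "VBD", "verb, past tense"),
    ("with", "ADP", "IN", "conjunction, subordinating or preposition"),
]

def format_row(text, pos, tag, description):
    """Left-align the first three fields in fixed-width columns."""
    return f"{text:<12}{pos:<8}{tag:<6}{description}"

lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

In the notebook itself, the tuple for each row would come from `token.text`, `token.pos_`, `token.tag_` and `spacy.explain(token.tag_)`.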


**3. Provide a frequency list of POS tags from the entire document**

In [7]:
from collections import Counter

In [8]:
pos_counts = Counter(token.pos_ for token in doc)

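# Print the frequency list, sorted alphabetically by tag.
# NOTE: the leading number is just a running index that starts at 83;
# it is not looked up from spaCy's numeric POS attribute IDs.
for i, (pos, count) in enumerate(sorted(pos_counts.items()), start=83):
    print(f"{i}. {pos} : {count}")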

83. ADJ : 53
84. ADP : 125
85. ADV : 63
86. AUX : 49
87. CCONJ : 61
88. DET : 90
89. NOUN : 172
90. NUM : 9
91. PART : 28
92. PRON : 110
93. PROPN : 74
94. PUNCT : 171
95. SCONJ : 19
96. SPACE : 99
97. VERB : 135
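
The list above is sorted alphabetically by tag. To see which tags dominate the text, the same counts can be ranked by frequency. This is a small sketch, not part of the original solution: it rebuilds the `Counter` from the numbers printed above so that it runs without spaCy, and the variable name `ranked` is chosen for this example.

```python
from collections import Counter

# Counts copied from the frequency list above (coarse POS tag -> occurrences).
pos_counts = Counter({
    "ADJ": 53, "ADP": 125, "ADV": 63, "AUX": 49, "CCONJ": 61,
    "DET": 90, "NOUN": 172, "NUM": 9, "PART": 28, "PRON": 110,
    "PROPN": 74, "PUNCT": 171, "SCONJ": 19, "SPACE": 99, "VERB": 135,
})

# most_common() returns (tag, count) pairs ordered from most to least frequent.
ranked = pos_counts.most_common()
for pos, count in ranked:
    print(f"{pos:<6} {count}")
```

In the notebook, calling `pos_counts.most_common()` on the `Counter` built from `doc` gives the same ranking directly.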


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
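HINT: the attribute ID for 'NOUN' is 91 (numeric IDs can differ between spaCy versions, so comparing against the string `token.pos_ == 'NOUN'` is the safer check)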

In [9]:
pos_counts = Counter(token.pos_ for token in doc)
total_tokens = len(doc)
noun_tokens = pos_counts['NOUN']
noun_percentage = (noun_tokens / total_tokens) * 100
print(f"Percentage of tokens that are nouns: {noun_percentage:.2f}%")

Percentage of tokens that are nouns: 13.67%
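
As a sanity check, the percentage can be reproduced from the frequency list in problem 3, because every token carries exactly one POS tag and the counts therefore sum to `len(doc)`. The sketch below uses only the numbers printed above. It also computes a second figure that leaves out `SPACE` tokens; that variant is an addition for illustration, not something the exercise asks for.

```python
# Counts copied from the frequency list in problem 3.
pos_counts = {
    "ADJ": 53, "ADP": 125, "ADV": 63, "AUX": 49, "CCONJ": 61,
    "DET": 90, "NOUN": 172, "NUM": 9, "PART": 28, "PRON": 110,
    "PROPN": 74, "PUNCT": 171, "SCONJ": 19, "SPACE": 99, "VERB": 135,
}

# Every token has one POS tag, so the counts add up to the token total.
total_tokens = sum(pos_counts.values())
noun_share = pos_counts["NOUN"] / total_tokens * 100

# Optional variant: ignore whitespace-only tokens in the denominator.
non_space_tokens = total_tokens - pos_counts["SPACE"]
noun_share_no_space = pos_counts["NOUN"] / non_space_tokens * 100

print(f"Total tokens: {total_tokens}")
print(f"Nouns, all tokens: {noun_share:.2f}%")
print(f"Nouns, excluding SPACE tokens: {noun_share_no_space:.2f}%")
```

The first figure matches the 13.67% reported above; the second is somewhat higher because whitespace tokens (mostly line breaks in the text file) no longer count toward the total.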


**5. Display the Dependency Parse for the third sentence**

In [10]:
from spacy import displacy
third_sentence = list(doc.sents)[2]
displacy.render(third_sentence, style='dep', jupyter=True, options={'distance': 100})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [15]:
named_entities = list(doc.ents)

for entity in named_entities[:2]:
    print(f"Entity: {entity.text}, Label: {entity.label_}")

Entity: The Tale of Peter Rabbit, Label: WORK_OF_ART
Entity: Beatrix Potter, Label: PERSON


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [12]:
num_sentences = len(list(doc.sents))
print(f"The document contains {num_sentences} sentences.")

The document contains 55 sentences.


**8. CHALLENGE: How many sentences contain named entities?**

In [16]:
sentences_with_entities = 0
list_of_sents = []

for sentence in doc.sents:
    if sentence.ents:
        list_of_sents.append(sentence)
        sentences_with_entities += 1

print(f"Number of sentences containing named entities: {sentences_with_entities}")

Number of sentences containing named entities: 35



**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [17]:
first_sentence_with_entities = list_of_sents[0]

displacy.render(first_sentence_with_entities, style='ent', jupyter=True)