# Mounting Google Drive

In [1]:
from google.colab import drive
drive.mount("/content/Drive")

base_path = "/content/Drive/MyDrive/NLP-Course/02-Parts-of-Speech-Tagging/"

Drive already mounted at /content/Drive; to attempt to forcibly remount, call drive.mount("/content/Drive", force_remount=True).


# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [2]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy
from prettytable import PrettyTable

**1. Create a Doc object from the file `peterrabbit.txt`**<br>


In [3]:
# Read the story with a context manager so the file is closed automatically
with open(base_path + "peterrabbit.txt", "r", encoding="utf-8") as f:
    doc = nlp(f.read())
doc

The Tale of Peter Rabbit, by Beatrix Potter (1902).

Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.

'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
the fields or down the lane, but don't go into Mr. McGregor's garden:
your Father had an accident there; he was put in a pie by Mrs.
McGregor.'

'Now run along, and don't get into mischief. I am going out.'

Then old Mrs. Rabbit took a basket and her umbrella, and went through
the wood to the baker's. She bought a loaf of brown bread and five
currant buns.

Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
down the lane to gather blackberries:

But Peter, who was very naughty, ran straight away to Mr. McGregor's
garden, and squeezed under the gate!

First he ate some lettuces and some French beans; and then he ate
some radishes;

And

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained tag (TAG), and the description of the fine-grained tag.**

In [4]:
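t = PrettyTable(["Token Text", "POS Tag", "DEP Tag", "DEP Details"])

# doc.sents is 0-indexed; under this model's segmentation, index 3 is the
# "They lived with their Mother..." sentence shown in the output below
for token in list(doc.sents)[3]:
    t.add_row([token.text, token.pos_, token.dep_, spacy.explain(token.dep_)])

print(t)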

+------------+---------+----------+------------------------+
| Token Text | POS Tag | DEP Tag  |      DEP Details       |
+------------+---------+----------+------------------------+
|    They    |   PRON  |  nsubj   |    nominal subject     |
|   lived    |   VERB  |   ROOT   |          None          |
|    with    |   ADP   |   prep   | prepositional modifier |
|   their    |   DET   |   poss   |  possession modifier   |
|   Mother   |  PROPN  |   pobj   | object of preposition  |
|     in     |   ADP   |   prep   | prepositional modifier |
|     a      |   DET   |   det    |       determiner       |
|    sand    |   NOUN  | compound |        compound        |
|     -      |  PUNCT  |  punct   |      punctuation       |
|    bank    |   NOUN  |   pobj   | object of preposition  |
|     ,      |  PUNCT  |  punct   |      punctuation       |
| underneath |   ADP   |   prep   | prepositional modifier |
|    the     |   DET   |   det    |       determiner       |
|    root    |   NOUN  |
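The cell above prints dependency labels (`token.dep_` and `spacy.explain(token.dep_)`), whereas the question asks for the fine-grained tag. A minimal sketch of the fine-grained variant swaps in `token.tag_` and its explanation. The helper name `fine_grained_rows` is hypothetical; it takes the explainer as a parameter so it can be exercised without loading a model, and in the notebook you would pass `spacy.explain`.

```python
from typing import Callable, Iterable, List, Optional


def fine_grained_rows(
    tokens: Iterable, explain: Callable[[str], Optional[str]]
) -> List[list]:
    """Build one [text, POS, TAG, TAG description] row per token.

    `tokens` is any iterable of spaCy-like tokens (e.g. a Span from doc.sents);
    `explain` maps a tag string to its description (e.g. spacy.explain).
    """
    return [[tok.text, tok.pos_, tok.tag_, explain(tok.tag_)] for tok in tokens]


# In the notebook (assumes `doc`, `spacy`, and `PrettyTable` from the cells above):
# t = PrettyTable(["Token Text", "POS Tag", "TAG", "TAG Details"])
# for row in fine_grained_rows(list(doc.sents)[3], spacy.explain):
#     t.add_row(row)
# print(t)
```

Passing the explainer in keeps the helper independent of a loaded pipeline; with spaCy it behaves exactly like calling `spacy.explain(token.tag_)` inline.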

**3. Provide a frequency list of POS tags from the entire document**

In [5]:
POS_counts = doc.count_by(spacy.attrs.POS)

t = PrettyTable(["POS Id", "POS Tag", "Count"])

for k, v in sorted(POS_counts.items()):
    t.add_row([k, doc.vocab[k].text, v])

print(t)

+--------+---------+-------+
| POS Id | POS Tag | Count |
+--------+---------+-------+
|   84   |   ADJ   |   50  |
|   85   |   ADP   |  123  |
|   86   |   ADV   |   67  |
|   87   |   AUX   |   48  |
|   89   |  CCONJ  |   61  |
|   90   |   DET   |  118  |
|   92   |   NOUN  |  171  |
|   93   |   NUM   |   8   |
|   94   |   PART  |   29  |
|   95   |   PRON  |   81  |
|   96   |  PROPN  |   73  |
|   97   |  PUNCT  |  174  |
|   98   |  SCONJ  |   20  |
|  100   |   VERB  |  136  |
|  103   |  SPACE  |   99  |
+--------+---------+-------+
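`count_by` keys its result by integer symbol IDs, which is why the cell above maps each ID back to a name through `doc.vocab`. If only the tag names are needed, a minimal alternative sketch counts the string attribute `token.pos_` with `collections.Counter` and lists tags from most to least frequent. The helper name `pos_frequency` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pos_frequency(pos_tags: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (POS tag, count) pairs sorted from most to least frequent."""
    return Counter(pos_tags).most_common()


# In the notebook (assumes `doc` from the cells above):
# for tag, count in pos_frequency(token.pos_ for token in doc):
#     print(f"{tag:<6} {count}")
```

Sorting by count rather than by symbol ID puts the dominant tags (here `PUNCT` and `NOUN`) at the top, which is usually what a frequency list is for.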


**4. CHALLENGE: What percentage of tokens are nouns?**<br>


In [6]:
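from spacy.symbols import NOUN  # symbol ID of the coarse NOUN tag (92 in the table above)

noun_count = POS_counts[NOUN]
total_pos_count = sum(POS_counts.values())  # equals len(doc): each token has exactly one POS

noun_percentage = round((noun_count / total_pos_count) * 100, 2)
print("Noun Percentage :", noun_percentage, "%")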

Noun Percentage : 13.59 %
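The denominator above counts every token in the document, including the 174 `PUNCT` and 99 `SPACE` tokens from the frequency table, so the result is 171 / 1258 ≈ 13.59%. If the share among word tokens is wanted instead, those tags can be dropped from the denominator. A minimal sketch using the counts copied from the table above; the helper `pos_percentage` and its `exclude` parameter are hypothetical.

```python
from typing import Dict, Iterable


def pos_percentage(
    counts: Dict[str, int], target: str, exclude: Iterable[str] = ()
) -> float:
    """Percentage of tokens tagged `target`, ignoring any tags in `exclude`."""
    excluded = set(exclude)
    kept = {tag: n for tag, n in counts.items() if tag not in excluded}
    total = sum(kept.values())
    return round(kept.get(target, 0) / total * 100, 2)


# Coarse POS counts copied from the frequency table above.
counts = {
    "ADJ": 50, "ADP": 123, "ADV": 67, "AUX": 48, "CCONJ": 61,
    "DET": 118, "NOUN": 171, "NUM": 8, "PART": 29, "PRON": 81,
    "PROPN": 73, "PUNCT": 174, "SCONJ": 20, "VERB": 136, "SPACE": 99,
}

print(pos_percentage(counts, "NOUN"))                              # 13.59
print(pos_percentage(counts, "NOUN", exclude=("PUNCT", "SPACE")))  # 17.36
```

Excluding punctuation and whitespace shrinks the denominator to 985, so nouns rise to about 17.36% of the remaining tokens.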


**5. Display the Dependency Parse for the third sentence**

In [7]:
displacy.render(list(doc.sents)[3], style="dep", jupyter=True, options={})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [8]:
t = PrettyTable(["Entity Text", "Entity Label", "Entity Details"])

for ent in doc.ents[:2]:
    t.add_row([ent.text, ent.label_, str(spacy.explain(ent.label_))])

print(t)

+----------------+--------------+-----------------------------+
|  Entity Text   | Entity Label |        Entity Details       |
+----------------+--------------+-----------------------------+
|  Peter Rabbit  |    PERSON    | People, including fictional |
| Beatrix Potter |    PERSON    | People, including fictional |
+----------------+--------------+-----------------------------+


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [9]:
print(len(list(doc.sents)))

68


**8. CHALLENGE: How many sentences contain named entities?**

In [10]:
# Re-parse each sentence as its own Doc, then keep those with at least one entity
list_of_sents = [nlp(sent.text) for sent in doc.sents]
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]

len(list_of_ners)

40
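The cell above re-runs `nlp` on each sentence's text, so entities are predicted without the surrounding context and every sentence goes through the pipeline a second time. A lighter alternative reuses the entities already attached to each sentence `Span` of the original `doc` (spaCy spans expose `.ents`). Because those entities come from the full-document parse, the count may differ from 40. A minimal sketch; the helper name is hypothetical and it accepts any iterable of objects with an `.ents` attribute, so it works on both `doc.sents` and `list_of_sents`.

```python
from typing import Iterable


def count_sents_with_entities(sents: Iterable) -> int:
    """Count sentences (spaCy Spans or Docs) that contain at least one entity."""
    return sum(1 for sent in sents if sent.ents)


# In the notebook (assumes `doc` and `list_of_sents` from the cells above):
# count_sents_with_entities(doc.sents)      # entities from the full-document parse
# count_sents_with_entities(list_of_sents)  # reproduces the re-parsed count above
```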

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [11]:
displacy.render(list_of_sents[0], style='ent', jupyter=True)