# Mounting Google Drive

In [1]:
from google.colab import drive
drive.mount("/content/Drive")

base_path = "/content/Drive/MyDrive/NLP-Course/01-NLP-Python-Basics/"

Drive already mounted at /content/Drive; to attempt to forcibly remount, call drive.mount("/content/Drive", force_remount=True).


# NLP Basics Assessment

In [2]:
# Standard Imports
import spacy
nlp = spacy.load("en_core_web_sm")

## 1. Create a Doc object from the file

In [3]:
# Read the file inside a context manager so the handle is closed afterwards
with open(base_path + "owlcreek.txt", "r") as f:
    doc = nlp(f.read())

In [4]:
doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

## 2. How many tokens are contained in the file?

In [5]:
len(doc)

4835

## 3. How many sentences are contained in the file?

In [6]:
sents = [sent for sent in doc.sents]
len(sents)

249

## 4. Print the second sentence in the document

In [7]:
# Index 2 is the third item in `sents`; in this run it holds the story's opening sentence
sents[2].text

'A man stood upon a railroad bridge in northern Alabama, looking down\ninto the swift water twenty feet below.  '

## 5. For each token in the sentence above, print its `text`, `POS` tag, `dep` and `lemma`

In [8]:
from prettytable import PrettyTable

t = PrettyTable(["Text", "Part of Speech", "Syntactic Dependency", "Lemma"])

for token in sents[2]:
    t.add_row([token.text, token.pos_, token.dep_, token.lemma_])

print(t)

+----------+----------------+----------------------+----------+
|   Text   | Part of Speech | Syntactic Dependency |  Lemma   |
+----------+----------------+----------------------+----------+
|    A     |      DET       |         det          |    a     |
|   man    |      NOUN      |        nsubj         |   man    |
|  stood   |      VERB      |         ROOT         |  stand   |
|   upon   |     SCONJ      |         prep         |   upon   |
|    a     |      DET       |         det          |    a     |
| railroad |      NOUN      |       compound       | railroad |
|  bridge  |      NOUN      |         pobj         |  bridge  |
|    in    |      ADP       |         prep         |    in    |
| northern |      ADJ       |         amod         | northern |
| Alabama  |     PROPN      |         pobj         | Alabama  |
|    ,     |     PUNCT      |        punct         |    ,     |
| looking  |      VERB      |        advcl         |   look   |
|   down   |      ADV       |         pr

## 6. Write a matcher called "Swimming" that finds both occurrences of the phrase "swimming vigorously" in the text

In [9]:
# Import the Matcher library:
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [10]:
# Create a pattern and add it to matcher:
pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True, 'OP':'*'}, {'LOWER': 'vigorously'}]

# spaCy v2 signature (callback, then pattern); spaCy v3 expects matcher.add('Swimming', [pattern])
matcher.add('Swimming', None, pattern)

In [11]:
t = PrettyTable(["Match Id", "String Id", "Start", "End", "Matched Text"])

# Finding the matches in the doc for the specified pattern
found_matches = matcher(doc)

for match_id, start, end in found_matches:
    string_id = nlp.vocab.strings[match_id]  # get string representation
    span = doc[start:end]                    # get the matched span
    t.add_row([match_id, string_id, start, end, span.text])

print(t)

+----------------------+-----------+-------+------+--------------+
|       Match Id       | String Id | Start | End  | Matched Text |
+----------------------+-----------+-------+------+--------------+
| 12881893835109366681 |  Swimming |  1274 | 1277 |   swimming   |
|                      |           |       |      |  vigorously  |
| 12881893835109366681 |  Swimming |  3609 | 3612 |   swimming   |
|                      |           |       |      |  vigorously  |
+----------------------+-----------+-------+------+--------------+
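
The call that registers a pattern differs between spaCy major versions, so the cell above may fail on a newer install. The following is a minimal sketch, not the course's reference solution: it assumes a blank English pipeline (`spacy.blank("en")`, so no model download is needed) and a made-up test sentence, and it picks the `matcher.add` signature from the installed version.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank pipeline is enough here, because LOWER and IS_SPACE
# are lexical attributes and need no trained model.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Same pattern as in the notebook: the optional IS_SPACE token absorbs the
# line break between the two words.
pattern = [{"LOWER": "swimming"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "vigorously"}]

# spaCy v2 takes (key, callback, *patterns); spaCy v3 takes (key, list_of_patterns).
if int(spacy.__version__.split(".")[0]) >= 3:
    matcher.add("Swimming", [pattern])
else:
    matcher.add("Swimming", None, pattern)

# Hypothetical sample text with a newline inside the phrase.
doc = nlp("He was swimming\nvigorously with the current.")
matches = matcher(doc)

for match_id, start, end in matches:
    print(nlp.vocab.strings[match_id], start, end, repr(doc[start:end].text))
```

The line break becomes its own whitespace token, which is why the matched spans in the table above are three tokens long rather than two.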


## 7. Print the text surrounding each found match


In [12]:
for match_id, start, end in found_matches:
    # Clamp the window start so it cannot go negative for a match near the beginning
    span = doc[max(start - 5, 0):end + 5]
    print(span, "\n\n")

evade the bullets and, swimming
vigorously, reach the bank, 


shoulder; he was now swimming
vigorously with the current.   
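
A fixed window of five tokens on each side works for these two matches because both sit well inside the document. For a match within the first five tokens, `start - 5` would be negative, and Python slicing counts negative indices from the end. The sketch below illustrates the clamping on a plain list of strings standing in for the `Doc`; the helper name `context_bounds` and the sample tokens are hypothetical.

```python
def context_bounds(start, end, n_tokens, width=5):
    """Return (lo, hi) slice bounds widened by `width` tokens, clamped to [0, n_tokens]."""
    lo = max(start - width, 0)
    hi = min(end + width, n_tokens)
    return lo, hi


# Hypothetical token list; the match "swimming vigorously" occupies indices 2..4.
tokens = "he was swimming vigorously with the current".split()

lo, hi = context_bounds(2, 4, len(tokens))
print(" ".join(tokens[lo:hi]))

# Without clamping, the slice starts at index -3 and skips the match entirely.
print(tokens[2 - 5:4 + 5])
```

The same bounds can be passed to a `Doc` slice, for example `doc[lo:hi]` with `n_tokens=len(doc)`.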




## 8. Print the *sentence* that contains each found match

In [13]:
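# found_matches[0][1] is the start token of the first match. Sentences are in
# document order, so the first one that ends after that token contains it.
for sent in sents:
    if found_matches[0][1] < sent.end:
        print(sent)
        break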

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  


In [14]:
# Same lookup for the second match
for sent in sents:
    if found_matches[1][1] < sent.end:
        print(sent)
        break

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  
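
The two cells above repeat the same lookup with a different match index. One way to fold them into a single loop is sketched here. It is an illustration under stated assumptions, not the course's solution: `SentStub` is a hypothetical stand-in exposing the `.start`, `.end` and `.text` attributes that spaCy sentence spans also have, and the toy boundaries and matches are invented.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy sentence span; only these attributes are used.
SentStub = namedtuple("SentStub", ["start", "end", "text"])


def sentence_containing(sentences, token_index):
    """Return the first sentence whose token range covers token_index, else None."""
    for sent in sentences:
        if sent.start <= token_index < sent.end:
            return sent
    return None


# Invented sentence boundaries, given as token offsets in document order.
toy_sents = [
    SentStub(0, 4, "first"),
    SentStub(4, 9, "second"),
    SentStub(9, 15, "third"),
]
# Invented matches in the Matcher's (match_id, start, end) shape.
toy_matches = [(1, 5, 7), (1, 10, 12)]

for _, start, _ in toy_matches:
    print(sentence_containing(toy_sents, start).text)
```

With the notebook's objects the loop would read `sentence_containing(sents, start)` over `found_matches`. spaCy spans also expose a `.sent` attribute, so `doc[start:end].sent` is an alternative when sentence boundaries have been set.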
