# NLP Basics Assessment

For this assessment we'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [2]:
# DRIVE
from google.colab import drive
drive.mount('/content/drive')

data_path = 'drive/MyDrive/Colab Notebooks/NLP Course/data'

Mounted at /content/drive


In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

**1. Create a Doc object from the file `owlcreek.txt`**<br>
> HINT: Use `with open('../TextFiles/owlcreek.txt') as f:`

In [5]:
# Enter your code here:
with open(f'{data_path}/owlcreek.txt') as f:
  doc = nlp(f.read())


In [6]:
# Run this cell to verify it worked:

doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

**2. How many tokens are contained in the file?**

In [8]:
len(doc)

4835

4833

**3. How many sentences are contained in the file?**<br>HINT: You'll want to build a list first!

In [14]:
sentences = []
for sentence in doc.sents:
  sentences.append(sentence)
len(sentences)

249

211

**4. Print the second sentence in the document**<br> HINT: Indexing starts at zero, and the title counts as the first sentence.

In [18]:
sentences[2]

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  


** 5. For each token in the sentence above, print its `text`, `POS` tag, `dep` tag and `lemma`<br>
CHALLENGE: Have values line up in columns in the print output.**

In [25]:
def show_info(doc):
    for token in doc:
        print(f'{token.text:12} {token.pos_:9} {token.dep_:<10} {token.lemma_}')
      


show_info(sentences[2])

A            DET       det        a
man          NOUN      nsubj      man
stood        VERB      ROOT       stand
upon         SCONJ     prep       upon
a            DET       det        a
railroad     NOUN      compound   railroad
bridge       NOUN      pobj       bridge
in           ADP       prep       in
northern     ADJ       amod       northern
Alabama      PROPN     pobj       Alabama
,            PUNCT     punct      ,
looking      VERB      advcl      look
down         ADV       prt        down

            SPACE                

into         ADP       prep       into
the          DET       det        the
swift        ADJ       amod       swift
water        NOUN      pobj       water
twenty       NUM       nummod     twenty
feet         NOUN      npadvmod   foot
below        ADV       advmod     below
.            PUNCT     punct      .
             SPACE                 


In [None]:
# NORMAL SOLUTION:



A DET det a
man NOUN nsubj man
stood VERB ROOT stand
upon ADP prep upon
a DET det a
railroad NOUN compound railroad
bridge NOUN pobj bridge
in ADP prep in
northern ADJ amod northern
Alabama PROPN pobj alabama
, PUNCT punct ,
looking VERB advcl look
down PART prt down

 SPACE  

into ADP prep into
the DET det the
swift ADJ amod swift
water NOUN pobj water
twenty NUM nummod twenty
feet NOUN npadvmod foot
below ADV advmod below
. PUNCT punct .
  SPACE   


In [None]:
# CHALLENGE SOLUTION:



A               DET   det        a              
man             NOUN  nsubj      man            
stood           VERB  ROOT       stand          
upon            ADP   prep       upon           
a               DET   det        a              
railroad        NOUN  compound   railroad       
bridge          NOUN  pobj       bridge         
in              ADP   prep       in             
northern        ADJ   amod       northern       
Alabama         PROPN pobj       alabama        
,               PUNCT punct      ,              
looking         VERB  advcl      look           
down            PART  prt        down           

               SPACE            
              
into            ADP   prep       into           
the             DET   det        the            
swift           ADJ   amod       swift          
water           NOUN  pobj       water          
twenty          NUM   nummod     twenty         
feet            NOUN  npadvmod   foot           
below           ADV 

**6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text**<br>
HINT: You should include an `'IS_SPACE': True` pattern between the two words!

In [34]:
def print_matches(doc, matcher, surrounding_tokens=0):
  found_matches = matcher(doc)
  for match_id, start, end in found_matches:
      string_id = nlp.vocab.strings[match_id]  # get string representation
      span = doc[start-surrounding_tokens:end+surrounding_tokens]                    # get the matched span
      print(f"{match_id:<24}{string_id:16}{start:7}{end:7} {span.text}")

In [50]:
# Import the Matcher library:

from spacy.matcher import Matcher


In [51]:
matcher = Matcher(nlp.vocab)
pattern1 = [{'LOWER':'swimming vigorously'}]
matcher.add('SwimmingVig', None, pattern1)
# matcher(doc)
print_matches(doc, matcher)

In [52]:
matcher = Matcher(nlp.vocab)
pattern1 = [{'LOWER':'swimming'},{'IS_SPACE':True},{'LOWER':'vigorously'}]
matcher.add('SwimmingVig', None, pattern1)
print_matches(doc, matcher)

15333998451617592167    SwimmingVig        1274   1277 swimming
vigorously
15333998451617592167    SwimmingVig        3609   3612 swimming
vigorously


In [None]:
# Create a list of matches called "found_matches" and print the list:




[(12881893835109366681, 1274, 1277), (12881893835109366681, 3607, 3610)]


**7. Print the text surrounding each found match**

In [36]:
print_matches(doc, matcher, 5)

15333998451617592167    SwimmingVig        1274   1277 evade the bullets and, swimming
vigorously, reach the bank,
15333998451617592167    SwimmingVig        3609   3612 shoulder; he was now swimming
vigorously with the current.  


By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home


over his shoulder; he was now swimming
vigorously with the current.  


**EXTRA CREDIT:<br>Print the *sentence* that contains each found match**

In [57]:
matcher = Matcher(nlp.vocab)
pattern1 = [{'LOWER':'swimming'},{'IS_SPACE':True},{'LOWER':'vigorously'}]
matcher.add('SwimmingVig', None, pattern1)


In [60]:
def print_sentence_of_matches(doc, matcher):
  matches = matcher(doc)
  for _, start, _ in matches:
    token = doc[start]
    print(token.sent)
    print()

print_sentence_of_matches(doc, matcher)

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  



By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  


The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  


### Great Job!