# NLP Basics Assessment

For this assessment we'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

**1. Create a Doc object from the file `owlcreek.txt`**<br>
> HINT: Use `with open('../TextFiles/owlcreek.txt') as f:`

In [2]:
# Enter your code here:

with open('../TextFiles/owlcreek.txt', encoding='utf-8') as f:
    doc = nlp(f.read())

In [3]:
# Run this cell to verify it worked:

doc[:36]

An Occurrence at Owl Creek Bridge.



I


A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below. The man’s hands
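
The story contains typographic characters such as the curly apostrophe in "man’s", so it can help to state the file encoding explicitly when reading it. A minimal sketch, assuming a UTF-8 text file: the temporary file and the blank English pipeline (`spacy.blank('en')`, tokenizer only) are stand-ins so the snippet runs without the course files or a trained model.

```python
import os
import tempfile

import spacy

# Assumption: a blank English pipeline (tokenizer only) stands in for en_core_web_sm.
nlp = spacy.blank('en')

# Hypothetical stand-in for owlcreek.txt, written to a temporary directory.
sample = "An Occurrence at Owl Creek Bridge.\n\nThe man’s hands were behind his back."
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write(sample)

# Read the whole file as one string and pass it to the pipeline.
with open(path, encoding='utf-8') as f:
    sample_doc = nlp(f.read())

print(sample_doc[:7])
```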

**2. How many tokens are contained in the file?**

In [4]:
# Enter your code here

len(doc)

8343
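
Note that `len(doc)` counts every token, including punctuation and the whitespace tokens produced by line breaks. The sketch below separates these on a short made-up string; it assumes a blank English pipeline, which tokenizes without needing a trained model.

```python
import spacy

nlp = spacy.blank('en')  # assumption: tokenizer-only pipeline

sample_doc = nlp("A man stood upon a bridge,\nlooking down.")

total = len(sample_doc)
# Tokens that are neither punctuation nor whitespace.
words = [t for t in sample_doc if not (t.is_punct or t.is_space)]
# Whitespace tokens, such as the line break in the sample string.
spaces = [t for t in sample_doc if t.is_space]

print(total, len(words), len(spaces))
```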

**3. How many sentences are contained in the file?**<br>HINT: You'll want to build a list first!

In [5]:
# Enter your code here

sentences = list(doc.sents)
len(sentences)

313
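
The hint about building a list matters because `doc.sents` yields sentences lazily, so it cannot be measured or indexed until it is collected. A small sketch of this, assuming a blank pipeline with spaCy's rule-based `sentencizer` in place of the trained model's parser (the two may split real text differently):

```python
import spacy

nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')  # assumption: rule-based splitter instead of the parser

sample_doc = nlp("A man stood upon a bridge. His hands were bound. He waited.")

# Calling len() on doc.sents directly is expected to fail.
try:
    len(sample_doc.sents)
except TypeError:
    print("doc.sents has no len()")

# Collect the sentences once, then count and index them freely.
sample_sents = list(sample_doc.sents)
print(len(sample_sents))
print(sample_sents[1])
```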

**4. Print the second sentence in the document**<br> HINT: Indexing starts at zero, and the title counts as the first sentence.

In [6]:
# Enter your code here

# Note: index 2 selects the third item in the list; the second item is sentences[1].
s2 = sentences[2]
print(s2)

The man’s hands were behind his
back, the wrists bound with a cord.


**5. For each token in the sentence above, print its `text`, `POS` tag, `dep` tag and `lemma`<br>
CHALLENGE: Have values line up in columns in the print output.**

In [7]:
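# Enter your code here
# Note: .pos, .dep and .lemma are integer IDs; the underscore forms
# (.pos_, .dep_, .lemma_) return the readable strings.
for token in s2:
    print(f'{token.text:{10}} {token.pos:<{6}} {token.dep:<{22}} {token.lemma:<{24}} ')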

The        90     415                    7425985699627899538      
man        92     440                    3104811030673030468      
’s         94     8110129090154140942    614914527630368944       
hands      92     429                    10690717480206833971     
were       87     8206900633647566924    10382539506755952630     
behind     85     443                    9368086581607646285      
his        95     440                    2661093235354845946      

          103    414                    962983613142996970       
back       92     439                    15255859468896132977     
,          97     445                    2593208677638477497      
the        90     415                    7425985699627899538      
wrists     92     403                    40049004327531306        
bound      100    451                    16578919470474021089     
with       85     443                    12510949447758279278     
a          90     415                    11901859001352538922 
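
The integers printed above are the IDs under which spaCy stores each label. The string attributes `token.pos_`, `token.dep_` and `token.lemma_` return the readable values directly. As a rough sketch, the small IDs can also be decoded through the vocabulary's string store. The blank pipeline here is an assumption made so the snippet runs without the trained model, which means the long lemma hashes from the output cannot be resolved with it.

```python
import spacy

nlp = spacy.blank('en')  # assumption: only the vocabulary's string store is used

# A few small IDs copied from the output above.
pos_ids = [90, 92, 85, 97]
dep_ids = [415, 429, 443, 445]

for pos_id, dep_id in zip(pos_ids, dep_ids):
    pos_name = nlp.vocab.strings[pos_id]
    dep_name = nlp.vocab.strings[dep_id]
    print(f'{pos_id:<6}{pos_name:<8}{dep_id:<6}{dep_name}')
```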

**6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text**<br>
HINT: You should include an `'IS_SPACE': True` pattern between the two words!

In [8]:
# Import the Matcher library:
# Enter your code here

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [9]:
# Create a pattern and add it to matcher:

pattern1 = [{'LOWER': 'swimming'}, {'IS_SPACE': True}, {'LOWER': 'vigorously'}]

matcher.add('Swimming', [pattern1], on_match=None)

In [10]:
# Create a list of matches called "found_matches" and print the list:

found_matches = matcher(doc)
print(found_matches)  

[(12881893835109366681, 1232, 1235), (12881893835109366681, 3490, 3493)]
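
Each match is a tuple of `(match_id, start, end)`, where `match_id` is the hash of the rule name and `start`/`end` are token indices. The sketch below unpacks these on a made-up string. It assumes a blank English pipeline, and it marks the whitespace token as optional with `'OP': '?'`, a variation on the pattern above so that the phrase is also found when no line break separates the two words.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank('en')  # assumption: tokenizer-only pipeline
matcher = Matcher(nlp.vocab)

# 'OP': '?' makes the whitespace token optional.
pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True, 'OP': '?'}, {'LOWER': 'vigorously'}]
matcher.add('Swimming', [pattern])

sample_doc = nlp("He was now swimming\nvigorously. Swimming vigorously, he escaped.")

for match_id, start, end in matcher(sample_doc):
    # Look up the rule name from its hash, then slice the matched span.
    name = nlp.vocab.strings[match_id]
    print(name, start, end, repr(sample_doc[start:end].text))
```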


**7. Print the text surrounding each found match**

In [11]:
# Enter your code here

doc[found_matches[0][1] - 9:found_matches[0][2] + 14]

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.

In [12]:
# Enter your code here

doc[found_matches[1][1] - 7:found_matches[1][2] + 4] 

#Hint
# over his shoulder; he was now swimming
# vigorously with the current.

over his shoulder; he was now swimming
vigorously with the current.
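
The hand-picked offsets above work for these two matches, but a fixed window can run past either end of the document. One possible approach is a small helper that clamps the window. `context` is a hypothetical name, and the blank pipeline and sample string are assumptions for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank('en')  # assumption: tokenizer-only pipeline
matcher = Matcher(nlp.vocab)
matcher.add('Swimming', [[{'LOWER': 'swimming'}, {'IS_SPACE': True}, {'LOWER': 'vigorously'}]])

sample_doc = nlp("He was now swimming\nvigorously with the current.")


def context(doc, start, end, before=5, after=5):
    """Return the match plus a few tokens on each side (hypothetical helper)."""
    # Clamp both edges so the window never leaves the document;
    # a negative start would otherwise count from the end of the doc.
    return doc[max(start - before, 0):min(end + after, len(doc))]


for _, start, end in matcher(sample_doc):
    print(context(sample_doc, start, end))
```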

**EXTRA CREDIT:<br>Print the *sentence* that contains each found match**

In [13]:
# Enter your code here

for sent in sentences:
    # The matcher returns an empty list when the sentence does not contain the pattern,
    # so print only the sentences with at least one match.
    if matcher(sent):
        print(sent, "\n")

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home. 

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current. 
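
An alternative to running the matcher on every sentence is to match once on the whole document and ask each matched span for its enclosing sentence through `Span.sent`. A minimal sketch, assuming a blank pipeline with the rule-based `sentencizer` supplying the sentence boundaries (the trained model would use its parser instead):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')  # assumption: rule-based sentence boundaries

matcher = Matcher(nlp.vocab)
matcher.add('Swimming', [[{'LOWER': 'swimming'}, {'IS_SPACE': True}, {'LOWER': 'vigorously'}]])

sample_doc = nlp("The soldiers fired. He was now swimming\nvigorously with the current. He got away.")

for _, start, end in matcher(sample_doc):
    # Span.sent returns the sentence that contains the matched span.
    print(sample_doc[start:end].sent)
```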



### Great Job!