# NLP Basics Assessment

For this assessment we'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [11]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

**1. Create a Doc object from the file `owlcreek.txt`**

In [14]:
# Enter your code here:
with open('owlcreek.txt') as f:
    doc = nlp(f.read())

In [15]:
# Run this cell to verify it worked:
doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

**2. How many tokens are contained in the file?**

In [16]:
len(doc)

4835

**3. How many sentences are contained in the file?**<br>HINT: You'll want to build a list first!

In [17]:
# doc.sents is an iterator, so materialize it as a list before counting
sents = list(doc.sents)
len(sents)

204
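
The hint to build a list matters because `doc.sents` is consumed lazily and has no length of its own. The sketch below shows the same situation in plain Python; `fake_sents` is a hypothetical stand-in for `doc.sents`, on the assumption that it behaves like an ordinary generator.

```python
def fake_sents():
    """Hypothetical stand-in for doc.sents: yields items lazily."""
    yield "AN OCCURRENCE AT OWL CREEK BRIDGE"
    yield "by Ambrose Bierce"
    yield "A man stood upon a railroad bridge in northern Alabama."

gen = fake_sents()
try:
    len(gen)  # generators do not support len()
    has_len = True
except TypeError:
    has_len = False

# Materializing the generator makes counting and indexing possible
sent_list = list(fake_sents())
print(has_len, len(sent_list), sent_list[1])
```

Once the items are in a list, both `len()` and zero-based indexing work as expected.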

**4. Print the second sentence in the document**<br> HINT: Indexing starts at zero, and the title counts as the first sentence.

In [18]:
# Note: index 2 selects the third item in the list; per the hint, the second sentence is sents[1].
sents[2]

A rope closely encircled his
neck.  

**5. For each token in the sentence above, print its `text`, `POS` tag, `dep` tag and `lemma`**<br>
CHALLENGE: Have values line up in columns in the print output.

In [19]:
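# NORMAL SOLUTION:
# Note: pos, dep and lemma return integer IDs; the underscore variants
# (pos_, dep_, lemma_) return the readable strings.
for token in sents[2]:
    print(token.text, token.pos, token.dep, token.lemma)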

A 90 415 11901859001352538922
rope 92 429 1669970585553776194
closely 86 400 9696970313201087903
encircled 100 8206900633647566924 11949761049292768571
his 95 440 2661093235354845946

 103 414 962983613142996970
neck 92 416 8732108505081431184
. 97 445 12646065887601541794
  103 414 8532415787641010193


In [20]:
# CHALLENGE SOLUTION:
# Fixed column widths; '<' left-aligns, integers right-align by default
for token in sents[2]:
    print(f'{token.text:<10} {token.pos:5} {token.dep:<20} {token.lemma:<22} {token.lemma_:10}')

A             90 415                  11901859001352538922   a         
rope          92 429                  1669970585553776194    rope      
closely       86 400                  9696970313201087903    closely   
encircled    100 8206900633647566924  11949761049292768571   encircle  
his           95 440                  2661093235354845946    his       

            103 414                  962983613142996970     
         
neck          92 416                  8732108505081431184    neck      
.             97 445                  12646065887601541794   .         
             103 414                  8532415787641010193              
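
The column alignment in the challenge solution comes from Python's format specification rather than from spaCy. Here is a minimal sketch of the same technique on plain tuples; the sample rows are copied from the output above, and the variable names are illustrative only.

```python
# Sample rows copied from the output above: (text, POS id, lemma string)
rows = [("A", 90, "a"), ("rope", 92, "rope"), ("encircled", 100, "encircle")]

lines = []
for text, pos, lemma in rows:
    # '<10' left-aligns in 10 columns; '5' right-aligns integers by default
    lines.append(f"{text:<10} {pos:5} {lemma:<10}")

print("\n".join(lines))
```

Every line has the same total width, so the columns line up whatever the token length, as long as no value is wider than its column.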


**6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text**<br>
HINT: You should include an `'IS_SPACE': True` pattern between the two words!

In [25]:
# Import the Matcher library:
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
matcher

<spacy.matcher.matcher.Matcher at 0x130abc6add0>

In [35]:
# Create a pattern and add it to the Matcher
pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True}, {'LOWER': 'vigorously'}]
# Note: the pattern is registered as 'SwimmingVigorously', not 'Swimming' as the prompt asks.
matcher.add('SwimmingVigorously', [pattern])

In [36]:
# Create a list of matches called "found_matches" and print the list:
found_matches = matcher(doc)
print(found_matches)

[(13245044497498710760, 1274, 1277), (13245044497498710760, 3609, 3612)]
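
Each match is a `(match_id, start, end)` tuple, where `start` and `end` are token indices and `end` is exclusive. The sketch below unpacks the two tuples copied from the output above. Each match is three tokens long, which fits the pattern: the two words plus the whitespace token between them. In spaCy, the pattern name can be recovered from the ID with `nlp.vocab.strings[match_id]`.

```python
# Match tuples copied from the output above: (match_id, start, end)
found_matches = [(13245044497498710760, 1274, 1277),
                 (13245044497498710760, 3609, 3612)]

for match_id, start, end in found_matches:
    # end is exclusive, so the span covers tokens start .. end-1
    n_tokens = end - start
    print(f"tokens {start}-{end - 1} ({n_tokens} tokens)")

span_lengths = [end - start for _, start, end in found_matches]
```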


**7. Print the text surrounding each found match**

In [37]:
match_id, start_match1, end_match1 = found_matches[0]
span = doc[start_match1-9:end_match1+13]
print(span.text)

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home
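
Hand-tuned offsets such as `-9` and `+13` work here because both matches sit well inside the document. For a match near the start of a plain Python sequence, a negative start index would wrap around and give the wrong window. The sketch below clamps the window to the sequence bounds. `context_window` is a hypothetical helper and the token list is a toy example, not the real `doc`.

```python
def context_window(tokens, start, end, before, after):
    """Return tokens around [start, end), clamped to the sequence bounds."""
    lo = max(0, start - before)          # avoid negative indices wrapping around
    hi = min(len(tokens), end + after)   # avoid running past the end
    return tokens[lo:hi]

tokens = "he was now swimming vigorously with the current .".split()

# Hypothetical match covering 'swimming vigorously' at indices 3..4
window = context_window(tokens, 3, 5, before=7, after=2)
print(" ".join(window))

# Without clamping, the negative start index wraps around
naive = tokens[3 - 7:5 + 2]
print(" ".join(naive))
```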


In [39]:
match_id, start_match2, end_match2 = found_matches[1]
span = doc[start_match2-7:end_match2+5]
print(span.text)

over his shoulder; he was now swimming
vigorously with the current.


**EXTRA CREDIT:<br>Print the *sentence* that contains each found match**

In [40]:
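# sents is in document order, so the first sentence that ends
# after the match start is the one containing the match
for sent in sents:
    if start_match1 < sent.end:
        print(sent)
        break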

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  


In [41]:
for sent in sents:
    if start_match2 < sent.end:
        print(sent)
        break

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  
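
The loops above scan the sentences in order and stop at the first one whose end lies past the match start. The same lookup can be done as a binary search over sentence start offsets, sketched below. The boundary values are hypothetical and `containing_sentence` is an illustrative helper, not part of spaCy. spaCy spans also expose a `.sent` attribute, which may be simpler in practice.

```python
from bisect import bisect_right

# Hypothetical (start, end) token boundaries for consecutive sentences
boundaries = [(0, 8), (8, 20), (20, 31), (31, 45)]
starts = [s for s, _ in boundaries]

def containing_sentence(token_index):
    """Return the position of the sentence whose [start, end) holds token_index."""
    i = bisect_right(starts, token_index) - 1
    start, end = boundaries[i]
    if not (start <= token_index < end):
        raise ValueError(f"token {token_index} is outside every sentence")
    return i

print(containing_sentence(22))
```

The explicit range check guards against indices that fall before the first sentence or after the last one.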


### Great Job!