# NLP Basics Assessment

For this assessment we'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [1]:
import spacy

nlp = spacy.load('en_core_web_sm')

**1. Create a Doc object from the file `owlcreek.txt`**<br>
> HINT: Use `with open('../TextFiles/owlcreek.txt') as f:`

In [347]:

file_path = r'C:\Users\iamke\OneDrive\Desktop\NLP\Owlcreek.txt'

# Open the file and build a Doc object from its contents
try:
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
        doc = nlp(text)
        # print(text)
except FileNotFoundError:
    print(f"Could not find file at path: {file_path}")
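
One caveat with the cell above: if the file is missing, the `except` branch only prints a message, so `doc` is never created and later cells fail with a `NameError`. A minimal sketch of a stricter loader follows. `load_text` is a hypothetical helper (not part of spaCy), and the path in the usage comment is the one from the hint.

```python
from pathlib import Path


def load_text(path):
    """Return the file's contents, or raise FileNotFoundError with the resolved path."""
    file_path = Path(path)
    if not file_path.is_file():
        # Fail loudly here instead of letting later cells hit an undefined `doc`
        raise FileNotFoundError(f"Could not find file at path: {file_path.resolve()}")
    return file_path.read_text(encoding="utf-8")


# Usage in the notebook (path taken from the hint; adjust to your layout):
# doc = nlp(load_text("../TextFiles/owlcreek.txt"))
```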


In [348]:
(doc[:36])

An Occurrence at Owl Creek Bridge

by Ambrose Bierce

THE MILLENNIUM FULCRUM EDITION, 1988




I


A man stood upon a railroad bridge in northern Alabama, looking down
into the

**2. How many tokens are contained in the file?**

In [349]:
print(len(doc))

4682


**3. How many sentences are contained in the file?**<br>HINT: You'll want to build a list first!

In [350]:
sentences = list(doc.sents)
print(len(sentences))

204


**4. Print the second sentence in the document**<br> HINT: Indexing starts at zero, and the title counts as the first sentence.

In [351]:
print(sentences[1])

I


A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.


**5. For each token in the sentence above, print its `text`, `POS` tag, `dep` tag and `lemma`**<br>
CHALLENGE: Have values line up in columns in the print output.

In [352]:
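if len(sentences) > 2:
    # NOTE: index 2 selects the third sentence; the sentence printed in step 4 is sentences[1]
    second_sentence = sentences[2]
    # Re-parse the sentence on its own, under a new name so the full-story `doc` is not overwritten
    sentence_doc = nlp(second_sentence.text)

    for token in sentence_doc:
        print(f"{token.text} {token.pos_} {token.dep_} {token.lemma_}")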

The DET det the
man NOUN poss man
’s PART case ’s
hands NOUN nsubj hand
were AUX ROOT be
behind ADP prep behind
his PRON poss his

 SPACE dep 

back NOUN pobj back
, PUNCT punct ,
the DET det the
wrists NOUN npadvmod wrist
bound VERB acl bind
with ADP prep with
a DET det a
cord NOUN pobj cord
. PUNCT punct .


In [353]:
for token in second_sentence:
    print(f"{token.text:<15} {token.pos_:<10} {token.dep_:<10} {token.lemma_:<15}")

The             DET        det        the            
man             NOUN       poss       man            
’s              PART       case       ’s             
hands           NOUN       nsubj      hand           
were            AUX        ROOT       be             
behind          ADP        prep       behind         
his             PRON       poss       his            

               SPACE      dep        
              
back            NOUN       pobj       back           
,               PUNCT      punct      ,              
the             DET        det        the            
wrists          NOUN       appos      wrist          
bound           VERB       acl        bind           
with            ADP        prep       with           
a               DET        det        a              
cord            NOUN       pobj       cord           
.               PUNCT      punct      .              
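
The gap in the table above comes from the newline token, which is printed literally and splits its row across lines. One possible refinement, sketched in plain Python, is to show whitespace as visible escapes and size every column to its widest entry. `format_table` is a hypothetical helper, and `rows` is sample data hand-copied from the output above.

```python
def format_table(rows):
    """Return the rows as text with left-aligned, evenly padded columns."""
    # Show newlines and tabs as visible escapes so each token stays on one line
    cleaned = [
        [value.replace("\n", "\\n").replace("\t", "\\t") for value in row]
        for row in rows
    ]
    # Size each column to its widest entry
    widths = [max(len(value) for value in column) for column in zip(*cleaned)]
    lines = []
    for row in cleaned:
        padded = [f"{value:<{width}}" for value, width in zip(row, widths)]
        lines.append("  ".join(padded).rstrip())
    return "\n".join(lines)


# Sample rows copied from the output above: (text, POS, dep, lemma)
rows = [
    ("his", "PRON", "poss", "his"),
    ("\n", "SPACE", "dep", "\n"),
    ("back", "NOUN", "pobj", "back"),
]
print(format_table(rows))
```

In the notebook, the rows could be built with `[(t.text, t.pos_, t.dep_, t.lemma_) for t in second_sentence]`.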


**6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text**<br>
HINT: You should include an `'IS_SPACE': True` pattern between the two words!

In [354]:
# Import the Matcher library:

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [365]:
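# Optional whitespace token(s) go between the two words, as the hint suggests,
# because the line break inside the phrase is its own token.
pattern = [
    {"LOWER": "swimming"},
    {"IS_SPACE": True, "OP": "*"},
    {"LOWER": "vigorously"},
]

matcher.add("Swimming", [pattern])
matches = matcher(doc)
found_matches = []

for match_id, start, end in matches:
    matched_span = doc[start:end]
    found_matches.append(matched_span.text)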

In [366]:
found_matches = [doc[start:end].text for match_id, start, end in matches]
print(found_matches)

['swimming\nvigorously', 'swimming\nvigorously']
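
The `\n` inside each match shows why the hint matters: spaCy keeps a line break as its own whitespace token, so "swimming" and "vigorously" are not adjacent tokens when the phrase wraps across lines. A small self-contained sketch of this, assuming spaCy v3 and using `spacy.blank("en")` (tokenizer only, no model download) on a made-up sample string rather than the story text:

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, which is all this pattern needs
nlp_blank = spacy.blank("en")
matcher_demo = Matcher(nlp_blank.vocab)

# Optional whitespace token(s) sit BETWEEN the two words
demo_pattern = [
    {"LOWER": "swimming"},
    {"IS_SPACE": True, "OP": "*"},
    {"LOWER": "vigorously"},
]
matcher_demo.add("Swimming", [demo_pattern])

# Made-up sample: one occurrence split by a newline, one on a single line
sample_doc = nlp_blank("He was swimming\nvigorously. Later he kept swimming vigorously.")
found = [sample_doc[start:end].text for _, start, end in matcher_demo(sample_doc)]
print(found)
```

The names `nlp_blank`, `matcher_demo`, `demo_pattern` and `sample_doc` are chosen only so the sketch does not overwrite the notebook's own `nlp`, `matcher`, `pattern` and `doc`.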


**7. Print the text surrounding each found match**

In [367]:
context_size = 30
for match_id, start, end in matches:
    start_context = max(0, start - context_size)
    end_context = min(len(doc), end + context_size)
    context = doc[start_context:end_context]
    print(f"Context around match '{doc[start:end]}':")
    print(context.text)
    print("\n" + "="*50 + "\n")

Context around match 'swimming
vigorously':
my hands,” he thought, “I might throw off the noose and spring
into the stream. By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home. My
home, thank God, is as yet outside their lines; my wife


Context around match 'swimming
vigorously':
thrust into their sockets. The two
sentinels fired again, independently and ineffectually.

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current. His brain was as energetic as his arms and
legs; he thought with the rapidity of lightning:

“The officer,”




In [368]:
for match_id, start, end in matches:
    start_char = max(0, doc[start].idx - context_size)
    end_char = min(len(doc.text), doc[end - 1].idx + len(doc[end - 1].text) + context_size)
    context = doc.text[start_char:end_char]
    match_text = doc[start:end].text
    print(f"Context around match '{match_text}':")
    print(context.replace(match_text, f"[{match_text}]"))
    print("\n" + "="*50 + "\n")

Context around match 'swimming
vigorously':
 could evade the bullets and, [swimming
vigorously], reach the bank, take to the 


Context around match 'swimming
vigorously':
over his shoulder; he was now [swimming
vigorously] with the current. His brain w
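
One caveat with `context.replace(...)` in the cell above: it brackets every occurrence of the matched text inside the window, not only the match itself. A hedged alternative is to assemble the window from three slices. `char_window` is a hypothetical helper that works on plain character offsets, and the sample string is made up for illustration.

```python
def char_window(text, start_char, end_char, size=30):
    """Return text[start_char:end_char] in brackets with up to `size` characters of context per side."""
    left = text[max(0, start_char - size):start_char]
    right = text[end_char:end_char + size]
    return f"{left}[{text[start_char:end_char]}]{right}"


# Made-up sample with the phrase repeated; only the second occurrence is marked
sample = "swimming vigorously, then swimming vigorously again"
phrase = "swimming vigorously"
start = sample.rfind(phrase)
marked = char_window(sample, start, start + len(phrase), size=30)
print(marked)
```

In the notebook, the offsets could come from the matched span, for example `span = doc[start:end]` followed by `char_window(doc.text, span.start_char, span.end_char)`.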


