### 0. Notebook setup

Set up spaCy

In [5]:
import spacy
nlp = spacy.load('en_core_web_sm')

Set path variables

In [6]:
texts_path = './content/UPDATED_NLP_COURSE/TextFiles/'

### 1. Load Owl Creek text

In [7]:
with open(texts_path + 'owlcreek.txt') as f:
    doc = nlp(f.read())
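
The string concatenation above works because texts_path ends with a slash. As a minimal sketch of a more forgiving alternative, pathlib inserts the separator itself. The names texts_dir, owlcreek_path and read_text_file are illustrative and not part of the original notebook, and the UTF-8 encoding is an assumption about the file.

```python
from pathlib import Path

# Same directory as texts_path; a missing trailing slash would not matter here.
texts_dir = Path('./content/UPDATED_NLP_COURSE/TextFiles')
owlcreek_path = texts_dir / 'owlcreek.txt'


def read_text_file(path, encoding='utf-8'):
    """Return the file contents as one string (the encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return f.read()


print(owlcreek_path.as_posix())
```

With this helper, the cell above could be written as doc = nlp(read_text_file(owlcreek_path)).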

In [8]:
# Verify it worked
doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

### 2. Number of tokens in the document

In [9]:
len(doc)

4833

### 3. Number of sentences in the document

In [10]:
# doc.sents is a generator, so it has no len(); materialize it into a list first
list_sent = list(doc.sents)
len(list_sent)

211
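
If only the count is needed, the generator can be consumed without storing every sentence. The sketch below uses a stand-in generator (toy_sentences, a hypothetical name) in place of doc.sents so that it runs without a loaded model.

```python
def count_items(iterable):
    """Count items by consuming the iterable, without storing them."""
    return sum(1 for _ in iterable)


def toy_sentences():
    """Stand-in generator playing the role of doc.sents."""
    yield 'Sentence one.'
    yield 'Sentence two.'
    yield 'Sentence three.'


try:
    len(toy_sentences())
except TypeError as err:
    # Generators do not define __len__
    print(f'len() fails on a generator: {err}')

print(count_items(toy_sentences()))
```

The notebook still needs the list version, because later cells index into it with list_sent[1].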

### 4. Second sentence

In [11]:
list_sent[1]

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

### 5. Print token characteristics

Plain output (no column formatting)

In [12]:
for word in list_sent[1]:
    print(f'{word.text} {word.pos_} {word.dep_} {word.lemma_}')

A DET det a
man NOUN nsubj man
stood VERB ROOT stand
upon ADP prep upon
a DET det a
railroad NOUN compound railroad
bridge NOUN pobj bridge
in ADP prep in
northern ADJ amod northern
Alabama PROPN pobj alabama
, PUNCT punct ,
looking VERB advcl look
down PART prt down

 SPACE  

into ADP prep into
the DET det the
swift ADJ amod swift
water NOUN pobj water
twenty NUM nummod twenty
feet NOUN npadvmod foot
below ADV advmod below
. PUNCT punct .
  SPACE   


"Challenge" version (fixed-width columns)

In [13]:
for word in list_sent[1]:
    print(f'{word.text:15} {word.pos_:5} {word.dep_:10} {word.lemma_:10}')

A               DET   det        a         
man             NOUN  nsubj      man       
stood           VERB  ROOT       stand     
upon            ADP   prep       upon      
a               DET   det        a         
railroad        NOUN  compound   railroad  
bridge          NOUN  pobj       bridge    
in              ADP   prep       in        
northern        ADJ   amod       northern  
Alabama         PROPN pobj       alabama   
,               PUNCT punct      ,         
looking         VERB  advcl      look      
down            PART  prt        down      

               SPACE            
         
into            ADP   prep       into      
the             DET   det        the       
swift           ADJ   amod       swift     
water           NOUN  pobj       water     
twenty          NUM   nummod     twenty    
feet            NOUN  npadvmod   foot      
below           ADV   advmod     below     
.               PUNCT punct      .         
                SPACE           
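
The whitespace tokens break the table because their text and lemma are literal newline or space characters. One possible fix, sketched here on rows copied from the output above rather than on live tokens, is the !r conversion, which formats the repr() of the string so that a newline shows up as an escape sequence. The format_row helper is a hypothetical name.

```python
# Stand-in rows (text, pos, dep, lemma) copied from the output above;
# the whitespace token's text and lemma are a newline character.
rows = [
    ('down', 'PART', 'prt', 'down'),
    ('\n', 'SPACE', '', '\n'),
    ('into', 'ADP', 'prep', 'into'),
]


def format_row(text, pos, dep, lemma):
    """Format one fixed-width row; !r applies repr() to text and lemma."""
    return f'{text!r:15} {pos:5} {dep:10} {lemma!r:10}'


for row in rows:
    print(format_row(*row))
```

The trade-off is that every text and lemma value is now shown in quotes, which may or may not be wanted in the final table.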

### 6. "Swimming" matcher

Import the spaCy Matcher class and create a matcher

In [14]:
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [15]:
# Match 'swimming', then a whitespace token (the line break in the text), then 'vigorously'
swimming_pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True}, {'LOWER': 'vigorously'}]

In [16]:
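# spaCy v2 signature: add(key, on_match, *patterns)
# In spaCy v3 this call becomes: matcher.add('Swimming', [swimming_pattern])
matcher.add('Swimming', None, swimming_pattern)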

In [17]:
found_matches = matcher(doc)
print(found_matches)

[(12881893835109366681, 1274, 1277), (12881893835109366681, 3607, 3610)]
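
Each match is a tuple of (match_id, start, end). The match_id is the hash of the pattern name 'Swimming', and start and end are token indices with end exclusive, so doc[start:end] is the matched span. The sketch below unpacks the tuples printed above without needing the model; describe_matches is a hypothetical helper name.

```python
# Match tuples copied from the output above: (match_id, start, end)
found_matches = [
    (12881893835109366681, 1274, 1277),
    (12881893835109366681, 3607, 3610),
]


def describe_matches(matches):
    """Return (start, end, token_count) for each match; end is exclusive."""
    return [(start, end, end - start) for _match_id, start, end in matches]


for start, end, n_tokens in describe_matches(found_matches):
    print(f'tokens {start}..{end - 1} ({n_tokens} tokens)')
```

Both matches span three tokens: 'swimming', the whitespace token, and 'vigorously'. In the notebook, nlp.vocab.strings[match_id] should map the hash back to the pattern name.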


### 7. Match context

Show an arbitrary window of tokens around each match (not the full sentence)

In [18]:
first_match = found_matches[0]
doc[first_match[1]-9:first_match[2]+13]

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home

In [19]:
second_match = found_matches[1]
doc[second_match[1]-7:second_match[2]+5]

over his shoulder; he was now swimming
vigorously with the current.  
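
The hard-coded offsets above are safe here because both matches sit far from the document edges. For a match near the start, start minus the offset can go negative, and ordinary Python slicing then counts from the end of the sequence. A minimal sketch of a clamped window follows, using a toy token list in place of doc; context_window is a hypothetical helper.

```python
def context_window(tokens, start, end, before, after):
    """Slice tokens[start-before : end+after], clamped to the sequence bounds."""
    lo = max(0, start - before)
    hi = min(len(tokens), end + after)
    return tokens[lo:hi]


# Toy token list standing in for doc
tokens = ['he', 'was', 'now', 'swimming', '\n', 'vigorously',
          'with', 'the', 'current', '.']

# Match at tokens[3:6]; 3 - 7 would be negative without clamping
print(context_window(tokens, 3, 6, before=7, after=2))

# Unclamped slice for comparison: the negative start wraps around
print(tokens[3 - 7:6 + 2])
```

The helper relies only on len() and slicing, so it should also accept a Doc, though that is not verified here.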

Now get the full sentence containing each match

In [20]:
def get_containing_sentence(pos, document):
    # Return the sentence whose token range [start, end) contains pos
    for sent in document.sents:
        if sent.start <= pos < sent.end:
            return sent
    return None  # pos is outside the document

In [21]:
get_containing_sentence(found_matches[0][1], doc)

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  

In [22]:
get_containing_sentence(found_matches[1][1], doc)

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  
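
get_containing_sentence scans the sentences from the beginning on every call, which is fine for two matches. With many matches, one option is to collect the sentence boundaries once and binary-search them. The sketch below assumes the boundaries were gathered as (start, end) token offsets, for example from sent.start and sent.end; the offsets shown are hypothetical, as are the helper names.

```python
from bisect import bisect_right


def build_sentence_index(boundaries):
    """Return the sorted list of sentence start offsets."""
    return [start for start, _end in boundaries]


def find_sentence(pos, boundaries, starts):
    """Return the index of the sentence whose [start, end) contains pos, or None."""
    i = bisect_right(starts, pos) - 1
    if i >= 0 and boundaries[i][0] <= pos < boundaries[i][1]:
        return i
    return None


# Hypothetical (start, end) token offsets for three consecutive sentences
boundaries = [(0, 12), (12, 30), (30, 41)]
starts = build_sentence_index(boundaries)

print(find_sentence(15, boundaries, starts))
```

The returned index can then be used to look up the sentence in a list such as list_sent.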