
# NLP Basics

We'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')

**1. Create a Doc object from the file `owlcreek.txt`**<br>

In [2]:
with open('files/owlcreek.txt') as f:
    doc = nlp(f.read())

In [3]:
doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

**2. How many tokens are contained in the file?**

In [4]:
len(doc)

4835

**3. How many sentences are contained in the file?**<br>

In [5]:
list = []
for sent in doc.sents:
    list.append(sent)
print(len(list))

204


**4. Print the second sentence in the document**<br>

In [6]:
list[1]

The man's hands were behind
his back, the wrists bound with a cord.  


**5. For each token in the sentence above, print its `text`, `POS` tag, `dep` tag, and `lemma`, and ensure that the values are aligned in columns in the print output**

In [7]:
for tok in list[1]:
    print(tok.text, tok.pos_, tok.dep_, tok.lemma_)

The DET det the
man NOUN poss man
's PART case 's
hands NOUN nsubj hand
were AUX ROOT be
behind ADP prep behind

 SPACE dep 

his PRON poss his
back NOUN pobj back
, PUNCT punct ,
the DET det the
wrists NOUN appos wrist
bound VERB acl bind
with ADP prep with
a DET det a
cord NOUN pobj cord
. PUNCT punct .
  SPACE dep  


In [8]:
for tok in list[1]:
    print(f'{tok.text:{15}} {tok.pos_:{5}} {tok.dep_:{10}} {tok.lemma_:{15}}')

The             DET   det        the            
man             NOUN  poss       man            
's              PART  case       's             
hands           NOUN  nsubj      hand           
were            AUX   ROOT       be             
behind          ADP   prep       behind         

               SPACE dep        
              
his             PRON  poss       his            
back            NOUN  pobj       back           
,               PUNCT punct      ,              
the             DET   det        the            
wrists          NOUN  appos      wrist          
bound           VERB  acl        bind           
with            ADP   prep       with           
a               DET   det        a              
cord            NOUN  pobj       cord           
.               PUNCT punct      .              
                SPACE dep                       
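
The whitespace tokens still break the column layout above, because their text is a literal line break. One possible workaround, sketched here in plain Python with a few rows copied by hand from the output (no spaCy needed), is to format the text and lemma with the `!r` conversion so that a newline is shown as a visible escape. The helper name `format_row` is hypothetical and is not part of the original notebook.

```python
# Rows copied from the output above: (text, POS, dep, lemma).
rows = [
    ('behind', 'ADP', 'prep', 'behind'),
    ('\n', 'SPACE', 'dep', '\n'),
    ('his', 'PRON', 'poss', 'his'),
]


def format_row(text, pos, dep, lemma):
    """Pad each field to a fixed width; !r shows a newline as an escape."""
    return f'{text!r:{15}} {pos:{5}} {dep:{10}} {lemma!r:{15}}'


for row in rows:
    print(format_row(*row))
```

With a real `Doc`, the same format string could be applied to `tok.text`, `tok.pos_`, `tok.dep_` and `tok.lemma_`; the trade-off is that every text and lemma is then printed in quotes.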


**6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text**<br>

In [9]:
# Import the Matcher:

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [10]:
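# Create a pattern and add it to the matcher.
# The line break between the two words is tokenized as a whitespace token,
# so the pattern needs an IS_SPACE entry in the middle:
pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True}, {'LOWER': 'vigorously'}]

matcher.add('Swimming', [pattern])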

In [11]:
# Run the matcher; each match is a (match_id, start, end) tuple:

matches = matcher(doc)
print(matches)

[(12881893835109366681, 1274, 1277), (12881893835109366681, 3609, 3612)]
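
Each match spans three tokens (for example 1274 to 1277) because the line break between the two words counts as a token. The toy matcher below illustrates why a pattern without the whitespace entry finds nothing. It is a simplified plain-Python stand-in and not spaCy's implementation: the functions `token_matches` and `find_matches` and the hand-written token list are assumptions made for illustration only.

```python
# Toy stand-in for a token pattern matcher; NOT spaCy's implementation.
def token_matches(token, spec):
    """Check one token string against one single-key pattern dict."""
    if 'LOWER' in spec:
        return token.lower() == spec['LOWER']
    if 'IS_SPACE' in spec:
        return token.isspace() == spec['IS_SPACE']
    return False


def find_matches(tokens, pattern):
    """Return (start, end) pairs where every pattern entry matches in order."""
    n = len(pattern)
    return [
        (i, i + n)
        for i in range(len(tokens) - n + 1)
        if all(token_matches(t, s) for t, s in zip(tokens[i:i + n], pattern))
    ]


# Hand-written tokens mimicking how the line break becomes its own token.
tokens = ['he', 'was', 'now', 'swimming', '\n', 'vigorously',
          'with', 'the', 'current']

with_space = [{'LOWER': 'swimming'}, {'IS_SPACE': True}, {'LOWER': 'vigorously'}]
without_space = [{'LOWER': 'swimming'}, {'LOWER': 'vigorously'}]

print(find_matches(tokens, with_space))     # [(3, 6)]
print(find_matches(tokens, without_space))  # []
```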


**7. Print the text surrounding each found match**

In [12]:
print(doc[1270:1285])

the bullets and, swimming
vigorously, reach the bank, take to the


In [13]:
print(doc[3606:3615])

he was now swimming
vigorously with the current
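
The slice bounds in the two cells above were chosen by hand. As a rough sketch, they could instead be derived from the match tuples. The helper `context_bounds` and its `before` and `after` window sizes are hypothetical choices, not part of the original notebook; the match tuples and the token count are copied from the earlier output.

```python
# Match tuples copied from the matcher output: (match_id, start, end).
matches = [(12881893835109366681, 1274, 1277),
           (12881893835109366681, 3609, 3612)]

doc_len = 4835  # token count reported by len(doc) earlier


def context_bounds(start, end, doc_len, before=4, after=8):
    """Widen a match span by a few tokens, clamped to the document."""
    return max(start - before, 0), min(end + after, doc_len)


for _, start, end in matches:
    lo, hi = context_bounds(start, end, doc_len)
    print(lo, hi)  # with a real Doc, print(doc[lo:hi]) instead
```

With these window sizes the first match yields the bounds 1270 and 1285, the same slice used in the cell above; clamping keeps the slice valid when a match sits near the start or end of the document.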


**Print the *sentence* that contains each found match**

In [14]:
for sent in doc[1270:1285].sents:
    print(sent)

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  


In [15]:
for sent in doc[3606:3615].sents:
    print(sent)

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  
