
# NLP Basics Assessment

For this assessment we'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [0]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

**1. Create a Doc object from the file `owlcreek.txt`**<br>
> HINT: Use `with open('../TextFiles/owlcreek.txt') as f:`

In [0]:
# Enter your code here:
with open('/content/owlcreek.txt') as f:
  doc = nlp(f.read())

In [7]:
# Run this cell to verify it worked:

doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

**2. How many tokens are contained in the file?**

In [8]:
# A Doc behaves like a sequence of Token objects, so len(doc) is the token count
# (punctuation and whitespace tokens are included).
print(len(doc))

4833


**3. How many sentences are contained in the file?**<br>HINT: You'll want to build a list first!

In [10]:
sentences = list(doc.sents)
print(len(sentences))

222


**4. Print the second sentence in the document**<br> HINT: Indexing starts at zero, and the title counts as the first sentence.

In [12]:
print(sentences[1].text)

I




In [13]:
print(sentences[2])

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  


In [14]:
for sentence in sentences:
  print(sentence,"\n")

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

 

I

 

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.   

The man's hands were behind
his back, the wrists bound with a cord.   

A rope closely encircled his
neck.   

It was attached to a stout cross-timber above his head and the
slack fell to the level of his knees.   

Some loose boards laid upon the
ties supporting the rails of the railway supplied a footing for him
and his executioners--two private soldiers of the Federal army,
directed by a sergeant who in civil life may have been a deputy
sheriff.   

At a short remove upon the same temporary platform was an
officer in the uniform of his rank, armed.   

He was a captain.   

A
sentinel at each end of the bridge stood with his rifle in the
position known as "support," that is to say, vertical in front of the
left shoulder, the hammer resting on the forearm thrown straight
across the chest--a formal and unnatural

**5. For each token in the sentence above, print its `text`, `POS` tag, `dep` tag and `lemma`**<br>
CHALLENGE: Have values line up in columns in the print output.

In [24]:
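# NORMAL SOLUTION:
for token in sentences[2]:
  # Debug check: attributes without a trailing underscore (token.pos, token.lemma)
  # are integer IDs; the underscore forms (token.pos_, token.lemma_) are strings.
  print(type(token.lemma))
  # token.pos prints the integer ID of the POS tag; token.pos_ gives the readable tag.
  print(token,token.text,token.pos,token.dep_,token.lemma_)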

<class 'int'>
A A 90 det a
<class 'int'>
man man 92 nsubj man
<class 'int'>
stood stood 100 ROOT stand
<class 'int'>
upon upon 85 prep upon
<class 'int'>
a a 90 det a
<class 'int'>
railroad railroad 92 compound railroad
<class 'int'>
bridge bridge 92 pobj bridge
<class 'int'>
in in 85 prep in
<class 'int'>
northern northern 84 amod northern
<class 'int'>
Alabama Alabama 96 pobj Alabama
<class 'int'>
, , 97 punct ,
<class 'int'>
looking looking 100 advcl look
<class 'int'>
down down 86 advmod down
<class 'int'>

 
 103  

<class 'int'>
into into 85 prep into
<class 'int'>
the the 90 det the
<class 'int'>
swift swift 84 amod swift
<class 'int'>
water water 92 pobj water
<class 'int'>
twenty twenty 93 nummod twenty
<class 'int'>
feet feet 92 npadvmod foot
<class 'int'>
below below 86 advmod below
<class 'int'>
. . 97 punct .
<class 'int'>
    103   


In [37]:
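# CHALLENGE SOLUTION:
# Each field is padded to a fixed width. token.pos is the integer tag ID, which
# right-aligns by default; token.pos_ would give the readable, left-aligned tag.
for token in sentences[2]:
  print(f"{token.text:{20}} {token.pos:{5}} {token.dep_:{20}} {token.lemma_:{20}}")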


A                       90 det                  a                   
man                     92 nsubj                man                 
stood                  100 ROOT                 stand               
upon                    85 prep                 upon                
a                       90 det                  a                   
railroad                92 compound             railroad            
bridge                  92 pobj                 bridge              
in                      85 prep                 in                  
northern                84 amod                 northern            
Alabama                 96 pobj                 Alabama             
,                       97 punct                ,                   
looking                100 advcl                look                
down                    86 advmod               down                

                      103                      
                   
into                    85 prep   
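The column layout above relies on f-string width specifiers. The sketch below shows the same idea on plain tuples so it runs without a spaCy model; the sample rows, the chosen widths, and the helper name `format_row` are illustrative assumptions, and the POS strings stand in for what `token.pos_` would return.

```python
# Hypothetical sample rows: (text, POS tag, dependency, lemma).
rows = [
    ("A", "DET", "det", "a"),
    ("man", "NOUN", "nsubj", "man"),
    ("stood", "VERB", "ROOT", "stand"),
]


def format_row(text, pos, dep, lemma):
    """Pad each field to a fixed width so the columns line up."""
    # '<' left-aligns explicitly. Strings left-align by default, while
    # integers right-align, which is why the numeric POS IDs in the
    # output above sit flush right within their column.
    return f"{text:<12}{pos:<8}{dep:<10}{lemma}"


for row in rows:
    print(format_row(*row))
```

Choosing a width slightly larger than the longest expected value keeps the columns aligned without truncating anything.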

**6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text**<br>
HINT: You should include an `'IS_SPACE': True` pattern between the two words!

In [0]:
# Import the Matcher library:

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [0]:
# Create a pattern and add it to matcher:

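# The line break between the two words is its own whitespace token, hence the optional IS_SPACE step.
pattern = [{'LOWER':'swimming'},{'IS_SPACE':True,'OP':'*'},{'LOWER':'vigorously'}]

# spaCy v2 signature: add(key, on_match, *patterns). The rule is registered under
# the key 'vigorously', so the match IDs printed below are the hash of that string.
matcher.add('vigorously',None,pattern)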



In [54]:
# Create a list of matches called "found_matches" and print the list:
# [(12881893835109366681, 1274, 1277), (12881893835109366681, 3607, 3610)]

found_matches = matcher(doc)
print(found_matches)


[(11766727115402679900, 1274, 1277), (11766727115402679900, 3607, 3610)]
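Each match is a `(match_id, start, end)` tuple of token offsets, with `end` exclusive. As a small sketch, the tuples printed above can be unpacked directly; the variable names `found` and `span_lengths` are illustrative only.

```python
# Match tuples copied from the output above: (match_id, start, end).
found = [
    (11766727115402679900, 1274, 1277),
    (11766727115402679900, 3607, 3610),
]

for match_id, start, end in found:
    # `end` is exclusive, so the span covers tokens start .. end - 1.
    print(match_id, start, end, end - start)

# Number of tokens covered by each match.
span_lengths = [end - start for _, start, end in found]
```

Both spans are three tokens long, which is consistent with the hint: "swimming", one whitespace token for the line break, and "vigorously".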


**7. Print the text surrounding each found match**

In [59]:
matches = matcher(doc)
#print(matches)
for match in matches:
  start = match[1]
  end = match[2]
  print(doc[start-10:end+10],"\n")

 By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and 

saw all this over his shoulder; he was now swimming
vigorously with the current.  His brain was as energetic 
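The window `doc[start-10:end+10]` works here because both matches sit far from the start of the document. A match within the first ten tokens would make `start-10` negative, and under ordinary Python slice semantics a negative start counts from the end of the sequence. The sketch below clamps the edges on a plain list; the helper name `context_window` and the sample tokens are assumptions for illustration, and it assumes `Doc` slicing behaves like list slicing in this respect.

```python
def context_window(tokens, start, end, width=10):
    """Return up to `width` items on either side of tokens[start:end]."""
    # Clamp the left edge: a negative slice start would count from the
    # end of the sequence and return the wrong window.
    left = max(start - width, 0)
    # Slicing past the end is already safe; clamping just makes it explicit.
    right = min(end + width, len(tokens))
    return tokens[left:right]


tokens = "he was now swimming vigorously with the current".split()
print(context_window(tokens, 3, 5, width=2))
print(context_window(tokens, 3, 5, width=10))
```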





**EXTRA CREDIT:<br>Print the *sentence* that contains each found match**

In [63]:
for sentence in sentences:
  if matches[0][1] < sentence.end:
      print(sentence)
      break

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  


In [64]:
for sentence in sentences:
  if matches[1][1] < sentence.end:
      print(sentence)
      break

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  
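The test `match_start < sentence.end` picks the first sentence whose end lies beyond the match start. Because sentence ends are sorted, the same lookup can be done with a binary search instead of a linear scan. This is a minimal sketch on plain integers: the boundaries in `sentence_bounds` are made-up values standing in for each sentence's `start` and `end` offsets, and `sentence_index` is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical sentence boundaries as (start, end) token offsets, end exclusive.
sentence_bounds = [(0, 8), (8, 20), (20, 31), (31, 45)]
sentence_ends = [end for _, end in sentence_bounds]


def sentence_index(token_index):
    """Return the index of the sentence whose [start, end) range holds token_index."""
    # bisect_right finds the first sentence whose end is strictly greater
    # than token_index, the same condition as `token_index < sentence.end`.
    i = bisect_right(sentence_ends, token_index)
    if i == len(sentence_bounds):
        raise ValueError("token index lies past the last sentence")
    return i


print(sentence_index(8))   # first token of the second sentence
print(sentence_index(30))  # last token of the third sentence
```

For a document of a few hundred sentences the linear loop is perfectly adequate; the binary search mainly pays off when many matches must be located.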


In [66]:
# Print every sentence that contains the start token of a match:
for sentence in sentences:
  for match in matches:
    # match is (match_id, start, end); test the current match, not a fixed one.
    if sentence.start <= match[1] < sentence.end:
      print(sentence,"\n")
      break

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.   

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.   

In [69]:
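for match in matches:
  for sentence in sentences:
    # match is (match_id, start, end); compare the start token index, not the ID.
    if match[1] < sentence.end:
      print(sentence,"\n")
      break

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.   

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.   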

