# NLP Basics Assessment

For this assessment we'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [3]:
import spacy
import en_core_web_sm

nlp=en_core_web_sm.load()

In [4]:
# Create a Doc object from the story text
with open("../TextFiles/owlcreek.txt", encoding="utf-8") as f:
    doc = nlp(f.read())

In [5]:
doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

In [7]:
# Number of tokens in the Doc object
len(doc)

4835

In [20]:
# Collect the sentences of the Doc object so they can be counted and indexed
sents = list(doc.sents)


In [21]:
len(sents)

254

In [27]:
# Print the fourth sentence of the doc (index 3)
sents[3]

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.

For each token in the sentence above, print its text, POS tag, dependency tag, and lemma.

In [28]:
doc1=sents[3]

In [32]:
for token in doc1:
    print(f"{token.text:{20}} {token.pos_:{10}} {token.dep_:{10}} {token.lemma_}")

A                    DET        det        a
man                  NOUN       nsubj      man
stood                VERB       ROOT       stand
upon                 SCONJ      prep       upon
a                    DET        det        a
railroad             NOUN       compound   railroad
bridge               NOUN       pobj       bridge
in                   ADP        prep       in
northern             ADJ        amod       northern
Alabama              PROPN      pobj       Alabama
,                    PUNCT      punct      ,
looking              VERB       advcl      look
down                 ADP        prep       down

                    SPACE      pobj       

into                 ADP        prep       into
the                  DET        det        the
swift                ADJ        amod       swift
water                NOUN       pobj       water
twenty               NUM        nummod     twenty
feet                 NOUN       npadvmod   foot
below                ADV        advmod
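The aligned columns above come from the nested format specs in the f-string, such as `{token.text:{20}}`, where the inner braces supply the field width. The sketch below shows the same padding on its own, without spaCy. The `rows` data and the `format_row` helper are hypothetical stand-ins for real token attributes, not part of the assessment.

```python
# Hypothetical rows standing in for (text, pos_, dep_, lemma_) of real tokens.
rows = [
    ("man", "NOUN", "nsubj", "man"),
    ("stood", "VERB", "ROOT", "stand"),
]


def format_row(text, pos, dep, lemma, text_width=20, tag_width=10):
    """Pad each field to a fixed width; strings are left-aligned by default."""
    # The inner braces are evaluated first, so the width can come from a variable.
    return f"{text:{text_width}} {pos:{tag_width}} {dep:{tag_width}} {lemma}"


lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

Keeping the widths as parameters makes it easy to widen a column if a longer token or tag would otherwise break the alignment.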

**Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the doc**

In [33]:
from spacy.matcher import Matcher
matcher=Matcher(nlp.vocab)

In [34]:
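# Allow any number of whitespace tokens (such as the line break in the
# source text) between "swimming" and "vigorously"
pattern = [{"LOWER": "swimming"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "vigorously"}]
matcher.add("Swimming", [pattern])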

In [35]:
find_matches=matcher(doc)

In [36]:
find_matches

[(12881893835109366681, 1274, 1277), (12881893835109366681, 3609, 3612)]

In [37]:
for match_id, start, end in find_matches:
    string_id=nlp.vocab.strings[match_id]
    span=doc[start-5:end+6]
    print(match_id,string_id,start,end,span)

12881893835109366681 Swimming 1274 1277 evade the bullets and, swimming
vigorously, reach the bank, take
12881893835109366681 Swimming 3609 3612 shoulder; he was now swimming
vigorously with the current.  His


In [39]:
# Find the sentences in which these matches occurred
for sent in sents:
    if find_matches[0][1] < sent.end:
        print(sent)
        break

 By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.


In [40]:
for sent in sents:
    if find_matches[1][1] < sent.end:
        print(sent)
        break

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.
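The two cells above repeat the same lookup: scan the sentences in order and stop at the first one whose `end` lies past the match start. One possible way to factor this out is a binary search over the sentence end offsets. This is a minimal sketch that assumes the offsets are sorted and exclusive, as spaCy's `sent.end` values are. The helper name `sentence_index_for` and the sample offsets are hypothetical.

```python
from bisect import bisect_right


def sentence_index_for(token_index, sentence_ends):
    """Return the index of the sentence containing token_index.

    sentence_ends holds the sorted, exclusive end offset of each sentence,
    so sentence i covers tokens sentence_ends[i - 1] up to sentence_ends[i] - 1.
    """
    last_end = sentence_ends[-1] if sentence_ends else 0
    if not 0 <= token_index < last_end:
        raise IndexError(f"token index {token_index} is outside the document")
    # Count how many sentences end at or before this token.
    return bisect_right(sentence_ends, token_index)


# Hypothetical offsets: three sentences covering tokens 0-4, 5-11, and 12-19.
ends = [5, 12, 20]
found = [sentence_index_for(t, ends) for t in (0, 5, 19)]
print(found)
```

In the notebook, this could be used as `sents[sentence_index_for(start, [s.end for s in sents])]` for each `start` in `find_matches`, which replaces both loops with a single expression.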
