# NLP Basics Assessment

Here we'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

**1. Create a Doc object from the file `owlcreek.txt`**<br>
> HINT: Use `with open('../TextFiles/owlcreek.txt') as f:`

In [2]:
# Enter your code here:
with open('C:\\Users\\DELL\\Desktop\\Machine-Learning\\Lesson-5-Text Mining\\nlplu\\exercise\\TextFiles\\owlcreek.txt') as f:
    doc = nlp(f.read())
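
The absolute Windows path in the cell above only resolves on one machine. A minimal sketch of a more portable read follows, assuming the relative layout given in the hint. The helper name `read_story` is hypothetical, and the `utf-8` encoding is an assumption based on the Project Gutenberg link.

```python
from pathlib import Path


def read_story(path, encoding="utf-8"):
    """Return the full text of the file at `path` (hypothetical helper)."""
    # Path accepts str or Path objects and handles OS-specific separators.
    return Path(path).read_text(encoding=encoding)


# Assumed usage in the notebook (relative path taken from the hint):
# doc = nlp(read_story('../TextFiles/owlcreek.txt'))
```

Passing the encoding explicitly avoids depending on the platform default, which differs between Windows and other systems.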


In [3]:
# Run this cell to verify it worked:

doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

**2. How many tokens are contained in the file?**

In [4]:
type(doc)

spacy.tokens.doc.Doc

In [5]:
len(doc)

4835

**3. How many sentences are contained in the file?**<br>HINT: You'll want to build a list first!

In [6]:

sentences = list(doc.sents)
print(len(sentences))

249
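
The hint asks for a list because `doc.sents` is a generator, and a generator cannot be passed to `len()` or indexed. The sketch below shows the same behavior with a plain Python generator standing in for `doc.sents`; `fake_sents` is a hypothetical stand-in, not part of spaCy.

```python
def fake_sents():
    """Stand-in for doc.sents: yields items lazily, one at a time."""
    yield "AN OCCURRENCE AT OWL CREEK BRIDGE"
    yield "I"
    yield "A man stood upon a railroad bridge in northern Alabama."


# len() on a generator raises TypeError because its size is not known upfront.
try:
    len(fake_sents())
    generator_has_len = True
except TypeError:
    generator_has_len = False

# Option 1: materialize the items so they can be counted and indexed.
as_list = list(fake_sents())

# Option 2: count without storing anything (no indexing afterwards).
count_only = sum(1 for _ in fake_sents())

print(generator_has_len, len(as_list), count_only)  # False 3 3
```

Building the list is the better fit here, since later questions index into the sentences.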


In [7]:
sents1 = [sent for sent in doc.sents]
sents1

[AN OCCURRENCE AT OWL CREEK BRIDGE
 
 by Ambrose Bierce
 , I
 , A man stood upon a railroad bridge in northern Alabama, looking down
 into the swift water twenty feet below.  , The man's hands were behind, his back, the wrists bound with a cord.  , A rope closely encircled his
 neck.  , It was attached to a stout cross-timber above his head and, the
 slack fell to the level of his knees.  , Some loose boards laid upon the
 ties supporting the rails of the railway supplied a footing for him
 and his executioners--two private soldiers of the Federal army,
 directed by a sergeant who in civil life may have been a deputy
 sheriff.  , At a short remove upon the same temporary platform was an
 officer in the uniform of his rank, armed.  , He was a captain.  , A
 sentinel at each end of the bridge stood with his rifle in the
 position known as "support," that is to say, vertical in front of the
 left shoulder, the hammer resting on the forearm thrown straight
 across the chest--a formal and u

In [8]:
len(sents1)

249

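**4. Print the second sentence in the document**<br> HINT: Indexing starts at zero, and the title counts as the first sentence. In the sentence list shown above, the section numeral "I" is also split off as its own sentence, so the sentence wanted here sits at index 2.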

In [9]:
print(sentences[2].text)

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  


**5. For each token in the sentence above, print its `text`, `POS` tag, `dep` tag and `lemma`**<br>
**CHALLENGE: Have values line up in columns in the print output.**

In [10]:
# NORMAL SOLUTION:

for token in sentences[2]:
    print(token.text, token.pos_, token.dep_, token.lemma_)


A DET det a
man NOUN nsubj man
stood VERB ROOT stand
upon SCONJ prep upon
a DET det a
railroad NOUN compound railroad
bridge NOUN pobj bridge
in ADP prep in
northern ADJ amod northern
Alabama PROPN pobj Alabama
, PUNCT punct ,
looking VERB advcl look
down ADV prt down

 SPACE  

into ADP prep into
the DET det the
swift ADJ amod swift
water NOUN pobj water
twenty NUM nummod twenty
feet NOUN npadvmod foot
below ADV advmod below
. PUNCT punct .
  SPACE   


In [11]:
# CHALLENGE SOLUTION:
for token in sentences[2]:
    print(f'{token.text:15} {token.pos_:15} {token.dep_:15} {token.lemma_:15}')


A               DET             det             a              
man             NOUN            nsubj           man            
stood           VERB            ROOT            stand          
upon            SCONJ           prep            upon           
a               DET             det             a              
railroad        NOUN            compound        railroad       
bridge          NOUN            pobj            bridge         
in              ADP             prep            in             
northern        ADJ             amod            northern       
Alabama         PROPN           pobj            Alabama        
,               PUNCT           punct           ,              
looking         VERB            advcl           look           
down            ADV             prt             down           

               SPACE                           
              
into            ADP             prep            into           
the             DET             det     
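
The columns in the output above break at the whitespace token, because its text is a real line break. One possible workaround is to pad the `repr()` of the text instead of the text itself. The sketch below uses rows copied by hand from the printed output; the `rows` list and the `format_row` helper are illustrative assumptions, not part of the original solution.

```python
# Rows copied by hand from the printed output: (text, pos, dep, lemma).
rows = [
    ("looking", "VERB", "advcl", "look"),
    ("down", "ADV", "prt", "down"),
    ("\n", "SPACE", "", "\n"),
    ("into", "ADP", "prep", "into"),
]


def format_row(text, pos, dep, lemma, width=15):
    """Left-align four fields in fixed-width columns (hypothetical helper)."""
    # !r applies repr() first, so a newline is rendered as '\n' rather than
    # breaking the line; the width then pads the escaped form.
    return f"{text!r:{width}} {pos:{width}} {dep:{width}} {lemma!r:{width}}"


for row in rows:
    print(format_row(*row))
```

The trade-off is that every text and lemma is shown in quotes, which is usually acceptable for inspection output.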

**6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text**<br>
HINT: You should include an `'IS_SPACE': True` pattern between the two words!

In [12]:
# Import the Matcher library:

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [13]:
# Create a pattern and add it to matcher:

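# IS_SPACE matches the line-break token that separates the two words.
pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True}, {'LOWER': 'vigorously'}]

# spaCy v2 call signature (the on_match callback is passed as None);
# in spaCy v3 the equivalent call is matcher.add('swimming', [pattern]).
matcher.add('swimming', None, pattern)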

In [14]:
# Create a list of matches called "found_matches" and print the list:

found_matches = matcher(doc)
print(found_matches)

[(12526975369366237900, 1274, 1277), (12526975369366237900, 3609, 3612)]
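
The `IS_SPACE` entry is needed because both occurrences are split across a line break in the source file. As a sanity check that does not depend on spaCy, a regular expression with `\s+` between the words should find the same two occurrences. This is a swapped-in technique, not the Matcher: it returns character offsets rather than token indices. The `sample` string is a short excerpt assembled from passages printed in this notebook; in practice the full file text would be searched.

```python
import re

# Short excerpt assembled from the two passages printed in this notebook;
# in the notebook itself the full file text would be searched instead.
sample = (
    "By diving I could evade the bullets and, swimming\n"
    "vigorously, reach the bank, take to the woods and get away home.  "
    "The hunted man saw all this over his shoulder; he was now swimming\n"
    "vigorously with the current.  "
)

# \s+ matches the line break (or any run of whitespace) between the words.
phrase = re.compile(r"swimming\s+vigorously", re.IGNORECASE)

# Each match exposes character offsets, not spaCy token indices.
char_spans = [m.span() for m in phrase.finditer(sample)]
print(len(char_spans))  # 2
```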


**7. Print the text surrounding each found match**

In [15]:
print(doc[1270:1300])

the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  My
home, thank God, is


In [16]:
print(doc[3600:3620])

all this over his shoulder; he was now swimming
vigorously with the current.  His brain was
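
The two slices above use hand-picked offsets. A sketch of deriving the window from each match tuple instead is shown below. The helper `context_window` and its `margin` parameter are hypothetical; the match tuples and token count are the values printed earlier in this notebook.

```python
def context_window(seq_len, start, end, margin=5):
    """Return (lo, hi) slice bounds `margin` items around [start, end).

    Hypothetical helper: bounds are clamped to the valid range 0..seq_len.
    """
    lo = max(start - margin, 0)
    hi = min(end + margin, seq_len)
    return lo, hi


# Match tuples and token count as printed earlier in this notebook.
found = [(12526975369366237900, 1274, 1277), (12526975369366237900, 3609, 3612)]
n_tokens = 4835

for _match_id, start, end in found:
    lo, hi = context_window(n_tokens, start, end)
    print(lo, hi)  # in the notebook: print(doc[lo:hi])
```

Clamping matters for matches near the start of the document, where `start - margin` would go negative and a negative slice bound would count from the end instead.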


**EXTRA CREDIT:<br>Print the *sentence* that contains each found match**

In [17]:
type(found_matches)

list

In [18]:
found_matches[0]

(12526975369366237900, 1274, 1277)

In [19]:
for sent in sentences:
    print(sent.end)

11
13
36
43
54
63
76
88
138
161
167
233
274
302
309
318
366
423
453
472
484
505
527
557
574
594
616
661
705
713
735
761
787
821
840
866
885
895
909
922
953
973
981
987
1006
1049
1061
1072
1081
1100
1109
1128
1146
1154
1161
1164
1179
1193
1196
1204
1212
1224
1237
1265
1292
1323
1360
1366
1368
1388
1397
1398
1417
1488
1490
1507
1515
1532
1554
1590
1630
1647
1671
1695
1718
1756
1761
1763
1779
1784
1786
1800
1826
1828
1868
1870
1875
1888
1918
1931
1944
1959
1974
1985
1993
1995
2015
2051
2074
2097
2113
2134
2143
2168
2174
2194
2211
2252
2277
2303
2315
2321
2346
2368
2402
2431
2432
2447
2475
2502
2506
2508
2513
2522
2525
2551
2562
2576
2597
2609
2641
2678
2691
2704
2723
2770
2782
2794
2822
2840
2866
2898
2916
2966
2973
2987
3047
3058
3069
3088
3101
3130
3163
3190
3218
3227
3256
3287
3303
3343
3385
3388
3390
3394
3396
3399
3402
3407
3411
3422
3468
3489
3509
3539
3545
3584
3596
3617
3638
3660
3674
3688
3702
3750
3772
3784
3828
3854
3884
3890
3892
3907
3933
3955
3983
4023
4053
4074
4082
4098
41

In [20]:
for sent in sentences:
    print(sent.start)

0
11
13
36
43
54
63
76
88
138
161
167
233
274
302
309
318
366
423
453
472
484
505
527
557
574
594
616
661
705
713
735
761
787
821
840
866
885
895
909
922
953
973
981
987
1006
1049
1061
1072
1081
1100
1109
1128
1146
1154
1161
1164
1179
1193
1196
1204
1212
1224
1237
1265
1292
1323
1360
1366
1368
1388
1397
1398
1417
1488
1490
1507
1515
1532
1554
1590
1630
1647
1671
1695
1718
1756
1761
1763
1779
1784
1786
1800
1826
1828
1868
1870
1875
1888
1918
1931
1944
1959
1974
1985
1993
1995
2015
2051
2074
2097
2113
2134
2143
2168
2174
2194
2211
2252
2277
2303
2315
2321
2346
2368
2402
2431
2432
2447
2475
2502
2506
2508
2513
2522
2525
2551
2562
2576
2597
2609
2641
2678
2691
2704
2723
2770
2782
2794
2822
2840
2866
2898
2916
2966
2973
2987
3047
3058
3069
3088
3101
3130
3163
3190
3218
3227
3256
3287
3303
3343
3385
3388
3390
3394
3396
3399
3402
3407
3411
3422
3468
3489
3509
3539
3545
3584
3596
3617
3638
3660
3674
3688
3702
3750
3772
3784
3828
3854
3884
3890
3892
3907
3933
3955
3983
4023
4053
4074
4082
4098


In [21]:
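# found_matches[0][1] is the start token of the first match; the first
# sentence that ends after it is the sentence containing the match.
for sent in sentences:
    if found_matches[0][1] < sent.end:
        print(sent)
        break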

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  


In [22]:
for sent in sentences:
    if found_matches[1][1] < sent.end:
        print(sent)
        break

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  
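
The loops above scan the sentence list until one sentence ends past the match start. Since the end offsets are in ascending order, the same lookup can be sketched as a binary search. The helper `sentence_index` is hypothetical, and `sent_ends` holds only the first eight end offsets printed earlier.

```python
from bisect import bisect_right


def sentence_index(sent_ends, token_index):
    """Index of the first sentence whose end offset is > token_index."""
    # sent_ends must be sorted ascending, which holds for consecutive sentences.
    return bisect_right(sent_ends, token_index)


# First eight sentence end offsets as printed by the notebook.
sent_ends = [11, 13, 36, 43, 54, 63, 76, 88]

print(sentence_index(sent_ends, 0))   # 0 (the title)
print(sentence_index(sent_ends, 13))  # 2 (token 13 opens the third sentence)
print(sentence_index(sent_ends, 40))  # 3
```

In the notebook this would be called with `[s.end for s in sentences]` and `found_matches[0][1]`. spaCy's `Span.sent` attribute is another route to the enclosing sentence when sentence boundaries are available.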


# Done!