In [1]:
# Perform standard imports
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Import the Matcher class and create a matcher that shares the pipeline's vocabulary
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [4]:
pattern1 = [{'LOWER': 'solarpower'}]
pattern2 = [{'LOWER': 'solar'}, {'LOWER': 'power'}]
pattern3 = [{'LOWER': 'solar'}, {'IS_PUNCT': True}, {'LOWER': 'power'}]

matcher.add('SolarPower', [pattern1, pattern2, pattern3])

In [5]:
doc = nlp('The Solar Power industry continues to grow as demand '
          'for solarpower increases. Solar-power cars are gaining popularity.')

In [6]:
found_matches = matcher(doc)
print(found_matches)

[(8656102463236116519, 1, 3), (8656102463236116519, 10, 11), (8656102463236116519, 13, 16)]


In [7]:
for match_id, start, end in found_matches:
    string_id = nlp.vocab.strings[match_id]  # get string representation
    span = doc[start:end]                    # get the matched span
    print(match_id, string_id, start, end, span.text)

8656102463236116519 SolarPower 1 3 Solar Power
8656102463236116519 SolarPower 10 11 solarpower
8656102463236116519 SolarPower 13 16 Solar-power


In [8]:
# Redefine the patterns:
pattern1 = [{'LOWER': 'solarpower'}]
pattern2 = [{'LOWER': 'solar'}, {'IS_PUNCT': True, 'OP':'*'}, {'LOWER': 'power'}]

# Remove the old patterns to avoid duplication:
matcher.remove('SolarPower')

# Add the new set of patterns to the 'SolarPower' matcher:
matcher.add('SolarPower', [pattern1, pattern2])

In [9]:
found_matches2 = matcher(doc)
print(found_matches2)

[(8656102463236116519, 1, 3), (8656102463236116519, 10, 11), (8656102463236116519, 13, 16)]
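
The result is unchanged because the `'OP': '*'` key lets the punctuation token occur zero or more times, so a single pattern covers both the spaced and the hyphenated form. A minimal sketch of that behavior, assuming spaCy v3 and a blank English pipeline (`spacy.blank('en')`, used only so the example needs no trained model); the sample sentence and the `demo_` names are made up for illustration:

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline only tokenizes, which is enough for LOWER and IS_PUNCT.
demo_nlp = spacy.blank('en')
demo_matcher = Matcher(demo_nlp.vocab)

# 'OP': '*' allows the punctuation token to appear zero or more times.
demo_pattern = [{'LOWER': 'solar'}, {'IS_PUNCT': True, 'OP': '*'}, {'LOWER': 'power'}]
demo_matcher.add('SolarPower', [demo_pattern])

demo_doc = demo_nlp('Solar power, solar-power and solar - - power.')

# Sort by start offset so the printed order follows the text.
demo_hits = sorted(demo_matcher(demo_doc), key=lambda m: m[1])
demo_texts = [demo_doc[start:end].text for _, start, end in demo_hits]
print(demo_texts)
```

The same pattern without the `'OP'` key would require exactly one punctuation token and would miss the plain "Solar power" form.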


In [10]:
pattern1 = [{'LOWER': 'solarpower'}]
pattern2 = [{'LOWER': 'solar'}, {'IS_PUNCT': True, 'OP':'*'}, {'LEMMA': 'power'}]  # match on the lemma so 'powered' is caught too

# Remove the old patterns to avoid duplication:
matcher.remove('SolarPower')

# Add the new set of patterns to the 'SolarPower' matcher:
matcher.add('SolarPower', [pattern1, pattern2])

In [11]:
doc2 = nlp('Solar-powered energy runs solar-powered cars.')
found_matches3 = matcher(doc2)
print(found_matches3)

[(8656102463236116519, 0, 3), (8656102463236116519, 5, 8)]


In [12]:
for token in doc2:
    print(token.text, token.lemma_)

Solar solar
- -
powered power
energy energy
runs run
solar solar
- -
powered power
cars car
. .


# PhraseMatcher

In [13]:
# Perform standard imports, reset nlp
import spacy
nlp = spacy.load('en_core_web_sm')

In [14]:
# Import the PhraseMatcher class and create a matcher on the same vocabulary
from spacy.matcher import PhraseMatcher
matcher = PhraseMatcher(nlp.vocab)

In [17]:
with open('TextFiles/reaganomics2.txt', encoding='utf8') as f:
    doc3 = nlp(f.read())

In [18]:
# First, create a list of match phrases:
phrase_list = ['voodoo economics', 'supply-side economics', 'trickle-down economics', 'free-market economics']

# Next, convert each phrase to a Doc object:
phrase_patterns = [nlp(text) for text in phrase_list]

# Pass the list of Doc objects to the matcher (spaCy v3 takes the list directly, no asterisk unpacking):
matcher.add('VoodooEconomics', phrase_patterns)

# Build a list of matches:
matches = matcher(doc3)

In [19]:
matches

[(3473369816841043438, 39, 43),
 (3473369816841043438, 44, 48),
 (3473369816841043438, 51, 53),
 (3473369816841043438, 66, 70),
 (3473369816841043438, 732, 736),
 (3473369816841043438, 6623, 6625)]
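
By default `PhraseMatcher` compares the exact token text, so a capitalized "Voodoo Economics" would not show up in a list like the one above. A hedged sketch, assuming spaCy v3: passing `attr='LOWER'` makes the comparison case-insensitive, and `make_doc` tokenizes the phrases without running the rest of the pipeline. The blank pipeline, the sample sentence and the `demo_` names are illustrative assumptions, not part of the notebook:

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank pipeline is enough here because matching on LOWER only needs tokens.
demo_nlp = spacy.blank('en')
demo_phrase_matcher = PhraseMatcher(demo_nlp.vocab, attr='LOWER')

demo_phrases = ['voodoo economics', 'supply-side economics']
# make_doc runs the tokenizer only, which keeps pattern creation cheap.
demo_patterns = [demo_nlp.make_doc(text) for text in demo_phrases]
demo_phrase_matcher.add('EconTerms', demo_patterns)

demo_doc = demo_nlp.make_doc('Critics called it Voodoo Economics, not Supply-Side economics.')
demo_found = sorted(demo_doc[start:end].text
                    for _, start, end in demo_phrase_matcher(demo_doc))
print(demo_found)
```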

In [20]:
doc3[:75]

Reaganomics (/reɪɡəˈnɒmɪks/; a portmanteau of Reagan and economics attributed to Paul Harvey),[1] or Reaganism, were the neoliberal[2][3][4] economic policies promoted by U.S. President Ronald Reagan during the 1980s. These policies are characterized as supply-side economics, trickle-down economics, or "voodoo economics" by opponents,[5][6] while Reagan and his advocates preferred to call it free-market economics.

The pillars of

In [21]:
doc3[720:740]

policies. At the same time he attracted a following from the supply-side economics movement, which formed

In [23]:
doc3[6620:6635] 

Reagonomics or 'voodoo economics'?". BBC News. June 5,

In [24]:
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc3[max(start - 5, 0):end + 5]  # clamp so a match near the start cannot wrap around
    print(match_id, string_id, start, end, span)

3473369816841043438 VoodooEconomics 39 43 These policies are characterized as supply-side economics, trickle-down economics
3473369816841043438 VoodooEconomics 44 48 supply-side economics, trickle-down economics, or "voodoo economics
3473369816841043438 VoodooEconomics 51 53 down economics, or "voodoo economics" by opponents,[5][6] while
3473369816841043438 VoodooEconomics 66 70 advocates preferred to call it free-market economics.

The pillars of
3473369816841043438 VoodooEconomics 732 736 attracted a following from the supply-side economics movement, which formed in
3473369816841043438 VoodooEconomics 6623 6625 
 "Reagonomics or 'voodoo economics'?". BBC
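
A five-token context window has to be clamped at the document boundaries: in ordinary Python slicing a negative lower bound counts from the end of the sequence, so a match near the start can come back with an empty context. A minimal sketch with a hypothetical helper `context_window`, using a plain list as a stand-in for the Doc (assuming Doc slicing follows the same rules):

```python
def context_window(start, end, n_tokens, width=5):
    """Return (lo, hi) slice bounds padded by `width` tokens and clamped to the document."""
    lo = max(start - width, 0)
    hi = min(end + width, n_tokens)
    return lo, hi

tokens = list(range(100))  # stand-in for a tokenized document

# Unclamped: the lower bound is -3, which counts from the end, so the slice is empty.
naive = tokens[2 - 5:4 + 5]

# Clamped: the window starts at token 0 instead.
lo, hi = context_window(2, 4, len(tokens))
clamped = tokens[lo:hi]
print(len(naive), clamped)
```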


In [25]:
# Build a list of sentences
sents = [sent for sent in doc3.sents]

# In the next section we'll see that sentences contain start and end token values:
print(sents[0].start, sents[0].end)

0 34
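
Because every sentence span carries `start` and `end` token offsets, a match can be assigned to its sentence by comparing offsets. Comparing every match with every sentence is fine for a short text; for a long one, a binary search over the sentence starts is cheaper. A sketch with the standard-library `bisect` module; apart from the first pair, the boundaries below and the helper name `sentence_index_for` are hypothetical:

```python
from bisect import bisect_right

def sentence_index_for(match_start, sent_starts):
    """Return the index of the last sentence that starts at or before match_start."""
    return bisect_right(sent_starts, match_start) - 1

# Hypothetical (start, end) token offsets of consecutive sentences, sorted by start.
sent_bounds = [(0, 34), (34, 72), (72, 110)]
sent_starts = [start for start, _ in sent_bounds]

# Match offsets in the (start, end) form the matcher returns after the match id.
demo_matches = [(39, 43), (44, 48), (66, 70), (80, 82)]
located = {m: sentence_index_for(m[0], sent_starts) for m in demo_matches}
print(located)
```

With real data, `sent_starts` would be built as `[sent.start for sent in doc3.sents]`.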


In [26]:
# Print the full sentence that contains each match:
for sent in sents:
    for match_id, start, end in matches:
        string_id = nlp.vocab.strings[match_id]
        # A match belongs to the sentence whose token range encloses it
        if sent.start <= start and end <= sent.end:
            print(match_id, string_id, start, end, sent)

3473369816841043438 VoodooEconomics 39 43 These policies are characterized as supply-side economics, trickle-down economics, or "voodoo economics" by opponents,[5][6] while Reagan and his advocates preferred to call it free-market economics.


3473369816841043438 VoodooEconomics 44 48 These policies are characterized as supply-side economics, trickle-down economics, or "voodoo economics" by opponents,[5][6] while Reagan and his advocates preferred to call it free-market economics.


3473369816841043438 VoodooEconomics 51 53 These policies are characterized as supply-side economics, trickle-down economics, or "voodoo economics" by opponents,[5][6] while Reagan and his advocates preferred to call it free-market economics.


3473369816841043438 VoodooEconomics 66 70 These policies are characterized as supply-side economics, trickle-down economics, or "voodoo economics" by opponents,[5][6] while Reagan and his advocates preferred to call it free-market economics.


3473369816841043438 Vood