Vocabulary and Matching

Rule-based Matching

In [None]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [None]:
# import the matcher library
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

Creating patterns

In [None]:
pattern1 = [{'LOWER': 'solarpower'}]
pattern2 = [{'LOWER': 'solar'}, {'LOWER': 'power'}]
pattern3 = [{'LOWER': 'solar'}, {'IS_PUNCT': True}, {'LOWER': 'power'}]

matcher.add('SolarPower', [pattern1, pattern2, pattern3])

Applying the matcher to a Doc object

In [None]:
doc = nlp(u'The Solar Power industry continues to grow as demand \
for solarpower increases. Solar-power cars are gaining popularity.')

In [None]:
found_matches = matcher(doc)
print(found_matches)

[(8656102463236116519, 1, 3), (8656102463236116519, 10, 11), (8656102463236116519, 13, 16)]


In [None]:
for match_id, start, end in found_matches:
    string_id = nlp.vocab.strings[match_id]  # get string representation
    span = doc[start:end]                    # get the matched span
    print(match_id, string_id, start, end, span.text)

8656102463236116519 SolarPower 1 3 Solar Power
8656102463236116519 SolarPower 10 11 solarpower
8656102463236116519 SolarPower 13 16 Solar-power
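
The long integer in each tuple is the hash of the rule name, and the vocabulary's string store maps between the two in both directions. A minimal sketch of that round trip, assuming a blank English pipeline (spacy.blank) so that no model download is needed; the names nlp_blank, key_hash and key_text are illustrative only:

```python
import spacy

# Assumption: a blank English pipeline is enough here, since only the
# string store is used (no tagger, parser or lemmatizer).
nlp_blank = spacy.blank("en")

# Adding a string returns its 64-bit hash ...
key_hash = nlp_blank.vocab.strings.add("SolarPower")

# ... and looking the hash up returns the original string.
key_text = nlp_blank.vocab.strings[key_hash]

print(key_hash, key_text)
```

This is the lookup that `nlp.vocab.strings[match_id]` performs in the loop above.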


Setting pattern options and quantifiers

In [None]:
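# redefine patterns; 'OP': '*' lets the punctuation token occur zero or more times:
pattern1 = [{'LOWER': 'solarpower'}]
pattern2 = [{'LOWER': 'solar'}, {'IS_PUNCT': True, 'OP':'*'}, {'LOWER': 'power'}]

# remove the older patterns to avoid duplication:
matcher.remove('SolarPower')

# add the new set of patterns under the 'SolarPower' key:
matcher.add('SolarPower', [pattern1, pattern2])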

In [None]:
found_matches = matcher(doc)
print(found_matches)

[(8656102463236116519, 1, 3), (8656102463236116519, 10, 11), (8656102463236116519, 13, 16)]
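
The output is unchanged because 'OP': '*' allows the punctuation token to appear zero or more times, so one pattern now covers both the spaced and the hyphenated form. A small sketch of this on a made-up sentence, assuming a blank English pipeline (tokenizer only); demo_matcher, demo_doc and demo_spans are illustrative names:

```python
import spacy
from spacy.matcher import Matcher

# Assumption: LOWER and IS_PUNCT are lexical attributes, so a blank
# pipeline without a trained model is sufficient for this pattern.
nlp_blank = spacy.blank("en")
demo_matcher = Matcher(nlp_blank.vocab)

# One pattern: 'solar', then zero or more punctuation tokens, then 'power'.
demo_pattern = [
    {"LOWER": "solar"},
    {"IS_PUNCT": True, "OP": "*"},
    {"LOWER": "power"},
]
demo_matcher.add("SolarPower", [demo_pattern])

demo_doc = nlp_blank("Solar power and solar-power are both matched.")

# Collect the text of every matched span.
demo_spans = [demo_doc[start:end].text for _, start, end in demo_matcher(demo_doc)]
print(demo_spans)
```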


Be careful with lemmas

In [None]:
# redefine patterns; LEMMA matches on the base form, so 'powered' can match as well as 'power':
pattern1 = [{'LOWER': 'solarpower'}]
pattern2 = [{'LOWER': 'solar'}, {'IS_PUNCT': True, 'OP':'*'}, {'LEMMA': 'power'}]

# remove the older patterns to avoid duplication:
matcher.remove('SolarPower')

# add the new set of patterns under the 'SolarPower' key:
matcher.add('SolarPower', [pattern1, pattern2])

In [None]:
doc2 = nlp(u'Solar-powered energy runs solar-powered cars.')

In [None]:
found_matches = matcher(doc2)
print(found_matches)

[(8656102463236116519, 0, 3), (8656102463236116519, 5, 8)]


PhraseMatcher

In [1]:
# perform imports
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# import the phrase matcher library
from spacy.matcher import PhraseMatcher
matcher = PhraseMatcher(nlp.vocab)

In [5]:
with open("/content/reaganomics.txt","r", encoding='latin-1') as f:
    doc3 = nlp(f.read())

In [6]:
# first create a list of match phrases:
phrase_list = ['voodoo economics', 'supply-side economics', 'trickle-down economics', 'free-market economics']

# Next, convert each phrase to a Doc object:
phrase_patterns = [nlp(text) for text in phrase_list]

# Pass the list of Doc objects to the matcher (same list-based add() form used for Matcher above):
matcher.add('VoodooEconomics', phrase_patterns)

# Build a list of matches:
matches = matcher(doc3)

In [7]:
# match_id, start, end
matches

[(3473369816841043438, 41, 45),
 (3473369816841043438, 49, 53),
 (3473369816841043438, 54, 56),
 (3473369816841043438, 61, 65),
 (3473369816841043438, 673, 677),
 (3473369816841043438, 2986, 2990)]
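
Each tuple holds token offsets, so the hyphenated phrases span four tokens (for example 41 to 45) while 'voodoo economics' spans two (54 to 56). The sketch below shows the same effect on a short made-up sentence. It assumes a blank English pipeline and make_doc, which only tokenizes; the sentence and the names phrase_matcher, sample and span_lengths are illustrative:

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: tokenization alone is enough, because PhraseMatcher
# compares the exact token text by default.
nlp_blank = spacy.blank("en")
phrase_matcher = PhraseMatcher(nlp_blank.vocab)

terms = ["voodoo economics", "supply-side economics"]
# make_doc() runs only the tokenizer, which is all the patterns need.
phrase_matcher.add("EconTerms", [nlp_blank.make_doc(term) for term in terms])

sample = nlp_blank.make_doc(
    "Opponents called it voodoo economics while advocates "
    "preferred supply-side economics."
)

# Map each matched phrase to the number of tokens it covers.
span_lengths = {}
for match_id, start, end in phrase_matcher(sample):
    span_lengths[sample[start:end].text] = end - start

print(span_lengths)
```

The hyphen is a token of its own, which is why 'supply-side economics' counts as four tokens.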

In [8]:
doc3[:70]

REAGANOMICS
https://en.wikipedia.org/wiki/Reaganomics

Reaganomics (a portmanteau of [Ronald] Reagan and economics attributed to Paul Harvey)[1] refers to the economic policies promoted by U.S. President Ronald Reagan during the 1980s. These policies are commonly associated with supply-side economics, referred to as trickle-down economics or voodoo economics by political opponents, and free-market economics by political advocates.


Viewing matches

In [9]:
doc3[665:685]  # note that the fifth match starts at doc3[673]

same time he attracted a following from the supply-side economics movement, which formed in opposition to Keynesian

In [10]:
doc3[2975:2995]  # the sixth match starts at doc3[2986]

lawsuits against institutions.[66] His policies became widely known as "trickle-down economics", due to the

In [11]:
# build a list of sentences
sents = [sent for sent in doc3.sents]

# print the second, fourth and fifth sentences in the document
print(sents[1].text)
print(sents[3].text)
print(sents[4].text)

These policies are commonly associated with supply-side economics, referred to as trickle-down economics or voodoo economics by political opponents, and free-market economics by political advocates.


Supporters point to the end of stagflation, stronger GDP growth, and an entrepreneur revolution in the decades that followed.[3][4] Critics point to the widening income gap, an atmosphere of greed, and the national debt tripling in eight years which ultimately reversed the post-World War II trend of a shrinking national debt as percentage of GDP.[5][6]

HISTORICAL CONTEXT

Prior to the Reagan administration, the United States economy experienced a decade of high unemployment and persistently high inflation (known as stagflation).
Attacks on Keynesian economic orthodoxy as well as empirical economic models such as the Phillips Curve grew.
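
doc.sents is a generator, which is why it is turned into a list before indexing. A minimal sketch of the same step on a made-up text, assuming a blank English pipeline with spaCy's rule-based sentencizer component rather than the parser that en_core_web_sm uses, so boundaries on real text may differ; sample and sample_sents are illustrative names:

```python
import spacy

# Assumption: the rule-based 'sentencizer' stands in for the parser
# here, so the example runs without a trained model.
nlp_blank = spacy.blank("en")
nlp_blank.add_pipe("sentencizer")

sample = nlp_blank(
    "Supporters point to growth. Critics point to debt. Both cite figures."
)

# Materialize the generator so sentences can be accessed by index.
sample_sents = [sent.text for sent in sample.sents]

print(sample_sents[1])  # second sentence
```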
