# Regex Hearst Patterns
---
In this experiment we test the utility of Hearst Patterns for detecting the ingroup and outgroup of a text.

For this experiment, Hearst patterns are matched with regular expressions, using code adapted from: https://github.com/mmichelsonIF/hearst_patterns_python/blob/master/hearstPatterns/hearstPatterns.py

Hypernym relations are semantic relationships between two concepts: C1 is a hypernym of C2 if C1 categorises C2 (e.g. “instrument” is a hypernym of “piano”). For this research, the phrase "America has enemies, such as Al Qaeda and the Taliban" would return [('Al Qaeda', 'enemy'), ('the Taliban', 'enemy')]. In this example, the categorising term 'enemy' is a hypernym of both 'Al Qaeda' and 'the Taliban'; conversely, 'Al Qaeda' and 'the Taliban' are hyponyms of 'enemy'. Using this technique, hypernym terms could be classified as ingroup or outgroup, and the named entities found as their hyponyms could then be assigned to the corresponding group.
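A minimal, spaCy-free sketch of that extraction, assuming the sentence has already been lemmatised and its noun phrases replaced by `NP_` tokens (the tagged string below is written by hand, not produced by the notebook). It applies the notebook's 'such as' pattern and, as `find_hyponyms()` does for 'first' patterns, treats the first NP in the match as the hypernym. The helper names `clean` and `such_as_pairs` are illustrative only.

```python
import re

# 'such as' pattern as defined in HearstPatterns ('first': the first NP is the hypernym)
SUCH_AS = r'(NP_\w+ (, )?such as (NP_\w+ ?(, )?(and |or )?)+)'

def clean(term):
    # strip the NP_ prefix and turn underscores back into spaces
    return term.replace("NP_", "").replace("_", " ")

def such_as_pairs(tagged_sentence):
    """Return (hyponym, hypernym) pairs for the first 'such as' match, if any."""
    match = re.search(SUCH_AS, tagged_sentence)
    if not match:
        return []
    nps = [tok for tok in match.group(0).split() if tok.startswith("NP_")]
    general, specifics = nps[0], nps[1:]
    return [(clean(s), clean(general)) for s in specifics]

# hand-tagged, lemmatised form of "America has enemies, such as Al Qaeda and the Taliban"
tagged = "NP_America have NP_enemy , such as NP_Al_Qaeda and NP_the_Taliban ."
print(such_as_pairs(tagged))  # [('Al Qaeda', 'enemy'), ('the Taliban', 'enemy')]
```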

This experiment has not produced any results from the bin Laden text, but has produced some promising results from the Bush text:

In [19]:
h = HearstPatterns(extended=True, merge = False)

true_positives = [
    "The evidence we have gathered all points to a collection of loosely affiliated terrorist organizations known as al Qaeda.",
    "Terrorist groups like al Qaeda depend upon the aid or indifference of governments.",
    "Other close friends, including Canada, Australia, Germany and France, have pledged forces as the operation unfolds.",
]
             
for sentence in true_positives:
    print(h.find_hyponyms(sentence))

NP_the_evidence -PRON- have gather NP_all_point to NP_a_collection of NP_loosely_affiliate_terrorist_organization know as NP_al_Qaeda .
[('al Qaeda', 'loosely affiliate terrorist organization')]
NP_terrorist_group like NP_al_Qaeda depend upon NP_the_aid or NP_indifference of NP_government .
[('al Qaeda', 'terrorist group')]
other NP_close_friend , include NP_Canada , NP_Australia , NP_Germany and NP_France , have pledge NP_force as NP_the_operation unfold .
[('Canada', 'close friend'), ('Australia', 'close friend'), ('Germany', 'close friend'), ('France', 'close friend'), ('force', 'the operation')]
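One way the pairs above could feed the ingroup/outgroup step is a lookup from hypernym to group. The sketch below assumes a small, hand-built lexicon (`GROUP_LEXICON` is hypothetical, not part of the notebook) and simply skips any pair whose hypernym it does not know, which also discards the spurious ('force', 'the operation') pair.

```python
# (hyponym, hypernym) pairs copied from the output above
pairs = [
    ('al Qaeda', 'loosely affiliate terrorist organization'),
    ('al Qaeda', 'terrorist group'),
    ('Canada', 'close friend'), ('Australia', 'close friend'),
    ('Germany', 'close friend'), ('France', 'close friend'),
    ('force', 'the operation'),
]

# Hypothetical lexicon mapping hypernyms to a group label
GROUP_LEXICON = {
    'terrorist group': 'outgroup',
    'loosely affiliate terrorist organization': 'outgroup',
    'close friend': 'ingroup',
}

def assign_groups(pairs, lexicon):
    """Map each hyponym to the group of its hypernym; unknown hypernyms are skipped."""
    groups = {}
    for hyponym, hypernym in pairs:
        label = lexicon.get(hypernym.lower())
        if label is not None:
            groups.setdefault(label, set()).add(hyponym)
    # sorted lists give a stable, readable result
    return {label: sorted(names) for label, names in groups.items()}

print(assign_groups(pairs, GROUP_LEXICON))
# {'outgroup': ['al Qaeda'], 'ingroup': ['Australia', 'Canada', 'France', 'Germany']}
```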


However, the patterns also return some false positives:

In [10]:
false_positives = [
    "This new law that I sign today will allow surveillance of all communications used by terrorists, including e-mails, the Internet, and cell phones.",
    "From this day forward, any nation that continues to harbor or support terrorism will be regarded by the United States as a hostile regime."
]

for sentence in false_positives:
    print(h.find_hyponyms(sentence))

-----
NP_terrorist , include NP_e__mail , NP_the_internet , and NP_cell_phone 
[('e  mail', 'terrorist'), ('the internet', 'terrorist'), ('cell phone', 'terrorist')]
-----
NP_the_United_States as NP_a_hostile_regime
[('the United States', 'a hostile regime')]
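The second false positive comes from the very general `as` pattern near the end of the extended list, which fires on any "<NP> as <NP>" sequence. A minimal reproduction, using a hand-tagged approximation of the sentence (the exact tagging is assumed, not taken from the notebook):

```python
import re

# The catch-all 'as' pattern from the extended list ('last': the final NP is the hypernym)
AS_PATTERN = r'((NP_\w+ ?(, )?)+(and |or )?as NP_\w+)'

# Hand-tagged approximation of the second sentence above (assumed, not notebook output)
tagged = ("from NP_this_day forward , NP_any_nation that continue to harbor or support "
          "NP_terrorism will be regard by NP_the_United_States as NP_a_hostile_regime .")

match = re.search(AS_PATTERN, tagged)
nps = [tok.replace("NP_", "").replace("_", " ")
       for tok in match.group(0).split() if tok.startswith("NP_")]
pairs = [(specific, nps[-1]) for specific in nps[:-1]]

print(match.group(0))  # NP_the_United_States as NP_a_hostile_regime
print(pairs)           # [('the United States', 'a hostile regime')]
```

Removing or tightening this pattern would suppress this pair, but would presumably also lose genuine "X as Y" relations; that trade-off has not been measured here.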


In [10]:
%%time

import os
import importlib
import cndobjects
importlib.reload(cndobjects)


dirpath = r'C:\Users\Steve\OneDrive - University of Southampton\CNDPipeline\dataset'

orators = cndobjects.Dataset(dirpath)

orators.summarise()

Wall time: 339 ms


| Ref | Name | Text Count | Word Count | File Size |
|---|---|---|---|---|
| hitler | Adolf Hitler | 0 | 0 | 56 |
| bush | George Bush | 14 | 143936 | 56 |
| tolstoy | Leo Tolstoy | 0 | 0 | 56 |
| king | Martin Luther King | 5 | 122815 | 56 |
| laden | Osama bin Laden | 5 | 77440 | 56 |
| Totals | | 24 | 344191 | 280 |


In [9]:
%%time

"""
the following code is taken from: https://github.com/mmichelsonIF/hearst_patterns_python/blob/master/hearstPatterns/test/test_hearstPatterns.py

"""

import re
import string
import spacy
from spacy.pipeline import merge_noun_chunks
from spacy.pipeline import merge_entities


class HearstPatterns(object):

    def __init__(self, extended=False, merge=False):
        # note: `merge` is accepted but not currently used by this class

        self.__adj_stopwords = [
            'able', 'available', 'brief', 'certain',
            'different', 'due', 'enough', 'especially', 'few', 'fifth',
            'former', 'his', 'howbeit', 'immediate', 'important', 'inc',
            'its', 'last', 'latter', 'least', 'less', 'likely', 
            'little', 'mainly', 'many', 'ml', 'more', 'most', 'mostly', 'much', 
            'my', 'necessary', 'new', 'next', 'non', 'notably', 'old', 'other', 
            'our', 'ours', 'own', 'particular', 'particularly', 'principally',
            'past', 'possible', 'present', 'proud', 'recent', 'same', 'several', 
            'significant', 'similar', 'such', 'sup', 'sure', 'these', 'those'
        ]

        # Now define the Hearst patterns.
        # Format is (<hearst-pattern>, <general-term position>):
        # 'first' means the first noun phrase (NP) in a match is the general
        # term (hypernym) and the remaining NPs are specific terms (hyponyms);
        # 'last' means the final NP is the general term.
        self.__hearst_patterns = [
            (
                '(NP_\\w+ (, )?such as (NP_\\w+ ?(, )?(and |or )?)+)',
                'first'
            ),
            (
                '(NP_\\w+ (, )?know as (NP_\\w+ ?(, )?(and |or )?)+)', # added for this experiment
                'first'
            ),
            (
                '(such NP_\\w+ (, )?as (NP_\\w+ ?(, )?(and |or )?)+)',
                'first'
            ),
            (
                '(NP_\\w+ (, )?include (NP_\\w+ ?(, )?(and |or )?)+)',
                'first'
            ),
            (
                '(NP_\\w+ (, )?especially (NP_\\w+ ?(, )?(and |or )?)+)',
                'first'
            ),
            (
                '((NP_\\w+ ?(, )?)+(and |or )?other NP_\\w+)',
                'last'
            ),
        ]

        if extended:
            self.__hearst_patterns.extend([
                (
                    '(NP_\\w+ (, )?like (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?mainly (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?mostly (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?notably (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?particularly (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?principally (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?in particular (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?except (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?other than (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?e.g. (, )?(NP_\\w+ ? (, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ \\( (e.g.|i.e.) (, )?(NP_\\w+ ? (, )?(and |or )?)+'
                    '(\\. )?\\))',
                    'first'
                ),
                (
                    '(NP_\\w+(, )?i.e. (, )?(NP_\\w+ ? (, )?(and |or )?)+)',
                    'first'
                ),
                (
                    'example of (NP_\\w+ (, )?be (NP_\\w+ ?(, )?(and |or )?)+)', 
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?for example (, )?(NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?which be similar to (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?example of this be (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?whether (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?compare to (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?among -PRON- (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?type (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )? (NP_\\w+ ? (, )?(and |or )?)+ for instance)',
                    'first'
                ),
                (
                    '(NP_\\w+ (, )?which may include (NP_\\w+ ?(, )?(and |or )?)+)',
                    'first'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?any other NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?some other NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?be a NP_\\w+)',
                    'last'
                ),

#                 (
#                     'such (NP_\\w+ (, )?as (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
#                 ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?like other NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?one of the NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?one of these NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?one of those NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?be example of NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?which be call NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?which be name NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and|or)? a kind of NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and|or)? kind of NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and|or)? form of NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?which look like NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?which sound like NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )? NP_\\w+ type)',
                    'last'
                ),
                (
                    '(compare (NP_\\w+ ?(, )?)+(and |or )?with NP_\\w+)',
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and |or )?as NP_\\w+)',  # very broad: matches any "<NP> as <NP>"
                    'last'
                ),
                (
                    '((NP_\\w+ ?(, )?)+(and|or)? sort of NP_\\w+)',
                    'last'
                )
            ])

        self.__spacy_nlp = spacy.load('en_core_web_sm')
            

    def chunk(self, rawtext):
        doc = self.__spacy_nlp(rawtext)
        chunks = []
        for sentence in doc.sents:
            sentence_text = sentence.lemma_
            for chunk in sentence.noun_chunks:
                if chunk.lemma_.lower() == "example":
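                    # chunk.start/chunk.end are Doc-level indices, so look the
                    # neighbouring tokens up in doc (not in the sentence Span)
                    # and guard against running off either end of the document
                    start = chunk.start
                    pre_token = doc[start - 1].lemma_.lower() if start > 0 else ""
                    post_token = doc[chunk.end].lemma_.lower() if chunk.end < len(doc) else ""
                    if start > 0 and (pre_token == "for" or post_token == "of"):
                        continue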
                if chunk.lemma_.lower() == "type":
                    continue
                chunk_arr = []
                replace_arr = []
                for token in chunk:
                    # Remove stopword adjectives (generally quantifiers of
                    # plurals) and the abbreviations i.e. / e.g.
                    if token.lemma_ in self.__adj_stopwords + ["i.e.", "e.g."]:
                        continue
                    chunk_arr.append(token.lemma_)
                    # Remove punctuation from the NP_ replacement token
                    if token.lemma_.isalnum():
                        replace_arr.append(token.lemma_)
                    else:
                        replace_arr.append(''.join(
                            char for char in token.lemma_ if char.isalnum()
                        ))
                if len(chunk_arr) == 0:
                    chunk_arr.append(chunk[-1].lemma_)
                chunk_lemma = ' '.join(chunk_arr)
                replacement_value = 'NP_' + '_'.join(replace_arr)
                if chunk_lemma:
                    sentence_text = re.sub(r'\b%s\b' % re.escape(chunk_lemma),
                                           r'%s' % replacement_value,
                                           sentence_text)
            chunks.append(sentence_text)
        return chunks

    """
        This is the main entry point for this code.
        It takes as input the rawtext to process and returns a list
        of tuples (specific-term, general-term)
        where each tuple represents a hypernym pair.
    """
    
    def find_hyponyms(self, rawtext):

        hyponyms = []
        np_tagged_sentences = self.chunk(rawtext)

        for sentence in np_tagged_sentences:
            # Known limitation: two or more adjacent NPs should be merged into
            # a single NP (they indicate a chunking error); this is not handled here.

            for (hearst_pattern, parser) in self.__hearst_patterns:
                # re.search returns only the first (leftmost) match of each pattern
                matches = re.search(hearst_pattern, sentence)
                if matches:
                    match_str = matches.group(0)

                    nps = [a for a in match_str.split() if a.startswith("NP_")]

                    if parser == "first":
                        general = nps[0]
                        specifics = nps[1:]
                    else:
                        general = nps[-1]
                        specifics = nps[:-1]

                    for i in range(len(specifics)):
                        pair = (
                            self.clean_hyponym_term(specifics[i]),
                            self.clean_hyponym_term(general)
                        )
                        # reduce duplicates
                        if pair not in hyponyms:
                            hyponyms.append(pair)

        return hyponyms

    def clean_hyponym_term(self, term):
        # good point to do the stemming or lemmatization
        return term.replace("NP_", "").replace("_", " ")


Wall time: 0 ns
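To make the `NP_` representation used by the patterns easier to follow, the substitution step inside `chunk()` can be sketched without spaCy. This is a simplified, hypothetical helper (`tag_noun_phrases` is not part of the class): the lemmatised sentence and its noun-chunk lemmas are supplied by hand, chosen here to reproduce the second tagged sentence printed at the top of the notebook, and the adjective-stopword filtering is left out.

```python
import re

def tag_noun_phrases(lemma_text, chunk_lemmas):
    """Replace each noun-chunk lemma with a single NP_ token, as chunk() does.

    Simplification: chunk lemmas are given by hand instead of by spaCy,
    and the adjective-stopword filtering in chunk() is omitted.
    """
    for chunk_lemma in chunk_lemmas:
        # keep only the alphanumeric characters of each word, joined by underscores
        words = [''.join(ch for ch in word if ch.isalnum()) for word in chunk_lemma.split()]
        replacement = 'NP_' + '_'.join(words)
        # \b boundaries stop a chunk from matching inside an existing NP_ token,
        # because '_' is a word character
        lemma_text = re.sub(r'\b%s\b' % re.escape(chunk_lemma), replacement, lemma_text)
    return lemma_text

lemmas = "terrorist group like al Qaeda depend upon the aid or indifference of government ."
chunks = ["terrorist group", "al Qaeda", "the aid", "indifference", "government"]
print(tag_noun_phrases(lemmas, chunks))
# NP_terrorist_group like NP_al_Qaeda depend upon NP_the_aid or NP_indifference of NP_government .
```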


In [36]:
%%time
h = HearstPatterns(extended=True, merge = False)

Wall time: 1.08 s


In [37]:
hyponyms = h.find_hyponyms(orators["bush"][4])
print(len(hyponyms))
print(hyponyms)

14
[('an exceptional man', 'passenger'), ('al Qaeda', 'loosely affiliate terrorist organization'), ('woman', 'civilian'), ('child', 'civilian'), ('the Egyptian Islamic Jihad', 'country'), ('the Islamic Movement', 'country'), ('Afghanistan', 'place'), ('american citizen', 'all foreign national'), ('Egypt', 'muslim country'), ('Saudi Arabia', 'muslim country'), ('Jordan', 'muslim country'), ('the will', 'every value'), ('the United States', 'a hostile regime'), ('terrorism', 'a threat')]
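`find_hyponyms()` calls `re.search`, which returns only the leftmost match of each pattern within a sentence, so a second list introduced by the same pattern in the same sentence is dropped. Whether this affects the speeches has not been checked. A sketch of the difference between `re.search` and `re.finditer` on a hand-tagged, hypothetical sentence:

```python
import re

SUCH_AS = r'(NP_\w+ (, )?such as (NP_\w+ ?(, )?(and |or )?)+)'

def pairs_from_match(match_str):
    # 'first' parser: the first NP is the hypernym, the rest are hyponyms
    nps = [tok.replace("NP_", "").replace("_", " ")
           for tok in match_str.split() if tok.startswith("NP_")]
    return [(specific, nps[0]) for specific in nps[1:]]

# hypothetical sentence containing two 'such as' lists
tagged = ("NP_country such as NP_France and NP_Spain sign , "
          "and NP_fruit such as NP_apple and NP_pear grow .")

first_only = pairs_from_match(re.search(SUCH_AS, tagged).group(0))
all_matches = [pair for m in re.finditer(SUCH_AS, tagged)
               for pair in pairs_from_match(m.group(0))]

print(first_only)   # [('France', 'country'), ('Spain', 'country')]
print(all_matches)  # [('France', 'country'), ('Spain', 'country'), ('apple', 'fruit'), ('pear', 'fruit')]
```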


In [34]:
type(orators["bush"][4])

str

In [None]:
import os
import json

h = HearstPatterns(extended=True, merge = False)

dirpath = os.getcwd()
file = "last_docs.json"

with open(os.path.join(dirpath, file), "r") as f:
    last_docs = json.load(f)

for doc in last_docs:
    hyponyms = h.find_hyponyms(doc[1])
    #if len(hyponyms[1:]) != 3:
    print(doc[1])
    print(doc[0], '=>', hyponyms)
    print('----------')

In [37]:
%%time

h = HearstPatterns(extended=True, merge = False)

dirpath = r"C:\Users\Steve\OneDrive - University of Southampton\CNDPipeline\dataset\Tolstoy"
file = "warandpeace_testdata.json"

with open(os.path.join(dirpath, file), "r") as f:
    docs = json.load(f)
    
for doc in docs:
    hyponyms = h.find_hyponyms(doc[2])
    #if len(hyponyms[1:]) != 3:
    print(doc[2])
    print(doc[1])
    print(doc[0], '=>', hyponyms)
    print('----------')

The younger ones occupied themselves as before, some playing cards (there was plenty of money, though there was no food), some with more innocent games, such as quoits and skittles
True
such_as => [('quoit', 'innocent game'), ('skittle', 'innocent game')]
----------
The trench itself was the room, in which the lucky ones, such as the squadron commander, had a board, lying on piles at the end opposite the entrance, to serve as a table.
True
such_as => [('the squadron commander', 'the lucky one')]
----------
Through the hard century-old bark, even where there were no twigs, leaves had sprouted such as one could hardly believe the old veteran could have produced.
False
such_as => []
----------
Religion alone can explain to us what without its help man cannot comprehend: why, for what cause, kind and noble beings able to find happiness in life—not merely harming no one but necessary to the happiness of others—are called away to God, while cruel, useless, harmful persons, or such as are a b
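The labelled War and Peace sentences could be scored at sentence level. The sketch below assumes that the boolean printed for each sentence (`doc[1]`) marks whether it really contains a hypernym relation, and counts a prediction as positive whenever at least one pair is extracted; it does not check whether the extracted pairs themselves are correct. `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(records):
    """records: iterable of (gold_label, extracted_pairs) tuples."""
    tp = fp = fn = tn = 0
    for gold, pairs in records:
        predicted = len(pairs) > 0
        if predicted and gold:
            tp += 1
        elif predicted and not gold:
            fp += 1
        elif gold:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

# the first three War and Peace results printed above
records = [
    (True, [('quoit', 'innocent game'), ('skittle', 'innocent game')]),
    (True, [('the squadron commander', 'the lucky one')]),
    (False, []),
]
print(confusion_counts(records))
# {'tp': 2, 'fp': 0, 'fn': 0, 'tn': 1, 'precision': 1.0, 'recall': 1.0}
```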

In [14]:
import unittest

class TestHearstPatterns(unittest.TestCase):

    def test_hyponym_finder(self):
        h = HearstPatterns(extended=True)

        # H1
        hyps1 = h.find_hyponyms("Forty-four percent of patients with uveitis had one or more identifiable signs or symptoms, such as red eye, ocular pain, visual acuity, or photophobia, in order of decreasing frequency.")

        self.assertEqual(tuple(map(str.lower, hyps1[0])), ("red eye", "symptom"))
        self.assertEqual(tuple(map(str.lower, hyps1[1])), ("ocular pain", "symptom"))
        self.assertEqual(tuple(map(str.lower, hyps1[2])), ("visual acuity", "symptom"))
        self.assertEqual(tuple(map(str.lower, hyps1[3])), ("photophobia", "symptom"))

        # H2
        hyps2 = h.find_hyponyms("There are works by such authors as Herrick, Goldsmith, and Shakespeare.")
        self.assertEqual(tuple(map(str.lower, hyps2[0])), ("herrick", "author"))
        self.assertEqual(tuple(map(str.lower, hyps2[1])), ("goldsmith", "author"))
        self.assertEqual(tuple(map(str.lower, hyps2[2])), ("shakespeare", "author"))

        # H3
        hyps3 = h.find_hyponyms("There were bruises, lacerations, or other injuries were not prevalent.")
        self.assertEqual(tuple(map(str.lower, hyps3[0])), ("bruise", "injury"))
        self.assertEqual(tuple(map(str.lower, hyps3[1])), ("laceration", "injury"))

        # H4
        hyps4 = h.find_hyponyms("common law countries, including Canada, Australia, and England enjoy toast.")
        self.assertEqual(tuple(map(str.lower, hyps4[0])), ("canada", "common law country"))
        self.assertEqual(tuple(map(str.lower, hyps4[1])), ("australia", "common law country"))
        self.assertEqual(tuple(map(str.lower, hyps4[2])), ("england", "common law country"))

        # H5
        hyps5 = h.find_hyponyms("Many countries, especially France, England and Spain also enjoy toast.")
        self.assertEqual(tuple(map(str.lower, hyps5[0])), ("france", "country"))
        self.assertEqual(tuple(map(str.lower, hyps5[1])), ("england", "country"))
        self.assertEqual(tuple(map(str.lower, hyps5[2])), ("spain", "country"))

        # H2
        hyps6 = h.find_hyponyms("There are such benefits as postharvest losses reduction, food increase and soil fertility improvement.")
        self.assertEqual(tuple(map(str.lower, hyps6[0])), ("postharvest loss reduction", "benefit"))
        self.assertEqual(tuple(map(str.lower, hyps6[1])), ("food increase", "benefit"))
        self.assertEqual(tuple(map(str.lower, hyps6[2])), ("soil fertility improvement", "benefit"))

        # H'1
        hyps7 = h.find_hyponyms("Fruits, i.e. , apples, bananas, oranges and peaches.")
        self.assertEqual(tuple(map(str.lower, hyps7[0])), ("apple", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps7[1])), ("banana", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps7[2])), ("orange", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps7[3])), ("peach", "fruit"))

        hyps7 = h.find_hyponyms("Fruits, e.g. apples, bananas, oranges and peaches.")
        self.assertEqual(tuple(map(str.lower, hyps7[0])), ("apple", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps7[1])), ("banana", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps7[2])), ("orange", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps7[3])), ("peach", "fruit"))

        # H'2

        hyps10 = h.find_hyponyms("Fruits (e.g. apples, bananas, oranges and peaches.)")
        self.assertEqual(tuple(map(str.lower, hyps10[0])), ("apple", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps10[1])), ("banana", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps10[2])), ("orange", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps10[3])), ("peach", "fruit"))

        hyps10 = h.find_hyponyms("Fruits (i.e. apples, bananas, oranges and peaches.)")
        self.assertEqual(tuple(map(str.lower, hyps10[0])), ("apple", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps10[1])), ("banana", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps10[2])), ("orange", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps10[3])), ("peach", "fruit"))

        # H'3
        hyps8 = h.find_hyponyms("Fruits, for example apples, bananas, oranges and peaches.")
        self.assertEqual(tuple(map(str.lower, hyps8[0])), ("apple", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps8[1])), ("banana", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps8[2])), ("orange", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps8[3])), ("peach", "fruit"))

        # H'4
        hyps9 = h.find_hyponyms("Fruits, which may include apples, bananas, oranges and peaches.")
        self.assertEqual(tuple(map(str.lower, hyps9[0])), ("apple", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps9[1])), ("banana", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps9[2])), ("orange", "fruit"))
        self.assertEqual(tuple(map(str.lower, hyps9[3])), ("peach", "fruit"))


if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

E
ERROR: test_hyponym_finder (__main__.TestHearstPatterns)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-14-71766177d2a3>", line 47, in test_hyponym_finder
    self.assertEqual(tuple(map(str.lower, hyps7[0])), ("apple", "fruit"))
IndexError: list index out of range

----------------------------------------------------------------------
Ran 1 test in 1.191s

FAILED (errors=1)


In [5]:
import unittest

class TestHearstPatterns(unittest.TestCase):

    def test_hyponym_finder(self):
        h = HearstPatterns()
        hyps1 =  h.find_hyponyms("Forty-four percent of patients with uveitis had one or more identifiable signs or symptoms, such as red eye, ocular pain, visual acuity, or photophobia, in order of decreasing frequency.")
        self.assertEqual(hyps1[0], ("red eye", "symptom"))
        self.assertEqual(hyps1[1], ("ocular pain", "symptom"))
        self.assertEqual(hyps1[2], ("visual acuity", "symptom"))
        self.assertEqual(hyps1[3], ("photophobia", "symptom"))

        hyps2 = h.find_hyponyms("There are works by such authors as Herrick, Goldsmith, and Shakespeare.")
        self.assertEqual(hyps2[0], ("herrick", "author"))
        self.assertEqual(hyps2[1], ("goldsmith", "author"))
        self.assertEqual(hyps2[2], ("shakespeare", "author"))

        hyps3 = h.find_hyponyms("There were bruises, lacerations, or other injuries were not prevalent.")
        self.assertEqual(hyps3[0], ("bruise", "injury"))
        self.assertEqual(hyps3[1], ("laceration", "injury"))

        hyps4 =  h.find_hyponyms("common law countries, including Canada, Australia, and England enjoy toast.")
        self.assertEqual(hyps4[0], ("canada", "common law country"))
        self.assertEqual(hyps4[1], ("australia", "common law country"))
        self.assertEqual(hyps4[2], ("england", "common law country"))

        hyps5 = h.find_hyponyms("Many countries, especially France, England and Spain also enjoy toast.")
        self.assertEqual(hyps5[0], ("france", "country"))
        self.assertEqual(hyps5[1], ("england", "country"))
        self.assertEqual(hyps5[2], ("spain", "country"))

        hyps6 = h.find_hyponyms("There are such benefits as postharvest losses reduction, food increase and soil fertility improvement.")
        self.assertEqual(hyps6[0], ("postharvest loss reduction", "benefit"))
        self.assertEqual(hyps6[1], ("food increase", "benefit"))
        self.assertEqual(hyps6[2], ("soil fertility improvement", "benefit"))

if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

F
FAIL: test_hyponym_finder (__main__.TestHearstPatterns)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-5-88c88a93b418>", line 14, in test_hyponym_finder
    self.assertEqual(hyps2[0], ("herrick", "author"))
AssertionError: Tuples differ: ('Herrick', 'author') != ('herrick', 'author')

First differing element 0:
'Herrick'
'herrick'

- ('Herrick', 'author')
?   ^

+ ('herrick', 'author')
?   ^


----------------------------------------------------------------------
Ran 1 test in 1.183s

FAILED (failures=1)
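The two test runs fail in different ways: In [14] raises an IndexError when a pattern returns fewer pairs than expected, and In [5] fails because extracted terms keep their original capitalisation. A sketch of a hypothetical assertion helper (`normalise` and `assertPairsEqual` are not part of the original tests) that lower-cases both sides and compares whole lists, so a short or empty result shows up as an ordinary assertion failure with a list diff:

```python
import unittest

def normalise(pairs):
    """Lower-case both terms of every (hyponym, hypernym) pair."""
    return [tuple(term.lower() for term in pair) for pair in pairs]

class PairAssertions(unittest.TestCase):
    def assertPairsEqual(self, found, expected):
        # one comparison over the whole list: missing pairs produce a readable
        # diff instead of an IndexError on found[i]
        self.assertEqual(normalise(found), normalise(expected))

class Demo(PairAssertions):
    def test_case_insensitive(self):
        self.assertPairsEqual([('Herrick', 'author')], [('herrick', 'author')])

    def test_missing_pair_is_reported_as_failure(self):
        # an empty result raises AssertionError (a test failure), not IndexError
        with self.assertRaises(AssertionError):
            self.assertPairsEqual([], [('herrick', 'author')])

if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)
```

In the existing tests, `PairAssertions` would replace `unittest.TestCase` as the base class, with one `assertPairsEqual(h.find_hyponyms(...), [...])` call per sentence.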
