# spaCy Hearst Patterns
---

In this experiment we test the utility of Hearst Patterns for detecting the ingroup and outgroup of a text.

This experiment uses the spaCy Matcher, with patterns adapted from: https://github.com/mmichelsonIF/hearst_patterns_python/blob/master/hearstPatterns/hearstPatterns.py

Hypernym relations are semantic relationships between two concepts: C1 is a hypernym of C2 if C1 categorizes C2 (e.g. “instrument” is a hypernym of “piano”). For this research, the phrase "America has enemies, such as Al Qaeda and the Taliban" would return `[('Al Qaeda', 'enemy'), ('the Taliban', 'enemy')]`. In this example, the categorising term 'enemy' is a hypernym of both 'Al Qaeda' and 'the Taliban'; conversely, 'Al Qaeda' and 'the Taliban' are hyponyms of 'enemy'. Using this technique, hypernym terms could be classified as ingroup or outgroup, and the named entities identified as their hyponyms could then be assigned to the same group.
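
As a minimal sketch of that final classification step, assuming hand-picked seed lists of ingroup and outgroup hypernyms, hyponyms could be grouped as shown below. The names `INGROUP_HYPERNYMS`, `OUTGROUP_HYPERNYMS` and `classify_hyponyms` are hypothetical and are not part of the experiment's code; the pairs follow the (hyponym, hypernym) order of the example above.

```python
from collections import defaultdict

# Hypothetical seed terms for illustration only; a real experiment
# would need curated lists of ingroup and outgroup hypernyms.
INGROUP_HYPERNYMS = {"friend", "ally"}
OUTGROUP_HYPERNYMS = {"enemy", "terrorist"}


def classify_hyponyms(pairs):
    """Group hyponyms by the group label of their hypernym."""
    groups = defaultdict(list)
    for hyponym, hypernym in pairs:
        if hypernym in INGROUP_HYPERNYMS:
            groups["ingroup"].append(hyponym)
        elif hypernym in OUTGROUP_HYPERNYMS:
            groups["outgroup"].append(hyponym)
        else:
            # hypernym not covered by either seed list
            groups["unknown"].append(hyponym)
    return dict(groups)


pairs = [("Al Qaeda", "enemy"), ("the Taliban", "enemy"), ("Canada", "friend")]
print(classify_hyponyms(pairs))
```

A lookup of this kind only works if the hypernym is reduced to the same form as the seed terms (for example its lemma), which is an assumption of the sketch.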

## Set Up the spaCy Pipeline

In [30]:
%%time

# class cna_pipe(object):

#     import spacy

#     def __init__(self):
        
#         self.nlp = spacy.load("en_core_web_md")
        
#         for component in self.nlp.pipe_names:
#             if component not in ['tagger', "parser", "ner"]:
#                 self.nlp.remove_pipe(component)
        
#         merge_ents = self.nlp.create_pipe("merge_entities")
#         self.nlp.add_pipe(merge_ents)
        
#     def __call__(self, text):
        
#         doc = self.nlp(text)
        
#         return doc
    
# nlp = cna_pipe()

import spacy

nlp = spacy.load("en_core_web_md")

for component in nlp.pipe_names:
    if component not in ['tagger', "parser", "ner"]:
        nlp.remove_pipe(component)

merge_ents = nlp.create_pipe("merge_entities")
nlp.add_pipe(merge_ents)



Wall time: 32.9 s


In [36]:
%%time
import os

class Orator:
    
    """ 
    This is the Orator object which refers to the person giving the speech
    """
    
    def __init__(self, ref = '', name = ''):
        
        self.ref = ref
        
        self.name = name
        
        self.filenames = []
        
        self.texts = []
        
    def __len__(self):
        return len(self.texts)

path  = r"C:\Users\Steve\OneDrive - University of Southampton\CulturalViolence\KnowledgeBases\Speeches"

# access the speeches directory
def initiate_dataset(filepath):
    
    """ 
    this function initiates the dataset by collating speech texts from the directory associated
    with each orator
    
    function returns a dict object with each entry referring to an orator
    
    format:
    
    {"surname" : Orator Object}
    
    """
    
    orators_dict = dict()
    
    for dirpath , dirnames, _ in os.walk(filepath): 
    
        # iterate through the folders in the speeches directory, which relates to each orator
        for orator_dir in dirnames: 
            # initiate orator object and add to orators dict()
            surname = orator_dir.split()[-1].lower()
            orators_dict[surname] = Orator(name  = orator_dir, ref = surname)


            # get the filenames in each orators folder
            for _, _, filenames in os.walk(os.path.join(dirpath, orator_dir)): 

                # iterate through the files
                for file in filenames: 

                    with open(os.path.join(dirpath, orator_dir, file), 'r') as text:

                        if os.path.splitext(file)[1] == ".txt" and (file[:8]).isnumeric(): #check whether file meets speech filename format requirement
                            orators_dict[surname].filenames.append(file)
                            orators_dict[surname].texts.append(text.read())
                            
    return orators_dict



# keep a reference to the dataset so that later cells can use it
orators = initiate_dataset(path)

for obj in orators.values():
    print(obj.name, 'contains', len(obj), "speeches")

George Bush contains 15 speeches
Martin Luther King contains 5 speeches
Osama bin Laden contains 7 speeches
Wall time: 103 ms
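
The filename filter used in `initiate_dataset` can be looked at in isolation. The sketch below repeats the same test (a `.txt` extension and a numeric eight-character prefix) in a stand-alone helper; `is_speech_file` and the sample filenames are hypothetical, and reading the prefix as a date is an assumption.

```python
import os


def is_speech_file(filename):
    """Return True for .txt files whose first eight characters are numeric."""
    extension = os.path.splitext(filename)[1]
    return extension == ".txt" and filename[:8].isnumeric()


# Hypothetical filenames, used only to exercise the check
candidates = ["20010911_address.txt", "notes.txt", "20010920.pdf"]
print([name for name in candidates if is_speech_file(name)])
```

Running the check before opening a file would also avoid reading files that are later discarded.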


In [32]:
%%time

# Hearst patterns take the form of (NP <predicate> (NP (and | or)?)+)

class hearst_patterns(object):
    
    """ Hearst Patterns is a class object used to detects hypernym relations to hyponyms in a text
    
    input: raw text
    returns: list of dict object with each entry all the hypernym-hyponym pairs of a text
    entry format: ["predicate" : [(hyponym, hypernym), (hyponym, hypernym), ..]]
    
    """
    
    import spacy    
    
    def __init__(self, extended=False):
        
        from spacy.matcher import Matcher
        
        # make the patterns easier to read
        hypernym = {"POS" : {"IN": ["NOUN", "PROPN"]}} 
        hyponym = {"POS" : {"IN": ["NOUN", "PROPN"]}}
        punct = {"IS_PUNCT": True, "OP": "?"}
        
        self.patterns = [
            
            # Included in each entry is the original regex pattern now adapted as spaCy patterns
            # Many of these patterns are in the same format. The next iteration of the code will include an
            # automatic pattern generator for patterns of the same format.
            # these patterns need cleaning up and testing.
            
            # format for the dict entry of each pattern
            # {
            #  "label" : predicate, 
            #  "pattern" : spaCy pattern, 
            #  "posn" : first/last depending on whether the hypernym appears before its hyponym
            #  }
            
            {"label" : "such_as", "pattern" : [
#                 '(NP_\\w+ (, )?such as (NP_\\w+ ?(, )?(and |or )?)+)',
#                 'first'
                 hypernym, punct, {"LEMMA": "such"}, {"LEMMA": "as"}, hyponym
            ], "posn" : "first"},

            {"label" : "know_as", "pattern" : [
#                 '(NP_\\w+ (, )?know as (NP_\\w+ ?(, )?(and |or )?)+)', # added for this experiment
#                 'first'
                 hypernym, punct, {"LEMMA": "know"}, {"LEMMA": "as"}, hyponym
            ], "posn" : "first"},

            {"label" : "such_NOUN_as", "pattern" : [
#                 '(such NP_\\w+ (, )?as (NP_\\w+ ?(, )?(and |or )?)+)',
#                 'first'
                 {"LEMMA": "such"}, hypernym, punct, {"LEMMA": "as"}, hyponym
            ], "posn" : "first"},

            {"label" : "include", "pattern" : [
#                 '(NP_\\w+ (, )?include (NP_\\w+ ?(, )?(and |or )?)+)',
#                 'first'
                 hypernym, punct, {"LEMMA" : "include"}, hyponym
            ], "posn" : "first"},

            {"label" : "especially", "pattern" : [ ## problem - especially is merged as a modifier into a noun phrase
#                 '(NP_\\w+ (, )?especially (NP_\\w+ ?(, )?(and |or )?)+)',
#                 'first'
                 hypernym, punct, {"LEMMA" : "especially"}, hyponym
            ], "posn" : "first"},
            
            {"label" : "and-or_other", "pattern" : [ ## problem - other is merged as a modifier into a noun phrase
#                 '((NP_\\w+ ?(, )?)+(and |or )?other NP_\\w+)',
#                 'last'
                 hyponym, punct, {"DEP": "cc"}, {"LEMMA" : "other"}, hypernym
            ], "posn" : "last"},

            ]
    
        if extended:
            self.patterns.extend([
                
                {"label" : "which_may_include", "pattern" : [
#                     '(NP_\\w+ (, )?which may include (NP_\\w+ '
#                     '?(, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "which"}, {"LEMMA" : "may"}, {"LEMMA" : "include"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "which_be_similar_to", "pattern" : [
#                     '(NP_\\w+ (, )?which be similar to (NP_\\w+ ? '
#                     '(, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "which"}, {"LEMMA" : "be"}, {"LEMMA" : "similar"}, {"LEMMA" : "to"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "example_of_this_be", "pattern" : [
#                     '(NP_\\w+ (, )?example of this be (NP_\\w+ ? '
#                     '(, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "example"}, {"LEMMA" : "of"}, {"LEMMA" : "this"}, {"LEMMA" : "be"}, hyponym
                ], "posn" : "first"},
                
                {"label" : ",type", "pattern" : [
#                     '(NP_\\w+ (, )?type (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "type"}, punct, hyponym
                ], "posn" : "first"},
                
                {"label" : "mainly", "pattern" : [
#                     '(NP_\\w+ (, )?mainly (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "mainly"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "mostly", "pattern" : [
#                     '(NP_\\w+ (, )?mostly (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "mostly"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "notably", "pattern" : [
#                     '(NP_\\w+ (, )?notably (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "notably"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "particularly", "pattern" : [
#                     '(NP_\\w+ (, )?particularly (NP_\\w+ ? '
#                     '(, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "particularly"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "principally", "pattern" : [
#                     '(NP_\\w+ (, )?principally (NP_\\w+ ? (, )?(and |or )?)+)', - fuses in a noun phrase
#                     'first'
                    hypernym, punct, {"LEMMA" : "principally"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "in_particular", "pattern" : [
#                     '(NP_\\w+ (, )?in particular (NP_\\w+ ? '
#                     '(, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "in"}, {"LEMMA" : "particular"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "except", "pattern" : [
#                     '(NP_\\w+ (, )?except (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "except"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "other_than", "pattern" : [
#                     '(NP_\\w+ (, )?other than (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "other"}, {"LEMMA" : "than"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "eg", "pattern" : [
#                     '(NP_\\w+ (, )?e.g. (, )?(NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : {"IN" : ["e.g.", "eg"]}}, hyponym 
                ], "posn" : "first"},
                
#                 {"label" : "eg-ie", "pattern" : [ 
# #                     '(NP_\\w+ \\( (e.g.|i.e.) (, )?(NP_\\w+ ? (, )?(and |or )?)+' - need to understand this pattern better
# #                     '(\\. )?\\))',
# #                     'first'
#                     hypernym, punct, {"LEMMA" : {IN : ["e.g.", "i.e.", "eg", "ie"]}}, {"LEMMA" : "than"}, hyponym
#                 ]},

                {"label" : "ie", "pattern" : [
#                     '(NP_\\w+ (, )?i.e. (, )?(NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : {"IN" : ["i.e.", "ie"]}}, hyponym 
                ], "posn" : "first"},
                
                {"label" : "for_example", "pattern" : [
#                     '(NP_\\w+ (, )?for example (, )?'
#                     '(NP_\\w+ ?(, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "for"}, {"LEMMA" : "example"}, punct, hyponym
                ], "posn" : "first"},
                
                {"label" : "example_of_be", "pattern" : [
#                     'example of (NP_\\w+ (, )?be (NP_\\w+ ? '
#                     '(, )?(and |or )?)+)',
#                     'first'
                    {"LEMMA" : "example"}, {"LEMMA" : "of"}, hypernym, punct, {"LEMMA" : "be"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "like", "pattern" : [
#                     '(NP_\\w+ (, )?like (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "like"}, hyponym,
                ], "posn" : "first"},

                # repeat of such_as pattern in primary patterns???
#                     'such (NP_\\w+ (, )?as (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                
                    {"label" : "whether", "pattern" : [
#                     '(NP_\\w+ (, )?whether (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "whether"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "compare_to", "pattern" : [
#                     '(NP_\\w+ (, )?compare to (NP_\\w+ ? (, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "compare"}, {"LEMMA" : "to"}, hyponym 
                ], "posn" : "first"},
                
                {"label" : "among_-PRON-", "pattern" : [
#                     '(NP_\\w+ (, )?among -PRON- (NP_\\w+ ? '
#                     '(, )?(and |or )?)+)',
#                     'first'
                    hypernym, punct, {"LEMMA" : "among"}, {"LEMMA" : "-PRON-"}, hyponym
                ], "posn" : "first"},
                
                {"label" : "for_instance", "pattern" : [
#                     '(NP_\\w+ (, )? (NP_\\w+ ? (, )?(and |or )?)+ '
#                     'for instance)',
#                     'first'
                    hypernym, punct, hyponym, {"LEMMA" : "for"}, {"LEMMA" : "instance"}
                ], "posn" : "first"},
                
                {"label" : "and-or_any_other", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?any other NP_\\w+)',
#                     'last'
                    hyponym, punct, {"DEP": "cc"}, {"LEMMA" : "any"}, {"LEMMA" : "other"}, hypernym,
                ], "posn" : "last"},
                
                {"label" : "some_other", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?some other NP_\\w+)',
#                     'last'
                    hyponym, punct, {"DEP": "cc", "OP" : "?"}, {"LEMMA" : "some"}, {"LEMMA" : "other"}, hypernym,
                ], "posn" : "last"},
                
                {"label" : "be_a", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?be a NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "be"}, {"LEMMA" : "a"}, hypernym,
                ], "posn" : "last"},

                {"label" : "like_other", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?like other NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "like"}, {"LEMMA" : "other"}, hypernym,
                ], "posn" : "last"},

                 {"label" : "one_of_the", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?one of the NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "one"}, {"LEMMA" : "of"}, {"LEMMA" : "the"}, hypernym,
                ], "posn" : "last"},
                
                {"label" : "one_of_these", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?one of these NP_\\w+)',
#                     'last'
                hyponym, punct, {"LEMMA" : "one"}, {"LEMMA" : "of"}, {"LEMMA" : "these"}, hypernym,
                ], "posn" : "last"},
                
                {"label" : "one_of_those", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?one of those NP_\\w+)',
#                     'last'
                hyponym, punct, {"DEP": "cc", "OP" : "?"}, {"LEMMA" : "one"}, {"LEMMA" : "of"}, {"LEMMA" : "those"}, hypernym,
                ], "posn" : "last"},
                
                {"label" : "be_example_of", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?be example of NP_\\w+)', added optional "an" to spaCy pattern for singular vs. plural
#                     'last'
                    hyponym, punct, {"LEMMA" : "be"}, {"LEMMA" : "an", "OP" : "?"}, {"LEMMA" : "example"}, {"LEMMA" : "of"}, hypernym
                ], "posn" : "last"},
               
                {"label" : "which_be_call", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?which be call NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "which"}, {"LEMMA" : "be"}, {"LEMMA" : "call"}, hypernym
                ], "posn" : "last"},
#               
                {"label" : "which_be_name", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?which be name NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "which"}, {"LEMMA" : "be"}, {"LEMMA" : "name"}, hypernym
                ], "posn" : "last"},
                    
                {"label" : "a_kind_of", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and|or)? a kind of NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "a", "OP" : "?"}, {"LEMMA" : "kind"}, {"LEMMA" : "of"}, hypernym
                ], "posn" : "last"},
                
#                     '((NP_\\w+ ?(, )?)+(and|or)? kind of NP_\\w+)', - combined with above
#                     'last'
               
                {"label" : "form_of", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and|or)? form of NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "a", "OP" : "?"}, {"LEMMA" : "form"}, {"LEMMA" : "of"}, hypernym
                ], "posn" : "last"},
                
                {"label" : "which_look_like", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?which look like NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "which"}, {"LEMMA" : "look"}, {"LEMMA" : "like"}, hypernym
                ], "posn" : "last"},
                
                {"label" : "which_sound_like", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?which sound like NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "which"}, {"LEMMA" : "sound"}, {"LEMMA" : "like"}, hypernym
                ], "posn" : "last"},
                
                {"label" : "type", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )? NP_\\w+ type)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "type"}, hypernym
                ], "posn" : "last"},
                                
                {"label" : "compare_with", "pattern" : [
#                     '(compare (NP_\\w+ ?(, )?)+(and |or )?with NP_\\w+)',
#                     'last'
                    {"LEMMA" : "compare"}, hyponym, punct, {"LEMMA" : "with"}, hypernym
                ], "posn" : "last"},
                
                {"label" : "as", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and |or )?as NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "as"}, hypernym
                ], "posn" : "last"},
                
                {"label" : "sort_of", "pattern" : [
#                     '((NP_\\w+ ?(, )?)+(and|or)? sort of NP_\\w+)',
#                     'last'
                    hyponym, punct, {"LEMMA" : "sort"}, {"LEMMA" : "of"}, hypernym
                ], "posn" : "last"},
                
            ])
        
        ## initiate matcher
        self.matcher = Matcher(nlp.vocab, validate = True)
        
        self.predicate_set = set()
        self.predicates = []
        self.first = []
        self.last = []

        for pattern in self.patterns:
            self.matcher.add(pattern["label"], None, pattern["pattern"])
            
            # gather list of predicate terms for the noun_chunk deconfliction
            self.predicate_set.update(pattern["label"].split('_'))
            self.predicates.append(pattern["label"].split('_'))
            
            # gather list of predicates where the hypernym appears first
            if pattern["posn"] == "first":
                self.first.append(pattern["label"])
                
            # gather list of predicates where the hypernym appears last
            if pattern["posn"] == "last":
                self.last.append(pattern["label"])
   
    def find_hyponyms(self, text):
        
        """
        this is the main function of the object
        
        follows logic of:
        1. checks whether text has been parsed
        2. pre-processing for noun_chunks
        3. generate matches
        4. create list of dict objects, one per match
        """
        
        #print(self.predicates)
        
        from spacy.tokens import Doc
            
        pairs = [] # list of dicts, one per pattern match
        
        if isinstance(text, Doc):
            doc = text
        else:
            doc = nlp(text) # parse raw text into a Doc
        
        ## Pre-processing
        # Some predicate terms, such as "particularly", "especially" and "some other", are
        # merged into the noun phrase. Such terms are part of the pattern but become part of the
        # merged noun_chunk; consequently, they are not detected by the matcher.
        # This pre-processing, therefore, walks through the noun_chunks of a doc object, removes those
        # predicate terms from each noun_chunk and merges the result.
                
        #try:
        with doc.retokenize() as retokenizer:

        #iterate through the noun_chunks
            for chunk in doc.noun_chunks:

                attrs = {"tag": chunk.root.tag, "dep": chunk.root.dep}
                count = 0

                # iterate through all predicate terms
                for predicate in self.predicates:

                    # count how many leading tokens of the noun_chunk match the leading words
                    # of this predicate, restarting from zero for each predicate and never
                    # reading past the end of the chunk
                    matched = 0
                    while (matched < len(predicate) and chunk.start + matched < chunk.end
                           and doc[chunk.start + matched].lower_ == predicate[matched]):
                        matched += 1

                    # keep the longest predicate prefix found so far
                    count = max(count, matched)

                # Create a new noun_chunk excluding the number of tokens detected as part of
                # a predicate phrase; skip chunks that consist only of predicate terms.
                #print("result: ", chunk, " becomes ", doc[chunk.start + count : chunk.end])
                if chunk.start + count < chunk.end:
                    retokenizer.merge(doc[chunk.start + count : chunk.end], attrs = attrs)
                #print(chunk.sent)
#         except:
#             print("failed at with")
                
        # Find matches in doc
        matches = self.matcher(doc)
        
        # If none are found then return the empty list
        if not matches:
            return pairs

        for match_id, start, end in matches:
            predicate = nlp.vocab.strings[match_id]
            
            if predicate in self.last: # if the predicate is in the list where the hypernym is last
                hypernym = doc[end - 1]
                hyponym = doc[start]
            else:
                hypernym = doc[start] # if the predicate is in the list where the hypernym is first
                hyponym = doc[end - 1]

#             if predicate in list(pairs.keys()): #check for double entries
#                 continue
#             else:
            # create a dict with the format:
            # {"predicate": pattern label, "pairs": [(hypernym, hyponym)] + one pair per hyponym conjunct (tokens linked by and | or), "sent": sentence text}
            
            pairs.append(dict({"predicate" : predicate, 
                               "pairs" : [(hypernym, hyponym)] + [(hypernym, token) for token in hyponym.conjuncts if token != hypernym],
                               "sent" : (hyponym.sent.text).strip()}))

        return pairs
    
h = hearst_patterns(extended = True)

docs = [
    "We are hunting for terrorist groups, particularly the Taliban and al Qaeda",
    "We are hunting for the IRA, ISIS, al Qaeda and some other terrorist groups, especially the Taliban, Web Scientists and particularly Southampton University"
]

def show_hyps(lst):
    
   
    for i, text in enumerate(lst):
        print('########### ', i)
        hypernyms = h.find_hyponyms(text)

        if hypernyms:
            for hypernym in hypernyms:
                print(hypernym["sent"])
                print(hypernym["predicate"], '=>', hypernym["pairs"])
                print()
                
        
    
        print('-----')

show_hyps(docs)

###########  0
We are hunting for terrorist groups, particularly the Taliban and al Qaeda
particularly => [(terrorist groups, the Taliban), (terrorist groups, al Qaeda)]

-----
###########  1
We are hunting for the IRA, ISIS, al Qaeda and some other terrorist groups, especially the Taliban, Web Scientists and particularly Southampton University
some_other => [(terrorist groups, al Qaeda), (terrorist groups, the IRA), (terrorist groups, ISIS)]

We are hunting for the IRA, ISIS, al Qaeda and some other terrorist groups, especially the Taliban, Web Scientists and particularly Southampton University
especially => [(terrorist groups, the Taliban), (terrorist groups, Web Scientists), (terrorist groups, Southampton University)]

-----
Wall time: 118 ms
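
The comments in the pattern list note that many entries share one format and that a later iteration should generate them automatically. A minimal sketch of such a generator follows; it is not part of the original code. `make_pattern`, `NOUN` and `PUNCT` are hypothetical names, and the sketch covers only entries of the shape noun, optional punctuation, predicate lemmas, noun. Entries that need a `"DEP": "cc"` token or optional lemmas are outside its scope.

```python
# Token specs mirroring the ones defined in hearst_patterns.__init__
NOUN = {"POS": {"IN": ["NOUN", "PROPN"]}}
PUNCT = {"IS_PUNCT": True, "OP": "?"}


def make_pattern(predicate, posn="first"):
    """Build one pattern entry from a space-separated string of predicate lemmas.

    posn is "first" when the hypernym precedes the predicate and "last"
    when it follows it; the token sequence is the same in both cases
    because hypernym and hyponym use the same token spec.
    """
    lemmas = [{"LEMMA": lemma} for lemma in predicate.split()]
    return {
        "label": predicate.replace(" ", "_"),
        "pattern": [NOUN, PUNCT] + lemmas + [NOUN],
        "posn": posn,
    }


generated = [make_pattern("such as"), make_pattern("one of the", posn="last")]
for entry in generated:
    print(entry["label"], entry["posn"], len(entry["pattern"]))
```

Each generated entry has the same keys as the hand-written ones, so a list built this way could be passed to the matcher through the existing loop over `self.patterns`.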


In [33]:
%%time

# create a list of docs
docs = [
    "Forty-four percent of patients with uveitis had one or more identifiable signs or symptoms, such as red eye, ocular pain, visual acuity, or photophobia, in order of decreasing frequency.",
    "Other close friends, including Canada, Australia, Germany and France, have pledged forces as the operation unfolds.",
    "The evidence we have gathered all points to a collection of loosely affiliated terrorist organizations known as al Qaeda.",
    "Terrorist groups like al Qaeda depend upon the aid or indifference of governments.",
    "This new law that I sign today will allow surveillance of all communications used by terrorists, including e-mails, the Internet, and cell phones.",
    "From this day forward, any nation that continues to harbor or support terrorism will be regarded by the United States as a hostile regime.",
    "We are looking out for the Taliban, al Qaeda and other terrorist groups",
    "We are looking out for al Qaeda and other terrorist groups, especially the Taliban and the muppets"
]
show_hyps(docs)

###########  0
Forty-four percent of patients with uveitis had one or more identifiable signs or symptoms, such as red eye, ocular pain, visual acuity, or photophobia, in order of decreasing frequency.
such_as => [(symptoms, red eye), (symptoms, ocular pain), (symptoms, visual acuity), (symptoms, photophobia)]

-----
###########  1
Other close friends, including Canada, Australia, Germany and France, have pledged forces as the operation unfolds.
include => [(close friends, Canada), (close friends, Australia), (close friends, Germany), (close friends, France)]

Other close friends, including Canada, Australia, Germany and France, have pledged forces as the operation unfolds.
as => [(the operation, forces)]

-----
###########  2
The evidence we have gathered all points to a collection of loosely affiliated terrorist organizations known as al Qaeda.
know_as => [(loosely affiliated terrorist organizations, al Qaeda)]

-----
###########  3
Terrorist groups like al Qaeda depend upon the aid 

In [34]:
show_hyps(orators["bush"].texts)

###########  0
Our unity is a kinship of grief and a steadfast resolve to prevail against our enemies.
be_a => [(kinship, Our unity)]

America is a nation full of good fortune, with so much to be grateful for, but we are not spared from suffering.
be_a => [(nation, America)]

-----
###########  1
-----
###########  2
Like the good folks standing with me, the American people were appalled and outraged at last Tuesday's attacks.
like => [(hand, the good folks)]

-----
###########  3
We have seen it in the courage of passengers, who rushed terrorists to save others on the ground -- passengers like an exceptional man named Todd Beamer.
like => [(passengers, an exceptional man)]

The evidence we have gathered all points to a collection of loosely affiliated terrorist organizations known as al Qaeda.
know_as => [(loosely affiliated terrorist organizations, al Qaeda)]

The terrorists' directive commands them to kill Christians and Jews, to kill all Americans, and make no distinctions among mi

ZeroDivisionError: division by zero