# Relation Extraction

## MinIE: Open Information Extraction system

##### Note: upgraded the CoreNLP version and added the dependencies to pom.xml
> https://github.com/uma-pi1/minie, https://www.aclweb.org/anthology/D17-1278/

In [21]:
import os
os.environ['CLASSPATH'] = '../../miniepy/minie-0.0.1-SNAPSHOT.jar'
from miniepy import MinIE  # explicit import instead of a wildcard

minie = MinIE()

In [31]:
minie.get_propositions(text)[0].subject

'Batman'

In [22]:
text = "The Joker believes that the hero Batman was not actually born in foggy Gotham City."
triples = [p.triple for p in minie.get_propositions(text)]

In [3]:
print("Original text:")
print('\t{}\n'.format(text))

print("Extracted triples:")
for t in triples:
    print("\t{}".format(t))

Original text:
	The Joker believes that the hero Batman was not actually born in foggy Gotham City.

Extracted triples:
	('Batman', 'is', 'hero')
	('Batman', 'was born in', 'foggy Gotham City')
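
Because each extracted triple is a plain `(subject, relation, object)` tuple, the results can be indexed by subject for later lookup. The following is a minimal sketch that assumes the tuple layout shown in the output above; `group_by_subject` and `sample_triples` are hypothetical names introduced here and are not part of miniepy.

```python
from collections import defaultdict


def group_by_subject(triples):
    """Map each subject to the (relation, object) pairs extracted for it."""
    grouped = defaultdict(list)
    for subject, relation, obj in triples:
        grouped[subject].append((relation, obj))
    return dict(grouped)


# Triples copied from the MinIE output above.
sample_triples = [
    ('Batman', 'is', 'hero'),
    ('Batman', 'was born in', 'foggy Gotham City'),
]

facts = group_by_subject(sample_triples)
print(facts)
```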


### Larger Text

In [4]:
starwars_text = 'Anakin Skywalker , is a fictional character in the Star Wars franchise. Anakin Skywalker appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while Anakin Skywalker past as Anakin Skywalker and the story of Anakin Skywalker corruption are central to the narrative of the original film trilogy. Anakin Skywalker was created by George Lucas and has been portrayed by numerous actors. Anakin Skywalker appearances span the first six Star Wars films, as well as Rogue One, and Anakin Skywalker character is heavily referenced in Star Wars: The Force Awakens. Anakin Skywalker is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, Anakin Skywalker falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of Anakin Skywalker Sith master, Emperor Palpatine( also known as Darth Sidious) . '
    
triples = [p.triple for p in minie.get_propositions(starwars_text)]

print("Original text:")
print('\t{}\n'.format(starwars_text))

print("Extracted triples:")
for t in triples:
    print("\t{}".format(t))

Original text:
	Anakin Skywalker , is a fictional character in the Star Wars franchise. Anakin Skywalker appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while Anakin Skywalker past as Anakin Skywalker and the story of Anakin Skywalker corruption are central to the narrative of the original film trilogy. Anakin Skywalker was created by George Lucas and has been portrayed by numerous actors. Anakin Skywalker appearances span the first six Star Wars films, as well as Rogue One, and Anakin Skywalker character is heavily referenced in Star Wars: The Force Awakens. Anakin Skywalker is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, Anakin Skywalker falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of Anakin Skywalker Sith master, Emperor Palpatine( also known

## Stanford OpenIE

> https://nlp.stanford.edu/software/openie.html

In [3]:
from openie import StanfordOpenIE

text = 'Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).'
print("Original text:")
print('\t{}\n'.format(text))

triples = []
with StanfordOpenIE() as client:
    for triple in client.annotate(text):
        triples.append(triple)

print("Extracted triples:")
for t in triples:
    print("\t{}".format(t))

Original text:
	Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).

Starting server with command: java -Xmx8G -cp

In [12]:
for t in triples:
    # Each triple is a dict; its values are the subject, relation and object.
    print(list(t.values()))

['Darth Vader', 'also known by', 'his birth name Anakin Skywalker']
['Darth Vader', 'is', 'fictional character']
['fictional character', 'is in', 'Star Wars franchise']
['Darth Vader', 'is fictional character in', 'Star Wars franchise']
['Darth Vader', 'known by', 'his birth name Anakin Skywalker']
['Darth Vader', 'appears as', 'pivotal antagonist']
['Darth Vader', 'appears as', 'antagonist']
['Darth Vader', 'appears in', 'film trilogy']
['Darth Vader', 'appears', 'central to narrative of prequel trilogy']
['his past', 'are central to', 'narrative of prequel trilogy']
['Darth Vader', 'appears', 'central to narrative']
['Darth Vader', 'appears', 'central']
['his past', 'are central to', 'narrative']
['actions', 'drive', 'plot']
['his past', 'are', 'central']
['Darth Vader', 'appears in', 'original film trilogy']
['character', 'been portrayed by', 'actors']
['character', 'was', 'created']
['character', 'been', 'portrayed']
['character', 'was created by', 'George Lucas']
['character', 'be
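
The output above contains many near-duplicates, such as `'antagonist'` alongside `'pivotal antagonist'`. One possible way to thin them out is to drop a triple whenever another triple with the same subject and relation has a longer object that contains it. This is only a sketch: `drop_subsumed` is a hypothetical helper, the dict keys `subject`, `relation` and `object` are assumed to match what the client returns, and plain substring matching is a crude heuristic that can over-prune.

```python
def drop_subsumed(triples):
    """Keep a triple only if no other triple with the same subject and
    relation has a different object that contains this triple's object."""
    kept = []
    for t in triples:
        subsumed = any(
            other['subject'] == t['subject']
            and other['relation'] == t['relation']
            and other['object'] != t['object']
            and t['object'] in other['object']
            for other in triples
        )
        if not subsumed:
            kept.append(t)
    return kept


# A few triples copied from the output above.
sample = [
    {'subject': 'Darth Vader', 'relation': 'appears as', 'object': 'pivotal antagonist'},
    {'subject': 'Darth Vader', 'relation': 'appears as', 'object': 'antagonist'},
    {'subject': 'Darth Vader', 'relation': 'appears in', 'object': 'film trilogy'},
    {'subject': 'Darth Vader', 'relation': 'appears in', 'object': 'original film trilogy'},
]

pruned = drop_subsumed(sample)
for t in pruned:
    print(list(t.values()))
```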

## Hearst Patterns

In [1]:
from hearstPatterns.hearstPatterns import HearstPatterns
hp = HearstPatterns(extended=True)

In [2]:
hp.find_hyponyms('Many countries, especially France, England and Spain also enjoy toast.')

[('France', 'country'), ('England', 'country'), ('Spain', 'country')]
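
A Hearst pattern is a lexico-syntactic template such as "X, especially A, B and C", from which each of A, B and C is read as a hyponym of X. The sketch below handles that single template with a regular expression, assuming capitalised one-word hyponyms. It is not how the hearstPatterns package works: judging from the outputs in this notebook, the package matches noun-phrase chunks and lemmatizes the hypernym (`'country'`), whereas this sketch returns the surface form. `find_especially_hyponyms` is a hypothetical name.

```python
import re

# "<hypernym>, especially <Hypo>, <Hypo> and <Hypo>"
ESPECIALLY = re.compile(
    r"(?P<hyper>[A-Za-z]+), especially "
    r"(?P<hypos>[A-Z][a-z]+(?:(?:, | and )[A-Z][a-z]+)*)"
)


def find_especially_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs for the 'especially' template."""
    pairs = []
    for match in ESPECIALLY.finditer(sentence):
        hypernym = match.group("hyper")
        for hyponym in re.split(r", | and ", match.group("hypos")):
            pairs.append((hyponym, hypernym))
    return pairs


pairs = find_especially_hyponyms(
    'Many countries, especially France, England and Spain also enjoy toast.'
)
print(pairs)
```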

In [4]:
hp.find_hyponyms('Anakin Skywalker is a fictional character in the Star Wars franchise.')

[]

In [23]:
starwars_text

'Anakin Skywalker , is a fictional character in the Star Wars franchise. Anakin Skywalker appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while Anakin Skywalker past as Anakin Skywalker and the story of Anakin Skywalker corruption are central to the narrative of the original film trilogy. Anakin Skywalker was created by George Lucas and has been portrayed by numerous actors. Anakin Skywalker appearances span the first six Star Wars films, as well as Rogue One, and Anakin Skywalker character is heavily referenced in Star Wars: The Force Awakens. Anakin Skywalker is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, Anakin Skywalker falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of Anakin Skywalker Sith master, Emperor Palpatine( also known as Darth Sidio

In [24]:
hp.find_hyponyms(starwars_text)

NP_a_pivotal_antagonist
['NP_the_original_film_trilogy', 'NP_a_pivotal_antagonist']


[('the original film trilogy', 'a pivotal antagonist')]

## OpenNRE: Neural Relation Extraction

In [7]:
import pandas as pd
import spacy
import opennre

In [8]:
nlp = spacy.load("en_core_web_lg")
model = opennre.get_model('wiki80_bertentity_softmax')

2021-03-18 12:30:26,514 - root - INFO - Loading BERT pre-trained checkpoint.


In [9]:
model.infer({'text': 'He was the son of Máel Dúin mac Máele Fithrich, and grandson of the high king Áed Uaridnach (died 612).', 'h': {'pos': (18, 46)}, 't': {'pos': (78, 91)}})

('father', 0.9927453398704529)

In [18]:
dataset = pd.read_csv('./data/starwars_text_dataset.txt', delimiter='\n', header=None, error_bad_lines=False)
dataset

b'Skipping line 341: expected 1 fields, saw 2\nSkipping line 1209: expected 1 fields, saw 2\nSkipping line 3309: expected 1 fields, saw 2\nSkipping line 3615: expected 1 fields, saw 2\nSkipping line 7258: expected 1 fields, saw 2\nSkipping line 8720: expected 1 fields, saw 2\nSkipping line 9514: expected 1 fields, saw 2\nSkipping line 11246: expected 1 fields, saw 2\nSkipping line 12019: expected 1 fields, saw 2\nSkipping line 13450: expected 1 fields, saw 2\nSkipping line 15793: expected 1 fields, saw 2\nSkipping line 16472: expected 1 fields, saw 2\nSkipping line 18440: expected 1 fields, saw 2\nSkipping line 20491: expected 1 fields, saw 2\nSkipping line 21737: expected 1 fields, saw 2\nSkipping line 23946: expected 1 fields, saw 2\nSkipping line 24387: expected 1 fields, saw 2\nSkipping line 24930: expected 1 fields, saw 2\nSkipping line 25723: expected 1 fields, saw 2\nSkipping line 26509: expected 1 fields, saw 2\nSkipping line 27150: expected 1 fields, saw 2\nSkipping line 27152

Unnamed: 0,0
0,Luke Skywalker is a fictional character and th...
1,"Portrayed by Mark Hamill, Luke first appeared ..."
2,"Three decades later, Hamill returned as Luke i..."
3,"The Last Jedi (2017), and The Rise of Skywalke..."
4,He reprised the role in The Mandalorian episod...
...,...
25145,References ==
25146,==
25147,External links ==
25148,Supreme Leader Snoke in the StarWars.com Databank
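
The "Skipping line" warnings above show that the file is being parsed as CSV and that lines the parser sees as having two fields are dropped. If every line should become one row, reading the file directly sidesteps the CSV parser. This is a sketch that assumes one sentence per line; `read_sentences` is a hypothetical helper, and the demo writes its own temporary file rather than touching the real dataset.

```python
import os
import tempfile


def read_sentences(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


# Demo on a throwaway file containing a blank line and a line with a comma.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('Luke Skywalker is a fictional character.\n'
              '\n'
              'Portrayed by Mark Hamill, Luke first appeared in Star Wars (1977).\n')
    tmp_path = tmp.name

sentences = read_sentences(tmp_path)
os.remove(tmp_path)
print(sentences)
```

Wrapping the result as `pd.DataFrame(sentences)` gives the same single-column layout (column `0`) that the rest of the notebook expects.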


In [19]:
for index, row in dataset.iterrows():
    text = row[0]
    doc = nlp(text)

    print(text)
    # Try every ordered pair of distinct entities as (head, tail).
    for entity1 in doc.ents:
        for entity2 in doc.ents:
            if entity1 == entity2:
                continue

            span1 = (entity1.start_char, entity1.end_char)
            span2 = (entity2.start_char, entity2.end_char)
            relation_pred = model.infer({'text': text, 'h': {'pos': span1}, 't': {'pos': span2}})

            print('({}, {}, {})'.format(entity1, relation_pred, entity2))
    print()

Luke Skywalker is a fictional character and the main protagonist of the original film trilogy of the Star Wars franchise created by George Lucas.
(Luke Skywalker, ('after a work by', 0.645527720451355), George Lucas)
(George Lucas, ('notable work', 0.8947811126708984), Luke Skywalker)

Portrayed by Mark Hamill, Luke first appeared in Star Wars (1977), and he returned in The Empire Strikes Back (1980) and Return of the Jedi (1983).
(Mark Hamill, ('characters', 0.2163618952035904), Luke)
(Mark Hamill, ('notable work', 0.34239792823791504), first)
(Mark Hamill, ('notable work', 0.8281146287918091), Star Wars ()
(Mark Hamill, ('participant of', 0.40250512957572937), 1977)
(Mark Hamill, ('notable work', 0.930324137210846), The Empire Strikes Back)
(Mark Hamill, ('participant of', 0.5931357741355896), 1980)
(Mark Hamill, ('notable work', 0.9134975671768188), Return of the Jedi)
(Luke, ('performer', 0.7715634107589722), Mark Hamill)
(Luke, ('said to be the same as', 0.6832870244979858), first

KeyboardInterrupt: 
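
Since the model returns a label for every entity pair, many of the predictions above are low-confidence noise (for example `'characters'` at 0.22). A simple post-filter on the score is one way to trim them. In this sketch `confident_predictions` is a hypothetical helper and the 0.8 cutoff is an arbitrary assumption, not a tuned value; the sample rows are copied from the output above.

```python
def confident_predictions(predictions, threshold=0.8):
    """Keep (head, relation, tail) for predictions scoring at least threshold.

    Each input item has the shape (head, (relation, score), tail).
    """
    return [
        (head, relation, tail)
        for head, (relation, score), tail in predictions
        if score >= threshold
    ]


sample_predictions = [
    ('Mark Hamill', ('characters', 0.2163618952035904), 'Luke'),
    ('Mark Hamill', ('notable work', 0.930324137210846), 'The Empire Strikes Back'),
    ('Mark Hamill', ('participant of', 0.5931357741355896), '1980'),
    ('Luke', ('performer', 0.7715634107589722), 'Mark Hamill'),
]

filtered = confident_predictions(sample_predictions)
print(filtered)
```

Note that a fixed cutoff also discards plausible relations such as `'performer'` at 0.77, so the threshold trades recall for precision.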

In [3]:
len('Anakin Skywalker , is a fictional character in the Star Wars franchise.')

71

In [4]:
import re

string = 'Anakin Skywalker is a fictional character in the Star Wars franchise.'

a = re.search(r'Anakin Skywalker', string)
b = re.search(r'Star Wars', string)
print(a.span())
print(b.span())

(0, 16)
(49, 58)
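
The `pos` values passed to `model.infer` are character offsets into the text, which is what `re.search(...).span()` returns above. For literal mentions the same offsets can be computed without a regular expression. `find_span` below is a hypothetical helper that returns the first occurrence only.

```python
def find_span(text, mention):
    """Return (start, end) character offsets of the first occurrence of
    mention in text, or None if it does not occur."""
    start = text.find(mention)
    if start == -1:
        return None
    return (start, start + len(mention))


sentence = 'Anakin Skywalker is a fictional character in the Star Wars franchise.'
head_span = find_span(sentence, 'Anakin Skywalker')
tail_span = find_span(sentence, 'Star Wars')
print(head_span)
print(tail_span)
```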


In [10]:
out = model.infer({'text': 'Anakin Skywalker is a fictional character in the Star Wars franchise.', 'h': {'pos': a.span()}, 't': {'pos': b.span()}})
out

('part of', 0.6663140058517456)

In [11]:
type(out[0])

str

In [20]:
model.rel2id

{'place served by transport hub': 0,
 'mountain range': 1,
 'religion': 2,
 'participating team': 3,
 'contains administrative territorial entity': 4,
 'head of government': 5,
 'country of citizenship': 6,
 'original network': 7,
 'heritage designation': 8,
 'performer': 9,
 'participant of': 10,
 'position held': 11,
 'has part': 12,
 'location of formation': 13,
 'located on terrain feature': 14,
 'architect': 15,
 'country of origin': 16,
 'publisher': 17,
 'director': 18,
 'father': 19,
 'developer': 20,
 'military branch': 21,
 'mouth of the watercourse': 22,
 'nominated for': 23,
 'movement': 24,
 'successful candidate': 25,
 'followed by': 26,
 'manufacturer': 27,
 'instance of': 28,
 'after a work by': 29,
 'member of political party': 30,
 'licensed to broadcast to': 31,
 'headquarters location': 32,
 'sibling': 33,
 'instrument': 34,
 'country': 35,
 'occupation': 36,
 'residence': 37,
 'work location': 38,
 'subsidiary': 39,
 'participant': 40,
 'operator': 41,
 'characters
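
`rel2id` maps relation names to class indices. Inverting it gives a lookup from index back to name, which helps when inspecting raw model outputs. The sketch uses a small excerpt of the mapping above; `rel2id_excerpt` and `id2rel` are names introduced here for illustration.

```python
# A few entries copied from model.rel2id above.
rel2id_excerpt = {
    'performer': 9,
    'father': 19,
    'instance of': 28,
    'after a work by': 29,
}

# Invert the mapping: class index -> relation name.
id2rel = {idx: name for name, idx in rel2id_excerpt.items()}
print(id2rel[19])
```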

In [None]:
model._modules

### Relation Extraction from Cleaned Dataset

In [1]:
import pandas as pd  # needed after a kernel restart

dataset = pd.read_csv('./dataset/starwars_text_dataset_cleaned.txt', delimiter='\n', header=None, error_bad_lines=False)
dataset