## Relation Extraction with Distant Supervision

<center><img src="data/images/RE_KET.png" width=400 height=300 /></center>

<center><img src="data/images/RE_DS.png" width=500 height=300 /></center>

In [35]:
import spacy
import pandas as pd
import rdflib
from rdflib import Graph, Literal, RDF, URIRef
from rdflib.namespace import FOAF, XSD, Namespace

### Load Entity Linking Model

In [3]:
output_dir = './entity_linking_data'
nlp = spacy.load(output_dir + "/trained_el")

### Create Relation Extraction Dataset

In [26]:
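# Load one passage per line. Lines that parse into more than one field are skipped (see the warnings below).
# Note: `error_bad_lines` was removed in pandas 2.0 (its replacement is `on_bad_lines='skip'`),
# and newer pandas versions may also reject '\n' as a delimiter.
dataset = pd.read_csv('./data/starwars_text_dataset.txt', delimiter='\n', header=None, error_bad_lines=False)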
dataset.columns = ['text']
dataset

b'Skipping line 341: expected 1 fields, saw 2\nSkipping line 1209: expected 1 fields, saw 2\nSkipping line 3309: expected 1 fields, saw 2\nSkipping line 3615: expected 1 fields, saw 2\nSkipping line 7258: expected 1 fields, saw 2\nSkipping line 8720: expected 1 fields, saw 2\nSkipping line 9514: expected 1 fields, saw 2\nSkipping line 11246: expected 1 fields, saw 2\nSkipping line 12019: expected 1 fields, saw 2\nSkipping line 13450: expected 1 fields, saw 2\nSkipping line 15793: expected 1 fields, saw 2\nSkipping line 16472: expected 1 fields, saw 2\nSkipping line 18440: expected 1 fields, saw 2\nSkipping line 20491: expected 1 fields, saw 2\nSkipping line 21737: expected 1 fields, saw 2\nSkipping line 23946: expected 1 fields, saw 2\nSkipping line 24387: expected 1 fields, saw 2\nSkipping line 24930: expected 1 fields, saw 2\nSkipping line 25723: expected 1 fields, saw 2\nSkipping line 26509: expected 1 fields, saw 2\nSkipping line 27150: expected 1 fields, saw 2\nSkipping line 27152

| | text |
|---|---|
| 0 | Luke Skywalker is a fictional character and th... |
| 1 | Portrayed by Mark Hamill, Luke first appeared ... |
| 2 | Three decades later, Hamill returned as Luke i... |
| 3 | The Last Jedi (2017), and The Rise of Skywalke... |
| 4 | He reprised the role in The Mandalorian episod... |
| ... | ... |
| 25145 | References == |
| 25146 | == |
| 25147 | External links == |
| 25148 | Supreme Leader Snoke in the StarWars.com Databank |
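
The `read_csv` call above depends on options that newer pandas versions may no longer accept. A minimal sketch of an alternative, assuming the file simply holds one passage per line; `read_sentences` is a hypothetical helper that is not part of the notebook, and the demo file is throwaway data:

```python
import os
import tempfile


def read_sentences(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demo on a temporary file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("Luke Skywalker is a fictional character.\n\nHe is a Jedi, after all.\n")

sentences = read_sentences(tmp.name)
os.remove(tmp.name)
print(sentences)
```

The result can be wrapped as `pd.DataFrame({'text': read_sentences('./data/starwars_text_dataset.txt')})`. Unlike the original call, this keeps every non-empty line, so the row count may differ from the table above.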


### Get Relation of Two Entities from KB

Example: (Luke Skywalker, Anakin Skywalker)

In [110]:
graph = rdflib.Graph()
graph.parse('./data/starwars-data.ttl', format='turtle')

<Graph identifier=N6018cec5c4aa461b83c1b07d5b2f7fcc (<class 'rdflib.graph.Graph'>)>

Find a path between Luke Skywalker (https://swapi.co/resource/human/1) and Anakin Skywalker (https://swapi.co/resource/human/11). First, confirm that both IRIs carry the expected labels:

In [105]:
query_str = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX voc: <https://swapi.co/vocabulary/>
    PREFIX xml: <http://www.w3.org/XML/1998/namespace>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX human: <https://swapi.co/resource/human/>


    SELECT ?s ?p ?o
    WHERE {   
        ?s ?p ?o.
        VALUES(?s ?p) {
            (human:11 rdfs:label)
            (human:1 rdfs:label)
        }
        
    }
"""
res = graph.query(query_str)
list(res)

[(rdflib.term.URIRef('https://swapi.co/resource/human/1'),
  rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal('Luke Skywalker', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))),
 (rdflib.term.URIRef('https://swapi.co/resource/human/11'),
  rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal('Anakin Skywalker', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')))]

In [112]:
query_str = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX voc: <https://swapi.co/vocabulary/>
    PREFIX xml: <http://www.w3.org/XML/1998/namespace>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX human: <https://swapi.co/resource/human/>


    SELECT ?s ?p ?o
    WHERE {   
        ?s ?p ?o.
        VALUES(?s) {
            (human:11)
        }
        FILTER isIRI(?o)
    }
"""
res = graph.query(query_str)
set(r[-1] for r in res)

{rdflib.term.URIRef('http://commons.wikimedia.org/wiki/Special:FilePath/Anakin%20Skywalker%20costume%20Retouched.jpg'),
 rdflib.term.URIRef('http://www.wikidata.org/entity/Q51752'),
 rdflib.term.URIRef('https://swapi.co/resource/film/4'),
 rdflib.term.URIRef('https://swapi.co/resource/film/5'),
 rdflib.term.URIRef('https://swapi.co/resource/film/6'),
 rdflib.term.URIRef('https://swapi.co/resource/planet/1'),
 rdflib.term.URIRef('https://swapi.co/resource/starship/39'),
 rdflib.term.URIRef('https://swapi.co/resource/starship/59'),
 rdflib.term.URIRef('https://swapi.co/resource/starship/65'),
 rdflib.term.URIRef('https://swapi.co/resource/vehicle/44'),
 rdflib.term.URIRef('https://swapi.co/resource/vehicle/46'),
 rdflib.term.URIRef('https://swapi.co/vocabulary/Character'),
 rdflib.term.URIRef('https://swapi.co/vocabulary/Human')}

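> At degree 2 (two hops), any human is connected to every other human through the shared type nodes (`voc:Human`, `voc:Character`) via the `rdf:type` predicate, so such paths carry no useful relation.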
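
One way to act on this observation is to ignore `rdf:type` edges when looking for two-hop connections. The sketch below works on any iterable of `(s, p, o)` triples, which includes an rdflib `Graph`; the toy triples use shortened, made-up identifiers, and `shared_neighbors` is a hypothetical helper:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def neighbors(triples, node, skip_predicates=()):
    """Nodes one hop from `node`, ignoring edge direction and skipped predicates."""
    found = set()
    for s, p, o in triples:
        if str(p) in skip_predicates:
            continue
        if s == node:
            found.add(o)
        elif o == node:
            found.add(s)
    return found


def shared_neighbors(triples, a, b, skip_predicates=()):
    """Nodes adjacent to both a and b, i.e. the midpoints of two-hop paths."""
    triples = list(triples)  # allow generators to be scanned twice
    return neighbors(triples, a, skip_predicates) & neighbors(triples, b, skip_predicates)


# Toy data with shortened identifiers (not the real IRIs).
toy = [
    ("human/1", RDF_TYPE, "voc/Human"),
    ("human/11", RDF_TYPE, "voc/Human"),
    ("human/1", "voc/film", "film/6"),
    ("human/11", "voc/film", "film/6"),
]
with_type = shared_neighbors(toy, "human/1", "human/11")
without_type = shared_neighbors(toy, "human/1", "human/11", skip_predicates={RDF_TYPE})
print(sorted(with_type))     # includes the shared type node
print(sorted(without_type))  # only the film remains
```

Literal values count as nodes here too, so two entities sharing an attribute value would also appear connected unless that predicate is skipped as well.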

> RDF to LPG (labeled property graph)

In [None]:
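# Generic SPARQL reachability pattern (template only; not run against the Star Wars graph)
reachability_template = """
    PREFIX : <http://graphtheory/node/>
    PREFIX node: <http://graphtheory/node/>

    ASK { node:1 :hasNeighbor* node:2 }
"""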

In [73]:
query_str = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX voc: <https://swapi.co/vocabulary/>
    PREFIX xml: <http://www.w3.org/XML/1998/namespace>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX human: <https://swapi.co/resource/human/>
    
    ASK {
      human:11 ((<>|!<>)|^(<>|!<>))* human:1
    }
"""
res = graph.query(query_str)
list(res)

[True]
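
The `ASK` query only reports that the two nodes are connected, not how. A minimal breadth-first search sketch that returns one shortest connecting path, treating every edge as undirected as the property path above does. `shortest_path` is a hypothetical helper; it accepts any iterable of `(s, p, o)` triples, including an rdflib `Graph`, and the toy triples use shortened, made-up identifiers:

```python
from collections import defaultdict, deque


def shortest_path(triples, start, goal):
    """Breadth-first search over triples, ignoring edge direction.

    Returns a list of (node, predicate, next_node) steps, or None if unreachable.
    A predicate prefixed with '^' was traversed against its direction.
    """
    adjacency = defaultdict(list)
    for s, p, o in triples:
        adjacency[s].append((p, o))
        adjacency[o].append(("^" + str(p), s))

    previous = {start: None}  # node -> (parent, predicate) used to reach it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for predicate, nxt in adjacency[node]:
            if nxt not in previous:
                previous[nxt] = (node, predicate)
                queue.append(nxt)
    if goal not in previous:
        return None

    # Walk back from the goal to the start, then reverse the steps.
    steps = []
    node = goal
    while previous[node] is not None:
        parent, predicate = previous[node]
        steps.append((parent, predicate, node))
        node = parent
    return steps[::-1]


# Toy data with shortened identifiers (not the real IRIs).
toy = [
    ("human/11", "voc/film", "film/6"),
    ("human/1", "voc/film", "film/6"),
    ("human/1", "voc/starship", "starship/12"),
]
path = shortest_path(toy, "human/11", "human/1")
print(path)
```

On the full graph it would probably make sense to drop `rdf:type` and literal-valued triples before searching, since those produce short but uninformative paths.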

In [102]:
query_str = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX voc: <https://swapi.co/vocabulary/>
    PREFIX xml: <http://www.w3.org/XML/1998/namespace>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX human: <https://swapi.co/resource/human/>
    
    SELECT ?u ?p ?v
    WHERE {
        human:11 ((voc:desc|!voc:desc)|^(voc:desc|!voc:desc))* ?u .
        ?u ?p ?v .
        ?v ((voc:desc|!voc:desc)|^(voc:desc|!voc:desc))* human:1 .
    }
"""
res = graph.query(query_str)
list(res)

[(rdflib.term.URIRef('https://swapi.co/resource/planet/1'),
  rdflib.term.URIRef('https://swapi.co/vocabulary/resident'),
  rdflib.term.URIRef('https://swapi.co/resource/human/1')),
 (rdflib.term.URIRef('https://swapi.co/resource/film/1'),
  rdflib.term.URIRef('https://swapi.co/vocabulary/character'),
  rdflib.term.URIRef('https://swapi.co/resource/human/1')),
 (rdflib.term.URIRef('https://swapi.co/resource/film/3'),
  rdflib.term.URIRef('https://swapi.co/vocabulary/character'),
  rdflib.term.URIRef('https://swapi.co/resource/human/1')),
 (rdflib.term.URIRef('https://swapi.co/resource/film/6'),
  rdflib.term.URIRef('https://swapi.co/vocabulary/character'),
  rdflib.term.URIRef('https://swapi.co/resource/human/1')),
 (rdflib.term.URIRef('https://swapi.co/resource/starship/12'),
  rdflib.term.URIRef('https://swapi.co/vocabulary/pilot'),
  rdflib.term.URIRef('https://swapi.co/resource/human/1')),
 (rdflib.term.URIRef('https://swapi.co/resource/starship/22'),
  rdflib.term.URIRef('https://

In [99]:
query_str = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX voc: <https://swapi.co/vocabulary/>
    PREFIX xml: <http://www.w3.org/XML/1998/namespace>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX human: <https://swapi.co/resource/human/>
    
    SELECT ?p
    WHERE {
        human:11 ?p ?u
    }
"""
res = graph.query(query_str)
list(res)

[(rdflib.term.URIRef('https://swapi.co/vocabulary/homeworld')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/mass')),
 (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/skinColor')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/birthYear')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/desc')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/vehicle')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/vehicle')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/height')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/starship')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/film')),
 (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type')),
 (rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/starship')),
 (rdflib.term.URIRef('https://swapi.co/vocabulary/film')),
 (rdflib.term.URIRef('https://swapi.co/vocab

In [None]:
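# Template: every node ?u with an edge of any predicate pointing at da:Noor
# (the `da:` and `:` prefixes are not defined here, so this is not runnable as-is)
inverse_any_predicate_template = """
    SELECT ?u
    WHERE { da:Noor ^(:p1|!:p1) ?u }
"""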

In [34]:
labels = []
# Iterate over the text column directly; the row index is not needed.
for text in dataset['text']:
    doc = nlp(text)
    print(text)
    for ent in doc.ents:
        print(ent.text, ent.label_)
    print()

Luke Skywalker is a fictional character and the main protagonist of the original film trilogy of the Star Wars franchise created by George Lucas.
Luke Skywalker PERSON
George Lucas PERSON

Portrayed by Mark Hamill, Luke first appeared in Star Wars (1977), and he returned in The Empire Strikes Back (1980) and Return of the Jedi (1983).
Mark Hamill PERSON
Luke PERSON
first ORDINAL
Star Wars ( WORK_OF_ART
1977 DATE
The Empire Strikes Back WORK_OF_ART
1980 DATE
Return of the Jedi WORK_OF_ART

Three decades later, Hamill returned as Luke in the Star Wars sequel trilogy, appearing in all three films: The Force Awakens (2015),
Three decades later DATE
Hamill PERSON
Luke PERSON
three CARDINAL
The Force Awakens WORK_OF_ART
2015 DATE

The Last Jedi (2017), and The Rise of Skywalker (2019).
2017 DATE
The Rise of Skywalker WORK_OF_ART
2019 DATE

He reprised the role in The Mandalorian episode "Chapter 16:
Chapter 16 LAW

The Rescue" (2020), voicing the character that was portrayed by a body double

KeyboardInterrupt: 

In [None]:
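# `label_to_name` is not defined in this section; it is assumed to be a dict
# that maps knowledge-base ids to entity names.
text = 'Skywalker, also known as Darth Vader, is a fictional character in the Star Wars franchise'
doc = nlp(text)
for ent in doc.ents:
    # kb_id_ is the knowledge-base id assigned by the entity linker
    print(ent.text, ent.label_, ent.kb_id_, str(label_to_name.get(ent.kb_id_)))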
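
With linked entities and KB relations both available, the distant supervision step itself can be sketched as follows: every sentence that mentions two entities related in the KB is labeled with that relation. This is a minimal illustration under stated assumptions, not the notebook's implementation. `label_sentences` is a hypothetical helper, the entity ids would come from `ent.kb_id_`, and the sentences, shortened ids, and `voc/child` relation are made-up toy data:

```python
from itertools import permutations


def label_sentences(linked_sentences, kb_relations):
    """Yield (text, head, tail, relation) for each entity pair the KB relates.

    linked_sentences: iterable of (text, list of entity ids found in the text)
    kb_relations: dict mapping (head id, tail id) to a list of relation names
    """
    for text, entity_ids in linked_sentences:
        # Try every ordered pair of distinct entities mentioned in the sentence.
        for head, tail in permutations(sorted(set(entity_ids)), 2):
            for relation in kb_relations.get((head, tail), ()):
                yield text, head, tail, relation


# Toy inputs (hypothetical relation and sentences).
kb_relations = {("human/11", "human/1"): ["voc/child"]}
linked_sentences = [
    ("Luke Skywalker is the son of Anakin Skywalker.", ["human/1", "human/11"]),
    ("Luke Skywalker was created by George Lucas.", ["human/1"]),
]

examples = list(label_sentences(linked_sentences, kb_relations))
for example in examples:
    print(example)
```

The labels are noisy by construction: a sentence that merely mentions both entities receives the relation even if it does not express it, which is the usual trade-off of distant supervision.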