# Week 2: Parsing, Relation Extraction and Open Information Extraction
### COMP61332: Text Mining, Department of Computer Science, University of Manchester (Riza Batista-Navarro and Viktor Schlegel)


In this lab session, you will try out some Python code based on the **spaCy** library (https://spacy.io/) for the NLP tasks discussed in the Week 2 Lecture, as well as an application of NLP (Open Information Extraction or Open IE).
After this session, you should be able to:
- apply **part-of-speech (POS) tagging** on text
- apply **dependency parsing** on text
- develop rules for **extracting relations** from text
- develop rules for **extracting Open Information Extraction (Open IE) triples**
- explore and visualise knowledge extracted by Open IE in the form of a graph (optional)

You are provided with three text files (drawn from https://en.wikipedia.org/wiki/Timeline_of_historic_inventions), each containing a list of inventions (from the 1700s, 1800s and 1900s), that you can use for experimentation.


## Preparation of necessary packages

In [1]:
# Loading
!pip install spacy==3.0
!python -m spacy download en_core_web_sm

import spacy
from spacy.lang.en import English
from spacy.pipeline import Sentencizer


import en_core_web_sm

nlp = spacy.load('en_core_web_sm')

from spacy import displacy



Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.0.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


## File loading

The function below takes as parameter the path to a file containing plain text.

In [2]:
import codecs

def load_file(path):
    file_text = ''
    file = codecs.open(path, 'r', encoding = 'utf-8')
    file_lines = file.readlines()
    for line in file_lines:
        # Text cleaning: drop newline characters (lines are joined with no separator)
        line = line.replace('\n','')
        file_text = file_text + line
    file.close()
    return file_text
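Note that `load_file` joins lines with no separator, so text from adjacent lines can fuse (the segmentation output below contains `generators).Friedrich`, for example). A minimal sketch of a variant that joins non-empty lines with a single space follows; the name `load_file_spaced` is hypothetical, and it assumes that each line of the file is a self-contained entry.

```python
def load_file_spaced(path):
    """Read a UTF-8 text file and join its non-empty lines with one space."""
    with open(path, encoding='utf-8') as f:
        # strip() drops the newline and any surrounding whitespace on each line
        lines = [line.strip() for line in f]
    # Skip blank lines so that they do not introduce double spaces
    return ' '.join(line for line in lines if line)
```

Swapping this in for `load_file` would change the sentence counts and outputs shown in the rest of this notebook, so treat it as an experiment rather than a drop-in replacement.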



## Sentence segmentation

The code below is the same as the one we used in Week 1. Customise the list of sentence delimiters if necessary.

In [3]:
# Create a new, blank NLP pipeline, specifying English as the language of interest so that English-specific tokenisation rules are loaded.
nlp = English()

config = {"punct_chars": ["."]}

# Add the component to the pipeline.
sentencizer = nlp.add_pipe('sentencizer', config=config)

# Load the contents of a text file; change the parameter to use another/your own text file.
text = load_file('1800s.txt')

# The following line applies the pipeline (so far only sentence segmentation) on the given text, and stores the result in annotations.
annotations = nlp(text)

# Check the result of sentence segmentation.
sents_list = []
for sent in annotations.sents:
    sents_list.append(sent.text.strip())
    
# Check how many sentences were produced
print('Number of sentences: ', len(sents_list))

print('SENTENCE NO.\tSENTENCE:')
for i, sent in enumerate(sents_list):
    print(i, '\t', sent)



Number of sentences:  103
SENTENCE NO.	SENTENCE:
0 	 Alessandro Volta invents the voltaic pile, an early form of battery in Italy, based on previous works by Luigi Galvani.
1 	 Humphry Davy invents the arc lamp (exact date unclear; not practical as a light source until the invention of efficient electric generators).Friedrich Sertürner discovers morphine as the first active alkaloid extracted from the opium poppy plant.
2 	 Richard Trevithick invents the steam locomotive.
3 	 Hanaoka Seishū creates tsūsensan, the first modern general anesthetic.
4 	 Nicéphore Niépce invents the first internal combustion engine capable of doing useful work.
5 	 François Isaac de Rivaz designs the first automobile powered by an internal combustion engine fuelled by hydrogen.
6 	 Robert Fulton expands water transportation and trade with the workable steamboat.
7 	 Nicolas Appert invents the canning process for food.
8 	 Friedrich Koenig invents the first powered printing press, which was also the first to

## Dependency parsing (with in-built POS tagging)

### One example for easier visualisation/debugging

The code below applies a dependency parser on a sentence. For each token, the following attributes are printed: 
- token.text (the token text)
- token.lemma_ (the base form of the token)
- token.pos_ (the POS tag according to the Universal Dependencies scheme; see https://universaldependencies.org/u/pos/)
- token.tag_ (the POS tag according to the Penn Treebank; see https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)
- token.dep_ (the dependency type)
- child.text:child.dep_ (a list of dependents; the text and dependency type are displayed for each dependent)

Moreover, a **visualisation** of the tree is displayed, which can be helpful later, when you try to write your own rules.

In [4]:
# Example 1
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')

doc = nlp("Nicolas Appert invents the canning process for food.")
for token in doc:
    print(token.text + '\t' + token.lemma_ + '\t' + token.pos_ + '\t' + token.tag_ + '\t' + token.dep_ + '\t' + str([child.text + ':' + child.dep_ for child in token.children]))

displacy.render(doc, style='dep', jupyter=True)

Nicolas	Nicolas	PROPN	NNP	compound	[]
Appert	Appert	PROPN	NNP	nsubj	['Nicolas:compound']
invents	invent	VERB	VBZ	ROOT	['Appert:nsubj', 'process:dobj', '.:punct']
the	the	DET	DT	det	[]
canning	canning	NOUN	NN	compound	[]
process	process	NOUN	NN	dobj	['the:det', 'canning:compound', 'for:prep']
for	for	ADP	IN	prep	['food:pobj']
food	food	NOUN	NN	pobj	[]
.	.	PUNCT	.	punct	[]


### <font color='red'>Below, write down any observations you have based on the results of dependency parsing. For example, what kind of dependencies should you follow if you want to reach names of inventors or inventions from the main verb of a sentence?</font>
<br>
<br>
<br>
<br>
<br>
<br>

## Relation extraction

### Extracting relations using rules that process the results of dependency parsing

The code below uses very basic rules to extract **inventor-invention relations**.

In [5]:
# Create a class that will store every inventor-invention relation that we extract
class Relation:
    inventor = ''
    invention = ''

In [6]:
relations = []

for sent in sents_list:
    doc = nlp(sent)
    for token in doc:
        # For now, check if the root of the tree is a verb that is a variant of "invent"
        if token.dep_ == 'ROOT' and token.lemma_ == 'invent':
            r = Relation()
            dependents = token.children
            for d in dependents:
                if d.dep_ == 'dobj':
                    r.invention = d.text
                elif d.dep_ == 'nsubj':
                    r.inventor = d.text
            if r.inventor != '' and r.invention != '':
                print('Sentence:', sent)
                print(r.inventor, '-', r.invention)
                relations.append(r)

for r in relations:
    print(r.inventor, '-', r.invention)

Sentence: Alessandro Volta invents the voltaic pile, an early form of battery in Italy, based on previous works by Luigi Galvani.
Volta - pile
Sentence: Humphry Davy invents the arc lamp (exact date unclear; not practical as a light source until the invention of efficient electric generators).Friedrich Sertürner discovers morphine as the first active alkaloid extracted from the opium poppy plant.
Davy - lamp
Sentence: Richard Trevithick invents the steam locomotive.
Trevithick - locomotive
Sentence: Nicéphore Niépce invents the first internal combustion engine capable of doing useful work.
Niépce - engine
Sentence: Nicolas Appert invents the canning process for food.
Appert - process
Sentence: Friedrich Koenig invents the first powered printing press, which was also the first to use a cylinder.
Koenig - press
Sentence: James Fox invents the modern planing machine, though Matthew Murray of Leeds and Richard Roberts of Manchester have also been credited at times with its invention.
Fox -

### <font color='red'>The code below is the same as the one above. How will you extend the rules in order to improve the extracted relations? For example: (1) How can you handle verbs that are synonymous with "invent"? (2) How will you handle sentences written in the passive voice, e.g., "X was invented by Y"? </font>
### <font color='red'>You can test your idea and code using the above sample code "Example 1" with the sentence "The canning process for food was invented by Nicolas Appert.".</font>
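As one possible starting point (not the only valid answer), the sketch below widens the verb check to a set of lemmas and adds a passive-voice rule. It assumes that the parser labels the passive subject `nsubjpass` and the "by" phrase `agent` with a `pobj` child, which is what spaCy's English models typically produce; check this against the visualisation for your own sentences. The names `INVENT_LEMMAS`, `extract_relation` and `FakeToken` are hypothetical, and `FakeToken` is only a stand-in so that the rule can be run without loading a model.

```python
# Lemmas treated as invention verbs; this list is an illustrative assumption
INVENT_LEMMAS = {'invent', 'create', 'design', 'develop', 'devise'}


def extract_relation(root):
    """Return (inventor, invention) for a ROOT token, or None if no rule fires.

    `root` only needs spaCy-style attributes: lemma_, dep_, text, children.
    """
    if root.dep_ != 'ROOT' or root.lemma_ not in INVENT_LEMMAS:
        return None
    inventor = invention = ''
    for child in root.children:
        if child.dep_ == 'nsubj':        # active voice: subject is the inventor
            inventor = child.text
        elif child.dep_ == 'dobj':       # active voice: object is the invention
            invention = child.text
        elif child.dep_ == 'nsubjpass':  # passive voice: subject is the invention
            invention = child.text
        elif child.dep_ == 'agent':      # passive voice: "by" introduces the inventor
            for grandchild in child.children:
                if grandchild.dep_ == 'pobj':
                    inventor = grandchild.text
    return (inventor, invention) if inventor and invention else None


class FakeToken:
    """Hypothetical stand-in for a spaCy token (no model needed)."""
    def __init__(self, text, lemma, dep, children=()):
        self.text, self.lemma_, self.dep_ = text, lemma, dep
        self.children = list(children)


# Hand-built (assumed) parse of "The canning process for food was invented by Nicolas Appert."
appert = FakeToken('Appert', 'Appert', 'pobj')
by = FakeToken('by', 'by', 'agent', [appert])
process = FakeToken('process', 'process', 'nsubjpass')
invented = FakeToken('invented', 'invent', 'ROOT', [process, by])

print(extract_relation(invented))
```

Because the function only reads `lemma_`, `dep_`, `text` and `children`, it should also accept real spaCy tokens, e.g. by calling `extract_relation(token)` inside the `for token in doc` loop.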

In [7]:
relations = []

for sent in sents_list:
    doc = nlp(sent)
    for token in doc:
        # For now, check if the root of the tree is a verb that is a variant of "invent"
        if token.dep_ == 'ROOT' and token.lemma_ == 'invent':
            r = Relation()
            dependents = token.children
            for d in dependents:
                if d.dep_ == 'dobj':
                    r.invention = d.text
                elif d.dep_ == 'nsubj':
                    r.inventor = d.text
            if r.inventor != '' and r.invention != '':
                print('Sentence:', sent)
                print(r.inventor, '-', r.invention)
                relations.append(r)

for r in relations:
    print(r.inventor, '-', r.invention)

Sentence: Alessandro Volta invents the voltaic pile, an early form of battery in Italy, based on previous works by Luigi Galvani.
Volta - pile
Sentence: Humphry Davy invents the arc lamp (exact date unclear; not practical as a light source until the invention of efficient electric generators).Friedrich Sertürner discovers morphine as the first active alkaloid extracted from the opium poppy plant.
Davy - lamp
Sentence: Richard Trevithick invents the steam locomotive.
Trevithick - locomotive
Sentence: Nicéphore Niépce invents the first internal combustion engine capable of doing useful work.
Niépce - engine
Sentence: Nicolas Appert invents the canning process for food.
Appert - process
Sentence: Friedrich Koenig invents the first powered printing press, which was also the first to use a cylinder.
Koenig - press
Sentence: James Fox invents the modern planing machine, though Matthew Murray of Leeds and Richard Roberts of Manchester have also been credited at times with its invention.
Fox -

### <font color='red'>Summarise below how you extended the code above to improve the extracted relations. In writing your new rules, what other verbs and/or sentence constructions did you consider?</font>
<br>
<br>
<br>
<br>
<br>
<br>

## Open Information Extraction

### Adapting the code above to generate ARG1-PREDICATE-ARG2 triples instead of relations

We can easily adapt the relation extraction code above to extract **ARG1-PREDICATE-ARG2 triples** (for Open Information Extraction) instead of relations. Note that for Open IE, we are interested in any kind of triple (not just those pertaining to inventors and their inventions).

In [8]:
# Example 2
# Create a class that will store every ARG1-PREDICATE-ARG2 triple that we extract
class Triple:
    arg1 = ''
    predicate = ''
    arg2 = ''

In [9]:
triples = []

for sent in sents_list:
    doc = nlp(sent)
    for token in doc:
        # For now, assume that the root of the tree is the predicate
        if token.dep_ == 'ROOT':
            t = Triple()
            # Store the lemmatised form (to normalise)
            t.predicate = token.lemma_
            dependents = token.children
            for d in dependents:
                if d.dep_ == 'dobj':
                    t.arg2 = d.text
                elif d.dep_ == 'nsubj':
                    t.arg1 = d.text
            if t.arg1 != '' and t.arg2 != '':
                print('Sentence:', sent)
                print(t.arg1, '-', t.predicate, '-', t.arg2)
                triples.append(t)

for t in triples:
    print(t.arg1, '-', t.predicate, '-', t.arg2)


Sentence: Alessandro Volta invents the voltaic pile, an early form of battery in Italy, based on previous works by Luigi Galvani.
Volta - invent - pile
Sentence: Humphry Davy invents the arc lamp (exact date unclear; not practical as a light source until the invention of efficient electric generators).Friedrich Sertürner discovers morphine as the first active alkaloid extracted from the opium poppy plant.
Davy - invent - lamp
Sentence: Richard Trevithick invents the steam locomotive.
Trevithick - invent - locomotive
Sentence: Hanaoka Seishū creates tsūsensan, the first modern general anesthetic.
Seishū - create - anesthetic
Sentence: Nicéphore Niépce invents the first internal combustion engine capable of doing useful work.
Niépce - invent - engine
Sentence: François Isaac de Rivaz designs the first automobile powered by an internal combustion engine fuelled by hydrogen.
Rivaz - design - automobile
Sentence: Robert Fulton expands water transportation and trade with the workable steambo

Sentence: James Blyth invents the first wind turbine used for generating electricity.
Blyth - invent - turbine
Sentence: John Stewart MacArthur, working in collaboration with brothers Dr. Robert and Dr. William Forrest develops the process of gold cyanidation.
Robert - develop - process
Sentence: John J. Loud invents the ballpoint pen.
Loud - invent - pen
Sentence: Heinrich Hertz publishes a conclusive proof of James Clerk Maxwell's electromagnetic theory in experiments that also demonstrate the existence of radio waves.
Hertz - publish - proof
Sentence: Frédéric Swarts invents the first chlorofluorocarbons to be applied as refrigerant.
Swarts - invent - chlorofluorocarbons
Sentence: Clément Ader invents the first aircraft, airplane, fly machine called Eole (aircraft) or Ader ÉoleWhitcomb Judson invents the zipper.
Ader - invent - aircraft
Sentence: Léon Bouly invents the cinematograph.
Bouly - invent - cinematograph
Sentence: Rudolf Diesel invents the diesel engine (although Herbert A

### <font color='red'>The code below is the same as the one above. How will you extend it to improve the extracted triples? For example, can you try to extract full names of people (rather than just their last names)?</font>
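One hedged idea for recovering fuller argument names: in the Example 1 output, "Nicolas" is attached to "Appert" (and "canning" to "process") with the `compound` dependency, so an argument can be widened by collecting its `compound` children. The sketch below does only that; the names `with_compounds` and `FakeToken` are hypothetical, and `FakeToken` merely stands in for a spaCy token so the function can be run without a model.

```python
def with_compounds(token):
    """Return the token text joined with its 'compound' dependents, in sentence order.

    `token` only needs spaCy-style attributes: text, dep_, i (position), children.
    """
    parts = [token]
    for child in token.children:
        if child.dep_ == 'compound':
            parts.append(child)
    # Sort by position so the words come out in their original order
    parts.sort(key=lambda t: t.i)
    return ' '.join(t.text for t in parts)


class FakeToken:
    """Hypothetical stand-in for a spaCy token (no model needed)."""
    def __init__(self, text, dep, i, children=()):
        self.text, self.dep_, self.i = text, dep, i
        self.children = list(children)


# Fragment of the Example 1 parse: "Nicolas" is a compound dependent of "Appert"
nicolas = FakeToken('Nicolas', 'compound', 0)
appert = FakeToken('Appert', 'nsubj', 1, [nicolas])

print(with_compounds(appert))
```

In the cell below, this would mean replacing `d.text` with `with_compounds(d)`. Only direct `compound` children are collected, so modifiers attached in other ways (adjectives, nested compounds, particles such as "de") are missed; spaCy's `token.subtree`, `doc.noun_chunks` and `doc.ents` are alternatives worth comparing.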

In [10]:
triples = []

for sent in sents_list:
    doc = nlp(sent)
    for token in doc:
        # For now, assume that the root of the tree is the predicate
        if token.dep_ == 'ROOT':
            t = Triple()
            # Store the lemmatised form (to normalise)
            t.predicate = token.lemma_
            dependents = token.children
            for d in dependents:
                if d.dep_ == 'dobj':
                    t.arg2 = d.text
                elif d.dep_ == 'nsubj':
                    t.arg1 = d.text
            if t.arg1 != '' and t.arg2 != '':
                print('Sentence:', sent)
                print(t.arg1, '-', t.predicate, '-', t.arg2)
                triples.append(t)

for t in triples:
    print(t.arg1, '-', t.predicate, '-', t.arg2)


Sentence: Alessandro Volta invents the voltaic pile, an early form of battery in Italy, based on previous works by Luigi Galvani.
Volta - invent - pile
Sentence: Humphry Davy invents the arc lamp (exact date unclear; not practical as a light source until the invention of efficient electric generators).Friedrich Sertürner discovers morphine as the first active alkaloid extracted from the opium poppy plant.
Davy - invent - lamp
Sentence: Richard Trevithick invents the steam locomotive.
Trevithick - invent - locomotive
Sentence: Hanaoka Seishū creates tsūsensan, the first modern general anesthetic.
Seishū - create - anesthetic
Sentence: Nicéphore Niépce invents the first internal combustion engine capable of doing useful work.
Niépce - invent - engine
Sentence: François Isaac de Rivaz designs the first automobile powered by an internal combustion engine fuelled by hydrogen.
Rivaz - design - automobile
Sentence: Robert Fulton expands water transportation and trade with the workable steambo

Sentence: Léon Bouly invents the cinematograph.
Bouly - invent - cinematograph
Sentence: Rudolf Diesel invents the diesel engine (although Herbert Akroyd Stuart had experimented with compression ignition before Diesel).Guglielmo Marconi invents a system of wireless communication using radio waves.
Diesel - invent - engine
Sentence: Wilhelm Conrad Röntgen invented the first radiograph (xrays).Hans von Pechmann synthesizes polyethylene, now the most common plastic in the world.
Röntgen - invent - radiograph
Sentence: Waldemar Jungner invents the nickel– battery.
Jungner - invent - nickel
Volta - invent - pile
Davy - invent - lamp
Trevithick - invent - locomotive
Seishū - create - anesthetic
Niépce - invent - engine
Rivaz - design - automobile
Fulton - expand - transportation
Appert - invent - process
Koenig - invent - press
Clanny - pioneer - invention
Fox - invent - machine
Ronalds - build - telegraph
Stirling - invent - engine
Drais - invent - horse
Brunel - invent - shield
Blanchard -

### <font color='red'>Summarise below how you extended the code above to improve the extracted Open IE triples.</font>
<br>
<br>
<br>
<br>
<br>
<br>

## Comparison with the Stanford OpenIE tool

Stanford OpenIE is one of the more well-known Open Information Extraction tools. 
You can explore it in two ways:
- by accessing the online demo at https://corenlp.run/ (under Annotations, keep only "openie"); this will take only one sentence at a time
- by running the code below

In [14]:
# This might take a while
!pip install stanford_openie



In [15]:
# This example might take a long time when run for the first time. Please wait for it to complete.
from openie import StanfordOpenIE

s_triples = []

with StanfordOpenIE() as client:
    # uncomment the following four lines if running on Google Colab!
    #import os
    #from stanfordnlp.server import CoreNLPClient
    #os.environ['CORENLP_HOME'] = str('/root/stanfordnlp_resources/stanford-corenlp-full-2018-10-05')
    #client.client = CoreNLPClient(annotators=['openie'], memory='8G', endpoint='http://localhost:9876')
    
    for sent in sents_list:
        stanford_triples = client.annotate(sent)
        for st in stanford_triples:
            s_triple = Triple()
            s_triple.arg1 = st['subject']
            s_triple.predicate = st['relation']
            s_triple.arg2 = st['object']
            print('Sentence:', sent)
            print(s_triple.arg1, '-', s_triple.predicate, '-', s_triple.arg2)
            s_triples.append(s_triple)

for st in s_triples:
    print(st.arg1, '-', st.predicate, '-', st.arg2)

Downloading to /Users/guohuanjie/stanfordnlp_resources.

Extracting to /Users/guohuanjie/stanfordnlp_resources.
Starting server with command: java -Xmx8G -cp /Users/guohuanjie/stanfordnlp_resources/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-d4c4a07f9d7a4717.props -preload openie
Sentence: Alessandro Volta invents the voltaic pile, an early form of battery in Italy, based on previous works by Luigi Galvani.
Alessandro Volta - invents - form
Sentence: Alessandro Volta invents the voltaic pile, an early form of battery in Italy, based on previous works by Luigi Galvani.
Alessandro Volta - invents - form in Italy
Sentence: Alessandro Volta invents the voltaic pile, an early form of battery in Italy, based on previous works by Luigi Galvani.
voltaic pile - form in - Italy
Sentence: Alessandro Volta invents the voltaic pile, an early form of battery 

Sentence: François Isaac de Rivaz designs the first automobile powered by an internal combustion engine fuelled by hydrogen.
François Isaac de Rivaz - designs - first automobile
Sentence: François Isaac de Rivaz designs the first automobile powered by an internal combustion engine fuelled by hydrogen.
François Isaac de Rivaz - designs - automobile powered by combustion engine
Sentence: François Isaac de Rivaz designs the first automobile powered by an internal combustion engine fuelled by hydrogen.
François Isaac Rivaz - designs - automobile powered by internal combustion engine
Sentence: François Isaac de Rivaz designs the first automobile powered by an internal combustion engine fuelled by hydrogen.
François Isaac de Rivaz - designs - first automobile powered by internal combustion engine fuelled
Sentence: François Isaac de Rivaz designs the first automobile powered by an internal combustion engine fuelled by hydrogen.
François Isaac de Rivaz - designs - first automobile powered by c

Sentence: James Fox invents the modern planing machine, though Matthew Murray of Leeds and Richard Roberts of Manchester have also been credited at times with its invention.
Matthew Murray - have - have credited at times with its invention
Sentence: James Fox invents the modern planing machine, though Matthew Murray of Leeds and Richard Roberts of Manchester have also been credited at times with its invention.
James Fox - invents planing machine - have also credited at times with its invention
Sentence: James Fox invents the modern planing machine, though Matthew Murray of Leeds and Richard Roberts of Manchester have also been credited at times with its invention.
James Fox - invents - modern planing machine
Sentence: James Fox invents the modern planing machine, though Matthew Murray of Leeds and Richard Roberts of Manchester have also been credited at times with its invention.
Matthew Murray - have - have also credited at times with its invention
Sentence: James Fox invents the moder

Sentence: The lathe can copy symmetrical shapes and is used for making gun stocks, and later, ax handles.
lathe - can copy - symmetrical shapes
Sentence: The lathe can copy symmetrical shapes and is used for making gun stocks, and later, ax handles.
lathe - can copy - shapes
Sentence: The lathe can copy symmetrical shapes and is used for making gun stocks, and later, ax handles.
lathe - is - used
Sentence: The lathe's patent is in force for 42 years, the record for any U.S. patent.
lathe 's patent - is in - force for 42 years
Sentence: The lathe's patent is in force for 42 years, the record for any U.S. patent.
lathe 's patent - is in - force
Sentence: The lathe's patent is in force for 42 years, the record for any U.S. patent.
42 years - for force is - record
Sentence: The lathe's patent is in force for 42 years, the record for any U.S. patent.
force - record for - U.S. patent
Sentence: The lathe's patent is in force for 42 years, the record for any U.S. patent.
lathe - has - patent
S

Sentence: Moritz von Jacobi invents Electrotyping.
Moritz von Jacobi - invents - Electrotyping
Sentence: William Otis invents the steam shovel.
William Otis - invents - steam shovel
Sentence: James Nasmyth invents the steam hammer.
James Nasmyth - invents - steam hammer
Sentence: Edmond Becquerel invents a method for the photovoltaic effect, effectively producing the first solar cell.
Edmond Becquerel - effectively producing - first cell
Sentence: Edmond Becquerel invents a method for the photovoltaic effect, effectively producing the first solar cell.
Edmond Becquerel - producing - first cell
Sentence: Edmond Becquerel invents a method for the photovoltaic effect, effectively producing the first solar cell.
Edmond Becquerel - producing - first solar cell
Sentence: Edmond Becquerel invents a method for the photovoltaic effect, effectively producing the first solar cell.
Edmond Becquerel - effectively producing - first solar cell
Sentence: Edmond Becquerel invents a method for the photo

Sentence: Alfred Nobel invents Dynamite, the first safely manageable explosive stronger than black powder.
Alfred Nobel - invents - explosive stronger than black powder
Sentence: Alfred Nobel invents Dynamite, the first safely manageable explosive stronger than black powder.
Alfred Nobel - invents - first manageable explosive stronger than black powder
Sentence: Alfred Nobel invents Dynamite, the first safely manageable explosive stronger than black powder.
Alfred Nobel - invents - stronger
Sentence: Alfred Nobel invents Dynamite, the first safely manageable explosive stronger than black powder.
Dynamite - first explosive stronger than - black powder
Sentence: Alfred Nobel invents Dynamite, the first safely manageable explosive stronger than black powder.
Alfred Nobel - invents - stronger than powder
Sentence: Alfred Nobel invents Dynamite, the first safely manageable explosive stronger than black powder.
Alfred Nobel - invents - safely manageable stronger than powder
Sentence: Alfred 

Sentence: However, other inventors before Bell had worked on the development of the telephone and the invention had several pioneers.
Bell - had worked on - development
Sentence: However, other inventors before Bell had worked on the development of the telephone and the invention had several pioneers.
Bell - had worked on - development of telephone
Sentence: Thomas Edison invents the first working phonograph.
Thomas Edison - invents - working phonograph
Sentence: Thomas Edison invents the first working phonograph.
Thomas Edison - invents - first phonograph
Sentence: Thomas Edison invents the first working phonograph.
Thomas Edison - invents - phonograph
Sentence: Thomas Edison invents the first working phonograph.
Thomas Edison - invents - first working phonograph
Sentence: Henry Fleuss is granted a patent for the first practical rebreather.
Henry Fleuss - is granted - patent
Sentence: Henry Fleuss is granted a patent for the first practical rebreather.
Henry Fleuss - is granted - pate

Sentence: Charles Martin Hall and independently Paul Héroult invent the Hall–Héroult process for economically producing aluminum in 1886.Karl Benz invents the first petrol or gasoline powered auto-mobile (car).Carl Josef Bayer invents the Bayer process for the production of alumina.
Héroult process - producing aluminum in - 1886
Sentence: Charles Martin Hall and independently Paul Héroult invent the Hall–Héroult process for economically producing aluminum in 1886.Karl Benz invents the first petrol or gasoline powered auto-mobile (car).Carl Josef Bayer invents the Bayer process for the production of alumina.
Charles Martin Hall - invent - Hall
Sentence: Charles Martin Hall and independently Paul Héroult invent the Hall–Héroult process for economically producing aluminum in 1886.Karl Benz invents the first petrol or gasoline powered auto-mobile (car).Carl Josef Bayer invents the Bayer process for the production of alumina.
Paul Héroult - invent - Hall
Sentence: Charles Martin Hall and in

Sentence: Rudolf Diesel invents the diesel engine (although Herbert Akroyd Stuart had experimented with compression ignition before Diesel).Guglielmo Marconi invents a system of wireless communication using radio waves.
Rudolf Diesel - invents - diesel engine
Sentence: Rudolf Diesel invents the diesel engine (although Herbert Akroyd Stuart had experimented with compression ignition before Diesel).Guglielmo Marconi invents a system of wireless communication using radio waves.
Guglielmo Marconi - invents - system
Sentence: Rudolf Diesel invents the diesel engine (although Herbert Akroyd Stuart had experimented with compression ignition before Diesel).Guglielmo Marconi invents a system of wireless communication using radio waves.
Guglielmo Marconi - invents - system of wireless communication
Sentence: Rudolf Diesel invents the diesel engine (although Herbert Akroyd Stuart had experimented with compression ignition before Diesel).Guglielmo Marconi invents a system of wireless communication

world - has - first practical ice making
James Harrison - produces - world 's ice
James Harrison - produces - world 's practical ice
William Henry Perkin - invents - first synthetic dye
William Henry Perkin - invents - first dye
William Henry Perkin - invents - synthetic dye
William Henry Perkin - invents - dye
William Henry Perkin - invents - Mauveine
Heinrich Geissler - invents - Geissler tube
Gaston Planté - invents - lead acid battery
Gaston Planté - invents - battery
Gaston Planté - invents - rechargeable battery
Gaston Planté - invents - first rechargeable battery
Gaston Planté - invents - first battery
Gaston Planté - invents - acid battery
Joseph Swan - produces - carbon fibers
Alexander Parkes - invents - parkesine also known as celluloid
Alexander Parkes - invents - first plastic
Alexander Parkes - invents - man-made plastic
Alexander Parkes - invents - first man-made plastic
Alexander Parkes - invents - parkesine known
Alexander Parkes - invents - parkesine known as celluloi

Hungarian engineers - intvent - core high efficiency transformer
engineers - intvent - closed core high efficiency transformer
Hungarian engineers - intvent - AC parallel power distribution
John Kemp Starley - invents - modern bicycle
John Kemp Starley - invents - bicycle
Carl Gassner - invents - battery
Carl Gassner - invents - zinc-carbon battery making
Carl Gassner - invents - battery making
Carl Gassner - invents - zinc-carbon battery
Carl Gassner - invents - first dry cell battery
Carl Gassner - invents - dry cell battery
Carl Gassner - invents - first cell battery
Carl Gassner - invents - cell battery
Héroult process - producing aluminum in - 1886
Charles Martin Hall - invent - Hall
Paul Héroult - invent - Hall
process - economically producing aluminum in - 1886
process - producing aluminum in - 1886
process - economically producing - aluminum
process - producing - aluminum
Héroult process - economically producing aluminum in - 1886
Héroult process - producing - aluminum
Héroult 
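The output above shows that Stanford OpenIE often emits several variants of the same fact that differ only in how much of the object is kept (e.g., "phonograph", "first phonograph", "first working phonograph"). When comparing against your own triples, it may help to collapse such variants first. The sketch below is a simple heuristic, not part of Stanford OpenIE: for each (arg1, predicate) pair it keeps the triple with the longest arg2. The names `T` and `keep_longest_objects` are hypothetical; `T` stands in for the `Triple` class.

```python
from collections import namedtuple

# Lightweight stand-in for the notebook's Triple class (an assumption for this sketch)
T = namedtuple('T', ['arg1', 'predicate', 'arg2'])


def keep_longest_objects(triples):
    """For each (arg1, predicate) pair, keep only the triple with the longest arg2."""
    best = {}
    for t in triples:
        key = (t.arg1, t.predicate)
        if key not in best or len(t.arg2) > len(best[key].arg2):
            best[key] = t
    return list(best.values())


# Four variants taken from the Stanford OpenIE output above
sample = [
    T('Thomas Edison', 'invents', 'working phonograph'),
    T('Thomas Edison', 'invents', 'first phonograph'),
    T('Thomas Edison', 'invents', 'phonograph'),
    T('Thomas Edison', 'invents', 'first working phonograph'),
]

for t in keep_longest_objects(sample):
    print(t.arg1, '-', t.predicate, '-', t.arg2)
```

The heuristic is lossy: genuinely different objects that share a subject and predicate (such as "Mauveine" and "first synthetic dye" for William Henry Perkin) would be reduced to one, so use it for inspection rather than as a final processing step.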

### <font color='red'>Compare the triples you have been able to extract using your own code, with those extracted by Stanford OpenIE. What are the strong points and weaknesses of your own implementation? What are those of Stanford OpenIE?</font>
<font color='red'><b>Your own implementation</b></font><br>
<font color='red'>Strengths:</font><br>
<font color='red'>Weaknesses:</font><br><br>
<font color='red'><b>Stanford OpenIE</b></font><br>
<font color='red'>Strengths:</font><br>
<font color='red'>Weaknesses:</font><br>

## Optional: Graph representation of knowledge extracted by Open IE

The code below builds a graph representation based on a list of triples extracted by Open IE.

In [16]:
!pip install networkx
import networkx as nx

# Function for assigning unique IDs to arguments of triples
# Returns a dictionary mapping each argument name to its ID
def normalise_arguments(triples):
    global arg_id
    # arg_id below serves as the ID
    arg_id = 1
    argument_dict = dict()
    for triple in triples:
        argument_dict[triple.arg1] = arg_id
        arg_id += 1
        argument_dict[triple.arg2] = arg_id
        arg_id += 1
    return argument_dict
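Note that `normalise_arguments` reassigns an ID every time an argument recurs, so the IDs have gaps, and it hands the counter to `to_graph` through a global variable. A minimal alternative sketch, under the assumption that keeping the first ID per argument is acceptable, is shown below; the name `assign_ids` is hypothetical, and it returns the next free ID instead of using a global.

```python
def assign_ids(arguments, start=1):
    """Map each distinct argument string to one ID, keeping the first ID assigned.

    Returns the mapping and the next unused ID (usable as the first predicate ID).
    """
    ids = {}
    next_id = start
    for arg in arguments:
        if arg not in ids:
            ids[arg] = next_id
            next_id += 1
    return ids, next_id


# 'Volta' occurs twice but receives a single ID
ids, next_free = assign_ids(['Volta', 'pile', 'Davy', 'lamp', 'Volta', 'battery'])
print(ids)
print(next_free)
```

To use it with triples, the argument list could be built as `[a for t in triples for a in (t.arg1, t.arg2)]`.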




In [20]:
# Function for generating the graph
from typing import List
# Run this cell before the cells below: they call to_graph and rely on the global argument_dict that it sets.
def to_graph(triples: List[Triple]) -> nx.Graph:
    global argument_dict
    argument_dict = normalise_arguments(triples)
    # Unlike arguments, predicates do not need to be unique
    predicate_id = arg_id
    graph = nx.Graph()
    for triple in triples:
        graph.add_node(argument_dict[triple.arg1], name=triple.arg1, type='subject')
        graph.add_node(predicate_id, name=triple.predicate, type='predicate')
        graph.add_node(argument_dict[triple.arg2], name=triple.arg2, type='object')
        graph.add_edge(argument_dict[triple.arg1], predicate_id)
        graph.add_edge(argument_dict[triple.arg2], predicate_id)
        predicate_id += 1
    return graph


The code below is for generating a visualisation of a given graph.

In [21]:
!pip install pyvis
import os
import shutil
import tempfile
from pyvis.network import Network
from typing import List

class PyVisPrinter:
    """Class to visualise a (serialized) dataset entry."""

    def __init__(self, path=None):
        # Use the given directory if one is provided; otherwise create a temporary one
        self.path = path or tempfile.mkdtemp(prefix='vis-')
        
    def clean(self):
        shutil.rmtree(self.path)

    def print_graph(self, graph: nx.Graph, filename):

        vis = Network(bgcolor="#222222",
                      width='100%',
                      font_color="white", notebook=True)
        
        for idx, (node, data) in enumerate(graph.nodes(data=True)):
            vis.add_node(
                node,
                title=data['name'],
                label=data['name'],
                color='yellow' if data['type'] == 'predicate' else 'green' if data['type'] == 'subject' else 'blue'
            )

        for i, (source, target) in enumerate(graph.edges()):
            if source in vis.get_nodes() and target in vis.get_nodes():
                vis.add_edge(source, target)
            else:
                # The class defines no logger, so report the problem with print
                print("Warning: source or target not in graph!")

        name = os.path.join(self.path, filename)
        return vis
    



In [22]:
# Generate a graph based on a list of triples
graph = to_graph(triples)

# Generate an HTML file with the graph visualisation and display it
p = PyVisPrinter()
v = p.print_graph(graph, 'my_graph.html')
v.show('my_graph.html')

Below is a very basic way to query the graph.
The code will return a tree traversed using depth-first search (DFS) with 'cinematograph' as a starting point (answering the question "Who invented the cinematograph?").
Try other queries by changing the starting point (e.g., change 'cinematograph' to 'Volta' to answer the question "What did Volta invent?").

In [25]:
successors = nx.dfs_tree(graph, argument_dict['cinematograph'])
for s in successors:
    print (graph.nodes[s])

{'name': 'cinematograph', 'type': 'object'}
{'name': 'invent', 'type': 'predicate'}
{'name': 'Bouly', 'type': 'subject'}
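A DFS returns every node reachable from the starting point, which can include unrelated facts once arguments are shared between triples. For a targeted question, filtering the list of triples directly may be enough. The sketch below is one such approach in plain Python; the names `T`, `subjects_of` and `objects_of` are hypothetical, and `T` stands in for the `Triple` class.

```python
from collections import namedtuple

# Lightweight stand-in for the notebook's Triple class (an assumption for this sketch)
T = namedtuple('T', ['arg1', 'predicate', 'arg2'])


def subjects_of(triples, predicate, arg2):
    """Answer 'who <predicate> <arg2>?' by matching on predicate and object."""
    return [t.arg1 for t in triples if t.predicate == predicate and t.arg2 == arg2]


def objects_of(triples, arg1, predicate):
    """Answer 'what did <arg1> <predicate>?' by matching on subject and predicate."""
    return [t.arg2 for t in triples if t.arg1 == arg1 and t.predicate == predicate]


# Three triples taken from the output of the rule-based extractor above
sample = [
    T('Volta', 'invent', 'pile'),
    T('Loud', 'invent', 'pen'),
    T('Bouly', 'invent', 'cinematograph'),
]

print(subjects_of(sample, 'invent', 'pen'))
print(objects_of(sample, 'Volta', 'invent'))
```

Both functions only read the `arg1`, `predicate` and `arg2` attributes, so they should work on the `triples` list built earlier as well.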
