# Relation extraction and entity linking

This notebook is part of the lecture series at the Faculty Development Programme organised by the Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, in association with ShodhGuru Innovation and Research Labs, India. Specifically, it accompanies Tek Raj Chhetri's lecture *Applications of Deep Neural Networks in Knowledge Graph Construction*.

For relation extraction and entity linking we will use __Flair__ (Akbik et al., 2019). For details about __Flair__ and further tutorials, see [https://flairnlp.github.io/docs/intro](https://flairnlp.github.io/docs/intro).
 
- Akbik, A., Bergmann, T., Blythe, D., Rasul, K., Schweter, S. and Vollgraf, R., 2019, June. FLAIR: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics (demonstrations) (pp. 54-59). 
 
## Installation
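
`pip install flair`

The knowledge-graph visualisation at the end of the notebook additionally uses `pandas`, `streamlit` and `streamlit-agraph` (`pip install pandas streamlit streamlit-agraph`).
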
*** 

## Entity linking

In [1]:
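from flair.nn import Classifier
from flair.data import Sentence
from flair.splitter import SegtokSentenceSplitter

def entity_linking(text):
    # split the raw text into sentences (Flair Sentence objects)
    split_tok = SegtokSentenceSplitter()
    token_sentences_tagged = split_tok.split(text)

    # load the pre-trained entity linker; note that this runs on every call
    tagger_link = Classifier.load('linker')

    # predict entity links in place and return the tagged sentences
    tagger_link.predict(token_sentences_tagged)
    return token_sentences_tagged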

In [2]:
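def print_result(inputs):
    # Print the first predicted entity link and its confidence, then return it.
    # Because of the early return, only the first label of the first sentence is shown.
    for r in inputs:
        for label in r.get_labels():
            print("###################")
            print(f'Predicted value for linking is: "{label.value}"')
            print(f'Confidence score: "{label.score}"')
            return label.value

def get_linking_value(inputs):
    # Link the given text and return the first predicted value (e.g. "Albert_Einstein").
    linked_val = entity_linking(inputs)
    for r in linked_val:
        for label in r.get_labels():
            return label.value
    # Fall back to the input text if nothing was linked, so callers can still .split() it.
    return inputs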
    

In [3]:
result_linklong = entity_linking('Albert Einstein was born at Ulm, in Württemberg, Germany. Six weeks later Einstein family moved to Munich.')

2023-04-05 17:25:18,683 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>


In [4]:
print_result(result_linklong)

###################
Predicted value for linking is: "Albert_Einstein"
Confidence score: "0.9999998807907104"


'Albert_Einstein'

In [5]:
print_result(entity_linking("Einstein"))

2023-04-05 17:25:33,041 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
###################
Predicted value for linking is: "Albert_Einstein"
Confidence score: "0.9999995231628418"


'Albert_Einstein'
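The linker's predictions appear to be Wikipedia page titles written with underscores (for example `Albert_Einstein`), each paired with a confidence score. A small post-processing step can turn such a prediction into a readable label plus a link, and discard low-confidence predictions. The sketch below is plain Python and does not call Flair; `describe_link`, the 0.9 threshold and the English-Wikipedia URL pattern are assumptions for illustration, not part of Flair's API.

```python
from urllib.parse import quote

# Assumption: predicted values are English Wikipedia page titles.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def describe_link(title, score, min_score=0.9):
    """Return (readable label, Wikipedia URL) for a predicted title, or None.

    min_score is a hypothetical confidence cut-off: predictions scoring
    below it are treated as "no link" and None is returned.
    """
    if score < min_score:
        return None
    label = title.replace("_", " ")  # "Albert_Einstein" -> "Albert Einstein"
    url = WIKI_BASE + quote(title)   # percent-encodes non-ASCII characters
    return label, url


print(describe_link("Albert_Einstein", 0.9999995231628418))
# ('Albert Einstein', 'https://en.wikipedia.org/wiki/Albert_Einstein')
print(describe_link("Albert_Einstein", 0.42))  # None
```

The threshold is a trade-off: a higher value keeps fewer but more reliable links.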

## Relation extraction

In [6]:
from flair.data import Sentence
from flair.nn import Classifier
from flair.splitter import SegtokSentenceSplitter


def extract_relation(sentence, mini_batch_size=5):
    # split the input text into sentences
    split_tok = SegtokSentenceSplitter()
    token_sentences = split_tok.split(sentence)

    # relations are predicted between tagged entities, so NER runs first
    tag_entity = Classifier.load('ner')
    rel_extract = Classifier.load('relations')

    tag_entity.predict(token_sentences)
    rel_extract.predict(token_sentences, mini_batch_size=mini_batch_size)
    return token_sentences
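In `extract_relation`, `mini_batch_size` is passed to the relation model's `predict`, which, as the name suggests, processes the sentences in groups of that size. The plain-Python sketch below only illustrates the grouping itself; `mini_batches` is a hypothetical helper, and Flair's internal data loading is not reproduced.

```python
def mini_batches(items, mini_batch_size=5):
    """Yield consecutive chunks of at most mini_batch_size items."""
    for start in range(0, len(items), mini_batch_size):
        yield items[start:start + mini_batch_size]


sentences = [f"sentence {i}" for i in range(12)]
print([len(batch) for batch in mini_batches(sentences)])  # [5, 5, 2]
```

Larger batches usually speed up inference at the cost of more memory per step.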


In [7]:
# Longer input text, kept for experimentation (uncomment both lines to use it instead of the short text below):
# input_text_einstein = """Albert Einstein was born at Ulm, in Württemberg, Germany, on March 14, 1879. Six weeks later the family moved to Munich, where he later on began his schooling at the Luitpold Gymnasium. Later, they moved to Italy and Albert continued his education at Aarau, Switzerland and in 1896 he entered the Swiss Federal Polytechnic School in Zurich to be trained as a teacher in physics and mathematics. In 1901, the year he gained his diploma, he acquired Swiss citizenship and, as he was unable to find a teaching post, he accepted a position as technical assistant in the Swiss Patent Office. In 1905 he obtained his doctor’s degree. During his stay at the Patent Office, and in his spare time, he produced much of his remarkable work and in 1908 he was appointed Privatdozent in Berne. In 1909 he became Professor Extraordinary at Zurich, in 1911 Professor of Theoretical Physics at Prague, returning to Zurich in the following year to fill a similar post. In 1914 he was appointed Director of the Kaiser Wilhelm Physical Institute and Professor in the University of Berlin. He became a German citizen in 1914 and remained in Berlin until 1933 when he renounced his citizenship for political reasons and emigrated to America to take the position of Professor of Theoretical Physics at Princeton*. He became a United States citizen in 1940 and retired from his post in 1945. After World War II, Einstein was a leading figure in the World Government Movement, he was offered the Presidency of the State of Israel, which he declined, and he collaborated with Dr. Chaim Weizmann in establishing the Hebrew University of Jerusalem. Einstein always appeared to have a clear view of the problems of physics and the determination to solve them. He had a strategy of his own and was able to visualize the main stages on the way to his goal. He regarded his major achievements as mere stepping-stones for the next advance.
# At the start of his scientific work, Einstein realized the inadequacies of Newtonian mechanics and his special theory of relativity stemmed from an attempt to reconcile the laws of mechanics with the laws of the electromagnetic field. He dealt with classical problems of statistical mechanics and problems in which they were merged with quantum theory: this led to an explanation of the Brownian movement of molecules. He investigated the thermal properties of light with a low radiation density and his observations laid the foundation of the photon theory of light. In his early days in Berlin, Einstein postulated that the correct interpretation of the special theory of relativity must also furnish a theory of gravitation and in 1916 he published his paper on the general theory of relativity. During this time he also contributed to the problems of the theory of radiation and statistical mechanics. In the 1920s, Einstein embarked on the construction of unified field theories, although he continued to work on the probabilistic interpretation of quantum theory, and he persevered with this work in America. He contributed to statistical mechanics by his development of the quantum theory of a monatomic gas and he has also accomplished valuable work in connection with atomic transition probabilities and relativistic cosmology. After his retirement he continued to work towards the unification of the basic concepts of physics, taking the opposite approach, geometrisation, to the majority of physicists."""

In [8]:
input_text_einstein = 'Albert Einstein was born at Ulm, in Württemberg, Germany. Six weeks later Einstein family moved to Munich.'

In [9]:
result = extract_relation(input_text_einstein)

2023-04-05 17:25:39,570 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
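`extract_relation` (like `entity_linking`) first splits the input into sentences with Flair's `SegtokSentenceSplitter`. To show what that step produces, the sketch below swaps in a deliberately naive regular-expression splitter; it is not segtok's algorithm, and `naive_split` is a hypothetical helper.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' followed by whitespace (a rough stand-in for segtok)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


text = ('Albert Einstein was born at Ulm, in Württemberg, Germany. '
        'Six weeks later Einstein family moved to Munich.')
for s in naive_split(text):
    print(s)

# The naive rule wrongly ends a sentence at the abbreviation "Dr."
print(naive_split("He collaborated with Dr. Chaim Weizmann."))
# ['He collaborated with Dr.', 'Chaim Weizmann.']
```

The second call shows why a dedicated splitter is preferable: abbreviations such as the "Dr. Chaim Weizmann" in the longer Einstein text break a punctuation-only rule.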


In [10]:
triples = []
triples_entity_linked = []
for res in result:
    # each 'relation' label's data_point is a Relation between two entity spans
    for tr in res.get_labels('relation'):
        subjects = tr.data_point.first.text
        objects = tr.data_point.second.text
        # entity linking: map each span to its linked title, e.g. "Albert_Einstein" -> "Albert Einstein"
        entity_linked_subjects = " ".join(get_linking_value(subjects).split("_"))
        entity_linked_objects = " ".join(get_linking_value(objects).split("_"))
        triples.append([subjects, tr.value, objects])
        triples_entity_linked.append([entity_linked_subjects, tr.value, entity_linked_objects])


2023-04-05 17:25:55,134 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
2023-04-05 17:26:11,206 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
2023-04-05 17:26:33,170 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
2023-04-05 17:27:12,219 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
2023-04-05 17:27:29,355 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-

In [80]:
def preprocess_triples(inputlists):
    # strip whitespace and double quotes from every element of every triple
    newtriples = []
    for i in inputlists:
        sublist = [j.strip().replace('"', '') for j in i]
        newtriples.append(sublist)
    return newtriples

In [62]:
preprocess_triples(triples)

[['Albert Einstein', 'born_in', 'Ulm'],
 ['Albert Einstein', 'born_in', 'Württemberg'],
 ['Albert Einstein', 'born_in', 'Germany'],
 ['Einstein', 'lived_in', 'Munich']]

In [11]:
import pandas as pd

In [64]:
df_without_entity_linked = pd.DataFrame(preprocess_triples(triples), columns=['subject', 'relation', 'object'])

In [66]:
df_without_entity_linked

|   | subject | relation | object |
|---|---------|----------|--------|
| 0 | Albert Einstein | born_in | Ulm |
| 1 | Albert Einstein | born_in | Württemberg |
| 2 | Albert Einstein | born_in | Germany |
| 3 | Einstein | lived_in | Munich |


In [67]:
df_without_entity_linked.to_csv("df_without_entity_linked.csv", index=False)

In [87]:
df_with_entity_linkeds = pd.DataFrame(triples_entity_linked, columns=['subject', 'relation','object'])

In [88]:
df_with_entity_linkeds

|   | subject | relation | object |
|---|---------|----------|--------|
| 0 | Albert Einstein | born_in | Ulm |
| 1 | Albert Einstein | born_in | Württemberg |
| 2 | Albert Einstein | born_in | Germany |
| 3 | Albert Einstein | lived_in | Munich |


In [89]:
df_with_entity_linkeds.to_csv("df_with_entity_linked.csv", index=False)
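Comparing the two tables above shows what entity linking buys for the graph: without it, "Einstein" and "Albert Einstein" are separate nodes; with it, both map to the same entity. The sketch below counts the distinct subjects and objects in each set of triples (copied from the tables); `distinct_nodes` is a helper introduced only for this illustration.

```python
# Triples copied from the two tables above.
without_linking = [
    ("Albert Einstein", "born_in", "Ulm"),
    ("Albert Einstein", "born_in", "Württemberg"),
    ("Albert Einstein", "born_in", "Germany"),
    ("Einstein", "lived_in", "Munich"),
]
with_linking = [
    ("Albert Einstein", "born_in", "Ulm"),
    ("Albert Einstein", "born_in", "Württemberg"),
    ("Albert Einstein", "born_in", "Germany"),
    ("Albert Einstein", "lived_in", "Munich"),
]


def distinct_nodes(triples):
    """Collect every subject and object as a graph node."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


print(len(distinct_nodes(without_linking)))  # 6
print(len(distinct_nodes(with_linking)))     # 5
print(distinct_nodes(without_linking) - distinct_nodes(with_linking))  # {'Einstein'}
```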

In [91]:
%%writefile knowledge_graphs_visualisation_df_without_entity_linked.py

import pandas as pd
from streamlit_agraph import agraph, Node, Edge, Config

# read the exported triples (one row per subject-relation-object)
triples = pd.read_csv("df_without_entity_linked.csv")

nodes = set()
edges = []
for _, triple in triples.iterrows():
    # every subject and object becomes a node; the set removes duplicates
    nodes.add(triple['subject'])
    nodes.add(triple['object'])
    # every triple becomes a directed edge labelled with its relation
    edges.append(Edge(source=triple['subject'],
                      label=triple['relation'],
                      target=triple['object']))

st_nodes = [Node(id=n, label=n) for n in nodes]

# width and height were chosen for a very large screen; adjust them to yours
config = Config(width=3800,
                height=1300,
                nodeHighlightBehavior=True,
                highlightColor="#ff0000",
                directed=True,
                hierarchical=True,
                )

agraph(nodes=st_nodes, edges=edges, config=config)


Writing knowledge_graphs_visualisation_df_without_entity_linked.py
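The core of the script is the loop that turns each CSV row into two nodes and one labelled edge; collecting nodes in a set means an entity that appears in several triples is drawn only once. The plain-Python sketch below reproduces just that step with the standard `csv` module and no Streamlit, which makes it easy to check what the graph will contain. `build_graph` is a hypothetical helper, and the sample rows are taken from the tables above.

```python
import csv
import io


def build_graph(csv_text):
    """Return (sorted node ids, list of (source, label, target) edges)
    from CSV text with subject, relation and object columns."""
    nodes = set()
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        nodes.add(row["subject"])
        nodes.add(row["object"])
        edges.append((row["subject"], row["relation"], row["object"]))
    return sorted(nodes), edges


sample = """subject,relation,object
Albert Einstein,born_in,Ulm
Einstein,lived_in,Munich
"""
nodes, edges = build_graph(sample)
print(nodes)  # ['Albert Einstein', 'Einstein', 'Munich', 'Ulm']
print(edges)
```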


In [92]:
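# Visualise the entity-linked triples. This needs a copy of the script above that reads
# df_with_entity_linked.csv instead (that copy is not written in this notebook).
# !streamlit run knowledge_graphs_visualisation_df_with_entity_linked.py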

2023-04-05 17:53:50.700 INFO    numexpr.utils: NumExpr defaulting to 8 threads.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://138.232.106.80:8501

^C
  Stopping...


In [1]:
!streamlit run knowledge_graphs_visualisation_df_without_entity_linked.py

2023-04-06 11:47:16.978 INFO    numexpr.utils: NumExpr defaulting to 8 threads.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://138.232.106.80:8501

^C
  Stopping...
