# File Description

##### This file represents entities and their relationships in SVO format. SVO is short for "Subject Verb Object", where the verb defines the relationship between two entities. Once the subjects, verbs, and objects have been extracted from the unstructured text file, they are stored in a CSV file, which is later used to generate a queryable graph. Each entity, i.e. each subject and each object, serves as an individual node of the graph, whereas the verbs serve as the relationships between the nodes.
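
##### As a minimal sketch of this mapping, the snippet below splits a few triples into a set of nodes and a list of labeled edges. The sample triples and the helper name triples_to_graph are made up for illustration and do not come from the actual input text.

```python
def triples_to_graph(triples):
    """Split (subject, verb, object) triples into graph nodes and labeled edges."""
    nodes = set()
    edges = []
    for subject, verb, obj in triples:
        # Subjects and objects become nodes; a set keeps each entity only once.
        nodes.add(subject)
        nodes.add(obj)
        # The verb becomes the label of a directed edge from subject to object.
        edges.append((subject, obj, verb))
    return nodes, edges


# Hypothetical triples, for illustration only.
sample_triples = [
    ("Alice", "manages", "Bob"),
    ("Bob", "uses", "the router"),
]
nodes, edges = triples_to_graph(sample_triples)
print(sorted(nodes))
print(edges)
```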

## Installation and Upgrade

##### This section installs textacy and upgrades spaCy. The previous file discussed how NeuralCoref requires an older version of spaCy. In this file, however, spaCy is upgraded because the "en_core_web_sm" model and textacy both require newer versions.

In [None]:
! pip install textacy
! pip install --upgrade spacy
! python -m spacy download en_core_web_sm
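
##### Once the commands above have run, it can be useful to confirm which versions were actually installed. The sketch below uses only the standard library (importlib.metadata, Python 3.8 or newer); the helper name installed_version is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Print the versions that the rest of this file will use.
for package in ("spacy", "textacy"):
    print(package, installed_version(package))
```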

## Code

In [None]:
import spacy
import textacy
from textacy.extract import subject_verb_object_triples
from bs4 import BeautifulSoup
import requests
import re
import os

In [None]:
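data_dir = ''
# Read the preprocessed text; the with-block closes the file afterwards.
with open(os.path.join(data_dir, 'Preprocessed_PTCL.txt')) as f:
    TEXTS = [f.read()]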

##### This part of the code extracts entities as well as relationships from the text and arranges them as Subject-Verb-Object triples (SVOs). It also builds a dictionary that maps each named entity to its type label, so that an entity that appears several times is recorded only once.

In [None]:
nlp = spacy.load('en_core_web_sm')
final_svos = []
final_text_svos = []
entity_dict = {}
svo_labels = []

def span_text(part):
    # Some textacy releases return each SVO element as a list of tokens,
    # others as a single span; handle both.
    if isinstance(part, (list, tuple)):
        return ' '.join(str(tok) for tok in part).strip()
    return str(part).strip()

for text in TEXTS:
    doc = nlp(text)
    # Record the type label of every named entity, keyed by its text.
    for ent in doc.ents:
        if str(ent) not in entity_dict:
            entity_dict[str(ent)] = ent.label_
    svos = list(subject_verb_object_triples(doc))
    svos_text = [(span_text(x[0]), span_text(x[1]), span_text(x[2])) for x in svos]
    print(svos_text)
    final_svos.extend(svos)
    final_text_svos.extend(svos_text)

for svo in final_text_svos:
    # Default to the generic label 'Object' when the subject or object is not a named entity.
    subject_label = entity_dict.get(svo[0], 'Object')
    object_label = entity_dict.get(svo[2], 'Object')
    svo_labels.append((subject_label, object_label))
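
##### To make the labeling step concrete, here is a small standalone sketch that applies the same lookup-with-fallback idea to made-up data. The entity names, the triples, and the helper name label_triples are hypothetical and serve only as an illustration.

```python
def label_triples(triples, entity_types, default="Object"):
    """Return one (subject_label, object_label) pair per triple."""
    return [
        (entity_types.get(subject, default), entity_types.get(obj, default))
        for subject, _verb, obj in triples
    ]


# Hypothetical entity types and triples, for illustration only.
sample_entity_types = {"Acme": "ORG", "Berlin": "GPE"}
sample_triples = [
    ("Acme", "opened", "Berlin"),
    ("Acme", "sells", "routers"),
]
print(label_triples(sample_triples, sample_entity_types))
# [('ORG', 'GPE'), ('ORG', 'Object')]
```

##### Any subject or object that is not a known named entity receives the generic label "Object", as in the cell above.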


In [None]:
final_text_svos

In [None]:

# Write all the SVOs as a CSV file

import csv

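# newline='' prevents blank rows between records on Windows.
# The with-block closes the file, so no explicit close() is needed.
with open('svos.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerows(final_text_svos)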
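
##### Since the CSV file is meant to feed a queryable graph later, the sketch below shows one possible way to read three-column rows back and look up what a given subject does. It is an assumption about how the file might be consumed, not the method used in the later file; the helper names and the sample rows are hypothetical.

```python
import csv
from collections import defaultdict


def parse_triples(lines):
    """Parse CSV lines into (subject, verb, object) tuples, skipping malformed rows."""
    return [tuple(row) for row in csv.reader(lines) if len(row) == 3]


def build_adjacency(triples):
    """Map each subject to the (verb, object) pairs that start from it."""
    adjacency = defaultdict(list)
    for subject, verb, obj in triples:
        adjacency[subject].append((verb, obj))
    return adjacency


# Hypothetical rows in the same three-column layout as svos.csv.
sample_lines = ['Acme,acquired,Globex', 'Acme,opened,"an office, a lab"']
adjacency = build_adjacency(parse_triples(sample_lines))
print(adjacency["Acme"])
# [('acquired', 'Globex'), ('opened', 'an office, a lab')]
```

##### To use the real data, an open file object can be passed in place of sample_lines, for example the handle returned by open('svos.csv', newline='').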

In [None]:
svo_labels

##### The dictionary built in the previous step, which maps each entity to its type label, is now saved as a pickle file. At the end, we also generate a DOT file to save the graph.

In [None]:
# Save the entity type dictionary using pickle

import pickle
with open('entity_dict.pickle', 'wb') as handle:
    pickle.dump(entity_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
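
##### A later step will need to read the dictionary back. The sketch below shows a round trip on a throwaway file with made-up entities; the helper name load_entity_dict is an assumption made for this example. Pickle files should only be loaded when they come from a trusted source.

```python
import os
import pickle
import tempfile


def load_entity_dict(path):
    """Load a dictionary that was saved with pickle.dump."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round-trip demo on a temporary file with hypothetical entities.
sample = {"Acme": "ORG", "Berlin": "GPE"}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "entity_dict.pickle")
    with open(demo_path, "wb") as handle:
        pickle.dump(sample, handle, protocol=pickle.HIGHEST_PROTOCOL)
    restored = load_entity_dict(demo_path)

print(restored == sample)
```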

In [None]:
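# Visualize the KG using Graphviz
import subprocess

def generate_graphviz_graph(entity_relations, name, verbose=True):
    """Write (subject, verb, object) triples to <name>.dot and render <name>.png.

    Each triple becomes one DOT edge, e.g.:
        "a" -> "b" [ label="a to b" ];
    Rendering requires the Graphviz `dot` executable on the PATH.
    """
    def esc(value):
        # Escape double quotes so they cannot end a DOT string early.
        return str(value).replace('"', '\\"')

    graph = ['digraph {']
    for er in entity_relations:
        graph.append('"{}" -> "{}" [ label="{}" ];'.format(esc(er[0]), esc(er[2]), esc(er[1])))
    graph.append('}')

    out_dot = name + '.dot'
    with open(out_dot, 'w') as output_file:
        # One statement per line keeps the DOT file readable.
        output_file.write('\n'.join(graph) + '\n')

    out_png = name + '.png'
    try:
        # Passing arguments as a list avoids shell quoting problems with file names.
        subprocess.run(['dot', '-Tpng', out_dot, '-o', out_png], check=True)
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print('Wrote {} but could not render {}: {}'.format(out_dot, out_png, err))
        return

    if verbose:
        print('Wrote graph to {} and {}'.format(out_dot, out_png))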


In [None]:
generate_graphviz_graph(final_text_svos, "name")
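
##### Rendering the PNG depends on the Graphviz "dot" executable being installed, which pip does not provide. The sketch below checks for it with the standard library; the helper name graphviz_available is an assumption made for this example.

```python
import shutil


def graphviz_available(binary_name="dot"):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(binary_name) is not None


if not graphviz_available():
    print("Graphviz 'dot' was not found; the .dot file can still be written, "
          "but no PNG will be rendered.")
```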