# How to use the pre-trained model in your own code

### 1. Complete the Setup instructions from the Readme and download the pre-trained models
- The models were tested with Keras v. 2.1.5 and TensorFlow 1.6.0
- You may need to install `h5py` with pip and then re-install numpy==1.13.1 if it gets updated

In [None]:
import sys

In [None]:
# Make sure that the project directory is on your Python module search path (sys.path)
sys.path.insert(0, "relation_extraction/")

In [None]:
from core.parser import RelParser
from core import keras_models

### 2. You need to tokenize and part-of-speech tag your data
- The easiest way to do so is to use the Stanford CoreNLP server with the pycorenlp library
- Install and start the CoreNLP server with the English models as instructed here: [CoreNLP Server](https://stanfordnlp.github.io/CoreNLP/corenlp-server.html)
- Install the pycorenlp python library: `pip install pycorenlp`

In [None]:
from pycorenlp import StanfordCoreNLP

In [None]:
corenlp = StanfordCoreNLP('http://localhost:9000')
corenlp_properties = {
    'annotators': 'tokenize, pos, ner',
    'outputFormat': 'json'
}

In [None]:
def get_tagged_from_server(input_text):
    """
    Send the input_text to the CoreNLP server and retrieve the tokens, named entity tags and part-of-speech tags.
    Only the first sentence of the input is processed.
    """
    sentences = corenlp.annotate(input_text, properties=corenlp_properties).get("sentences", [])
    if not sentences:
        # The server returned no sentences (e.g. empty input), so there is nothing to tag
        return []
    tagged = [(t['originalText'], t['ner'], t['pos']) for t in sentences[0]['tokens']]
    return tagged

In [None]:
print(get_tagged_from_server("Germany is a country in Europe"))

In [None]:
print(get_tagged_from_server("Star Wars VII is an American space opera epic film directed by  J. J. Abrams."))

- You can also generate similar output with any part-of-speech tagger of your choice and use it with our models.
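
As a rough sketch of the format another tagger would need to produce, the snippet below builds the same kind of list by hand: one `(token, NER tag, POS tag)` tuple per token, in the order returned by `get_tagged_from_server` above. The tag values are illustrative assumptions (CoreNLP-style NER labels and Penn Treebank POS tags), not actual server output, and `is_valid_tagged` is a hypothetical helper that is not part of the project.

```python
# Hand-written example of the expected input format.
# The NER and POS tags below are illustrative assumptions, not real tagger output.
tagged_example = [
    ("Germany", "LOCATION", "NNP"),
    ("is", "O", "VBZ"),
    ("a", "O", "DT"),
    ("country", "O", "NN"),
    ("in", "O", "IN"),
    ("Europe", "LOCATION", "NNP"),
]


def is_valid_tagged(tagged):
    """Hypothetical check: every item is a (token, ner_tag, pos_tag) triple of strings."""
    return all(
        isinstance(item, tuple)
        and len(item) == 3
        and all(isinstance(value, str) for value in item)
        for item in tagged
    )


print(is_valid_tagged(tagged_example))
```

Which tag sets the entity extraction step accepts is not stated here, so it may be safest to map the output of a different tagger onto the labels that CoreNLP produces.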

### 3. Extract entity mentions and generate an empty graph of relations in the input sentence

In [None]:
from core import entity_extraction

In [None]:
# Convert the input string into a list of tuples with the Stanford CoreNLP as explained above
tagged = get_tagged_from_server("Germany is a country in Europe")

In [None]:
entity_fragments = entity_extraction.extract_entities(tagged)
edges = entity_extraction.generate_edges(entity_fragments)
non_parsed_graph = {'tokens': [t for t, _, _ in tagged],
                    'edgeSet': edges}
print(edges)

- Empty relations are called edges and they have two attributes: 'left' and 'right' that contain token indices of entity mentions
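
To make the edge structure concrete, here is a minimal sketch that maps the token indices of one edge back to the words they point to. The edge is written by hand for illustration and is not actual output of `generate_edges`; the sketch assumes that 'left' and 'right' each hold a list of token indices, and `edge_mentions` is a hypothetical helper.

```python
def edge_mentions(tokens, edge):
    """Return the two entity mentions of an edge as strings, looked up by token index."""
    left = " ".join(tokens[i] for i in edge["left"])
    right = " ".join(tokens[i] for i in edge["right"])
    return left, right


# Hypothetical values written by hand, assuming 'left' and 'right' are lists of indices.
tokens = ["Germany", "is", "a", "country", "in", "Europe"]
example_edge = {"left": [0], "right": [5]}

print(edge_mentions(tokens, example_edge))
```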

### 4. Load the pre-trained relation extraction model

In [None]:
# The GloVe embeddings should be in the "resources/" folder; otherwise change the paths in model_params.json or directly in the code
keras_models.model_params['wordembeddings'] = "../resources/embeddings/glove/glove.6B.50d.txt"

In [None]:
# The downloaded pretrained models should be in the "trainedmodels/" folder
relparser = RelParser("model_ContextWeighted", models_folder="trainedmodels/")

### 5. Label the edges in the sentence graph using the pre-trained model

In [None]:
parsed_graph = relparser.classify_graph_relations([non_parsed_graph])

- The output is a dictionary; the labeled edges are stored in the 'edgeSet' field.
- 'kbID' contains the Wikidata identifier of the assigned relation; 'P0' stands for an empty relation
- 'lexicalInput' contains a human-readable relation label

In [None]:
print(parsed_graph)
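
- Instead of printing the whole dictionary, you may want to list only the non-empty relations. The sketch below shows one possible way to do that. `labeled_relations` is a hypothetical helper, and `example_graph` is a hand-written stand-in with illustrative values, not real model output. It assumes the fields described above ('tokens', 'edgeSet', 'left', 'right', 'kbID', 'lexicalInput').

```python
def labeled_relations(graph, empty_relation="P0"):
    """Collect (left mention, relation label, right mention, kbID) for every non-empty edge."""
    tokens = graph["tokens"]
    results = []
    for edge in graph["edgeSet"]:
        # Skip edges for which the model assigned the empty relation
        if edge.get("kbID") == empty_relation:
            continue
        left = " ".join(tokens[i] for i in edge["left"])
        right = " ".join(tokens[i] for i in edge["right"])
        results.append((left, edge.get("lexicalInput"), right, edge.get("kbID")))
    return results


# Hand-written stand-in for a labeled graph; the values are illustrative assumptions.
example_graph = {
    "tokens": ["Germany", "is", "a", "country", "in", "Europe"],
    "edgeSet": [
        {"left": [0], "right": [5], "kbID": "P30", "lexicalInput": "continent"},
        {"left": [5], "right": [0], "kbID": "P0"},
    ],
}

for left, label, right, kb_id in labeled_relations(example_graph):
    print(left, "--", label, "(" + kb_id + ")", "->", right)
```

With the real output, you would pass `parsed_graph` in place of `example_graph`.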