### In this notebook, we show how to read the ontology and build an index.

We assume that Elasticsearch (ES) is running and responding at `http://localhost:{port}`.

In [1]:
from source.ontology_parsing.data_loading import get_all_concept_file_paths, get_graphs_from_files
from config import ONTOLOGY_CORE_DIR

### Reading the ontology from the `ONTOLOGY_CORE_DIR` path
(the path can be changed in `config.py`)

In [2]:
# collect the paths of all concept files under ONTOLOGY_CORE_DIR
files = get_all_concept_file_paths(ONTOLOGY_CORE_DIR)

# load the file data into graphs with rdflib
graphs = get_graphs_from_files(files)

### Building an index

To start, let's build a baseline index: each concept becomes a separate row, and the objects of its triples become the values in the columns, with one column per predicate.
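The row-per-concept layout described above can be sketched in plain Python. This is only an illustration, not the project's implementation: the `triples` data, the `example.org` URIs, and the `triples_to_rows` helper are all hypothetical, and triples are represented as simple tuples rather than rdflib terms.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical (subject, predicate, object) triples; not taken from the real ontology.
triples = [
    ("http://example.org/concept/1", SKOS + "prefLabel", "Heart"),
    ("http://example.org/concept/1", SKOS + "broader", "http://example.org/concept/0"),
    ("http://example.org/concept/1", SKOS + "altLabel", "Cor"),
    ("http://example.org/concept/2", SKOS + "prefLabel", "Lung"),
]

# Predicates selected as index columns (predicate URI -> column name).
columns = {
    SKOS + "prefLabel": "prefLabel",
    SKOS + "broader": "broader",
}


def triples_to_rows(triples, columns):
    """Group triples by subject; keep only predicates that are selected as columns."""
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        colname = columns.get(predicate)
        if colname is None:
            continue  # this predicate is not a column in the index
        rows[subject][colname].append(obj)
    # Convert the nested defaultdicts into plain dicts.
    return {subject: dict(cols) for subject, cols in rows.items()}


rows = triples_to_rows(triples, columns)
for concept, row in rows.items():
    print(concept, row)
```

Each value is kept as a list because a concept may have several objects for the same predicate (for example, more than one `broader` concept). Predicates that are not selected as columns, such as `altLabel` here, are dropped.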

First, we need to define which predicates (identified by their URIs) we want as columns in the new index.

In [3]:
pred_uri_to_idx_colname = {
    'http://www.w3.org/2004/02/skos/core#prefLabel': 'prefLabel',
    'http://www.w3.org/2004/02/skos/core#closeMatch': 'closeMatch',
    'http://www.w3.org/2004/02/skos/core#related': 'related',
    'http://www.w3.org/2004/02/skos/core#broader': 'broader'
}
pred_uri_to_idx_colname

{'http://www.w3.org/2004/02/skos/core#prefLabel': 'prefLabel',
 'http://www.w3.org/2004/02/skos/core#closeMatch': 'closeMatch',
 'http://www.w3.org/2004/02/skos/core#related': 'related',
 'http://www.w3.org/2004/02/skos/core#broader': 'broader'}

Alternatively, we can derive the columns automatically from all predicates present in the ontology.

In [4]:
from source.ontology_parsing.graph_utils import get_uri_to_colname_dict_from_ontology

# derive the mapping automatically from the predicates found in the ontology
pred_uri_to_idx_colname = get_uri_to_colname_dict_from_ontology(graphs)
pred_uri_to_idx_colname

{'http://www.w3.org/2004/02/skos/core#closeMatch': 'closeMatch',
 'http://www.w3.org/2004/02/skos/core#prefLabel': 'prefLabel',
 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type': 'type',
 'http://www.w3.org/2004/02/skos/core#broader': 'broader',
 'http://www.w3.org/2004/02/skos/core#related': 'related'}
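The column names in the output above look like the local names of the predicate URIs. A minimal sketch of such a derivation is shown below. It is an assumption about the approach, not the actual code of `get_uri_to_colname_dict_from_ontology`; the helpers `uri_to_colname` and `colnames_from_predicates` are hypothetical.

```python
def uri_to_colname(uri: str) -> str:
    """Return the local name of a URI: the fragment after '#',
    or the last path segment if the URI has no fragment."""
    if "#" in uri:
        return uri.rsplit("#", 1)[1]
    return uri.rstrip("/").rsplit("/", 1)[-1]


def colnames_from_predicates(predicate_uris):
    """Map each distinct predicate URI to a column name."""
    return {uri: uri_to_colname(uri) for uri in sorted(set(predicate_uris))}


# Predicate URIs as they might be collected from the graphs (duplicates allowed).
predicates = [
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
]
mapping = colnames_from_predicates(predicates)
print(mapping)
```

Note that two predicates from different namespaces can share a local name, in which case they would collapse into one column. This sketch does not handle that case.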

The next step is to build the index with the columns defined above.
To do that, we create an index builder object (here, `IndexBaseline`) that implements the methods for building the requested type of index.

In [8]:
from source.es_index.IndexBaseline import IndexBaseline

index_builder = IndexBaseline(pred_uri_to_idx_colname, graphs, include_concept_type=True)

We can see the template used as a schema for our index:

In [11]:
index_builder.get_template()

{'mappings': {'properties': {'name': {'type': 'text'},
   'type': {'type': 'text'},
   'closeMatch': {'type': 'text'},
   'prefLabel': {'type': 'text'},
   'broader': {'type': 'text'},
   'related': {'type': 'text'}}}}
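A template of this shape can be produced from the column names alone. The sketch below is a guess at the general idea, not the actual `IndexBaseline.get_template` implementation. The `build_template` helper is hypothetical, and so is the assumption that every column is mapped as a `text` field and that `include_concept_type` controls the `type` field.

```python
def build_template(colnames, include_concept_type=True):
    """Build an Elasticsearch mapping in which every column is a 'text' field."""
    properties = {"name": {"type": "text"}}  # concept identifier column
    if include_concept_type:
        properties["type"] = {"type": "text"}
    for col in colnames:
        # setdefault keeps an existing entry, so 'type' is not added twice
        properties.setdefault(col, {"type": "text"})
    return {"mappings": {"properties": properties}}


template = build_template(["closeMatch", "prefLabel", "type", "broader", "related"])
print(template)
```

Mapping everything as `text` makes every column full-text searchable. Fields that hold URIs, such as `broader`, might be better served by the `keyword` type if exact matching is needed.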

Now we can build the index and populate it with the data from the ontology. The index name, `IDX_NAME`, and the port on which ES is responding, `PORT`, are both defined in `config.py` and can be overridden there if needed.

In [10]:
from source.es_index.create_index import build_index
from config import IDX_NAME, PORT

# assumes that Elasticsearch is responding at localhost:{PORT}
build_index(index_builder, es_config={'host': 'localhost', 'port': PORT}, idx_name=IDX_NAME)

Yay Connected
{'_index': 'ontology_index', '_type': '_doc', '_id': 'N-KRnYUB9f9Rttrv4pZP', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}
{'_index': 'ontology_index', '_type': '_doc', '_id': 'OOKRnYUB9f9Rttrv4pZd', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}
{'_index': 'ontology_index', '_type': '_doc', '_id': 'OeKRnYUB9f9Rttrv4paA', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}
{'_index': 'ontology_index', '_type': '_doc', '_id': 'OuKRnYUB9f9Rttrv4paL', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}
{'_index': 'ontology_index', '_type': '_doc', '_id': 'O-KRnYUB9f9Rttrv4paX', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no':