# DSS-AZUL-INDEXING

### This Jupyter notebook runs a couple of test indexing operations using the DataExtractor and FileIndexer classes.

Below, we import our modules and:
* create an Elasticsearch client
* define a dummy event payload
* parse the `bundle_uuid` and `bundle_version` from it

In [1]:
from elasticsearch import Elasticsearch
from indexer.utils import DataExtractor
from indexer.indexer import FileIndexer
import json
from pprint import pprint

# Create an Elasticsearch client for the local instance on port 9200
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
# Sample event payload
payload = {
    "query": {"query": {"match_all": {}}},
    "subscription_id": "ba50df7b-5a97-4e87-b9ce-c0935a817f0b",
    "transaction_id": "ff6b7fa3-dc79-4a79-a313-296801de76b9",
    "match": {
        "bundle_version": "2017-10-02T183005.116143Z",
        "bundle_uuid": "cd55a68f-fc98-4c6d-af5e-7df491775df4"
    }
}
# Parse the bundle identifiers out of the payload
bundle_uuid = payload['match']['bundle_uuid']
bundle_version = payload['match']['bundle_version']
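The two dictionary lookups in cell [1] assume the payload always carries well-formed values. A minimal sketch of a stricter parser follows; `parse_bundle_fqid` and `DSS_VERSION_FORMAT` are hypothetical names (not part of `indexer.utils`), and the version format `%Y-%m-%dT%H%M%S.%fZ` is inferred from the sample `bundle_version`, not taken from a spec.

```python
import uuid
from datetime import datetime

# Hypothetical: format inferred from the sample version "2017-10-02T183005.116143Z"
DSS_VERSION_FORMAT = "%Y-%m-%dT%H%M%S.%fZ"


def parse_bundle_fqid(payload):
    """Hypothetical helper: return (bundle_uuid, bundle_version) from an
    event payload, raising ValueError if either value is missing or malformed."""
    match = payload.get("match")
    if not isinstance(match, dict):
        raise ValueError("payload has no 'match' section")
    try:
        bundle_uuid = match["bundle_uuid"]
        bundle_version = match["bundle_version"]
    except KeyError as exc:
        raise ValueError("missing key in 'match': {}".format(exc)) from exc
    # uuid.UUID raises ValueError for strings that are not valid UUIDs
    uuid.UUID(bundle_uuid)
    # strptime raises ValueError if the version does not fit the assumed format
    datetime.strptime(bundle_version, DSS_VERSION_FORMAT)
    return bundle_uuid, bundle_version
```

Called on the sample `payload`, this would return the same pair as the two lookups, but a truncated or mistyped notification fails loudly before any network or indexing work starts.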

Next, we create a DataExtractor instance and use it to fetch the contents of the bundle referenced by `payload`, pulling from the AWS replica.
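Conceptually, fetching a bundle from the DSS is a request for the bundle manifest followed by one request per file. The sketch below only builds request URLs and splits a manifest; it is not `DataExtractor`'s implementation. It assumes `GET /bundles/{uuid}` and `GET /files/{uuid}` endpoints taking `version` and `replica` query parameters, and it assumes (hypothetically) that manifest entries flagged `indexed` are the JSON metadata documents. `bundle_url`, `file_url`, and `split_manifest` are hypothetical names.

```python
from urllib.parse import urlencode


def bundle_url(base_url, bundle_uuid, bundle_version, replica="aws"):
    """Hypothetical helper: URL for a bundle manifest on an assumed DSS endpoint."""
    query = urlencode({"version": bundle_version, "replica": replica})
    return "{}/bundles/{}?{}".format(base_url.rstrip("/"), bundle_uuid, query)


def file_url(base_url, file_uuid, file_version, replica="aws"):
    """Hypothetical helper: URL for a single file's contents on an assumed DSS endpoint."""
    query = urlencode({"version": file_version, "replica": replica})
    return "{}/files/{}?{}".format(base_url.rstrip("/"), file_uuid, query)


def split_manifest(files):
    """Split manifest entries into (metadata, data) dicts keyed by file name,
    assuming metadata files are the ones flagged 'indexed'."""
    metadata = {f["name"]: f for f in files if f.get("indexed")}
    data = {f["name"]: f for f in files if not f.get("indexed")}
    return metadata, data
```

With the staging base URL and the sample identifiers, `bundle_url` yields `https://dss.staging.data.humancellatlas.org/v1/bundles/cd55a68f-fc98-4c6d-af5e-7df491775df4?version=2017-10-02T183005.116143Z&replica=aws`. Keying both dicts by file name mirrors the `'assay.json': {...}` shape of the `metadata_files` output below.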

In [2]:
# Create a DataExtractor instance pointing to the HCA staging DSS
extractor = DataExtractor("https://dss.staging.data.humancellatlas.org/v1")
# Extract the bundle's metadata files and data files from the AWS replica
metadata_files, data_files = extractor.extract_bundle(payload, "aws")
# Print each dictionary
print("\n#####################################################")
print("#                    PRINTING METADATA               #")
print("#####################################################")
pprint(metadata_files, indent=4)
print("\n#####################################################")
print("#                  PRINTING DATA FILES               #")
print("#####################################################")
pprint(data_files, indent=4)


#####################################################
#                    PRINTING METADATA               #
#####################################################
{   'assay.json': {   'core': {   'accession': None,
                                  'events': [   {   'endState': 'Validating',
                                                    'originalState': 'Draft',
                                                    'submissionDate': {   'date': '2017-10-02T18:13:29.336+0000'}},
                                                {   'endState': 'Valid',
                                                    'originalState': 'Validating',
                                                    'submissionDate': {   'date': '2017-10-02T18:13:39.375+0000'}}],
                                  'submissionDate': {   'date': '2017-10-02T18:10:53.603Z'},
                                  'updateDate': {   'date': '2017-10-02T18:13:39.375Z'},
                                  'uuid': '90dab527-46d0

Next, we pass these files to the FileIndexer class to create a file-oriented index entry in the Elasticsearch instance running on `localhost:9200`. First, though, we load the index settings and the index mapping configuration from their JSON files.
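Presumably, a file-oriented index stores one document per file rather than one per bundle. The sketch below illustrates that idea and is a hypothetical stand-in, not `FileIndexer`'s code. It assumes an elasticsearch-py 5.x/6.x style client (`indices.exists`, `indices.create(index=..., body=...)`, `index(index=..., doc_type=..., id=..., body=...)`), consistent with the `"doc"` type passed to `FileIndexer` in cell [3]; the `<bundle uuid>:<file name>` document id scheme and the document fields are invented for illustration.

```python
def build_file_documents(bundle_uuid, bundle_version, metadata_files, data_files):
    """Hypothetical: one document per data file, each carrying the bundle's metadata."""
    docs = {}
    for file_name, file_info in data_files.items():
        doc_id = "{}:{}".format(bundle_uuid, file_name)  # hypothetical id scheme
        docs[doc_id] = {
            "bundle_uuid": bundle_uuid,
            "bundle_version": bundle_version,
            "file_name": file_name,
            "file": file_info,
            "metadata": metadata_files,
        }
    return docs


def index_documents(es_client, index_name, doc_type, docs, settings=None, mappings=None):
    """Create the index if it does not exist, then write each document.
    Assumes the elasticsearch-py 5.x/6.x call signatures."""
    if not es_client.indices.exists(index=index_name):
        body = {}
        if settings:
            body["settings"] = settings
        if mappings:
            body["mappings"] = mappings
        es_client.indices.create(index=index_name, body=body)
    for doc_id, doc in docs.items():
        es_client.index(index=index_name, doc_type=doc_type, id=doc_id, body=doc)
    return len(docs)
```

Using a deterministic id derived from the bundle and file means re-indexing the same bundle overwrites its documents instead of duplicating them.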

In [3]:
# Helper function to load a JSON file
def open_and_return_json(file_path):
    """
    Opens and returns the contents of the json file given in file_path
    :param file_path: Path of a json file to be opened
    :return: Returns an obj with the contents of the json file
    """
    with open(file_path, 'r') as file_:
        loaded_file = json.load(file_)
    return loaded_file

# Get the index's settings
index_settings = open_and_return_json('chalicelib/settings.json')
# Get the index overall config
index_mapping_config = open_and_return_json('chalicelib/config.json')

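# Build a file indexer that writes to index "testing_index" with document type "doc"
file_indexer = FileIndexer(metadata_files,
                           data_files,
                           es,
                           "testing_index",
                           "doc",
                           index_settings=index_settings,
                           index_mapping_config=index_mapping_config)

# Index the bundle identified by bundle_uuid and bundle_version
file_indexer.index(bundle_uuid, bundle_version)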
print("INDEXING DONE")

INDEXING DONE
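To check what actually landed in `testing_index`, one could run a `match_all` search with the same client and summarize the hits. The helper below is a hypothetical sketch that only parses a search response dict; it handles `hits.total` as either a plain integer (Elasticsearch 6.x and earlier) or a `{"value": ..., "relation": ...}` object (7.x and later).

```python
def summarize_hits(response):
    """Hypothetical helper: return (total_hits, [document ids]) from a search response."""
    hits = response.get("hits", {})
    total = hits.get("total", 0)
    if isinstance(total, dict):  # Elasticsearch 7.x+ reports {"value": n, "relation": ...}
        total = total.get("value", 0)
    ids = [hit["_id"] for hit in hits.get("hits", [])]
    return total, ids
```

With the client from cell [1], something like `summarize_hits(es.search(index="testing_index", body={"query": {"match_all": {}}}))` would report how many documents the indexer wrote and their ids. Note that a search issued immediately after indexing may not see the new documents until the index refreshes.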
