# Load Reference Data

This notebook loads Reference Data from a DataRepo into your Knowledge Graph.
* Load a file that contains Reference Knowledge (in Cypher syntax or as CSV files)

The result of this step includes:
- ReferenceNode nodes connected to each other with ReferenceEdge edges
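
The loader accepts Reference Knowledge written in Cypher or supplied as CSV. As a rough illustration of how tabular rows could map onto ReferenceNode and ReferenceEdge elements, the sketch below turns a CSV into parameterized Cypher `MERGE` statements. The column names (`source`, `relation`, `target`), the property names `text` and `type`, and the sample rows are all assumptions made for this example. They are not the DataRepo's documented file format.

```python
import csv
import io

# Hypothetical CSV layout (an assumption, not the DataRepo's documented format):
# one relationship per row, with columns source, relation, target.
SAMPLE_CSV = """source,relation,target
Paris,located_in,France
France,member_of,European Union
"""

# Parameterized Cypher. MERGE reuses an existing node when a name appears in
# several rows. The property names (text, type) are assumptions for illustration.
MERGE_EDGE = (
    "MERGE (s:ReferenceNode {text: $source}) "
    "MERGE (t:ReferenceNode {text: $target}) "
    "MERGE (s)-[:ReferenceEdge {type: $relation}]->(t)"
)


def csv_to_cypher(csv_text):
    """Turn each CSV row into a (query, parameters) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    statements = []
    for row in reader:
        params = {key: row[key].strip() for key in ("source", "relation", "target")}
        statements.append((MERGE_EDGE, params))
    return statements


for query, params in csv_to_cypher(SAMPLE_CSV):
    print(params)
```

`MERGE` only creates a node or relationship when a matching one does not already exist, so a concept that appears in several rows becomes a single node. Passing values as parameters, instead of splicing them into the query string, also leaves quoting to the database driver.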

### Import General Libraries

In [None]:
import os
import logging

### Parameters can be passed into the Notebook from an OpenTLDR Workflow
OpenTLDR workflows use the notebook cell tagged "parameters" to inject variables (for example, to redirect the source of content).

> **Changing Variable Names in the Parameters Block:** You are welcome to change the values of these parameter variables, but if you change their names, be aware that they are used elsewhere in the notebook and in other workflow stages.

In [None]:
# Workflow Parameters - these may be overridden by the Workflow

data_repo_config = {'repo_type': 'files', 'path': '../Data/Sample/reference'}

# Example of using S3 bucket:
#
# data_repo_config = {
#        'repo_type': 's3',
#        'bucket': os.getenv("S3_BUCKET"),
#        'aws_access_key_id': os.getenv("S3_ACCESS_KEY_ID"),
#        'aws_secret_access_key': os.getenv("S3_SECRET_KEY"),
#        'prefix': 'reference'
#        }

# Logging levels, from least to most verbose: ERROR, WARN, INFO, DEBUG
logging_level = logging.INFO

verbose = True
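
When the notebook runs outside a workflow, one convenient option is to choose the repository from environment variables rather than editing the commented-out S3 block by hand. The helper below is hypothetical (it is not part of OpenTLDR): it returns the S3 configuration when `S3_BUCKET` is set and the local-files default otherwise, reusing the same keys and environment variable names as the example above.

```python
import os

# Default from the parameters cell: read reference files from a local folder.
FILES_CONFIG = {"repo_type": "files", "path": "../Data/Sample/reference"}


def build_data_repo_config(env=os.environ):
    """Hypothetical helper: use S3 when S3_BUCKET is set, else local files.

    The S3 keys mirror the commented-out example in the parameters cell.
    """
    bucket = env.get("S3_BUCKET")
    if not bucket:
        return dict(FILES_CONFIG)
    return {
        "repo_type": "s3",
        "bucket": bucket,
        "aws_access_key_id": env.get("S3_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("S3_SECRET_KEY"),
        "prefix": "reference",
    }


# Passing a plain dict makes the helper easy to try without touching os.environ.
print(build_data_repo_config({})["repo_type"])  # files
print(build_data_repo_config({"S3_BUCKET": "my-bucket"})["repo_type"])  # s3
```

If you adopt something like this, keep the variable name `data_repo_config` unchanged, since the workflow and later cells refer to it.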

# Setup



### Import OpenTLDR Libraries


In [None]:
logging.getLogger("OpenTLDR").setLevel(logging_level)

from opentldr import KnowledgeGraph, DataRepo

kg = KnowledgeGraph()
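
The cell above sets a level on the logger named `"OpenTLDR"`. Python's `logging` module resolves levels hierarchically, so a logger named `"OpenTLDR.<something>"` that has no level of its own inherits this one. The sketch below shows that filtering with a hypothetical child logger, assuming the library names its loggers under the `"OpenTLDR"` prefix.

```python
import logging


# Collect records in a list so the effect of the level is easy to inspect.
class ListHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


parent = logging.getLogger("OpenTLDR")
parent.setLevel(logging.INFO)
handler = ListHandler()
parent.addHandler(handler)

# "OpenTLDR.demo" is a hypothetical child logger name. It has no level of its
# own, so it uses the nearest ancestor's level (INFO) and propagates upward.
child = logging.getLogger("OpenTLDR.demo")
child.debug("dropped: DEBUG is below INFO")
child.info("kept: INFO meets the threshold")
child.error("kept: ERROR is above INFO")

print([r.levelname for r in handler.records])  # ['INFO', 'ERROR']
parent.removeHandler(handler)
```

A level only filters messages; something still has to print them. If no handler is configured, Python falls back to a last-resort handler that shows only WARNING and above, so INFO messages stay hidden unless you add a handler, for example with `logging.basicConfig()`.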

## Loading Reference Data
The following cell uses the OpenTLDR DataRepo to load Reference Data.

> **Reference Data** enables the definition of knowledge that is known but may not be explicitly stated in any specific article. Within a particular domain there is often "common knowledge" that is assumed rather than written down. Formalizing this knowledge in a graph allows it to be reasoned about automatically in connection with other information.
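
To make the idea of reasoning over common knowledge concrete, the sketch below stores a few hypothetical reference facts as (source, relation, target) triples and follows edges breadth-first to find every concept reachable from one that an article mentions. It runs in plain Python rather than against the Knowledge Graph, and the facts are invented for illustration; in the KG the same step would be a graph query.

```python
from collections import deque

# A toy stand-in for reference knowledge: (source, relation, target) triples.
# These facts and relation names are hypothetical examples, not OpenTLDR data.
REFERENCE_EDGES = [
    ("Paris", "located_in", "France"),
    ("France", "member_of", "European Union"),
    ("Lyon", "located_in", "France"),
]


def related_concepts(start, edges):
    """Return every concept reachable from `start` by following edges forward."""
    adjacency = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, []).append(target)

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# An article that only mentions "Paris" can still be linked to the EU.
print(sorted(related_concepts("Paris", REFERENCE_EDGES)))
# ['European Union', 'France']
```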

In [None]:
if data_repo_config is not None:

    repo = DataRepo(kg, data_repo_config)
    
    if verbose:
        print("Loading Reference Data from: {}".format(repo.describe()))

    list_of_uids = repo.importData()

    print("Loaded {count} reference nodes and edges from the repository.".format(count=len(list_of_uids)))

else:
    print("No DataRepo specified for Reference Data.")

## Verify what Reference Data has been loaded into the KG
This cell pulls all the ReferenceNodes from the KG and prints them.

In [None]:
if verbose:

    all_reference_nodes = kg.get_all_reference_nodes()
    # This API call is equivalent to making a Cypher query to the KG:
    #   all_reference_nodes = kg.cypher_query("MATCH (x:ReferenceNode) RETURN x", "x")

    print("Found {count} reference nodes in the knowledge graph:".format(count=len(all_reference_nodes)))

    # Iterate through the Reference Nodes and print info for each
    for reference_node in all_reference_nodes:
        print(" - Found {type}({uid}):\t{text}".format(
            type=reference_node.type, uid=reference_node.uid, text=reference_node.text))
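
Printing every node works for small samples; for larger reference sets a per-type tally is a quicker sanity check. The sketch below counts nodes by their `type` attribute. `FakeReferenceNode` and its type values are hypothetical stand-ins that model only the `type`, `uid`, and `text` attributes used in the loop above. In the notebook you would pass `all_reference_nodes` instead, assuming `type` holds a hashable value such as a string.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical stand-in for the objects returned by kg.get_all_reference_nodes();
# only the attributes the printing loop uses (type, uid, text) are modeled.
@dataclass
class FakeReferenceNode:
    type: str
    uid: str
    text: str


def summarize_by_type(nodes):
    """Count nodes per type, most common first."""
    return Counter(node.type for node in nodes).most_common()


nodes = [
    FakeReferenceNode("Location", "uid-1", "Paris"),
    FakeReferenceNode("Location", "uid-2", "France"),
    FakeReferenceNode("Organization", "uid-3", "European Union"),
]

print(summarize_by_type(nodes))  # [('Location', 2), ('Organization', 1)]
```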

# Close down the remote connections to the database

In [None]:
kg.close()
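
Run cell by cell, the notebook closes the connection explicitly. If the same steps run as a script, an exception during loading would skip `kg.close()`. Python's `contextlib.closing` calls `close()` when the `with` block exits, whether or not an error occurred. The sketch below demonstrates this with a hypothetical stand-in class; with OpenTLDR you could write `with closing(KnowledgeGraph()) as kg:`, relying only on the no-argument `close()` the notebook already uses.

```python
from contextlib import closing


class FakeKnowledgeGraph:
    """Hypothetical stand-in exposing the same close() the notebook calls."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


fake_kg = FakeKnowledgeGraph()
try:
    # closing() calls fake_kg.close() on exit, even though the block raises.
    with closing(fake_kg):
        raise RuntimeError("simulated failure while loading")
except RuntimeError:
    pass

print(fake_kg.closed)  # True
```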