# Step 0: Initialize
This notebook performs initialization of your Knowledge Graph.

![step0](resources/Step0_Initialize.png)

* Import the OpenTLDR KnowledgeGraph class, which automatically uses your .env to connect to Neo4j.
* Wipe the current database clean to start at a known state. WARNING: This deletes everything!
* Load a file that contains any Reference Knowledge (in Cypher syntax or from CSV files).

The result of this step includes:
- A cleared KG
- ReferenceNode nodes connected to each other with ReferenceEdge edges

## Setup Imports and Logging Level
> Setting the logging level will change the amount of extra output you see.

Uncomment the logging level you wish to see.
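
As a minimal sketch of what the level setting does, the snippet below captures log records in memory and shows which ones survive at the default `WARN` level. The child logger name `OpenTLDR.demo` and the in-memory handler are assumptions made for illustration; they are not part of the OpenTLDR package.

```python
import io
import logging

# Capture records in memory so the effect of the level is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

# "OpenTLDR.demo" is a hypothetical child logger used only for this illustration.
demo_logger = logging.getLogger("OpenTLDR.demo")
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo records out of other handlers

# The level is set on the parent logger; children inherit it
# unless they set a level of their own.
logging.getLogger("OpenTLDR").setLevel(logging.WARN)

demo_logger.debug("hidden at WARN")
demo_logger.info("hidden at WARN")
demo_logger.warning("shown at WARN")
demo_logger.error("shown at WARN")

captured = stream.getvalue().splitlines()
print(captured)  # ['WARNING:shown at WARN', 'ERROR:shown at WARN']
```

Switching the level to `logging.DEBUG` in this sketch would let all four records through.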

In [None]:
import os
import logging

#from dotenv import load_dotenv
#load_dotenv("./.env")

#logging.getLogger("OpenTLDR").setLevel(logging.ERROR)  # Less output
logging.getLogger("OpenTLDR").setLevel(logging.WARN)   # Default
#logging.getLogger("OpenTLDR").setLevel(logging.INFO)   # More output
#logging.getLogger("OpenTLDR").setLevel(logging.DEBUG)  # So much output


## Parameters
OpenTLDR workflows use the notebook block tagged as "parameters" to inject variables (for example, to point to a different source of Reference Data).

> **Do Not Change Variable Names in the Parameters Block**<BR>This is particularly true for the variables in the Workflow Parameters section, because those are known to be overridden by various workflows. You are welcome to change the values of these parameter variables or add new ones, but please do not rename or delete them without understanding how they are used. They could be used elsewhere in this notebook, in other notebooks, and/or in other workflow processes.
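
The effect of injection can be sketched in plain Python as "defaults first, then overrides by name". The helper `inject_parameters` below is hypothetical and only models the idea; it is not how the workflow tooling is implemented. It also shows why renaming a parameter variable silently breaks an override.

```python
# Defaults as they might appear in a cell tagged "parameters".
defaults = {
    "message": "Used default values in the Notebooks 'parameters' block.",
    "clean_policy": "fresh",
}

def inject_parameters(defaults, overrides):
    """Hypothetical helper: apply workflow overrides on top of defaults, matched by name."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged

# Values a workflow might pass in.
workflow_overrides = {
    "message": "Successfully passed in parameters from Workflow.ipynb!",
}

params = inject_parameters(defaults, workflow_overrides)
print(params["message"])

# If the notebook renamed `message` to `msg`, the override would land on a
# name the notebook never reads, and the default would still be used.
renamed_defaults = {"msg": defaults["message"], "clean_policy": "fresh"}
renamed = inject_parameters(renamed_defaults, workflow_overrides)
print(renamed["msg"])
```

The first `print` shows the workflow value; the second still shows the default, because the override was matched against a name that no longer exists in the notebook.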

In [None]:
# Workflow Parameters - these may be overridden by the Workflow
message = "Used default values in the Notebooks 'parameters' block."

ref_date_repo_config= {'repo_type': 'files', 'path': './sample_data/reference'}

# Example of using S3 repo with .env configuration (to avoid key leakage)
#ref_date_repo_config = {
#        'repo_type': 's3',
#        'bucket': os.getenv("S3_BUCKET"),
#        'aws_access_key_id': os.getenv("S3_ACCESS_KEY_ID"),
#        'aws_secret_access_key': os.getenv("S3_SECRET_KEY"),
#        'prefix': 'reference'
#        }

#clean_policy = "live"       # 'live' assumes the workflow runs once per day, keeps requests and TLDR data but not content. This is used on the live server.
#clean_policy = "keep"       # 'keep' will simply keep whatever happens to be in the KG at the time. Be careful with this.
clean_policy = "fresh"      # 'fresh' (or anything else) deletes everything in the KG each time the workflow is run. This is the default.


## Example of Parameter passing in Action:
Let's verify that the parameters are working as expected.

If you are running this notebook directly, you should see the value set in the above cell (which is tagged 'parameters'). By default that is:
<pre>Used default values in the Notebooks 'parameters' block.</pre>

If you are running this notebook through the default workflow, then you will be looking at the READ ONLY output notebooks (by default these are stored in the "./READ_ONLY_OUTPUT" folder) and you should see the value set in the Workflow.ipynb notebook. By default that is:
<pre>Successfully passed in parameters from Workflow.ipynb!</pre>

In [None]:
print(message)

## Initialize and Import an OpenTLDR Knowledge Graph
The KnowledgeGraph() constructor should be called without parameters so that it defaults to using .env and environment variables.
The order of precedence (highest first) is:

1. Set in the KnowledgeGraph() constructor
   - You can set each variable and it will create the driver
   - You can set up a neo4j driver instance and pass that in
   - **We advise NOT using this method**, as your notebook's code will overrule the automation configurations

2. Set as an environment variable
   - This is how several automated processes operate, but this is not intended for most users

3. Set in the .env file (in the project directory)
   - **This is the recommended place for you to set system-specific things!**
   - This file is not part of the GitHub repository you cloned; instead, you or a setup script must create it (usually by copying the DefaultDotEnv file to .env, e.g., `cp ./DefaultDotEnv ./.env`)

4. Hard-coded defaults, which will work with the provided neo4j container setup but probably not much else:

| Variable | Value | Description |
|---|---|---|
| NEO4J_CONNECTION | 'neo4j://localhost:7687' | URL for bolt protocol on default port of localhost only |
| NEO4J_USERNAME | neo4jUser | user and password are not used for default localhost only container |
| NEO4J_PASSWORD | neo4jPassword | user and password are not used for default localhost only container |
| NEO4J_DATABASE | neo4j | community edition of neo4j ONLY allows one database |
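
The lookup order described above can be sketched as a small resolver. This is an illustration under stated assumptions, not the actual KnowledgeGraph code: `resolve_setting` is a hypothetical function, and the environment and .env contents are passed in as plain dictionaries so the example runs without a database or a .env file.

```python
import os

# Hard-coded fallbacks, copied from the table above.
DEFAULTS = {
    "NEO4J_CONNECTION": "neo4j://localhost:7687",
    "NEO4J_USERNAME": "neo4jUser",
    "NEO4J_PASSWORD": "neo4jPassword",
    "NEO4J_DATABASE": "neo4j",
}

def resolve_setting(name, explicit=None, environ=None, dotenv_values=None):
    """Hypothetical resolver: constructor argument, then environment, then .env, then default."""
    environ = os.environ if environ is None else environ
    dotenv_values = {} if dotenv_values is None else dotenv_values
    if explicit is not None:       # 1. value passed to the constructor
        return explicit
    if name in environ:            # 2. environment variable
        return environ[name]
    if name in dotenv_values:      # 3. entry in the .env file
        return dotenv_values[name]
    return DEFAULTS[name]          # 4. hard-coded default

# Stand-ins for the real environment and a parsed .env file.
fake_env = {"NEO4J_DATABASE": "from_environment"}
fake_dotenv = {"NEO4J_DATABASE": "from_dotenv", "NEO4J_USERNAME": "from_dotenv"}

print(resolve_setting("NEO4J_DATABASE", explicit="from_constructor",
                      environ=fake_env, dotenv_values=fake_dotenv))
print(resolve_setting("NEO4J_DATABASE", environ=fake_env, dotenv_values=fake_dotenv))
print(resolve_setting("NEO4J_USERNAME", environ=fake_env, dotenv_values=fake_dotenv))
print(resolve_setting("NEO4J_PASSWORD", environ=fake_env, dotenv_values=fake_dotenv))
```

The four calls print `from_constructor`, `from_environment`, `from_dotenv`, and `neo4jPassword`, one per level of the list.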

In [None]:
from opentldr import KnowledgeGraph
kg=KnowledgeGraph()

## Initialize the Graph Database
> **WARNING: this will erase data in the current KG and cannot be undone.**
> Please be sure that this is what you want to do before running this cell. You can turn off this behavior in the parameters block by setting the 'clean_policy' variable to "keep".

The kg.delete_all() method is equivalent to running the Cypher command "MATCH(x) DETACH DELETE x", which matches every node and then deletes it along with any connected edges.
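
Before running the cleanup against a real database, it can help to see which deletions each `clean_policy` value triggers. The sketch below mirrors the dispatch logic with a hypothetical `RecordingKG` stand-in that only records method names; it never touches Neo4j. It uses `if`/`elif` so it also runs on Python versions without `match`.

```python
class RecordingKG:
    """Hypothetical stand-in for KnowledgeGraph: records calls, deletes nothing."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any delete_* method is recorded by name instead of being executed.
        if name.startswith("delete_"):
            return lambda: self.calls.append(name)
        raise AttributeError(name)

def dry_run(clean_policy):
    """Return the names of the delete methods the given policy would call."""
    kg = RecordingKG()
    if clean_policy == "live":
        kg.delete_all_evalkeys()
        kg.delete_all_recommendations()
        kg.delete_all_summaries()
        kg.delete_all_content()
    elif clean_policy == "keep":
        pass  # leave the KG untouched
    else:
        kg.delete_all()  # "fresh" or any unrecognized value
    return kg.calls

for policy in ("live", "keep", "fresh", "Keep"):
    print(policy, "->", dry_run(policy))
```

Note the last case: a misspelled or differently capitalized value such as `"Keep"` falls through to the default branch and wipes everything, so the policy string must match exactly.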

In [None]:
# Note: match/case requires Python 3.10 or newer.
match clean_policy:
    case "live":
        print("Cleaning KG of previous non-persistent content.")
        # does not delete Requests, Users, Sources, TLDRs, or Reference Knowledge
        kg.delete_all_evalkeys()
        kg.delete_all_recommendations()
        kg.delete_all_summaries()
        kg.delete_all_content()
    case "keep":
        print("Not cleaning KG, keeping whatever is there.")
        pass
    case _:
        print("Cleaning KG for a fresh start.")
        # removes everything..
        kg.delete_all()


## Loading the Reference Data
The following cell uses the OpenTLDR Content Repo to load Reference Data.

> **Reference Data** enables the definition of knowledge that is known but may not be explicitly called out within any specific article. Within a particular domain there is often "common knowledge" that is simply assumed. By formalizing this knowledge in a graph, it can be automatically reasoned about in connection with other information.
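
To make "reasoned about" concrete, here is a small in-memory sketch. The edges are invented for illustration and are not taken from the sample reference data, and the traversal is a plain breadth-first search rather than anything OpenTLDR-specific. An article that mentions only "Paris" can still be connected to "European Union" through the reference edges.

```python
from collections import deque

# Illustrative reference edges (source, relation, target); not from the sample data.
reference_edges = [
    ("Paris", "located_in", "France"),
    ("France", "member_of", "European Union"),
    ("Berlin", "located_in", "Germany"),
]

def related_concepts(start, edges):
    """Breadth-first walk over outgoing edges; returns every node reachable from start."""
    outgoing = {}
    for source, _relation, target in edges:
        outgoing.setdefault(source, []).append(target)

    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for neighbor in outgoing.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

print(related_concepts("Paris", reference_edges))  # ['France', 'European Union']
```

In the real KG the same kind of question would be asked with a Cypher path query over ReferenceNode and ReferenceEdge elements.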

In [None]:
from opentldr import DataRepo

if ref_date_repo_config is not None:
    repo = DataRepo(kg, ref_date_repo_config)
    list_of_uids =  repo.importData()
    print("Loaded {count} reference nodes and edges from the repository.".format(count=len(list_of_uids)))

## Verify what ReferenceData has been loaded into the KG
This simply pulls all the ReferenceNodes from the KG and prints them.

In [None]:
# Makes a cypher query to the KG
# all_reference_nodes= kg.cypher_query_to_list("MATCH (x:ReferenceNode) RETURN x","x")
all_reference_nodes= kg.get_all_reference_nodes()

print("Found {count} reference nodes in the knowledge graph:".format(count=len(all_reference_nodes)))

# Iterate through the Reference Nodes and print info for each
for reference_node in all_reference_nodes:
    print(" - {type}({uid}):\t{text}".format(
        type=reference_node.type, uid=reference_node.uid, text=reference_node.text))

## Close down the remote connections to the database

In [None]:
kg.close()