# Decision Tree Data
This notebook explores data taken from an Access Database and put into a graph format.  The data itself represents a series of questions along with possible answers.  There is additional data tying these questions and answers together to construct a tree-like structure.

<div align="left">
    <img src="images/example_model.png" alt="Graph Model" width="700px" align="center"/>
</div>

The data within this notebook includes a sample decision tree revolving around choosing a drink at Starbucks.  The questions and answers take us through a series of choices, eventually landing on a drink order.

## Let's Begin
If you haven't already, please set up your computer by following instructions in the *README.md* file.  To start, we need to set up some initial variables to ensure we can connect to our graph database.  We will be able to use these variables later in our notebook.

In [1]:
import os
from neo4j.v1 import GraphDatabase

# Connect to our Graph database, ensure connectivity, and store connection in variable.
graph = GraphDatabase.driver("bolt://localhost", auth=("neo4j", "123changeme"))

# Set up a local path reference
rel_path = os.getcwd()

# If you want to clear your database and start fresh, uncomment the line below.
# BE SURE TO CHECK WHAT DATABASE YOU ARE RUNNING THIS AGAINST.
# MORE CAPITAL LETTERS TO EMPHASIZE THE POINT ABOVE.
# with graph.session() as session: print(session.run("MATCH (d) DETACH DELETE (d)").value())
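
The connection cell above hard-codes the URI and credentials.  As a minimal sketch of one alternative, the settings could be read from environment variables, falling back to the values used in this notebook.  The variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` and the helper `load_connection_settings` are assumptions made for illustration; neither the driver nor this project defines them.

```python
import os


def load_connection_settings(env=None):
    """Return (uri, (user, password)), preferring environment variables.

    NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD are hypothetical names chosen
    for this sketch; the defaults mirror the values used in this notebook.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI", "bolt://localhost")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD", "123changeme")
    return uri, (user, password)


uri, auth = load_connection_settings()
print(uri, auth[0])  # the password is deliberately not printed
```

The result could then be passed along as `GraphDatabase.driver(uri, auth=auth)`.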

## Data CSV Files

Here we are setting up variables pointing to the CSV files we have stored on our machine.

In [29]:
# These files contain all of the potential questions, answers, and connections between nodes
questions_file = os.path.join(rel_path, "questions.csv")
answers_file = os.path.join(rel_path, "answers.csv")
relationships_file = os.path.join(rel_path, "relationships.csv")

Let's see what the structure of the data looks like by using the `LOAD CSV` command in Neo4j. We will load each file and show its first five rows as an example.

**NOTE**: If you get an error running the command below, try commenting out the `dbms.directories.import` line in your graph database's configuration file.  You can access the settings by clicking _Manage_ on your database in Neo4j and selecting the settings tab.  Make sure to uncomment the line when you are done: while it is commented out, `LOAD CSV` is not restricted to the import directory, which is not secure.
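
The `LOAD CSV` cell that follows this note builds each `file:` URL by concatenating `"file:"` with an absolute path.  As a hedged alternative, Python's standard `pathlib` can produce a fully qualified `file://` URL and percent-encode characters such as spaces.  The helper name `to_file_url` is hypothetical, and whether your Neo4j version and import settings accept the resulting URL is something to verify locally.

```python
from pathlib import Path


def to_file_url(path):
    """Convert a filesystem path into an absolute file:// URL."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()


url = to_file_url("questions.csv")
print(url)
```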

In [30]:
csv_query = """LOAD CSV FROM $file AS row RETURN row LIMIT 5 """
with graph.session() as session:
    print("Example questions:")
    display(session.run(csv_query, { "file": "file:" + questions_file }).value())
    print("Example answers:")
    display(session.run(csv_query, { "file": "file:" + answers_file }).value())
    print("Example relationships:")
    display(session.run(csv_query, { "file": "file:" + relationships_file }).value())

Example questions:


[['1', 'Is Holiday'],
 ['2', 'Is Frappucino'],
 ['3', 'Frappucino Flavor'],
 ['4', 'Size'],
 ['5', 'Is Espresso Based']]

Example answers:


[['1', 'Yes'],
 ['2', 'No'],
 ['3', 'Yes'],
 ['4', 'No'],
 ['5', 'Strawberries & Crème']]

Example relationships:


[['1', '1', 'true'],
 ['1', '2', 'true'],
 ['2', '3', 'true'],
 ['2', '4', 'true'],
 ['3', '5', 'true']]

The question and answer files each contain a simple list of IDs and values.  Each row of the relationships file contains the ID of a start node and the ID of an end node.  The third column indicates whether the start node is a question: when it is `true`, the row links a question to an answer; otherwise, it links an answer to a question.
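
To make the role of the third column concrete, the sketch below turns the five sample relationship rows shown above into labelled edges.  It assumes the direction rule and relationship names used by the import code later in this notebook (`IS_CLASSIFIED_BY` from a question to an answer, `RESULTS_IN` from an answer to a question).  The function name `describe_edge` is hypothetical.

```python
# Sample rows copied from the "Example relationships" output:
# [start_id, end_id, start_is_question]
sample_rows = [
    ["1", "1", "true"],
    ["1", "2", "true"],
    ["2", "3", "true"],
    ["2", "4", "true"],
    ["3", "5", "true"],
]


def describe_edge(row):
    """Return (start_label, start_id, relationship_type, end_label, end_id)."""
    start_id, end_id, start_is_question = row
    if start_is_question.strip().lower() == "true":
        return ("Question", int(start_id), "IS_CLASSIFIED_BY", "Answer", int(end_id))
    return ("Answer", int(start_id), "RESULTS_IN", "Question", int(end_id))


for row in sample_rows:
    s_label, s_id, rel, e_label, e_id = describe_edge(row)
    print(f"(:{s_label} {{id: {s_id}}})-[:{rel}]->(:{e_label} {{id: {e_id}}})")
```

All five sample rows have `true` in the third column, so each one reads as a question pointing at one of its possible answers.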

## Importing Data

Now we need to take the data inside our CSV files and connect it in a graph database.  The following code will run through all files and create the nodes and relationships.  We will rely on Python to open each CSV file and loop through every row in it.  Note that there are various ways of doing this, and this row-by-row method should not be used in a production environment that demands performance.  For bulk imports, try the [`LOAD CSV`](https://neo4j.com/blog/bulk-data-import-neo4j-3-0/) command instead.

Let this block run until you see 'Data Loaded!'.  It should only take a few seconds due to the low volume of data.  You can run this as many times as you want.  The generated queries use `MERGE`, which creates a node only when no existing node matches the given properties.

In [27]:
import pandas as pd
from string import Template

# Read each CSV without a header row, so columns are addressed by position.
q_csv = pd.read_csv(questions_file, header=None)
a_csv = pd.read_csv(answers_file, header=None)
r_csv = pd.read_csv(relationships_file, header=None)

# Query templates; string.Template fills in the $placeholders before a query is sent.
q_template = 'MERGE (n:Question { id: $id, value: "$val" }) '
a_template = 'MERGE (n:Answer { id: $id, value: "$val" }) '
q_to_a_template = 'MATCH (q:Question { id: $from_id }) MATCH (a:Answer { id: $to_id }) MERGE (q)-[:IS_CLASSIFIED_BY]->(a) '
a_to_q_template = 'MATCH (q:Question { id: $to_id }) MATCH (a:Answer { id: $from_id }) MERGE (a)-[:RESULTS_IN]->(q) '
create_queries = []
relate_queries = []

for row in q_csv.values:
    create_queries.append(Template(q_template).substitute(id=row[0], val=row[1]))

for row in a_csv.values:
    create_queries.append(Template(a_template).substitute(id=row[0], val=row[1]))

for row in r_csv.values:
    # The third column is truthy when the start node is a question.
    # This relies on pandas reading the true/false column as booleans.
    if row[2]:
        relate_queries.append(Template(q_to_a_template).substitute(from_id=row[0], to_id=row[1]))
    else:
        relate_queries.append(Template(a_to_q_template).substitute(from_id=row[0], to_id=row[1]))

print('Queries created... Running.')

# Create the indexes and uniqueness constraints before loading the data.
with graph.session() as session:
    display(session.run("CREATE INDEX ON :Question(id,value)").summary().counters)
    display(session.run("CREATE INDEX ON :Answer(id,value)").summary().counters)
    display(session.run("CREATE CONSTRAINT ON (q:Question) ASSERT q.id IS UNIQUE").summary().counters)
    display(session.run("CREATE CONSTRAINT ON (a:Answer) ASSERT a.id IS UNIQUE").summary().counters)

def run_queries(queries):
    """Run each query in a single session and display its update counters."""
    with graph.session() as session:
        for query in queries:
            display(session.run(query).summary().counters)

run_queries(create_queries)
run_queries(relate_queries)

print('Data Loaded!')


Queries created... Running.


{}

{}

{}

{}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'labels_added': 1, 'nodes_created': 1, 'properties_set': 2}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

{'relationships_created': 1}

Data Loaded!
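
The import cell above splices each value into the query text with `string.Template`, which breaks if a value contains a double quote.  A hedged alternative is to keep the Cypher text fixed and pass the values as query parameters, the same mechanism the `LOAD CSV` cell uses for `$file`.  The sketch below only builds `(query, parameters)` pairs and does not touch the database.  `build_node_statements` is a hypothetical helper, and the quoted answer value is made up to show the problem case.

```python
# Fixed query text; $id and $value are Cypher parameters, not Template placeholders.
QUESTION_QUERY = "MERGE (n:Question { id: $id, value: $value })"
ANSWER_QUERY = "MERGE (n:Answer { id: $id, value: $value })"


def build_node_statements(query, rows):
    """Pair one fixed Cypher query with a parameter dict per CSV row."""
    return [(query, {"id": int(row[0]), "value": str(row[1])}) for row in rows]


# Rows taken from the example output earlier in this notebook...
sample_questions = [[1, "Is Holiday"], [2, "Is Frappucino"]]
# ...plus one invented answer containing quotes, for illustration only.
sample_answers = [[5, "Strawberries & Crème"], [6, 'A "quoted" value']]

statements = (
    build_node_statements(QUESTION_QUERY, sample_questions)
    + build_node_statements(ANSWER_QUERY, sample_answers)
)

for query, params in statements:
    print(query, params)
```

Inside a session, each pair could then be executed with `session.run(query, params)`, the same call form used for `csv_query` earlier.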


### Decision Tree Loaded

We should now have our entire dataset loaded into our graph database.  Let's run a quick query to check our import.  Access your Neo4j browser by selecting 'Manage' on your database and clicking the 'Open Browser' button.

Try running the following query: `MATCH (n) RETURN n`.  This query gives us everything in the database, nodes and relationships included.

You should get output resembling the following:

<div align="left">
    <img src="images/data_loaded.png" alt="Graph Model" width="700px" align="center"/>
</div>
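
A similar check can be approximated from the notebook without opening the browser.  The sketch below counts nodes per label.  `count_nodes_by_label` is a hypothetical helper that expects a session like the ones opened from `graph` earlier, and it assumes each node carries a single label, as the import above creates them.  A small stand-in session with made-up counts is used here so the example runs without a database.

```python
COUNT_QUERY = "MATCH (n) RETURN labels(n)[0] AS label, count(*) AS total"


def count_nodes_by_label(session):
    """Run COUNT_QUERY and return a {label: total} dictionary."""
    return {record["label"]: record["total"] for record in session.run(COUNT_QUERY)}


class FakeSession:
    """Stand-in for a driver session, used only so this sketch runs offline."""

    def run(self, query):
        # Made-up counts for illustration; a real session returns driver records.
        return [{"label": "Question", "total": 2}, {"label": "Answer", "total": 3}]


print(count_nodes_by_label(FakeSession()))
```

With a live database, the helper would be called inside `with graph.session() as session:` in place of the stand-in.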

#### What's Next?

Now that we have our base decision tree imported, we can start asking the data questions.

[Go to the next module >>](2%20-%20Exploring%20the%20Decision%20Tree.ipynb)