# Runway - Exploratory Data Analysis

This notebook demonstrates how to use Runway's EDA module. It requires an existing graph.

In [1]:
import os

from neo4j_runway.database.neo4j import Neo4jGraph
from neo4j_runway.graph_eda import GraphEDA

## Create a Neo4j Instance

In [2]:
g = Neo4jGraph(
    uri=os.environ.get("NEO4J_URI"),
    username=os.environ.get("NEO4J_USERNAME"),
    password=os.environ.get("NEO4J_PASSWORD"),
    database=os.environ.get("NEO4J_DATABASE"),
)
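
`os.environ.get` returns `None` for any variable that is not set, so a missing setting would only surface later as a connection problem. A minimal sketch of an up-front check is shown here; the helper name `require_env` is hypothetical and not part of Runway.

```python
import os


def require_env(names, environ=None):
    """Return the requested environment variables, or raise if any are missing.

    `require_env` is a hypothetical helper, not part of neo4j_runway.
    """
    environ = os.environ if environ is None else environ
    # Treat unset and empty variables the same way.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# Usage (commented out; depends on your local environment):
# settings = require_env(
#     ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "NEO4J_DATABASE"]
# )
```

Failing early with the list of missing names tends to be easier to debug than a driver error raised deep inside the connection attempt.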

## GraphEDA 

In [3]:
eda = GraphEDA(g)

We can run analytical queries individually via the `GraphEDA` class. For example, let's retrieve information on the database constraints.

In [4]:
eda.database_constraints()

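|    |   id | name        | type       | entityType   | labelsOrTypes   | properties   | ownedIndex   | propertyType   |
|---:|-----:|:------------|:-----------|:-------------|:----------------|:-------------|:-------------|:---------------|
|  0 |   12 | person_name | UNIQUENESS | NODE         | [Person]        | [name]       | person_name  |                |
|  1 |   14 | toy_name    | UNIQUENESS | NODE         | [Toy]           | [name]       | toy_name     |                |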


When we run a querying method, the results are appended to an internal cache. By default, later calls return the cached content, but we can refresh the cache by passing `refresh=True`.
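
The cache-then-refresh behavior described above can be illustrated with a small stand-alone sketch. The class `CachedQueries` and the `fake_fetch` function are hypothetical and only mimic the pattern; this is not Runway's actual implementation.

```python
class CachedQueries:
    """Hypothetical illustration of a result cache with a refresh flag."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that runs the (expensive) query
        self._cache = {}     # query name -> stored result

    def get(self, name, refresh=False):
        # Run the query only on first use or when a refresh is requested.
        if refresh or name not in self._cache:
            self._cache[name] = self._fetch(name)
        return self._cache[name]


calls = []


def fake_fetch(name):
    # Stand-in for a database round trip; records each invocation.
    calls.append(name)
    return f"{name} result #{len(calls)}"


queries = CachedQueries(fake_fetch)
first = queries.get("database_constraints")                # runs the query
second = queries.get("database_constraints")               # served from the cache
third = queries.get("database_constraints", refresh=True)  # runs the query again
```

With the `eda` object from this notebook, the corresponding call would presumably be `eda.database_constraints(refresh=True)`.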

### Collecting Insights

We can run *all* the analytical queries in the `GraphEDA` class by calling the `run` method.

**This can be computationally intensive!**

WARNING: The methods in this module can be computationally expensive.
It is not recommended to use this module on massive Neo4j databases
(i.e., nodes and relationships in the hundreds of millions)

In [5]:
eda.run()

{'database_indexes': [{'id': 1,
   'name': 'index_343aff4e',
   'state': 'ONLINE',
   'populationPercent': 100.0,
   'type': 'LOOKUP',
   'entityType': 'NODE',
   'labelsOrTypes': None,
   'properties': None,
   'indexProvider': 'token-lookup-1.0',
   'owningConstraint': None,
   'lastRead': neo4j.time.DateTime(2024, 10, 25, 18, 59, 24, 339000000, tzinfo=<UTC>),
   'readCount': 3947},
  {'id': 2,
   'name': 'index_f7700477',
   'state': 'ONLINE',
   'populationPercent': 100.0,
   'type': 'LOOKUP',
   'entityType': 'RELATIONSHIP',
   'labelsOrTypes': None,
   'properties': None,
   'indexProvider': 'token-lookup-1.0',
   'owningConstraint': None,
   'lastRead': None,
   'readCount': 0},
  {'id': 11,
   'name': 'person_name',
   'state': 'ONLINE',
   'populationPercent': 100.0,
   'type': 'RANGE',
   'entityType': 'NODE',
   'labelsOrTypes': ['Person'],
   'properties': ['name'],
   'indexProvider': 'range-1.0',
   'owningConstraint': 'person_name',
   'lastRead': neo4j.time.DateTime(202
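
The dictionary returned by `run` can be long, so it may help to summarize it before printing. The sketch below assumes the return value maps query names to lists of records, as the truncated output above suggests. `summarize_results` is a hypothetical helper, and the sample dictionary only reuses a few values from that output.

```python
def summarize_results(results):
    """Map each query name to the number of records it returned.

    Hypothetical helper; entries whose value is not a list are skipped.
    """
    return {
        name: len(value)
        for name, value in results.items()
        if isinstance(value, list)
    }


# Small stand-in for the dictionary returned by eda.run(), trimmed to two fields.
sample = {
    "database_indexes": [
        {"name": "index_343aff4e", "type": "LOOKUP"},
        {"name": "index_f7700477", "type": "LOOKUP"},
        {"name": "person_name", "type": "RANGE"},
    ]
}
summary = summarize_results(sample)
```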

Now that we have our cache filled, let's see if there are any isolated nodes in the database.

In [6]:
eda.disconnected_node_ids()

|    | nodeLabel   |   nodeId |
|---:|:------------|---------:|
|  0 | Test        |       40 |


We see that we have a single isolated `Test` node. We can find this node in the database by its `nodeId`.
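
One way to look up the node is a parameterized Cypher query sent through the official `neo4j` Python driver. This is a sketch under assumptions: it presumes that `nodeId` is the internal id returned by Cypher's `id()` function (deprecated in Neo4j 5 in favor of `elementId()`), and that a recent 5.x driver providing `execute_query` is installed. The helper names `build_node_lookup` and `fetch_node` are hypothetical.

```python
import os


def build_node_lookup(node_id):
    """Return a parameterized Cypher query and its parameters."""
    # Assumption: nodeId corresponds to the internal id exposed by id(n).
    query = (
        "MATCH (n) WHERE id(n) = $node_id "
        "RETURN labels(n) AS labels, properties(n) AS props"
    )
    return query, {"node_id": int(node_id)}


def fetch_node(node_id):
    """Run the lookup against the database configured in the environment."""
    from neo4j import GraphDatabase  # imported lazily; needs the neo4j driver

    query, params = build_node_lookup(node_id)
    auth = (os.environ.get("NEO4J_USERNAME"), os.environ.get("NEO4J_PASSWORD"))
    with GraphDatabase.driver(os.environ.get("NEO4J_URI"), auth=auth) as driver:
        records, _, _ = driver.execute_query(
            query,
            parameters_=params,
            database_=os.environ.get("NEO4J_DATABASE"),
        )
    return [record.data() for record in records]


# Usage (commented out; needs a live database):
# fetch_node(40)
```

Passing the id as a query parameter rather than formatting it into the string keeps the query plan cacheable and avoids injection issues.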

## Reports

We can generate a report containing all the information we've gathered from our queries by calling `create_eda_report`. 

Some of the sections can become quite lengthy, so there are arguments to control the data that is returned.

In [7]:
eda.create_eda_report(
    include_disconnected_node_ids=True,
    include_unlabeled_node_ids=True,
    include_node_degrees=True,
    view_report=False,
)

"\n# Runway EDA Report\n\n## Database Information\n|    | databaseName   | databaseVersion   | databaseEdition   | APOCVersion   | GDSVersion    |\n|---:|:---------------|:------------------|:------------------|:--------------|:--------------|\n|  0 | neo4j          | 5.15.0            | enterprise        | 5.15.1        | not installed |\n\n### Counts\n|    |   nodeCount |   unlabeledNodeCount |   disconnectedNodeCount |   relationshipCount |\n|---:|------------:|---------------------:|------------------------:|--------------------:|\n|  0 |          20 |                    0 |                       1 |                  24 |\n\n### Indexes\n|    |   id | name           | state   |   populationPercent | type   | entityType   | labelsOrTypes   | properties   | indexProvider    | owningConstraint   | lastRead                            | readCount   |\n|---:|-----:|:---------------|:--------|--------------------:|:-------|:-------------|:----------------|:-------------|:-----------------

In [8]:
eda.view_report()


# Runway EDA Report

## Database Information
|    | databaseName   | databaseVersion   | databaseEdition   | APOCVersion   | GDSVersion    |
|---:|:---------------|:------------------|:------------------|:--------------|:--------------|
|  0 | neo4j          | 5.15.0            | enterprise        | 5.15.1        | not installed |

### Counts
|    |   nodeCount |   unlabeledNodeCount |   disconnectedNodeCount |   relationshipCount |
|---:|------------:|---------------------:|------------------------:|--------------------:|
|  0 |          20 |                    0 |                       1 |                  24 |

### Indexes
|    |   id | name           | state   |   populationPercent | type   | entityType   | labelsOrTypes   | properties   | indexProvider    | owningConstraint   | lastRead                            | readCount   |
|---:|-----:|:---------------|:--------|--------------------:|:-------|:-------------|:----------------|:-------------|:-----------------|:-------------------|:------------------------------------|:------------|
|  0 |    1 | index_343aff4e | ONLINE  |                 100 | LOOKUP | NODE         |                 |              | token-lookup-1.0 |                    | 2024-10-25T18:59:24.339000000+00:00 | 3,947       |
|  1 |    2 | index_f7700477 | ONLINE  |                 100 | LOOKUP | RELATIONSHIP |                 |              | token-lookup-1.0 |                    |                                     | 0           |
|  2 |   11 | person_name    | ONLINE  |                 100 | RANGE  | NODE         | ['Person']      | ['name']     | range-1.0        | person_name        | 2024-10-25T18:59:24.365000000+00:00 | 55          |
|  3 |   13 | toy_name       | ONLINE  |                 100 | RANGE  | NODE         | ['Toy']         | ['name']     | range-1.0        | toy_name           | 2024-10-25T18:59:24.338000000+00:00 | 28          |

### Constraints
|    |   id | name        | type       | entityType   | labelsOrTypes   | properties   | ownedIndex   | propertyType   |
|---:|-----:|:------------|:-----------|:-------------|:----------------|:-------------|:-------------|:---------------|
|  0 |   12 | person_name | UNIQUENESS | NODE         | ['Person']      | ['name']     | person_name  |                |
|  1 |   14 | toy_name    | UNIQUENESS | NODE         | ['Toy']         | ['name']     | toy_name     |                |

## Nodes Overview
### Label Counts
|    | label   |   count |
|---:|:--------|--------:|
|  0 | Person  |       5 |
|  1 | Pet     |       5 |
|  2 | Toy     |       5 |
|  3 | Address |       4 |
|  4 | Test    |       1 |
### Multi-Label Counts
|    | labelCombinations   |   nodeCount |
|---:|:--------------------|------------:|
|  0 | ['Test', 'Label2']  |           1 |
### Properties
|    | nodeLabels         | propertyName   | propertyTypes   | mandatory   |
|---:|:-------------------|:---------------|:----------------|:------------|
|  0 | ['Label2', 'Test'] | id             | ['String']      | True        |
|  1 | ['Person']         | name           | ['String']      | True        |
|  2 | ['Person']         | age            | ['Long']        | True        |
|  3 | ['Address']        | street         | ['String']      | True        |
|  4 | ['Address']        | city           | ['String']      | True        |
|  5 | ['Pet']            | name           | ['String']      | True        |
|  6 | ['Pet']            | kind           | ['String']      | True        |
|  7 | ['Toy']            | name           | ['String']      | True        |
|  8 | ['Toy']            | kind           | ['String']      | True        |


## Relationships Overview
### Type Counts
|    | relType     |   count |
|---:|:------------|--------:|
|  0 | KNOWS       |       9 |
|  1 | HAS_ADDRESS |       5 |
|  2 | HAS_PET     |       5 |
|  3 | PLAYS_WITH  |       5 |
### Properties
no relationship properties


## Unlabeled Nodes
no unlabeled nodes data in cache
## Disconnected Nodes
|    | nodeLabel   |   nodeId |
|---:|:------------|---------:|
|  0 | Test        |       40 |
## Node Degrees
* Top 5 Ordered By outDegree

|    |   nodeId | nodeLabel   |   inDegree |   outDegree |
|---:|---------:|:------------|-----------:|------------:|
|  0 |       41 | ['Person']  |          3 |           4 |
|  2 |       43 | ['Person']  |          3 |           4 |
|  3 |       45 | ['Person']  |          0 |           4 |
|  1 |       42 | ['Person']  |          3 |           4 |
|  4 |       44 | ['Person']  |          0 |           3 |
---

Runway v0.13.0

Report Generated @ 2024-10-29 13:34:19.353120


We can also save the report to a Markdown file.

In [9]:
eda.save_report("outputs/pets_runway_report.md")
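
If the `outputs` directory does not exist yet, the save may fail. Whether `save_report` creates missing directories is not confirmed here, so the sketch below creates the parent directory first. The helper `ensure_parent_dir` is hypothetical.

```python
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the parent directory of file_path if needed; return the path as a string."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)


# Usage with the notebook's `eda` object (commented out; needs a live database):
# eda.save_report(ensure_parent_dir("outputs/pets_runway_report.md"))
```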