# 1. Get To Know Your Graph

In this notebook, you'll connect to your Neo4j database, establish a driver, and perform basic exploratory data analysis (EDA) to understand the structure and content of your knowledge graph.

---

## Table of Contents
1. Connect to Neo4j
2. Basic Graph Statistics
3. EDA on Source Documents


## 1. Connect to Neo4j
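The connection cell reads `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` from environment variables. Set them in your shell, or load them from your `.env` file, before running it.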
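
`os.getenv` returns `None` for a variable that is not set, and that tends to surface later as a less obvious driver error. As a minimal sketch, the check below reports which connection variables are missing. The helper name `missing_env_vars` is hypothetical and not part of this notebook; the variable names are the ones the connection cell reads.

```python
import os

# Names taken from the connection cell in this notebook
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Neo4j connection variables are set.")
```

Passing `environ` explicitly is only there to make the helper easy to try with a plain dictionary; by default it reads `os.environ`.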


In [None]:
%pip install -r requirements.txt

In [None]:
from neo4j import GraphDatabase
import os

NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USER = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

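driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
# Creating the driver does not open a connection, so check it before reporting success
driver.verify_connectivity()


def run_query(query, parameters=None):
    """Run a Cypher query and return all records as a list."""
    with driver.session() as session:
        return list(session.run(query, parameters or {}))


print('Connected to Neo4j!')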

Connected to Neo4j!
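
`run_query` accepts a `parameters` dictionary, which lets you pass values into Cypher as `$name` placeholders instead of formatting them into the query string. The sketch below shows one way to use it. `count_nodes_with_label` is a hypothetical helper, not part of this notebook, and it takes the query function as an argument so it can be tried with a stub when no database is available.

```python
def count_nodes_with_label(run_query, label):
    """Count nodes carrying `label`, passing the label as a query parameter."""
    # Labels cannot be parameterized in the MATCH pattern itself,
    # so the label is compared against labels(n) instead.
    query = "MATCH (n) WHERE $label IN labels(n) RETURN count(n) AS n"
    records = run_query(query, {"label": label})
    return records[0]["n"]


# In this notebook (requires a live connection):
# count_nodes_with_label(run_query, "Document")
```

This form checks every node, which is fine for exploration on a graph of this size but is not meant as a fast lookup.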


## 2. Basic Graph Statistics
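Let's get a sense of the structure of your graph: the total number of nodes and relationships, and how many node labels and relationship types it contains. The cell below calls `apoc.meta.stats()`, so the APOC plugin must be available on your database.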


In [25]:
# Call apoc.meta.stats() to get a summary of the graph
with driver.session() as session:
    result = session.run("CALL apoc.meta.stats()").single()
    apoc_stats = dict(result)

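# View selected fields
print("Total node count:", apoc_stats['nodeCount'])
print("Total relationship count:", apoc_stats['relCount'])
print("Label count:", len(apoc_stats['labels']))
# Note: 'relTypes' is keyed by relationship pattern (e.g. '()-[:TYPE]->()' and
# '(:Label)-[:TYPE]->()'), so its length can exceed the number of distinct types;
# apoc_stats['relTypeCount'] reports the number of distinct types directly.
print("Relationship type count:", len(apoc_stats['relTypes']))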

Total node count: 17785
Total relationship count: 774608
Label count: 7
Relationship type count: 24
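
The totals above do not show how nodes and relationships are distributed. In APOC's documented output, `labels` maps each label to its node count and `relTypesCount` maps each relationship type to its count, so a per-label and per-type breakdown can be sketched as follows. `top_counts` and `print_breakdown` are hypothetical helpers, and the sample dictionary holds made-up values for illustration only.

```python
def top_counts(counts, limit=10):
    """Return (name, count) pairs sorted by descending count, then by name."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]


def print_breakdown(stats, limit=10):
    """Print per-label and per-type counts from an apoc.meta.stats() result."""
    print("Nodes per label:")
    for name, count in top_counts(stats.get("labels", {}), limit):
        print(f"  {name}: {count}")
    print("Relationships per type:")
    for name, count in top_counts(stats.get("relTypesCount", {}), limit):
        print(f"  {name}: {count}")


# Made-up sample values, for illustration only (not taken from this graph)
sample_stats = {
    "labels": {"Document": 3, "Chunk": 10},
    "relTypesCount": {"PART_OF": 10, "NEXT": 7},
}
print_breakdown(sample_stats)
```

In the notebook, `print_breakdown(apoc_stats)` would print the same breakdown for your graph.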


## 3. EDA on Source Documents
Explore basic properties of your source documents, such as text length.


In [27]:
# Preview a few documents
for record in run_query('MATCH (d:Document) RETURN d.text AS text, d.title AS title LIMIT 3'):
    # Guard against a missing text property before slicing
    text = record['text'] or ''
    print(f"Title: {record['title']}\nText (first 200 chars): {text[:200]}")

# Text length stats (documents without a text property are skipped)
lengths = [record['length'] for record in run_query('MATCH (d:Document) WHERE d.text IS NOT NULL RETURN size(d.text) AS length')]
if lengths:
    import numpy as np
    print(f'Mean length: {np.mean(lengths):.1f}')
    print(f'Max length: {np.max(lengths)}')
    print(f'Min length: {np.min(lengths)}')
else:
    print('No Document nodes with text found.')


Title: None
Text (first 200 chars): For numerical values passed in as parameters, Cypher does not take the size of the number into account. Cypher will therefore regard any exact numerical parameter as an INTEGER regardless of its decla
Title: None
Text (first 200 chars): Projecting graphs using Cypher (deprecated) This page describes the Legacy Cypher projection, which is deprecated. The replacement is to use the new Cypher projection, which is described in Projecting
Title: None
Text (first 200 chars): Using Legacy Cypher projections is a more flexible and expressive approach with diminished focus on performance compared to the native projections. Legacy Cypher projections are primarily recommended 
Mean length: 497.4
Max length: 2807
Min length: 1
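
A minimum length of 1 next to a mean of 497.4 suggests that some documents are nearly empty, which the mean, max, and min alone do not quantify. As a rough sketch, the hypothetical helper below adds the median, a nearest-rank 95th percentile, and a count of very short texts. The `short_threshold` default of 20 characters is an arbitrary assumption, not a value from this notebook.

```python
import statistics


def length_summary(lengths, short_threshold=20):
    """Summarize text lengths: median, 95th percentile, and count of short texts."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic
    rank = (95 * len(ordered) + 99) // 100
    return {
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "short": sum(1 for n in ordered if n < short_threshold),
    }


# In this notebook, after the cell above has defined `lengths`:
# print(length_summary(lengths))
```

A large `short` count would be a reason to inspect those documents before using them for retrieval.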


---
You now have a basic understanding of your graph's structure and your source documents.
You're ready for deeper analytics and retrieval!
