<a href="https://colab.research.google.com/github/Vasundhara-Shukla/neo4j/blob/main/Neo4j.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# From Unstructured Text to Knowledge Graph in 5 Minutes

Most LLMs return text.  
But applications need structure.

In this notebook, we’ll:
- Extract entities & relationships from raw text
- Insert them into Neo4j
- Query them like a real knowledge graph

Let’s turn text into connected intelligence.


In [None]:
!pip install neo4j

from neo4j import GraphDatabase




## Using Neo4j AuraDB



In [None]:
# Import the Colab userdata module to access secrets
from google.colab import userdata

# Retrieve credentials from Colab secrets
URI = userdata.get('NEO4J_URI')
USERNAME = userdata.get('NEO4J_USERNAME')
PASSWORD = userdata.get('NEO4J_PASSWORD')

# Create the Neo4j driver and confirm that the credentials work
driver = GraphDatabase.driver(URI, auth=(USERNAME, PASSWORD))
driver.verify_connectivity()

print("Connected to Neo4j.")

Connected to Neo4j.


Imagine the text below came from:
- A news article
- A PDF
- An LLM extraction pipeline


In [None]:
text = """
Satya Nadella is the CEO of Microsoft.
Microsoft acquired LinkedIn in 2016.
LinkedIn is headquartered in Sunnyvale.
"""


In [None]:
triples = [
    ("Satya Nadella", "CEO_OF", "Microsoft"),
    ("Microsoft", "ACQUIRED", "LinkedIn"),
    ("LinkedIn", "HEADQUARTERED_IN", "Sunnyvale")
]


In production, triples could be extracted using:
- An LLM
- spaCy
- A structured extraction model

Here we mock them for clarity.
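As a rough illustration of what an extraction step could look like, here is a minimal rule-based sketch using only Python's `re` module. The `PATTERNS` table and the `extract_triples` helper are hypothetical names introduced for this example, and the patterns cover only the three sentence shapes in the sample text. A real pipeline would more likely rely on an LLM or an NLP library.

```python
import re

# Hypothetical pattern table: each regex maps one sentence shape to a relation type.
PATTERNS = [
    (re.compile(r"^(?P<s>.+?) is the CEO of (?P<o>.+?)\.$"), "CEO_OF"),
    (re.compile(r"^(?P<s>.+?) acquired (?P<o>.+?)(?: in \d{4})?\.$"), "ACQUIRED"),
    (re.compile(r"^(?P<s>.+?) is headquartered in (?P<o>.+?)\.$"), "HEADQUARTERED_IN"),
]

def extract_triples(text):
    """Return (subject, relation, object) tuples for lines matching a known pattern."""
    found = []
    for line in text.strip().splitlines():
        line = line.strip()
        for pattern, relation in PATTERNS:
            match = pattern.match(line)
            if match:
                found.append((match["s"], relation, match["o"]))
                break  # one relation per sentence in this sketch
    return found

sample_text = """
Satya Nadella is the CEO of Microsoft.
Microsoft acquired LinkedIn in 2016.
LinkedIn is headquartered in Sunnyvale.
"""

extracted = extract_triples(sample_text)
print(extracted)
```

On the sample text this yields the same three triples as the mocked list. Sentences that match no pattern are silently skipped, which is one reason hand-written rules rarely scale beyond a demo.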


In [None]:
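import re

def insert_triple(tx, subject, relation, obj):
    # The relationship type is interpolated into the query text rather than
    # passed as a $parameter, so validate it first to rule out Cypher injection.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", relation):
        raise ValueError(f"Invalid relationship type: {relation!r}")
    query = f"""
    MERGE (a:Entity {{name: $subject}})
    MERGE (b:Entity {{name: $object}})
    MERGE (a)-[:{relation}]->(b)
    """
    tx.run(query, subject=subject, object=obj)

# Each triple is written in its own managed write transaction.
with driver.session() as session:
    for s, r, o in triples:
        session.execute_write(insert_triple, s, r, o)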

If the connection or the insert fails, update the Colab secrets, re-run the connection cell, and then re-run the insert cell above.

Why `MERGE`?

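`MERGE` matches an existing node or relationship and creates it only when no
match exists, so re-running the insert cell does not duplicate data. This is
essential when building knowledge graphs from messy real-world data.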
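To make the duplication point concrete, here is a toy in-memory comparison of a CREATE-style write and a MERGE-style write. The helpers `create_triple` and `merge_triple` are hypothetical and only mimic the semantics. They say nothing about how Neo4j stores or matches data internally.

```python
# Toy in-memory model of CREATE-style versus MERGE-style writes.
# This only illustrates the semantics; it is not how Neo4j works internally.

def create_triple(edges, subject, relation, obj):
    """CREATE-style write: always appends, even if the edge already exists."""
    edges.append((subject, relation, obj))

def merge_triple(edges, subject, relation, obj):
    """MERGE-style write: appends only when no identical edge exists."""
    edge = (subject, relation, obj)
    if edge not in edges:
        edges.append(edge)

triple = ("Microsoft", "ACQUIRED", "LinkedIn")

created, merged = [], []
for _ in range(3):  # simulate re-running the insert cell three times
    create_triple(created, *triple)
    merge_triple(merged, *triple)

print(len(created))  # 3
print(len(merged))   # 1
```

The CREATE-style list grows on every run, while the MERGE-style list stays at one edge no matter how often the cell is executed.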


In [None]:
def query_graph(tx):
    result = tx.run("""
        MATCH (a)-[r]->(b)
        RETURN a.name AS subject, type(r) AS relation, b.name AS object
    """)
    return [record.data() for record in result]

with driver.session() as session:
    results = session.execute_read(query_graph)

results

[{'subject': 'Satya Nadella', 'relation': 'CEO_OF', 'object': 'Microsoft'},
 {'subject': 'Microsoft', 'relation': 'ACQUIRED', 'object': 'LinkedIn'},
 {'subject': 'LinkedIn',
  'relation': 'HEADQUARTERED_IN',
  'object': 'Sunnyvale'}]

## Why This Matters

We have just turned text into structured, queryable knowledge.

Unlike vector-only systems:
- Relationships are explicit
- Queries are deterministic
- Results are explainable
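
As one illustration of explicit relationships, a multi-hop question such as "where is the company acquired by this person's employer headquartered?" can be written as a single path pattern. The sketch below assumes the `Entity` label and the three relationship types used in this notebook. The function name `find_acquired_hq` is hypothetical.

```python
def find_acquired_hq(tx, person):
    """Follow CEO_OF -> ACQUIRED -> HEADQUARTERED_IN starting from one person."""
    query = """
    MATCH (p:Entity {name: $person})-[:CEO_OF]->(company:Entity)
          -[:ACQUIRED]->(acquired:Entity)
          -[:HEADQUARTERED_IN]->(city:Entity)
    RETURN company.name AS company, acquired.name AS acquired, city.name AS city
    """
    return [record.data() for record in tx.run(query, person=person)]

# Usage, assuming the `driver` created earlier in this notebook:
# with driver.session() as session:
#     rows = session.execute_read(find_acquired_hq, "Satya Nadella")
```

If the graph holds only the three triples inserted above, this should return a single row linking Microsoft, LinkedIn, and Sunnyvale. Every hop in that answer corresponds to a stored relationship, which is what makes the result explainable.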

This is the foundation of:
- Knowledge Graph-powered RAG
- Enterprise AI
- Fraud detection
- Supply chain analysis

Text becomes connected intelligence.
