# Single Agent over RAG + Knowledge Graph (Neo4j)

You will build a single agent with **two tools**:
1) `university_retriever` (RAG)
2) `neo4j_qa` (Knowledge Graph Q&A via NL→Cypher)

The agent selects the correct tool:
- fuzzy text questions → RAG tool
- relationship/path questions → KG tool

The notebook also includes a **tool-selection tracing** cell that shows which tool the agent called for each question, the input it sent, and the output it got back.
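
For intuition about the routing rule above, here is a minimal keyword baseline. It is not how the agent decides (the agent lets the LLM pick a tool from the tool descriptions and system prompt); `KG_CUES` and `guess_tool` are hypothetical names introduced only for this sketch.

```python
# Hypothetical cue words that suggest a relationship/path question.
# The real agent has no such list; the LLM chooses the tool.
KG_CUES = ("supervis", "member of", "affiliated", "instance of", "subclass", "path")


def guess_tool(question: str) -> str:
    """Return the tool name a naive keyword baseline would pick."""
    q = question.lower()
    if any(cue in q for cue in KG_CUES):
        return "neo4j_qa"
    return "university_retriever"


for q in ["Who supervises Alice?", "What does EnergyAIGroup focus on?"]:
    print(q, "->", guess_tool(q))
```

A baseline like this can be compared with the agent's actual choices in the tracing section to see where LLM-based routing differs from simple keyword matching.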


## 1) Install dependencies

In [1]:
!pip -q install -U langchain langchain-community langchain-openai langchain-neo4j neo4j faiss-cpu tiktoken


## 2) Configure OpenAI and Neo4j Aura credentials

In [4]:
import os, getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

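# Read connection details from the environment (variable names are a convention
# chosen here); avoid committing real credentials to a notebook.
NEO4J_URI = os.environ.get("NEO4J_URI", "neo4j+s://ead18442.databases.neo4j.io")
NEO4J_USER = os.environ.get("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.environ.get("NEO4J_PASSWORD") or getpass.getpass("Neo4j password: ")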


## 3) Connect to Neo4j and seed a tiny University KG (optional)
If you already have the University graph loaded, you can skip seeding.

In [5]:
from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USER, password=NEO4J_PASSWORD)

# Optional seed (idempotent). Comment out if you don't want writes.
graph.query("""
MERGE (person:Class {name:"Person"})
MERGE (student:Class {name:"Student"})
MERGE (phd:Class {name:"PhDStudent"})
MERGE (academic:Class {name:"Academic"})
MERGE (prof:Class {name:"Professor"})
MERGE (student)-[:SUBCLASS_OF]->(person)
MERGE (phd)-[:SUBCLASS_OF]->(student)
MERGE (academic)-[:SUBCLASS_OF]->(person)
MERGE (prof)-[:SUBCLASS_OF]->(academic)
MERGE (alice:Person {id:"Alice"})
MERGE (bob:Person {id:"Bob"})
MERGE (g:Group {name:"EnergyAIGroup"})
MERGE (alice)-[:TYPE]->(phd)
MERGE (alice)-[:TYPE]->(student)
MERGE (alice)-[:TYPE]->(person)
MERGE (bob)-[:TYPE]->(prof)
MERGE (bob)-[:TYPE]->(academic)
MERGE (bob)-[:TYPE]->(person)
MERGE (alice)-[:SUPERVISED_BY]->(bob)
MERGE (bob)-[:MEMBER_OF]->(g)
MERGE (alice)-[:AFFILIATED_WITH]->(g)
""")

graph.refresh_schema()
print(graph.schema)


Node properties:
Person {id: STRING, profile: STRING, embedding: LIST}
Class {name: STRING}
Group {name: STRING, description: STRING, embedding: LIST}
Relationship properties:

The relationships:
(:Person)-[:TYPE]->(:Class)
(:Person)-[:AFFILIATED_WITH]->(:Group)
(:Person)-[:SUPERVISED_BY]->(:Person)
(:Person)-[:MEMBER_OF]->(:Group)
(:Class)-[:SUBCLASS_OF]->(:Class)
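
The seed query stores a `TYPE` edge from each person to every ancestor class (for example, Alice is typed as `PhDStudent`, `Student`, and `Person`). The sketch below, in plain Python with no database connection, shows that these edges are exactly what you get by walking `SUBCLASS_OF` upwards from the most specific class. The dictionary and the `all_types` helper are illustrative names, not part of the notebook.

```python
# SUBCLASS_OF edges copied from the seed query (child -> parent).
SUBCLASS_OF = {
    "Student": "Person",
    "PhDStudent": "Student",
    "Academic": "Person",
    "Professor": "Academic",
}


def all_types(most_specific: str) -> list[str]:
    """Walk SUBCLASS_OF upwards and collect every class on the way."""
    types = [most_specific]
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types


print(all_types("PhDStudent"))  # ['PhDStudent', 'Student', 'Person']
print(all_types("Professor"))   # ['Professor', 'Academic', 'Person']
```

Materialising the inherited types keeps the generated Cypher simple (a single `TYPE` hop). An alternative would be to store only the most specific type and follow a variable-length `SUBCLASS_OF` path at query time.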


## 4) Build the RAG retriever tool
This tool answers fuzzy text questions like: “What does EnergyAIGroup focus on?”

In [18]:
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.tools import tool

docs = [
    Document(page_content="Alice is a PhD student researching neuro-symbolic AI, knowledge graphs, and reasoning systems.",
             metadata={"entity":"Alice","type":"Person"}),
    Document(page_content="Bob is a Professor working on ontology engineering and graph-based AI. He supervises students.",
             metadata={"entity":"Bob","type":"Person"}),
    Document(page_content="EnergyAIGroup focuses on AI for sustainability, energy forecasting, optimisation, and climate modelling.",
             metadata={"entity":"EnergyAIGroup","type":"Group"}),
]

emb = OpenAIEmbeddings(model="text-embedding-3-small")
chunks = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=30).split_documents(docs)
vs = FAISS.from_documents(chunks, emb)
retriever = vs.as_retriever(search_kwargs={"k": 3})

@tool
def university_retriever(query: str) -> str:
    """Retrieve relevant University text snippets for a query."""
    hits = retriever.invoke(query)
    return "\n\n".join([f"[{d.metadata}] {d.page_content}" for d in hits])
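
To make the retrieval step less of a black box, here is a toy version that runs without an API key. It assumes word-count vectors as a stand-in for the OpenAI embeddings and a plain sort as a stand-in for the FAISS index; `embed`, `cosine`, and `top_k` are hypothetical helpers written for this sketch.

```python
import math
import re
from collections import Counter

# Same three snippets as the documents above.
TEXTS = [
    "Alice is a PhD student researching neuro-symbolic AI, knowledge graphs, and reasoning systems.",
    "Bob is a Professor working on ontology engineering and graph-based AI. He supervises students.",
    "EnergyAIGroup focuses on AI for sustainability, energy forecasting, optimisation, and climate modelling.",
]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a dense vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, k: int = 1) -> list[str]:
    """Return the k snippets most similar to the query."""
    q = embed(query)
    return sorted(TEXTS, key=lambda t: cosine(q, embed(t)), reverse=True)[:k]


print(top_k("What does EnergyAIGroup focus on?"))
```

Word counts only match exact tokens ("focus" does not match "focuses"), which is one reason the notebook uses learned embeddings instead.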


## 5) Build the KG tool (Natural Language → Cypher)
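We use `GraphCypherQAChain`, which asks the LLM to translate the question into Cypher, runs the query against the graph, and phrases the result as an answer. Because the generated query executes against a live database, LangChain requires you to opt in explicitly with `allow_dangerous_requests=True`.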

In [51]:
from langchain_openai import ChatOpenAI
from langchain_neo4j import GraphCypherQAChain
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o", temperature=0)

CYPHER_PROMPT = ChatPromptTemplate.from_template("""
You generate Cypher for Neo4j using the provided schema.
Rules:
- READ ONLY (MATCH/RETURN). Never write (no CREATE/MERGE/DELETE/SET).
- Return scalar properties, NOT full nodes.
  Example: RETURN supervisor.id AS supervisor
- Prefer `id` or `name` properties when returning people/groups.

Schema:
{schema}

Question:
{question}

Cypher:
""")

kg_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    # cypher_prompt=CYPHER_PROMPT,  # disabled here; uncomment to apply the custom prompt defined above
    verbose=False,
    validate_cypher=True,
    use_function_response=True,
    allow_dangerous_requests=True
)

@tool
def neo4j_qa(question: str) -> str:
    """Answer relationship/path questions using Neo4j (NL→Cypher)."""
    res = kg_chain.invoke({"query": question})

    # Print once while debugging (remove later)
    print("RAW kg_chain.invoke:", res)

    if isinstance(res, str):
        return res

    for key in ["result", "output", "answer"]:
        if key in res and isinstance(res[key], str) and res[key].strip():
            return res[key]

    # If we get here, return the whole thing for visibility
    return str(res)
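
The prompt above asks the model for READ ONLY Cypher, but a prompt is a request, not a guarantee. As a rough extra check, a generated query could be screened for write clauses before it is executed. The sketch below is an assumption-laden illustration: the clause list is not exhaustive, procedure calls via `CALL` are not inspected, escaped quotes inside strings are not handled, and wiring the check into the chain is not shown.

```python
import re

# Clauses that modify data or schema. Assumed list, not exhaustive.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def is_read_only(cypher: str) -> bool:
    """Return True if no write-clause keyword appears outside string literals."""
    # Drop quoted strings first so a value such as "set theory" is not flagged.
    stripped = re.sub(r"'[^']*'", "", cypher)
    stripped = re.sub(r'"[^"]*"', "", stripped)
    return WRITE_CLAUSES.search(stripped) is None


print(is_read_only('MATCH (a:Person {id:"Alice"})-[:SUPERVISED_BY]->(s) RETURN s.id AS supervisor'))
print(is_read_only("MATCH (n) DETACH DELETE n"))
```

A keyword filter like this is easy to bypass, so it complements rather than replaces connecting with a database user that only has read permissions.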


## 6) Create a v1 ReAct agent with both tools

In [40]:
from langchain.agents import create_agent
from langchain.messages import HumanMessage

tools = [university_retriever, neo4j_qa]

agent = create_agent(
    llm,
    tools=tools,
    system_prompt=(
        "You are a university assistant. Choose the best tool:\n"
        "- Use neo4j_qa for relationship/path/type questions "
        "(supervises, member of, affiliated with, types, paths).\n"
        "- Use university_retriever for fuzzy descriptive questions "
        "(focus area, research interests).\n"
        "Always cite evidence from tools. If unsure, ask for clarification."
    )
)

## 7) Run a few questions

In [52]:
agent.invoke(
    {"messages": [HumanMessage(content="Who supervises Alice?")]},
)

RAW kg_chain.invoke: {'query': 'Who supervises Alice?', 'result': 'Alice is supervised by Bob, a professor working on artificial intelligence, ontology engineering, and graph-based systems.'}


{'messages': [HumanMessage(content='Who supervises Alice?', additional_kwargs={}, response_metadata={}, id='f2b42e1d-b41d-4c9c-b1b1-7f7755d23cba'),
  AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 152, 'total_tokens': 173, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_83554c687e', 'id': 'chatcmpl-CnIwnYBFzgVKj0Ya8IoPfl1Tp8XV4', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019b25df-2b32-7d43-b7e3-dc79678205bb-0', tool_calls=[{'name': 'neo4j_qa', 'args': {'question': 'Who supervises Alice?'}, 'id': 'call_VXuq0AmDgBaLN1oV0MYY04xH', 'type': 'tool_call'}], usage_metadata={'input_tokens': 152, 'output_tokens': 21, 't

In [15]:
agent.invoke(
    {"messages": [HumanMessage(content="What does EnergyAIGroup focus on? Use evidence.")]},
)

{'messages': [HumanMessage(content='What does EnergyAIGroup focus on? Use evidence.', additional_kwargs={}, response_metadata={}, id='7ae0bfbb-23ec-433b-a7ca-f786a3c851fa'),
  AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 158, 'total_tokens': 179, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_11f3029f6b', 'id': 'chatcmpl-CnISRWX9COconZUfEwMiYL3WAHZOm', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019b25c2-75f7-7a53-8e5b-42a038da2e6a-0', tool_calls=[{'name': 'university_retriever', 'args': {'query': 'EnergyAIGroup focus area'}, 'id': 'call_bKrSMcjFO9BsVCpr6VR9zea0', 'type': 'tool_call'}], usage_metadata={'

In [23]:
agent.invoke(
    {"messages": [HumanMessage(content="Which research group is Alice affiliated with?")]},
)

{'messages': [HumanMessage(content='Which research group is Alice affiliated with?', additional_kwargs={}, response_metadata={}, id='44113d4a-81db-4611-bbf7-80f8e4240e52'),
  AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 24, 'prompt_tokens': 155, 'total_tokens': 179, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_83554c687e', 'id': 'chatcmpl-CnIVaQmW0eJ8JvotHET0RE63Uvdqn', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019b25c5-6f6f-7aa3-91af-374752315e78-0', tool_calls=[{'name': 'neo4j_qa', 'args': {'question': 'Which research group is Alice affiliated with?'}, 'id': 'call_w6M5KMW5K11jUkKwjVD7Dmok', 'type': 'tool_call'}], usage_meta

## 8) Tool-selection tracing (why did it choose RAG vs KG?)
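This cell shows the tool calls the agent made, the input sent to each tool, and a snippet of each tool's output. It reveals which tool was chosen, not the model's internal reasoning, so the "why" has to be inferred from the question and the system prompt.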

In [28]:
from langchain.messages import AIMessage, ToolMessage

def trace_agent(question: str):
    print("\n" + "=" * 90)
    print("QUESTION:", question)
    print("=" * 90)

    # Invoke v1 agent (message-based)
    res = agent.invoke({
        "messages": [HumanMessage(content=question)]
    })

    messages = res.get("messages", [])

    print("\nTOOL TRACE:")

    step = 1
    for msg in messages:
        # 1) Model deciding to call a tool
        if isinstance(msg, AIMessage) and msg.tool_calls:
            for call in msg.tool_calls:
                print(f"\nStep {step}:")
                print("  Tool chosen:", call["name"])
                print("  Tool input :", call["args"])
                step += 1

        # 2) Tool returning evidence
        if isinstance(msg, ToolMessage):
            print("  Tool output snippet:")
            print(" ", msg.content.replace("\n", " ")[:300])

    # 3) Final model answer = last AIMessage without tool calls
    final_answer = None
    for msg in reversed(messages):
        if isinstance(msg, AIMessage) and not msg.tool_calls:
            final_answer = msg.content
            break

    print("\nFINAL ANSWER:")
    print(final_answer)
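
When tracing many questions, a compact tally of which tool was called how often can be easier to scan than the full trace. The helper below is a sketch that works on any objects exposing a `tool_calls` list, which is how the `AIMessage` objects in `trace_agent` are used. The `SimpleNamespace` messages are hypothetical stand-ins so the example runs without calling the agent.

```python
from collections import Counter
from types import SimpleNamespace


def count_tool_calls(messages) -> Counter:
    """Count tool names across messages that expose a `tool_calls` list."""
    counts = Counter()
    for msg in messages:
        # Messages without tool calls (human, tool, final answer) contribute nothing.
        for call in getattr(msg, "tool_calls", None) or []:
            counts[call["name"]] += 1
    return counts


# Stand-in messages (hypothetical), shaped like one agent run.
fake_messages = [
    SimpleNamespace(tool_calls=[{"name": "neo4j_qa", "args": {"question": "Who supervises Alice?"}}]),
    SimpleNamespace(content="tool output"),
    SimpleNamespace(tool_calls=[], content="final answer"),
]

print(count_tool_calls(fake_messages))
```

In the notebook, you could pass `res["messages"]` from `agent.invoke(...)` to the same function.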

In [29]:
for q in [
    "Who supervises Alice?",
    "What does EnergyAIGroup focus on?",
    "Which classes is Bob an instance of?",
    "Who works on neuro-symbolic reasoning?",
]:
    trace_agent(q)


QUESTION: Who supervises Alice?

TOOL TRACE:

Step 1:
  Tool chosen: neo4j_qa
  Tool input : {'question': 'Who supervises Alice?'}
  Tool output snippet:
  I don't know the answer.

FINAL ANSWER:
I couldn't find information on who supervises Alice. If you have more context or details, please provide them, and I'll try to assist you further.

QUESTION: What does EnergyAIGroup focus on?

TOOL TRACE:

Step 1:
  Tool chosen: university_retriever
  Tool input : {'query': 'EnergyAIGroup focus area'}
  Tool output snippet:
  [{'entity': 'EnergyAIGroup', 'type': 'Group'}] EnergyAIGroup focuses on AI for sustainability, energy forecasting, optimisation, and climate modelling.  [{'entity': 'Alice', 'type': 'Person'}] Alice is a PhD student researching neuro-symbolic AI, knowledge graphs, and reasoning systems.  [{'entity':

FINAL ANSWER:
The EnergyAIGroup focuses on AI for sustainability, energy forecasting, optimization, and climate modeling.

QUESTION: Which classes is Bob an instance of?

TO

## Reflection
- Which questions go to which tool, and why?
- What risks exist with NL→Cypher?
- Why is this more reliable than a pure RAG system?
