# Knowledge Graph RAG Walkthrough

This notebook demonstrates how to query the documentation-driven knowledge graph using the Retrieval-Augmented Generation (RAG) pipeline. It assumes you have already ingested one or more Markdown files into the project workspace.

In [2]:
cd ..

/home/coder/CTC


## 1. Environment Setup

The RAG helper runs locally against the Markdown source, so it does not require Neo4j credentials; when they are absent, the Neo4j load step is skipped. To persist results to Neo4j, export `NEO4J_URL`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` before running the notebook.

In [3]:
import logging
from pathlib import Path

from pipelines import run_knowledge_graph_rag_pipeline, run_rule_based_pipeline

logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s")
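
To see up front whether the Neo4j load will be skipped, you can check the three environment variables named above. The sketch below is a hypothetical helper, not part of the `pipelines` package; treating an empty value as missing is an assumption about how the pipeline reads its configuration.

```python
import os

# Variable names taken from the setup notes above.
REQUIRED_NEO4J_VARS = ("NEO4J_URL", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_neo4j_vars(env=None):
    """Return the required Neo4j variable names that are unset or empty.

    Hypothetical helper: `env` defaults to the process environment and can be
    replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_NEO4J_VARS if not env.get(name)]


missing = missing_neo4j_vars()
if missing:
    print("Neo4j load will likely be skipped; missing:", ", ".join(missing))
else:
    print("Neo4j credentials found in the environment.")
```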

## 2. (Optional) Parse or Ingest Additional Documentation

If you want to materialize the knowledge graph before querying, you can call the rule-based pipeline directly. This step parses the specified Markdown file and writes the resulting graph to Neo4j only when credentials are provided.

In [4]:
README_PATH = Path("ctc-data-translated/readme-en.md")

# Uncomment to run the rule-based pipeline without touching Neo4j.
# graph = run_rule_based_pipeline(README_PATH)
# len(graph.nodes), len(graph.relationships)

## 3. Ask a Question with the RAG Pipeline

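Provide a natural-language question. The pipeline parses the configured Markdown, finds relevant nodes and relationships, builds a prompt, and queries the LLM client. The `top_k_nodes`, `max_neighbor_nodes`, and `relationship_limit` arguments cap how much graph context goes into the prompt.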

In [5]:
QUESTION = "How does the system validate form submissions before persistence?"

rag_result = run_knowledge_graph_rag_pipeline(
    QUESTION,
    readme_path=README_PATH,
    top_k_nodes=6,
    max_neighbor_nodes=12,
    relationship_limit=24,
)

KG RAG pipeline:   0%|          | 0/5 [00:00<?, ?step/s, Resolving input]INFO:pipelines.rag:RAG pipeline using input (explicit path): ctc-data-translated/readme-en.md
KG RAG pipeline:  20%|██        | 1/5 [00:00<00:00, 2242.94step/s, Parsing input]INFO:pipelines.rag:Parsed 120 nodes and 204 relationships
KG RAG pipeline:  40%|████      | 2/5 [00:00<00:00, 407.79step/s, Selecting graph context]INFO:pipelines.rag:Context prepared with 6 primary nodes, 3 neighbor nodes, and 15 relationships
KG RAG pipeline:  60%|██████    | 3/5 [00:00<00:00, 446.39step/s, Neo4j load skipped]     INFO:pipelines.rag:Neo4j connection details missing; skipping load
KG RAG pipeline:  80%|████████  | 4/5 [00:00<00:00, 569.59step/s, Request completion]INFO:httpx:HTTP Request: POST https://genapi.ntq.ai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:pipelines.rag:LLM response received (1438 characters)
KG RAG pipeline: 100%|██████████| 5/5 [00:03<00:00,  1.34step/s, Request completion] 
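
The "Selecting graph context" step can be pictured with a small sketch. This is not the pipeline's actual implementation: the keyword-overlap scoring, the `select_context` function, and the toy graph are all assumptions made for illustration. Only the three limit parameters mirror the call above.

```python
import re


def tokenize(text):
    """Lowercase a string and return its alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_context(question, nodes, relationships,
                   top_k_nodes=6, max_neighbor_nodes=12, relationship_limit=24):
    """Pick primary nodes, their neighbors, and connecting relationships.

    nodes: dict mapping node id -> descriptive text (toy representation).
    relationships: list of (source_id, rel_type, target_id) tuples.
    Scoring by shared tokens is an assumption, not the real pipeline's method.
    """
    q_tokens = tokenize(question)
    # Rank nodes by number of shared tokens; ties are broken by node id.
    ranked = sorted(nodes, key=lambda nid: (-len(q_tokens & tokenize(nodes[nid])), nid))
    primary = [nid for nid in ranked if q_tokens & tokenize(nodes[nid])][:top_k_nodes]
    primary_set = set(primary)

    # Collect nodes one hop away from any primary node, in relationship order.
    neighbors = []
    for src, _, dst in relationships:
        for a, b in ((src, dst), (dst, src)):
            if a in primary_set and b not in primary_set and b not in neighbors:
                neighbors.append(b)
    neighbors = neighbors[:max_neighbor_nodes]

    # Keep only relationships whose endpoints both made it into the context.
    keep = primary_set | set(neighbors)
    rels = [r for r in relationships if r[0] in keep and r[2] in keep][:relationship_limit]
    return primary, neighbors, rels


# Toy graph loosely modeled on the node ids shown in this notebook.
toy_nodes = {
    "formfield:ankenNo": "ankenNo form field project case number",
    "vo_property:ankenNo": "ankenNo VO property",
    "formfield:actionType": "actionType form field",
    "vo_property:actionType": "actionType VO property",
}
toy_rels = [
    ("formfield:ankenNo", "BINDS_TO", "vo_property:ankenNo"),
    ("formfield:actionType", "BINDS_TO", "vo_property:actionType"),
]

primary, neighbors, rels = select_context(
    "Which form field holds the case number?", toy_nodes, toy_rels, top_k_nodes=1
)
print(primary, neighbors, rels)
```

With `top_k_nodes=1`, only the best-matching node is primary, and its bound VO property arrives as a neighbor. Raising the limits widens the context in the same way the log line above reports primary nodes, neighbor nodes, and relationships separately.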


## 4. Inspect the Answer and Context

The result object exposes the generated answer, the exact prompt sent to the LLM, and the graph context underlying the response. Inspecting these fields helps you debug retrieval, for example by checking which nodes were selected for the question.

In [6]:
print(rag_result.answer)

The system validates form submissions before persistence through the following mechanisms, as inferred from the provided context:

- **Form Field Mapping**: The `Main Form` contains fields such as `ankenNo`, `keiyakuKey`, and `actionType`, which are mapped to corresponding `VOProperty` fields (`ankenNo`, `keiyakuKey`, and `actionType`). This mapping ensures that the data entered into the form is correctly aligned with the data model for persistence.

- **Source Table Reference**: The `Form_to_VO_Mapping_Main_Form_Table` serves as a source of truth for how form fields map to VO properties. This table likely contains validation rules or constraints that are applied during form submission.

- **Validation via Binding**: Each `FormField` (e.g., `ankenNo`, `keiyakuKey`, `actionType`) is connected to a `VOProperty` via the `BINDS_TO` relationship. This implies that the system checks the form data against the properties defined in the VO (Value Object) during submission, ensuring data integri

In [7]:
print(rag_result.context)

Primary nodes
- form:anken_cardForm / anken_cardForm | labels=Form | props=description=Main Form; section=Form to VO Mapping
- form:keiyaku_cardForm / keiyaku_cardForm | labels=Form | props=description=Main Form; section=Form to VO Mapping
- source:table:Form_to_VO_Mapping_Main_Form_Table / Main Form Table | labels=Source,TableSource | props=data_type=table; section=Form to VO Mapping
- formfield:Main Form:ankenNo / ankenNo | labels=FormField | props=data_type=Long; form_group=Main Form; purpose=Project/case number; section=Form to VO Mapping
- vo_property:ankenNo / ankenNo | labels=VOProperty | props=section=Form to VO Mapping
- formfield:Main Form:keiyakuKey / keiyakuKey | labels=FormField | props=data_type=Long; form_group=Main Form; purpose=Contract key; section=Form to VO Mapping
Connected context nodes
- vo_property:keiyakuKey / keiyakuKey | labels=VOProperty | props=section=Form to VO Mapping
- formfield:Main Form:actionType / actionType | labels=FormField | props=data_type=Stri
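
If you want to work with the context programmatically rather than read it, the node lines can be split into fields. The sketch below assumes the `- id / name | labels=... | props=k=v; k=v` layout seen in the output above; that layout is inferred from this one printout, not from a documented format, and `parse_context_line` is a hypothetical helper. Lines cut off by truncation will produce partial values or fail to parse.

```python
def parse_context_line(line):
    """Parse one '- id / name | labels=... | props=...' context line into a dict.

    Assumes all three ' | '-separated parts are present on the line.
    """
    body = line[2:] if line.startswith("- ") else line
    head, labels_part, props_part = body.split(" | ", 2)
    # Node ids may contain ':' and spaces, so split on the ' / ' separator once.
    node_id, name = head.split(" / ", 1)
    labels = labels_part.split("=", 1)[1].split(",")
    props = {}
    for pair in props_part.split("=", 1)[1].split("; "):
        key, _, value = pair.partition("=")
        props[key] = value
    return {"id": node_id, "name": name, "labels": labels, "props": props}


# Example line copied from the context output above.
example = (
    "- formfield:Main Form:ankenNo / ankenNo | labels=FormField | "
    "props=data_type=Long; form_group=Main Form; "
    "purpose=Project/case number; section=Form to VO Mapping"
)
node = parse_context_line(example)
print(node["id"], node["labels"], node["props"]["purpose"])
```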

In [8]:
rag_result.primary_node_ids, rag_result.neighbor_node_ids

(['form:anken_cardForm',
  'form:keiyaku_cardForm',
  'source:table:Form_to_VO_Mapping_Main_Form_Table',
  'formfield:Main Form:ankenNo',
  'vo_property:ankenNo',
  'formfield:Main Form:keiyakuKey'],
 ['vo_property:keiyakuKey',
  'formfield:Main Form:actionType',
  'vo_property:actionType'])

## 5. Next Steps

- Tweak `top_k_nodes`, `max_neighbor_nodes`, and `relationship_limit` to control prompt size.
- Switch to `run_llm_pipeline` for diagram-to-Cypher generation when Mermaid diagrams are present.
- Persist graph data to Neo4j by supplying credentials to the pipelines.
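
As a starting point for the first item, the sketch below compares context size across parameter settings. `sweep_context_size` and `fake_pipeline` are hypothetical names introduced here; the stand-in pipeline exists only so the example runs without an LLM call, and the length of `result.context` is used as a rough proxy for prompt size.

```python
from types import SimpleNamespace


def sweep_context_size(run_pipeline, question, settings):
    """Run the pipeline once per settings dict and record the context length.

    run_pipeline: any callable taking (question, **kwargs) and returning an
    object with a string `context` attribute.
    """
    rows = []
    for kwargs in settings:
        result = run_pipeline(question, **kwargs)
        rows.append((kwargs, len(result.context)))
    return rows


def fake_pipeline(question, top_k_nodes=6, **_):
    """Stand-in for the real pipeline: emits one context line per primary node."""
    context = "\n".join(f"- node {i}" for i in range(top_k_nodes))
    return SimpleNamespace(context=context)


rows = sweep_context_size(
    fake_pipeline,
    "example question",
    [{"top_k_nodes": 2}, {"top_k_nodes": 4}],
)
for kwargs, size in rows:
    print(kwargs, size)
```

To try this against the real pipeline, pass a wrapper such as `lambda q, **kw: run_knowledge_graph_rag_pipeline(q, readme_path=README_PATH, **kw)` in place of `fake_pipeline`. Keep in mind that each setting then triggers a separate LLM request.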