A deployable Order-to-Cash context graph built with Next.js 15, PostgreSQL, and a guarded LLM query layer.
The app ingests the provided SAP-style JSONL exports, normalizes them into business entities, projects them into a graph, visualizes the graph in a split-pane UI, and answers natural-language questions with dataset-backed results.
This repository implements four main capabilities:
- Graph construction from fragmented O2C source files
- Graph exploration with searchable, inspectable entity neighborhoods
- Natural-language querying over a constrained semantic layer
- Guardrails that keep answers grounded in the provided dataset only
The project supports two runtime modes:
- In-memory mode:
- no Postgres required
- graph explorer works
- deterministic assignment queries work
- Full mode:
- Postgres + Gemini configured
- semantic SQL executes against the database
- broader NL-to-SQL fallback is enabled
| Technology | Role in this project |
|---|---|
| Next.js 15 | Full-stack application framework, routing, API endpoints, production build |
| React 19 | Frontend UI rendering |
| TypeScript | Type safety across UI, API, and data/modeling layers |
| PostgreSQL | Primary persistent database for normalized data, graph projection, and semantic views |
| Drizzle ORM | Database schema definition, seeding, and DB access |
pg |
PostgreSQL connection pool and query execution |
| Cytoscape.js | Graph visualization and interaction |
AI SDK (ai) |
Structured LLM integration layer |
@ai-sdk/google |
Gemini provider for SQL planning and grounded answer synthesis |
| Zod | Validation for request bodies and LLM structured outputs |
pgsql-ast-parser |
SQL AST parsing and query guardrails |
| Technology | Role |
|---|---|
| Vitest | Unit/service tests |
| React Testing Library | Component testing |
| Playwright | End-to-end browser tests |
| ESLint | Static linting |
| Docker Compose | Local Postgres + Adminer runtime |
flowchart TB
subgraph Presentation["Presentation Layer - Next.js React"]
Page["page.tsx"]
Workspace["O2CWorkspace.tsx"]
Canvas["GraphCanvas.tsx"]
Table["ResultTable.tsx"]
Page --> Workspace
Workspace --> Canvas
Workspace --> Table
end
subgraph API["API Layer"]
SearchAPI["/api/graph/search"]
NeighborAPI["/api/graph/neighborhood"]
QueryAPI["/api/query"]
end
subgraph Services["Service Layer"]
GraphService["graph/service.ts"]
QueryService["query/service.ts"]
Templates["query/templates.ts"]
Guardrails["query/guardrails.ts"]
Catalog["query/semantic-catalog.ts"]
end
subgraph Data["Data Layer"]
Client["db/client.ts"]
Schema["db/schema.ts"]
Views["db/views.ts"]
Dataset["o2c/dataset.ts"]
Seed["db/seed.ts"]
end
subgraph External["External Systems"]
PG[(PostgreSQL)]
Gemini[("Gemini AI")]
JSONL[("JSONL Files")]
end
Workspace --> SearchAPI
Workspace --> NeighborAPI
Workspace --> QueryAPI
SearchAPI --> GraphService
NeighborAPI --> GraphService
QueryAPI --> QueryService
QueryService --> Templates
QueryService --> Guardrails
QueryService --> Catalog
QueryService --> Gemini
GraphService --> Client
GraphService --> Dataset
QueryService --> Client
Client --> PG
Dataset --> JSONL
Seed --> Schema
Schema --> PG
Views --> PG
erDiagram
graph_nodes {
text node_id PK
text node_type
text entity_id
text label
text subtitle
jsonb metadata
}
graph_edges {
text edge_id PK
text source_node_id FK
text target_node_id FK
text relation
jsonb metadata
}
v_entity_lookup {
text node_id
text node_type
text entity_id
text label
text subtitle
text search_text
}
graph_nodes ||--o{ graph_edges : source
graph_nodes ||--o{ graph_edges : target
graph_nodes ||--|| v_entity_lookup : indexes
flowchart TB
Customer((CUSTOMER))
Address((ADDRESS))
Product((PRODUCT))
Plant((PLANT))
SalesOrder((SALES_ORDER))
SalesOrderItem((SALES_ORDER_ITEM))
Delivery((DELIVERY))
DeliveryItem((DELIVERY_ITEM))
BillingDoc((BILLING_DOC))
BillingItem((BILLING_ITEM))
JournalEntry((JOURNAL_ENTRY))
Payment((PAYMENT))
Customer -->|HAS_ADDRESS| Address
SalesOrder -->|PLACED_BY| Customer
SalesOrder -->|HAS_ITEM| SalesOrderItem
SalesOrderItem -->|FOR_PRODUCT| Product
SalesOrderItem -->|REQUESTS_FROM| Plant
SalesOrderItem -->|FULFILLED_BY| DeliveryItem
Delivery -->|HAS_ITEM| DeliveryItem
DeliveryItem -->|SHIPPED_FROM| Plant
DeliveryItem -->|BILLED_AS| BillingItem
BillingDoc -->|HAS_ITEM| BillingItem
BillingDoc -->|BILLED_TO| Customer
BillingItem -->|FOR_PRODUCT| Product
BillingItem -->|POSTED_TO| JournalEntry
JournalEntry -->|CLEARED_BY| Payment
stateDiagram-v2
[*] --> ordered_not_delivered
ordered_not_delivered --> delivered_not_billed
delivered_not_billed --> billed_not_posted
billed_not_posted --> posted_not_cleared
posted_not_cleared --> complete
complete --> [*]
flowchart LR
JSONL["JSONL Files<br/>17 directories"]
Load["loadNormalizedDataset"]
Build["buildGraph"]
Seed["seedDatabase"]
Tables["PostgreSQL Tables"]
Views["21 Analytical Views"]
JSONL --> Load
Load --> Build
Build --> Seed
Seed --> Tables
Seed --> Views
flowchart TD
Start(["Start: nodeId, depth"])
Init["visitedNodes = Set nodeId<br/>visitedEdges = Set empty<br/>frontier = nodeId"]
Loop{"iteration < depth?"}
Query["SELECT edges WHERE<br/>source OR target IN frontier"]
Process["Add edges to visitedEdges<br/>Add nodes to visitedNodes<br/>Update frontier"]
FetchNodes["SELECT nodes WHERE<br/>node_id IN visitedNodes"]
FetchEdges["SELECT edges WHERE<br/>edge_id IN visitedEdges"]
Return(["Return nodes and edges"])
Start --> Init
Init --> Loop
Loop -->|Yes| Query
Query --> Process
Process --> Loop
Loop -->|No| FetchNodes
FetchNodes --> FetchEdges
FetchEdges --> Return
flowchart TD
Input["User Message"]
Template{"Pattern<br/>Match?"}
TemplateSQL["Execute Template SQL"]
Domain{"In Domain?"}
Refuse["Return Refusal"]
HasLLM{"Gemini<br/>Configured?"}
NotConfigured["Return Not Configured"]
Generate["generateSqlPlan"]
Validate["validateReadOnlySql"]
Execute["executeSql"]
Repair["repairSqlPlan"]
Answer["buildGroundedAnswer"]
Extract["deriveRelatedNodeIds"]
Response["QueryResponseBody"]
Input --> Template
Template -->|Yes| TemplateSQL
Template -->|No| Domain
Domain -->|No| Refuse
Domain -->|Yes| HasLLM
HasLLM -->|No| NotConfigured
HasLLM -->|Yes| Generate
Generate --> Validate
Validate --> Execute
Execute -->|Error| Repair
Repair --> Validate
Execute -->|Success| Answer
TemplateSQL --> Answer
Answer --> Extract
Extract --> Response
flowchart TB
subgraph src["src/"]
subgraph app["app/"]
page["page.tsx"]
layout["layout.tsx"]
globals["globals.css"]
subgraph api["api/"]
subgraph graph["graph/"]
search["search/route.ts"]
neighborhood["neighborhood/route.ts"]
end
query["query/route.ts"]
end
end
subgraph components["components/"]
O2C["O2CWorkspace.tsx"]
Canvas["GraphCanvas.tsx"]
Result["ResultTable.tsx"]
end
subgraph db["db/"]
client["client.ts"]
schema["schema.ts"]
views["views.ts"]
seed["seed.ts"]
end
subgraph lib["lib/"]
env["env.ts"]
constants["constants.ts"]
contracts["contracts.ts"]
subgraph graphLib["graph/"]
graphService["service.ts"]
end
subgraph queryLib["query/"]
queryService["service.ts"]
templates["templates.ts"]
guardrails["guardrails.ts"]
catalog["semantic-catalog.ts"]
end
subgraph o2c["o2c/"]
dataset["dataset.ts"]
end
end
end
flowchart TB
subgraph Master["Master Data"]
customers
addresses
products
plants
end
subgraph Trans["Transactions"]
sales_orders
sales_order_items
deliveries
delivery_items
billing_documents
billing_items
journal_entries
payments
end
subgraph Assign["Assignments"]
customer_company_assignments
customer_sales_area_assignments
product_plants
product_storage_locations
end
subgraph Graph["Graph Tables"]
graph_nodes
graph_edges
end
subgraph Views["Key Views"]
v_entity_lookup
v_o2c_flow_item
v_billing_trace
v_customers
v_products
end
Master --> Trans
Trans --> Graph
Master --> Graph
Graph --> Views
flowchart LR
subgraph Edges["Graph Edge Types"]
E1["HAS_ADDRESS"]
E2["PLACED_BY"]
E3["HAS_ITEM"]
E4["FOR_PRODUCT"]
E5["REQUESTS_FROM"]
E6["FULFILLED_BY"]
E7["SHIPPED_FROM"]
E8["BILLED_AS"]
E9["BILLED_TO"]
E10["POSTED_TO"]
E11["CLEARED_BY"]
end
| Layer | Main files | Responsibility |
|---|---|---|
| Dataset normalization | src/lib/o2c/dataset.ts |
Parse JSONL, normalize IDs, derive flow states, build graph data |
| Database schema | src/db/schema.ts |
Define normalized tables and graph projection tables |
| Semantic SQL layer | src/db/views.ts |
Create read-only analytical views for safe querying |
| Graph services | src/lib/graph/service.ts |
Search nodes and fetch node neighborhoods |
| Query services | src/lib/query/service.ts, src/lib/query/templates.ts, src/lib/query/guardrails.ts |
Translate NL -> safe query -> grounded answer |
| UI | src/components/O2CWorkspace.tsx, src/components/GraphCanvas.tsx |
Graph explorer, inspector, chat, SQL preview, rows preview |
| API routes | src/app/api/* |
Expose graph and query endpoints |
| Area | What is being evaluated | How this project answers it |
|---|---|---|
| Code quality and architecture | Structure, readability, and maintainability | The codebase is split by concern: normalization (src/lib/o2c/dataset.ts), schema/views (src/db/*), graph services (src/lib/graph/service.ts), query logic (src/lib/query/*), API routes (src/app/api/*), and UI (src/components/*). This keeps business modeling, SQL safety, and frontend behavior separate and maintainable. |
| Graph modelling | Quality and clarity of entities and relationships | The graph is built from explicit business entities such as customers, products, plants, sales orders, deliveries, billing documents, journal entries, and payments. Relationships are named with business meaning like PLACED_BY, FULFILLED_BY, BILLED_AS, POSTED_TO, and CLEARED_BY, so the graph mirrors the real O2C flow instead of being a generic node network. |
| Database / storage choice | Architectural decisions and tradeoffs | The project uses PostgreSQL, not a separate graph database. The tradeoff is intentional: Postgres keeps normalized tables, graph projection tables, and semantic query views in one free-tier-friendly system. This is simpler to deploy and audit than adding Neo4j or another graph store for a fixed assignment dataset. |
| LLM integration and prompting | How natural language is translated into useful queries | The query pipeline is hybrid. It first tries deterministic templates for high-value business questions, then falls back to Gemini only when needed. The LLM is prompted against a narrow allowlisted semantic layer, not raw tables, which makes NL-to-SQL translation more reliable and easier to validate. |
| Guardrails | Ability to restrict misuse and off-topic prompts | Guardrails operate at two levels: domain refusal and SQL refusal. Off-topic prompts are rejected with a fixed dataset-only message. In-domain SQL is parsed with pgsql-ast-parser and blocked unless it is a single read-only allowlisted query with a LIMIT, allowed relations, and allowed columns. |
The project models the O2C flow as normalized business entities:
customersaddressesproductsplantssales_orderssales_order_itemsdeliveriesdelivery_itemsbilling_documentsbilling_itemsjournal_entriespayments
Supporting lookup tables retained in v1:
customer_company_assignmentscustomer_sales_area_assignmentsproduct_plantsproduct_storage_locations
Graph projection tables:
graph_nodesgraph_edges
- Item IDs are canonicalized:
000010becomes10
- Billing headers and billing cancellations are merged into one billing-document layer
- Dates are converted into typed timestamps when inserted into Postgres
- Flow status is computed explicitly per item
ordered_not_delivereddelivered_not_billedbilled_not_postedposted_not_clearedcomplete
- customer
- address
- product
- plant
- sales_order
- sales_order_item
- delivery
- delivery_item
- billing_document
- billing_item
- journal_entry
- payment
HAS_ADDRESSPLACED_BYHAS_ITEMFOR_PRODUCTREQUESTS_FROMFULFILLED_BYSHIPPED_FROMBILLED_TOBILLED_ASPOSTED_TOCLEARED_BY
Examples:
sales_order:740552sales_order_item:740552:10delivery:80738072billing_document:90504248customer:320000083
The LLM never queries raw operational tables directly by default.
Instead, the app exposes a smaller semantic layer of allowlisted views:
v_o2c_flow_item- sales-order-item grain
- used for flow completeness analysis
v_billing_trace- billing-item grain
- used for billing trace and top-product billing analysis
v_entity_lookup- graph node search layer
v_customer_growth- customer/date level growth analysis
v_customer_status- customer block status analysis
This design reduces hallucinated joins and makes the query surface safer and easier to audit.
The natural-language query logic is a guarded hybrid pipeline.
- Validate request shape at
POST /api/query - Load the normalized dataset
- Classify whether the prompt is in-domain
- Try deterministic query templates first
- If a template matches:
- produce known SQL over allowlisted semantic views
- or execute equivalent logic in memory when DB is absent
- If no template matches and both Postgres + Gemini are configured:
- ask Gemini to generate SQL over allowlisted views only
- Parse and validate generated SQL with AST guardrails
- Execute read-only SQL with a short statement timeout
- Build the final answer only from the returned rows
- Return:
- answer
- SQL
- rows preview
- related node IDs
- diagnostics
- Top products by billing-document count
- Trace the full flow of a billing document
- Incomplete or broken sales orders
- Customer growth over time
- Percentage of customers currently blocked
This design is stronger than unrestricted NL-to-SQL over raw tables because it combines:
- deterministic templates for high-value assignment queries
- semantic views for safety and simplicity
- AST validation for query guardrails
- grounded answer generation from returned rows only
The guardrails are implemented in src/lib/query/guardrails.ts.
- only read-only
SELECT/ CTE-backedSELECT - exactly one statement
- no
INSERT,UPDATE,DELETE,DROP,ALTER,TRUNCATE,CREATE,GRANT,REVOKE,COMMENT - no wildcard
* - only allowlisted views
- only allowlisted columns
LIMITis mandatory
If the user asks something unrelated, the system refuses with:
This system is designed to answer questions related to the provided dataset only.
Examples of refused prompts:
- general knowledge
- creative writing
- unrelated entertainment or sports prompts
- Domain classification fallback when heuristics are insufficient
- SQL planning for open-ended in-domain questions
- Grounded answer synthesis from returned result rows
- It is not used to seed the database
- It is not used to invent answers without data
- It is not allowed to generate arbitrary unrestricted SQL
- It does not query raw tables freely
gemini-2.5-flash- SQL planning
gemini-2.5-flash-lite- lightweight classification
- grounded response synthesis
Data seeding is deterministic and does not depend on the LLM.
- Read JSONL files from
sap-o2c-data/ - Normalize the raw records
- Build normalized entity collections
- Build graph nodes and graph edges
- Insert normalized tables into Postgres
- Insert graph projection tables
- Create or replace analytical views
- reproducible
- deterministic
- no LLM cost
- safe for fixed assignment datasets
- same source of truth feeds both graph and query layers
scripts/ingest-sap-jsonl.tssrc/db/seed.tssrc/lib/o2c/dataset.ts
Search graph nodes by label, ID, subtitle, and metadata-derived text.
Fetch a focused neighborhood for a given node ID.
Accepts:
{
"message": "Trace the full flow of billing document 90504248.",
"focusNodeIds": ["billing_document:90504248"]
}Returns:
{
"answer": "Billing document 90504248 traces back to sales order 740552...",
"sql": "select ...",
"rowsPreview": [{ "...": "..." }],
"relatedNodeIds": ["billing_document:90504248"],
"diagnostics": {
"mode": "template"
}
}- Cytoscape-based graph canvas
- search box for entity lookup
- expandable/focused node neighborhoods
- metadata inspector for selected nodes
Reset viewbutton to recenter the graph viewport
- natural-language prompt entry
- sample question chips
- grounded answer rendering
- generated SQL trace
- returned rows preview table
- graph highlighting for entities mentioned in query results
- Node.js
- npm
- Docker Desktop or a local Postgres instance
npm installCopy .env.example to .env.local and set values as needed:
DATABASE_URL=postgres://postgres:postgres@localhost:5432/o2c_context_graph
GOOGLE_GENERATIVE_AI_API_KEY=
APP_DATASET_ROOT=./sap-o2c-datadocker compose up -d
npm run db:init
npm run devpostgres- Postgres 16 on
localhost:5432
- Postgres 16 on
adminer- optional database UI at http://localhost:8080
Adminer defaults:
- Server:
postgres - Username:
postgres - Password:
postgres - Database:
o2c_context_graph
npm run devIn that mode:
- graph exploration still works
- deterministic queries still work
- open-ended LLM SQL planning does not run
npm run db:push
npm run ingestOr in one step:
npm run db:initRecommended deployment shape:
- Frontend/API:
- Vercel Hobby
- Database:
- Supabase Free Postgres
- LLM:
- Gemini 2.5 Flash family
Deployment steps: see DEPLOYMENT.md.
- build with
next build - run with
next start - use a real
DATABASE_URL - optionally use a real
GOOGLE_GENERATIVE_AI_API_KEYfor open-ended NL-to-SQL fallback
Current choices (good for evaluator demos and small traffic):
- Single Next.js app (UI + API routes) deployed on Vercel
- Postgres holds both normalized business data and the graph projection (
graph_nodes,graph_edges) - Read-only semantic views power NL-to-SQL, with strict allowlists and a statement timeout
Future choices (if you need more scale later):
- Use a managed Postgres pooler +
DATABASE_POOL_MAX=1to avoid serverless connection spikes - Add caching for
/api/query(SQL plan + rows) to reduce LLM calls and speed up repeat questions - Add request rate limits to protect LLM quota
- Add indexes/materialized views for heavy analytical queries or larger datasets
Prompt:
Which products are associated with the highest number of billing documents?
Known expected leaders:
S8907367008620S8907367039280
Both have 22 distinct billing documents in the dataset.
Prompt:
Trace the full flow of billing document 90504248.
Known fixture:
- Sales Order:
740552 - Delivery:
80738072 - Billing:
90504248 - Accounting Document:
9400000249
Prompt:
Identify sales orders that have broken or incomplete flows.
Known fixture examples:
740506740507740508
Prompt:
What percentage of customers are currently blocked?
Known expected answer:
75.00%6 of 8customers blocked
src/
app/
api/
components/
db/
lib/
scripts/
sap-o2c-data/
e2e/
High-signal files:
src/lib/o2c/dataset.tssrc/db/schema.tssrc/db/views.tssrc/db/seed.tssrc/lib/query/service.tssrc/lib/query/templates.tssrc/lib/query/guardrails.tssrc/lib/graph/service.tssrc/components/O2CWorkspace.tsxsrc/components/GraphCanvas.tsx
npm run typecheck
npm run lint
npm run test
npm run test:e2e
npm run build- dataset normalization fixtures
- graph search and neighborhood lookup
- query templates and guardrails
- UI sample-query flow
- main app loads
- graph search renders a large neighborhood
- reset-view interaction works
This repository is designed to be evaluator-friendly:
- the graph is inspectable
- the query layer is auditable
- the answers are grounded
- the SQL is exposed
- the guardrails are explicit
This README is intended to be the single source of truth for the repository, the production shape of the app, and the main demo talking points.