
Oracle Forge — Multi-Database AI Data Agent

Oracle Forge is a production-grade AI data agent that answers natural language questions across heterogeneous enterprise databases. Built for the TRP1 DataAgentBench (DAB) challenge, it handles four database types (PostgreSQL, MongoDB, SQLite, DuckDB) through a unified orchestration layer.


Team

| Name | Role |
| --- | --- |
| Dereje Derib | Driver |
| Eyoel Nebiyu | Driver |
| Nuhamin Alemayehu | Corpus |
| Rafia Kedir | Corpus |
| Chalie Lijalem | Intelligence Officer |
| Liul Teshome | Intelligence Officer |

Datasets

Oracle Forge supports all 12 DAB datasets:

| Dataset | DB Types | Domain |
| --- | --- | --- |
| yelp | MongoDB + DuckDB | Business reviews |
| googlelocal | PostgreSQL + SQLite | Local business listings |
| bookreview | PostgreSQL + SQLite | Book reviews |
| music_brainz | SQLite + DuckDB | Music catalog and sales |
| crmarenapro | PostgreSQL + SQLite | CRM / support cases |
| pancancer | PostgreSQL + DuckDB | Cancer genomics |
| patents | SQLite + PostgreSQL | Patent publications |
| deps_dev | SQLite | Package dependency metadata |
| github_repos | PostgreSQL + SQLite | Open-source repositories |
| agnews | PostgreSQL | News article classification |
| stockindex | SQLite + DuckDB | Global stock index data |
| stockmarket | SQLite + DuckDB | Individual stock price history |

Architecture

User Query
    │
    ▼
OracleForgeAgent (agent/main.py)
    │
    ├─ Loads three-layer context:
    │   Layer 1: agent/AGENT.md       (tools, join keys, dialect rules)
    │   Layer 2: kb/domain/dab_*.md   (dataset-specific schemas)
    │   Layer 3: kb/corrections/      (self-learned failure patterns)
    │
    ▼
Conductor (agent/conductor.py) — LangGraph orchestrator
    │
    ├─ Decomposes query into sub-tasks per database
    ├─ Routes each sub-task to the correct sub-agent:
    │   ├─ postgres_agent.py    → PostgreSQL (5 databases)
    │   ├─ mongo_agent.py       → MongoDB (3 collections)
    │   ├─ sqlite_agent.py      → SQLite (12 databases)
    │   └─ duckdb_agent.py      → DuckDB (9 databases)
    │
    ▼
MCP Server (mcp/mcp_server.py) — http://127.0.0.1:5000
    │
    ├─ 29 database tools across 4 DB types
    └─ Unified POST /v1/tools/{tool_name} interface
    
Self-Correction Loop
    ├─ agent/self_correction/failure_types.py  — failure taxonomy
    ├─ agent/self_correction/recovery_router.py — targeted fix strategies
    └─ utils/autodream.py — consolidates corrections → KB updates
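Every sub-agent reaches its database through the MCP server's unified `POST /v1/tools/{tool_name}` endpoint. A minimal client sketch, assuming the request body is of the form `{"arguments": ...}` (the actual contract lives in `mcp/mcp_server.py`):

```python
import json
import urllib.request

MCP_URL = "http://127.0.0.1:5000"  # matches the MCP_URL value in .env


def tool_url(tool_name: str) -> str:
    """Build the unified tool endpoint URL for a given tool name."""
    return f"{MCP_URL}/v1/tools/{tool_name}"


def call_tool(tool_name: str, arguments: dict) -> dict:
    """POST a tool invocation to the MCP server and return the JSON reply.

    The body shape ({"arguments": ...}) is an assumption; check
    mcp/mcp_server.py for the real request/response contract.
    """
    req = urllib.request.Request(
        tool_url(tool_name),
        data=json.dumps({"arguments": arguments}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The tool name in the URL path is the only routing key, so adding a 30th tool requires no client changes.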

Setup Instructions

Prerequisites

  • Python 3.11+
  • Access to the shared team server (via Tailscale)
  • OpenRouter API key (the agent calls Claude Sonnet 4.6 through OpenRouter)
  • DataAgentBench repository cloned at /home/project/oracle-forge/DataAgentBench

1. Clone the repository

git clone https://github.com/derejederib/oracle-forge.git
cd oracle-forge

2. Install dependencies

pip install -r agent/requirements.txt

3. Configure environment

cp .env.example .env
# Edit .env and fill in your values:
#   OPENROUTER_API_KEY=sk-or-...
#   MCP_URL=http://127.0.0.1:5000
#   DAB_PATH=/home/project/oracle-forge/DataAgentBench
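A quick preflight check can catch a missing variable before the agent starts. This sketch validates the three variables shown in the `.env` example above; the variable names come from that example, not from the agent's own config loader:

```python
import os

# The three variables from the .env example above.
REQUIRED_VARS = ("OPENROUTER_API_KEY", "MCP_URL", "DAB_PATH")


def missing_env_vars(env=os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing required env vars: {', '.join(missing)}")
```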

4. Start the MCP server

The MCP server must be running before the agent or evaluation harness can execute queries.

python -m mcp.mcp_server

Verify it is healthy:

curl http://127.0.0.1:5000/health

5. Run a single query (smoke test)

python -m eval.harness --datasets yelp --query_ids 1 --n_trials 1

6. Run the full benchmark

python -m eval.harness --datasets yelp agnews bookreview --n_trials 5

7. Run the regression suite

python -m eval.regression_suite \
  --baseline eval/results/baseline_run.json \
  --current  eval/results/latest.json
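The actual comparison logic lives in `eval/regression_suite.py`; the core idea can be sketched as follows, assuming each results file maps a query id to an entry with a boolean `"passed"` field (the real schema is defined in `eval/trace_schema.py`):

```python
def find_regressions(baseline: dict, current: dict) -> list:
    """Return query ids that passed in the baseline but fail in the current run.

    Assumes results map query id -> {"passed": bool}; this field name is an
    assumption, not the confirmed trace schema.
    """
    return sorted(
        qid
        for qid, entry in baseline.items()
        if entry.get("passed") and not current.get(qid, {}).get("passed")
    )
```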

Live Agent

The agent is deployed on the shared team server (ip-10-0-14-163). All team members connect via Tailscale using the alias trp-gemini.

# Connect to the server (Tailscale required):
ssh trp-gemini

A persistent tmux session named oracle-forge runs on the server. The MCP server (python -m mcp.mcp_server) runs inside this session.

# Driver — full control:
tmux attach -t oracle-forge

# Corpus / Intelligence Officers — read-only view:
tmux attach -t oracle-forge -r

Run a benchmark query from inside the session:

cd /home/project/oracle-forge
source venv/bin/activate
python -m eval.harness --datasets yelp --query_ids 1 --n_trials 1

Verify the MCP server is healthy before running:

curl http://127.0.0.1:5000/health

Knowledge Base

Three-layer context engineering:

| Layer | Location | Purpose |
| --- | --- | --- |
| Layer 1 | agent/AGENT.md | MCP tools, join keys, dialect rules — always loaded |
| Layer 2 | kb/domain/dab_*.md | Dataset-specific schemas, field types, query patterns |
| Layer 3 | kb/corrections/corrections_log.md | Self-learned failure corrections from prior runs |
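At prompt-build time the agent stitches these layers together. A minimal sketch of that assembly, using the file locations from the table; the Layer 1 → 2 → 3 concatenation order is an assumption about how `agent/main.py` builds its context:

```python
from pathlib import Path


def build_context(repo_root: Path, dataset: str) -> str:
    """Concatenate the three KB layers into one prompt-context string.

    Layer 3 may not exist yet on a fresh run, so missing files are skipped.
    """
    layers = [
        repo_root / "agent" / "AGENT.md",                        # Layer 1: always loaded
        repo_root / "kb" / "domain" / f"dab_{dataset}.md",       # Layer 2: dataset schema
        repo_root / "kb" / "corrections" / "corrections_log.md", # Layer 3: corrections
    ]
    return "\n\n".join(p.read_text() for p in layers if p.exists())
```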

KB documents are injection-tested before merging. See kb/*/injection_tests/ for evidence.


Evaluation

# Score a single query
python -c "
from eval.scorer import score
from pathlib import Path
result = score('42', Path('/path/to/DataAgentBench/query_yelp/query1'))
print(result)
"

Results are saved to eval/results/benchmark_{run_id}.json and eval/results/latest.json.
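For a quick summary of a finished run, the results file can be reduced to a pass rate. This sketch assumes the JSON is a mapping of query id to an entry with a boolean `"passed"` field; the authoritative result types are in `eval/trace_schema.py`:

```python
import json
from pathlib import Path


def pass_rate(results_path: Path) -> float:
    """Fraction of passing queries in a benchmark results file.

    The {"passed": bool} entry shape is an assumption; consult
    eval/trace_schema.py for the real structure.
    """
    results = json.loads(results_path.read_text())
    if not results:
        return 0.0
    passed = sum(1 for entry in results.values() if entry.get("passed"))
    return passed / len(results)
```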


Project Structure

oracle-forge/
├── agent/                  # Core agent logic
│   ├── AGENT.md            # Layer 1 KB — tools, join keys, dialect rules
│   ├── conductor.py        # LangGraph orchestrator
│   ├── main.py             # OracleForgeAgent entry point
│   ├── claude_adapter.py   # OpenRouter → Claude integration
│   ├── sub_agents/         # DB-type specialists
│   └── self_correction/    # Failure taxonomy + recovery router
├── mcp/                    # MCP server + 29 database tools
├── eval/                   # Evaluation harness
│   ├── harness.py          # Benchmark runner
│   ├── scorer.py           # DAB validate.py integration
│   ├── trace_schema.py     # Structured result types
│   ├── regression_suite.py # Regression testing
│   └── results/            # JSON result files
├── kb/                     # Knowledge base (3 layers)
│   ├── architecture/       # KB v1 — system architecture docs
│   ├── domain/             # KB v2 — 12 dataset schemas
│   ├── evaluation/         # Harness design, scoring methodology
│   └── corrections/        # KB v3 — self-learned failure patterns
├── utils/                  # Shared utility library
│   ├── schema_introspector.py
│   ├── autodream.py
│   ├── contract_validator.py
│   ├── entity_resolver.py
│   ├── multi_pass_retrieval.py
│   └── tests/
├── probes/                 # Adversarial probe library
│   └── probes.md           # 15+ probes across 3+ failure categories
├── planning/               # AI-DLC Inception documents
├── signal/                 # Signal Corps engagement portfolio
└── docs/                   # Challenge PDFs and reference material

About

Context-layered, self-correcting AI data agent that answers complex business questions across PostgreSQL, MongoDB, DuckDB, and SQLite — competing on the UC Berkeley DataAgentBench leaderboard.
