Oracle Forge is a production-grade AI data agent that answers natural language questions across heterogeneous enterprise databases. Built for the TRP1 DataAgentBench (DAB) challenge, it handles four database types (PostgreSQL, MongoDB, SQLite, DuckDB) through a unified orchestration layer.
| Name | Role |
|---|---|
| Dereje Derib | Driver |
| Eyoel Nebiyu | Driver |
| Nuhamin Alemayehu | Corpus |
| Rafia Kedir | Corpus |
| Chalie Lijalem | Intelligence Officer |
| Liul Teshome | Intelligence Officer |
Oracle Forge supports all 12 DAB datasets:
| Dataset | DB Types | Domain |
|---|---|---|
| yelp | MongoDB + DuckDB | Business reviews |
| googlelocal | PostgreSQL + SQLite | Local business listings |
| bookreview | PostgreSQL + SQLite | Book reviews |
| music_brainz | SQLite + DuckDB | Music catalog and sales |
| crmarenapro | PostgreSQL + SQLite | CRM / support cases |
| pancancer | PostgreSQL + DuckDB | Cancer genomics |
| patents | SQLite + PostgreSQL | Patent publications |
| deps_dev | SQLite | Package dependency metadata |
| github_repos | PostgreSQL + SQLite | Open-source repositories |
| agnews | PostgreSQL | News article classification |
| stockindex | SQLite + DuckDB | Global stock index data |
| stockmarket | SQLite + DuckDB | Individual stock price history |
User Query
│
▼
OracleForgeAgent (agent/main.py)
│
├─ Loads three-layer context:
│ Layer 1: agent/AGENT.md (tools, join keys, dialect rules)
│ Layer 2: kb/domain/dab_*.md (dataset-specific schemas)
│ Layer 3: kb/corrections/ (self-learned failure patterns)
│
▼
Conductor (agent/conductor.py) — LangGraph orchestrator
│
├─ Decomposes query into sub-tasks per database
├─ Routes each sub-task to the correct sub-agent:
│ ├─ postgres_agent.py → PostgreSQL (5 databases)
│ ├─ mongo_agent.py → MongoDB (3 collections)
│ ├─ sqlite_agent.py → SQLite (12 databases)
│ └─ duckdb_agent.py → DuckDB (9 databases)
│
▼
MCP Server (mcp/mcp_server.py) — http://127.0.0.1:5000
│
├─ 29 database tools across 4 DB types
└─ Unified POST /v1/tools/{tool_name} interface
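Because every tool sits behind the same POST interface, the sub-agents can share one thin HTTP client. A minimal sketch using only the standard library; the tool name and payload fields are illustrative assumptions, not taken from mcp/mcp_server.py:

```python
# Hypothetical client for the unified MCP tool interface.
# "sqlite_query" and the payload shape are illustrative, not confirmed tool names.
import json
import urllib.request

MCP_URL = "http://127.0.0.1:5000"

def tool_endpoint(base_url: str, tool_name: str) -> str:
    """Build the POST /v1/tools/{tool_name} URL."""
    return f"{base_url}/v1/tools/{tool_name}"

def call_tool(tool_name: str, payload: dict, base_url: str = MCP_URL) -> dict:
    """POST a JSON payload to one of the database tools and decode the reply."""
    req = urllib.request.Request(
        tool_endpoint(base_url, tool_name),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires the MCP server to be running):
# call_tool("sqlite_query", {"database": "yelp", "sql": "SELECT 1"})
```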
Self-Correction Loop
├─ agent/self_correction/failure_types.py — failure taxonomy
├─ agent/self_correction/recovery_router.py — targeted fix strategies
└─ utils/autodream.py — consolidates corrections → KB updates
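The loop above reduces to a classify-then-route pair. A simplified sketch; the failure names and recovery strategies here are illustrative placeholders, not the actual taxonomy in agent/self_correction/failure_types.py:

```python
# Illustrative self-correction sketch; the real taxonomy and router live in
# agent/self_correction/. All names below are hypothetical.
from enum import Enum

class FailureType(Enum):
    SYNTAX_ERROR = "syntax_error"
    WRONG_SCHEMA = "wrong_schema"
    EMPTY_RESULT = "empty_result"

def classify(error_message: str) -> FailureType:
    """Map a raw database error message onto the failure taxonomy (simplified)."""
    msg = error_message.lower()
    if "syntax" in msg:
        return FailureType.SYNTAX_ERROR
    if "no such table" in msg or "column" in msg:
        return FailureType.WRONG_SCHEMA
    return FailureType.EMPTY_RESULT

def route_recovery(failure: FailureType) -> str:
    """Pick a targeted fix strategy per failure type (simplified)."""
    return {
        FailureType.SYNTAX_ERROR: "rewrite query against dialect rules in AGENT.md",
        FailureType.WRONG_SCHEMA: "re-introspect schema, then regenerate the query",
        FailureType.EMPTY_RESULT: "relax filters and retry with entity resolution",
    }[failure]
```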
- Python 3.11+
- Access to the shared team server (via Tailscale)
- OpenRouter API key (used to access Claude Sonnet 4.6)
- DataAgentBench repository cloned at
/home/project/oracle-forge/DataAgentBench
git clone https://github.com/derejederib/oracle-forge.git
cd oracle-forge
pip install -r agent/requirements.txt
cp .env.example .env
# Edit .env and fill in your values:
# OPENROUTER_API_KEY=sk-or-...
# MCP_URL=http://127.0.0.1:5000
# DAB_PATH=/home/project/oracle-forge/DataAgentBench

The MCP server must be running before the agent or evaluation harness can execute queries.
python -m mcp.mcp_server

Verify it is healthy:
curl http://127.0.0.1:5000/health

Run a single benchmark query:

python -m eval.harness --datasets yelp --query_ids 1 --n_trials 1

Run multiple datasets with repeated trials:

python -m eval.harness --datasets yelp agnews bookreview --n_trials 5

Compare a run against a baseline:

python -m eval.regression_suite \
    --baseline eval/results/baseline_run.json \
    --current eval/results/latest.json

The agent is deployed on the shared team server (ip-10-0-14-163). All team members connect via Tailscale using the alias trp-gemini.
# Connect to the server (Tailscale required):
ssh trp-gemini

A persistent tmux session named oracle-forge runs on the server. The MCP server (python -m mcp.mcp_server) runs inside this session.
# Driver — full control:
tmux attach -t oracle-forge
# Corpus / Intelligence Officers — read-only view:
tmux attach -t oracle-forge -r

Run a benchmark query from inside the session:
cd /home/project/oracle-forge
source venv/bin/activate
python -m eval.harness --datasets yelp --query_ids 1 --n_trials 1

Verify the MCP server is healthy before running:
curl http://127.0.0.1:5000/health

Three-layer context engineering:
| Layer | Location | Purpose |
|---|---|---|
| Layer 1 | agent/AGENT.md | MCP tools, join keys, dialect rules — always loaded |
| Layer 2 | kb/domain/dab_*.md | Dataset-specific schemas, field types, query patterns |
| Layer 3 | kb/corrections/corrections_log.md | Self-learned failure corrections from prior runs |
KB documents are injection-tested before merging. See kb/*/injection_tests/ for evidence.
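Assembling the three layers amounts to concatenating KB files in priority order. A minimal sketch, assuming plain-Markdown KB files at the paths listed above; the actual loader in agent/main.py may differ:

```python
# Hypothetical three-layer context loader; the real implementation is in
# agent/main.py and may assemble context differently.
from pathlib import Path

def load_context(repo_root: Path, dataset: str) -> str:
    """Concatenate the three KB layers into one system-prompt string."""
    layers = [
        repo_root / "agent" / "AGENT.md",                         # Layer 1: always loaded
        repo_root / "kb" / "domain" / f"dab_{dataset}.md",        # Layer 2: per-dataset schema
        repo_root / "kb" / "corrections" / "corrections_log.md",  # Layer 3: learned corrections
    ]
    # Skip missing layers (e.g. no corrections yet on a fresh clone).
    return "\n\n".join(p.read_text() for p in layers if p.exists())
```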
# Score a single query
python -c "
from eval.scorer import score
from pathlib import Path
result = score('42', Path('/path/to/DataAgentBench/query_yelp/query1'))
print(result)
"Results are saved to eval/results/benchmark_{run_id}.json and eval/results/latest.json.
oracle-forge/
├── agent/ # Core agent logic
│ ├── AGENT.md # Layer 1 KB — tools, join keys, dialect rules
│ ├── conductor.py # LangGraph orchestrator
│ ├── main.py # OracleForgeAgent entry point
│ ├── claude_adapter.py # OpenRouter → Claude integration
│ ├── sub_agents/ # DB-type specialists
│ └── self_correction/ # Failure taxonomy + recovery router
├── mcp/ # MCP server + 29 database tools
├── eval/ # Evaluation harness
│ ├── harness.py # Benchmark runner
│ ├── scorer.py # DAB validate.py integration
│ ├── trace_schema.py # Structured result types
│ ├── regression_suite.py # Regression testing
│ └── results/ # JSON result files
├── kb/ # Knowledge base (3 layers)
│ ├── architecture/ # KB v1 — system architecture docs
│ ├── domain/ # KB v2 — 12 dataset schemas
│ ├── evaluation/ # Harness design, scoring methodology
│ └── corrections/ # KB v3 — self-learned failure patterns
├── utils/ # Shared utility library
│ ├── schema_introspector.py
│ ├── autodream.py
│ ├── contract_validator.py
│ ├── entity_resolver.py
│ ├── multi_pass_retrieval.py
│ └── tests/
├── probes/ # Adversarial probe library
│ └── probes.md # 15+ probes across 3+ failure categories
├── planning/ # AI-DLC Inception documents
├── signal/ # Signal Corps engagement portfolio
└── docs/ # Challenge PDFs and reference material