Oracle Forge is a production-grade AI data agent that answers natural language questions across heterogeneous enterprise databases. Built for the TRP1 DataAgentBench (DAB) challenge, it handles four database types (PostgreSQL, MongoDB, SQLite, DuckDB) through a unified orchestration layer.
| Name | Role |
|---|---|
| Dereje Derib | Driver |
| Eyoel Nebiyu | Driver |
| Nuhamin Alemayehu | Corpus |
| Rafia Kedir | Corpus |
| Chalie Lijalem | Intelligence Officer |
| Liul Teshome | Intelligence Officer |
Oracle Forge supports all 12 DAB datasets:
| Dataset | DB Types | Domain |
|---|---|---|
| yelp | MongoDB + DuckDB | Business reviews |
| googlelocal | PostgreSQL + SQLite | Local business listings |
| bookreview | PostgreSQL + SQLite | Book reviews |
| music_brainz | SQLite + DuckDB | Music catalog and sales |
| crmarenapro | PostgreSQL + SQLite | CRM / support cases |
| pancancer | PostgreSQL + DuckDB | Cancer genomics |
| patents | SQLite + PostgreSQL | Patent publications |
| deps_dev | SQLite | Package dependency metadata |
| github_repos | PostgreSQL + SQLite | Open-source repositories |
| agnews | PostgreSQL | News article classification |
| stockindex | SQLite + DuckDB | Global stock index data |
| stockmarket | SQLite + DuckDB | Individual stock price history |
User Query
│
▼
OracleForgeAgent (agent/main.py)
│
├─ Loads three-layer context:
│ Layer 1: agent/AGENT.md (tools, join keys, dialect rules)
│ Layer 2: kb/domain/dab_*.md (dataset-specific schemas)
│ Layer 3: kb/corrections/ (self-learned failure patterns)
│
▼
Conductor (agent/conductor.py) — LangGraph orchestrator
│
├─ Decomposes query into sub-tasks per database
├─ Routes each sub-task to the correct sub-agent:
│ ├─ postgres_agent.py → PostgreSQL (5 databases)
│ ├─ mongo_agent.py → MongoDB (3 collections)
│ ├─ sqlite_agent.py → SQLite (12 databases)
│ └─ duckdb_agent.py → DuckDB (9 databases)
│
▼
MCP Server (mcp/mcp_server.py) — http://127.0.0.1:5000
│
├─ 29 database tools across 4 DB types
└─ Unified POST /v1/tools/{tool_name} interface
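Because every tool sits behind the same POST interface, the sub-agents can share one thin HTTP client. A minimal sketch using only the standard library; the tool name and payload fields are illustrative assumptions, not taken from mcp/mcp_server.py:

```python
# Hypothetical client for the unified MCP tool interface.
# "sqlite_query" and the payload shape are illustrative, not confirmed tool names.
import json
import urllib.request

MCP_URL = "http://127.0.0.1:5000"

def tool_endpoint(base_url: str, tool_name: str) -> str:
    """Build the POST /v1/tools/{tool_name} URL."""
    return f"{base_url}/v1/tools/{tool_name}"

def call_tool(tool_name: str, payload: dict, base_url: str = MCP_URL) -> dict:
    """POST a JSON payload to one of the database tools and decode the reply."""
    req = urllib.request.Request(
        tool_endpoint(base_url, tool_name),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires the MCP server to be running):
# call_tool("sqlite_query", {"database": "yelp", "sql": "SELECT 1"})
```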
Self-Correction Loop
├─ agent/self_correction/failure_types.py — failure taxonomy
├─ agent/self_correction/recovery_router.py — targeted fix strategies
└─ utils/autodream.py — consolidates corrections → KB updates
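The loop above reduces to a classify-then-route pair. A simplified sketch; the failure names and recovery strategies here are illustrative placeholders, not the actual taxonomy in agent/self_correction/failure_types.py:

```python
# Illustrative self-correction sketch; the real taxonomy and router live in
# agent/self_correction/. All names below are hypothetical.
from enum import Enum

class FailureType(Enum):
    SYNTAX_ERROR = "syntax_error"
    WRONG_SCHEMA = "wrong_schema"
    EMPTY_RESULT = "empty_result"

def classify(error_message: str) -> FailureType:
    """Map a raw database error message onto the failure taxonomy (simplified)."""
    msg = error_message.lower()
    if "syntax" in msg:
        return FailureType.SYNTAX_ERROR
    if "no such table" in msg or "column" in msg:
        return FailureType.WRONG_SCHEMA
    return FailureType.EMPTY_RESULT

def route_recovery(failure: FailureType) -> str:
    """Pick a targeted fix strategy per failure type (simplified)."""
    return {
        FailureType.SYNTAX_ERROR: "rewrite query against dialect rules in AGENT.md",
        FailureType.WRONG_SCHEMA: "re-introspect schema, then regenerate the query",
        FailureType.EMPTY_RESULT: "relax filters and retry with entity resolution",
    }[failure]
```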
- Python 3.11+
- Access to the shared team server (via Tailscale)
- OpenRouter API key (used to access Claude Sonnet 4.6)
- DataAgentBench repository cloned at
/home/project/oracle-forge/DataAgentBench
git clone https://github.com/derejederib/oracle-forge.git
cd oracle-forge
pip install -r agent/requirements.txt
cp .env.example .env
# Edit .env and fill in your values:
# OPENROUTER_API_KEY=sk-or-...
# MCP_URL=http://127.0.0.1:5000
# DAB_PATH=/home/project/oracle-forge/DataAgentBench

The MCP server must be running before the agent or evaluation harness can execute queries.
python -m mcp.mcp_server

Verify it is healthy:
curl http://127.0.0.1:5000/health

Run a single benchmark query:

python -m eval.harness --datasets yelp --query_ids 1 --n_trials 1

Run multiple datasets with repeated trials:

python -m eval.harness --datasets yelp agnews bookreview --n_trials 5

Compare a run against a baseline:

python -m eval.regression_suite \
    --baseline eval/results/baseline_run.json \
    --current eval/results/latest.json

The agent is deployed on the shared team server (ip-10-0-14-163). All team members connect via Tailscale using the alias trp-gemini.
# Connect to the server (Tailscale required):
ssh trp-gemini

A persistent tmux session named oracle-forge runs on the server. The MCP server (python -m mcp.mcp_server) runs inside this session.
# Driver — full control:
tmux attach -t oracle-forge
# Corpus / Intelligence Officers — read-only view:
tmux attach -t oracle-forge -r

Run a benchmark query from inside the session:
cd /home/project/oracle-forge
source venv/bin/activate
python -m eval.harness --datasets yelp --query_ids 1 --n_trials 1

Verify the MCP server is healthy before running:
curl http://127.0.0.1:5000/health

Three-layer context engineering:
| Layer | Location | Purpose |
|---|---|---|
| Layer 1 | agent/AGENT.md | MCP tools, join keys, dialect rules — always loaded |
| Layer 2 | kb/domain/dab_*.md | Dataset-specific schemas, field types, query patterns |
| Layer 3 | kb/corrections/corrections_log.md | Self-learned failure corrections from prior runs |
KB documents are injection-tested before merging. See kb/*/injection_tests/ for evidence.
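Assembling the three layers amounts to concatenating KB files in priority order. A minimal sketch, assuming plain-Markdown KB files at the paths listed above; the actual loader in agent/main.py may differ:

```python
# Hypothetical three-layer context loader; the real implementation is in
# agent/main.py and may assemble context differently.
from pathlib import Path

def load_context(repo_root: Path, dataset: str) -> str:
    """Concatenate the three KB layers into one system-prompt string."""
    layers = [
        repo_root / "agent" / "AGENT.md",                         # Layer 1: always loaded
        repo_root / "kb" / "domain" / f"dab_{dataset}.md",        # Layer 2: per-dataset schema
        repo_root / "kb" / "corrections" / "corrections_log.md",  # Layer 3: learned corrections
    ]
    # Skip missing layers (e.g. no corrections yet on a fresh clone).
    return "\n\n".join(p.read_text() for p in layers if p.exists())
```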
# Score a single query
python -c "
from eval.scorer import score
from pathlib import Path
result = score('42', Path('/path/to/DataAgentBench/query_yelp/query1'))
print(result)
"Results are saved to eval/results/benchmark_{run_id}.json and eval/results/latest.json.
oracle-forge/
├── agent/ # Core agent logic
│ ├── AGENT.md # Layer 1 KB — tools, join keys, dialect rules
│ ├── conductor.py # LangGraph orchestrator
│ ├── main.py # OracleForgeAgent entry point
│ ├── claude_adapter.py # OpenRouter → Claude integration
│ ├── sub_agents/ # DB-type specialists
│ └── self_correction/ # Failure taxonomy + recovery router
├── mcp/ # MCP server + 29 database tools
├── eval/ # Evaluation harness
│ ├── harness.py # Benchmark runner
│ ├── scorer.py # DAB validate.py integration
│ ├── trace_schema.py # Structured result types
│ ├── regression_suite.py # Regression testing
│ └── results/ # JSON result files
├── kb/ # Knowledge base (3 layers)
│ ├── architecture/ # KB v1 — system architecture docs
│ ├── domain/ # KB v2 — 12 dataset schemas
│ ├── evaluation/ # Harness design, scoring methodology
│ └── corrections/ # KB v3 — self-learned failure patterns
├── utils/ # Shared utility library
│ ├── schema_introspector.py
│ ├── autodream.py
│ ├── contract_validator.py
│ ├── entity_resolver.py
│ ├── multi_pass_retrieval.py
│ └── tests/
├── probes/ # Adversarial probe library
│ └── probes.md # 15+ probes across 3+ failure categories
├── planning/ # AI-DLC Inception documents
├── signal/ # Signal Corps engagement portfolio
└── docs/ # Challenge PDFs and reference material