A local CLI tool (oda) that lets developers, data analysts, data scientists, and AI agents query databases safely — with auto-generated schema context, read-only enforcement, and self-healing diagnostics.
Open Data Agent (oda) is a local command-line tool that bridges the gap between natural language questions and SQL databases. It is designed to be used by:
- Data analysts and data scientists doing exploratory data analysis (EDA) — query any connected database without memorising schema details
- Developers running ad-hoc SQL safely, with automatic LIMIT injection and query history
- AI coding agents (primarily OpenCode) — oda acts as a structured, read-only database interface that an agent can call as a tool, using auto-generated schema documentation as context
Key properties:
- Read-only by default — write operations are hard-blocked with no bypass mechanism
- Schema docs as context — human- and agent-readable markdown catalog generated from your live schema
- Self-healing diagnostics — zero-row results and errors emit structured diagnostic context (row counts, sample values, NULL counts) so an AI agent can self-correct and retry
- Local-first — no server, no network service, all data on disk
- Multi-dialect — supports PostgreSQL, MySQL, and SQLite
```text
Question (natural language or SQL)
│
├─ [AI agent] reads .opencode/rules/data-agent.md   ← generated by `oda connect`
├─ [AI agent] reads docs/data-catalog/_index.md     ← generated by `oda docs generate`
├─ [AI agent] navigates to relevant table docs
├─ [AI agent] checks memory/ for known data quirks
│
└─ oda query "SELECT ..."
   │
   ├─ SafetyChecker     read-only whitelist, dangerous pattern detection
   ├─ LIMIT injection   auto-appends or clamps LIMIT (default 1000, max 10000)
   ├─ Timeout           server-side (PostgreSQL/MySQL) or thread-based (SQLite)
   ├─ Execution         results printed to stdout
   └─ History           log appended to ~/.config/open-data-agent/history.jsonl
      │
      └─ (zero rows or error)
         └─ DiagnosticEngine → structured context to stderr → agent retries
```
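The history log at the bottom of the pipeline is plain JSONL, one record per query. The sketch below shows the idea; the field names (`ts`, `sql`, `row_count`, `duration_ms`) are illustrative assumptions, not oda's actual schema:

```python
import json
import tempfile
import time
from pathlib import Path

def log_query(history_path: Path, sql: str, rows: int, ms: float) -> None:
    """Append one query record as a single JSON line (JSONL)."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # UTC timestamp
        "sql": sql,
        "row_count": rows,
        "duration_ms": ms,
    }
    with history_path.open("a") as f:
        f.write(json.dumps(entry) + "\n")

# Demo against a throwaway file rather than the real config directory
path = Path(tempfile.mkdtemp()) / "history.jsonl"
log_query(path, "SELECT * FROM orders LIMIT 10", rows=10, ms=4.2)
```

Append-only JSONL keeps the log crash-safe and greppable: each line parses independently, which is what makes commands like `oda history search` cheap to implement.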
1. Schema Catalog
oda docs generate introspects your connected database and produces a hierarchical markdown catalog under docs/data-catalog/. Each table gets its own file with columns, types, nullability, sample rows, and (optionally) column statistics. This catalog is the primary context source — neither you nor the agent needs to run live introspection queries.
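For illustration, a generated per-table file could look something like this (the layout is a sketch and the table is hypothetical; oda's actual output format may differ):

```markdown
# orders

| Column     | Type      | Nullable | Sample values       |
|------------|-----------|----------|---------------------|
| id         | integer   | no       | 1, 2, 3             |
| status     | text      | yes      | 'active', 'pending' |
| created_at | timestamp | no       | 2025-01-03 14:02:11 |
```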
2. Memory Store
memory/ contains curated markdown files (with YAML frontmatter) that capture data quirks, business logic, known anomalies, and tribal knowledge about your data. The agent checks memory before constructing a query.
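A memory entry might look roughly like this — `title` and `category` mirror the flags of `oda memory add`, while the remaining frontmatter fields are illustrative assumptions:

```markdown
---
title: Revenue column
category: data_quality
created: 2025-06-01
---
Use `net_item_price`, not `item_price`; `item_price` includes tax.
```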
3. Query Engine
Every query passes through a safety pipeline:
- Whitelist check (primary guard) — only explicitly allowed prefixes pass:
`SELECT`, `WITH`, `EXPLAIN`, and dialect-specific commands (`PRAGMA` for SQLite; `SHOW` for PostgreSQL and MySQL; `TABLE` for MySQL). Any token not on the list is blocked outright.
- Blacklist check (secondary guard) — catches injection patterns inside otherwise-valid SQL: `INSERT`, `UPDATE`, `DELETE`, `DROP`, `CREATE`, `ALTER`, `TRUNCATE`, `REPLACE`, `MERGE`, `GRANT`, `REVOKE`, `CALL`, `EXEC`, `EXECUTE`, and dialect-specific patterns such as `COPY`, `LOAD DATA`, and `ATTACH DATABASE`
- LIMIT injection — if no LIMIT is present, one is appended automatically; if the limit exceeds the configured maximum, it is clamped
- Timeout — queries are killed after a configurable timeout (default 30 seconds)
- History — every query is appended to a local JSONL log with timing and metadata
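The whitelist, blacklist, and LIMIT steps above can be sketched in a few lines of Python. This is a simplified illustration, not oda's actual SafetyChecker; the function name, constants, and regexes are assumptions:

```python
import re

ALLOWED_PREFIXES = ("SELECT", "WITH", "EXPLAIN", "PRAGMA", "SHOW", "TABLE")
# Coarse keyword blacklist; a sketch like this would also flag harmless uses
# such as the REPLACE() string function, which a real checker must tolerate.
BLOCKED_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE|REPLACE|MERGE|"
    r"GRANT|REVOKE|CALL|EXEC|EXECUTE)\b",
    re.IGNORECASE,
)
DEFAULT_LIMIT, MAX_LIMIT = 1000, 10000

def check_and_rewrite(sql: str) -> str:
    """Run one query through whitelist, blacklist, and LIMIT injection."""
    stripped = sql.strip().rstrip(";")
    # Whitelist (primary guard): statement must start with an allowed prefix.
    if not stripped.upper().startswith(ALLOWED_PREFIXES):
        raise ValueError("blocked: statement type not on read-only whitelist")
    # Blacklist (secondary guard): write keywords inside otherwise-valid SQL.
    if BLOCKED_KEYWORDS.search(stripped):
        raise ValueError("blocked: dangerous keyword detected")
    # LIMIT injection: append a default LIMIT, or clamp an oversized one.
    m = re.search(r"\bLIMIT\s+(\d+)\s*$", stripped, re.IGNORECASE)
    if m is None:
        return f"{stripped} LIMIT {DEFAULT_LIMIT}"
    if int(m.group(1)) > MAX_LIMIT:
        return stripped[:m.start()] + f"LIMIT {MAX_LIMIT}"
    return stripped
```

For example, `check_and_rewrite("SELECT * FROM orders")` would append `LIMIT 1000`, while a `LIMIT 50000` would be clamped to `LIMIT 10000`.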
4. Diagnostics
When a query returns zero rows or errors, DiagnosticEngine automatically emits structured context to stderr: table row counts, sample column values, and NULL counts for relevant columns. This gives an AI agent (or a developer) enough signal to self-correct and retry without running additional queries manually.
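The kind of context the engine gathers can be sketched against SQLite as follows (an illustration only; the real DiagnosticEngine's queries and output format may differ, and the `orders` table here is made up):

```python
import json
import sqlite3
import sys

def diagnose(conn, table, columns):
    """Collect row count, sample values, and NULL counts for a table,
    giving an agent enough signal to correct a zero-row query."""
    cur = conn.cursor()
    diag = {
        "table": table,
        "row_count": cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0],
        "columns": {},
    }
    for col in columns:
        samples = [r[0] for r in cur.execute(
            f"SELECT DISTINCT {col} FROM {table} LIMIT 5")]
        nulls = cur.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL").fetchone()[0]
        diag["columns"][col] = {"sample_values": samples, "null_count": nulls}
    return diag

conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE orders (id INTEGER, status TEXT);"
    "INSERT INTO orders VALUES (1,'active'),(2,'pending'),(3,NULL);")
# A zero-row filter such as status = 'Active' would trigger this context dump:
print(json.dumps(diagnose(conn, "orders", ["status"])), file=sys.stderr)
```

Seeing that the column actually holds `'active'` (lowercase) is usually all an agent needs to fix a case-sensitive filter and retry.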
oda is designed as a first-class tool for the OpenCode AI coding agent. When you run oda connect <name>, it renders a rules file at .opencode/rules/data-agent.md that tells OpenCode:
- Which database is active and how to explore it
- How to navigate the schema catalog
- How to call `oda query`, `oda memory`, and `oda docs` commands
- How to interpret diagnostic output and self-heal on failures
This means no extra LLM setup is required for natural language queries — OpenCode reads your schema docs, understands your data, constructs the SQL, and calls oda query as a tool. The full NL→SQL→results loop works out of the box.
See Asking Questions in Natural Language with OpenCode for a step-by-step walkthrough.
- Python 3.12+
- `uv` — install with `curl -LsSf https://astral.sh/uv/install.sh | sh`
```shell
git clone <repository-url>
cd open-data-agent
uv sync
```

```shell
uv run oda init
```

Creates ~/.config/open-data-agent/ with default config files.
```shell
uv run oda connections add
```

Prompts for connection name, database type (postgresql / mysql / sqlite), host, port, database name, username, and password. Passwords are stored in the OS keychain where available (macOS Keychain, GNOME Keyring, Windows Credential Manager). On headless or CI environments without a keychain backend, passwords fall back to plaintext in ~/.config/open-data-agent/connections.yaml with a warning.
```shell
uv run oda connect <name>
```

Sets the active connection and renders .opencode/rules/data-agent.md for OpenCode.

Note: `oda connect` writes the rules file relative to the current working directory. Always run this command from your project root; otherwise the file will be created in the wrong location.
```shell
uv run oda docs generate

# Include column statistics (null counts, distinct counts, min/max):
uv run oda docs generate --enrich
```

```shell
uv run oda query "SELECT * FROM orders LIMIT 10"

# Output as JSON or CSV:
uv run oda query "SELECT * FROM orders" --format json
uv run oda query "SELECT * FROM orders" --format csv
```

| Command | Description |
|---|---|
| `oda init` | First-run setup: create config directories and default files |
| `oda connect <name>` | Activate a connection; render OpenCode rules file |
| `oda connections list` | List all configured connections |
| `oda connections add` | Add a new connection (interactive) |
| `oda connections remove <name>` | Remove a connection |
| `oda connections test <name>` | Test live connectivity |
| `oda schemas` | List schemas in the active database |
| `oda tables [<schema>]` | List tables (optional schema filter) |
| `oda describe <table>` | Show columns and types |
| `oda sample <table> [--n N]` | Show N sample rows (default 5) |
| `oda profile <table>` | Column statistics: null count, distinct, min, max |
| `oda query "<sql>"` | Execute SQL; auto-logged to history |
| `oda docs generate [--enrich]` | Generate schema documentation catalog |
| `oda docs status` | Show freshness report for schema docs |
| `oda memory list` | List memory entries |
| `oda memory add` | Add a memory entry |
| `oda memory search <term>` | Search memory entries |
| `oda history list [--n N]` | Show most recent N query history entries |
| `oda history search <term>` | Search query history |
| `oda history stats` | Query history statistics |
Global config lives at ~/.config/open-data-agent/config.yaml:
```yaml
row_limit: 1000              # default LIMIT auto-injected if absent
max_row_limit: 10000         # hard ceiling; never exceeded
query_timeout_seconds: 30    # query execution timeout
docs_staleness_days: 7       # warn if schema docs older than this
log_level: INFO
strict_mode: false           # if true: block queries when docs are stale;
                             # equivalent to passing --strict on every oda query call
```

A ready-to-run guide using the Northwind database is available at docs/testing-with-northwind.md. It covers spinning up a local PostgreSQL container, registering the connection, generating schema docs, and running sample queries that exercise every major oda feature.
```shell
# One command to get started:
bash scripts/setup-northwind.sh
```

```shell
# Unit tests (no external database required)
uv run pytest tests/unit/ -q

# Lint and format
uv run ruff check .
uv run ruff format --check .

# Type check
uv run mypy src/open_data_agent

# Integration tests (requires Docker)
docker compose up -d
uv run pytest -m integration
docker compose down
```

OpenCode is an AI coding agent that runs in your terminal. When paired with oda, it acts as a natural language interface to your database — you ask a question in plain English, OpenCode reads your schema docs, writes the SQL, and calls oda query to execute it. No extra LLM configuration or API keys are needed beyond your OpenCode setup.
- OpenCode installed and configured (`opencode` available in your PATH)
- `oda` set up with a connection and schema docs generated (see How to Use It)
```shell
# Run from your project root — oda connect writes .opencode/rules/data-agent.md
# relative to the current working directory
uv run oda connect my-db

# Generate the schema catalog OpenCode will use as context
uv run oda docs generate
```

After this, docs/data-catalog/ contains a markdown file for every table in your database. OpenCode reads these files to understand your schema without running any live introspection queries.
```shell
opencode
```

OpenCode automatically loads .opencode/rules/data-agent.md on startup. This file tells it which database is active, how to navigate the schema catalog, and how to use oda commands.
Type your question directly in the OpenCode chat. Examples:
How many orders were placed last month, broken down by status?
Which customers have spent more than $10,000 in total?
Show me the top 10 products by revenue in Q1 2025.
OpenCode will:
- Read the relevant table docs from `docs/data-catalog/`
- Check `memory/` for any known data quirks affecting those tables
- Construct a safe, read-only SQL query
- Call `oda query "..."` to execute it
- Present the results — and if zero rows are returned, use the diagnostic output to self-correct and retry
You can ask follow-up questions in the same session:
Filter that to just the 'enterprise' customer segment.
Now group by region instead of status.
Export that as CSV.
OpenCode maintains conversation context, so follow-ups build on the previous query.
When you or OpenCode discovers something important about your data (a quirky column, a misleading field name, a known data quality issue), save it so future sessions benefit:
```shell
uv run oda memory add --title "Revenue column" \
  --category data_quality \
  --content "Use net_item_price not item_price — item_price includes tax"
```

Or ask OpenCode to do it:
Remember that the revenue column to use is net_item_price, not item_price.
If a query returns zero rows, oda query automatically emits structured diagnostics to stderr:
- Row counts for each table referenced in the SQL
- Sample values for filter columns (e.g. `Column 'status' sample values: ['active', 'pending']`)
- NULL counts for filter columns
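For instance, a filter like `status = 'Active'` that matches nothing might produce diagnostics along these lines (illustrative output, not a verbatim transcript of oda's actual format):

```text
[diagnostic] Query returned 0 rows
[diagnostic] Table 'orders': 1204 rows
[diagnostic] Column 'status' sample values: ['active', 'pending', 'shipped']
[diagnostic] Column 'status' NULL count: 0
```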
OpenCode reads this output and retries with a corrected query, without any input from you.
If your database schema changes, regenerate the catalog before asking questions:
```shell
uv run oda docs generate

# Check freshness at any time:
uv run oda docs status
```

Use --strict mode to block queries against stale docs:

```shell
uv run oda query "SELECT ..." --strict
```

MIT License
Copyright (c) 2026 Adoit
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.