Dash is a self-learning data agent that grounds its answers in 6 layers of context and improves with every run.
Inspired by OpenAI's in-house data agent.
# Clone this repo
git clone https://github.com/agno-agi/dash.git && cd dash
# Add OPENAI_API_KEY to the .env file (or export OPENAI_API_KEY=sk-***)
cp example.env .env
# Start the application
docker compose up -d --build
# Load sample data and knowledge
docker exec -it dash-api python -m dash.scripts.load_data
docker exec -it dash-api python -m dash.scripts.load_knowledge

Confirm Dash is running by navigating to http://localhost:8000/docs.
- Open os.agno.com and log in
- Add OS → Local → http://localhost:8000
- Click "Connect"
Try it (sample F1 dataset):
- Who won the most F1 World Championships?
- How many races has Lewis Hamilton won?
- Compare Ferrari vs Mercedes points 2015-2020
Our goal is simple: ask a question in plain English, get a correct, meaningful answer. But raw LLMs writing SQL hit a wall fast:
- Schemas lack meaning.
- Types are misleading.
- Tribal knowledge is missing.
- No way to learn from mistakes.
- Results generally lack interpretation.
The root cause is missing context and missing memory.
Dash solves this with 6 layers of grounded context, a self-learning loop that improves with every query, and a focus on understanding your question to deliver insights you can act on.
| Layer | Purpose | Source |
|---|---|---|
| Table Usage | Schema, columns, relationships | knowledge/tables/*.json |
| Human Annotations | Metrics, definitions, and business rules | knowledge/business/*.json |
| Query Patterns | SQL that is known to work | knowledge/queries/*.sql |
| Institutional Knowledge | Docs, wikis, external references | MCP (optional) |
| Learnings | Error patterns and discovered fixes | Agno Learning Machine |
| Runtime Context | Live schema changes | introspect_schema tool |
The agent retrieves relevant context at query time via hybrid search, then generates SQL grounded in patterns that already work.
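Hybrid search can be sketched as a weighted blend of lexical and semantic similarity. The scoring below is a toy stand-in for illustration only; Dash's actual retrieval is handled by Agno's knowledge store, and the weighting, function names, and vectors here are assumptions:

```python
import math

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that also appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    """Blend lexical and semantic scores; alpha picks the balance."""
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)
```

A document that matches both lexically and semantically ranks highest, which is why hybrid search surfaces proven query patterns even when the user's wording differs from the stored SQL.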
Dash improves without retraining or fine-tuning. We call this GPU-poor continuous learning.
It learns through two complementary systems:
| System | Stores | How It Evolves |
|---|---|---|
| Knowledge | Validated queries and business context | Curated by you + dash |
| Learnings | Error patterns and fixes | Managed by Learning Machine automatically |
User Question
↓
Retrieve Knowledge + Learnings
↓
Reason about intent
↓
Generate grounded SQL
↓
Execute and interpret
↓
┌────┴────┐
↓ ↓
Success Error
↓ ↓
↓ Diagnose → Fix → Save Learning
↓ (never repeated)
↓
Return insight
↓
Optionally save as Knowledge
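The error branch of the loop above can be sketched in a few lines of Python. This is a simplified illustration, not Dash's implementation: the in-memory learning store, the table, and the `run_with_learning` helper are all hypothetical:

```python
import sqlite3

learnings: list[str] = []  # stands in for the Learning Machine's store

def run_with_learning(conn: sqlite3.Connection, sql: str, fixed_sql: str) -> list:
    """Execute SQL; on error, save a learning and retry with the fix."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as e:
        # Diagnose → Fix → Save Learning, so the mistake is never repeated
        learnings.append(f"{sql!r} failed: {e}; use {fixed_sql!r} instead")
        return conn.execute(fixed_sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (driver TEXT, position TEXT)")
conn.execute("INSERT INTO results VALUES ('Hamilton', '1'), ('Bottas', '2')")

rows = run_with_learning(
    conn,
    "SELECT driver FROM result WHERE position = '1'",   # wrong table name
    "SELECT driver FROM results WHERE position = '1'",  # corrected query
)
```

In Dash the "fix" comes from the agent's own diagnosis rather than a pre-supplied string, but the shape is the same: failures become stored learnings that are retrieved on the next run.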
Knowledge is curated—validated queries and business context you want the agent to build on.
Learnings is discovered—patterns the agent finds through trial and error. When a query fails because position is TEXT not INTEGER, the agent saves that gotcha. Next time, it knows.
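The `position`-as-TEXT gotcha is easy to reproduce. A standalone illustration using SQLite (the schema is hypothetical): ordering a TEXT column compares strings lexicographically, so '10' sorts before '2', and the saved fix is to cast before ordering:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (driver TEXT, position TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("Hamilton", "1"), ("Verstappen", "10"), ("Bottas", "2")],
)

# TEXT comparison is lexicographic: '1' < '10' < '2'
naive = [r[0] for r in conn.execute(
    "SELECT driver FROM results ORDER BY position")]

# The saved learning: cast to INTEGER before ordering
fixed = [r[0] for r in conn.execute(
    "SELECT driver FROM results ORDER BY CAST(position AS INTEGER)")]
```

Without the learning, `naive` puts Verstappen (position 10) ahead of Bottas (position 2); with the cast, `fixed` returns the true finishing order.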
Dash reasons about what makes an answer useful, not just technically correct.
Question: Who won the most races in 2019?
| Typical SQL Agent | Dash |
|---|---|
| Hamilton: 11 | Lewis Hamilton dominated 2019 with 11 wins out of 21 races, more than double Bottas's 4 wins. This performance secured his sixth world championship. |
railway login
./scripts/railway_up.sh

Load data and knowledge:

railway run python -m dash.scripts.load_data
railway run python -m dash.scripts.load_knowledge

View logs:

railway logs --service dash

Run commands in production:

railway run python -m dash # CLI mode

Redeploy after changes:

railway up --service dash -d

Open dashboard:

railway open

Dash works best when it understands how your organization talks about data.
knowledge/
├── tables/ # Table meaning and caveats
├── queries/ # Proven SQL patterns
└── business/ # Metrics and language
{
"table_name": "orders",
"table_description": "Customer orders with denormalized line items",
"use_cases": ["Revenue reporting", "Customer analytics"],
"data_quality_notes": [
"created_at is UTC",
"status values: pending, completed, refunded",
"amount stored in cents"
]
}
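A table-usage file like the one above can be sanity-checked before loading. A minimal sketch; `load_knowledge` does the real ingestion, and the required keys here are assumptions inferred from the example:

```python
import json

REQUIRED_KEYS = {"table_name", "table_description"}  # assumed minimum

def validate_table_knowledge(raw: str) -> dict:
    """Parse a knowledge/tables/*.json document and check required keys."""
    doc = json.loads(raw)
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return doc

doc = validate_table_knowledge("""
{
  "table_name": "orders",
  "table_description": "Customer orders with denormalized line items",
  "data_quality_notes": ["amount stored in cents"]
}
""")
```

Catching a malformed file here is cheaper than discovering it at query time, when the agent silently retrieves nothing for a table it should know about.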
-- <query name>monthly_revenue</query name>
-- <query description>
-- Monthly revenue calculation.
-- Converts cents to dollars.
-- Excludes refunded orders.
-- </query description>
-- <query>
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(amount) / 100.0 AS revenue_dollars
FROM orders
WHERE status = 'completed'
GROUP BY 1
ORDER BY 1 DESC
-- </query>
{
"metrics": [
{
"name": "MRR",
"definition": "Sum of active subscriptions excluding trials"
}
],
"common_gotchas": [
{
"issue": "Revenue double counting",
"solution": "Filter to completed orders only"
}
]
}
python -m dash.scripts.load_knowledge # Upsert changes
python -m dash.scripts.load_knowledge --recreate # Fresh start

./scripts/venv_setup.sh && source .venv/bin/activate
docker compose up -d dash-db
python -m dash.scripts.load_data
python -m dash # CLI mode

| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | OpenAI API key |
| EXA_API_KEY | No | Web search for external knowledge |
| DB_* | No | Database config (defaults to localhost) |
- OpenAI's In-House Data Agent — the inspiration
- Self-Improving SQL Agent — deep dive on an earlier architecture
- Agno Docs
- Discord