Natural-language data queries, powered by DataHub + LangGraph
Ask a question. Get SQL, results, and a chart — in one turn.
Requires: Docker, DataHub CLI (`pip install acryl-datahub`), `uv`, Python 3.11+
```bash
git clone https://github.com/datahub-project/analytics-agent.git
cd analytics-agent
bash quickstart.sh
```

No `.env` editing required. The script:
- Starts a local DataHub instance (or connects to an existing one)
- Loads the Olist e-commerce sample dataset + catalog metadata
- Builds and launches Analytics Agent at http://localhost:8100
Open the browser — a setup wizard walks you through naming your agent, picking a model (Anthropic, OpenAI, or Google), and entering your API key. If you already have one of those keys exported in your shell, it's picked up automatically.
| Feature | What it does |
|---|---|
| Plain-English → SQL → Chart | Ask "top 5 categories by revenue" — the agent searches DataHub docs first, writes SQL, runs it, and auto-renders a Vega-Lite chart. |
| Context Quality | A live status bar shows how well your DataHub catalog supported the agent (1–5). Hover for the LLM's reasoning. Improves as you document your data. |
| `/improve-context` | Type `/improve-context` after any conversation to get a numbered list of documentation improvements the agent wishes it had — then approve and publish them to DataHub in one click. |
| Multi-turn memory | Follow-ups like "make it a pie chart" or "filter to Q3" work across turns. |
| Collapsible reasoning | Tool calls and agent thinking are shown but collapsed — visible when you want them, out of the way when you don't. |
| 4 themes | DataHub (light/purple), Warm (light/orange), Ocean (dark/blue), Carbon (dark/gray). Switcher in the bottom-left. |
| Multiple connections | Add and manage multiple Snowflake, DuckDB, or SQLAlchemy connections from Settings. Each has its own encrypted credentials. |
Manual setup also requires: `node` and `just` (`brew install node just`)
```bash
git clone https://github.com/datahub-project/analytics-agent.git
cd analytics-agent
just install   # uv sync + pnpm install
just start     # builds frontend, starts backend at :8100
```

Open http://localhost:8100 — a setup wizard handles the LLM key and connections on first run.
Without `just`:

```bash
uv sync && cd frontend && pnpm install && pnpm build && cd .. && uv run uvicorn analytics_agent.main:app --port 8100
```
```bash
cp .env.example .env  # then edit as needed
```

```bash
# LLM — pick one provider (or leave blank and use the wizard)
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...

# DataHub (optional — can also be added via Settings → Connections)
DATAHUB_GMS_URL=https://your-instance.acryl.io/gms
DATAHUB_GMS_TOKEN=eyJhbGci...
```

| Command | What it does |
|---|---|
| `just start` | Build frontend if stale, start backend |
| `just start-remote` | Start + show DataHub connection status |
| `just nuke` | Wipe the DB and start from scratch |
| `just dev` | Hot-reload backend (use `just dev-full` for frontend HMR too) |
| `just logs` | Tail backend logs |
```bash
# Terminal 1 — backend (dev)
uv run uvicorn analytics_agent.main:app --reload --port 8101

# Terminal 2 — frontend HMR (http://localhost:5173, proxies /api/* to :8101)
cd frontend && pnpm dev
```

```bash
# DataHub Cloud (Acryl)
datahub init --sso --host https://your-instance.acryl.io/gms --token-duration ONE_MONTH

# Self-hosted
datahub init --host http://localhost:8080 --username datahub --password datahub

# Verify the connection
curl -s -X POST http://localhost:8100/api/settings/connections/datahub/test
```

```yaml
# config.yaml
engines:
  - type: snowflake
    name: snowflake
    connection:
      account: "${SNOWFLAKE_ACCOUNT}"
      warehouse: "${SNOWFLAKE_WAREHOUSE}"
      database: "${SNOWFLAKE_DATABASE}"
      schema: "${SNOWFLAKE_SCHEMA}"
      user: "${SNOWFLAKE_USER}"
```

Generate an RSA key pair, upload the public key to Snowflake, then set `SNOWFLAKE_PRIVATE_KEY` (base64-encoded PEM) in `.env`.
Settings → Connections → Authentication → SSO — opens a browser window for your IdP.
Four independently configurable model tiers:
| Task | Env var | Default (Anthropic) |
|---|---|---|
| Main analysis agent | `LLM_MODEL` | `claude-sonnet-4-6` |
| Chart generation | `CHART_LLM_MODEL` | `claude-haiku-4-5-20251001` |
| Context quality scoring | `QUALITY_LLM_MODEL` | `claude-haiku-4-5-20251001` |
| Titles & greeting | `DELIGHT_LLM_MODEL` | `claude-haiku-4-5-20251001` |
```bash
LLM_PROVIDER=anthropic
LLM_MODEL=claude-opus-4-7            # upgrade just the agent
QUALITY_LLM_MODEL=claude-sonnet-4-6  # or use a stronger model for quality scoring
```

Anthropic models can also be run via AWS Bedrock. Set `LLM_PROVIDER=bedrock` and use the Bedrock inference-profile model IDs (e.g. `us.anthropic.claude-sonnet-4-5-20250929-v1:0`). Auth falls back to the standard AWS credential chain (env vars, `~/.aws/credentials`, IAM role); to override, set `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` (and optionally `AWS_SESSION_TOKEN` for STS). `AWS_REGION` defaults to `us-west-2`.
```bash
LLM_PROVIDER=bedrock
AWS_REGION=us-west-2
LLM_MODEL=us.anthropic.claude-sonnet-4-5-20250929-v1:0
```

The quickstart uses the DataHub MySQL container. For non-quickstart runs, SQLite is the default (`./data/dev.db`). Set `DATABASE_URL` in `.env` to switch backends — see `.env.example` for Postgres and SQLite formats.
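As a sketch of what switching backends might look like in `.env` (SQLAlchemy-style URLs; the host, credentials, and database name are placeholders, so check `.env.example` for the exact formats the app expects):

```bash
# SQLite (the default outside the quickstart)
DATABASE_URL=sqlite:///./data/dev.db

# Postgres (values are placeholders)
DATABASE_URL=postgresql://user:password@localhost:5432/analytics_agent
```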
Settings (top-right) manages:
- Connections — test, edit, add, and delete engine connections
- Authentication — per-connection: Password, Private Key, SSO, PAT, OAuth
- Tool toggles — enable/disable individual DataHub or engine tools
- Write-back skills — `publish_analysis` and `save_correction` (enabled by default)
- Prompt — customize the system prompt
- Display — app name and logo
```bash
docker build -f docker/Dockerfile -t analytics-agent .
docker run -p 8100:8100 --env-file .env analytics-agent
```

```bash
cd frontend && pnpm build && cd ..
uv run uvicorn analytics_agent.main:app --host 0.0.0.0 --port 8100
```

```
analytics-agent/
├── backend/src/analytics_agent/
│   ├── agent/      # LangGraph ReAct graph, streaming, chart generation, analysis
│   ├── api/        # FastAPI routes: conversations, chat (SSE), settings, oauth
│   ├── context/    # DataHub tool loader (datahub_agent_context)
│   ├── db/         # SQLAlchemy models + Alembic migrations
│   │   └── models.py  # Conversation, Message, Integration, Setting
│   ├── engines/    # Pluggable query engines (Snowflake, DuckDB, SQLAlchemy)
│   ├── prompts/    # System prompt (system_prompt.md) + chart prompt
│   └── skills/     # Write-back skills: publish-analysis, save-correction,
│                   #   improve-context (/improve-context slash command)
└── frontend/src/
    ├── components/Chat/  # MessageList, MessageInput, ContextStatusBar
    ├── components/Settings/
    ├── api/              # fetch wrappers for REST + SSE stream reader
    └── store/            # Zustand: conversations, display, theme
```
SSE event flow:

```
User message → POST /api/conversations/{id}/messages
  → resolver.py resolves credentials → configured engine
  → LangGraph ReAct agent (DataHub tools + engine tools)
  → astream_events → TEXT / TOOL_CALL / TOOL_RESULT / SQL / CHART / COMPLETE
  → Frontend renders each event type inline
  → Background: context quality scored async, stored on conversation row
```
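On the wire, each of those events arrives as a standard SSE `event:`/`data:` pair. A minimal sketch of splitting such a stream into (event, payload) lines; the sample payload is made up, and this stands in for the frontend's stream reader rather than reproducing it:

```shell
printf 'event: SQL\ndata: {"query": "SELECT 1"}\n\nevent: COMPLETE\ndata: {}\n\n' |
  awk '/^event:/ {e=$2} /^data:/ {sub(/^data: /, ""); print e "\t" $0}'
```

Each output line pairs an event name with its JSON payload, which is the shape the frontend switches on when rendering.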

