A Text-to-SQL / Chat-Data system for the tourism industry. Users ask business questions in natural language; the system parses metrics, dimensions, time ranges, and filters, generates safe SQL queries, and returns structured data with visualization suggestions and natural language summaries.
Luda ships with two execution modes, switchable via environment variable:
| Mode | Description | When to use |
|---|---|---|
legacy (default) |
Local semantic dictionary + SQL template matching. No LLM dependency, 100% deterministic. | Baseline, fast iteration, no API key |
langgraph |
13-node LangGraph workflow with parallel retrieval, semantic filtering, SQL self-correction, and answer summarization. | Full AI capability, requires LLM + embedding keys |
extract_keywords
↓
┌──────────────────────────────────────┐
│ recall_column │ recall_value │ recall_metric │ (parallel)
└──────────────────────────────────────┘
↓
merge_retrieved_info
↓
┌──────────────────────────────────────┐
│ filter_table │ filter_metric │ (parallel)
└──────────────────────────────────────┘
↓
add_extra_context → generate_sql → validate_sql
↓
execute_sql → [pass] → summarize_result → END
→ [fail] → correct_sql → re-execute → summarize_result → END
- Three-way parallel recall: Vector search (Qdrant) for schema entities + fuzzy text search (ES) for dimension values + metric definition lookup
- Semantic filtering: LLM prunes recalled tables/metrics by relevance before SQL generation
- SQL self-correction: On validation failure, the LLM corrects the SQL and retries (up to N iterations)
- Answer summarization: Raw query results are rewritten into natural language with chart suggestions
- Qdrant: Semantic vector recall for table fields, relationships, and business metrics
- Elasticsearch: Fuzzy and full-text search for dimension values (destinations, merchants, product names)
| Layer | Technology |
|---|---|
| Backend | Python 3.12+ · FastAPI (async) |
| Orchestration | LangGraph · LangChain |
| Database | MySQL 8.0 |
| Vector DB | Qdrant |
| Text Search | Elasticsearch 8.x |
| LLM / Embedding | OpenAI-compatible API (DeepSeek / DashScope) |
| Package Manager | uv |
| Frontend | Vanilla HTML + JS |
| Testing | pytest |
旅答/
├── frontend/ # Web UI (chat + visualization)
│ ├── index.html
│ ├── app.js
│ └── styles.css
├── docs/
│ ├── architecture.md # Detailed architecture document
│ ├── api_contract.md # API / SSE event contract
│ ├── metric_glossary.md # Business metric definitions
│ ├── demo_questions.md # Standard test question set
│ ├── frontend_integration.md # Frontend SSE integration guide
│ └── handoff.md # Project handoff documentation
└── data/ # Backend sub-project (own pyproject.toml)
├── app/
│ ├── agent/ # Legacy mode SQL template engine + LangGraph graph
│ ├── api/ # FastAPI route handlers
│ ├── prompts/ # LLM prompt templates
│ ├── scripts/ # Meta knowledge build scripts
│ ├── services/ # Query service (SSE emission)
│ └── ...
├── generate/ # Business data generation engine
├── seeds/ # Seed data
├── sql/ # DDL scripts (meta + business)
├── tests/ # Automated tests
├── docker/ # Infrastructure configs (MySQL, Qdrant, ES)
├── init_db.py # Database initialization
├── pyproject.toml
└── .env.example
data/is a sub-project with its ownpyproject.toml. Alluvcommands use--project data.
# MySQL 8 (host port 3307 → container port 3306)
docker run --name luda-mysql \
-e MYSQL_ROOT_PASSWORD=123321 \
-p 3307:3306 \
-d mysql:8.0 \
--default-authentication-plugin=mysql_native_password \
--character-set-server=utf8mb4 \
--collation-server=utf8mb4_unicode_ci
# Qdrant + Elasticsearch (for langgraph mode)
docker compose -f data/docker/docker-compose.retrieval.yaml up -dcp data/.env.example data/.envEdit data/.env:
# Database
DB_HOST=127.0.0.1
DB_PORT=3307
DB_USER=root
DB_PASSWORD=123321
DB_NAME=travel
APP_PORT=8000
# Agent mode: legacy (default) / langgraph
LUDA_AGENT_MODE=legacy
# Retrieval toggle (default: off; enable for langgraph mode)
RETRIEVAL_ENABLED=false
# LLM config (required for langgraph mode)
LLM_API_KEY=your-llm-key
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL=deepseek-chat
# Embedding config (required for langgraph mode)
EMBEDDING_API_KEY=your-embedding-key
EMBEDDING_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
EMBEDDING_MODEL=text-embedding-v4# Install Python dependencies
uv sync --project data
# Create meta database
docker exec -i luda-mysql mysql -uroot -p123321 < data/sql/meta.sql
# Initialize business database
uv run --project data python data/init_db.py
# Generate business data (Windows PowerShell)
$env:PYTHONPATH="data"
uv run --project data python -m generate.main --profile full
# Build meta knowledge + retrieval indexes (langgraph mode only)
uv run --project data python -m app.scripts.build_meta_knowledge# Windows PowerShell
$env:PYTHONPATH="data"
uv run --project data python -m app.main
# macOS / Linux
PYTHONPATH=data uv run --project data python -m app.main- Swagger docs: http://127.0.0.1:8000/docs
- Health check: http://127.0.0.1:8000/health
Open frontend/index.html in a browser (or serve it via any static file server).
The system streams progress and results via Server-Sent Events:
| Event | Description | Key Fields |
|---|---|---|
meta |
Request metadata | trace_id, created_at, agent_mode |
progress |
Node execution progress | step, status, message |
keywords |
Extracted keywords and entities | keyword list |
retrieval |
Retrieved tables, columns, metrics, values | retrieval results |
filtered_tables |
LLM-pruned table entities | filtered list |
filtered_metrics |
LLM-pruned metric entities | filtered list |
sql |
Generated SQL statement | SQL string |
sql_validation |
EXPLAIN safety check result | validation status |
sql_correction |
Self-correction log (on failure) | correction details |
result |
Structured tabular data | rows, columns |
chart |
Visualization suggestion config | chart config |
answer |
Natural language summary | summary text |
error |
Exception or safety block | error message |
done |
Workflow complete | — |
All generated SQL passes through multi-layer safety checks before execution:
- Read-only filter: Only
SELECTallowed. BlocksINSERT,UPDATE,DELETE,DROP,ALTER,TRUNCATE, and multi-statement queries. - System schema protection: Blocks access to MySQL internal schemas (
information_schema, etc.). - Row limit enforcement: Rewrites unbounded
SELECT *queries to includeLIMIT. - EXPLAIN validation: Executes
EXPLAIN <SQL>on a read-only connection to verify syntax and catch join errors before returning results.
# Run agent regression tests
uv run --project data pytest data/tests/test_query_agent.py -v
# Run all backend tests
uv run --project data pytest data/tests -v