-
Notifications
You must be signed in to change notification settings - Fork 1
Session Management
SQL Query Engine uses Redis for two critical functions: session caching (storing schema descriptions and chat histories) and real-time streaming (Pub/Sub progress updates). This page explains the Redis architecture, key structure, and how context is preserved across requests.
graph TB
subgraph Redis
direction TB
H["Hash Store<br/>{chatID}:SQLQueryEngine"]
PS["Pub/Sub Channels"]
end
subgraph HashFields["Hash Fields"]
SD["dbSchemaDescription<br/>(cached schema)"]
QG["dbQueryGenerator<br/>(generation history)"]
VC["validatorChat:1, :2, ...<br/>(repair loop histories)"]
HC["historyCounterManager<br/>(auto-increment counter)"]
end
subgraph Channels["Pub/Sub Channels"]
CH1["{chatID}<br/>(tagged progress)"]
CH2["{chatID}:stream<br/>(clean content mirror)"]
end
H --> HashFields
PS --> Channels
QGen["QueryGenerator"] -->|write| SD
QGen -->|write| QG
QGen -->|publish| CH1
QEval["QueryEvaluator"] -->|write| VC
QEval -->|publish| CH1
OAI["openaiCompat.py"] -->|subscribe| CH1
OAI -->|publish| CH2
All session data is stored in a single Redis hash per session, keyed as {chatID}:SQLQueryEngine.
| Field | Written By | Contents |
|---|---|---|
dbSchemaDescription |
QueryGenerator, QueryEvaluator | JSON-serialized LangChain messages containing the full schema description. This is the most expensive piece to generate (requires DB introspection + LLM call) and is cached for reuse. |
dbQueryGenerator |
QueryGenerator | JSON-serialized chat history from the query generation step. Appended to on each query generation call. |
validatorChat:N |
QueryEvaluator | JSON-serialized chat history from the Nth repair loop invocation. A new key is created each time the evaluator runs. |
historyCounterManager |
QueryEvaluator | Auto-incrementing integer counter used to generate unique validatorChat:N keys. |
The schema description is the most important cached artifact. Here's how it works:
flowchart TD
A["Request arrives with chatID"] --> B{"Redis has<br/>dbSchemaDescription?"}
B -->|Yes| C["Load cached schema<br/>(fast path)"]
B -->|No| D["Introspect PostgreSQL<br/>(tables, columns, samples)"]
D --> E["LLM generates schema<br/>description (streamed)"]
E --> F["Cache in Redis"]
F --> C
C --> G["Proceed with<br/>query generation"]
The cache key can be controlled via the schemaDescriptionKey parameter. Changing it (e.g., from "dbSchemaDescription" to "dbSchemaDescription_v2") forces a fresh schema generation, which is useful when the database schema has changed.
The engine publishes progress messages to Redis Pub/Sub channels during pipeline execution. This enables real-time streaming in the OpenAI-compatible endpoint.
All Pub/Sub messages follow this format:
</{Component}:{Event}>{SPLIT_IDENTIFIER}{Content}
Where SPLIT_IDENTIFIER defaults to <|-/|-/>.
Example:
</SQLQueryEvaluator:QueryFixAttempt#1><|-/|-/>Current Query : SELECT COUNT(*) FROM orders
| Tag | Content |
|---|---|
</SQLQueryGenerator:schemaDescriptionChat> |
Streamed chunks of the LLM schema description |
On early-accept (query returned rows — no LLM call needed):
| Message | Content |
|---|---|
QueryFixAttempt#N |
Query executed successfully with {N} rows |
Before LLM call (query failed or returned empty — per retry):
| Message | Content |
|---|---|
QueryFixAttempt#N — Current Query |
The SQL being executed |
QueryFixAttempt#N — Current Observation |
Current status/observation |
QueryFixAttempt#N — Execution Errors |
PostgreSQL error details, or No errors encountered.
|
After LLM evaluation (per retry):
| Message | Content |
|---|---|
QueryFixAttempt#N — LLM Observation |
What the LLM diagnosed |
QueryFixAttempt#N — Fixed Query |
Corrected SQL query |
QueryFixAttempt#N |
Will verify fixed query on next iteration |
sequenceDiagram
participant Client
participant FastAPI as FastAPI (openaiCompat.py)
participant Redis as Redis Pub/Sub
participant Engine as SQLQueryEngine (thread)
Client->>FastAPI: POST /v1/chat/completions (stream: true)
FastAPI->>Redis: Subscribe to {chatID}
FastAPI->>Engine: Launch in thread pool
loop Pipeline Progress
Engine->>Redis: Publish progress message
Redis->>FastAPI: Receive message
FastAPI->>Client: SSE chunk (wrapped in think tags)
FastAPI->>Redis: Mirror to {chatID}:stream
end
Engine->>FastAPI: Return final result
FastAPI->>Redis: Drain residual messages
FastAPI->>Client: SSE chunk (final formatted result)
FastAPI->>Client: data: [DONE]
Key details:
- The subscriber attaches to the channel before the engine starts, preventing message loss
- Progress content is extracted after the
SPLIT_IDENTIFIERdelimiter (<|-/|-/>) - All progress is wrapped in
<think>...</think>tags (collapsible in OpenWebUI) - After the engine completes, residual messages are drained (100 retries, 100ms timeout each)
- A mirror channel
{chatID}:streampublishes clean content for external subscribers
Each chatID gets its own Redis hash namespace. There is no cross-session data sharing:
- Session
abc123→abc123:SQLQueryEnginehash +abc123Pub/Sub channel - Session
xyz789→xyz789:SQLQueryEnginehash +xyz789Pub/Sub channel
This means multiple users (or multiple conversations for the same user) can run concurrently without interference.
The engine uses two naming conventions for the session identifier, depending on the API surface:
| API | Parameter | How It's Set |
|---|---|---|
Native (/inference/*) |
chatID (path parameter) |
Always provided explicitly in the URL |
OpenAI-compat (/v1/*) |
chat_id (request body) |
Optional. If omitted, derived from MD5(first_user_message)[:16]
|
Both resolve to the same Redis key prefix internally. The MD5 fallback ensures that identical first messages share context even without an explicit session ID, but explicit chat_id is more reliable for multi-turn usage.
When the evaluator (Stage 2) needs schema context, it follows a three-tier fallback:
flowchart TD
A["QueryEvaluator needs<br/>schema context"] --> B{"Generator payload<br/>available?"}
B -->|Yes| C["Use schema from<br/>generator output<br/>(fastest — in-memory)"]
B -->|No| D{"Redis cache<br/>has schema?"}
D -->|Yes| E["Load from Redis<br/>(fast — network call)"]
D -->|No| F["Generate from scratch<br/>(slow — DB + LLM)"]
C --> G["Proceed with evaluation"]
E --> G
F --> G
- From payload — When the full pipeline runs, the generator passes its schema context directly to the evaluator (zero-cost)
- From Redis — When only the evaluator is called, it loads the cached schema from a previous session turn
- From scratch — Full schema introspection + LLM description (slowest, used only when no cache exists)
| Environment Variable | Default | Description |
|---|---|---|
REDIS_HOST |
"" (empty) |
Redis server hostname |
REDIS_PORT |
not set | Redis server port |
REDIS_PASSWORD |
not set | Redis authentication password |
REDIS_DB |
not set | Redis database number (0–15) |
The Redis client is configured with decode_responses=True for string-based storage.
📄 Paper: arXiv:2604.16511 | 📊 Dataset: Hugging Face | 💻 Source: GitHub
SQL Query Engine
Design
Setup
API
Internals
Evaluation
Help