Skip to content

Session Management

Adeel Ijaz edited this page Mar 31, 2026 · 2 revisions

SQL Query Engine uses Redis for two critical functions: session caching (storing schema descriptions and chat histories) and real-time streaming (Pub/Sub progress updates). This page explains the Redis architecture, key structure, and how context is preserved across requests.

Redis Architecture Overview

graph TB
    subgraph Redis
        direction TB
        H["Hash Store<br/>{chatID}:SQLQueryEngine"]
        PS["Pub/Sub Channels"]
    end

    subgraph HashFields["Hash Fields"]
        SD["dbSchemaDescription<br/>(cached schema)"]
        QG["dbQueryGenerator<br/>(generation history)"]
        VC["validatorChat:1, :2, ...<br/>(repair loop histories)"]
        HC["historyCounterManager<br/>(auto-increment counter)"]
    end

    subgraph Channels["Pub/Sub Channels"]
        CH1["{chatID}<br/>(tagged progress)"]
        CH2["{chatID}:stream<br/>(clean content mirror)"]
    end

    H --> HashFields
    PS --> Channels

    QGen["QueryGenerator"] -->|write| SD
    QGen -->|write| QG
    QGen -->|publish| CH1
    QEval["QueryEvaluator"] -->|write| VC
    QEval -->|publish| CH1
    OAI["openaiCompat.py"] -->|subscribe| CH1
    OAI -->|publish| CH2
Loading

Hash Store: Session Context

All session data is stored in a single Redis hash per session, keyed as {chatID}:SQLQueryEngine.

Hash Fields

Field Written By Contents
dbSchemaDescription QueryGenerator, QueryEvaluator JSON-serialized LangChain messages containing the full schema description. This is the most expensive piece to generate (requires DB introspection + LLM call) and is cached for reuse.
dbQueryGenerator QueryGenerator JSON-serialized chat history from the query generation step. Appended to on each query generation call.
validatorChat:N QueryEvaluator JSON-serialized chat history from the Nth repair loop invocation. A new key is created each time the evaluator runs.
historyCounterManager QueryEvaluator Auto-incrementing integer counter used to generate unique validatorChat:N keys.

Schema Caching Strategy

The schema description is the most important cached artifact. Here's how it works:

flowchart TD
    A["Request arrives with chatID"] --> B{"Redis has<br/>dbSchemaDescription?"}
    B -->|Yes| C["Load cached schema<br/>(fast path)"]
    B -->|No| D["Introspect PostgreSQL<br/>(tables, columns, samples)"]
    D --> E["LLM generates schema<br/>description (streamed)"]
    E --> F["Cache in Redis"]
    F --> C
    C --> G["Proceed with<br/>query generation"]
Loading

The cache key can be controlled via the schemaDescriptionKey parameter. Changing it (e.g., from "dbSchemaDescription" to "dbSchemaDescription_v2") forces a fresh schema generation, which is useful when the database schema has changed.

Pub/Sub: Real-Time Progress Streaming

The engine publishes progress messages to Redis Pub/Sub channels during pipeline execution. This enables real-time streaming in the OpenAI-compatible endpoint.

Message Format

All Pub/Sub messages follow this format:

</{Component}:{Event}>{SPLIT_IDENTIFIER}{Content}

Where SPLIT_IDENTIFIER defaults to <|-/|-/>.

Example:

</SQLQueryEvaluator:QueryFixAttempt#1><|-/|-/>Current Query : SELECT COUNT(*) FROM orders

Stage 1 Messages (QueryGenerator)

Tag Content
</SQLQueryGenerator:schemaDescriptionChat> Streamed chunks of the LLM schema description

Stage 2 Messages (QueryEvaluator)

On early-accept (query returned rows — no LLM call needed):

Message Content
QueryFixAttempt#N Query executed successfully with {N} rows

Before LLM call (query failed or returned empty — per retry):

Message Content
QueryFixAttempt#N — Current Query The SQL being executed
QueryFixAttempt#N — Current Observation Current status/observation
QueryFixAttempt#N — Execution Errors PostgreSQL error details, or No errors encountered.

After LLM evaluation (per retry):

Message Content
QueryFixAttempt#N — LLM Observation What the LLM diagnosed
QueryFixAttempt#N — Fixed Query Corrected SQL query
QueryFixAttempt#N Will verify fixed query on next iteration

Streaming Subscriber Architecture

sequenceDiagram
    participant Client
    participant FastAPI as FastAPI (openaiCompat.py)
    participant Redis as Redis Pub/Sub
    participant Engine as SQLQueryEngine (thread)

    Client->>FastAPI: POST /v1/chat/completions (stream: true)
    FastAPI->>Redis: Subscribe to {chatID}
    FastAPI->>Engine: Launch in thread pool

    loop Pipeline Progress
        Engine->>Redis: Publish progress message
        Redis->>FastAPI: Receive message
        FastAPI->>Client: SSE chunk (wrapped in think tags)
        FastAPI->>Redis: Mirror to {chatID}:stream
    end

    Engine->>FastAPI: Return final result
    FastAPI->>Redis: Drain residual messages
    FastAPI->>Client: SSE chunk (final formatted result)
    FastAPI->>Client: data: [DONE]
Loading

Key details:

  • The subscriber attaches to the channel before the engine starts, preventing message loss
  • Progress content is extracted after the SPLIT_IDENTIFIER delimiter (<|-/|-/>)
  • All progress is wrapped in <think>...</think> tags (collapsible in OpenWebUI)
  • After the engine completes, residual messages are drained (100 retries, 100ms timeout each)
  • A mirror channel {chatID}:stream publishes clean content for external subscribers

Session Isolation

Each chatID gets its own Redis hash namespace. There is no cross-session data sharing:

  • Session abc123abc123:SQLQueryEngine hash + abc123 Pub/Sub channel
  • Session xyz789xyz789:SQLQueryEngine hash + xyz789 Pub/Sub channel

This means multiple users (or multiple conversations for the same user) can run concurrently without interference.

chatID vs chat_id

The engine uses two naming conventions for the session identifier, depending on the API surface:

API Parameter How It's Set
Native (/inference/*) chatID (path parameter) Always provided explicitly in the URL
OpenAI-compat (/v1/*) chat_id (request body) Optional. If omitted, derived from MD5(first_user_message)[:16]

Both resolve to the same Redis key prefix internally. The MD5 fallback ensures that identical first messages share context even without an explicit session ID, but explicit chat_id is more reliable for multi-turn usage.

Schema Context Resolution in the Evaluator

When the evaluator (Stage 2) needs schema context, it follows a three-tier fallback:

flowchart TD
    A["QueryEvaluator needs<br/>schema context"] --> B{"Generator payload<br/>available?"}
    B -->|Yes| C["Use schema from<br/>generator output<br/>(fastest — in-memory)"]
    B -->|No| D{"Redis cache<br/>has schema?"}
    D -->|Yes| E["Load from Redis<br/>(fast — network call)"]
    D -->|No| F["Generate from scratch<br/>(slow — DB + LLM)"]

    C --> G["Proceed with evaluation"]
    E --> G
    F --> G
Loading
  1. From payload — When the full pipeline runs, the generator passes its schema context directly to the evaluator (zero-cost)
  2. From Redis — When only the evaluator is called, it loads the cached schema from a previous session turn
  3. From scratch — Full schema introspection + LLM description (slowest, used only when no cache exists)

Redis Connection Configuration

Environment Variable Default Description
REDIS_HOST "" (empty) Redis server hostname
REDIS_PORT not set Redis server port
REDIS_PASSWORD not set Redis authentication password
REDIS_DB not set Redis database number (0–15)

The Redis client is configured with decode_responses=True for string-based storage.

SQL Query Engine

Design

Setup

API

Internals

Evaluation

Help

Clone this wiki locally