OpenAI Compatibility

Adeel Ijaz edited this page Mar 31, 2026 · 2 revisions

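SQL Query Engine implements the model-listing, chat completion, and text completion endpoints of the OpenAI API specification, so it works as a drop-in backend for tools built against those endpoints. This page lists the supported endpoints, shows how to integrate with popular clients, and notes where behavior differs from the standard OpenAI API.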

Supported OpenAI Endpoints

| OpenAI Endpoint | Supported | Notes |
| --- | --- | --- |
| GET /v1/models | Yes | Returns the engine's model name |
| POST /v1/chat/completions | Yes | Full support including streaming |
| POST /v1/completions | Yes | Legacy text completions |
| POST /v1/embeddings | No | Not applicable |
| POST /v1/images/* | No | Not applicable |
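
The examples on this page all use the chat endpoint, so here is a minimal sketch of a request to the legacy POST /v1/completions endpoint using only the Python standard library. The helper name build_completion_request is hypothetical. The payload fields (model, prompt, stream), the Bearer authorization header, and the choices[0].text response field follow the standard OpenAI legacy completions format, and it is an assumption that the engine accepts and returns exactly these.

```python
import json
import urllib.request

# Same local address used in the other examples on this page.
BASE_URL = "http://localhost:5181/v1"


def build_completion_request(prompt, api_key, model="SQLBot"):
    """Build (but do not send) a POST /v1/completions request.

    Hypothetical helper: the payload mirrors the OpenAI legacy
    completions format, which the engine is assumed to accept.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Sending the request needs a running engine, so it is left commented out:
# request = build_completion_request("How many orders were placed last month?", "meow123")
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

Building the request separately from sending it makes the URL, headers, and body easy to inspect before pointing the code at a live instance.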

Python — OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5181/v1",
    api_key="meow123",
)

# Non-streaming
response = client.chat.completions.create(
    model="SQLBot",
    messages=[{"role": "user", "content": "How many orders were placed last month?"}],
    stream=False,
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="SQLBot",
    messages=[{"role": "user", "content": "Show me the top 5 products by revenue."}],
    stream=True,
)
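for chunk in stream:
    # Skip chunks that carry no text (for example, the initial role-only delta)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # end the line once the stream is done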

Multi-Turn Conversations

Pass chat_id as an extra body parameter to preserve context across turns:

# Turn 1
response = client.chat.completions.create(
    model="SQLBot",
    messages=[{"role": "user", "content": "How many tables are in the database?"}],
    stream=False,
    extra_body={"chat_id": "my-session-001"},
)
answer_1 = response.choices[0].message.content

# Turn 2 — same chat_id, schema is reused from cache
response = client.chat.completions.create(
    model="SQLBot",
    messages=[
        {"role": "user", "content": "How many tables are in the database?"},
        {"role": "assistant", "content": answer_1},
        {"role": "user", "content": "Show me the columns in the largest table."},
    ],
    stream=False,
    extra_body={"chat_id": "my-session-001"},
)
print(response.choices[0].message.content)
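
Rebuilding the message list by hand on every turn gets repetitive. The following is a minimal sketch of a small helper that keeps the history and the chat_id together. ChatSession, request_kwargs, and record_answer are hypothetical names and not part of the engine or the OpenAI SDK. The helper only prepares keyword arguments for client.chat.completions.create.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Hypothetical helper that pairs a chat_id with its message history."""

    # A random ID is generated when the caller does not supply one.
    chat_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    messages: list = field(default_factory=list)

    def request_kwargs(self, user_text, model="SQLBot"):
        """Append the user turn and return kwargs for chat.completions.create."""
        self.messages.append({"role": "user", "content": user_text})
        return {
            "model": model,
            "messages": list(self.messages),  # copy so later turns do not mutate it
            "stream": False,
            "extra_body": {"chat_id": self.chat_id},
        }

    def record_answer(self, text):
        """Store the assistant reply so the next turn includes it."""
        self.messages.append({"role": "assistant", "content": text})
```

With the client defined earlier, each turn would then look like `response = client.chat.completions.create(**session.request_kwargs("How many tables are in the database?"))` followed by `session.record_answer(response.choices[0].message.content)`.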

JavaScript / TypeScript — OpenAI SDK

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:5181/v1',
  apiKey: 'meow123',
});

// Non-streaming
const response = await client.chat.completions.create({
  model: 'SQLBot',
  messages: [{ role: 'user', content: 'How many orders last month?' }],
  stream: false,
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: 'SQLBot',
  messages: [{ role: 'user', content: 'Top 5 products by revenue?' }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Python — LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:5181/v1",
    api_key="meow123",
    model="SQLBot",
    streaming=True,
)

# Simple invocation
response = llm.invoke("How many active users are there?")
print(response.content)

# Streaming
for chunk in llm.stream("Show me revenue by month for this year."):
    print(chunk.content, end="")

OpenWebUI Integration

OpenWebUI is pre-configured in the Docker Compose setup. If you're connecting manually:

  1. Open OpenWebUI (default: http://localhost:5182)
  2. Go to Settings → Connections
  3. Add a new OpenAI-compatible connection:
    • URL: http://sqlqueryengine:8080/v1 (Docker internal) or http://localhost:5181/v1 (external)
    • API Key: Your OPENAI_API_KEY value
  4. The model SQLBot should appear in the model selector

OpenWebUI automatically:

  • Sends chat_id for context preservation across turns
  • Renders <think>...</think> blocks as collapsible reasoning sections
  • Shows streaming progress in real time

curl

See the Usage Guide for comprehensive curl examples covering every endpoint, streaming modes, multi-turn sessions, and connection overrides.

chat_id Context Preservation

The chat_id field controls session-level context caching:

When chat_id is provided: The engine stores the schema description in Redis under {chat_id}:SQLQueryEngine. The first request in a session introspects the database and caches the schema. Subsequent requests with the same chat_id reuse the cached schema, so the database is not introspected again.

When chat_id is omitted: The engine derives a stable ID from MD5(first_user_message)[:16]. Any two requests whose first user message is identical therefore share context. That is fine for single-turn usage but less reliable for multi-turn sessions, because unrelated conversations that happen to open with the same message map to the same ID.

Recommendation: Always provide chat_id explicitly for multi-turn conversations.
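
To make the fallback concrete, here is a minimal sketch of how such an ID and the corresponding Redis key could be computed. The function names are hypothetical. The UTF-8 encoding and the use of the first 16 characters of the hex digest are assumptions about how MD5(first_user_message)[:16] is evaluated, so the engine's actual values may differ.

```python
import hashlib


def fallback_chat_id(first_user_message: str) -> str:
    """Derive an ID from the first user message.

    Assumption: UTF-8 input and the first 16 hex characters of the digest.
    """
    digest = hashlib.md5(first_user_message.encode("utf-8")).hexdigest()
    return digest[:16]


def schema_cache_key(chat_id: str) -> str:
    """Redis key layout described above: {chat_id}:SQLQueryEngine."""
    return f"{chat_id}:SQLQueryEngine"


# Two conversations that open with the same question get the same ID,
# which is why an explicit chat_id is safer for multi-turn sessions.
first = fallback_chat_id("How many tables are in the database?")
second = fallback_chat_id("How many tables are in the database?")
print(first == second)
print(schema_cache_key("my-session-001"))
```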

Streaming Format

Streaming responses follow the OpenAI SSE (Server-Sent Events) specification:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"<think>\n"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"next token"},"finish_reason":null}]}

data: [DONE]
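
For clients that do not use an OpenAI SDK, a stream in this shape can be decoded by hand. The sketch below uses a hypothetical helper, iter_sse_content, that takes already-received lines of text, pulls the delta content out of each data: line, and stops at [DONE]. It assumes each event fits on a single data: line, as in the sample above.

```python
import json


def iter_sse_content(lines):
    """Yield content fragments from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # the first chunk carries an empty string
                yield text
```

Joining the yielded fragments, for example with `"".join(iter_sse_content(lines))`, reconstructs the full response text, including any <think> markup.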

Pipeline progress (schema generation, query execution, repair loop steps) is wrapped in <think>...</think> tags within the streamed content. Clients that support reasoning display (like OpenWebUI) render these as collapsible sections.
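
Clients without built-in reasoning display may want to separate the progress text from the final answer. The following is a minimal sketch, assuming the full response has already been accumulated into one string and every <think> tag is closed. split_think is a hypothetical helper, not part of the engine.

```python
import re

# Non-greedy match so multiple <think> blocks are captured separately.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(content):
    """Return (progress_blocks, answer) from a complete response string."""
    progress = [block.strip() for block in THINK_BLOCK.findall(content)]
    answer = THINK_BLOCK.sub("", content).strip()
    return progress, answer


progress, answer = split_think(
    "<think>\nGenerating schema\n</think>\nThere are 42 orders."
)
print(progress)  # ['Generating schema']
print(answer)    # There are 42 orders.
```

An unclosed <think> tag, such as one seen partway through a stream, is left in the answer untouched, so this helper is best applied once the stream has finished.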

Differences from Standard OpenAI API

| Feature | OpenAI API | SQL Query Engine |
| --- | --- | --- |
| model field | Selects the model | Ignored; the engine always uses the configured pipeline |
| chat_id field | Not present | Custom field for session management |
| temperature | Controls sampling | Ignored; set via the LLM_TEMPERATURE env var |
| max_tokens | Limits output | Ignored; not applicable to the SQL engine |
| usage token counts | Accurate | Returns zeros (token counting not implemented) |
| Multiple models | Many models | Single model (the engine itself) |
| Function calling | Supported | Not supported |
| Vision | Supported | Not supported |
| <think> tags in streaming | Not present | Used for pipeline progress visibility |
