# OpenAI Compatibility
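SQL Query Engine implements the core of the OpenAI API specification (the endpoints listed below), so it works as a drop-in backend for tools built against the OpenAI API. The differences from OpenAI's behavior are summarized at the end of this page. The sections below show how to integrate with popular clients.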
| OpenAI Endpoint | Supported | Notes |
|---|---|---|
| `GET /v1/models` | Yes | Returns the engine's model name |
| `POST /v1/chat/completions` | Yes | Full support, including streaming |
| `POST /v1/completions` | Yes | Legacy text completions |
| `POST /v1/embeddings` | No | Not applicable |
| `POST /v1/images/*` | No | Not applicable |
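Because `GET /v1/models` is supported, listing the models is a quick way to confirm the engine is reachable before wiring up a client. The sketch below uses only the Python standard library. It assumes the response follows the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`) and that the API key is sent as a Bearer token, as the OpenAI SDKs do; `fetch_models` and `model_ids` are hypothetical helper names.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    # Assumption: the engine uses the standard OpenAI list shape.
    return [model["id"] for model in payload.get("data", [])]


def fetch_models(base_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Call GET {base_url}/models with a Bearer token and decode the JSON body."""
    request = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Usage: `model_ids(fetch_models("http://localhost:5181/v1", "meow123"))` should list the engine's single model name (`SQLBot` in the examples on this page).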
## Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5181/v1",
    api_key="meow123",  # your OPENAI_API_KEY value
)

# Non-streaming
response = client.chat.completions.create(
    model="SQLBot",
    messages=[{"role": "user", "content": "How many orders were placed last month?"}],
    stream=False,
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="SQLBot",
    messages=[{"role": "user", "content": "Show me the top 5 products by revenue."}],
    stream=True,
)
for chunk in stream:
    # Guard against chunks that carry no choices before indexing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

Pass `chat_id` as an extra body parameter (`extra_body`) to preserve context across turns:
```python
# Turn 1
response = client.chat.completions.create(
    model="SQLBot",
    messages=[{"role": "user", "content": "How many tables are in the database?"}],
    stream=False,
    extra_body={"chat_id": "my-session-001"},
)
answer_1 = response.choices[0].message.content

# Turn 2: same chat_id, so the schema is reused from the cache
response = client.chat.completions.create(
    model="SQLBot",
    messages=[
        {"role": "user", "content": "How many tables are in the database?"},
        {"role": "assistant", "content": answer_1},
        {"role": "user", "content": "Show me the columns in the largest table."},
    ],
    stream=False,
    extra_body={"chat_id": "my-session-001"},
)
print(response.choices[0].message.content)
```

## JavaScript / TypeScript

```typescript
// Uses top-level await, so run this as an ES module
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:5181/v1',
  apiKey: 'meow123', // your OPENAI_API_KEY value
});

// Non-streaming
const response = await client.chat.completions.create({
  model: 'SQLBot',
  messages: [{ role: 'user', content: 'How many orders last month?' }],
  stream: false,
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: 'SQLBot',
  messages: [{ role: 'user', content: 'Top 5 products by revenue?' }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```

## LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:5181/v1",
    api_key="meow123",  # your OPENAI_API_KEY value
    model="SQLBot",
    streaming=True,
)

# Simple invocation
response = llm.invoke("How many active users are there?")
print(response.content)

# Streaming
for chunk in llm.stream("Show me revenue by month for this year."):
    print(chunk.content, end="", flush=True)
```

## OpenWebUI

OpenWebUI is pre-configured in the Docker Compose setup. If you're connecting manually:
- Open OpenWebUI (default: `http://localhost:5182`)
- Go to **Settings → Connections**
- Add a new OpenAI-compatible connection:
  - **URL:** `http://sqlqueryengine:8080/v1` (Docker internal) or `http://localhost:5181/v1` (external)
  - **API Key:** your `OPENAI_API_KEY` value
- The model `SQLBot` should appear in the model selector
OpenWebUI automatically:

- Sends `chat_id` for context preservation across turns
- Renders `<think>...</think>` blocks as collapsible reasoning sections
- Shows streaming progress in real time
## curl

See the Usage Guide for comprehensive curl examples covering every endpoint, streaming modes, multi-turn sessions, and connection overrides.
## Session Context (`chat_id`)

The `chat_id` field controls session-level context caching:

**When `chat_id` is provided:** the engine stores the schema description in Redis under the key `{chat_id}:SQLQueryEngine`. The first request in a session introspects the database and caches the schema; subsequent requests with the same `chat_id` reuse the cached schema, so no re-introspection is needed.
**When `chat_id` is omitted:** the engine derives a stable ID from `MD5(first_user_message)[:16]`. As a result, two requests with the same first user message share context, even if they belong to unrelated conversations. This works well for single-turn usage but is less reliable for multi-turn sessions.
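The two cases above can be sketched in a few lines: resolve the session ID (explicit or derived), build the Redis key, and introspect only on a cache miss. This is an illustration, not the engine's code. It assumes the MD5 digest is taken over the UTF-8 bytes of the first user message and rendered as lowercase hex before truncation; a plain `dict` stands in for Redis, and `introspect` is a hypothetical callable that produces the schema description.

```python
import hashlib
from typing import Callable, Optional


def resolve_chat_id(messages: list, chat_id: Optional[str] = None) -> str:
    """Return the explicit chat_id, else derive one from the first user message."""
    if chat_id:
        return chat_id
    # Assumption: MD5 over UTF-8 bytes, lowercase hex, first 16 characters.
    first_user = next(m["content"] for m in messages if m["role"] == "user")
    return hashlib.md5(first_user.encode("utf-8")).hexdigest()[:16]


def cache_key(chat_id: str) -> str:
    """Redis key under which the schema description is stored."""
    return f"{chat_id}:SQLQueryEngine"


def get_schema(cache: dict, chat_id: str, introspect: Callable[[], str]) -> str:
    """Cache-aside lookup: introspect the database only when the key is missing."""
    key = cache_key(chat_id)
    if key not in cache:
        cache[key] = introspect()  # first request in the session
    return cache[key]
```

Because the derived ID depends only on the first user message, every later turn of the same conversation resolves to the same key, and so does any other conversation that opens with the identical question.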
**Recommendation:** always provide `chat_id` explicitly for multi-turn conversations.
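Following that recommendation, a client can keep one explicit `chat_id` per conversation together with the running message history. The class below is a hypothetical helper, not part of the engine or the OpenAI SDK: it only builds keyword arguments for `client.chat.completions.create`, passing `chat_id` through the SDK's `extra_body` parameter as in the multi-turn Python example. The `session-` prefix and random suffix are arbitrary choices.

```python
import uuid
from typing import Optional


class SQLBotSession:
    """Hypothetical helper: one explicit chat_id plus the running message history."""

    def __init__(self, chat_id: Optional[str] = None, model: str = "SQLBot") -> None:
        # Generate a unique chat_id when the caller does not supply one
        self.chat_id = chat_id or f"session-{uuid.uuid4().hex[:12]}"
        self.model = model
        self.messages: list = []

    def request(self, question: str, stream: bool = False) -> dict:
        """Append a user turn and return kwargs for chat.completions.create."""
        self.messages.append({"role": "user", "content": question})
        return {
            "model": self.model,
            "messages": list(self.messages),  # copy, so later turns don't mutate it
            "stream": stream,
            "extra_body": {"chat_id": self.chat_id},
        }

    def record_answer(self, answer: str) -> None:
        """Store the assistant reply so the next turn carries the full history."""
        self.messages.append({"role": "assistant", "content": answer})
```

Usage: `response = client.chat.completions.create(**session.request("How many tables are in the database?"))`, then `session.record_answer(response.choices[0].message.content)` before asking the next question.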
## Streaming Format

Streaming responses follow the OpenAI Server-Sent Events (SSE) format. Each event is a `data:` line followed by a blank line, and the stream ends with `data: [DONE]`:

```text
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"<think>\n"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"next token"},"finish_reason":null}]}

data: [DONE]
```
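Clients that do not use an OpenAI SDK can consume this stream directly. The sketch below assumes the response body has already been split into text lines (for example by an HTTP client's line iterator). It keeps only `data:` lines, stops at `[DONE]`, and yields each non-empty `delta.content`; `iter_sse_content` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta.content strings from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # skip the empty role-only first chunk
                yield content
```

Joining the yielded pieces with `"".join(...)` reconstructs the full streamed text, including any `<think>` blocks.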
Pipeline progress (schema generation, query execution, repair loop steps) is wrapped in `<think>...</think>` tags within the streamed content. Clients that support reasoning display (such as OpenWebUI) render these as collapsible sections.
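A client without reasoning display can still separate progress from the final answer by splitting the accumulated text on these tags. This is a minimal sketch that assumes the tags are well-formed and not nested; `split_think` is a hypothetical helper.

```python
import re

# Non-greedy match across newlines for each <think>...</think> block
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple:
    """Split accumulated content into (pipeline progress, final answer)."""
    progress = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return progress, answer
```

Apply it to the complete text after the stream ends: while streaming, an unclosed `<think>` block does not match the pattern and stays in the answer part.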
## Differences from the OpenAI API

| Feature | OpenAI API | SQL Query Engine |
|---|---|---|
| `model` field | Selects the model | Ignored; the engine always uses the configured pipeline |
| `chat_id` field | Not present | Custom field for session management |
| `temperature` | Controls sampling | Ignored; set via the `LLM_TEMPERATURE` env var |
| `max_tokens` | Limits output | Ignored; not applicable to the SQL engine |
| `usage` token counts | Accurate | Returns zeros (token counting not implemented) |
| Multiple models | Many models | Single model (the engine itself) |
| Function calling | Supported | Not supported |
| Vision | Supported | Not supported |
| `<think>` tags in streaming | Not present | Used for pipeline progress visibility |
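Code ported from OpenAI sometimes relies on sampling, length, or tool parameters. A small pre-flight check can surface the request fields this engine ignores or does not support before a request is sent. This is a hypothetical helper derived from the differences table; treating `tools` and `functions` as the markers of function calling is an assumption based on the standard OpenAI request schema.

```python
# Fields the engine accepts but ignores, per the differences table
IGNORED = {
    "temperature": "ignored; set LLM_TEMPERATURE on the engine instead",
    "max_tokens": "ignored; not applicable to the SQL engine",
}
# Assumption: function calling is signalled by the OpenAI `tools`/`functions` fields
UNSUPPORTED = {
    "tools": "function calling is not supported",
    "functions": "function calling is not supported",
}


def request_warnings(request: dict) -> list:
    """Return one warning per request field the engine ignores or does not support."""
    warnings = []
    for table in (IGNORED, UNSUPPORTED):
        for field, reason in table.items():
            if field in request:
                warnings.append(f"{field}: {reason}")
    return warnings
```

`model` is deliberately not flagged: OpenAI clients must send it, and the engine simply ignores its value.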
📄 Paper: arXiv:2604.16511 | 📊 Dataset: Hugging Face | 💻 Source: GitHub