Release v0.7.0 — vLLM provider · SnapdragonPartners/maestro-llms

Headline: self-hosted GPU inference via vLLM. The toolkit now wraps five chat providers behind one stable interface: Anthropic, OpenAI (Responses), Google, Ollama, vLLM.

`llms/providers/vllm` (ADR-0015)

Leaf package alongside openai, using the OpenAI Chat Completions surface (not Responses) via openai-go with a configurable base URL. Zero new dependencies — openai-go is already imported for the existing OpenAI Responses adapter.

```go
import (
    "log"

    "github.com/SnapdragonPartners/maestro-llms/llms/providers/vllm"
)

client, err := vllm.New(
    vllm.WithBaseURL("http://my-vllm:8000"),
    vllm.WithModel("mistralai/Ministral-3-14B-Instruct-2512"),
    // WithAPIKey is OPTIONAL: vLLM's default deployment has no auth.
)
if err != nil {
    log.Fatalf("vllm.New: %v", err)
}
```

Key behaviors:

- No auth by default is the distinguishing feature vs. hosted providers: an empty `WithAPIKey` is a valid configuration, not a config error.
- `ModelLister` is implemented; `LatestInFamily` is not, because HuggingFace-style model names have no canonical family convention. This matches the Ollama adapter. A regression-guard test keeps this decision enforced.
- `ModelInfo.Created` is the load time on the vLLM instance, NOT the upstream HuggingFace release date. Don't surface it as a freshness signal.
- Tool calling goes through the standard `tools` / `tool_choice` request fields, but whether the model actually emits tool calls depends on the vLLM server's per-model `--tool-call-parser` configuration.
- Streaming is deferred per ADR-0003.

The live integration test is gated on `MAESTRO_VLLM` (the full base URL). The optional `MAESTRO_VLLM_MODEL` overrides the model ID; otherwise the test uses the first model that `/v1/models` reports. `make test-integration` picks the test up automatically.

ADR-0013 amendment (PR #44)

Append-only clarification: the design space for the future opt-in TextEstimator includes API-backed variants (Anthropic count_tokens, Gemini CountTokens), not just embedded tokenizers like tiktoken-go. Both API-backed wrappers would add zero new dependencies, since those SDKs are already imported. The decision is unchanged; the framing is widened so the future ADR doesn't have to retcon it.

`MAESTRO_DIVERGENCES.md`

Informational V1-V5 rows for vLLM. The provider is greenfield, so there is no Maestro behavior to diverge from; the rows document vLLM-specific behaviors that a consumer cutting over should know about.

Not in this tag (separate module)

`examples/chat` (PR #46) is a locally run web demo that wires every provider through one UI (RecommendedChat middleware, per-provider ListModels, EstimateTextTokens). It lives under its own go.mod so its dependency closure doesn't leak into the toolkit. Build and run it from `examples/chat` with `go run .`.

Compatibility

Additive throughout. No core llms type changes. Existing v0.6.x consumers are unaffected. govulncheck clean.

Pre-1.0; v0.x minor versions may break.

🤖 Generated with Claude Code
