Release v0.1.0 — first extraction milestone · SnapdragonPartners/maestro-llms

First extraction milestone: an app-neutral Go LLM/embedding toolkit shared by Maestro and Morris.

Included

llms — app-neutral chat + embedding interfaces; Message/ContentPart (tool calls/results as content parts), Usage (incl. cache tokens), classified ProviderError/LimitError; optional StreamingChatClient capability interface.
llms/testllm — deterministic, concurrency-safe chat + embedding fakes (recordings deep-copied; stable hash embeddings).
llms/ratelimit — Limiter/Reservation reservation protocol + in-memory token-bucket + concurrency limiter (lazy clock-based refill, no background goroutine; reconciliation contract).
llms/middleware — ChainChat/ChainEmbeddings (first arg outermost), DefaultEstimator, rate-limit middleware (reserve → release on cancellation-surviving ctx → commit actual usage).
llms/providers/anthropic — Anthropic chat: typed SDK-error classification, cache-token usage, raw tool-call params, strict-alternation translation.
llms/providers/openai — OpenAI embeddings: input-order/ID preserving, per-request dimension override, batch-limit validation.
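
The "first arg outermost" ordering of ChainChat/ChainEmbeddings can be illustrated with a minimal sketch. The names below (ChatFunc, Middleware, chain, tag) are hypothetical stand-ins invented for this example and are not the package's actual types or signatures; only the ordering rule is taken from the notes above.

```go
package main

import (
	"fmt"
	"strings"
)

// ChatFunc is a hypothetical stand-in for a chat client call.
type ChatFunc func(prompt string) string

// Middleware wraps a ChatFunc with extra behaviour (hypothetical shape).
type Middleware func(next ChatFunc) ChatFunc

// chain composes middlewares so that the first argument ends up outermost:
// chain(base, a, b) behaves like a(b(base)).
func chain(base ChatFunc, mws ...Middleware) ChatFunc {
	wrapped := base
	for i := len(mws) - 1; i >= 0; i-- {
		wrapped = mws[i](wrapped)
	}
	return wrapped
}

// tag returns a middleware that records when it is entered and exited.
func tag(name string, trace *[]string) Middleware {
	return func(next ChatFunc) ChatFunc {
		return func(prompt string) string {
			*trace = append(*trace, "enter "+name)
			out := next(prompt)
			*trace = append(*trace, "exit "+name)
			return out
		}
	}
}

// runChain calls a two-middleware chain once and returns the observed order.
func runChain() string {
	var trace []string
	base := func(prompt string) string {
		trace = append(trace, "provider")
		return "echo: " + prompt
	}
	client := chain(base, tag("ratelimit", &trace), tag("logging", &trace))
	client("hi")
	return strings.Join(trace, " > ")
}

func main() {
	fmt.Println(runChain())
}
```

In this sketch the first middleware passed ("ratelimit") is entered first and exited last, so it sees the whole call, including anything the inner middleware does.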

Requires Go 1.26.3+ (patches the net and net/http stdlib advisories GO-2026-4971 / GO-2026-4918).

Planned for v0.2: retry / timeout / circuit-breaker middleware, metrics hook interfaces, richer error classification.
Planned for v0.3: Google + Ollama providers, optional streaming, shared provider error-classifier.

Distributed (e.g. PostgreSQL) limiters are implemented by applications against the ratelimit.Limiter interface; none ship here yet by design.
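
For applications writing their own limiter, the reservation flow described above (reserve, then either release or commit actual usage) and the lazy clock-based refill might look roughly like the in-memory sketch below. Everything here is an assumption made for illustration: the bucket and reservation types, their method names, and the float-based token accounting are hypothetical and do not reproduce the real ratelimit.Limiter or Reservation signatures.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errLimited = errors.New("rate limited")

// bucket is a hypothetical token bucket that refills lazily from a clock
// reading on each call, so no background goroutine is needed.
type bucket struct {
	mu       sync.Mutex
	capacity float64
	rate     float64 // tokens added per second
	tokens   float64
	last     time.Time
	now      func() time.Time // injectable clock
}

// reservation is a hypothetical handle for tokens held for one request.
type reservation struct {
	b        *bucket
	estimate float64
	settled  bool
}

func newBucket(capacity, rate float64, now func() time.Time) *bucket {
	return &bucket{capacity: capacity, rate: rate, tokens: capacity, last: now(), now: now}
}

// refillLocked credits tokens for the time elapsed since the last call.
// The caller must hold b.mu.
func (b *bucket) refillLocked() {
	t := b.now()
	b.tokens += t.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = t
}

// Available reports the current token count after a lazy refill.
func (b *bucket) Available() float64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.refillLocked()
	return b.tokens
}

// Reserve holds back an estimated number of tokens or fails immediately.
func (b *bucket) Reserve(estimate float64) (*reservation, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.refillLocked()
	if estimate > b.tokens {
		return nil, errLimited
	}
	b.tokens -= estimate
	return &reservation{b: b, estimate: estimate}, nil
}

// Commit reconciles the estimate with actual usage: an over-estimate is
// refunded, an under-estimate is charged.
func (r *reservation) Commit(actual float64) { r.settle(r.estimate - actual) }

// Release returns the full estimate, e.g. after a failed or cancelled call.
func (r *reservation) Release() { r.settle(r.estimate) }

// settle applies a refund exactly once per reservation.
func (r *reservation) settle(refund float64) {
	r.b.mu.Lock()
	defer r.b.mu.Unlock()
	if r.settled {
		return
	}
	r.settled = true
	r.b.refillLocked()
	r.b.tokens += refund
	if r.b.tokens > r.b.capacity {
		r.b.tokens = r.b.capacity
	}
}

// demo walks through reserve, commit, rejection, refill and release
// using a fake clock.
func demo() (afterCommit float64, limited bool, afterRelease float64) {
	clock := time.Unix(0, 0)
	b := newBucket(100, 10, func() time.Time { return clock })

	r, _ := b.Reserve(60) // 100 tokens, 60 held back
	r.Commit(45)          // actual usage was 45, so 15 are refunded
	afterCommit = b.Available()

	_, err := b.Reserve(60) // not enough tokens left
	limited = errors.Is(err, errLimited)

	clock = clock.Add(time.Second) // one second later, 10 tokens accrue
	r2, _ := b.Reserve(60)
	r2.Release() // call cancelled: the full estimate is returned
	afterRelease = b.Available()
	return
}

func main() {
	fmt.Println(demo())
}
```

A distributed limiter would presumably keep the same reserve, release and commit shape but store the token count and last-refill timestamp in shared storage rather than behind a mutex; that part is left out of this sketch.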