v0.1.0 — first extraction milestone
First extraction milestone: an app-neutral Go LLM/embedding toolkit shared by Maestro and Morris.
Included
- llms: app-neutral chat + embedding interfaces; Message/ContentPart (tool calls/results as content parts), Usage (incl. cache tokens), classified ProviderError/LimitError; optional StreamingChatClient capability interface.
- llms/testllm: deterministic, concurrency-safe chat + embedding fakes (recordings deep-copied; stable hash embeddings).
- llms/ratelimit: Limiter/Reservation reservation protocol + in-memory token-bucket + concurrency limiter (lazy clock-based refill, no background goroutine; reconciliation contract).
- llms/middleware: ChainChat/ChainEmbeddings (first arg outermost), DefaultEstimator, rate-limit middleware (reserve → release on cancellation-surviving ctx → commit actual usage).
- llms/providers/anthropic: Anthropic chat with typed SDK-error classification, cache-token usage, raw tool-call params, strict-alternation translation.
- llms/providers/openai: OpenAI embeddings with input-order/ID preservation, per-request dimension override, batch-limit validation.
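The "first arg outermost" ordering noted for ChainChat/ChainEmbeddings can be illustrated with a minimal, self-contained sketch. It does not use the toolkit's real types: `ChatFunc`, `Middleware`, `Chain`, and `tag` are hypothetical stand-ins invented for this example, and the real signatures may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// ChatFunc is a hypothetical stand-in for a chat call; the real llms
// interfaces are not reproduced here.
type ChatFunc func(prompt string) string

// Middleware wraps a ChatFunc with extra behavior.
type Middleware func(next ChatFunc) ChatFunc

// Chain composes middlewares so that the first argument is outermost:
// Chain(base, a, b) behaves like a(b(base)).
func Chain(base ChatFunc, mws ...Middleware) ChatFunc {
	wrapped := base
	// Wrap from the last middleware inward so mws[0] ends up outermost.
	for i := len(mws) - 1; i >= 0; i-- {
		wrapped = mws[i](wrapped)
	}
	return wrapped
}

// tag returns a middleware that records entry and exit around the next call.
func tag(name string, log *[]string) Middleware {
	return func(next ChatFunc) ChatFunc {
		return func(prompt string) string {
			*log = append(*log, name+":enter")
			out := next(prompt)
			*log = append(*log, name+":exit")
			return out
		}
	}
}

// trace runs a two-middleware chain and returns the recorded call order.
func trace() string {
	var log []string
	base := func(prompt string) string {
		log = append(log, "base")
		return "echo " + prompt
	}
	chat := Chain(base, tag("outer", &log), tag("inner", &log))
	chat("hi")
	return strings.Join(log, " ")
}

func main() {
	fmt.Println(trace())
	// prints: outer:enter inner:enter base inner:exit outer:exit
}
```

With this ordering, a middleware passed first (for example a rate limiter) sees the request before, and the response after, everything listed after it.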
Requirements
Go 1.26.3+ (includes patches for the net and net/http stdlib advisories GO-2026-4971 / GO-2026-4918).
Not included (roadmap)
- v0.2: retry / timeout / circuit-breaker middleware, metrics hook interfaces, richer error classification.
- v0.3: Google + Ollama providers, optional streaming, shared provider error-classifier.
Distributed limiters (e.g. PostgreSQL-backed) are implemented by applications against the ratelimit.Limiter interface; by design, none ship here yet.
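For anyone writing their own limiter, the "lazy clock-based refill, no background goroutine" approach used by the in-memory token bucket can be sketched roughly as follows. This is an illustration of the general technique under stated assumptions, not the ratelimit package's code: `Bucket`, `NewBucket`, `TryTake`, and the injected `now` clock are hypothetical names, and the real Limiter/Reservation protocol is not modeled.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Bucket is a hypothetical token bucket, not the ratelimit package's type.
type Bucket struct {
	mu       sync.Mutex
	capacity float64          // maximum number of stored tokens
	rate     float64          // tokens added per second
	tokens   float64          // tokens available as of last
	last     time.Time        // time of the most recent refill
	now      func() time.Time // injectable clock, useful in tests
}

// NewBucket returns a bucket that starts full.
func NewBucket(capacity, rate float64, now func() time.Time) *Bucket {
	return &Bucket{capacity: capacity, rate: rate, tokens: capacity, last: now(), now: now}
}

// TryTake refills lazily from elapsed clock time, then tries to take n tokens.
// No goroutine or ticker runs between calls.
func (b *Bucket) TryTake(n float64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	t := b.now()
	if elapsed := t.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = t
	}
	if b.tokens < n {
		return false
	}
	b.tokens -= n
	return true
}

// demo drives the bucket with a fake clock and returns each TryTake result.
func demo() []bool {
	clock := time.Unix(0, 0)
	b := NewBucket(10, 5, func() time.Time { return clock })

	var results []bool
	results = append(results, b.TryTake(10)) // bucket starts full
	results = append(results, b.TryTake(1))  // empty, no time has passed
	clock = clock.Add(time.Second)           // one second passes: 5 tokens accrue
	results = append(results, b.TryTake(5))
	results = append(results, b.TryTake(1)) // empty again
	return results
}

func main() {
	fmt.Println(demo())
	// prints: [true false true false]
}
```

Because the refill is computed from timestamps at call time, the same idea carries over to a shared store: a distributed implementation could keep `tokens` and `last` in a database row and apply the same arithmetic inside a transaction.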