v0.1.0 — first extraction milestone

@dratner released this 16 May 01:26 · 40 commits to main since this release · 342528a

First extraction milestone: an app-neutral Go LLM/embedding toolkit shared by Maestro and Morris.

Included

  • llms — app-neutral chat + embedding interfaces; Message/ContentPart (tool calls/results as content parts), Usage (incl. cache tokens), classified ProviderError/LimitError; optional StreamingChatClient capability interface.
  • llms/testllm — deterministic, concurrency-safe chat + embedding fakes (recordings deep-copied; stable hash embeddings).
  • llms/ratelimit — Limiter/Reservation reservation protocol + in-memory token-bucket + concurrency limiter (lazy clock-based refill, no background goroutine; reconciliation contract).
  • llms/middleware — ChainChat/ChainEmbeddings (first arg outermost), DefaultEstimator, rate-limit middleware (reserve → release on cancellation-surviving ctx → commit actual usage).
  • llms/providers/anthropic — Anthropic chat: typed SDK-error classification, cache-token usage, raw tool-call params, strict-alternation translation.
  • llms/providers/openai — OpenAI embeddings: input-order/ID preserving, per-request dimension override, batch-limit validation.
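
To illustrate what "first arg outermost" means for the chain helpers, here is a minimal sketch. ChatFunc, Middleware, Chain, tag, and Trace are hypothetical stand-ins invented for this example; they are not the signatures of ChainChat or ChainEmbeddings, which are not reproduced in these notes.

```go
package main

import (
	"fmt"
	"strings"
)

// ChatFunc is a hypothetical stand-in for a chat client call; the real
// llms interfaces are not shown here.
type ChatFunc func(prompt string) string

// Middleware wraps a ChatFunc with extra behavior (assumed shape).
type Middleware func(next ChatFunc) ChatFunc

// Chain applies middlewares so that the first argument ends up outermost:
// Chain(base, a, b) behaves like a(b(base)).
func Chain(base ChatFunc, mws ...Middleware) ChatFunc {
	wrapped := base
	// Wrap from the last middleware inward so mws[0] is applied last
	// and therefore sees the call first.
	for i := len(mws) - 1; i >= 0; i-- {
		wrapped = mws[i](wrapped)
	}
	return wrapped
}

// tag returns a middleware that records when it is entered and exited.
func tag(name string, trace *[]string) Middleware {
	return func(next ChatFunc) ChatFunc {
		return func(prompt string) string {
			*trace = append(*trace, name+":in")
			out := next(prompt)
			*trace = append(*trace, name+":out")
			return out
		}
	}
}

// Trace runs a two-middleware chain and returns the observed call order.
func Trace() string {
	var trace []string
	base := func(prompt string) string {
		trace = append(trace, "provider")
		return "echo: " + prompt
	}
	chat := Chain(base, tag("outer", &trace), tag("inner", &trace))
	chat("hi")
	return strings.Join(trace, " ")
}

func main() {
	fmt.Println(Trace())
}
```

Under this ordering, a middleware listed first (for example a rate limiter) sees the request before, and the response after, every middleware listed later.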

Requirements

Go 1.26.3+ (patches the net and net/http stdlib advisories GO-2026-4971 / GO-2026-4918).

Not included (roadmap)

  • v0.2: retry / timeout / circuit-breaker middleware, metrics hook interfaces, richer error classification.
  • v0.3: Google + Ollama providers, optional streaming, shared provider error-classifier.

Distributed (e.g. PostgreSQL) limiters are implemented by applications against the ratelimit.Limiter interface; none ship here yet by design.
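
For readers writing their own limiter, the sketch below shows one way a reserve/commit/release cycle with lazy clock-based refill could work. It is an assumption-laden illustration: Bucket, Reservation, ErrLimited, and every method name here are hypothetical and do not reproduce the ratelimit.Limiter interface, whose exact signatures are not given in these notes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrLimited is a hypothetical error for a rejected reservation.
var ErrLimited = errors.New("rate limited")

// Bucket is an illustrative token bucket, not the package's actual type.
type Bucket struct {
	mu       sync.Mutex
	capacity float64
	rate     float64 // tokens added per second
	tokens   float64
	last     time.Time
	now      func() time.Time // injectable clock
}

func NewBucket(capacity, rate float64, now func() time.Time) *Bucket {
	return &Bucket{capacity: capacity, rate: rate, tokens: capacity, last: now(), now: now}
}

// refill tops the bucket up from elapsed clock time. It runs lazily on
// each call, so no background goroutine is needed. Caller holds b.mu.
func (b *Bucket) refill() {
	t := b.now()
	b.tokens = min(b.capacity, b.tokens+t.Sub(b.last).Seconds()*b.rate)
	b.last = t
}

// Reserve debits an estimated cost before the request is sent.
func (b *Bucket) Reserve(estimate float64) (*Reservation, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.refill()
	if b.tokens < estimate {
		return nil, ErrLimited
	}
	b.tokens -= estimate
	return &Reservation{b: b, estimate: estimate}, nil
}

// Tokens reports the current balance after a lazy refill.
func (b *Bucket) Tokens() float64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.refill()
	return b.tokens
}

// Reservation holds debited tokens until it is committed or released.
type Reservation struct {
	b        *Bucket
	estimate float64
	done     bool
}

// Commit reconciles the estimate with actual usage by refunding the
// difference (a negative refund charges the overrun).
func (r *Reservation) Commit(actual float64) { r.settle(r.estimate - actual) }

// Release refunds the whole estimate, e.g. after a cancelled call.
func (r *Reservation) Release() { r.settle(r.estimate) }

// settle applies a refund at most once per reservation.
func (r *Reservation) settle(refund float64) {
	r.b.mu.Lock()
	defer r.b.mu.Unlock()
	if r.done {
		return
	}
	r.done = true
	r.b.refill()
	r.b.tokens = min(r.b.capacity, r.b.tokens+refund)
}

// Demo walks through reserve, commit, rejection, refill, and release.
func Demo() string {
	clock := time.Unix(0, 0)
	b := NewBucket(100, 10, func() time.Time { return clock })

	r, _ := b.Reserve(60) // 100 -> 40
	r.Commit(45)          // refund 15 -> 55
	afterCommit := b.Tokens()

	_, err := b.Reserve(60) // 55 < 60, rejected

	clock = clock.Add(2 * time.Second) // next call lazily refills 20
	r2, _ := b.Reserve(60)             // 75 -> 15
	r2.Release()                       // back to 75
	return fmt.Sprintf("%v %v %v", afterCommit, err, b.Tokens())
}

func main() {
	fmt.Println(Demo())
}
```

A distributed limiter would presumably keep the same reserve, then commit-or-release shape while moving the balance and timestamp into shared storage; how that maps onto the real interface is left to the package documentation.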