v0.6.0 — model listing & text token estimator

@dratner dratner released this 30 May 21:41
· 4 commits to main since this release
70aff1b

Two additive features, no core type breakage, no MAESTRO_DIVERGENCES.md rows. Adds maestro-cms as a third consumer of the toolkit alongside Maestro and Morris.

llms.ModelLister + per-provider LatestInFamily (ADR-0012)

Optional capability for listing a provider's catalog and detecting newer models in the same family as a pinned ID. Surfaces "newer model available, upgrade?" — never auto-updates.

lister, ok := client.(llms.ModelLister)
if !ok { return } // provider doesn't expose a list (e.g. future vLLM)

models, err := lister.ListModels(ctx)
if err != nil { return } // catalog unavailable; handle or log as appropriate
newer, found := anthropic.LatestInFamily(currentID, models)
if found {
    fmt.Printf("Newer model: %s (released %s)\n", newer.ID, newer.Created.Format("2006-01-02"))
}

// Or one-shot:
newer, found, err = client.LatestInFamily(ctx, currentID)
  • Anthropic — family claude-{opus|sonnet|haiku}, crosses generations (claude-3-5-sonnet-… and claude-sonnet-4-5-… are both claude-sonnet). Ordered by CreatedAt.
  • OpenAI — family = ID with -YYYY-MM-DD stripped. Self-filtering by family-prefix means embedding/image models in the catalog don't collide with gpt-* queries. Ordered by Created (Unix).
  • Google — family gemini-{pro|flash|nano|ultra}. genai exposes no created date, so ordered by parsed numeric version in the ID.
  • Ollama — ListModels only (local pulls have no canonical family). Created is local pull time, not provider release time.
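As an illustration of the OpenAI rule above, here is a minimal sketch of date-suffix family matching. It is not the toolkit's implementation: the Model struct, familyOf, and latestInFamily are hypothetical names, and the Created values are placeholders rather than real release timestamps.

```go
package main

import (
	"fmt"
	"regexp"
)

// Model is a hypothetical stand-in for a catalog entry; the toolkit's real type may differ.
type Model struct {
	ID      string
	Created int64 // Unix seconds
}

// dateSuffix matches a trailing -YYYY-MM-DD snapshot date.
var dateSuffix = regexp.MustCompile(`-\d{4}-\d{2}-\d{2}$`)

// familyOf strips a trailing snapshot date, so "gpt-4o-2024-08-06" becomes "gpt-4o".
// IDs without a date suffix are returned unchanged.
func familyOf(id string) string {
	return dateSuffix.ReplaceAllString(id, "")
}

// latestInFamily returns the newest model in currentID's family that is
// strictly newer than currentID. If currentID is absent from the catalog,
// any positive Created value in the family counts as newer (an assumption of this sketch).
func latestInFamily(currentID string, models []Model) (Model, bool) {
	family := familyOf(currentID)

	var currentCreated int64
	for _, m := range models {
		if m.ID == currentID {
			currentCreated = m.Created
		}
	}

	var best Model
	found := false
	for _, m := range models {
		// Exact family match keeps embedding/image models out of gpt-* queries.
		if familyOf(m.ID) != family || m.ID == currentID {
			continue
		}
		if m.Created > currentCreated && (!found || m.Created > best.Created) {
			best, found = m, true
		}
	}
	return best, found
}

func main() {
	// Created values are placeholders chosen only to order the entries.
	catalog := []Model{
		{ID: "gpt-4o-2024-05-13", Created: 1},
		{ID: "gpt-4o-2024-08-06", Created: 2},
		{ID: "text-embedding-3-small", Created: 3},
	}
	if newer, ok := latestInFamily("gpt-4o-2024-05-13", catalog); ok {
		fmt.Println("Newer model:", newer.ID)
	}
}
```

Because the comparison is on the stripped family string, the embedding model is ignored even though its Created value is the largest in the catalog.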

Permissive family parsing by design. Callers wanting major-version pinning filter the list themselves.
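One way a caller might pin to a major version is to narrow the catalog before asking for the latest in family. The sketch below assumes a simple ID-prefix filter is enough. The Model struct and pinToPrefix are hypothetical, and the IDs use placeholder suffixes rather than real catalog entries.

```go
package main

import (
	"fmt"
	"strings"
)

// Model is a hypothetical stand-in for a catalog entry; the toolkit's real type may differ.
type Model struct {
	ID string
}

// pinToPrefix keeps only the models whose ID starts with prefix.
// Passing the result to a LatestInFamily-style lookup restricts the
// upgrade suggestion to that major version.
func pinToPrefix(models []Model, prefix string) []Model {
	var kept []Model
	for _, m := range models {
		if strings.HasPrefix(m.ID, prefix) {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	// Placeholder IDs: the "-x" suffixes stand in for real snapshot suffixes.
	catalog := []Model{
		{ID: "claude-3-5-sonnet-x"},
		{ID: "claude-sonnet-4-5-x"},
		{ID: "claude-sonnet-5-0-x"},
	}
	// The trailing dash keeps a prefix like "claude-sonnet-4" from also matching "claude-sonnet-40-…".
	for _, m := range pinToPrefix(catalog, "claude-sonnet-4-") {
		fmt.Println(m.ID)
	}
}
```

The filtered slice can then be passed wherever the full catalog would have gone, so the family logic itself stays permissive.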

llms.EstimateTextTokens (ADR-0013)

Exported free function for budget-aware text chunking. Zero new dependencies.

budget := llms.EstimateTextTokens(s) // approx token count
// Directly assignable as func(string) int — e.g. for maestro-cms chunk injection.
  • Neutral bias (~4 chars/token, rune-counted) — intentionally distinct from the middleware TokenEstimator's high bias.
  • Two estimators, two purposes: over-reserving at the limiter is safe; over-estimating at chunk time wastes API calls. ADR-0013 makes the split binding.
  • Tokenizer-backed / model-aware variant deferred to a future ADR when a consumer needs the fidelity.
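To make the "neutral bias" concrete, here is a rough sketch of a rune-counted ~4 chars/token estimate driving a budget-aware chunker. It is a stand-in, not the library's code: estimateTokens and chunkByBudget are hypothetical, and the round-up behaviour is an assumption that llms.EstimateTextTokens may not share.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// estimateTokens is a hypothetical stand-in for llms.EstimateTextTokens:
// roughly 4 runes per token, rounded up (the rounding is an assumption).
func estimateTokens(s string) int {
	return (utf8.RuneCountInString(s) + 3) / 4
}

// chunkByBudget greedily packs whitespace-separated words into chunks whose
// estimated token count stays within budget. The estimator is injected as a
// plain func(string) int. A single word that exceeds the budget on its own
// is emitted as an oversized chunk rather than being split.
func chunkByBudget(text string, budget int, estimate func(string) int) []string {
	var chunks []string
	cur := ""
	for _, w := range strings.Fields(text) {
		candidate := w
		if cur != "" {
			candidate = cur + " " + w
		}
		if cur != "" && estimate(candidate) > budget {
			// Adding this word would overflow: close the current chunk.
			chunks = append(chunks, cur)
			cur = w
			continue
		}
		cur = candidate
	}
	if cur != "" {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	text := "aaaa bbbb cccc dddd"
	for i, c := range chunkByBudget(text, 3, estimateTokens) {
		fmt.Printf("chunk %d (%d est. tokens): %q\n", i, estimateTokens(c), c)
	}
}
```

An estimator biased high would produce more and smaller chunks for the same budget, which is the wasted-API-call cost the bullets above describe.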

Compatibility

All new surface is additive. Existing TokenEstimator, ChatClient, and rate-limiter behavior are unchanged. govulncheck is clean (golang.org/x/net was bumped to v0.55.0 to resolve GO-2026-5026 along the way).

Pre-1.0: v0.x minor versions may include breaking changes.

🤖 Generated with Claude Code