feat(agent): LLM capability tiers — per-provider model selection per task by arberx · Pull Request #585 · AINYC/canonry

arberx · 2026-05-18T01:11:38Z

Summary

Introduces a small abstraction over Aero's provider registry so future LLM features pick an appropriately-tiered model without hand-coding per-feature model dictionaries. No user-visible change: Aero continues to use Claude Opus / GPT-5.1 / Gemini 2.5 Flash / GLM-5.1 exactly as before. The abstraction's only consumer today is the existing `resolveModelForProvider` wrapper; the next PR (the Phase 1 "explain this" feature) is the first to use the `analyze` tier directly.

What

Three capability tiers describe what an LLM call needs:

| Tier | Use | Claude | OpenAI | Gemini | Zai |
|---|---|---|---|---|---|
| `agent` | Multi-step tool use, premium models (Aero's loop) | claude-opus-4-7 | gpt-5.1 | gemini-2.5-flash | glm-5.1 |
| `analyze` | Single-shot synthesis, mid-tier (explain-this, content briefs) | claude-sonnet-4-6 | gpt-5-mini | gemini-2.5-flash | glm-5-turbo |
| `classify` | Short structured judgments, cheapest (page-coverage, classification) | claude-haiku-4-5 | gpt-5-nano | gemini-2.5-flash-lite | glm-5-turbo |
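A minimal TypeScript sketch of how the tier table could be encoded, assuming `LlmCapabilities` is an object type with one model-id field per tier and that the provider ids are `claude`, `openai`, `gemini` and `zai` (the ids used in the commit message). The `ProviderId` union and the exact shape of `PROVIDER_MODELS` are assumptions for illustration, not the code in this PR.

```typescript
// Assumed shape: one model id per capability tier. Adding a field here
// forces every PROVIDER_MODELS entry below to declare it.
interface LlmCapabilities {
  agent: string; // multi-step tool use, premium models (Aero's loop)
  analyze: string; // single-shot synthesis, mid-tier
  classify: string; // short structured judgments, cheapest
}

// Hypothetical provider-id union; the ids match the commit message.
type ProviderId = 'claude' | 'openai' | 'gemini' | 'zai';

// Per-provider defaults copied from the tier table.
const PROVIDER_MODELS: Record<ProviderId, LlmCapabilities> = {
  claude: { agent: 'claude-opus-4-7', analyze: 'claude-sonnet-4-6', classify: 'claude-haiku-4-5' },
  openai: { agent: 'gpt-5.1', analyze: 'gpt-5-mini', classify: 'gpt-5-nano' },
  gemini: { agent: 'gemini-2.5-flash', analyze: 'gemini-2.5-flash', classify: 'gemini-2.5-flash-lite' },
  zai: { agent: 'glm-5.1', analyze: 'glm-5-turbo', classify: 'glm-5-turbo' },
};
```

Typing the record as `Record<ProviderId, LlmCapabilities>` means a provider entry that omits a tier already fails `tsc`; the runtime check in `validateAgentProviderRegistry()` covers the cases the compiler cannot see, such as a mistyped model id.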

New primitive `resolveModelForCapability(provider, capability, override?)`: features declare their tier; the project's configured provider decides which actual model fills it. Existing `resolveModelForProvider(provider)` is preserved as a thin wrapper that delegates to the `agent` capability — backward-compatible, behavior-preserving for every existing caller.
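A sketch of the resolver and its backward-compatible wrapper, under stated assumptions: it returns a model-id string rather than a pi-ai model object, `KNOWN_MODELS` is a hypothetical stand-in for pi-ai's per-provider catalog, and the registry is trimmed to two providers to keep the example short.

```typescript
interface LlmCapabilities {
  agent: string;
  analyze: string;
  classify: string;
}
type LlmCapability = keyof LlmCapabilities;
type ProviderId = 'claude' | 'openai';

// Trimmed to two providers for brevity; values come from the tier table.
const PROVIDER_MODELS: Record<ProviderId, LlmCapabilities> = {
  claude: { agent: 'claude-opus-4-7', analyze: 'claude-sonnet-4-6', classify: 'claude-haiku-4-5' },
  openai: { agent: 'gpt-5.1', analyze: 'gpt-5-mini', classify: 'gpt-5-nano' },
};

// Hypothetical stand-in for pi-ai's per-provider model catalog.
const KNOWN_MODELS: Record<ProviderId, ReadonlySet<string>> = {
  claude: new Set<string>(['claude-opus-4-7', 'claude-sonnet-4-6', 'claude-haiku-4-5']),
  openai: new Set<string>(['gpt-5.1', 'gpt-5-mini', 'gpt-5-nano']),
};

function resolveModelForCapability(
  provider: ProviderId,
  capability: LlmCapability,
  override?: string,
): string {
  // A caller-supplied model id wins over the tier default (per-call escape hatch).
  const modelId = override ?? PROVIDER_MODELS[provider][capability];
  // Reject ids the provider's catalog does not know.
  if (!KNOWN_MODELS[provider].has(modelId)) {
    throw new Error(`Unknown model "${modelId}" for provider "${provider}"`);
  }
  return modelId;
}

// Backward-compatible wrapper: existing callers keep getting the agent-tier model.
function resolveModelForProvider(provider: ProviderId): string {
  return resolveModelForCapability(provider, 'agent');
}
```

A feature such as the upcoming explainer would then call `resolveModelForCapability(provider, 'analyze')` and never name a concrete model, so a model bump touches only the registry.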

Single source of truth + drift detection

`AGENT_PROVIDERS[id].defaultModel` now takes its value from `PROVIDER_MODELS[id].agent` at module-load time, so the DTO and the CLI display render exactly what they did before. `validateAgentProviderRegistry()` runs at module load AND in tests, and enforces three invariants:

1. Every (provider, capability) pair MUST resolve to a real pi-ai model: this catches model-id typos and pi-ai catalog drift at import time, not at first request.
2. Every provider MUST declare every capability: adding a new capability to `LlmCapabilities` without updating all providers fails CI loudly.
3. `AGENT_PROVIDERS[id].defaultModel === PROVIDER_MODELS[id].agent`: drift between the two surfaces would silently mislead the dashboard / CLI provider picker.
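The three drift checks can be sketched as one pass over the registry. This is an illustration, not the PR's code: `PI_AI_CATALOG` is a hypothetical stand-in for pi-ai's model lookup, `AgentProviderInfo` is an assumed shape, and taking the registries as parameters (so tests can feed in deliberately broken copies) is a design choice of this sketch.

```typescript
interface LlmCapabilities {
  agent: string;
  analyze: string;
  classify: string;
}
type ProviderId = 'claude' | 'openai' | 'gemini' | 'zai';

// Runtime list of tiers (assumed); the annotation only admits real tier names.
const CAPABILITIES: ReadonlyArray<keyof LlmCapabilities> = ['agent', 'analyze', 'classify'];

const PROVIDER_MODELS: Record<ProviderId, LlmCapabilities> = {
  claude: { agent: 'claude-opus-4-7', analyze: 'claude-sonnet-4-6', classify: 'claude-haiku-4-5' },
  openai: { agent: 'gpt-5.1', analyze: 'gpt-5-mini', classify: 'gpt-5-nano' },
  gemini: { agent: 'gemini-2.5-flash', analyze: 'gemini-2.5-flash', classify: 'gemini-2.5-flash-lite' },
  zai: { agent: 'glm-5.1', analyze: 'glm-5-turbo', classify: 'glm-5-turbo' },
};

// Assumed shape of an AGENT_PROVIDERS entry; only defaultModel matters here.
interface AgentProviderInfo {
  label: string;
  defaultModel: string;
}

// defaultModel is read from the agent tier at module load, as the PR describes.
const AGENT_PROVIDERS: Record<ProviderId, AgentProviderInfo> = {
  claude: { label: 'Claude', defaultModel: PROVIDER_MODELS.claude.agent },
  openai: { label: 'OpenAI', defaultModel: PROVIDER_MODELS.openai.agent },
  gemini: { label: 'Gemini', defaultModel: PROVIDER_MODELS.gemini.agent },
  zai: { label: 'Zai', defaultModel: PROVIDER_MODELS.zai.agent },
};

// Hypothetical stand-in for pi-ai's catalog, keyed as "provider/modelId".
const PI_AI_CATALOG: ReadonlySet<string> = new Set([
  'claude/claude-opus-4-7', 'claude/claude-sonnet-4-6', 'claude/claude-haiku-4-5',
  'openai/gpt-5.1', 'openai/gpt-5-mini', 'openai/gpt-5-nano',
  'gemini/gemini-2.5-flash', 'gemini/gemini-2.5-flash-lite',
  'zai/glm-5.1', 'zai/glm-5-turbo',
]);

// Returns a list of problems; an empty list means the registry is consistent.
function validateAgentProviderRegistry(
  models: Record<ProviderId, Partial<LlmCapabilities>> = PROVIDER_MODELS,
  providers: Record<ProviderId, AgentProviderInfo> = AGENT_PROVIDERS,
  catalog: ReadonlySet<string> = PI_AI_CATALOG,
): string[] {
  const problems: string[] = [];
  for (const id of Object.keys(models) as ProviderId[]) {
    for (const cap of CAPABILITIES) {
      const modelId = models[id][cap];
      // Check 2: every provider declares every capability.
      if (!modelId) {
        problems.push(`${id}: missing "${cap}" capability`);
        continue;
      }
      // Check 1: every (provider, capability) pair resolves to a catalog model.
      if (!catalog.has(`${id}/${modelId}`)) {
        problems.push(`${id}.${cap}: "${modelId}" not found in model catalog`);
      }
    }
    // Check 3: the provider picker's defaultModel mirrors the agent tier.
    if (providers[id].defaultModel !== models[id].agent) {
      problems.push(`${id}: defaultModel "${providers[id].defaultModel}" != agent tier "${models[id].agent}"`);
    }
  }
  return problems;
}

// Run at module load so drift fails at import time, not at first request.
const registryProblems = validateAgentProviderRegistry();
if (registryProblems.length > 0) {
  throw new Error(`Provider registry drift:\n${registryProblems.join('\n')}`);
}
```

Returning a list instead of throwing on the first problem lets one failing CI run report every drifted pair at once; the module-load call turns a non-empty list into a hard failure.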

Tests

10 new tests in `agent-providers.test.ts`. Full suite: 2764/2764 pass.

Stacks

Phase 1 "explain this recommendation" feature lands next, using the `analyze` capability.

🤖 Generated with Claude Code

…task

Introduces a small abstraction over Aero's provider registry so future LLM features pick an appropriately-tiered model without hand-coding per-feature model dictionaries.

## Why

Aero already has provider selection (Claude/OpenAI/Gemini/Zai with API key auto-detection); each provider has a single `defaultModel` tuned for Aero's multi-step agent loop. The upcoming "explain this recommendation" feature needs the same provider selection but a CHEAPER model: Claude Opus is overkill for a single-shot 4-paragraph summary.

Without an abstraction the obvious move is to hard-code an `EXPLAIN_MODELS` dictionary, then a `BRIEF_MODELS`, then a `CLASSIFY_MODELS`: N feature × M provider dictionaries scattered across the codebase, each one a separate place to remember when models get bumped.

## What

Three capability tiers describe what an LLM call needs:

- agent: multi-step tool use, premium models (Aero's loop)
- analyze: single-shot synthesis, mid-tier models (explain-this, content brief generation)
- classify: short structured judgments, cheapest fast models (semantic page-coverage match, domain classification)

Each provider declares its model per tier in `PROVIDER_MODELS`:

- claude: agent='claude-opus-4-7' analyze='claude-sonnet-4-6' classify='claude-haiku-4-5'
- openai: agent='gpt-5.1' analyze='gpt-5-mini' classify='gpt-5-nano'
- gemini: agent='gemini-2.5-flash' analyze='gemini-2.5-flash' classify='gemini-2.5-flash-lite'
- zai: agent='glm-5.1' analyze='glm-5-turbo' classify='glm-5-turbo'

New primitive `resolveModelForCapability(provider, capability, override?)`: features declare their tier; the project's configured provider decides which actual model fills it. Existing `resolveModelForProvider(provider)` is preserved as a thin wrapper that delegates to the `agent` capability (backward-compatible, behavior-preserving for every existing caller).

## Single source of truth + drift detection

`AGENT_PROVIDERS[id].defaultModel` now takes its value from `PROVIDER_MODELS[id].agent` at module-load time, so the DTO and the CLI display render exactly what they did before. `validateAgentProviderRegistry()` runs at module load AND in tests:

1. Every (provider, capability) pair MUST resolve to a real pi-ai model: catches model-id typos and pi-ai catalog drift at import time, not at first request.
2. Every provider MUST declare every capability: adding a new capability to `LlmCapabilities` without updating all providers fails CI loudly.
3. `AGENT_PROVIDERS[id].defaultModel === PROVIDER_MODELS[id].agent`: drift between the two surfaces would silently mislead the dashboard / CLI provider picker.

## What's user-visible

Nothing. Aero continues to use Opus / GPT-5.1 / Gemini Flash / GLM 5.1 exactly as before (the `agent` capability tier IS the historical default). DTO shape unchanged. CLI display unchanged. No config migration needed.

The abstraction's only consumer today is the existing `resolveModelForProvider` wrapper; the next PR (the Phase 1 "explain this" feature) is the first to use `analyze` directly.

## Tests

10 new tests in `agent-providers.test.ts`:

- Every provider declares every capability tier
- `AGENT_PROVIDERS.defaultModel` mirrors `PROVIDER_MODELS[id].agent`
- Every (provider, capability) pair resolves in pi-ai
- `validateAgentProviderRegistry` catches all three drift modes
- `resolveModelForCapability` returns the correct model per tier
- Caller-supplied model override works (per-call escape hatch)
- Throws on unknown model id
- `resolveModelForProvider` is a behavior-preserving wrapper

Verification:

- `pnpm -r typecheck`: 0 errors
- `pnpm -r lint`: 0 errors
- 2764/2764 tests pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…586)

Adds an on-demand LLM rationale for each content recommendation card, cached per (project, target_ref, prompt_version) so repeat clicks are free. The heuristic classifier still produces the structured recommendation; this layer explains the reasoning and suggests concrete next steps in natural language.

Backend:

- New `recommendation_explanations` table + migration v62 (idx on project_id + targetRef + promptVersion unique).
- `RecommendationExplanationDto` + `RecommendationExplainRequest` schemas in @ainyc/canonry-contracts.
- `GET /projects/:name/content/recommendations/:targetRef/analysis`: cache-only read (404 when no cached explanation exists).
- `POST /projects/:name/content/recommendations/:targetRef/analyze`: returns cached row or invokes injected `ExplainContentRecommendationFn`; supports `provider` / `model` / `forceRefresh` overrides.

LLM wiring (api-routes stays LLM-agnostic):

- `ExplainContentRecommendationFn` is dependency-injected via `ApiRoutesOptions.explainContentRecommendation`. Missing implementation → 503 PROVIDER_ERROR with operator-friendly message.
- `packages/canonry` ships the pi-ai implementation (`createRecommendationExplainer`) using the new `analyze` capability tier from PR #585: Claude → sonnet-4-6, OpenAI → gpt-5-mini, Gemini → 2.5-flash, Zai → glm-5-turbo.
- Provider selection mirrors Aero: caller override → first-configured by priority → 502 PROVIDER_ERROR. Unknown override → 400.
- pi-ai dollar cost converted to `costMillicents` (×100,000, rounded int) so totals stay drift-free.

Tests: 29 new (route handlers + helper unit tests with mocked `complete()`: provider selection, override validation, cache hit/miss, forceRefresh overwrite, cost conversion, empty-response guard).

Closes phase 1 of the LLM-augmented recommendation engine. Frontend "Why this?" panel ships in a follow-up PR.

Co-authored-by: Claw (AINYC Agent) <arberx@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
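Two pieces of #586 lend themselves to a short sketch: provider selection (caller override, then the first configured provider by priority, else 502; an unknown override is 400) and the dollar-to-millicent conversion (1 USD = 100 cents = 100,000 millicents). The priority order, the `ProviderSelectionError` class and the function names `selectProvider` / `toMillicents` are hypothetical, chosen for illustration only.

```typescript
type ProviderId = 'claude' | 'openai' | 'gemini' | 'zai';

// Assumed priority order; the PR says "first-configured by priority" but does
// not list the order.
const PROVIDER_PRIORITY: readonly ProviderId[] = ['claude', 'openai', 'gemini', 'zai'];

// Carries the HTTP status the route would map the failure to.
class ProviderSelectionError extends Error {
  status: 400 | 502;
  constructor(status: 400 | 502, message: string) {
    super(message);
    this.status = status;
  }
}

function isProviderId(value: string): value is ProviderId {
  return (PROVIDER_PRIORITY as readonly string[]).includes(value);
}

// caller override -> first configured provider by priority -> 502.
function selectProvider(configured: ReadonlySet<ProviderId>, override?: string): ProviderId {
  if (override !== undefined) {
    // An override naming an unknown provider is a client error.
    if (!isProviderId(override)) {
      throw new ProviderSelectionError(400, `Unknown provider "${override}"`);
    }
    // The PR does not say whether an override must also be configured; this sketch accepts it.
    return override;
  }
  const first = PROVIDER_PRIORITY.find((id) => configured.has(id));
  if (first === undefined) {
    throw new ProviderSelectionError(502, 'No LLM provider is configured');
  }
  return first;
}

// pi-ai reports cost in dollars; 1 USD = 100 cents = 100,000 millicents.
// Rounding to an integer keeps stored totals exact when summed.
function toMillicents(costUsd: number): number {
  return Math.round(costUsd * 100_000);
}
```

Storing integer millicents rather than floating-point dollars means summing many calls never accumulates binary rounding error, which is what "drift-free" totals refers to.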

arberx merged commit 78d6d98 into main May 18, 2026
9 checks passed

arberx deleted the feat/llm-capability-tiers branch May 18, 2026 01:11

arberx mentioned this pull request May 18, 2026: feat(content): LLM-backed "Why this?" explainer for recommendations #586 (Merged, 4 tasks)

arberx mentioned this pull request May 18, 2026: feat(web): "Why this?" panel for content recommendations #587 (Draft, 3 tasks)


arberx merged 1 commit into main from feat/llm-capability-tiers

arberx commented May 18, 2026
