feat(agent): LLM capability tiers - per-provider model selection per task (#585)
Merged
Introduces a small abstraction over Aero's provider registry so future
LLM features pick an appropriately-tiered model without hand-coding
per-feature model dictionaries.
## Why
Aero already has provider selection (Claude/OpenAI/Gemini/Zai with API
key auto-detection); each provider has a single `defaultModel` tuned
for Aero's multi-step agent loop. The upcoming "explain this
recommendation" feature needs the same provider selection but a CHEAPER
model — Claude Opus is overkill for a single-shot 4-paragraph
summary. Without an abstraction the obvious move is to hard-code an
`EXPLAIN_MODELS` dictionary, then a `BRIEF_MODELS`, then a
`CLASSIFY_MODELS` — N feature × M provider dictionaries scattered
across the codebase, each one a separate place to remember when models
get bumped.
## What
Three capability tiers describe what an LLM call needs:
- `agent`: multi-step tool use, premium models (Aero's loop)
- `analyze`: single-shot synthesis, mid-tier models (explain-this, content brief generation)
- `classify`: short structured judgments, cheapest fast models (semantic page-coverage match, domain classification)
Each provider declares its model per tier in `PROVIDER_MODELS`:
claude: agent='claude-opus-4-7' analyze='claude-sonnet-4-6' classify='claude-haiku-4-5'
openai: agent='gpt-5.1' analyze='gpt-5-mini' classify='gpt-5-nano'
gemini: agent='gemini-2.5-flash' analyze='gemini-2.5-flash' classify='gemini-2.5-flash-lite'
zai: agent='glm-5.1' analyze='glm-5-turbo' classify='glm-5-turbo'
New primitive `resolveModelForCapability(provider, capability, override?)`:
features declare their tier; the project's configured provider decides
which actual model fills it. Existing `resolveModelForProvider(provider)`
is preserved as a thin wrapper that delegates to the `agent` capability
— backward-compatible, behavior-preserving for every existing caller.
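
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the type names `ProviderId` and `LlmCapability` and the plain-string return type are assumptions, and the pi-ai catalog check (which, per the Tests section, throws on an unknown model id) is omitted. The model ids are copied from the `PROVIDER_MODELS` listing.

```typescript
// Sketch only: type names and the string return type are assumptions,
// not the PR's real declarations.
type ProviderId = "claude" | "openai" | "gemini" | "zai";
type LlmCapability = "agent" | "analyze" | "classify";

// Model ids copied from the PROVIDER_MODELS listing in this description.
const PROVIDER_MODELS: Record<ProviderId, Record<LlmCapability, string>> = {
  claude: { agent: "claude-opus-4-7", analyze: "claude-sonnet-4-6", classify: "claude-haiku-4-5" },
  openai: { agent: "gpt-5.1", analyze: "gpt-5-mini", classify: "gpt-5-nano" },
  gemini: { agent: "gemini-2.5-flash", analyze: "gemini-2.5-flash", classify: "gemini-2.5-flash-lite" },
  zai: { agent: "glm-5.1", analyze: "glm-5-turbo", classify: "glm-5-turbo" },
};

// A feature names its tier; the provider's table supplies the model.
// A caller-supplied override wins (the per-call escape hatch).
function resolveModelForCapability(
  provider: ProviderId,
  capability: LlmCapability,
  override?: string,
): string {
  return override ?? PROVIDER_MODELS[provider][capability];
}

// Existing entry point kept as a thin wrapper over the `agent` tier.
function resolveModelForProvider(provider: ProviderId): string {
  return resolveModelForCapability(provider, "agent");
}
```

Under these assumptions, a feature such as "explain this" would call `resolveModelForCapability(provider, "analyze")` and never mention a model id itself, so a model bump touches only the one table.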
## Single source of truth + drift detection
`AGENT_PROVIDERS[id].defaultModel` now sources its value from
`PROVIDER_MODELS[id].agent` at module-load time, so the DTO and the CLI
display render exactly what they did before.
`validateAgentProviderRegistry()` runs at module load AND in tests:
1. Every (provider, capability) pair MUST resolve to a real pi-ai
model — catches model-id typos and pi-ai catalog drift at import
time, not at first request.
2. Every provider MUST declare every capability — adding a new
capability to `LlmCapabilities` without updating all providers
fails CI loudly.
3. `AGENT_PROVIDERS[id].defaultModel === PROVIDER_MODELS[id].agent`
— drift between the two surfaces would silently mislead the
dashboard / CLI provider picker.
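
The three checks above might look something like the sketch below. It is an illustration under stated assumptions: the real `validateAgentProviderRegistry()` takes no arguments here as far as this description shows, so the parameters, the helper `findRegistryDrift`, the tuple shape given to `LlmCapabilities`, and the `catalog` set standing in for the pi-ai model catalog are all hypothetical.

```typescript
// Sketch only: parameter shapes and the catalog stand-in are assumptions.
const LlmCapabilities = ["agent", "analyze", "classify"] as const;
type LlmCapability = (typeof LlmCapabilities)[number];

type ProviderModels = Record<string, Partial<Record<LlmCapability, string>>>;
type AgentProviders = Record<string, { defaultModel: string }>;

// Collects every drift problem instead of stopping at the first one.
function findRegistryDrift(
  models: ProviderModels,
  providers: AgentProviders,
  catalog: ReadonlySet<string>, // stand-in for the pi-ai model catalog
): string[] {
  const problems: string[] = [];
  for (const [id, tiers] of Object.entries(models)) {
    for (const capability of LlmCapabilities) {
      const model = tiers[capability];
      if (model === undefined) {
        // Check 2: every provider declares every capability.
        problems.push(`${id}: missing capability "${capability}"`);
      } else if (!catalog.has(model)) {
        // Check 1: every declared model id exists in the catalog.
        problems.push(`${id}.${capability}: unknown model "${model}"`);
      }
    }
    // Check 3: the displayed default mirrors the agent tier.
    const declared = providers[id]?.defaultModel;
    if (declared !== tiers.agent) {
      problems.push(`${id}: defaultModel "${declared}" differs from agent tier "${tiers.agent}"`);
    }
  }
  return problems;
}

// Throwing variant, suitable for calling at module load.
function validateAgentProviderRegistry(
  models: ProviderModels,
  providers: AgentProviders,
  catalog: ReadonlySet<string>,
): void {
  const problems = findRegistryDrift(models, providers, catalog);
  if (problems.length > 0) {
    throw new Error(`Agent provider registry drift:\n${problems.join("\n")}`);
  }
}
```

Returning a list from the helper and throwing in a separate wrapper is a design choice of this sketch: tests can assert on individual drift modes, while the import-time call still fails fast.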
## What's user-visible
Nothing. Aero continues to use Opus / GPT-5.1 / Gemini Flash / GLM 5.1
exactly as before (the `agent` capability tier IS the historical
default). DTO shape unchanged. CLI display unchanged. No config
migration needed.
The abstraction's only consumer today is the existing
`resolveModelForProvider` wrapper; the next PR (Phase 1 "explain this"
feature) is the first feature to use `analyze` directly.
## Tests
10 new tests in `agent-providers.test.ts`:
- Every provider declares every capability tier
- `AGENT_PROVIDERS.defaultModel` mirrors `PROVIDER_MODELS[id].agent`
- Every (provider, capability) pair resolves in pi-ai
- `validateAgentProviderRegistry` catches all three drift modes
- `resolveModelForCapability` returns the correct model per tier
- Caller-supplied model override works (per-call escape hatch)
- Throws on unknown model id
- `resolveModelForProvider` is a behavior-preserving wrapper
Verification:
- `pnpm -r typecheck`: 0 errors
- `pnpm -r lint`: 0 errors
- 2764/2764 tests pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
arberx added a commit that referenced this pull request on May 18, 2026
…586)
Adds an on-demand LLM rationale for each content recommendation card, cached per (project, target_ref, prompt_version) so repeat clicks are free. The heuristic classifier still produces the structured recommendation; this layer explains the reasoning and suggests concrete next steps in natural language.
Backend:
- New `recommendation_explanations` table + migration v62 (idx on project_id + targetRef + promptVersion unique).
- `RecommendationExplanationDto` + `RecommendationExplainRequest` schemas in @ainyc/canonry-contracts.
- `GET /projects/:name/content/recommendations/:targetRef/analysis` cache-only read (404 when no cached explanation exists).
- `POST /projects/:name/content/recommendations/:targetRef/analyze` returns cached row or invokes injected `ExplainContentRecommendationFn`; supports `provider` / `model` / `forceRefresh` overrides.
LLM wiring (api-routes stays LLM-agnostic):
- `ExplainContentRecommendationFn` is dependency-injected via `ApiRoutesOptions.explainContentRecommendation`. Missing implementation → 503 PROVIDER_ERROR with operator-friendly message.
- `packages/canonry` ships the pi-ai implementation (`createRecommendationExplainer`) using the new `analyze` capability tier from PR #585: Claude → sonnet-4-6, OpenAI → gpt-5-mini, Gemini → 2.5-flash, Zai → glm-5-turbo.
- Provider selection mirrors Aero: caller override → first-configured by priority → 502 PROVIDER_ERROR. Unknown override → 400.
- pi-ai dollar cost converted to `costMillicents` (×100,000, rounded int) so totals stay drift-free.
Tests: 29 new (route handlers + helper unit tests with mocked `complete()`: provider selection, override validation, cache hit/miss, forceRefresh overwrite, cost conversion, empty-response guard).
Closes phase 1 of the LLM-augmented recommendation engine. Frontend "Why this?" panel ships in a follow-up PR.
Co-authored-by: Claw (AINYC Agent) <arberx@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
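
The `costMillicents` conversion mentioned in the commit above (dollars × 100,000, rounded to an integer) can be illustrated with a short sketch. The function name `dollarsToMillicents` and the input validation are assumptions for illustration; only the factor and the rounding come from the commit message.

```typescript
// Sketch only: the helper name and the range check are hypothetical.
// 1 dollar = 100 cents = 100,000 millicents.
const MILLICENTS_PER_DOLLAR = 100000;

function dollarsToMillicents(dollars: number): number {
  if (!Number.isFinite(dollars) || dollars < 0) {
    throw new RangeError(`invalid dollar cost: ${dollars}`);
  }
  // Round once at the boundary so everything stored downstream is an integer.
  return Math.round(dollars * MILLICENTS_PER_DOLLAR);
}

// Summing floats accumulates representation error; summing integers does not.
const floatTotal = 0.1 + 0.2; // not exactly 0.3 in IEEE 754 doubles
const intTotal = dollarsToMillicents(0.1) + dollarsToMillicents(0.2); // 30000
```

Storing the integer and converting back to dollars only for display is what keeps aggregated totals exact.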
Summary
Introduces a small abstraction over Aero's provider registry so future LLM features pick an appropriately-tiered model without hand-coding per-feature model dictionaries. No user-visible change — Aero continues to use Opus/GPT-5.1/Flash/GLM-5.1 exactly as before. The abstraction's only consumer today is the existing `resolveModelForProvider` wrapper; the next PR (Phase 1 "explain this" feature) is the first feature to use `analyze` directly.
Tests
10 new tests in `agent-providers.test.ts`. Full suite: 2764/2764 pass.
Stacks
Phase 1 "explain this recommendation" feature lands next, using the `analyze` capability.
🤖 Generated with Claude Code