feat(agent): LLM capability tiers — per-provider model selection per task#585

Merged: arberx merged 1 commit into `main` from `feat/llm-capability-tiers` on May 18, 2026

@arberx arberx commented May 18, 2026

Summary

Introduces a small abstraction over Aero's provider registry so future LLM features pick an appropriately-tiered model without hand-coding per-feature model dictionaries. No user-visible change — Aero continues to use Opus/GPT-5.1/Flash/GLM-5.1 exactly as before. The abstraction's only consumer today is the existing `resolveModelForProvider` wrapper; the next PR (Phase 1 "explain this" feature) is the first feature to use `analyze` directly.

What

Three capability tiers describe what an LLM call needs:

| Tier | Use | Default models per provider |
|---|---|---|
| `agent` | Multi-step tool use, premium models (Aero's loop) | claude-opus-4-7, gpt-5.1, gemini-2.5-flash, glm-5.1 |
| `analyze` | Single-shot synthesis, mid-tier (explain-this, content briefs) | claude-sonnet-4-6, gpt-5-mini, gemini-2.5-flash, glm-5-turbo |
| `classify` | Short structured judgments, cheapest (page-coverage, classification) | claude-haiku-4-5, gpt-5-nano, gemini-2.5-flash-lite, glm-5-turbo |
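A minimal sketch of how a `PROVIDER_MODELS` table could be typed so that every provider must fill every tier. The model ids come from the table above; the `ProviderId` union and the exact shape of `LlmCapabilities` are assumptions, not the PR's actual definitions.

```typescript
// Assumed provider ids; the real union type may differ.
type ProviderId = "claude" | "openai" | "gemini" | "zai";

// Assumed shape: one model id per capability tier.
interface LlmCapabilities {
  agent: string; // multi-step tool use (Aero's loop)
  analyze: string; // single-shot synthesis
  classify: string; // short structured judgments
}

// Omitting a tier (or a provider) is a compile-time error with this annotation.
const PROVIDER_MODELS: Record<ProviderId, LlmCapabilities> = {
  claude: { agent: "claude-opus-4-7", analyze: "claude-sonnet-4-6", classify: "claude-haiku-4-5" },
  openai: { agent: "gpt-5.1", analyze: "gpt-5-mini", classify: "gpt-5-nano" },
  gemini: { agent: "gemini-2.5-flash", analyze: "gemini-2.5-flash", classify: "gemini-2.5-flash-lite" },
  zai: { agent: "glm-5.1", analyze: "glm-5-turbo", classify: "glm-5-turbo" },
};

console.log(PROVIDER_MODELS.openai.analyze); // gpt-5-mini
```

The type only guards the shape of the table; whether each id actually exists in the model catalog still needs a runtime check.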

New primitive `resolveModelForCapability(provider, capability, override?)`: features declare their tier; the project's configured provider decides which actual model fills it. Existing `resolveModelForProvider(provider)` is preserved as a thin wrapper that delegates to the `agent` capability — backward-compatible, behavior-preserving for every existing caller.
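A hedged sketch of the resolver and its backward-compatible wrapper. It returns a bare model id and checks it against a hard-coded `KNOWN_MODELS` set that stands in for the pi-ai catalog; the real code presumably returns a pi-ai model object, and `KNOWN_MODELS` plus the error message are illustrative assumptions.

```typescript
type ProviderId = "claude" | "openai" | "gemini" | "zai";
type LlmCapability = "agent" | "analyze" | "classify";

const PROVIDER_MODELS: Record<ProviderId, Record<LlmCapability, string>> = {
  claude: { agent: "claude-opus-4-7", analyze: "claude-sonnet-4-6", classify: "claude-haiku-4-5" },
  openai: { agent: "gpt-5.1", analyze: "gpt-5-mini", classify: "gpt-5-nano" },
  gemini: { agent: "gemini-2.5-flash", analyze: "gemini-2.5-flash", classify: "gemini-2.5-flash-lite" },
  zai: { agent: "glm-5.1", analyze: "glm-5-turbo", classify: "glm-5-turbo" },
};

// Hypothetical stand-in for the pi-ai model catalog (not pi-ai's API).
const KNOWN_MODELS: ReadonlySet<string> = new Set([
  "claude-opus-4-7", "claude-sonnet-4-6", "claude-haiku-4-5",
  "gpt-5.1", "gpt-5-mini", "gpt-5-nano",
  "gemini-2.5-flash", "gemini-2.5-flash-lite",
  "glm-5.1", "glm-5-turbo",
]);

function resolveModelForCapability(
  provider: ProviderId,
  capability: LlmCapability,
  override?: string,
): string {
  // A caller-supplied model is a per-call escape hatch that bypasses the tier.
  const modelId = override ?? PROVIDER_MODELS[provider][capability];
  if (!KNOWN_MODELS.has(modelId)) {
    throw new Error(`Unknown model "${modelId}" for provider "${provider}"`);
  }
  return modelId;
}

// Backward-compatible wrapper: the `agent` tier is the historical default.
function resolveModelForProvider(provider: ProviderId): string {
  return resolveModelForCapability(provider, "agent");
}

console.log(resolveModelForCapability("claude", "analyze")); // claude-sonnet-4-6
console.log(resolveModelForProvider("openai")); // gpt-5.1
```

Because the wrapper routes through the same function, existing callers of `resolveModelForProvider` keep their old model while new features only have to name a tier.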

Single source of truth + drift detection

`AGENT_PROVIDERS[id].defaultModel` now sources its value from `PROVIDER_MODELS[id].agent` at module-load time (keeps the DTO + CLI display rendering exactly the same thing they did before). `validateAgentProviderRegistry()` runs at module load AND in tests:

  1. Every (provider, capability) pair MUST resolve to a real pi-ai model — catches model-id typos and pi-ai catalog drift at import time, not at first request.
  2. Every provider MUST declare every capability — adding a new capability to `LlmCapabilities` without updating all providers fails CI loudly.
  3. `AGENT_PROVIDERS[id].defaultModel === PROVIDER_MODELS[id].agent` — drift between the two surfaces would silently mislead the dashboard / CLI provider picker.
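The three checks, together with deriving `defaultModel` from the `agent` tier, could be sketched as follows. Here `validateAgentProviderRegistry` takes the registries and a catalog as parameters so tests can feed it deliberately broken tables; that parameterization, the `PI_AI_CATALOG` stand-in set, and the problem-string format are assumptions rather than the PR's code.

```typescript
const CAPABILITIES = ["agent", "analyze", "classify"] as const;
type LlmCapability = (typeof CAPABILITIES)[number];

// Provider id -> model id per tier. Partial so a test can model a missing tier.
type ModelTable = Record<string, Partial<Record<LlmCapability, string>>>;

interface AgentProvider {
  defaultModel: string;
}

const PROVIDER_MODELS: ModelTable = {
  claude: { agent: "claude-opus-4-7", analyze: "claude-sonnet-4-6", classify: "claude-haiku-4-5" },
  openai: { agent: "gpt-5.1", analyze: "gpt-5-mini", classify: "gpt-5-nano" },
  gemini: { agent: "gemini-2.5-flash", analyze: "gemini-2.5-flash", classify: "gemini-2.5-flash-lite" },
  zai: { agent: "glm-5.1", analyze: "glm-5-turbo", classify: "glm-5-turbo" },
};

// Hypothetical stand-in for a pi-ai catalog lookup (not pi-ai's API).
const PI_AI_CATALOG: ReadonlySet<string> = new Set([
  "claude-opus-4-7", "claude-sonnet-4-6", "claude-haiku-4-5",
  "gpt-5.1", "gpt-5-mini", "gpt-5-nano",
  "gemini-2.5-flash", "gemini-2.5-flash-lite",
  "glm-5.1", "glm-5-turbo",
]);

// defaultModel is derived from the agent tier at load time, never hand-written.
const AGENT_PROVIDERS: Record<string, AgentProvider> = {};
for (const [id, tiers] of Object.entries(PROVIDER_MODELS)) {
  AGENT_PROVIDERS[id] = { defaultModel: tiers.agent ?? "" };
}

function validateAgentProviderRegistry(
  models: ModelTable,
  agents: Record<string, AgentProvider>,
  catalog: ReadonlySet<string>,
): string[] {
  const problems: string[] = [];
  for (const [id, tiers] of Object.entries(models)) {
    for (const capability of CAPABILITIES) {
      const modelId = tiers[capability];
      if (modelId === undefined) {
        // Check 2: every provider declares every capability.
        problems.push(`${id}: missing "${capability}" tier`);
      } else if (!catalog.has(modelId)) {
        // Check 1: every (provider, capability) pair resolves to a real model.
        problems.push(`${id}.${capability}: unknown model "${modelId}"`);
      }
    }
    // Check 3: the DTO-facing default mirrors the agent tier.
    if (agents[id]?.defaultModel !== tiers.agent) {
      problems.push(`${id}: defaultModel does not match the agent tier`);
    }
  }
  return problems;
}

// Fail at import time rather than on the first request.
const loadProblems = validateAgentProviderRegistry(PROVIDER_MODELS, AGENT_PROVIDERS, PI_AI_CATALOG);
if (loadProblems.length > 0) {
  throw new Error(`Provider registry drift:\n${loadProblems.join("\n")}`);
}
```

Collecting every problem before throwing means a CI failure lists all drifted entries at once instead of one per run.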

Tests

10 new tests in `agent-providers.test.ts`. Full suite: 2764/2764 pass.

Stacks

Phase 1 "explain this recommendation" feature lands next, using the `analyze` capability.

🤖 Generated with Claude Code

Introduces a small abstraction over Aero's provider registry so future
LLM features pick an appropriately-tiered model without hand-coding
per-feature model dictionaries.

## Why

Aero already has provider selection (Claude/OpenAI/Gemini/Zai with API
key auto-detection); each provider has a single `defaultModel` tuned
for Aero's multi-step agent loop. The upcoming "explain this
recommendation" feature needs the same provider selection but a CHEAPER
model — Claude Opus is overkill for a single-shot 4-paragraph
summary. Without an abstraction the obvious move is to hard-code an
`EXPLAIN_MODELS` dictionary, then a `BRIEF_MODELS`, then a
`CLASSIFY_MODELS`: N features × M providers worth of model ids scattered
across the codebase, each dictionary a separate place to remember when
models get bumped.

## What

Three capability tiers describe what an LLM call needs:

  agent    — multi-step tool use, premium models (Aero's loop)
  analyze  — single-shot synthesis, mid-tier models (explain-this,
             content brief generation)
  classify — short structured judgments, cheapest fast models
             (semantic page-coverage match, domain classification)

Each provider declares its model per tier in `PROVIDER_MODELS`:

  claude:  agent='claude-opus-4-7'   analyze='claude-sonnet-4-6'  classify='claude-haiku-4-5'
  openai:  agent='gpt-5.1'           analyze='gpt-5-mini'         classify='gpt-5-nano'
  gemini:  agent='gemini-2.5-flash'  analyze='gemini-2.5-flash'   classify='gemini-2.5-flash-lite'
  zai:     agent='glm-5.1'           analyze='glm-5-turbo'        classify='glm-5-turbo'

New primitive `resolveModelForCapability(provider, capability, override?)`:
features declare their tier; the project's configured provider decides
which actual model fills it. Existing `resolveModelForProvider(provider)`
is preserved as a thin wrapper that delegates to the `agent` capability
— backward-compatible, behavior-preserving for every existing caller.

## Single source of truth + drift detection

`AGENT_PROVIDERS[id].defaultModel` now sources its value from
`PROVIDER_MODELS[id].agent` at module-load time (keeps the DTO + CLI
display rendering exactly the same thing they did before).
`validateAgentProviderRegistry()` runs at module load AND in tests:

  1. Every (provider, capability) pair MUST resolve to a real pi-ai
     model — catches model-id typos and pi-ai catalog drift at import
     time, not at first request.
  2. Every provider MUST declare every capability — adding a new
     capability to `LlmCapabilities` without updating all providers
     fails CI loudly.
  3. `AGENT_PROVIDERS[id].defaultModel === PROVIDER_MODELS[id].agent`
     — drift between the two surfaces would silently mislead the
     dashboard / CLI provider picker.

## What's user-visible

Nothing. Aero continues to use Opus / GPT-5.1 / Gemini Flash / GLM 5.1
exactly as before (the `agent` capability tier IS the historical
default). DTO shape unchanged. CLI display unchanged. No config
migration needed.

The abstraction's only consumer today is the existing
`resolveModelForProvider` wrapper; the next PR (Phase 1 "explain this"
feature) is the first feature to use `analyze` directly.

## Tests

10 new tests in `agent-providers.test.ts`:
  - Every provider declares every capability tier
  - `AGENT_PROVIDERS.defaultModel` mirrors `PROVIDER_MODELS[id].agent`
  - Every (provider, capability) pair resolves in pi-ai
  - `validateAgentProviderRegistry` catches all three drift modes
  - `resolveModelForCapability` returns the correct model per tier
  - Caller-supplied model override works (per-call escape hatch)
  - Throws on unknown model id
  - `resolveModelForProvider` is a behavior-preserving wrapper

Verification:
  - `pnpm -r typecheck`: 0 errors
  - `pnpm -r lint`: 0 errors
  - 2764/2764 tests pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arberx arberx merged commit 78d6d98 into main May 18, 2026
9 checks passed
@arberx arberx deleted the feat/llm-capability-tiers branch May 18, 2026 01:11
arberx added a commit that referenced this pull request May 18, 2026
…586)

Adds an on-demand LLM rationale for each content recommendation card, cached
per (project, target_ref, prompt_version) so repeat clicks are free. The
heuristic classifier still produces the structured recommendation; this layer
explains the reasoning and suggests concrete next steps in natural language.

Backend:
- New `recommendation_explanations` table + migration v62 (unique index
  on project_id + targetRef + promptVersion).
- `RecommendationExplanationDto` + `RecommendationExplainRequest` schemas
  in @ainyc/canonry-contracts.
- `GET /projects/:name/content/recommendations/:targetRef/analysis`
  cache-only read (404 when no cached explanation exists).
- `POST /projects/:name/content/recommendations/:targetRef/analyze`
  returns cached row or invokes injected `ExplainContentRecommendationFn`;
  supports `provider` / `model` / `forceRefresh` overrides.
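A minimal in-memory sketch of the cache behavior behind these two routes, including the injected explainer and the 503 fallback described under LLM wiring. A `Map` stands in for the `recommendation_explanations` table, the explainer is synchronous for brevity, and `getAnalysis`, `postAnalyze`, `RouteResult`, and the `Explanation` shape are hypothetical names, not the PR's code.

```typescript
// Hypothetical shapes; the PR names ExplainContentRecommendationFn and
// ApiRoutesOptions.explainContentRecommendation but not their signatures.
interface Explanation {
  text: string;
  model: string;
}
// Synchronous for brevity; a real LLM call would return a Promise.
type ExplainContentRecommendationFn = (targetRef: string) => Explanation;

interface ApiRoutesOptions {
  explainContentRecommendation?: ExplainContentRecommendationFn;
}

interface RouteResult {
  status: number;
  body: unknown;
}

// In-memory stand-in for the recommendation_explanations table.
const cache = new Map<string, Explanation>();

// JSON-encoding the tuple avoids collisions a plain separator could cause.
function cacheKey(project: string, targetRef: string, promptVersion: string): string {
  return JSON.stringify([project, targetRef, promptVersion]);
}

// GET .../analysis: cache-only read, 404 on a miss.
function getAnalysis(project: string, targetRef: string, promptVersion: string): RouteResult {
  const hit = cache.get(cacheKey(project, targetRef, promptVersion));
  return hit ? { status: 200, body: hit } : { status: 404, body: { message: "No cached explanation" } };
}

// POST .../analyze: serve the cached row unless forceRefresh, else call the injected fn.
function postAnalyze(
  options: ApiRoutesOptions,
  project: string,
  targetRef: string,
  promptVersion: string,
  forceRefresh = false,
): RouteResult {
  const key = cacheKey(project, targetRef, promptVersion);
  const hit = cache.get(key);
  if (hit && !forceRefresh) return { status: 200, body: hit };

  const explain = options.explainContentRecommendation;
  if (!explain) {
    // The routes stay LLM-agnostic: no injected implementation means 503.
    return { status: 503, body: { code: "PROVIDER_ERROR", message: "No LLM explainer is configured" } };
  }
  const fresh = explain(targetRef);
  cache.set(key, fresh); // forceRefresh overwrites the previous row
  return { status: 200, body: fresh };
}
```

Keying on `promptVersion` means bumping the prompt naturally invalidates old explanations without a purge step.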

LLM wiring (api-routes stays LLM-agnostic):
- `ExplainContentRecommendationFn` is dependency-injected via
  `ApiRoutesOptions.explainContentRecommendation`. Missing implementation
  → 503 PROVIDER_ERROR with operator-friendly message.
- `packages/canonry` ships the pi-ai implementation
  (`createRecommendationExplainer`) using the new `analyze` capability
  tier from PR #585 — Claude → sonnet-4-6, OpenAI → gpt-5-mini,
  Gemini → 2.5-flash, Zai → glm-5-turbo.
- Provider selection mirrors Aero: caller override → first configured
  provider by priority → 502 PROVIDER_ERROR when none is configured.
  Unknown override → 400.
- pi-ai dollar cost converted to `costMillicents` (×100,000, rounded
  int) so totals stay drift-free.
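The provider-selection rule in the list above could be sketched roughly like this. The order in `PROVIDER_PRIORITY`, the `Selection` result type, and the error messages are assumptions; the sketch also does not check whether an overridden provider actually has an API key configured.

```typescript
// Assumed priority order; the PR does not spell it out.
const PROVIDER_PRIORITY = ["claude", "openai", "gemini", "zai"] as const;
type ProviderId = (typeof PROVIDER_PRIORITY)[number];

type Selection =
  | { ok: true; provider: ProviderId }
  | { ok: false; status: 400 | 502; message: string };

function isProviderId(value: string): value is ProviderId {
  return (PROVIDER_PRIORITY as readonly string[]).includes(value);
}

function selectProvider(configured: ReadonlySet<ProviderId>, override?: string): Selection {
  if (override !== undefined) {
    // An override naming no known provider is a client error.
    if (!isProviderId(override)) {
      return { ok: false, status: 400, message: `Unknown provider "${override}"` };
    }
    return { ok: true, provider: override };
  }
  // Otherwise take the first configured provider in priority order.
  const first = PROVIDER_PRIORITY.find((p) => configured.has(p));
  if (first === undefined) {
    return { ok: false, status: 502, message: "PROVIDER_ERROR: no LLM provider is configured" };
  }
  return { ok: true, provider: first };
}
```

Returning a result value instead of throwing lets the route handler map each failure to its HTTP status in one place.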

Tests: 29 new (route handlers + helper unit tests with mocked
`complete()` — provider selection, override validation, cache hit/miss,
forceRefresh overwrite, cost conversion, empty-response guard).
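The `costMillicents` conversion, one of the behaviors the tests above exercise, fits in a line: one dollar is 100 cents, or 100,000 millicents. The helper name `dollarsToMillicents` is hypothetical; the sketch shows why integer totals stay drift-free where float totals do not.

```typescript
// 1 USD = 100 cents = 100,000 millicents.
const MILLICENTS_PER_DOLLAR = 100000;

// Round once per call so every stored cost is an integer.
function dollarsToMillicents(usd: number): number {
  return Math.round(usd * MILLICENTS_PER_DOLLAR);
}

const perCallUsd = [0.1, 0.1, 0.1];
const floatTotal = perCallUsd.reduce((sum, usd) => sum + usd, 0);
const intTotal = perCallUsd.map(dollarsToMillicents).reduce((sum, mc) => sum + mc, 0);
console.log(floatTotal); // 0.30000000000000004
console.log(intTotal); // 30000
```

Integer sums stay exact as long as totals remain below `Number.MAX_SAFE_INTEGER`, which at this scale is far beyond any realistic spend.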

Closes phase 1 of the LLM-augmented recommendation engine. Frontend
"Why this?" panel ships in a follow-up PR.

Co-authored-by: Claw (AINYC Agent) <arberx@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>