Add provider-agnostic Model QoS profiles (#6831) #6832
Conversation
Introduce configurable feature-to-model mapping so each LLM feature can be independently assigned to a model tier (nano/mini/medium/high). Override per-feature via env vars like LLM_TIER_CONV_ACTION_ITEMS=medium. Defaults downgrade high-volume structured extraction tasks from gpt-5.1 to gpt-4.1-mini while preserving rollback capability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded llm_medium_experiment (gpt-5.1) with get_llm() calls for action items, structure, events, app results, and daily summaries. Each feature now uses its configured tier, defaulting to mini for action items and app results (high-volume structured extraction). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded llm_mini with get_llm('knowledge_graph') so the
model can be independently configured via LLM_TIER_KNOWLEDGE_GRAPH env var.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded llm_mini with get_llm() for memory extraction, text content extraction, conflict resolution, and categorization. Each can be independently configured via LLM_TIER_MEMORIES etc. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
23 unit tests covering tier defaults, env var overrides, model mapping, instance caching, tier info debugging, and rollback scenarios. Added to test.sh for CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary: This PR introduces a QoS tier system in
Confidence Score: 4/5 — Safe to merge after fixing the tautological rollback test assertion, which currently provides no coverage of the critical rollback path. One P1 finding: the rollback test always passes regardless of the actual model returned, leaving the primary rollback guarantee untested. Two P2 findings (unused imports, missing prompt_cache_key) are minor cleanups.
Important Files Changed: backend/tests/unit/test_llm_qos_tiers.py — rollback test assertion needs fixing
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["get_llm(feature)"] --> B["_resolve_tier(feature)"]
B --> C{"LLM_TIER_{FEATURE} env var set and valid?"}
C -- yes --> D["Use env tier"]
C -- no --> E["_FEATURE_TIER_DEFAULTS.get(feature, TIER_MINI)"]
D --> F["_TIER_MODELS[tier] → model name"]
E --> F
F --> G{"model in _llm_cache?"}
G -- yes --> H["Return cached ChatOpenAI"]
G -- no --> I{"model == 'gpt-5.1'?"}
I -- yes --> J["ChatOpenAI(model, prompt_cache_retention=24h)"]
I -- no --> K["ChatOpenAI(model)"]
J --> L["Store in _llm_cache"]
K --> L
L --> H
Reviews (1): Last reviewed commit: "Add tests for QoS tier system (#6831)"
```diff
 from pydantic import BaseModel, Field

-from .clients import llm_mini
+from .clients import get_llm, llm_mini
```
llm_mini is imported but no longer referenced in this file — both call sites were converted to get_llm('knowledge_graph'). The same applies to memories.py (line 13), where llm_mini is also imported but unused after the migration.
```diff
-from .clients import get_llm, llm_mini
+from .clients import get_llm
```
```python
monkeypatch.setenv('LLM_TIER_CONV_ACTION_ITEMS', 'medium')
assert _resolve_tier('conv_action_items') == TIER_MEDIUM
llm = get_llm('conv_action_items')
assert 'gpt-5.1' in str(llm.model_name) or hasattr(llm, 'invoke')
```
Tautological assertion always passes
The `or hasattr(llm, 'invoke')` clause is always True for any ChatOpenAI instance, so this assertion can never fail regardless of which model was actually returned. The test never validates that the rollback worked.
```diff
-assert 'gpt-5.1' in str(llm.model_name) or hasattr(llm, 'invoke')
+assert llm.model_name == 'gpt-5.1'
```
```python
def _get_or_create_llm(model_name: str) -> ChatOpenAI:
    """Get or create a ChatOpenAI instance for the given model name."""
    if model_name not in _llm_cache:
        kwargs = {'model': model_name, 'callbacks': [_usage_callback]}
        if model_name == 'gpt-5.1':
            kwargs['extra_body'] = {"prompt_cache_retention": "24h"}
        _llm_cache[model_name] = ChatOpenAI(**kwargs)
    return _llm_cache[model_name]
```
prompt_cache_key dropped for gpt-5.1 features
llm_medium_experiment callers previously passed a per-call prompt_cache_key (e.g. "omi-transcript-structure", "omi-daily-summary") to steer requests toward the same backend cache shard. The new _get_or_create_llm only sets prompt_cache_retention: 24h but omits prompt_cache_key, so prompt-prefix cache hit rates for conv_structure, conv_events, and daily_summary may regress. Consider threading the per-feature key through _get_or_create_llm or passing it at call sites for features that had one previously.
- Fix get_transcript_structure event extraction to use the conv_structure tier instead of the incorrect conv_events key
- Restore prompt_cache_key on all medium-tier callsites for OpenAI cache routing (action items, structure, apps, daily summary)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove 18 unwired feature entries from tier defaults to avoid confusion. Only the 8 features actually called via get_llm() remain in the map. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace test_models_unchanged_for_llm_calls (checked for llm_medium_experiment) with test_llm_calls_use_qos_tier_system that verifies get_llm() feature keys and prompt_cache_key retention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename LLM_TIER_ env prefix to OMI_QOS_ (Omi-level QoS, not LLM-level)
- Add cache_key param to get_llm() that only applies prompt_cache_key when the resolved model supports it (gpt-5.1)
- Safely ignored when the tier is swapped to nano/mini via env var override, preventing model-specific params from breaking unsupported models

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace .bind(prompt_cache_key=...) and invoke kwargs with get_llm's cache_key parameter. This ensures prompt_cache_key is only sent to models that support it when tiers are swapped via OMI_QOS_ env vars. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename from test_llm_qos_tiers to test_omi_qos_tiers
- Update env var references from LLM_TIER_ to OMI_QOS_
- Add TestCacheKeySafety: verifies cache_key is applied for medium tier, safely ignored for mini/nano, and safely ignored after tier downgrade

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ream, llm_agent (#6831)
Simplify the model inventory before Omi QoS sits on top. These 5 globals had zero production callers:
- llm_large (o1-preview) — unused
- llm_large_stream (o1-preview streaming) — unused
- llm_high_stream (o4-mini streaming) — unused
- llm_agent (gpt-5.1 with cache key) — only test mocks
- llm_agent_stream (gpt-5.1 streaming with cache key) — only test mocks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace llm_agent cache retention checks with QoS tier medium (gpt-5.1) cache retention verification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_prompt_cache_optimization: check Omi QoS cache_key support instead of removed llm_agent globals
- test_prompt_cache_integration: verify gpt-5.1 prompt_cache_retention via extra_body instead of llm_agent model_kwargs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add get_llm stub to clients mock and patch get_llm instead of llm_medium_experiment in extract_action_items test paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required fixes from review iterations 1-3, all addressed:
All 40 Omi QoS + 24 prompt caching + 39 action item + 13 usage context tests passing. by AI for @beastoin
Fix reference to undefined non_cache_clients -> non_gpt51_clients. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…structor capture (#6831)
Replace OpenAI-only tier system with profile-based architecture covering all 4 providers (OpenAI, Anthropic, OpenRouter, Perplexity). Each profile maps every feature to a specific model — different features can use different model tiers within the same profile. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
47 tests covering: profile structure, get_model resolution, per-feature env overrides, pinned features, OpenRouter client construction, streaming, cache key safety, provider classification, and rollback scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…6831)
- Medium profile: conv_action_items and conv_apps corrected to gpt-5.1 (matches prod)
- OpenRouter cache key includes temperature to prevent cross-feature cache poisoning
- get_llm() raises ValueError for Anthropic/Perplexity features (use get_model() instead)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace legacy llm_persona_mini_stream/llm_persona_medium_stream/llm_medium_experiment
with get_llm('persona_chat')/get_llm('persona_chat_premium')/get_llm('persona_clone').
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…6831) Replace legacy llm_gemini_flash with get_llm('wrapped_analysis') across 9 analysis functions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ble model routing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required fixes from review (all addressed):
All 84 tests passing (51 QoS + 15 callsite + 18 prompt cache integration). by AI for @beastoin
… overrides Prevents MODEL_QOS_CONV_X=claude-haiku-3.5 from silently creating a ChatOpenAI client with a non-OpenAI model, and similar mismatches for OpenRouter features. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ired files
- Subprocess tests for MODEL_QOS=premium and invalid profile fallback
- Callsite assertions for chat, persona, goals, notifications, app_generator, graph, perplexity, chat_sessions, apps, app_integrations, external_integrations, proactive_notification, generate_2025, onboarding
- Legacy invocation guard across all 14 wired files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ns for all 17 files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test results:
Ready for merge review. by AI for @beastoin
L1 — Backend + Pusher live test
Backend (port 10160, QoS branch) endpoint tests: no QoS-related errors in startup or request logs.
Pusher (port 10161): no QoS-related errors.
by AI for @beastoin
L2 — Service + App integrated live test
Setup: Backend (port 10160) + Pusher (port 10161) running the QoS branch.
Backend + Pusher: both services booted with QoS enabled.
App screens verified (all loaded without errors against the QoS-enabled backend):
Result: App boots and renders all major screens without errors against QoS-enabled backend+pusher. No crashes, no rendering issues, no QoS-related error logs.
by AI for @beastoin
…65% cost savings Apply geni's model tuning: gpt-5.4 (3 user-facing), gpt-5.4-mini (9 processing), gpt-4.1-nano (19 classification), claude-sonnet-4-6 (1 agent), sonar-pro (1 search). Replace fixed provider sets with _classify_provider() — provider now follows the model name, not the feature. This enables persona_chat/wrapped_analysis to be OpenRouter in premium but OpenAI in max.
…tion 80 tests: new _classify_provider tests, profile-specific provider assertions, max 5-variant check, updated cache key models, dynamic safety guard tests.
Addresses CP8 tester gaps: caplog-based override warning assertions, OpenAI vs OpenRouter client routing verification, temperature config test.
Change test_openrouter_temperature_applied to use monkeypatch env override so get_llm actually routes through OpenRouter path, proving _OPENROUTER_TEMPERATURES config is applied end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Manager feedback: geni's optimization was too aggressive. Premium now matches current production max models (gpt-5.1, gpt-5.2, gpt-4.1, gpt-4.1-mini, o4-mini, OpenRouter, claude-sonnet-4-6, sonar-pro). Max temporarily identical — pending geni re-tune. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align test assertions with premium=production and max=production model assignments. Both profiles now use OpenRouter, o4-mini, gpt-5.1, gpt-5.2, gpt-4.1, gpt-4.1-mini. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Premium: geni's 5-model optimized set (gpt-5.4, gpt-5.4-mini, gpt-4.1-nano, claude-sonnet-4-6, sonar-pro) — no OpenRouter. Max: quality upgrade from production — all gpt-5.1/5.2 → gpt-5.4, all gpt-4.1-mini/o4-mini/gpt-4.1 → gpt-5.4-mini, OpenRouter eliminated. 4 model variants, 3 providers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests reflect: premium uses geni's 5-model cost-saving set (no OpenRouter), max uses latest-gen quality upgrade (gpt-5.4/gpt-5.4-mini, no OpenRouter). Cache key tests use monkeypatch for non-cacheable model. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Max: only change gpt-5.1/5.2 → gpt-5.4 (latest flagship). Everything else unchanged from production (gpt-4.1-mini, gpt-4.1, o4-mini, OpenRouter, claude-sonnet-4-6, sonar-pro). 9 model IDs, 4 providers.
Premium: cost-optimized, ~65-70% cheaper. gpt-5.4 → gpt-5.4-mini; gpt-4.1-mini/gpt-4.1/o4-mini → gpt-4.1-nano. persona_chat stays OpenRouter; persona_chat_premium uses gpt-5.4-mini (get_llm() rejects Anthropic). 6 model IDs, 4 providers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Max: 9 model IDs with OpenRouter (production + gpt-5.4 upgrade). Premium: cost-optimized with gpt-5.4-mini, gpt-4.1-nano, mixed OpenRouter. Tests reflect both profiles accurately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests every provider path (OpenAI, Anthropic, Perplexity, OpenRouter) with actual API calls. Found 2 deprecated OpenRouter models: google/gemini-flash-1.5-8b (404) and anthropic/claude-3.5-sonnet (404). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
L1 integration test — real LLM API calls for all 34 QoS features:
Streaming (OpenAI + OpenRouter): PASS
32/34 features verified with real API responses. 2 OpenRouter models are deprecated — need replacement.
Test:
by AI for @beastoin
- Default profile: max → premium (cost-effective, 80% of max quality)
- Fallback on invalid MODEL_QOS: max → premium
- Replace deprecated persona_chat (google/gemini-flash-1.5-8b → gpt-4.1-nano)
- Replace deprecated persona_chat_premium (anthropic/claude-3.5-sonnet → gpt-5.4-mini)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Default profile assertion: max → premium
- Dead OpenRouter model assertions → direct OpenAI API models
- Provider routing tests: persona_chat is now OpenAI, not OpenRouter
- Invalid profile fallback: max → premium
- 85 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Restructured for premium default: gpt-5.4-mini (11 features), gpt-4.1-nano (20 features)
- All 34 features verified with real LLM API calls
- Streaming, cache key, and profile routing tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test results:
L1 Integration — Real LLM API calls (premium profile)
Total: 126 tests (85 unit + 41 integration), all passing. Next step: CP8 tester + CP9 live tests. by AI for @beastoin
CP9 Live Test Evidence
Changed paths (28 files, model routing only):
L1 (CP9A): Build and run changed component standalone
L2 (CP9B): Service + app integration
L1 synthesis: All changed paths P1-P4 proven. 34 model routing paths verified end-to-end with real provider APIs. Dead OpenRouter models (google/gemini-flash-1.5-8b, anthropic/claude-3.5-sonnet) confirmed replaced with working direct API models (gpt-4.1-nano, gpt-5.4-mini). No untested paths.
L2 synthesis: Model routing is transparent to the app layer — the only integration surface is model name → provider API → response, fully covered by L1. No API contract changes, no protocol changes, no UI changes.
Workflow status: CP0-CP9B complete. Ready for merge approval.
by AI for @beastoin
Closes #6831
Adds a 2-profile Model QoS system that maps all 34 LLM features to models across 4 providers (OpenAI, Anthropic, OpenRouter, Perplexity).
Profiles:
- premium (default): `gpt-5.4-mini` (11 features) + `gpt-4.1-nano` (20 features) + `claude-sonnet-4-6` + `google/gemini-3-flash-preview` + `sonar-pro`. 5 distinct model IDs.
- max: `gpt-5.4` (9 features) + `gpt-4.1-mini` (15 features) + `gpt-4.1` + `o4-mini` + `gpt-4.1-nano` + `gpt-5.4-mini` + `claude-sonnet-4-6` + `google/gemini-3-flash-preview` + `sonar-pro`. 9 distinct model IDs.

Key design:
- `get_model(feature)` resolves the model from the active profile with env override support (`MODEL_QOS_FEATURE_NAME=model`)
- `get_llm(feature)` returns the correct LangChain client (OpenAI or OpenRouter) with caching
- `_classify_provider(model)` determines provider from model name — provider follows model, not feature
- Prompt caching on `gpt-5.4` and `gpt-5.4-mini` via the `cache_key` parameter
- Pinned features (`fair_use`) bypass the profile entirely
- `get_llm()` rejects Anthropic/Perplexity features (must use dedicated clients)
- Dead OpenRouter models (`google/gemini-flash-1.5-8b`, `anthropic/claude-3.5-sonnet`) replaced with direct OpenAI API
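The profile resolution described above can be sketched as follows. This is a minimal illustration trimmed to two features: the profile tables, the pinned `fair_use` model, and the exact fallback behavior are assumptions, not the PR's real data.

```python
import os

# Trimmed stand-in profiles; the real tables map all 34 features.
_PROFILES = {
    "premium": {"conv_structure": "gpt-5.4-mini", "conv_apps": "gpt-4.1-nano"},
    "max": {"conv_structure": "gpt-5.4", "conv_apps": "gpt-4.1-mini"},
}
# Pinned features bypass the active profile; the model here is assumed.
_PINNED = {"fair_use": "gpt-4.1-nano"}

def get_model(feature: str) -> str:
    if feature in _PINNED:
        return _PINNED[feature]
    profile = os.getenv("MODEL_QOS", "premium")
    if profile not in _PROFILES:  # invalid profile -> premium fallback
        profile = "premium"
    default = _PROFILES[profile][feature]
    # Per-feature override, e.g. MODEL_QOS_CONV_STRUCTURE=gpt-5.4
    return os.getenv(f"MODEL_QOS_{feature.upper()}", default)
```

With no env vars set this resolves `conv_structure` to `gpt-5.4-mini`; `MODEL_QOS=max` switches it to `gpt-5.4`, and a per-feature `MODEL_QOS_CONV_STRUCTURE` override wins over both.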
by AI for @beastoin