Backend LLM costs span multiple providers (OpenAI, Anthropic, Google/Gemini, Perplexity) across 60+ callsites. There is no unified way to control model selection per-feature — models are hardcoded throughout the codebase, making cost optimization and A/B testing impossible without code changes.
## Current Behavior
- 60+ LLM callsites across 4 providers with hardcoded model instances
- OpenAI: 15+ features using `llm_mini`, `llm_medium`, and `llm_medium_experiment` directly
- Anthropic: chat agent hardcoded to `claude-sonnet-4-6`
- OpenRouter: persona chat and wrapped analysis hardcoded to specific Gemini/Claude models
- Perplexity: web search hardcoded to `sonar-pro`
- No mechanism to downgrade/upgrade models per-feature without code changes
- No way to switch cost profiles (e.g., "run everything on cheapest acceptable models")
## Expected Behavior
A provider-agnostic QoS profile system where each profile (mini/medium/high) maps every feature to a specific model — potentially different model tiers within the same profile, since some features need more quality than others even in a cost-optimized profile.
## Solution
QoS Profiles — each profile is a complete feature→model mapping across all providers:
```
MODEL_QOS_MINI:
  conv_action_items: gpt-4.1-nano         # cheapest, structured extraction
  conv_structure:    gpt-4.1-mini         # needs more quality
  chat_agent:        claude-haiku-3.5     # cost-optimized chat
  persona_chat:      gemini-flash-1.5-8b
  ...

MODEL_QOS_MEDIUM:
  conv_action_items: gpt-4.1-mini
  conv_structure:    gpt-5.1
  chat_agent:        claude-sonnet-4-6
  persona_chat:      claude-3.5-sonnet
  ...

MODEL_QOS_HIGH:
  conv_action_items: gpt-5.1
  conv_structure:    o4-mini
  chat_agent:        claude-sonnet-4-6
  persona_chat:      gemini-3-flash-preview
  ...
```
Global switch: `MODEL_QOS=mini` selects an entire profile.
Per-feature override: `MODEL_QOS_CONV_STRUCTURE=gpt-5.1` overrides a single feature.
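The two switches might be combined in a deployment config like this sketch (a hypothetical fragment, not taken from the repo): run everything on the cheapest profile, but keep conversation structure on a stronger model.

```shell
# Cost-optimized rollout: mini profile everywhere,
# except conv_structure, which is overridden to a higher-tier model.
export MODEL_QOS=mini
export MODEL_QOS_CONV_STRUCTURE=gpt-5.1
```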
21 features across 4 providers:
- OpenAI (16): conv_action_items, conv_structure, conv_apps, daily_summary, memories, memory_conflict, memory_category, knowledge_graph, chat_responses, chat_extraction, session_titles, goals, notifications, followup, smart_glasses, onboarding
- Anthropic (1): chat_agent
- OpenRouter (3): persona_chat, persona_clone, wrapped_analysis
- Perplexity (1): web_search
Pinned features: the `fair_use` classifier is pinned to a specific model regardless of the active profile (accuracy-critical).
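The resolution logic for `utils/llm/clients.py` could be sketched as follows. This is a minimal illustration, not the actual implementation: the profile contents mirror the examples above, the `get_model()` name comes from the Affected Areas table, and the pinned model for `fair_use` is a placeholder since the issue does not name it.

```python
import os

# Illustrative subset of the feature→model mappings shown above.
MODEL_QOS_PROFILES = {
    "mini": {
        "conv_action_items": "gpt-4.1-nano",
        "conv_structure": "gpt-4.1-mini",
        "chat_agent": "claude-haiku-3.5",
        "persona_chat": "gemini-flash-1.5-8b",
    },
    "medium": {
        "conv_action_items": "gpt-4.1-mini",
        "conv_structure": "gpt-5.1",
        "chat_agent": "claude-sonnet-4-6",
        "persona_chat": "claude-3.5-sonnet",
    },
    "high": {
        "conv_action_items": "gpt-5.1",
        "conv_structure": "o4-mini",
        "chat_agent": "claude-sonnet-4-6",
        "persona_chat": "gemini-3-flash-preview",
    },
}

# Accuracy-critical features bypass the profile entirely.
# The actual pinned model is not specified in this issue; placeholder below.
PINNED_MODELS = {
    "fair_use": "gpt-4.1",
}

def get_model(feature: str) -> str:
    """Resolve a feature name to a model ID.

    Precedence: pinned model > per-feature env override > active profile.
    """
    if feature in PINNED_MODELS:
        return PINNED_MODELS[feature]
    override = os.getenv(f"MODEL_QOS_{feature.upper()}")
    if override:
        return override
    profile = os.getenv("MODEL_QOS", "medium")
    return MODEL_QOS_PROFILES[profile][feature]
```

Callsites would then replace a hardcoded client with `get_model("conv_structure")` and build the provider client from the returned model ID, so both the global switch and per-feature overrides take effect without code changes.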
## Affected Areas
| Area | Files | Callsites |
|---|---|---|
| QoS core | `utils/llm/clients.py` | Profile definitions, `get_model()`, client factories |
| Conversation processing | `utils/llm/conversation_processing.py` | 5 callsites |
| Memories | `utils/llm/memories.py` | 4 callsites |
| Knowledge graph | `utils/llm/knowledge_graph.py` | 2 callsites |
| Chat | `utils/llm/chat.py` | 10+ callsites |
| Persona | `utils/llm/persona.py` | 5 callsites |
| Goals | `utils/llm/goals.py` | 3 callsites |
| Notifications | `utils/llm/notifications.py` | 2 callsites |
| Agentic chat | `utils/retrieval/agentic.py` | 1 callsite (Anthropic) |
| Wrapped | `utils/wrapped/generate_2025.py` | 9 callsites (Gemini) |
| Other | Various routers/utils | 10+ callsites |
## Impact
Unified cost control across all LLM providers. One env var (`MODEL_QOS=mini`) switches the entire backend to cost-optimized models. Per-feature overrides enable A/B testing. No user-facing changes.
by AI for @beastoin