You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Native Feral Agent runtime. Agents now run on a built-in Feral Agent sidecar (Bun/TS) wired directly into the chat stream — no external gateway process. A Chat/Agent toggle in the composer switches modes and auto-loads the selected local model into the agent engine.
DeepResearch & adaptive reasoning. Dynamic max-iteration budgets for deep-research and complex tasks, model-fitness scoring, error-correcting control loop, and persistent agent memory.
Sturdier tool calls. Parser now handles Gemma-style <tool_call>, bracket and bare-JSON formats, and the arguments key; adds silent tool calls and an empty-response fallback; raises token budgets for thinking models.
Chat & UI
Live context ring. Real token usage straight from the model — exact prompt tokens from llama.cpp locally, real usage stats from cloud providers — with a hover popover showing tokens used, free space and message count, and the model's true context window instead of an estimate.
Streaming polish. Words fade in one-by-one as tokens stream, and a phase indicator shows Thinking / Calling tool / Processing.
Thinking blocks. Support for multiple formats (<think>, <thinking>, <|channel>), a thinking timer, and blocks that now persist across chat and tab switches.
Response resilience. Partial responses are always persisted, and responses survive navigating away and back, tab switches, and hot-swapping the active model.
Stability & performance
Fix a GGML_ASSERT crash on long agent prompts by chunking the prefill batch.
Memoize message rendering so streaming no longer re-parses already-completed messages.
Warning-free cargo clippy on the inference build, repo hygiene (.gitattributes, corrected .gitignore), and unist-util-visit pinned as a direct dependency.