Feral v0.1.4

github-actions released this 05 Jun 23:13

c9814ee

Agent mode

Native Feral Agent runtime. Agents now run on a built-in Feral Agent sidecar (Bun/TS) wired directly into the chat stream — no external gateway process. A Chat/Agent toggle in the composer switches modes and auto-loads the selected local model into the agent engine.
DeepResearch & adaptive reasoning. Dynamic max-iteration budgets for deep-research and complex tasks, model-fitness scoring, error-correcting control loop, and persistent agent memory.
Sturdier tool calls. Parser now handles Gemma-style <tool_call>, bracket and bare-JSON formats, and the arguments key; adds silent tool calls and an empty-response fallback; raises token budgets for thinking models.

Chat & UI

Live context ring. Real token usage straight from the model — exact prompt tokens from llama.cpp locally, real usage stats from cloud providers — with a hover popover showing tokens used, free space and message count, and the model's true context window instead of an estimate.
Streaming polish. Words fade in one-by-one as tokens stream, and a phase indicator shows Thinking / Calling tool / Processing.
Thinking blocks. Support for multiple formats (<think>, <thinking>, <|channel>), a thinking timer, and blocks that now persist across chat and tab switches.
Response resilience. Partial responses are always persisted, and responses survive navigating away and back, tab switches, and hot-swapping the active model.

Stability & performance

Fix a GGML_ASSERT crash on long agent prompts by chunking the prefill batch.
Memoize message rendering so streaming no longer re-parses already-completed messages.
Warning-free cargo clippy on the inference build, repo hygiene (.gitattributes, corrected .gitignore), and unist-util-visit pinned as a direct dependency.

Assets 7