v0.11.1
[0.11.1] - 2026-06-09
Added
- `read_file` image vision: reading JPEG, PNG, GIF, or WebP files attaches them as tool-origin images so the LLM receives pixels on follow-up turns (same synthetic observation path as browser screenshots).
- Browser tool hardening: session-scoped tab isolation, interactive action approvals, secret-safe fill/screenshot output, bounded waits, graceful reconnect/shutdown, and ~80 mock-backend contract tests. New observation actions `get_console_logs` and `get_network_errors` capture per-tab console output and network load failures (secret- and origin-redacted). Ignored real-Chrome smoke test: `cargo test --features browser browser_smoke_real_chrome -- --ignored --nocapture`.
- Vision / image understanding: user-uploaded images from Telegram, Slack, and Discord are lazily base64-encoded for the current turn and sent to vision-capable LLM providers (OpenAI-compatible passthrough, Anthropic/Gemini mappers). Structured `MessageAttachment` metadata is persisted alongside the existing `[File received: ...]` text stub. Configurable via `[files] vision_enabled`, `max_vision_image_mb`, and `vision_mime_types`. Graceful fallback to text-only when vision is disabled, files are missing, or the provider rejects image payloads.
- Native audio input: voice notes and audio file uploads are lazily encoded as OpenAI-style `input_audio` blocks for audio-capable models (OpenAI audio models with `modalities: ["text"]`, Gemini `inlineData`). Configurable via `[files] audio_enabled`, `max_audio_mb`, `audio_mime_types`, and `audio_model_patterns`. Graceful fallback to text stub + system hint when audio is disabled, the model is ineligible, or encoding fails. Multimodal token budgeting uses byte surrogates so large audio payloads do not explode context estimates.
- Whisper STT fallback: when native audio is skipped, optionally transcribe inbound audio with local `whisper-cli` (ffmpeg prep for OGG/Opus) and append `[Transcription of <file>]: ...` to the user message before the LLM turn. Configurable via `[files.stt]` (`enabled`, `cli_path`, `model_path`, `ffmpeg_path`, `language`, `max_audio_mb`, `timeout_secs`). Opt-in (`enabled = false` by default). Setup wizard probes whisper-cli/ffmpeg/model paths; `manage_config` `enable_stt` action auto-writes `[files.stt]` (optional `stt_cli_path`, `stt_model_path`, `stt_ffmpeg_path`, `stt_language`).
- Browser screenshot vision: browser `screenshot` actions save PNGs to the shared inbox and attach them to tool results so the LLM always receives pixels (synthetic user observation message at render time). Tool-origin images stay vision-eligible in archived turns for follow-up questions; user uploads remain current-turn-only.
- Harness eval instrumentation (Phase A): per-task effectiveness snapshot on `TaskEnd` (`HarnessEvalSnapshot`) scoring routing accuracy, progress yield, contract fulfillment, and tier-weighted cost efficiency. Configurable via `[diagnostics.harness_eval]`; sub-agent metrics roll up into the parent task at spawn complete.
- Harness eval offline suite (Phase B): YAML fixtures in `tests/harness_eval/fixtures/` with `cargo test --lib harness_eval` regression runner (`src/harness_eval/`).
- Harness eval analysis tooling (Phase C): `db_probe --eval-task`, `--eval-summary`, `--record-fixture`; `diagnose` action includes Harness Effectiveness section when snapshot present.
- Harness eval follow-ups: orchestration direct-return metrics recorded before `TaskEnd` finalize; `TurnState` shares the eval accumulator handle; post-exec validation failures and terminal approval denials roll into quality metrics; `policy_metrics` exposes `harness_eval_tasks_total` and rolling `harness_eval_overall_avg`; CI runs `cargo test --lib harness_eval` explicitly; two new offline fixtures (`internal_maintenance_direct_return`, `orchestrator_fallthrough_status`) and stricter schedule direct-return expectations.
- Harness eval phase wiring + fixture suite expansion: bootstrap direct-return shortcuts (stop/cancel, time query, etc.) now install and finalize `HarnessEvalSnapshot`; message-build, tool-prelude, response-fallthrough, and stopping-phase signals (context drops, intent/evidence gates, budget extensions) roll into progress/routing payloads; YAML fixtures support `seed.goals`, `routing_models`, `stop_reason`, and `response_fallthrough`; 15 offline fixtures cover cancel, deferred-no-tool, repetition/stall guards, and orchestration fallthrough paths.
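
For orientation, the new `[files]` and `[files.stt]` keys listed above might be combined as in the following sketch. Only the key names and the `enabled = false` default for STT come from these notes; every value, path, value type, and the pattern syntax shown here is an assumption for illustration and should be checked against the project's configuration reference.

```toml
# Illustrative sketch only: key names are taken from the release notes,
# all values and value types below are assumptions.
[files]
vision_enabled = true
max_vision_image_mb = 10                         # assumed limit
vision_mime_types = ["image/jpeg", "image/png", "image/gif", "image/webp"]
audio_enabled = true
max_audio_mb = 20                                # assumed limit
audio_mime_types = ["audio/ogg", "audio/mpeg"]   # assumed list
audio_model_patterns = ["example-audio-model*"]  # hypothetical pattern

[files.stt]
enabled = true                                   # opt-in; the default is false
cli_path = "/usr/local/bin/whisper-cli"          # hypothetical path
model_path = "/opt/whisper/model.bin"            # hypothetical path
ffmpeg_path = "/usr/bin/ffmpeg"                  # hypothetical path
language = "en"                                  # assumed value format
max_audio_mb = 20                                # assumed limit
timeout_secs = 60                                # assumed timeout
```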
Changed
- Browser SSRF policy (Task 8): tool-initiated navigations are validated pre-flight and the final committed URL is revalidated after load (blocking redirect-to-loopback). Per-request CDP subresource/XHR/WebSocket interception remains deferred because chromiumoxide 0.8 has no safe per-request continue/abort seam; see the Task 8 feasibility note in `src/tools/browser/tests.rs`.
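
The two-stage idea behind the SSRF change (check the requested target before navigating, then check the committed target again after redirects) can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the project's policy code: `is_loopback_host` and `navigation_allowed` are hypothetical names, the functions take bare host strings rather than full URLs, and a real policy would presumably also cover private ranges and DNS resolution.

```rust
use std::net::IpAddr;

/// Returns true when `host` names a loopback destination.
/// Hypothetical helper for illustration; not the project's API.
fn is_loopback_host(host: &str) -> bool {
    // URLs wrap IPv6 literals in brackets, e.g. "[::1]"; strip them.
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    if bare.eq_ignore_ascii_case("localhost") {
        return true;
    }
    match bare.parse::<IpAddr>() {
        Ok(ip) => ip.is_loopback(),
        // Not an IP literal: treated as a DNS name and not resolved here.
        Err(_) => false,
    }
}

/// Two-stage check: the host requested before navigation and the host
/// the browser actually committed to after load must both pass.
fn navigation_allowed(requested_host: &str, committed_host: &str) -> bool {
    !is_loopback_host(requested_host) && !is_loopback_host(committed_host)
}
```

Checking the committed host separately is what catches a public URL that redirects to a loopback address, which a pre-flight check alone would miss.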
Fixed
- Harness eval telemetry on deferred/stall paths: post-tool deferred loops now record `stall_guard_fires`; pre-tool deferred loops in the main completion guard record `deferred_no_tool_events`; false-capability-denial retries record stall guards.
- Harness eval `contract_fulfilled` alignment: boolean flag and `ContractFulfillmentPayload.fulfilled` now use the same obligation checks as `contract_fulfillment` scoring (no longer true when mutation was expected but missing).
- Harness eval progress yield for chat turns: clean 1–3 iteration conversational successes get modest non-zero progress credit instead of always scoring 0.00.
- Post-tool deferred recovery: first deferral after successful tools tries structured tool-output completion before another LLM iteration; memory-related false-capability-denial phrases (`don't have that in my records`, etc.) trigger the same recovery path.
- Richer `db_probe --eval-task` output: reports deferred-no-tool count, no-progress iterations, contract obligation fields, and validation/error flags.
- Bootstrap stop/cancel missing harness eval snapshot: exact `stop`/`cancel`/`abort` commands handled in bootstrap returned `TaskEnd` without `harness_eval`, breaking offline fixtures and `db_probe --eval-task` for those paths.
Full Changelog: v0.11.0...v0.11.1