v0.9.0 stewardship integration by Hmbown · Pull Request #2762 · Hmbown/CodeWhale

Hmbown · 2026-06-05T02:10:39Z

Summary

Integration branch for v0.9.0 stewardship harvest and stabilization work.
Keeps release actions out of scope: no tag, publish, GitHub Release, or release artifact push is requested here.
Opens a PR check surface while focused harvest PRs and maintainer review continue.

Current verification

Co-author trailer audit passed after the latest ledger commit.
Focused harvest PR fix(tui): #2760 correct sessions resume footer #2761 passed gate and GitGuardian before merging into this branch.

Maintainer notes

Draft intentionally: not ready for final merge until remaining PR-harvest comments, stabilization/manual gates, contributor-credit review, and release notes are complete.

…metadata (#2682) - Add model_visible() hook to ToolSpec trait (default true) - Override model_visible() -> false on todo_write, todo_add, todo_update, todo_list - Checklist variants remain model-visible as the canonical surface - Legacy todo_* calls still work for saved transcript replay - Return _deprecation metadata with use_instead and removed_in=0.9.0 - Update prompts to recommend checklist_* only - Update TOOL_SURFACE.md with v0.9.0 deprecation notes - Add tests for hidden catalog, compat alias behavior, and metadata Verification: cargo test -p codewhale-tui -- todo, cargo clippy -D warnings

The token estimator walks the full session.messages and the active system prompt. Five call sites per turn in the engine (capacity pre/post tool checkpoints, error escalation, the seam manager, the trim budget check) plus four TUI/command consumers (footer, /status, /debug, context inspector) all re-walked the same data independently. On a 200-message history with 5 KB of tool results that is roughly 2 ms per call, or ~20 ms of pure waste on a single turn. Introduce a process-local TokenEstimateCache keyed on (session.messages_revision, system_prompt_fingerprint). Repeated calls with the same inputs return the cached value without re-walking the message list. The cache invalidates as soon as either input changes: * session.messages_revision is a monotonic counter bumped in Session::add_message, Session::replace_messages, the new Session::bump_messages_revision helper, and at every direct session.messages mutation site in core/engine.rs and core/engine/capacity_flow.rs. * system_prompt_fingerprint is a stable 64-bit hash of the SystemPrompt::Text or SystemPrompt::Blocks payload. Also restructures layered_context_checkpoint to compute the estimated token count before taking a long-lived &SeamManager borrow, and re-routes the capacity pre/post tool checkpoints to compute the observation into a local before calling capacity_controller.observe_*. Both refactors are required to satisfy the borrow checker once estimated_input_tokens requires &mut self. Tests: 10 new unit tests cover the miss/hit path, revision bumps, system-prompt changes, audit-ring capacity, and downward-revision no-ops. The full 157-test engine suite still passes.

PrefixFingerprint::compute is called once per turn by the turn loop prefix-stability check. The tool-side work serializes every tool to the chat-API JSON shape, sorts the resulting strings, joins with newlines, and SHA-256s the result. For a 60-tool catalog that is ~25-40 KB of allocation plus a sort, all of which produces a byte-identical output once the tool set is stable across turns (the common case after the first turn of a session). Introduce a process-local ToolCatalogCache that stores the joined+sorted catalog under a content-derived u64 identity (length + per-tool name + description + serialized input_schema). On a hit, the per-tool JSON serialization, sort, and join are skipped entirely — the pre-computed SHA-256 hex digest is returned directly. The cache lives on PrefixStabilityManager (per-session ownership) and backs a new PrefixFingerprint::compute_with_tool_cache entry point. check_and_update, PrefixStabilityManager::new, and pin() all use the cached path. The original compute() is kept as a fallback for callers that do not have a cache in hand (e.g. CLI tools that build a one-shot fingerprint). The cache is bounded (default capacity = 8) and uses insertion-order eviction, matching the eviction strategy already in transcript_cache.rs. invalidate() is exposed for tool-registry hot-reload and MCP attach paths. Tests: 8 new unit tests cover the miss/hit path (pointer-equal Arc on hit), identity collisions, schema change detection, capacity eviction, invalidate, empty slice, and the equivalence between cached and uncached fingerprints. The full 30-test prefix_cache suite passes; the wider prefix-cache contract tests in settings, prompts, and core::engine::tests continue to pass.

…with PrefixFingerprint::compute Three follow-ups to the previous perf commit: 1. Correctness: tool.strict participates in the wire format emitted by tool_to_api_json, so it MUST participate in the cache identity. Two catalogs that differ only in strict would otherwise collide and serve a stale SHA-256, silently busting prefix-cache stability on the wire. 2. Allocation: replace the per-tool serde_json::to_string in tool_set_identity with a hash_json_value helper that walks the JSON tree directly. For a 60-tool catalog this drops ~25-40 KB of transient allocation per cache miss. 3. Dead code: the previous patch introduced PrefixFingerprint::compute, CachedCatalog::joined, ToolCatalogCache::{invalidate,is_empty}, and a thread-local cache helper that were not used outside tests. With -D warnings in CI all four triggered dead-code errors. The compute helper is now only built in cfg(test); the rest are marked #[allow(dead_code)] with comments explaining their observability and test-only use.

… pass build_canonical_state previously did two independent reverse walks of session.messages — one to extract the most recent user goal, and one to collect up to four confirmed-fact snippets. apply_verify_and_replan then added a third and fourth reverse scan to locate the latest user message and the latest [verification replay] user message for the re-plan path. All four reverse scans collect disjoint facts about the same most- recent-first view of the conversation. This PR folds them into a single helper, scan_canonical_inputs, that walks messages once in reverse, fills a CanonicalStateScan, and short-circuits as soon as every collector is satisfied. The helper exposes the latest-message indices so apply_verify_and_replan can clone the full Message values after the scan (eliminating the two independent find().cloned() walks). The output CanonicalState is byte-identical to the prior implementation: same goal, same confirmed facts (newest first, errors filtered), same fallback string when no user text exists. The re-plan path's keep-messages set is identical: latest user + latest verified. Tests: 6 new unit tests cover the goal lookup, fact cap, error-result filter, verified-marker scan, empty input, and the early-exit condition. The full engine test suite (153 tests) still passes.

…-user lookup The build_canonical_state path never reads CanonicalStateScan::latest_verified_user_idx, but the previous patch required is_complete() to find a verified user message before it would short-circuit. On a long history with no verification replay — the common case — the scan walked the entire message list looking for a match that could not exist. Add a find_verified: bool parameter to scan_canonical_inputs and CanonicalStateScan::is_complete. build_canonical_state now passes false, so the loop stops as soon as the goal and CANONICAL_SCAN_MAX_FACTS facts are found. The replan path (apply_verify_and_replan) keeps the existing true behavior so it still locates the latest verified user message. Test calls are updated to match; no behavior change for any test.

output_rows (in tui::history) walks the raw tool output, ANSI-strips each line, classifies path/URL-like rows, and wraps the rest to the current viewport width. selected_output_indices then computes the head/tail/importance subset that the compact Live view shows. Both functions are pure, but they are called on every render frame for every visible tool cell. For a 4 KB tool output on a 120 FPS render loop that is 2-6 redundant walks per frame, per cell, and the function is called from a non-trivial number of cells across exec, tool, command, and review history. Add tui::output_rows_cache, a thread-local, content-addressed cache keyed on (content_hash, width) for the rows and (content_hash, width, line_limit) for the indices. The cache stores the wrapped Vec<OutputRow> plus a per-line-limit map of selected indices on a single entry, so a single key lookup satisfies both render steps. render_preserved_output_mode now consults the cache for both the rows and the indices; on a hit, neither the per-line ANSI strip nor the importance-ranking pass runs. The cache is bounded (default capacity 256) with insertion-order eviction. The OutputRow struct gains PartialEq + Eq + pub fields so the cache module can store and hash it without exposing private internals. Tests: 6 new unit tests cover the hit/miss path, width invalidation, content invalidation, indices per-line_limit caching, capacity eviction, and hash stability. The wider tui::history test suite (68 tests) still passes.

Three follow-ups to the previous perf commit: 1. Drop the rows_hash field on CacheEntry. The field was computed and stored but never read on the hot path; tests exercised it only to assert the cache returned a stable hash. After this change get_or_compute_rows returns just Vec<OutputRow>, halving the tuple-return ABI and removing one DefaultHasher::write pass on every cache miss. 2. Replace DefaultHasher (SipHash) with a hand-rolled FNV-1a 64-bit hash. SipHash is per-process-keyed and ~5-10x slower than FNV on the small-to-medium tool output strings we see at 120 FPS. FNV-1a has no per-process key, fits in 20 lines of pure-Rust, and a 64-bit collision space is more than wide enough for the per-process LRU's expected <= a few hundred entries. The cache is a correctness optimization, not a security boundary; collisions only cause a false miss, never wrong data. 3. Caller in tui::history::render_preserved_output_mode updated to the new Vec<OutputRow>-only signature. Two new tests cover the FNV-1a properties (length-suffix sensitivity, empty-input stability).

- Add scroll state field to PlanPromptView with PgUp/PgDn, Ctrl+U/D/F/B, Home/End, gg/G vim-style keybindings - Show scroll indicator footer when content overflows the popup - Add confirming_exit state: Esc while scrolled asks for confirmation before discarding, preventing accidental exits on long plans - Clamp scroll in render() so overscroll doesn't hide bottom options - Use wrapped_line_count() with UnicodeWidthStr for accurate overflow detection with CJK characters - Add 11 unit tests covering scroll, keybindings, and exit confirmation

- Use word-wrapping-aware line count to prevent underestimating scroll range (gemini-code-assist / greptile-apps) - Merge PLAN_OPTIONS, PLAN_SHORTCUTS, PLAN_SHORT_LABELS into PlanOption struct (gemini-code-assist) - Remove dead Esc code in handle_key (greptile-apps) - Guard gg/G with modifier checks (gemini-code-assist) - Increase PgUp/PgDn scroll amount from 6 to 12 (greptile-apps) - Use u16::try_from for scroll value to avoid silent truncation (greptile-apps) - Update related unit tests for new scroll values

… key Use clamped (effective) scroll instead of raw `self.scroll` in the Esc handler so a short plan that fits entirely (max_scroll == 0) never triggers the "exit without implementing?" dialog when the user pressed a scroll key (PgDn/Ctrl-D/G/End) beforehand.

…ppy lint

…to scroll region

…e wrap width - wrapped_line_count: compute leading-space width via UnicodeWidthStr instead of byte length, so non-ASCII leading whitespace is measured correctly. - render: hoist popup_area / content_width computation above plan rendering so wrap_text can share the same content_width derived from the actual popup geometry instead of a magic 68.

…rome - Clear pending_g when Esc triggers the exit-confirmation prompt so a stray 'g' press does not leak into and survive the confirmation dialog. - Move render_modal_chrome into the else branch so only one call fires per render pass, eliminating a shadow artifact when confirming_exit is active.

…nder work - wrap_text: replace chars().count() with UnicodeWidthStr::width() so CJK text is wrapped by display columns, consistent with wrapped_line_count and ratatui's Paragraph::wrap. Also fix the hard-split loop to use exclusive byte ranges (..end) instead of inclusive (..=i) so multi-byte UTF-8 prefixes are always valid. - render: hoist the confirming_exit branch to an early return so the plan-content construction (lines, scroll bounds, footer) is skipped entirely when the confirmation dialog is visible.

…breaks at script boundaries Wrap plan steps via wrap_text() before rendering, breaking only on display-width overflow, not on Latin/CJK Unicode word boundaries. Switch main render path from Wrap { trim: true } to Wrap { trim: false } since all content is pre-wrapped. Replace wrapped_line_count() with lines.len() for accurate scroll bounds. Keep confirm-exit dialog on Wrap { trim: true } (English-only, no risk).

#2683) Subagent aliases: - Legacy names (agent_spawn, agent_result, agent_cancel, resume_agent, agent_list, agent_send_input, agent_assign, agent_wait, delegate_to_agent) are already NOT registered — they exist as dead code with #[allow(dead_code)] since v0.8.33 - Add test verifying model catalog only advertises canonical subagent tools: agent_open, agent_eval, agent_close, tool_agent Shell aliases: - Hide exec_wait from model catalog (legacy alias for exec_shell_wait) - Hide exec_interact from model catalog (legacy alias for exec_shell_interact) - Both remain callable for saved transcript replay - Add test verifying shell aliases are hidden but callable Verification: cargo test -p codewhale-tui --locked (4040 passed), cargo clippy -D warnings

…2683)

Root cause: AgentComplete unconditionally calls resume_terminal() even when the terminal was never paused, causing a secondary EnterAlternateScreen on Windows that creates a new buffer whose width may differ from the window width. Additionally, ColorCompatBackend had no terminal_size cache, so size() fell through to crossterm::terminal::size() which on Windows returns the WinAPI buffer width rather than the window width. Changes: - AgentComplete: add event_broker.is_paused() guard - resume_terminal(): cache real terminal size before reset_viewport - Resize handler: also set terminal_size alongside forced_size - subagent_routing: 3x mark_history_updated -> bump_history_cell(idx) - color_compat: add terminal_size field, set_terminal_size(), fix size() fallback priority (forced_size > terminal_size) - tests: 3 unit tests for size() fallback chain Review feedback addressed: - forced_size now takes priority over terminal_size (gemini-code-assist) - Redundant map lookups removed in subagent_routing (both bots) - set_terminal_size moved before reset_terminal_viewport (greptile-apps) (cherry picked from commit 4463c46)

@xyuai

Harvests the MiMo Token Plan auth-header behavior from #2627 while keeping Xiaomi env-key precedence unchanged so standard endpoints do not accidentally receive a Token Plan key. Harvested from PR #2627 by @xyuai. Co-authored-by: xyuai <281015099+xyuai@users.noreply.github.com>

@idling11

Harvested from PR #2581 by @idling11. Co-authored-by: idling11 <8055620+idling11@users.noreply.github.com>

@cyq1017

Harvested from PR #2513 by @cyq1017. Co-authored-by: cyq1017 <61975706+cyq1017@users.noreply.github.com>

@wavezhang

Harvests provider-scoped TLS skip-verify from #1893 by @wavezhang. Disabled by default, active-provider-only, doctor-reported, and keeps SSL_CERT_FILE as the preferred custom CA path.

fix(tui): count workspace MCP servers in status surfaces

docs(runtime): document read-only VS Code Agent View APIs

Classify stream/body decode failures such as the #2847 report as recoverable network interruptions and add focused taxonomy coverage.\n\nVerification:\n- cargo test -p codewhale-tui error_taxonomy::tests --locked\n- git diff --check\n- cmp -s CHANGELOG.md crates/tui/CHANGELOG.md\n- ./scripts/release/check-versions.sh\n- ./scripts/release/check-ohos-deps.sh

Add a pure HarnessProfile resolver for provider/model routes while keeping runtime provider/model routing, prompts, tools, auth, context, and persisted config unchanged.\n\nVerification:\n- cargo test -p codewhale-config harness_profile --locked\n- cargo fmt --all --check\n- git diff --check\n- cmp -s CHANGELOG.md crates/tui/CHANGELOG.md\n- ./scripts/release/check-versions.sh\n- ./scripts/release/check-ohos-deps.sh

Add recorded mock-trace replay coverage for workflows/rlm_cache_change.star and prove missing dogfood records produce ReplayDiverged instead of live fallback.\n\nVerification:\n- cargo test -p codewhale-whaleflow rlm_cache_change --locked\n- cargo fmt --all --check\n- git diff --check\n- cmp -s CHANGELOG.md crates/tui/CHANGELOG.md\n- ./scripts/release/check-versions.sh\n- ./scripts/release/check-ohos-deps.sh

Hmbown and others added 30 commits June 3, 2026 19:20

fix(settings): tighten legacy path migration coverage

9e15805

fix release crate publish checks

05950d1

fix(tui): replace manual div_ceil with usize::div_ceil to satisfy cli…

537a8bc

…ppy lint

fix(plan_prompt): remove step truncation to allow content overflow in…

11c448d

…to scroll region

chore: apply cargo fmt fix to plan_prompt.rs

47c071a

docs: update TOOL_SURFACE.md with v0.9.0 hidden-alias table (#2682, #…

27db89c

…2683)

fix(tui): invalidate fanout card rows on sibling starts

159f509

docs: refresh README and v0.9 execution map

44ceabd

docs: harvest provider fallback chain RFC

5dc1a63

Harvested from PR #2581 by @idling11. Co-authored-by: idling11 <8055620+idling11@users.noreply.github.com>

docs: mark mention depth hint harvested

111a805

feat(tui): add bounded restore snapshot listing

311eb40

Harvested from PR #2513 by @cyq1017. Co-authored-by: cyq1017 <61975706+cyq1017@users.noreply.github.com>

docs: update v0.9 PR harvest map

868f99b

Hmbown added 24 commits June 5, 2026 22:23

feat(whaleflow): add memo telemetry counters (#2833)

85b5ca5

feat(config): add provider TLS skip verify

190e9f3

Harvests provider-scoped TLS skip-verify from #1893 by @wavezhang. Disabled by default, active-provider-only, doctor-reported, and keeps SSL_CERT_FILE as the preferred custom CA path.

fix(cli): include TLS skip flag in runtime option tests

6269cb9

fix(tui): count workspace MCP servers in status surfaces

56fa055

fix(tui): count workspace MCP servers in status surfaces

docs(runtime): document read-only VS Code Agent View APIs

96b825b

docs(runtime): document read-only VS Code Agent View APIs

fix(whaleflow): reject unknown workflow references (#2837)

6bb564e

feat(compaction): add dormant hard compaction planner (#2838)

d784f1e

feat(whaleflow): add teacher candidate artifacts (#2839)

14d14f5

feat(whaleflow): add student replay promotion gate (#2840)

6a527fc

feat(whaleflow): mark mock cancellation and budgets (#2841)

7fc4ec8

docs(whaleflow): define external memory cutline (#2842)

a705275

docs(release): add v0.9 acceptance matrix (#2843)

2bb24d0

docs(harness): define profile cutline (#2844)

efbcc68

docs(harness): align v0.9 profile acceptance

e22a7da

docs(release): fill v0.9 acceptance evidence

cd9a044

fix(vscode): keep agent view metadata on snapshot errors

23a188e

docs(release): mark deferred v0.9 acceptance gates

caa1d4a

docs(release): resolve v0.9 UI acceptance cutline

e69ea45

docs(release): record slash picker v0.9 evidence

a5a6b0a

docs(release): close v0.9 credit rollback gates (#2856)

2561a54

docs(release): record v0.9 core gate evidence (#2857)

ab8e3a1

Hmbown mentioned this pull request Jun 6, 2026

v0.9.0 Release acceptance matrix: required checks before tagging #2729

Open

Hmbown added 5 commits June 6, 2026 02:33

docs(release): mark asset verification as pre-npm gate (#2858)

b2e1ba1

docs(release): record macOS startup evidence (#2859)

7bd6827

docs(release): record DeepSeek v4 live smoke (#2860)

137d65c

docs(release): record Linux startup evidence (#2861)

cc3cbc8

feat(runtime-api): expose git status metadata for agent view (#2862)

5bd2f6a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.9.0 stewardship integration#2762

v0.9.0 stewardship integration#2762
Hmbown wants to merge 190 commits into
mainfrom
codex/v0.9.0-stewardship

Hmbown commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

Conversation

Hmbown commented Jun 5, 2026

Summary

Current verification

Maintainer notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants