Skip to content

fix(tui): keep task shell tools eagerly loaded (#2253)#2271

Open
Hmbown wants to merge 1 commit into
mainfrom
fix/2253-task-shell-eager-tools
Open

fix(tui): keep task shell tools eagerly loaded (#2253)#2271
Hmbown wants to merge 1 commit into
mainfrom
fix/2253-task-shell-eager-tools

Conversation

@Hmbown
Copy link
Copy Markdown
Owner

@Hmbown Hmbown commented May 27, 2026

Summary

  • keep task_shell_start and task_shell_wait in the default eager native-tool catalog
  • prevent first-use schema hydration from replacing execution for the long-running shell flow
  • add regression coverage to the native deferral policy

Closes #2253.

Thanks @bevis-wong for the clear Windows/deepseek-v4-flash report and logs.

Verification

  • cargo fmt --all -- --check
  • git diff --check
  • cargo test -p codewhale-tui non_yolo_mode_retains_default_defer_policy -- --nocapture
  • cargo check -p codewhale-tui --all-features --locked

Open in Devin Review

Greptile Summary

This PR fixes a regression where task_shell_start and task_shell_wait were omitted from the eagerly-loaded native tool catalog, causing first-use calls to be intercepted by the deferred schema-hydration path instead of executing — breaking the long-running shell flow, notably on Windows with deepseek-v4-flash.

  • tool_catalog.rs: Inserts task_shell_start and task_shell_wait into DEFAULT_ACTIVE_NATIVE_TOOLS in the correct alphabetical position, so should_default_defer_tool returns false for both and maybe_hydrate_requested_deferred_tool no longer intercepts their first call.
  • tests.rs: Adds two assertions to the existing non_yolo_mode_retains_default_defer_policy test, providing targeted regression coverage for the fix.

Confidence Score: 5/5

Safe to merge — the change is a two-line catalog addition with a matching regression test; no existing behavior is altered for any other tool.

The fix is minimal and surgical: two tool names added to an alphabetically-ordered constant, a targeted should_default_defer_tool check confirmed by two new test assertions, and the logic path through maybe_hydrate_requested_deferred_tool is already well-exercised by the surrounding test suite. There is no risk of side-effects on other tools in the catalog.

No files require special attention.

Important Files Changed

Filename Overview
crates/tui/src/core/engine/tool_catalog.rs Adds task_shell_start and task_shell_wait to DEFAULT_ACTIVE_NATIVE_TOOLS, keeping them eagerly loaded and preventing first-use schema hydration from intercepting the long-running shell flow.
crates/tui/src/core/engine/tests.rs Extends non_yolo_mode_retains_default_defer_policy with two new assertions confirming task_shell_start and task_shell_wait are not deferred; regression coverage is correct and well-placed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Model requests task_shell_start / task_shell_wait"] --> B{"Is tool in DEFAULT_ACTIVE_NATIVE_TOOLS?"}
    B -- "Before fix: NO (deferred)" --> C["maybe_hydrate_requested_deferred_tool intercepts call"]
    C --> D["Returns schema-hydration response\n(tool NOT executed, retry required)"]
    D --> E["Long-running shell flow broken\n(regression on Windows / deepseek-v4-flash)"]
    B -- "After fix: YES (eager)" --> F["should_default_defer_tool returns false\ndefer_loading = Some(false)"]
    F --> G["maybe_hydrate_requested_deferred_tool returns None\n(tool proceeds to execution)"]
    G --> H["task_shell_start / task_shell_wait execute normally"]
Loading

Reviews (1): Last reviewed commit: "fix(tui): keep task shell tools eagerly ..." | Re-trigger Greptile

Copilot AI review requested due to automatic review settings May 27, 2026 10:16
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds "task_shell_start" and "task_shell_wait" to the list of default active native tools in the TUI engine's tool catalog, and includes corresponding unit tests to verify that these tools do not default to a deferred policy. I have no feedback to provide as there are no review comments to evaluate.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.

Open in Devin Review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Hmbown
Copy link
Copy Markdown
Owner Author

Hmbown commented May 27, 2026

Independent review:

Confirmed minimal & correct. Two-line insertion into DEFAULT_ACTIVE_NATIVE_TOOLS preserves alphabetical order, and should_default_defer_tool (lines 56-72 in tool_catalog.rs) returns false via the existing DEFAULT_ACTIVE_NATIVE_TOOLS lookup. Regression assertions in tests.rs are well-placed inside non_yolo_mode_retains_default_defer_policy and exercise the exact code path that was failing.

Merges cleanly against main (no conflicts). The test only covers AppMode::AgentAppMode::Yolo first-invocation isn't asserted, but the catalog lookup is mode-agnostic so coverage is sufficient.

v0.8.48 (PR #2256): Supersedes this fix. That PR rewrites should_default_defer_tool to _name/_always_load and unconditionally returns false for every tool, making the DEFAULT_ACTIVE_NATIVE_TOOLS membership check moot. No intent conflict — both land on "task_shell_* must not defer." Recommend landing #2271 first as the targeted hotfix for #2253; the catalog entries become harmless once #2256 merges, but the test assertions remain valid regression coverage and should be kept.

@Hmbown
Copy link
Copy Markdown
Owner Author

Hmbown commented May 27, 2026

@Hmbown — quick harvest note (cross-author since you opened both). #2256's should_default_defer_tool change to always-return-false is the broader supersession of this targeted fix; intent-aligned but #2271 ships the narrower, safer-to-cherry-pick version.

Recommendation: land #2271 as the targeted hotfix for #2253 now (it's clean against main, low risk), and the assertions in non_yolo_mode_retains_default_defer_policy remain valid regression coverage even after #2256's broader change lands. So no work lost — this gets credited as "first fix" + the broader policy change supersedes the implementation but not the intent. Both contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: tool lazy-registration window causes first invocation of task_shell_* to be deferred

2 participants