test(client): add plan mode toggle byte-stability invariant test #2519

Open
HUQIANTAO wants to merge 1 commit into Hmbown:main from HUQIANTAO:feat/plan-mode-byte-stable

@HUQIANTAO
Contributor

Summary

Add a dedicated test that verifies the tool catalog remains byte-stable across Plan mode toggles and deferred tool activations. This converts the existing soft invariant (documented in comments and verified by ordering tests) into a hard byte-level assertion that catches any future regression in catalog construction determinism.

Background

DeepSeek's KV prefix cache includes the tools array in the immutable prefix. Any byte-level change in the tools array forces a full re-prefill on the next turn. The existing tests verify tool ordering (alphabetical within partitions, deferred tools at the tail), but they don't verify that the serialized bytes are identical across repeated catalog builds.

This matters for two scenarios:

  1. Mode toggles: Users switch between Plan, Agent, and YOLO modes during a session. Each mode builds a different tool catalog (Plan excludes execution tools). If the catalog construction has any non-determinism (e.g., HashMap iteration order, timestamp-dependent logic), toggling between modes could produce different byte sequences for the same mode, busting the prefix cache.

  2. Deferred tool activation: When ToolSearch activates a deferred tool mid-session, the tool must be appended to the tail of the catalog without reordering the head. The existing active_tool_list_pushes_deferred_activations_to_the_tail test verifies the ordering, but not the byte-level stability of the head.
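The `HashMap` hazard mentioned in the first scenario can be illustrated with a minimal sketch. Everything here is hypothetical (the `serialize_catalog` and `sample_registry` helpers and the hand-rolled JSON-like output are stand-ins, not the project's catalog code); it only shows that sorting by name before serializing makes the bytes independent of map iteration order.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for catalog serialization: name -> description.
/// Sorting by name removes any dependence on `HashMap` iteration order,
/// which Rust leaves unspecified.
fn serialize_catalog(registry: &HashMap<String, String>) -> Vec<u8> {
    let mut tools: Vec<(&String, &String)> = registry.iter().collect();
    tools.sort_by(|a, b| a.0.cmp(b.0));

    let mut out = String::from("[");
    for (i, (name, description)) in tools.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        // `{:?}` on a string yields a quoted, escaped literal. This uses
        // Rust's escaping rules, which is close enough to JSON for a sketch.
        out.push_str(&format!(
            "{{\"name\":{:?},\"description\":{:?}}}",
            name, description
        ));
    }
    out.push(']');
    out.into_bytes()
}

/// Build a registry by inserting tools in the given order (illustrative names).
fn sample_registry(insertion_order: &[&str]) -> HashMap<String, String> {
    insertion_order
        .iter()
        .map(|name| (name.to_string(), format!("{name} tool")))
        .collect()
}

fn main() {
    // Two separately built maps with different insertion orders.
    let a = sample_registry(&["read_file", "grep", "list_dir"]);
    let b = sample_registry(&["list_dir", "read_file", "grep"]);

    // The serialized bytes match because the order is fixed before writing.
    assert_eq!(serialize_catalog(&a), serialize_catalog(&b));
    println!("{}", String::from_utf8(serialize_catalog(&a)).unwrap());
}
```

Without the `sort_by` call, the two outputs could differ between builds or runs, which is the kind of regression a byte-level assertion is meant to catch.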

What changed

New test: plan_mode_toggle_preserves_catalog_byte_stability

Verifies three invariants:

  1. Same-mode determinism: Building the catalog twice for Plan mode produces identical JSON bytes. Same for Agent mode. This catches any non-determinism in catalog construction.

  2. Cross-mode head stability: Non-deferred tools common to Plan and Agent modes appear in the same order. Plan mode excludes execution tools, but the tools that are present in both modes must have stable byte positions.

  3. Deferred activation tail-append: Activating a deferred tool mid-session appends to the tail without reordering the catalog head. The test builds a catalog with a deferred tool, activates it, and verifies the head prefix is preserved.
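For readers without the diff open, the three invariants can be sketched against a toy catalog builder. This is not the test added in this PR: `Mode`, `REGISTRY`, `build_catalog`, `to_bytes`, and the tool names are all hypothetical, and `Debug` formatting stands in for JSON serialization.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Mode {
    Plan,
    Agent,
}

/// Hypothetical registry rows: (name, is_execution_tool, is_deferred).
const REGISTRY: &[(&str, bool, bool)] = &[
    ("read_file", false, false),
    ("exec_shell", true, false),
    ("grep", false, false),
    ("web_fetch", false, true),
];

/// Sorted non-deferred head, then deferred tools in activation order.
/// Plan mode drops execution tools (assumption mirroring the PR text).
fn build_catalog(mode: Mode, activated: &[&str]) -> Vec<String> {
    let mut catalog: Vec<String> = REGISTRY
        .iter()
        .filter(|t| !t.2 && (mode == Mode::Agent || !t.1))
        .map(|t| t.0.to_string())
        .collect();
    catalog.sort();
    for name in activated {
        if REGISTRY.iter().any(|t| t.0 == *name && t.2) {
            catalog.push((*name).to_string());
        }
    }
    catalog
}

/// Stand-in for serialization: `Debug` output of the name list.
fn to_bytes(catalog: &[String]) -> Vec<u8> {
    format!("{:?}", catalog).into_bytes()
}

/// Names from `catalog` that also appear in `other`, in `catalog` order.
fn common_order(catalog: &[String], other: &[String]) -> Vec<String> {
    catalog
        .iter()
        .filter(|name| other.contains(*name))
        .cloned()
        .collect()
}

fn main() {
    // 1. Same-mode determinism: two builds serialize to identical bytes.
    for mode in [Mode::Plan, Mode::Agent] {
        assert_eq!(
            to_bytes(&build_catalog(mode, &[])),
            to_bytes(&build_catalog(mode, &[]))
        );
    }

    // 2. Cross-mode head stability: shared tools keep their relative order.
    let plan = build_catalog(Mode::Plan, &[]);
    let agent = build_catalog(Mode::Agent, &[]);
    assert_eq!(common_order(&plan, &agent), common_order(&agent, &plan));

    // 3. Deferred activation appends to the tail; the head is unchanged.
    let before = build_catalog(Mode::Agent, &[]);
    let after = build_catalog(Mode::Agent, &["web_fetch"]);
    assert_eq!(to_bytes(&after[..before.len()]), to_bytes(&before));
    assert_eq!(after.last().map(String::as_str), Some("web_fetch"));
}
```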

New doc comment on build_model_tool_catalog

Documents the catalog-head stability invariant: the head of the catalog (non-deferred tools) must remain byte-identical across mode toggles for tools common to both modes.
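One way to make this invariant observable is to measure how many leading bytes two serialized catalogs share. The sketch below is illustrative only: `common_prefix_len` is a hypothetical helper, the JSON literals are made up, and how the serving side actually matches prefixes is not shown in this PR.

```rust
/// Number of leading bytes two serialized catalogs have in common.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

fn main() {
    // Made-up serialized catalogs for illustration.
    let before = br#"[{"name":"grep"},{"name":"read_file"}]"#;
    let appended = br#"[{"name":"grep"},{"name":"read_file"},{"name":"web_fetch"}]"#;
    let reordered = br#"[{"name":"read_file"},{"name":"grep"}]"#;

    // Tail-append: everything except the closing `]` is still a shared prefix.
    assert_eq!(common_prefix_len(before, appended), before.len() - 1);

    // Reordering the head diverges at the first tool name.
    assert!(common_prefix_len(before, reordered) < common_prefix_len(before, appended));

    println!(
        "append shares {} bytes, reorder shares {} bytes",
        common_prefix_len(before, appended),
        common_prefix_len(before, reordered)
    );
}
```

Appending at the tail keeps almost the entire earlier serialization as a prefix, while a head reorder diverges within the first entry, which is why the doc comment singles out the catalog head.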

Testing

The new test passes. All existing tool catalog tests continue to pass.

Files changed

  • crates/tui/src/core/engine/tool_catalog.rs: Add doc comment documenting the catalog-head stability invariant
  • crates/tui/src/core/engine/tests.rs: Add plan_mode_toggle_preserves_catalog_byte_stability test

Contributor

@greptile-apps greptile-apps Bot left a comment


HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds a new test to verify that toggling between Plan and Agent modes preserves the byte stability of the tool catalog head, which is critical for DeepSeek's KV prefix cache. It also adds documentation explaining this stability invariant. The review feedback points out an unused import PrefixFingerprint in the new test that should be removed to prevent compiler warnings.

Comment thread crates/tui/src/core/engine/tests.rs Outdated
/// when deferred tools are activated mid-session.
#[test]
fn plan_mode_toggle_preserves_catalog_byte_stability() {
    use crate::prefix_cache::PrefixFingerprint;
Contributor


medium

The import PrefixFingerprint is unused in this test and can be safely removed to clean up the code and avoid compiler warnings.

Add test plan_mode_toggle_preserves_catalog_byte_stability that verifies
three invariants critical for DeepSeek's KV prefix cache:

1. Building the tool catalog twice for the same mode produces identical
   JSON bytes. This catches any non-determinism in catalog construction
   (e.g., HashMap iteration order, timestamp-dependent logic).

2. Non-deferred tools common to Plan and Agent modes appear in the same
   order. Plan mode excludes execution tools, but the tools that are
   present in both modes must have stable byte positions so that toggling
   between modes doesn't shift byte offsets of shared tools.

3. Activating a deferred tool mid-session appends to the tail without
   reordering the catalog head. This is the existing invariant from Hmbown#263,
   now covered by a dedicated byte-level assertion.

Also add a doc comment to build_model_tool_catalog documenting the
catalog-head stability invariant.
@HUQIANTAO HUQIANTAO force-pushed the feat/plan-mode-byte-stable branch from d600a85 to 5c786f5 Compare June 1, 2026 16:29

@Hmbown
Owner

Hmbown commented Jun 2, 2026

Hey @HUQIANTAO — the plan mode byte-stability test has been harvested into v0.8.50 (#2504)! This is a great addition — catching catalog-head reordering before it busts the prefix cache is exactly the kind of defense-in-depth we want. The doc comment on `build_model_tool_catalog` documenting the stability invariant is a nice touch too. Thank you! 🐋
