Fix interactive YOLO plan review semantics #1836
WeZZard wants to merge 3 commits into MoonshotAI:main from
Conversation
WeZZard force-pushed from da40dc3 to 9d01429
WeZZard force-pushed from 9d01429 to 9e3eaa0
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43d913d85b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
if soul.can_request_user_feedback:
    return []
```
Avoid persisting non-interactive ACP guidance in history
NonInteractiveModeInjectionProvider now injects whenever can_request_user_feedback is false, and ACP sessions are explicitly created with that flag disabled (src/kimi_cli/acp/server.py). Because injections are appended into conversation history (via KimiSoul context writes), this “Do NOT call AskUserQuestion” reminder survives session continuation; if the same session is later resumed in interactive shell/wire mode, the model still sees stale non-interactive instructions and may skip plan review/questions even though feedback is available. This regression newly affects default (non‑YOLO) ACP sessions, which previously did not receive a non-interactive reminder.
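The leak can be illustrated with a minimal, self-contained sketch; the `Context` class and message strings below are toy stand-ins, not the kimi_cli types:

```python
# Toy stand-ins for the kimi_cli types: a context whose history is
# persisted to disk and restored on session continuation.
REMINDER = "Do NOT call AskUserQuestion"

class Context:
    def __init__(self, history=None):
        self.history = list(history or [])

    def append_message(self, msg):
        # Persisted write: appended messages end up in the context file.
        self.history.append(msg)

    def save(self):
        return list(self.history)  # stand-in for the on-disk context file

    @classmethod
    def restore(cls, saved):
        return cls(saved)

# Non-interactive ACP session: the reminder is appended into history.
ctx = Context(["user: do the task"])
ctx.append_message(f"system-reminder: {REMINDER}")

# Same session resumed in interactive shell/wire mode: feedback is now
# available, but the stale non-interactive instruction is still visible.
resumed = Context.restore(ctx.save())
print(any(REMINDER in m for m in resumed.history))  # True
```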
Addressed with an explicit regression test and tightened invariants. Dynamic injections are now verified to be step-local only: the reminder is present in the model-facing effective history for the current step, but it is not appended to Context.history, not written into the context file, and not restored on a fresh Context.restore() from disk. See tests/core/test_kimisoul_steer.py::test_step_keeps_runtime_specific_injections_out_of_persisted_history in commit c9460e9c.
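The invariant the test pins down can be sketched with toy stand-ins; the `Context` and `step` below are illustrations under that assumption, not the kimi_cli implementation:

```python
# Toy model of a step-local injection: reminders join the model-facing
# effective messages for the current step only, and never touch
# persisted history or a restore from disk.

class Context:
    def __init__(self, history=None):
        self.history = list(history or [])

    def save(self):
        return list(self.history)  # stand-in for the context file on disk

    @classmethod
    def restore(cls, saved):
        return cls(saved)

def step(ctx, injections):
    """Build model-facing messages without persisting the injections."""
    effective = list(ctx.history)  # local copy, not ctx.history itself
    if injections:
        effective.append("system-reminder: " + "\n".join(injections))
    return effective

ctx = Context(["user: plan the change"])
effective = step(ctx, ["Do NOT call AskUserQuestion"])

assert any("AskUserQuestion" in m for m in effective)        # visible this step
assert all("AskUserQuestion" not in m for m in ctx.history)  # not persisted
restored = Context.restore(ctx.save())
assert all("AskUserQuestion" not in m for m in restored.history)  # not restored
print("step-local invariant holds")
```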
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c9460e9c57
```diff
 effective_messages = list(self._context.history)
 if injections:
     combined_reminders = "\n".join(system_reminder(inj.content).text for inj in injections)
-    await self._context.append_message(
+    effective_messages.append(
         Message(
```
Restore plan reminder throttling in plan mode
PlanModeInjectionProvider.get_injections() throttles by scanning history for previous plan reminders, but this change makes injections step-local by copying self._context.history and appending reminders only to effective_messages. Because reminders are no longer persisted in context, plan-mode turns after the first never contain a prior reminder marker, so the provider treats each step as first-time and re-injects the full plan reminder every step, which can quickly inflate prompt tokens and degrade long plan-mode sessions.
```diff
 effective_messages = list(self._context.history)
 if injections:
     combined_reminders = "\n".join(system_reminder(inj.content).text for inj in injections)
-    await self._context.append_message(
+    effective_messages.append(
         Message(
             role="user",
             content=[TextPart(text=combined_reminders)],
         )
     )
```
🔴 PlanModeInjectionProvider throttling broken by non-persisted injections
The _step() method was changed to no longer persist dynamic injections into self._context (kimisoul.py:838-846). However, PlanModeInjectionProvider (plan_mode.py:61-79) relies on scanning self._context.history (passed at kimisoul.py:226) to find its own previous reminders via _has_plan_reminder() for throttling (inject every 5 assistant turns). Since injections are no longer persisted, _has_plan_reminder never finds a match, found_previous is always False, and the provider takes the if not found_previous branch on every step — injecting the full plan mode reminder (a multi-paragraph prompt) on every single LLM call instead of every 5 turns. This also resets _inject_count to 1 each time, breaking the sparse/full alternation logic at plan_mode.py:86-91.
Impact in plan mode
Every step in plan mode now gets the full ~30-line plan reminder injected, significantly increasing token usage and potentially degrading LLM output quality from repeated instructions. The throttle interval (_TURN_INTERVAL = 5) and full/sparse alternation (_FULL_EVERY_N = 5) are completely bypassed.
Prompt for agents
The root cause is that PlanModeInjectionProvider (src/kimi_cli/soul/dynamic_injections/plan_mode.py lines 61-79) scans the history passed to get_injections() to find its own previous reminders for throttling. Since _step() in kimisoul.py now builds effective_messages as a local copy (not persisting injections), the provider never finds its previous reminders in self._context.history.
Two possible approaches:
1. Change PlanModeInjectionProvider to track its own state internally (e.g. counting assistant messages or steps since last injection) rather than relying on scanning persisted history. This aligns with the new non-persisting design and matches how YoloModeInjectionProvider and NonInteractiveModeInjectionProvider already use internal _injected flags.
2. Alternatively, pass effective_messages (which includes the current step's injection) to _collect_injections on subsequent calls within the same turn, or maintain a separate ephemeral history that includes injections for the provider to scan. This is more complex and less clean.
Approach 1 is recommended. The provider could maintain an internal counter of assistant messages seen since the last injection, incrementing when it sees new assistant messages and resetting when it injects. This removes the dependency on persisted history entirely.
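A minimal sketch of approach 1, with the class name, intervals, and reminder strings as hypothetical stand-ins for the real plan_mode.py values:

```python
# Toy provider that throttles via internal state rather than scanning
# persisted history. The interval values and the full/sparse rule here
# are illustrative assumptions, not the real plan_mode.py constants.
_TURN_INTERVAL = 5   # inject at most once per 5 assistant turns
_FULL_EVERY_N = 5    # every Nth injection is the full reminder

class PlanModeInjectionProvider:
    def __init__(self):
        self._turns_since_last = _TURN_INTERVAL  # inject on the first step
        self._inject_count = 0

    def on_assistant_message(self):
        # Called once per assistant turn instead of re-scanning history.
        self._turns_since_last += 1

    def get_injections(self):
        if self._turns_since_last < _TURN_INTERVAL:
            return []
        self._turns_since_last = 0
        self._inject_count += 1
        if self._inject_count % _FULL_EVERY_N == 1:
            return ["<full plan-mode reminder>"]
        return ["<sparse plan-mode reminder>"]

provider = PlanModeInjectionProvider()
seen = []
for _ in range(12):  # simulate 12 assistant turns
    seen.extend(provider.get_injections())
    provider.on_assistant_message()
print(seen)  # injects on turns 1, 6, 11: one full reminder, then sparse ones
```

Because the counter lives on the provider, the throttle keeps working even though reminders never appear in `self._context.history`, matching how `YoloModeInjectionProvider` and `NonInteractiveModeInjectionProvider` already track an internal `_injected` flag.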
Summary
Rationale
Verification
```sh
uv run pytest tests/tools/test_ask_user.py tests/tools/test_plan_yolo.py tests/core/test_yolo_injection.py tests/ui_and_conv/test_shell_prompt_echo.py tests/ui_and_conv/test_replay.py tests/ui_and_conv/test_visualize_running_prompt.py tests/ui_and_conv/test_shell_run_placeholders.py tests/e2e/test_shell_modal_e2e.py -q -k 'ask_user or plan_yolo or yolo_injection or shell_prompt_echo or replay or visualize_running_prompt or shell_run_placeholders or yolo_enter_plan_still_requests_review or yolo_exit_plan_still_requests_review'
uv run ruff check src/kimi_cli/ui/shell/echo.py tests/ui_and_conv/test_shell_prompt_echo.py tests/ui_and_conv/test_replay.py tests/ui_and_conv/test_visualize_running_prompt.py tests/ui_and_conv/test_shell_run_placeholders.py
```
Visual Outcomes