Skip to content

fix: move PTY command tracking off the shell input path#117

Merged
F16shen merged 8 commits intoAI-Shell-Team:mainfrom
F16shen:split/pty-command-metadata-leak
Apr 22, 2026
Merged

fix: move PTY command tracking off the shell input path#117
F16shen merged 8 commits intoAI-Shell-Team:mainfrom
F16shen:split/pty-command-metadata-leak

Conversation

@F16shen
Copy link
Copy Markdown
Collaborator

@F16shen F16shen commented Apr 20, 2026

Summary

This PR fixes PTY command metadata leakage by removing internal command-tracking metadata from the shell input stream.

Instead of injecting internal command state directly into the command text sent to bash, command association now uses Python-side pending submission state plus a shell metadata side channel. The change also preserves user and backend history semantics, improves multiline output cleanup, and hardens interactive echo suppression.

Why

Internal metadata should never depend on shell echo behavior to stay hidden.

The old approach assumed bash would echo the submitted command in a form close enough to the original injected text for cleanup to strip it reliably. That breaks in real shell scenarios such as history expansion, quoting changes, multiline commands, and other bash rewrites, which can expose internal AISH metadata in terminal output or bash_exec results.

Testing

  • Ran /home/lixin/workspace/aishell/aish/.venv/bin/python -m pytest tests/terminal/pty/test_pty_control_protocol.py tests/shell/runtime/test_shell_pty_core.py tests/tools/test_bash_output_offload.py -q
  • Result: 79 passed

Summary by Sourcery

Remove internal PTY command-tracking metadata from the shell input stream and shift command association to a sidechannel and pending-submission state while tightening echo filtering and lifecycle tracking.

New Features:

  • Expose PTY command submission metadata and pending submissions via CommandState and PTYManager for higher-level consumers.

Bug Fixes:

  • Prevent internal AISH PTY command-tracking metadata from leaking into user-visible bash output, history, and tool execution results.
  • Ensure user command echo suppression works across split and interleaved PTY output chunks without stripping unrelated text.
  • Avoid mutating backend events that arrive without command_seq while still resolving them against pending submissions.
  • Ensure bash exit commands emit a dedicated shell_exiting backend event with the correct exit code.

Enhancements:

  • Route PTY command metadata through a dedicated temporary file and bash rc wrapper sidechannel instead of inline environment-variable prefixes on user commands.
  • Refine PTY output cleaning to strip echoed commands more robustly, including multiline and prompt-prefixed echoes, while preserving legitimate output.
  • Improve thread safety and ordering guarantees in PTYManager when writing command metadata and sending commands.
  • Harden output processing by stripping ANSI control sequences when matching echoes and resetting echo suppression state on prompt_ready.
  • Extend PTY command state tracking with observed-command capture, better command_seq resolution, and richer pending submission queries.
  • Adjust backend commands to be prefixed only with a leading space so they do not end up in bash history while preserving user history semantics.

Tests:

  • Add extensive PTY, shell, and tool tests covering echo suppression edge cases, bash history behavior for user vs. backend commands, history expansion, metadata leakage prevention, multiline commands, and shell exit signaling.
  • Expand exit tracker tests to validate pending submission lifecycle, command_seq resolution from events, and separation of submitted vs. observed command text.
  • Add Bash tool tests ensuring shared-PTY execution does not leak internal command metadata for single-line and multiline commands.

Summary by CodeRabbit

  • New Features

    • Propagates command metadata (submission IDs/metadata) to improve command/result correlation and lifecycle visibility.
  • Bug Fixes

    • More reliable user-command echo suppression that handles chunked output and strips ANSI sequences.
    • Prevented internal metadata from appearing in command output or shell history; pending echoes cleared on prompt completion.
  • Improvements

    • Enhanced command lifecycle tracking, protocol diagnostics, PTY readiness, and serialized metadata writes for concurrent submissions.
  • Tests

    • Expanded coverage for echo handling, protocol issues, lifecycle resolution, history hygiene, and concurrent submission ordering.

Move PTY command tracking off the shell input path and replace the injected __AISH_ACTIVE_COMMAND_* prefix with Python-side pending submission state plus a shell metadata side channel.

Also tighten shared PTY output cleanup for multiline commands, serialize metadata writes with PTY sends, preserve user/backend history semantics, and harden user echo suppression against chunked and async pre-echo output.

Add regression coverage for command lifecycle association, history expansion, multiline and metacharacter commands, shared PTY bash_exec paths, shell exit events, and live shell echo filtering.

Validation: /home/lixin/workspace/aishell/aish/.venv/bin/python -m pytest tests/terminal/pty/test_pty_control_protocol.py tests/shell/runtime/test_shell_pty_core.py tests/tools/test_bash_output_offload.py
@sourcery-ai

This comment was marked as outdated.

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the pull request. A maintainer will review it when available.

Please keep the PR focused, explain the why in the description, and make sure local checks pass before requesting review.

Contribution guide: https://github.com/AI-Shell-Team/aish/blob/main/CONTRIBUTING.md

@github-actions
Copy link
Copy Markdown
Contributor

This pull request description looks incomplete. Please update the missing sections below before review.

Missing items:

  • User-visible Changes
  • Compatibility
  • Change Type
  • Scope

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In PTYManager.send_command you special-case the absence of _lock via getattr(self, "_lock", None); since _lock is always initialized on real instances, consider simplifying to always use the lock (or explicitly documenting the test-only pathway) to avoid hiding attribute errors or diverging behavior between tests and production.
  • The new command-metadata file helpers (_ensure_command_metadata_file, _write_command_metadata, _clear_command_metadata_file, _remove_command_metadata_file) silently swallow all OSErrors; if failures here would be diagnostically important, consider at least logging at debug level so issues with metadata propagation can be traced without impacting shell behavior.
  • In OutputProcessor._strip_terminal_control, the OSC regex rb"\x1b\].*?(?:\x07|\x1b\\)" is not DOTALL, so it will stop at newlines; if you expect multiline OSC sequences (e.g., some terminal titles), you may want to add re.DOTALL to ensure they are fully stripped and don't interfere with echo detection.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `PTYManager.send_command` you special-case the absence of `_lock` via `getattr(self, "_lock", None)`; since `_lock` is always initialized on real instances, consider simplifying to always use the lock (or explicitly documenting the test-only pathway) to avoid hiding attribute errors or diverging behavior between tests and production.
- The new command-metadata file helpers (`_ensure_command_metadata_file`, `_write_command_metadata`, `_clear_command_metadata_file`, `_remove_command_metadata_file`) silently swallow all `OSError`s; if failures here would be diagnostically important, consider at least logging at debug level so issues with metadata propagation can be traced without impacting shell behavior.
- In `OutputProcessor._strip_terminal_control`, the OSC regex `rb"\x1b\].*?(?:\x07|\x1b\\)"` is not DOTALL, so it will stop at newlines; if you expect multiline OSC sequences (e.g., some terminal titles), you may want to add `re.DOTALL` to ensure they are fully stripped and don't interfere with echo detection.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Switched command metadata from env-vars to a dedicated metadata FD/pipe; CommandState gained submission IDs, started/completed tracking, and protocol-issue diagnostics; PTYManager writes newline-delimited JSON metadata, uses an RLock and updated readiness/serialization; OutputProcessor matches raw echoed bytes (ANSI-stripped) via incremental buffering.

Changes

Cohort / File(s) Summary
Shell runtime / app
src/aish/shell/runtime/app.py
Stop calling OutputProcessor.set_waiting_for_result for non-exit commands; derive command_seq via CommandState.resolve_command_seq for command_started/prompt_ready and avoid mutating event payloads.
Output / Echo suppression
src/aish/shell/runtime/output.py
Replace sentinel-string echo suppression with raw bytes buffering and incremental, line-oriented matching; strip ANSI control sequences before comparing; add _pending_user_echo_buffer, _clear_pending_user_echo(), and matching/consumption logic.
PTY manager & bash wrapper
src/aish/terminal/pty/manager.py, src/aish/terminal/pty/bash_rc_wrapper.sh
Add dedicated metadata pipe/FD (AISH_COMMAND_METADATA_FD), write newline-delimited JSON per send; manager uses RLock, new _wait_ready() readiness sync, serializes metadata on send_command(), buffers protocol issues; bash wrapper reads metadata FD into pending fields and emits submission_id/source in control events.
Command state & exports
src/aish/terminal/pty/command_state.py, src/aish/terminal/pty/__init__.py
Introduce submission_id, observed_command, status on CommandSubmission; CommandResult gains submission_id; register methods return `CommandSubmission
Tests updated / added
tests/.../test_shell_pty_core.py, tests/terminal/pty/test_exit_tracker_repeat_error.py, tests/terminal/pty/test_pty_control_protocol.py, tests/tools/test_bash_output_offload.py, tests/test_exit_tracker_repeat_error.py
Adapt tests for byte-based echo filtering and chunked delivery, pending-submission lifecycle and submission_id handling, metadata-FD behavior and serialization, PTYManager concurrency/serialization, protocol-issue assertions, history/no-leak checks, and shared-PTY tool behavior. Added concurrency and protocol-diagnostics tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Manager as PTYManager
    participant MetaFD as Metadata FD
    participant Shell as Bash
    participant CmdState as CommandState
    participant Output as OutputProcessor
    participant App as ShellApp

    User->>Manager: send_command(cmd, source)
    Manager->>CmdState: register_command()
    CmdState-->>Manager: CommandSubmission (submission_id)
    Manager->>MetaFD: write newline-delimited JSON metadata
    Manager->>Shell: write command to PTY
    Shell->>MetaFD: read metadata -> populate pending fields
    Shell->>Manager: emit control event (command_started with submission_id/source)
    Manager->>CmdState: resolve/bind command_seq
    Shell->>Output: emit stdout/echo bytes
    Output->>Output: strip ANSI, buffer lines, match pending echo, remove matched bytes
    Shell->>Manager: emit control event (prompt_ready with submission_id/source)
    Manager->>CmdState: resolve & pop completed submission
    Manager->>App: deliver completion/control event (phase transition)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

size: M

Poem

🐰 A pipe for metadata, no env-vars to hide,
Bytes hop through tunnels where JSON lines abide.
IDs for each submission, echoes trimmed just right,
Locks and prompts and diagnostics keep the garden bright.
I nibble bugs and cheer — the PTY sleeps tight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.58% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: moving PTY command tracking metadata from shell input to a dedicated side channel.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@F16shen
Copy link
Copy Markdown
Collaborator Author

F16shen commented Apr 21, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (7)
src/aish/terminal/pty/__init__.py (1)

6-6: Consider sorting __all__ alphabetically.

Static analysis flags that __all__ is not sorted. While not critical, sorted exports improve readability and reduce merge conflicts.

🔧 Proposed fix
-__all__ = ["PTYManager", "CommandResult", "CommandState", "CommandSubmission"]
+__all__ = ["CommandResult", "CommandState", "CommandSubmission", "PTYManager"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/terminal/pty/__init__.py` at line 6, The export list __all__ in
__init__ contains unsorted names; sort the entries alphabetically to improve
readability and avoid lint warnings — update the __all__ assignment so the
identifiers "CommandResult", "CommandState", "CommandSubmission", and
"PTYManager" appear in lexicographic order (use the existing symbol names to
locate the list and replace it with the sorted sequence).
src/aish/terminal/pty/manager.py (1)

279-304: Defensive lock check is unnecessary.

The getattr(self, "_lock", None) pattern at line 281 is overly defensive since __init__ always initializes _lock. While it doesn't cause harm, it adds cognitive overhead and suggests uncertainty about object state.

♻️ Simplified version
     def send_command(
         self,
         command: str,
         command_seq: int | None = None,
         source: str = "backend",
     ) -> None:
         stripped_command = command.strip()
         command_to_send = command if source == "user" else f" {command}"
-        lock = getattr(self, "_lock", None)
-
-        if lock is None:
-            self._command_state.register_command(
-                stripped_command, source=source, command_seq=command_seq
-            )
-            self._write_command_metadata(
-                command=stripped_command,
-                source=source,
-                command_seq=command_seq,
-            )
-            self.send((command_to_send + "\n").encode())
-            return
-
-        with lock:
+        with self._lock:
             self._command_state.register_command(
                 stripped_command, source=source, command_seq=command_seq
             )
             self._write_command_metadata(
                 command=stripped_command,
                 source=source,
                 command_seq=command_seq,
             )
             self.send((command_to_send + "\n").encode())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/terminal/pty/manager.py` around lines 279 - 304, The defensive
getattr(self, "_lock", None) check is unnecessary because _lock is always
initialized; simplify by removing the None-branch and always using the lock's
context manager. Replace the getattr+if/with duplication around
_command_state.register_command, _write_command_metadata, and send with a single
with self._lock: block that performs register_command(stripped_command,...),
_write_command_metadata(...), and send((command_to_send + "\n").encode()); keep
using stripped_command, source, command_seq, and command_to_send as before.
src/aish/shell/runtime/output.py (1)

19-21: ANSI stripping regex patterns differ from manager.py.

These patterns are more comprehensive than those in manager.py:386-387, which use [a-zA-Z] for CSI final bytes and only \x07 for OSC termination. While the patterns here are technically more correct (per ECMA-48), the inconsistency could cause different stripping behavior if both modules process overlapping output paths.

Consider unifying these patterns into a shared utility to ensure consistent terminal control sequence handling across the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/shell/runtime/output.py` around lines 19 - 21, The CSI/OSC regexes
are duplicated and inconsistent; extract the canonical patterns (the ones here:
_ANSI_CSI_RE and _ANSI_OSC_RE) into a single shared constant in a new module
(e.g., aish.shell.runtime.ansi or similar), replace the local definitions in
output.py and the patterns in manager.py with imports from that shared symbol,
and update all uses to reference the shared _ANSI_CSI_RE and _ANSI_OSC_RE
constants so both modules strip ANSI sequences identically.
tests/shell/runtime/test_shell_pty_core.py (2)

1202-1203: Missing blank line between test functions.

There should be two blank lines between test_shell_does_not_restart_after_explicit_quit_when_flag_was_not_set and test_backend_error_suppressed_prevents_repeated_hints per PEP 8 conventions.

♻️ Suggested fix
     shell._restart_pty.assert_not_called()
+
+
 def test_backend_error_suppressed_prevents_repeated_hints(capsys):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/shell/runtime/test_shell_pty_core.py` around lines 1202 - 1203, Add a
blank line to separate the two test function definitions so they conform to PEP
8: insert one empty line between the end of
test_shell_does_not_restart_after_explicit_quit_when_flag_was_not_set and the
start of test_backend_error_suppressed_prevents_repeated_hints in
tests/shell/runtime/test_shell_pty_core.py so the two test function definitions
are separated by two newlines in the file.

362-362: Consider using time.sleep for clarity.

threading.Event().wait(0.05) creates an Event that's never set, so this always waits the full 0.05s timeout. Using time.sleep(0.05) would be clearer and more idiomatic for this purpose.

♻️ Suggested fix
-    threading.Event().wait(0.05)
+    import time
+    time.sleep(0.05)

Or add time to the existing imports at the top of the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/shell/runtime/test_shell_pty_core.py` at line 362, Replace the
non-idiomatic sleep call threading.Event().wait(0.05) with time.sleep(0.05) in
the test (locate the call to threading.Event().wait(0.05) in
test_shell_pty_core.py) and add import time to the module imports at the top of
the file so the test uses the clearer, idiomatic time.sleep call.
src/aish/terminal/pty/command_state.py (2)

223-224: Verify the redundant-looking fallback is intentional.

Line 224 computes resolved_command_seq as submission.command_seq if submission.command_seq is not None else command_seq. After _bind_submission_seq(submission, command_seq) on line 223, if command_seq is not None, then submission.command_seq will equal command_seq. The only case where they differ is when command_seq is None and the submission had a prior command_seq from registration.

This appears intentional to preserve the original submission sequence for result tracking when events lack command_seq, but a brief inline comment would clarify the intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/terminal/pty/command_state.py` around lines 223 - 224, The fallback
assignment to resolved_command_seq (using submission.command_seq if not None
else command_seq) after calling _bind_submission_seq may look redundant; add a
brief inline comment by the assignment explaining that _bind_submission_seq will
set submission.command_seq when command_seq is provided but we must preserve any
pre-existing submission.command_seq when command_seq is None so result tracking
uses the original registration sequence (reference _bind_submission_seq,
resolved_command_seq, submission.command_seq, command_seq).

121-124: Consider caching or simplifying pending_submission_count.

pending_submission_count creates a full tuple via pending_submissions just to return its length. For frequent access patterns, this could be optimized, though for typical usage with few concurrent submissions, the overhead is negligible.

♻️ Optional optimization
     `@property`
     def pending_submission_count(self) -> int:
         """Return the number of tracked pending submissions."""
-        return len(self.pending_submissions)
+        count = 1 if self._active_submission is not None else 0
+        for submission in self._submitted_by_seq.values():
+            if submission is not self._active_submission:
+                count += 1
+        return count
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/terminal/pty/command_state.py` around lines 121 - 124, The property
pending_submission_count currently builds a full tuple by calling
pending_submissions just to get its length; change it to avoid allocating that
tuple by returning the length of the underlying container directly (e.g., use
len(self._pending_submissions) if the internal attribute is
_pending_submissions) or maintain and return an updated counter
incremented/decremented in the methods that add/remove submissions (refer to
pending_submission_count and pending_submissions and the add/remove methods that
touch the pending collection to ensure the counter stays accurate).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/aish/shell/runtime/output.py`:
- Around line 19-21: The CSI/OSC regexes are duplicated and inconsistent;
extract the canonical patterns (the ones here: _ANSI_CSI_RE and _ANSI_OSC_RE)
into a single shared constant in a new module (e.g., aish.shell.runtime.ansi or
similar), replace the local definitions in output.py and the patterns in
manager.py with imports from that shared symbol, and update all uses to
reference the shared _ANSI_CSI_RE and _ANSI_OSC_RE constants so both modules
strip ANSI sequences identically.

In `@src/aish/terminal/pty/__init__.py`:
- Line 6: The export list __all__ in __init__ contains unsorted names; sort the
entries alphabetically to improve readability and avoid lint warnings — update
the __all__ assignment so the identifiers "CommandResult", "CommandState",
"CommandSubmission", and "PTYManager" appear in lexicographic order (use the
existing symbol names to locate the list and replace it with the sorted
sequence).

In `@src/aish/terminal/pty/command_state.py`:
- Around line 223-224: The fallback assignment to resolved_command_seq (using
submission.command_seq if not None else command_seq) after calling
_bind_submission_seq may look redundant; add a brief inline comment by the
assignment explaining that _bind_submission_seq will set submission.command_seq
when command_seq is provided but we must preserve any pre-existing
submission.command_seq when command_seq is None so result tracking uses the
original registration sequence (reference _bind_submission_seq,
resolved_command_seq, submission.command_seq, command_seq).
- Around line 121-124: The property pending_submission_count currently builds a
full tuple by calling pending_submissions just to get its length; change it to
avoid allocating that tuple by returning the length of the underlying container
directly (e.g., use len(self._pending_submissions) if the internal attribute is
_pending_submissions) or maintain and return an updated counter
incremented/decremented in the methods that add/remove submissions (refer to
pending_submission_count and pending_submissions and the add/remove methods that
touch the pending collection to ensure the counter stays accurate).

In `@src/aish/terminal/pty/manager.py`:
- Around line 279-304: The defensive getattr(self, "_lock", None) check is
unnecessary because _lock is always initialized; simplify by removing the
None-branch and always using the lock's context manager. Replace the
getattr+if/with duplication around _command_state.register_command,
_write_command_metadata, and send with a single with self._lock: block that
performs register_command(stripped_command,...), _write_command_metadata(...),
and send((command_to_send + "\n").encode()); keep using stripped_command,
source, command_seq, and command_to_send as before.

In `@tests/shell/runtime/test_shell_pty_core.py`:
- Around line 1202-1203: Add a blank line to separate the two test function
definitions so they conform to PEP 8: insert one empty line between the end of
test_shell_does_not_restart_after_explicit_quit_when_flag_was_not_set and the
start of test_backend_error_suppressed_prevents_repeated_hints in
tests/shell/runtime/test_shell_pty_core.py so the two test function definitions
are separated by two newlines in the file.
- Line 362: Replace the non-idiomatic sleep call threading.Event().wait(0.05)
with time.sleep(0.05) in the test (locate the call to
threading.Event().wait(0.05) in test_shell_pty_core.py) and add import time to
the module imports at the top of the file so the test uses the clearer,
idiomatic time.sleep call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2b892ea9-dfbd-4164-a7d1-aba4b5758252

📥 Commits

Reviewing files that changed from the base of the PR and between bf56b02 and e2429a9.

📒 Files selected for processing (10)
  • src/aish/shell/runtime/app.py
  • src/aish/shell/runtime/output.py
  • src/aish/terminal/pty/__init__.py
  • src/aish/terminal/pty/bash_rc_wrapper.sh
  • src/aish/terminal/pty/command_state.py
  • src/aish/terminal/pty/manager.py
  • tests/shell/runtime/test_shell_pty_core.py
  • tests/terminal/pty/test_exit_tracker_repeat_error.py
  • tests/terminal/pty/test_pty_control_protocol.py
  • tests/tools/test_bash_output_offload.py

@F16shen
Copy link
Copy Markdown
Collaborator Author

F16shen commented Apr 21, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Replace the shell input path and metadata file fallback with a dedicated metadata pipe that sends JSON payloads for command submissions.

Track submission_id alongside command_seq in CommandState and PTYManager, drain startup prompt_ready before exposing the shell, surface protocol decode/state issues for diagnostics, and remove the stale output waiting-state plumbing.

Update PTY control, command-state, shell runtime, and regression tests to cover submission_id matching, unmatched event diagnostics, startup readiness draining, and the smaller post-cleanup API surface.

Validation: make test; python -m pytest tests/terminal/pty/test_pty_control_protocol.py tests/terminal/pty/test_exit_tracker_repeat_error.py tests/shell/runtime/test_shell_pty_core.py tests/test_exit_tracker_repeat_error.py
@github-actions github-actions Bot added size: XL and removed size: L labels Apr 22, 2026
Resolve the remaining PR merge conflict in tests/terminal/pty/test_pty_control_protocol.py by keeping the PTY metadata-pipe regression coverage and the newer execute_command timeout semantics from main.

Also bring in main-branch subprocess environment sanitization updates and related tests that merged cleanly.

Validation: python -m pytest tests/terminal/pty/test_pty_control_protocol.py tests/shell/runtime/test_shell_pty_core.py tests/tools/test_bash_output_offload.py tests/tools/test_bash_executor.py tests/shell/ui/test_shell_editor.py -q
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/aish/terminal/pty/manager.py (1)

200-239: ⚠️ Potential issue | 🟠 Major

Buffer startup PTY output instead of discarding it during readiness drain.

_wait_ready() now reads _master_fd before any output callback or OutputProcessor is attached, and those bytes are dropped on the floor. Anything printed by ~/.bashrc, /etc/bash.bashrc, virtualenv activation, or startup warnings will never reach the user. Please stash PTY bytes seen during readiness and replay them on the first real read instead of swallowing them here. As per coding guidelines "src/**/*.py: Focus on terminal, PTY, subprocess, timeout, cancellation, signal handling, state-machine consistency, error propagation, resource cleanup, and whether the changes preserve existing CLI and plan mode semantics."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/terminal/pty/manager.py` around lines 200 - 239, The readiness drain
in _wait_ready is dropping bytes read from _master_fd before any output callback
or OutputProcessor is attached; change it to buffer those bytes into a startup
stash (e.g., self._startup_buffer) instead of discarding them, preserving
control-channel handling via _control_fd and _dispatch_control_chunk as-is; when
the first real reader/output thread starts (e.g., in _output_loop start or on
first read path), replay and flush the buffered bytes into the normal output
handling pipeline (or feed them to the OutputProcessor/callback) under a short
lock to be thread-safe, then clear the stash so subsequent reads are normal.
Ensure the buffer is created and guarded (e.g., threading.Lock) and that control
events continue to trigger session_ready/prompt_ready logic unmodified.
src/aish/shell/runtime/app.py (1)

1751-1778: ⚠️ Potential issue | 🟠 Major

Do not map unresolved backend events onto the active user submission.

resolve_command_seq() returning None currently covers both "startup/no identifiers" and "identifiers were present but did not match anything". In the second case, falling back to self._pending_command_seq will advance the shell state for the wrong command and can clear _pending_command_seq / _pending_command_text while the real user command is still running. Only use that fallback when the event carries neither command_seq nor submission_id.

Suggested guard
         resolved_command_seq = None
+        has_backend_identity = (
+            event.payload.get("command_seq") is not None
+            or event.payload.get("submission_id") is not None
+        )
         if self._pty_manager is not None and event.type in {"command_started", "prompt_ready"}:
             resolved_command_seq = self._pty_manager.command_state.resolve_command_seq(
                 event.payload.get("command_seq"),
                 event.payload.get("submission_id"),
             )
@@
         elif event.type == "command_started":
             command_seq = resolved_command_seq
-            if command_seq is None:
+            if command_seq is None and not has_backend_identity:
                 command_seq = self._pending_command_seq
@@
         elif event.type == "prompt_ready":
             command_seq = resolved_command_seq
-            if command_seq is None:
+            if command_seq is None and not has_backend_identity:
                 command_seq = self._pending_command_seq
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/shell/runtime/app.py` around lines 1751 - 1778, When
resolve_command_seq(...) returns None we must not assume it means "no
identifiers"; change the fallback logic in the command_started and prompt_ready
handling so you only use self._pending_command_seq (and thus allow
advancing/clearing _pending_command_seq/_pending_command_text) when the event
payload contains neither "command_seq" nor "submission_id". Concretely, in the
branches that set command_seq = resolved_command_seq; if command_seq is None
then inspect event.payload.get("command_seq") and
event.payload.get("submission_id") and only assign command_seq =
self._pending_command_seq when both are None; otherwise treat the event as
unmatched and do not map it onto the active submission. Ensure this change is
applied around resolve_command_seq, the command_started / prompt_ready branches,
and any logic that clears _pending_command_seq/_pending_command_text.
src/aish/terminal/pty/bash_rc_wrapper.sh (1)

178-190: ⚠️ Potential issue | 🟠 Major

history -s appends instead of rewriting, creating duplicate history entries.

The function name __aish_rewrite_last_history_entry() suggests intent to replace the executed command in history, but builtin history -s only appends. Since bash normally adds the command to history during execution, this function creates a second entry for every user command. With HISTCONTROL=ignorespace, duplicates are not prevented—that setting only suppresses commands starting with a space. Every user command will appear twice in history.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/terminal/pty/bash_rc_wrapper.sh` around lines 178 - 190, The
function __aish_rewrite_last_history_entry currently appends a second history
entry via builtin history -s; instead compute the last history entry index
(e.g., from the output of builtin history 1 or history | tail -n1), remove that
entry with builtin history -d <index>, then add the corrected command (builtin
history -s "$__AISH_PENDING_COMMAND_TEXT") so the original entry is replaced
rather than duplicated; ensure you only do this when
$__AISH_PENDING_COMMAND_SOURCE == "user" and $__AISH_PENDING_COMMAND_TEXT is
non-empty as the existing guards require.
src/aish/terminal/pty/command_state.py (1)

533-539: ⚠️ Potential issue | 🟠 Major

Placeholders created from backend metadata should stay backend-sourced.

If a backend event arrives with only submission_id and no matching submission exists, this path synthesizes source="user". That lets a backend command participate in user-only behaviors like error correction and last-result semantics, which breaks the user/backend separation this PR is trying to preserve.

Suggested fix
-        source = "backend" if command_seq is not None else "user"
+        if submission_id is not None and command_seq is None:
+            self._record_protocol_issue(
+                f"backend event missing matching submission for submission_id={submission_id}"
+            )
+        source = "backend" if (submission_id is not None or command_seq is not None) else "user"
         return self._store_submission(
             command=(command or self._last_command).strip(),
             source=source,
             command_seq=command_seq,
             submission_id=submission_id,

As per coding guidelines, focus on terminal, PTY, subprocess, timeout, cancellation, signal handling, state-machine consistency, error propagation, resource cleanup, and whether the changes preserve existing CLI and plan mode semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/terminal/pty/command_state.py` around lines 533 - 539, The current
logic sets source="user" when command_seq is None, which incorrectly tags
submissions created from backend metadata (e.g. when only submission_id is
present) as user-sourced; change the source calculation to treat any
backend-supplied identifiers as backend-originated (e.g. set source = "backend"
if command_seq is not None or submission_id is not None, otherwise "user") so
that calls to _store_submission(...) preserve backend sourcing and avoid letting
backend placeholders participate in user-only behaviors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/aish/terminal/pty/command_state.py`:
- Around line 112-134: pending_submissions currently only collects from
_active_submission, _active_started_submission, and _submitted_by_seq, so queued
submissions stored in _submitted_by_id (id-only/unsequenced submissions) are
ignored and can be dropped; update the pending_submissions method to also
iterate over self._submitted_by_id.values() (after handling _submitted_by_seq)
and append any submission whose id is not in the seen set, using the same
deduplication logic (id(submission) and seen set) so id-only queued submissions
are included in the returned tuple and state-machine consistency is preserved.
- Around line 191-211: resolve_command_seq currently returns
_active_started_submission or _active_submission unconditionally, which can
mis-stamp seq when multiple queued submissions exist; change it to consult the
same uniqueness logic as _resolve_submission: try _coerce_command_seq and
_coerce_submission_id as now, but instead of falling back to active_submission
directly, call self._resolve_submission(...) (or replicate its uniqueness check)
to obtain a single unambiguous pending submission and only return its
command_seq when that call returns a concrete submission; if _resolve_submission
indicates ambiguity or None, return None (do not use
_active_started_submission/_active_submission in ambiguous cases). Ensure you
reference resolve_command_seq, _resolve_submission,
_active_started_submission/_active_submission, and _submitted_by_id when
locating code to modify.

---

Outside diff comments:
In `@src/aish/shell/runtime/app.py`:
- Around line 1751-1778: When resolve_command_seq(...) returns None we must not
assume it means "no identifiers"; change the fallback logic in the
command_started and prompt_ready handling so you only use
self._pending_command_seq (and thus allow advancing/clearing
_pending_command_seq/_pending_command_text) when the event payload contains
neither "command_seq" nor "submission_id". Concretely, in the branches that set
command_seq = resolved_command_seq; if command_seq is None then inspect
event.payload.get("command_seq") and event.payload.get("submission_id") and only
assign command_seq = self._pending_command_seq when both are None; otherwise
treat the event as unmatched and do not map it onto the active submission.
Ensure this change is applied around resolve_command_seq, the command_started /
prompt_ready branches, and any logic that clears
_pending_command_seq/_pending_command_text.

In `@src/aish/terminal/pty/bash_rc_wrapper.sh`:
- Around line 178-190: The function __aish_rewrite_last_history_entry currently
appends a second history entry via builtin history -s; instead compute the last
history entry index (e.g., from the output of builtin history 1 or history |
tail -n1), remove that entry with builtin history -d <index>, then add the
corrected command (builtin history -s "$__AISH_PENDING_COMMAND_TEXT") so the
original entry is replaced rather than duplicated; ensure you only do this when
$__AISH_PENDING_COMMAND_SOURCE == "user" and $__AISH_PENDING_COMMAND_TEXT is
non-empty as the existing guards require.

In `@src/aish/terminal/pty/command_state.py`:
- Around line 533-539: The current logic sets source="user" when command_seq is
None, which incorrectly tags submissions created from backend metadata (e.g.
when only submission_id is present) as user-sourced; change the source
calculation to treat any backend-supplied identifiers as backend-originated
(e.g. set source = "backend" if command_seq is not None or submission_id is not
None, otherwise "user") so that calls to _store_submission(...) preserve backend
sourcing and avoid letting backend placeholders participate in user-only
behaviors.

In `@src/aish/terminal/pty/manager.py`:
- Around line 200-239: The readiness drain in _wait_ready is dropping bytes read
from _master_fd before any output callback or OutputProcessor is attached;
change it to buffer those bytes into a startup stash (e.g.,
self._startup_buffer) instead of discarding them, preserving control-channel
handling via _control_fd and _dispatch_control_chunk as-is; when the first real
reader/output thread starts (e.g., in _output_loop start or on first read path),
replay and flush the buffered bytes into the normal output handling pipeline (or
feed them to the OutputProcessor/callback) under a short lock to be thread-safe,
then clear the stash so subsequent reads are normal. Ensure the buffer is
created and guarded (e.g., threading.Lock) and that control events continue to
trigger session_ready/prompt_ready logic unmodified.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: cf56eba6-0d23-4983-859f-1574606c7ff3

📥 Commits

Reviewing files that changed from the base of the PR and between e2429a9 and e75b5d3.

📒 Files selected for processing (9)
  • src/aish/shell/runtime/app.py
  • src/aish/shell/runtime/output.py
  • src/aish/terminal/pty/bash_rc_wrapper.sh
  • src/aish/terminal/pty/command_state.py
  • src/aish/terminal/pty/manager.py
  • tests/shell/runtime/test_shell_pty_core.py
  • tests/terminal/pty/test_exit_tracker_repeat_error.py
  • tests/terminal/pty/test_pty_control_protocol.py
  • tests/test_exit_tracker_repeat_error.py

Comment thread src/aish/terminal/pty/command_state.py
Comment thread src/aish/terminal/pty/command_state.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/aish/terminal/pty/manager.py (1)

200-239: ⚠️ Potential issue | 🟠 Major

Do not discard PTY startup bytes while waiting for readiness.

_wait_ready() now reads from self._master_fd and throws those bytes away before the caller ever sees the PTY. That drops anything emitted by shell init files or the initial prompt in poll mode, so start() changes observable shell behavior even though readiness is already determined from control events.

Suggested direction
-            read_fds = [self._master_fd]
-            if self._control_fd is not None:
-                read_fds.append(self._control_fd)
+            if self._control_fd is None:
+                break
+            read_fds = [self._control_fd]
...
-                if not data:
-                    continue
-                if fd == self._control_fd:
+                if not data:
+                    continue
+                if fd == self._control_fd:
                     events = self._dispatch_control_chunk(data)
                     for event in events:
                         ...

As per coding guidelines, src/**/*.py: Focus on terminal, PTY, subprocess, timeout, cancellation, signal handling, state-machine consistency, error propagation, resource cleanup, and whether the changes preserve existing CLI and plan mode semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aish/terminal/pty/manager.py` around lines 200 - 239, The _wait_ready
method currently reads from self._master_fd and throws those bytes away; change
it to buffer, not discard, startup bytes so callers still see shell init output:
when reading data from fd inside _wait_ready (the code using os.read on
self._master_fd), append the bytes into an internal buffer (e.g.
self._startup_buffer or self._pending_output) instead of dropping them; keep
dispatching control events via _dispatch_control_chunk for control fd reads as
before; ensure the output consumer paths (the _output_loop and any public read
methods) first drain that startup buffer before reading directly from
self._master_fd so semantics are preserved. Ensure the new buffer is thread-safe
(use existing locks or add one) and cleared once consumed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/aish/terminal/pty/manager.py`:
- Around line 200-239: The _wait_ready method currently reads from
self._master_fd and throws those bytes away; change it to buffer, not discard,
startup bytes so callers still see shell init output: when reading data from fd
inside _wait_ready (the code using os.read on self._master_fd), append the bytes
into an internal buffer (e.g. self._startup_buffer or self._pending_output)
instead of dropping them; keep dispatching control events via
_dispatch_control_chunk for control fd reads as before; ensure the output
consumer paths (the _output_loop and any public read methods) first drain that
startup buffer before reading directly from self._master_fd so semantics are
preserved. Ensure the new buffer is thread-safe (use existing locks or add one)
and cleared once consumed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 4ed8075f-d162-4f3d-8e5c-0fa83ad35023

📥 Commits

Reviewing files that changed from the base of the PR and between e75b5d3 and 144b7d9.

📒 Files selected for processing (2)
  • src/aish/terminal/pty/manager.py
  • tests/terminal/pty/test_pty_control_protocol.py

Comment thread src/aish/terminal/pty/manager.py
Handle leading blank lines when stripping echoed PTY commands so execute_command output remains clean across CI and shell startup variations.

Make bash history rewrite idempotent for normal user commands and preserve prior history entries for history expansion commands like exit.

Validation: python -m pytest tests/terminal/pty/test_pty_control_protocol.py tests/shell/runtime/test_shell_pty_core.py -q; make test
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/aish/terminal/pty/bash_rc_wrapper.sh`:
- Line 85: The read invocation uses an unquoted file-descriptor variable; update
the line using IFS read -r -t 0.001 metadata_line
<&"${__AISH_COMMAND_METADATA_FD}" || return 1 so the FD variable
__AISH_COMMAND_METADATA_FD is quoted to prevent word splitting—modify the
existing IFS read line accordingly (keep the same flags and return behavior).

In `@src/aish/terminal/pty/manager.py`:
- Around line 304-331: The getattr(self, "_lock", None) fallback creates an
unguarded branch; remove the defensive None-check and always use the instance
RLock (self._lock) around the critical section. Consolidate the duplicate logic
so you call self._command_state.register_command(...),
self._write_command_metadata(...), and self.send(...) inside a single with
self._lock: block (or, if the fallback is intentionally required for partial
construction/testing, replace the fallback with an explicit comment explaining
that edge case and why it must skip locking). Ensure references remain to
self._lock, _command_state.register_command, _write_command_metadata, and send
so the behavior and intent are clear.

In `@tests/terminal/pty/test_pty_control_protocol.py`:
- Around line 13-36: The helper _wait_for_prompt_ready should not silently
return partial output on timeout; modify it to either return a success indicator
or raise on timeout—prefer raising a TimeoutError when the expected prompt_ready
with matching command_seq was not seen. Update _wait_for_prompt_ready to track
saw_prompt and, if the loop exits without saw_prompt true, raise TimeoutError
including the command_seq and any collected output to make failures diagnostic;
keep existing logic that calls manager.decode_control_events and
manager.handle_backend_event so existing behavior and manager.last_command
checks remain valid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 6886662a-8d68-41a8-82fa-ea1b329fbe37

📥 Commits

Reviewing files that changed from the base of the PR and between 144b7d9 and c9d8e81.

📒 Files selected for processing (3)
  • src/aish/terminal/pty/bash_rc_wrapper.sh
  • src/aish/terminal/pty/manager.py
  • tests/terminal/pty/test_pty_control_protocol.py

Comment thread src/aish/terminal/pty/bash_rc_wrapper.sh Outdated
Comment thread src/aish/terminal/pty/manager.py
Comment thread tests/terminal/pty/test_pty_control_protocol.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/aish/terminal/pty/command_state.py`:
- Around line 379-396: The problem is that _resolve_submission and
_bind_submission_seq silently accept a submission when submission_id and
command_seq disagree, allowing events to be applied to the wrong
CommandSubmission; change behavior so a mismatch is treated as protocol
corruption: in _resolve_submission (and callers that pass both submission_id and
command_seq) verify that if both identifiers are present they map to the same
CommandSubmission and if they disagree return None / reject the event (and log
an error/metric) instead of falling back; in _bind_submission_seq do not
steal/rewire a seq onto a submission when the submission already has a different
command_seq—detect the conflict, log and refuse to rebind, and ensure callers
handle the None/invalid result to drop the backend event. Ensure checks
reference _resolve_submission, _bind_submission_seq,
CommandSubmission.command_seq and the _submitted_by_seq map.

In `@tests/terminal/pty/test_pty_control_protocol.py`:
- Around line 189-193: The test currently asserts on PTY output immediately
after _wait_for_prompt_ready(manager, 11) but that only confirms the control
event; drain the PTY master FD first to ensure the printf bytes have been read.
After calling _wait_for_prompt_ready(manager, 11), read and drain
manager._master_fd (e.g., perform a nonblocking read loop or use existing helper
that consumes master FD until empty) and append that data to the variable used
for output before performing assert b"ready" in output and checking
manager.last_command; update references in the test around
_wait_for_prompt_ready and manager to use the drained data for the assertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: bc9a1943-2901-44a2-841f-c2f20045027b

📥 Commits

Reviewing files that changed from the base of the PR and between c9d8e81 and bf0eb2a.

📒 Files selected for processing (7)
  • src/aish/shell/runtime/app.py
  • src/aish/terminal/pty/bash_rc_wrapper.sh
  • src/aish/terminal/pty/command_state.py
  • src/aish/terminal/pty/manager.py
  • tests/shell/runtime/test_shell_pty_core.py
  • tests/terminal/pty/test_exit_tracker_repeat_error.py
  • tests/terminal/pty/test_pty_control_protocol.py
✅ Files skipped from review due to trivial changes (1)
  • src/aish/terminal/pty/manager.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/terminal/pty/test_exit_tracker_repeat_error.py
  • src/aish/shell/runtime/app.py
  • tests/shell/runtime/test_shell_pty_core.py
  • src/aish/terminal/pty/bash_rc_wrapper.sh

Comment thread src/aish/terminal/pty/command_state.py
Comment thread tests/terminal/pty/test_pty_control_protocol.py
@F16shen F16shen merged commit 16e38df into AI-Shell-Team:main Apr 22, 2026
10 checks passed
@F16shen F16shen deleted the split/pty-command-metadata-leak branch April 22, 2026 06:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant