fix: final answer rendering and timing leaks by F16shen · Pull Request #129 · AI-Shell-Team/aish

F16shen · 2026-04-23T03:29:21Z

Summary

Problem: tool-call preview content and nested diagnose events could leak into the shell's final-answer path, causing summaries to render as grey streamed text and printing misleading 思考: Xs ("Thinking: Xs") lines.
Changes: only mark shell content as streamed for final content deltas, only record TTFT on final content, and restrict SystemDiagnoseAgent event forwarding to tool/error/cancel/interaction events.
Related Issue: #
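
The first two changes can be illustrated with a minimal sketch. It is not the code from `src/aish/shell/runtime/app.py`: the `ContentDeltaEvent` and `ShellTimingState` classes and their attribute names are hypothetical stand-ins, and treating a missing `is_final` key as non-final is an assumption. Only the idea of gating on `event.data["is_final"]` comes from this PR.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentDeltaEvent:
    """Hypothetical stand-in for the shell's content-delta event."""
    data: dict = field(default_factory=dict)


class ShellTimingState:
    """Hypothetical holder for the two pieces of state the fix touches."""

    def __init__(self) -> None:
        self.content_streamed_to_terminal = False
        self.ttft_seconds: Optional[float] = None
        self._started_at = time.monotonic()

    def handle_content_delta(self, event: ContentDeltaEvent) -> None:
        # Preview content (for example tool-call previews) is non-final.
        # Assumption: a missing "is_final" key is treated as non-final.
        if not event.data.get("is_final", False):
            return
        # Only final content counts as "streamed to the terminal".
        self.content_streamed_to_terminal = True
        # Record time-to-first-token once, on the first final delta.
        if self.ttft_seconds is None:
            self.ttft_seconds = time.monotonic() - self._started_at
```

With this gating, a turn that only emits preview deltas leaves both fields untouched, so there is no TTFT value to render at the end of the operation.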

Change Type

Scope

User-visible Changes

Final answers after tool-call flows no longer get treated as grey streamed preview text.
Misleading 思考: Xs ("Thinking: Xs") summaries no longer appear when a turn only emitted non-final preview content.
system_diagnose_agent no longer leaks nested lifecycle/content events into the outer shell.
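
The event-forwarding restriction can be sketched as an allowlist proxy around the parent callback. This is an illustration under assumptions, not the code in `src/aish/llm/agents.py`: the enum members, the `FORWARDED_EVENTS` set, and `make_event_proxy` are hypothetical names; only `LLMEventType`, `LLMCallbackResult.CONTINUE`, and the tool/error/cancel/interaction categories are taken from this PR.

```python
from enum import Enum, auto
from typing import Any, Callable


class LLMEventType(Enum):
    # Hypothetical members; the real enum in aish may differ.
    OP_START = auto()
    CONTENT_DELTA = auto()
    OP_END = auto()
    TOOL_EXECUTION_START = auto()
    TOOL_EXECUTION_END = auto()
    ERROR = auto()
    CANCELLED = auto()
    INTERACTION_REQUIRED = auto()


class LLMCallbackResult(Enum):
    CONTINUE = auto()
    CANCEL = auto()  # hypothetical second value, used only in this sketch


# Assumed allowlist: tool, error, cancel, and interaction events only.
FORWARDED_EVENTS = frozenset({
    LLMEventType.TOOL_EXECUTION_START,
    LLMEventType.TOOL_EXECUTION_END,
    LLMEventType.ERROR,
    LLMEventType.CANCELLED,
    LLMEventType.INTERACTION_REQUIRED,
})

Callback = Callable[[LLMEventType, Any], LLMCallbackResult]


def make_event_proxy(parent: Callback) -> Callback:
    """Wrap a parent callback so nested lifecycle/content events are dropped."""

    def proxy(event_type: LLMEventType, payload: Any) -> LLMCallbackResult:
        if event_type not in FORWARDED_EVENTS:
            # Swallow the event without invoking the parent.
            return LLMCallbackResult.CONTINUE
        return parent(event_type, payload)

    return proxy
```

An allowlist rather than a blocklist means that any event type added later stays hidden from the outer shell until it is forwarded on purpose.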

Compatibility

Backward compatible? Yes
Config changes? No

Testing

/home/lixin/workspace/aishell/aish/.venv/bin/python -m pytest tests/shell/runtime/test_shell_pty_core.py -k "content_streamed or ttft_timing_records_on_first_content_delta or ttft_timing_preserves_state_across_generations"
/home/lixin/workspace/aishell/aish/.venv/bin/python -m pytest tests/tools/test_final_answer.py -k "filters_nested_content_and_lifecycle_events or mocked_llm_with_final_answer_after_bash or end_to_end_nested_agent_execution"
/home/lixin/workspace/aishell/aish/.venv/bin/python -m pytest tests/shell/runtime/test_shell_pty_core.py -k "ttft_timing_records_on_first_content_delta or non_final_content_delta_does_not_record_ttft or op_end_does_not_render_ttft_for_non_final_preview_only or final_content_delta_marks_content_streamed"

Checklist

Code follows project style
Tests added if needed
Documentation updated if needed

Summary by CodeRabbit

Performance Improvements
- TTFT now records only on final streamed content for more accurate responsiveness metrics.
Reliability Improvements
- Shell startup uses reported readiness and can inherit backend working directory to avoid unnecessary delays.
- Event forwarding now strictly filters lifecycle events to reduce spurious callbacks.
Robustness
- Improved PTY protocol decoding and error tracking for more resilient terminal sessions.
Tests
- Expanded tests validating startup handshake reuse, streaming/TTFT behavior, and event filtering.

github-actions · 2026-04-23T03:29:33Z

Thanks for the pull request. A maintainer will review it when available.

Please keep the PR focused, explain the why in the description, and make sure local checks pass before requesting review.

Contribution guide: https://github.com/AI-Shell-Team/aish/blob/main/CONTRIBUTING.md

coderabbitai · 2026-04-23T03:29:34Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 70d84ab4-49db-4a09-879d-7769bce8a106

📥 Commits

Reviewing files that changed from the base of the PR and between dc21d3b and 1483815.

📒 Files selected for processing (2)

src/aish/shell/runtime/app.py
tests/shell/runtime/test_shell_pty_core.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/shell/runtime/test_shell_pty_core.py

📝 Walkthrough

Walkthrough

The PR adds PTY startup handshake state exposure in PTYManager, makes PTYAIShell consume those signals (avoiding unconditional sleeps), restricts parent callback forwarding via an LLM event allowlist, and changes TTFT/streaming to record only on final content deltas.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Event Proxy Filtering `src/aish/llm/agents.py` | Adds an explicit allowlist of child `LLMEventType` values forwarded to a parent callback; non-allowlisted events return `LLMCallbackResult.CONTINUE` without invoking the parent. |
| Content Delta & PTY-driven TTFT `src/aish/shell/runtime/app.py` | `handle_content_delta` reads `event.data["is_final"]` to set `_content_streamed_to_terminal` and record TTFT only on final deltas; `_setup_pty` prefers PTY-provided `startup_*` signals and `startup_cwd` over fixed sleep fallback. |
| PTY Startup Handshake Tracking `src/aish/terminal/pty/manager.py` | Introduces properties `startup_session_ready`, `startup_prompt_ready`, `startup_ready`, `startup_cwd`; `_wait_ready` initializes from stored values, decodes control events (capturing protocol issues), stores ps2/cwd, and persists observed handshake state. |
| Tests: handshake, timing, event filtering `tests/shell/runtime/test_shell_pty_core.py`, `tests/terminal/pty/test_pty_control_protocol.py`, `tests/tools/test_final_answer.py` | Adds tests for PTY startup reuse and cwd propagation, final vs non-final delta TTFT behavior, draining/poll-mode readiness assertions, and verifies parent callback only receives allowlisted LLM events from nested sessions. |
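
The handshake tracking described for `PTYManager` can be sketched as a small state object. The property names match those listed above; everything else is an assumption made for illustration: the `StartupHandshakeState` class, the dict-shaped control events with a `"type"` key, and the rule that `startup_ready` requires both session and prompt readiness.

```python
from typing import Optional


class StartupHandshakeState:
    """Hypothetical stand-in for the startup state PTYManager exposes."""

    def __init__(self) -> None:
        self._session_ready = False
        self._prompt_ready = False
        self._cwd: Optional[str] = None
        # Undecodable or unknown control events are noted, not raised.
        self.protocol_issues = []

    @property
    def startup_session_ready(self) -> bool:
        return self._session_ready

    @property
    def startup_prompt_ready(self) -> bool:
        return self._prompt_ready

    @property
    def startup_ready(self) -> bool:
        # Assumption: "ready" means both halves of the handshake were seen.
        return self._session_ready and self._prompt_ready

    @property
    def startup_cwd(self) -> Optional[str]:
        return self._cwd

    def record_event(self, event: dict) -> None:
        """Persist what one decoded control event tells us (assumed schema)."""
        kind = event.get("type")
        if kind == "session_ready":
            self._session_ready = True
        elif kind == "prompt_ready":
            self._prompt_ready = True
        elif kind == "cwd":
            self._cwd = event.get("path")
        else:
            self.protocol_issues.append(f"unknown control event: {kind!r}")
```

Keeping the observed state on the manager is what lets a later consumer read it after `start()` returns instead of waiting again.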

Sequence Diagram

sequenceDiagram
    participant Shell as PTYAIShell
    participant PTYMgr as PTYManager
    participant Proto as PTY Handshake Protocol

    Shell->>PTYMgr: start()
    activate PTYMgr
    PTYMgr->>PTYMgr: reset startup flags
    PTYMgr->>Proto: initialize PTY
    loop handshake polling
        Proto-->>PTYMgr: control events (session_ready, prompt_ready, cwd, ps2)
        PTYMgr->>PTYMgr: decode events, record readiness, cwd, ps2, protocol issues
    end
    PTYMgr-->>Shell: start() returns with startup_* and startup_cwd set
    deactivate PTYMgr

    Shell->>Shell: _setup_pty()
    Shell->>PTYMgr: query startup_ready, startup_session_ready, startup_cwd
    alt startup signals present
        Shell->>Shell: set backend cwd, _backend_session_ready=True, shell_phase="editing", skip sleep
    else fallback
        Shell->>Shell: perform legacy sleep delay
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Reuse PTY startup handshake during shell startup #128: Modifies PTYManager startup readiness exposure and PTYAIShell._setup_pty to reuse handshake signals (strong code-level overlap).

Poem

🐰
Handshake hummed, the prompt awakes,
I filter hops and gentle takes,
Deltas final, TTFT sings—
No idle sleeps, the shell now springs,
A tiny rabbit thumbs new things.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title "fix: final answer rendering and timing leaks" directly addresses the main problem solved by this PR: preventing tool-call preview content and nested events from leaking into final-answer rendering and TTFT timing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/aish/shell/runtime/app.py`:
- Around line 1130-1136: The branch only sets _backend_session_ready when
_pty_manager.startup_ready is true, which misses the case where
PTYManager.start() consumed session_ready but didn't set startup_ready; change
the condition to set self._backend_session_ready = True if either
self._pty_manager.startup_ready or self._pty_manager.session_ready is truthy
(i.e., check both flags/properties), so replace the existing if-block that sets
_backend_session_ready and _shell_phase (or set _backend_session_ready
independently) to preserve session readiness even when startup_ready is false.
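
The suggested fix can be sketched as follows. This is a hedged illustration, not the code at the cited lines of `app.py`: `FakePTYManager`, `ShellStartupState`, `apply_startup_signals`, and the `"booting"` phase are hypothetical, and the session flag is read from `startup_session_ready` (the property named in the walkthrough) rather than the `session_ready` name used in the comment above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakePTYManager:
    """Hypothetical stand-in exposing the startup_* properties."""
    startup_ready: bool = False
    startup_session_ready: bool = False
    startup_cwd: Optional[str] = None


@dataclass
class ShellStartupState:
    """Hypothetical subset of the shell state touched by _setup_pty."""
    backend_session_ready: bool = False
    shell_phase: str = "booting"  # placeholder initial phase
    backend_cwd: Optional[str] = None
    used_sleep_fallback: bool = False


def apply_startup_signals(
    pty: FakePTYManager,
    state: ShellStartupState,
    legacy_sleep: Callable[[], None] = lambda: None,
) -> ShellStartupState:
    if pty.startup_cwd:
        state.backend_cwd = pty.startup_cwd
    # Session readiness is preserved independently of full readiness,
    # so a consumed session_ready event is not lost.
    if pty.startup_ready or pty.startup_session_ready:
        state.backend_session_ready = True
    if pty.startup_ready:
        state.shell_phase = "editing"
    else:
        # Assumed fallback: keep the legacy delay when not fully ready.
        legacy_sleep()
        state.used_sleep_fallback = True
    return state
```

The case the review targets is the first one a caller would check: `startup_session_ready=True` with `startup_ready=False` should still leave `backend_session_ready` set.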

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a4869219-4a1a-47dc-9cec-8924aba13564

📥 Commits

Reviewing files that changed from the base of the PR and between e53da97 and dc21d3b.

📒 Files selected for processing (6)

src/aish/llm/agents.py
src/aish/shell/runtime/app.py
src/aish/terminal/pty/manager.py
tests/shell/runtime/test_shell_pty_core.py
tests/terminal/pty/test_pty_control_protocol.py
tests/tools/test_final_answer.py

F16shen added 4 commits April 23, 2026 10:14

Reuse PTY startup handshake

df1416c

Fix PR review issues

26f45dc

Harden startup handshake test

7dcefa4

fix final answer rendering and timing leaks

dc21d3b

github-actions Bot added the tests label Apr 23, 2026

github-actions Bot added size: M experienced-contributor labels Apr 23, 2026

F16shen changed the title ~~Fix final answer rendering and timing leaks~~ fix: final answer rendering and timing leaks Apr 23, 2026

coderabbitai Bot reviewed Apr 23, 2026

Comment thread src/aish/shell/runtime/app.py

F16shen added 2 commits April 23, 2026 13:13

fix: preserve startup session readiness

1483815

merge main into fix/final-answer-rendering-ttft

fa997be

F16shen merged commit 77638d3 into AI-Shell-Team:main Apr 23, 2026
10 checks passed

F16shen deleted the fix/final-answer-rendering-ttft branch April 23, 2026 06:27

coderabbitai Bot mentioned this pull request Apr 23, 2026

fix: close rust parity gaps for update streaming and timeouts #133

Merged

F16shen merged 6 commits into AI-Shell-Team:main from F16shen:fix/final-answer-rendering-ttft
