Event driven architecture #115
|
@patricka3125 feel free to have a look at this PR too. |
|
@tuanknguyen is this document still correct: https://github.com/awslabs/cli-agent-orchestrator/tree/main/examples/assign ? The change won't impact the user experience, right? Also, would you mind fixing the unit test errors? |
|
@haofeif that's correct, this does not affect the assign, handoff, or send message patterns at all. All unit tests also passed. Separately, we need to consolidate the provider implementation: there's quite a bit of code duplication across the different providers. |
terminal_id = terminal_id_from_topic(event["topic"])
log_path = TERMINAL_LOG_DIR / f"{terminal_id}.log"
with open(log_path, "a") as f:
    f.write(event["data"]["data"])
could consider using asyncio.to_thread here to minimize blocking in event loop
This is a sync file write inside an async loop, so every output event will block the event loop briefly.
good catch! I'll update it.
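For reference, the `asyncio.to_thread` suggestion above could look roughly like the sketch below. The helper names (`append_text`, `handle_output_event`) and the temporary log directory are assumptions made for illustration; they are not taken from the PR.

```python
import asyncio
import tempfile
from pathlib import Path


def append_text(log_path: Path, text: str) -> None:
    """Blocking append; intended to run in a worker thread."""
    with open(log_path, "a") as f:
        f.write(text)


async def handle_output_event(log_dir: Path, terminal_id: str, data: str) -> None:
    # Offload the blocking write so the event loop stays free for other tasks.
    await asyncio.to_thread(append_text, log_dir / f"{terminal_id}.log", data)


async def demo() -> str:
    # Write two chunks for one hypothetical terminal and read the log back.
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        await handle_output_event(log_dir, "abc123", "hello ")
        await handle_output_event(log_dir, "abc123", "world\n")
        return (log_dir / "abc123.log").read_text()
```

Because each write is awaited before the next one starts, chunks for a terminal still land in order; `asyncio.to_thread` requires Python 3.9 or later.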
|
|
Returns:
    bool: True if a message was sent, False otherwise
def deliver_pending(self, terminal_id: str) -> None:
Would we ever consider delivering all messages at once here? Or perhaps add some way for the user to specify the number of messages as a flag?
sure thing. I'll add number of messages as an optional param default to 1
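One way the optional count discussed above might be shaped is sketched here. `InboxSketch`, its in-memory queue, the `max_messages` name, and the integer return value are all assumptions for illustration; the PR's `deliver_pending` returns `None` and is backed by the project's own inbox storage.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class InboxSketch:
    """Hypothetical in-memory inbox; not the project's inbox_service."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send
        self._pending: Dict[str, Deque[str]] = defaultdict(deque)

    def enqueue(self, terminal_id: str, message: str) -> None:
        self._pending[terminal_id].append(message)

    def deliver_pending(self, terminal_id: str, max_messages: Optional[int] = 1) -> int:
        """Deliver up to max_messages queued messages, oldest first.

        max_messages=None delivers everything that is queued.
        Returns the number of messages actually delivered.
        """
        if max_messages is not None and max_messages < 1:
            raise ValueError("max_messages must be >= 1 or None")
        queue = self._pending[terminal_id]
        delivered = 0
        while queue and (max_messages is None or delivered < max_messages):
            self._send(terminal_id, queue.popleft())
            delivered += 1
        return delivered
```

Defaulting to 1 keeps the current one-at-a-time behavior, while `None` covers the "deliver everything at once" case raised in the question.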
|
@tuanknguyen thanks for that. Also, it looks like the current change breaks the handoff function. In my test using
Let me retry the handoff to the Code Reviewer Agent.
Running tool handoff with the param (from mcp server: cao-mcp-server)
⋮ {
⋮ "agent_profile": "reviewer",
⋮ "message": "You are the Code Reviewer Agent. Please read the review task description at the following absolute path and perform a code review:\n\n/Users/haofeif/Amazon-WorkDocs/Code/AIMLAoD/AIAgents/OpenshiftToEKS/cli-agent-orchestrator/cli-agent-orchestrator/tasks/hello-world-review.md\n\nRead the code file referenced in the task and provide your review. End with a clear verdict: APPROVED or NEEDS CHANGES (with specific feedback)."
⋮ }
When testing the assign examples, similar issues seem to hang the handoff session. |
|
@tuanknguyen I also ran the E2E tests for 3 providers. Let's test Kiro and Claude Code on your end and see how it goes.
E2E Test Results
| Provider | main (baseline) | feat/event-driven-messaging | Regression |
|---|---|---|---|
| Kiro CLI | 8/8 ✅ | 0/8 ❌ | Yes: COMPLETED never detected |
| Claude Code | 8/8 ✅ | 0/8 ❌ | Yes: IDLE never detected during init |
| Kimi CLI | 8/8 ✅ | 6-7/8 | Partial: 1 consistent fail, 1 flaky |
Test Location
All E2E tests are in test/e2e/:
| Test File | Test Classes | What It Tests |
|---|---|---|
| test_assign.py | TestKiroCliAssign, TestClaudeCodeAssign, TestKimiCliAssign | Create worker terminal, send task, verify COMPLETED, extract output |
| test_handoff.py | TestKiroCliHandoff, TestClaudeCodeHandoff, TestKimiCliHandoff | Create terminal, send task, poll for COMPLETED, extract response |
| test_send_message.py | TestKiroCliSendMessage, TestClaudeCodeSendMessage, TestKimiCliSendMessage | Create 2 terminals, send inbox message, verify delivery |
| test_supervisor_orchestration.py | TestKiroCliSupervisorOrchestration, TestClaudeCodeSupervisorOrchestration, TestKimiCliSupervisorOrchestration | Supervisor delegates via handoff/assign MCP tools |
Failure Details
Kiro CLI — 0/8 (all failed)
All 8 tests fail at wait_for_status(terminal_id, "completed"). Terminal creation and init succeed (IDLE detected), message is sent successfully, but COMPLETED is never detected after the agent finishes.
| Test | Error |
|---|---|
| test_assign.py::TestKiroCliAssign::test_assign_data_analyst | Worker did not reach COMPLETED within 180s (provider=kiro_cli) |
| test_assign.py::TestKiroCliAssign::test_assign_report_generator | Worker did not reach COMPLETED within 180s (provider=kiro_cli) |
| test_assign.py::TestKiroCliAssign::test_assign_with_callback | Worker did not reach COMPLETED within 180s (provider=kiro_cli) |
| test_handoff.py::TestKiroCliHandoff::test_handoff_simple_function | Terminal did not reach COMPLETED within 180s (provider=kiro_cli) |
| test_handoff.py::TestKiroCliHandoff::test_handoff_second_task | Terminal did not reach COMPLETED within 180s (provider=kiro_cli) |
| test_send_message.py::TestKiroCliSendMessage::test_send_message_to_inbox | Receiver should have transitioned from IDLE after inbox delivery within 60s, got: idle |
| test_supervisor_orchestration.py::TestKiroCliSupervisorOrchestration::test_supervisor_handoff | Supervisor did not reach COMPLETED within 300s. Last status: idle |
| test_supervisor_orchestration.py::TestKiroCliSupervisorOrchestration::test_supervisor_assign_and_handoff | Supervisor did not reach COMPLETED within 300s. Last status: idle |
Possible root cause: StatusMonitor returns idle instead of completed. Kiro CLI's get_status() needs both a green arrow pattern (the response) and an idle prompt after it to return COMPLETED. As the response grows, the 8KB rolling buffer drops the green arrow, leaving only the idle prompt, so get_status() returns IDLE.
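The truncation effect described above can be reproduced with a small sketch. The marker strings, the function names, and the character-based (rather than byte-based) window are assumptions for illustration; they stand in for Kiro CLI's real patterns and the PR's actual buffer.

```python
BUFFER_LIMIT = 8 * 1024  # 8KB window, as described above

RESPONSE_MARKER = "<ARROW>"  # placeholder for the green arrow pattern (assumption)
IDLE_PROMPT = "<PROMPT>"     # placeholder for the idle prompt pattern (assumption)


def append_rolling(buffer: str, chunk: str, limit: int = BUFFER_LIMIT) -> str:
    """Append a chunk and keep only the last `limit` characters."""
    return (buffer + chunk)[-limit:]


def detect_status(buffer: str) -> str:
    """COMPLETED needs a response marker followed by an idle prompt."""
    if not buffer.rstrip().endswith(IDLE_PROMPT):
        return "processing"
    # The prompt is at the end, so any marker found sits before it.
    return "completed" if RESPONSE_MARKER in buffer else "idle"


def simulate(chunks) -> str:
    """Feed chunks through the rolling buffer and report the final status."""
    buffer = ""
    for chunk in chunks:
        buffer = append_rolling(buffer, chunk)
    return detect_status(buffer)


short_run = simulate([RESPONSE_MARKER, "ok\n", IDLE_PROMPT])
long_run = simulate([RESPONSE_MARKER, "x" * 10_000, "\n", IDLE_PROMPT])
```

With a short response the marker is still inside the window and the status is `completed`; with a 10,000-character response the marker has been pushed out, so only the prompt remains and the status falls back to `idle`.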
Claude Code — 0/8 (all failed)
All 8 tests fail at create_terminal() — the terminal never initializes.
| Test | Error |
|---|---|
| test_assign.py::TestClaudeCodeAssign::test_assign_data_analyst | Claude Code initialization timed out after 30 seconds |
| test_assign.py::TestClaudeCodeAssign::test_assign_report_generator | Claude Code initialization timed out after 30 seconds |
| test_assign.py::TestClaudeCodeAssign::test_assign_with_callback | Claude Code initialization timed out after 30 seconds |
| test_handoff.py::TestClaudeCodeHandoff::test_handoff_simple_function | Claude Code initialization timed out after 30 seconds |
| test_handoff.py::TestClaudeCodeHandoff::test_handoff_second_task | Claude Code initialization timed out after 30 seconds |
| test_send_message.py::TestClaudeCodeSendMessage::test_send_message_to_inbox | Claude Code initialization timed out after 30 seconds |
| test_supervisor_orchestration.py::TestClaudeCodeSupervisorOrchestration::test_supervisor_handoff | Claude Code initialization timed out after 30 seconds |
| test_supervisor_orchestration.py::TestClaudeCodeSupervisorOrchestration::test_supervisor_assign_and_handoff | Claude Code initialization timed out after 30 seconds |
Root cause: StatusMonitor never detects IDLE during provider.initialize(). The FIFO → EventBus → StatusMonitor pipeline doesn't deliver output fast enough (or at all) for Claude Code's get_status() to match the IDLE prompt pattern within the 30s timeout.
Possible Architectural Root Cause
On main, get_terminal() calls provider.get_status() which reads fresh tmux scrollback (tmux capture-pane) on every poll — full history available.
On feat/event-driven-messaging, get_terminal() calls status_monitor.get_status() which returns a cached status derived from an 8KB rolling buffer fed by the FIFO pipeline.
This causes two problems:
- Buffer truncation: Long agent responses push early patterns (e.g., Kiro CLI's green arrow) out of the 8KB window, breaking COMPLETED detection
- Pipeline timing: The FIFO → EventBus → StatusMonitor pipeline may not deliver output fast enough during provider initialization, breaking IDLE detection for Claude Code
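To make the pipeline shape concrete, here is a minimal sketch of an in-process publish/subscribe loop feeding a cached status. It is not the PR's `event_bus.py` or `status_monitor.py`: the class and function names, the topic string, the flat `{"topic", "data"}` event shape, the `<PROMPT>` marker, and the `None` sentinel are all assumptions for illustration.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List, Optional


class EventBusSketch:
    """Hypothetical in-process pub/sub; each subscriber gets its own queue."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    def publish(self, topic: str, data: Optional[str]) -> None:
        for queue in self._subscribers[topic]:
            queue.put_nowait({"topic": topic, "data": data})


async def run_status_monitor(queue: asyncio.Queue, cache: Dict[str, str],
                             idle_prompt: str = "<PROMPT>") -> None:
    """Consume output events and cache one status per topic."""
    buffers: Dict[str, str] = defaultdict(str)
    while True:
        event = await queue.get()
        if event["data"] is None:  # sentinel used by this demo to stop the consumer
            return
        topic = event["topic"]
        buffers[topic] += event["data"]
        at_prompt = buffers[topic].rstrip().endswith(idle_prompt)
        cache[topic] = "idle" if at_prompt else "processing"


async def demo() -> Dict[str, str]:
    bus = EventBusSketch()
    cache: Dict[str, str] = {}
    topic = "terminal.abc123.output"  # hypothetical topic name
    monitor = asyncio.create_task(run_status_monitor(bus.subscribe(topic), cache))
    bus.publish(topic, "starting up...\n")
    bus.publish(topic, "<PROMPT> ")
    bus.publish(topic, None)
    await monitor
    return cache
```

In this shape a status lookup only reads `cache`, so it can be no fresher than the last event the monitor has consumed, which is the timing sensitivity raised in the list above.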
|
@haofeif thanks for flagging the issue. The problem was in the regex for COMPLETED status detection: we now read from the buffer directly, so the text contains raw control characters and ANSI escape sequences that the pattern did not account for. It is not caused by sequencing or by the event bus not delivering fast enough. I'll update and push the changes. |
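A sketch of the kind of normalization this implies: strip escape sequences from the raw buffer before running the status regex. The `COMPLETED_PATTERN` and the sample output are hypothetical and do not reflect any provider's real prompt; the escape regex covers CSI and OSC sequences only.

```python
import re

# CSI sequences (colors, cursor movement) and OSC sequences (e.g. window titles).
ANSI_ESCAPE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"             # CSI: parameters, intermediates, final byte
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC: terminated by BEL or ESC backslash
)

# Hypothetical pattern: a response line starting with "> ", then an "[agent] > " prompt.
COMPLETED_PATTERN = re.compile(r"^> .+\n\[agent\] > $", re.MULTILINE)


def clean_output(raw: str) -> str:
    """Remove escape sequences and normalize CRLF so line anchors behave."""
    return ANSI_ESCAPE.sub("", raw).replace("\r\n", "\n")


# Sample raw output: green arrow, reset, response text, bold prompt.
raw = "\x1b[32m>\x1b[0m Task finished\r\n\x1b[1m[agent]\x1b[0m > "
matched_raw = COMPLETED_PATTERN.search(raw) is not None
matched_clean = COMPLETED_PATTERN.search(clean_output(raw)) is not None
```

On the raw text the pattern fails because each line begins with an escape byte rather than the expected character; after cleaning, the same pattern matches.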
|
Any updates on the progress of this PR? I have also been encountering the issues mentioned in #131 on a main branch build, and this seems like a promising fix 👀 |
Yes, work on this will continue.
Notes to Reviewers
This PR replaces the watchdog-based polling architecture with an event-driven pub/sub system for terminal output processing, status detection, and inbox message delivery. Terminal output now streams through named FIFOs into an in-process event bus, eliminating filesystem polling and expensive tmux subprocess calls.
- event_bus.py, fifo_reader.py, status_monitor.py, log_writer.py
- base.py, all 5 providers, terminal.py (utils)
- time.sleep → await asyncio.sleep throughout init chain
- inbox_service.py, terminal_service.py, main.py (lifespan)
- session_service.py, cleanup_service.py, terminal_service.delete_terminal
- event-driven-architecture.md, CODEBASE.md, constants.py, pyproject.toml
- test_codex_provider_unit.py, test_gemini_cli_unit.py, test_inbox_service.py

Start here: Read docs/event-driven-architecture.md for the full architecture overview, ASCII data-flow diagram, and component roles before diving into code.

Key changes:
- PollingObserver with FIFO readers (named pipes + os.read()) for real-time terminal output streaming
- initialize(), wait_for_shell(), wait_until_status()
- watchdog and aiofiles dependencies

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.