
feat(agents): handle TASK_FAILED events to prevent DAG deadlocks #182

Merged
mvillmow merged 5 commits into main from 87-auto-impl
Apr 24, 2026


@mvillmow
Collaborator

Summary

  • Adds ActionType::TASK_FAILED to the KIM protocol so subordinate agents can explicitly signal failure to their parent
  • CoordinationState now tracks failures separately via recordFailure(). Failures count toward the all-received threshold, so a failed subordinate can no longer leave its parent stuck in WAITING
  • LeadAgentBase::processMessage() dispatches TASK_FAILED messages to a new processSubordinateFailure() hook (with a default implementation that transitions to ERROR)
  • TaskAgent sets action_type = TASK_FAILED instead of a plain response when an exception occurs during command execution
  • synthesizeResults() / synthesizeComponentResult() return descriptive error messages when in ERROR state
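
The message flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Message`, `AgentState`, `LeadAgentSketch`, and the `runCommand` helper are hypothetical stand-ins, and the real `LeadAgentBase` and `TaskAgent` carry more state. The sketch assumes the lead moves to ERROR once every expected subordinate has reported and at least one of them failed.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical minimal types; the real KIM message has more fields.
enum class ActionType { RESPONSE, TASK_FAILED };

struct Message {
    ActionType action_type;
    std::string payload;
};

enum class AgentState { WAITING, DONE, ERROR };

// TaskAgent side (hypothetical helper): run a command and report an
// exception as an explicit TASK_FAILED message instead of a plain response.
Message runCommand(const std::function<std::string()>& command) {
    try {
        return {ActionType::RESPONSE, command()};
    } catch (const std::exception& e) {
        return {ActionType::TASK_FAILED, e.what()};
    }
}

// Lead side (hypothetical stand-in for LeadAgentBase).
class LeadAgentSketch {
public:
    explicit LeadAgentSketch(std::size_t expected) : expected_(expected) {}
    virtual ~LeadAgentSketch() = default;

    void processMessage(const Message& msg) {
        // Failures are dispatched before the normal result path.
        if (msg.action_type == ActionType::TASK_FAILED) {
            processSubordinateFailure(msg);
            return;
        }
        ++successes_;
        if (allReceived()) {
            state_ = failures_ > 0 ? AgentState::ERROR : AgentState::DONE;
        }
    }

    AgentState state() const { return state_; }

protected:
    // Overridable hook; the default counts the failure as a terminal event.
    virtual void processSubordinateFailure(const Message&) {
        ++failures_;
        if (allReceived()) state_ = AgentState::ERROR;
    }

    bool allReceived() const { return successes_ + failures_ >= expected_; }

private:
    std::size_t expected_;
    std::size_t successes_ = 0;
    std::size_t failures_ = 0;
    AgentState state_ = AgentState::WAITING;
};
```

With two expected subordinates, one success followed by one thrown exception leaves the sketch in ERROR rather than WAITING, which is the behavior the bullets above describe.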

Test plan

  • 7 new CoordinationState failure-tracking tests (Category 7 in test_coordination_state.cpp)
  • 4 new ModuleLeadAgent DAG-deadlock-prevention tests in test_module_lead_agent.cpp
  • All changed headers pass -fsyntax-only with no errors
  • Full test suite is blocked by a pre-existing "spdlog/fmt/fmt.h not found" build error. The fix is already on main (commit "fix(cmake): split combined target_link_libraries and fix spdlog visibility") but has not yet been merged into this branch

Closes #87

🤖 Generated with Claude Code

@github-actions

github-actions Bot commented Apr 23, 2026

Security Scan Results

  • ❌ Secret Scanning: Potential secrets found
  • ✅ SAST: Completed (check Security tab for details)
  • ✅ Dependency Scanning: Completed
  • ✅ C++ Static Analysis: Completed
  • ⚠️ Docker Image Scanning: No results

Recommendations

  • Review findings in the GitHub Security tab
  • Check artifact uploads for detailed reports
  • Address critical Docker vulnerabilities immediately

Workflow: Security Scanning

@mvillmow mvillmow enabled auto-merge (rebase) April 23, 2026 14:17
@mvillmow mvillmow force-pushed the 87-auto-impl branch 2 times, most recently from 5ad6b7f to 0fb3ed2 Compare April 24, 2026 13:49
@github-actions

github-actions Bot commented Apr 24, 2026

✅ Dependency Audit

| Severity | Count |
|----------|-------|
| Critical | 0 |
| High | 0 |
| Medium | 0 |
| Low | 0 |

See the Security tab for detailed findings.


Workflow: Dependency Audit

mvillmow and others added 5 commits April 24, 2026 15:40
When a subordinate task fails, the parent Lead Agent previously had no
mechanism to record the failure as a terminal event. Pending downstream
tasks would remain in WAITING state forever, deadlocking the DAG with
no recovery path.

Changes:
- Add ActionType::TASK_FAILED to KIM protocol (core/message.hpp)
- Add recordFailure(), hasFailures(), getFailureCount(),
  getFailureMessages() to CoordinationState — failures count toward
  the total-received threshold, so isComplete() fires correctly
- Add processSubordinateFailure() virtual hook to LeadAgentBase with
  default implementation that records the failure and transitions to
  ERROR state when all results (success + failure) are received
- Dispatch ActionType::TASK_FAILED messages in LeadAgentBase::processMessage()
  before the normal result path (prevents false WAITING stall)
- TaskAgent now sends action_type = TASK_FAILED instead of a plain
  "response" message on exception, so parent Lead Agents receive the
  correct signal
- ModuleLeadAgent::synthesizeResults() and
  ComponentLeadAgent::synthesizeComponentResult() return descriptive
  error strings when current state is ERROR
- 11 new unit tests: 7 CoordinationState failure-tracking tests,
  4 ModuleLeadAgent DAG-deadlock-prevention tests
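
The failure-tracking rule in the change list can be sketched as below. This is a hedged illustration rather than the class from this commit: `CoordinationStateSketch`, its constructor, and the `recordResult()` success path are assumptions made for the example, while the four failure accessors and `isComplete()` follow the names listed above. Return types are guesses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for CoordinationState.
class CoordinationStateSketch {
public:
    explicit CoordinationStateSketch(std::size_t expected) : expected_(expected) {}

    // Assumed name for the success path; not confirmed by the commit message.
    void recordResult(std::string result) { results_.push_back(std::move(result)); }

    // Failures are stored separately from successful results.
    void recordFailure(std::string message) { failures_.push_back(std::move(message)); }

    bool hasFailures() const { return !failures_.empty(); }
    std::size_t getFailureCount() const { return failures_.size(); }
    const std::vector<std::string>& getFailureMessages() const { return failures_; }

    // Both successes and failures count toward the total-received threshold,
    // so a failed subordinate still lets the parent finish waiting.
    bool isComplete() const { return results_.size() + failures_.size() >= expected_; }

private:
    std::size_t expected_;
    std::vector<std::string> results_;
    std::vector<std::string> failures_;
};
```

If only `results_.size()` were compared against `expected_`, a single failed subordinate would keep `isComplete()` false indefinitely, which is the stall this commit addresses.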

Closes #87

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrite multi-line lambda in emplace_back to single-line form that
clang-format-18 (CI) and clang-format-22 (local) both accept.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mvillmow mvillmow merged commit 76ea01e into main Apr 24, 2026
22 of 23 checks passed
@mvillmow mvillmow deleted the 87-auto-impl branch April 24, 2026 22:48
@github-actions

CI Summary

| Check | Status |
|-------|--------|
| Code Quality | ✅ success |
| Python Tests | ✅ success |
| Python Quality (mypy) | ✅ success |
| Sanitizers (ASan, UBSan, TSan, LSan, MSan) | ✅ success |
| Benchmarks | ✅ success |
| Coverage | ✅ success |

View full run


Development

Successfully merging this pull request may close these issues.

Handle task.failed events to prevent DAG deadlocks
