
Add comprehensive TDD implementation plan with 4-layer HMAS architecture #1

Merged
mvillmow merged 4 commits into main from
claude/plan-architect-orchestrator-01HfTcHYACkRCX5gX5L5uMxg
Nov 17, 2025
@mvillmow
Collaborator

Summary

Complete implementation plan for ProjectKeystone - a high-performance C++20 Hierarchical Multi-Agent System (HMAS). This PR establishes the architectural foundation, development methodology, and detailed roadmap for building the system.

Architecture: 4-Layer Hierarchy

The system uses a 4-layer hierarchical architecture that mirrors the agent development structure:

  • Level 0: Chief Architect Agent (strategic decisions, system-wide coordination)
  • Level 1: Component Lead Agent (component-level architecture, module coordination)
  • Level 2: Module Lead Agent (task decomposition, result synthesis, code review)
  • Level 3: Task Agent (concrete execution, code implementation, testing)

This creates a self-similar organization where the system structure reflects the development process.
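
As a rough illustration of the hierarchy above, the four layers can be modelled as an enum with parent and child lookups. This is only a sketch: `Layer`, `parent_of`, and `child_of` are hypothetical names chosen for this example and do not come from the PR.

```cpp
#include <cassert>
#include <optional>

// Hypothetical names: the PR describes the layers but shows no code for them.
enum class Layer : int {
    ChiefArchitect = 0,  // L0: strategic decisions, system-wide coordination
    ComponentLead  = 1,  // L1: component-level architecture, module coordination
    ModuleLead     = 2,  // L2: task decomposition, result synthesis, code review
    TaskAgent      = 3,  // L3: concrete execution, code implementation, testing
};

// Layer that a given layer reports to; L0 has no parent.
constexpr std::optional<Layer> parent_of(Layer layer) {
    if (layer == Layer::ChiefArchitect) return std::nullopt;
    return static_cast<Layer>(static_cast<int>(layer) - 1);
}

// Layer that a given layer delegates to; L3 has no children.
constexpr std::optional<Layer> child_of(Layer layer) {
    if (layer == Layer::TaskAgent) return std::nullopt;
    return static_cast<Layer>(static_cast<int>(layer) + 1);
}

// The chain is checked at compile time.
static_assert(!parent_of(Layer::ChiefArchitect).has_value());
static_assert(parent_of(Layer::TaskAgent) == Layer::ModuleLead);
static_assert(child_of(Layer::ComponentLead) == Layer::ModuleLead);
```

Keeping the lookups `constexpr` lets the layer ordering be verified by `static_assert` rather than at run time.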

Development Approach: TDD with E2E Testing

Test-Driven Development with End-to-End tests as primary validation:

  • Write failing E2E test first (complete workflow)
  • Implement minimal code to pass test
  • Refactor and optimize
  • Incremental complexity: 2 agents → 3 layers → 4 layers → full system
  • Every commit has working E2E tests
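
The first step of that incremental path (2 agents) could be driven by an end-to-end scenario along the following lines. This is a minimal, single-threaded sketch: `Message`, `Mailbox`, `ChiefArchitect`, `TaskAgent`, and `run_two_agent_workflow` are all hypothetical stand-ins, and plain `assert` is used instead of GoogleTest to keep the example dependency-free.

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical message type, for illustration only.
struct Message {
    std::string sender;
    std::string payload;
};

// Single-threaded mailbox standing in for the real message-passing layer.
class Mailbox {
public:
    void push(Message m) { queue_.push(std::move(m)); }

    std::optional<Message> pop() {
        if (queue_.empty()) return std::nullopt;
        Message m = std::move(queue_.front());
        queue_.pop();
        return m;
    }

private:
    std::queue<Message> queue_;
};

// L3 stand-in: takes one task from its inbox and replies with a result.
class TaskAgent {
public:
    Mailbox inbox;

    void step(Mailbox& reply_to) {
        if (auto task = inbox.pop()) {
            reply_to.push({"task-agent", "done: " + task->payload});
        }
    }
};

// L0 stand-in: hands a goal to a worker and collects replies in its inbox.
class ChiefArchitect {
public:
    Mailbox inbox;

    void delegate(TaskAgent& worker, const std::string& goal) {
        worker.inbox.push({"chief-architect", goal});
    }
};

// The complete workflow that the E2E test exercises: delegate, execute, reply.
inline std::string run_two_agent_workflow(const std::string& goal) {
    ChiefArchitect chief;
    TaskAgent worker;
    chief.delegate(worker, goal);
    worker.step(chief.inbox);
    auto result = chief.inbox.pop();
    assert(result.has_value());  // the workflow must produce exactly one reply
    return result->payload;
}
```

In a test-first cycle the assertion on `run_two_agent_workflow` would be written before the agent classes exist, fail to compile or pass, and then drive the minimal implementation.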

Key Documents

Primary Implementation Guides

  1. TDD_FOUR_LAYER_ROADMAP.md - Complete 14-week TDD roadmap with E2E test scenarios
  2. FOUR_LAYER_ARCHITECTURE.md - Complete 4-layer architecture specification
  3. README.md - Overview, navigation, and quick start

Supporting Documentation

  1. modules.md - C++20 module structure (Keystone.Core, Protocol, Agents, Integration)
  2. build-system.md - CMake 3.28+ configuration with C++20 module support
  3. testing-strategy.md - Testing frameworks (GoogleTest, benchmarks, coverage)
  4. risks.md - Risk analysis and mitigation strategies

Implementation Timeline

Total: 14 weeks

  • Phase 1 (Weeks 1-3): L0 + L3 only - Core infrastructure validation
  • Phase 2 (Weeks 4-6): Add L2 (Module Lead) - Task synthesis and retry logic
  • Phase 3 (Weeks 7-9): Add L1 (Component Lead) - Multi-module coordination
  • Phase 4 (Weeks 10-12): Full multi-component system - Parallel execution
  • Phase 5 (Weeks 13-14): Performance optimization and chaos testing

Technology Stack

  • Language: C++20 (modules, coroutines, concepts)
  • Concurrency: Actor Model with concurrentqueue (lock-free)
  • Serialization: Cista (zero-copy internal), Protobuf (external gRPC)
  • AI Integration: ONNX Runtime (local), gRPC clients (remote)
  • Build: CMake 3.28+, vcpkg
  • Testing: GoogleTest, Google Benchmark

Performance Targets

  • Throughput: >1M messages/second
  • Latency: <1ms internal (p99), <10ms external (p99)
  • Scalability: Linear to 100+ concurrent agents
  • Test Coverage: >95% core, >90% agents
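
One way to check the p99 latency targets above against measured samples is a nearest-rank percentile. The sketch below assumes latencies are collected in milliseconds; the function names are hypothetical, and the PR does not say how the benchmarks compute percentiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Nearest-rank 99th percentile: the smallest sample such that at least
// 99% of all samples are less than or equal to it.
inline double p99(std::vector<double> samples) {  // taken by value so it can be sorted locally
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t rank = (99 * samples.size() + 99) / 100;  // ceil(0.99 * n) in integer math
    return samples[rank - 1];
}

// Thresholds taken from the targets above; all values are in milliseconds.
inline bool meets_internal_latency_target(const std::vector<double>& ms) {
    return p99(ms) < 1.0;
}

inline bool meets_external_latency_target(const std::vector<double>& ms) {
    return p99(ms) < 10.0;
}
```

With 100 samples, a single slow outlier does not affect the nearest-rank p99, but two outliers do, which is why the target is stated as a percentile rather than a maximum.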

What's Next

Start implementation following TDD_FOUR_LAYER_ROADMAP.md:

  1. Week 1: Set up E2E test infrastructure
  2. Weeks 2-3: Implement L0 + L3 with basic message passing
  3. Weeks 4-6: Add L2 for task synthesis
  4. Continue through phases to full system

Commits Included

  • ee86a48 - Initial comprehensive implementation plan (7 documents)
  • f1f297d - Add TDD-first approach with two-agent initial model
  • 06fb634 - Update to 4-layer architecture matching dev agent structure
  • 5c38806 - Remove legacy 3-layer architecture documents

Ready for review and merge to begin implementation! 🚀

Add complete implementation plan for the high-performance C++20
Hierarchical Multi-Agent System (HMAS) including:

- Overall implementation roadmap (README.md)
- System architecture and design patterns (architecture.md)
- Detailed 8-phase implementation timeline (phases.md)
- C++20 module structure and dependencies (modules.md)
- Build system configuration with CMake 3.28+ (build-system.md)
- Comprehensive testing strategy with TDD approach (testing-strategy.md)
- Risk analysis and mitigation strategies (risks.md)

Key Features:
- 3-layer agent hierarchy (Root/Branch/Leaf)
- Actor-model with C++20 coroutines
- Lock-free message passing via concurrentqueue
- Zero-copy serialization (Cista) internally
- gRPC/Protobuf for external AI services
- 20-week implementation timeline
- >95% test coverage targets

The plan emphasizes modern C++20 features including modules,
coroutines, and advanced concurrency primitives while maintaining
high performance (1M+ msg/sec throughput, <1ms latency).

Major update to implementation strategy:

1. TDD with E2E Testing (TDD_APPROACH.md):
   - E2E tests as primary validation method (not unit tests first)
   - Red-Green-Refactor cycle with complete workflows
   - Write failing E2E test → implement → refactor
   - Every phase starts with E2E test scenario

2. Two-Agent Initial Model (TWO_AGENT_ARCHITECTURE.md):
   - Start simple: Coordinator + Worker (not 3 layers)
   - Validate ALL core infrastructure before scaling
   - Incremental path: 2 agents → multiple workers → 3 layers
   - Message passing, coroutines, state machines proven early

3. Revised Roadmap (README.md):
   - Phase 0: E2E test infrastructure (Week 1)
   - Phase 1: Message passing (Weeks 2-3)
   - Phase 2: Coordinator-Worker pattern (Weeks 4-5)
   - Phase 3: Performance validation (Week 6)
   - Phase 4: External AI integration (Weeks 7-8)
   - Phase 5: Expand to 3-layer hierarchy (Weeks 9-11)
   - Phase 6: Production hardening (Weeks 12-14)

Key Benefits:
✅ Fast validation - working system in 5 weeks
✅ Reduced risk - complexity added incrementally
✅ Always executable - every commit has passing E2E tests
✅ Performance proven early - benchmarks from Week 6
✅ Easier debugging - simple 2-agent system initially

This approach prioritizes "working software over comprehensive
documentation" while maintaining rigorous TDD discipline.

Major architecture update from 3-layer to 4-layer hierarchy:

Architecture Change:
OLD (3 layers):
  L1: Root Agent → L2: Branch Agent → L3: Leaf Agent

NEW (4 layers):
  L0: Chief Architect Agent (strategic decisions)
  L1: Component Lead Agent (component coordination)
  L2: Module Lead Agent (module synthesis)
  L3: Task Agent (concrete execution)

Rationale:
- Mirrors the agent development structure used to build the system
- Clear separation: strategic (L0) → tactical (L1) → operational (L2) → execution (L3)
- Original "Branch" layer was doing 2 jobs (now split into L1 + L2)
- Self-similar architecture (system structure = development structure)

New Documents:
1. FOUR_LAYER_ARCHITECTURE.md
   - Complete 4-layer specification
   - Agent type definitions for each layer
   - Message flow examples
   - Comparison to original 3-layer plan

2. TDD_FOUR_LAYER_ROADMAP.md
   - Incremental build path: 2 agents → 3 layers → 4 layers
   - E2E tests for each phase
   - Phase 1 (Weeks 1-3): L0 + L3 only
   - Phase 2 (Weeks 4-6): Add L2 (Module Lead)
   - Phase 3 (Weeks 7-9): Add L1 (Component Lead)
   - Phase 4 (Weeks 10-12): Full multi-component system
   - Phase 5 (Weeks 13-14): Performance & chaos testing

Updated README.md:
- Points to 4-layer as primary architecture
- Updated documentation structure with new docs

Timeline: 14 weeks (vs original 20 weeks)
Agents: Up to 100+ agents (4 components × 5 modules × 5 tasks)
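
The agent count can be worked through explicitly. The sketch below reads "4 components × 5 modules × 5 tasks" as 4 Component Leads, 5 Module Leads per component, and 5 Task Agents per module, and assumes a single Chief Architect; both readings are assumptions, and `AgentCounts` and `count_agents` are hypothetical names.

```cpp
#include <cstddef>

// Per-layer agent counts for a fully populated hierarchy.
struct AgentCounts {
    std::size_t chief_architects;
    std::size_t component_leads;
    std::size_t module_leads;
    std::size_t task_agents;

    constexpr std::size_t total() const {
        return chief_architects + component_leads + module_leads + task_agents;
    }
};

// Assumes one L0 agent and a uniform fan-out at every level.
constexpr AgentCounts count_agents(std::size_t components,
                                   std::size_t modules_per_component,
                                   std::size_t tasks_per_module) {
    return AgentCounts{
        1,
        components,
        components * modules_per_component,
        components * modules_per_component * tasks_per_module,
    };
}

// 4 x 5 x 5 gives 100 Task Agents; the 25 coordinating agents bring the total to 125.
static_assert(count_agents(4, 5, 5).task_agents == 100);
static_assert(count_agents(4, 5, 5).module_leads == 20);
static_assert(count_agents(4, 5, 5).total() == 125);
```

Under these assumptions the "100+" figure corresponds to the 100 Task Agents at L3, with the leads and the Chief Architect on top.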

This architecture provides better scalability, clearer separation of
concerns, and more intuitive development workflow.

Clean up legacy documentation as requested:

Removed:
- architecture.md (3-layer hierarchy - superseded by 4-layer)
- phases.md (original unit-test-first phases - superseded by TDD roadmap)
- TDD_APPROACH.md (general TDD methodology - merged into TDD_FOUR_LAYER_ROADMAP.md)
- TWO_AGENT_ARCHITECTURE.md (redundant - covered in TDD_FOUR_LAYER_ROADMAP.md Phase 1)

Current Documentation:
1. TDD_FOUR_LAYER_ROADMAP.md - Primary implementation guide
2. FOUR_LAYER_ARCHITECTURE.md - System architecture specification
3. README.md - Overview and navigation
4. modules.md - C++20 module structure
5. build-system.md - CMake and toolchain
6. testing-strategy.md - Testing frameworks
7. risks.md - Risk analysis

Benefits:
✅ No legacy/outdated documentation
✅ Single source of truth for architecture (4-layer)
✅ Single source of truth for implementation approach (TDD with E2E)
✅ Clearer navigation for developers

The codebase now contains only the current 4-layer architecture plan
with TDD/E2E methodology.
@mvillmow mvillmow merged commit 97a0e09 into main Nov 17, 2025
@mvillmow mvillmow deleted the claude/plan-architect-orchestrator-01HfTcHYACkRCX5gX5L5uMxg branch November 19, 2025 23:48
mvillmow added a commit that referenced this pull request Nov 25, 2025
BUILD_DIR was referenced on lines 73, 91, 112, and 119 but never defined.
Now defaults to build/release/bin to match Phase 3 CMake output structure.

Fixes minor issue #1 from code review

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
mvillmow added a commit that referenced this pull request Nov 27, 2025
This PR fixes 7 security vulnerabilities identified during comprehensive code review:

## CRITICAL Vulnerabilities Fixed

1. **Use-after-free in ProfilingSession** (profiling.cpp:83-121)
   - Root cause: Map pointer captured before lock release, invalidated by rehashing
   - Fix: Hold shared_lock during entire section access to prevent map modifications
   - Impact: Prevents crash in multi-threaded profiling scenarios
   - File: src/core/profiling.cpp

2. **Integer overflow in LeadAgentBase** (lead_agent_base_impl.hpp:215-225)
   - Root cause: size_t to int cast without bounds checking (size_t can be 2^64-1, int max is 2^31-1)
   - Fix: Check subtasks.size() > INT_MAX before cast, return error if exceeded
   - Impact: Prevents overflow causing negative coordination count
   - File: include/agents/lead_agent_base_impl.hpp

3. **Null pointer dereference + TOCTOU race in PullOrSteal** (pull_or_steal.cpp:43-65)
   - Root cause: queues.size() captured before loop, vector could shrink; null pointers not checked
   - Fix: Capture num_workers once, add combined bounds+null check before access
   - Impact: Prevents segfault in work-stealing scheduler
   - File: src/concurrency/pull_or_steal.cpp
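
The bounds check described in item 2 might look roughly like the following. This is a sketch, not the code from `lead_agent_base_impl.hpp`: `checked_count` is a hypothetical helper, and returning `std::nullopt` stands in for whatever error the real code returns.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <optional>

// Hypothetical helper: converts a container size to int only when it fits.
// Returns std::nullopt instead of silently wrapping to a negative value.
inline std::optional<int> checked_count(std::size_t n) {
    if (n > static_cast<std::size_t>(INT_MAX)) return std::nullopt;
    return static_cast<int>(n);
}
```

C++20 also offers `std::in_range<int>(n)` in `<utility>` for the same comparison; the explicit `INT_MAX` check is used here because it mirrors the wording of the fix.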

## HIGH Priority Vulnerabilities Fixed

4. **Agent ID space exhaustion** (agent_id_interning.cpp:30-36)
   - Root cause: uint32_t wraps to 0 after 4,294,967,295, causing ID collisions
   - Fix: Check next_id_ == UINT32_MAX before increment, throw overflow_error
   - Impact: Prevents silent ID collisions, which would be a catastrophic failure
   - File: src/core/agent_id_interning.cpp

5. **Configuration validation missing** (config.hpp:147-160)
   - Root cause: Float percentage truncated to size_t without validation
   - Fix: Add static_assert compile-time checks for watermark percentage and result
   - Impact: Catches invalid configuration at compile time
   - File: include/core/config.hpp
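
The ID-exhaustion guard in item 4 can be sketched as below. The class name, the `first_id` constructor parameter (added so the limit can be reached in a test), and the map-based storage are assumptions; only the `next_id_ == UINT32_MAX` check and the `overflow_error` come from the description above. The sketch is not thread-safe.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical interner: maps agent names to compact 32-bit IDs.
class AgentIdInterner {
public:
    // first_id exists only so the exhaustion path can be exercised cheaply.
    explicit AgentIdInterner(std::uint32_t first_id = 0) : next_id_(first_id) {}

    std::uint32_t intern(const std::string& name) {
        // Known names keep their existing ID.
        if (auto it = ids_.find(name); it != ids_.end()) return it->second;

        // Refuse to hand out an ID once incrementing would wrap to 0.
        if (next_id_ == UINT32_MAX) {
            throw std::overflow_error("agent ID space exhausted");
        }
        const std::uint32_t id = next_id_++;
        ids_.emplace(name, id);
        return id;
    }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::uint32_t next_id_;
};
```

Checking before the increment means `UINT32_MAX` itself is never issued, so the counter cannot wrap and reuse an ID that is already assigned.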

## MEDIUM Priority Vulnerabilities Fixed

6. **Memory cleanup failure** (metrics.cpp:46-70)
   - Root cause: Time-based cleanup might not remove enough entries under flood
   - Fix: Add forced removal of oldest entries if time-based cleanup insufficient
   - Impact: Prevents unbounded memory growth under message flood
   - File: src/core/metrics.cpp

7. **Modulo by zero** (pull_or_steal.cpp:43-49)
   - Root cause: num_workers could be 0, causing (worker_index + i) % 0
   - Fix: Check num_workers == 0, return nullopt early
   - Impact: Prevents FPE crash (fixed with CRITICAL #3)
   - File: src/concurrency/pull_or_steal.cpp
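
Items 3 and 7 both concern the scan over worker queues. A simplified, single-threaded sketch of the guarded loop follows; `WorkQueue` is a hypothetical stand-in (a plain `std::deque<int>`), whereas the real scheduler presumably uses lock-free queues and needs synchronisation that is not shown here.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <optional>
#include <vector>

using WorkQueue = std::deque<int>;  // hypothetical stand-in for a per-worker queue

// Tries the worker's own queue first, then scans the other queues round-robin.
inline std::optional<int> pull_or_steal(
        std::vector<std::unique_ptr<WorkQueue>>& queues,
        std::size_t worker_index) {
    // Capture the size once; with zero workers, return early to avoid "% 0".
    const std::size_t num_workers = queues.size();
    if (num_workers == 0) return std::nullopt;

    for (std::size_t i = 0; i < num_workers; ++i) {
        const std::size_t idx = (worker_index + i) % num_workers;

        // Combined bounds + null check before touching the queue.
        if (idx >= queues.size() || !queues[idx]) continue;

        WorkQueue& q = *queues[idx];
        if (q.empty()) continue;

        const int item = q.front();
        q.pop_front();
        return item;
    }
    return std::nullopt;  // nothing to pull or steal
}
```

The early return on `num_workers == 0` is what removes the division-by-zero path, and the per-iteration check skips missing queues instead of dereferencing a null pointer.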

## Testing

- Added comprehensive security regression test suite (test_security_regression.cpp)
- 10 new test cases covering all 7 vulnerabilities
- Stress tests: 100 threads × 100 sections × 10 records (ProfilingSession)
- Memory flood test: 20,000 messages (2x limit) for metrics cleanup
- All 476 tests pass with AddressSanitizer + UndefinedBehaviorSanitizer
- Test runtime: 66.33 seconds

## Files Changed

- CMakeLists.txt: Added security regression test file
- src/core/profiling.cpp: Fixed use-after-free with shared_lock
- include/agents/lead_agent_base_impl.hpp: Added integer overflow check
- src/concurrency/pull_or_steal.cpp: Fixed null pointer + TOCTOU + modulo by zero
- src/core/agent_id_interning.cpp: Added agent ID overflow check
- include/core/config.hpp: Added compile-time validation with static_assert
- src/core/metrics.cpp: Added forced cleanup for memory leak prevention
- tests/unit/test_security_regression.cpp: NEW - Comprehensive regression tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>