C++ framework: docs, CI integration tests, health agent improvements #425

Merged
kovtcharov merged 21 commits into main from feature/cpp-framework-updates
Mar 3, 2026

@kovtcharov (Collaborator) commented Mar 2, 2026

Summary

Documentation

  • Rename simple_agent to health_agent across all C++ guide pages
  • Fix architecture diagram: replace clipboard/paste approach with file-write + Notepad open, add GPU check
  • Fix Get-WmiObject to Get-CimInstance (correct PowerShell API)
  • Add CleanConsole and test_tool_integration to project structure listing
  • Update test suite description: six modules to eight
  • Fix doc code snippets to match actual source (PipeCloser, isSafeShellArg, kDiagnosticMenu)
  • Add shell injection prevention section to wifi-agent guide
  • Clarify Lemonade Server as recommended/tested LLM backend
  • Split multi-command code blocks for easy copy-paste
  • Clean up C++ intro page (remove redundant LLM Backend and Integration sections)

CI/CD (build_cpp.yml)

  • Add integration test suite on STX hardware: LLM + MCP + WiFi + Health tests
  • Use Qwen3-4B-Instruct-2507-GGUF model (matches wifi and health agents)
  • Add uvx verification step before test execution
  • Fix MinGW portability: _dupenv_s guards, _WIN32 vs _MSC_VER, _putenv_s

Health Agent (health_agent.cpp)

  • Simplify menu: option 1 is quick console-only summary (4 metrics, no Notepad), option 15 is full diagnostics + Notepad report
  • Replace clipboard/paste with direct file-write + Start-Process notepad approach
  • Use array-of-lines pattern with [Environment]::NewLine for proper newlines
  • Increase context size to 32K for comprehensive diagnostics (12+ tool calls)
  • Remove HTML report generation (unreliable with small LLMs)

Agent Core (agent.cpp)

  • Fix loop detection false positive: compare both tool name AND arguments (was name-only, triggering on consecutive mcp_windows_Shell calls with different args)
  • Reduce tool result truncation from 20K to 4K chars to prevent context overflow

New Files

  • cpp/include/gaia/clean_console.h + cpp/src/clean_console.cpp — polished TUI with ANSI colors and word-wrap
  • cpp/examples/health_agent.cpp — Windows system health agent using MCP
  • cpp/tests/test_clean_console.cpp — CleanConsole unit tests
  • cpp/tests/test_tool_integration.cpp — Tool registry integration tests
  • cpp/tests/integration/test_integration_mcp.cpp — MCP connectivity tests
  • cpp/tests/integration/test_integration_wifi.cpp — WiFi diagnostic tests
  • cpp/tests/integration/test_integration_health.cpp — Health monitoring tests

Test plan

  • All 6 cloud CI jobs pass (ubuntu + windows: mock tests, install test, shared lib)
  • STX integration tests build and run (LLM + MCP + WiFi + Health)
  • health_agent.exe option 1: quick console summary (no Notepad)
  • health_agent.exe option 15: full diagnostics + Notepad report
  • wifi_agent.exe full network diagnostic works
  • Mock tests pass locally: gaia_tests.exe --gtest_color=yes
  • Documentation renders correctly on Mintlify

Karim13014 and others added 2 commits February 27, 2026 10:21
…roadmap

- Split cpp.mdx into landing page, quickstart.mdx, and expanded overview.mdx
- Add Error Handling, Thread Safety, Security Model, Production Deployment,
  and API Quick Reference sections to overview.mdx
- Add Step 7 (Embedding in Your Application) to custom-agent.mdx with
  headless, background thread, and custom OutputHandler patterns
- Hyperlink all Lemonade references to https://lemonade-server.ai
- Add C++ Framework Production Readiness to Q2 2026 roadmap

wifi-agent.mdx:
- Update runShell() snippet to match source (PipeCloser struct)
- Add isSafeShellArg() security validation to ping_host example
- Add "Input Validation" subsection documenting shell injection prevention
- Fix system prompt excerpt to use real section headers (AVAILABLE
  DIAGNOSTIC SEQUENCE, FIXING ISSUES, FINAL ANSWER)
- Replace non-existent mapMenuSelection() with actual inline menu logic
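
For readers unfamiliar with the PipeCloser pattern named above, here is a minimal sketch of how a custom deleter can close a process pipe on every exit path. Only the names `runShell()` and `PipeCloser` come from this commit; the signature, buffer size, and error handling are assumptions and may differ from the actual source.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>

// Deleter so std::unique_ptr closes the pipe automatically, including on
// early returns and exceptions.
struct PipeCloser {
    void operator()(FILE* pipe) const {
        if (pipe != nullptr) {
#ifdef _WIN32
            _pclose(pipe);
#else
            pclose(pipe);
#endif
        }
    }
};

// Hypothetical runShell()-style helper: runs a command and returns its stdout.
// Returns an empty string if the pipe could not be opened (an assumption).
std::string runShell(const std::string& command) {
#ifdef _WIN32
    std::unique_ptr<FILE, PipeCloser> pipe(_popen(command.c_str(), "r"));
#else
    std::unique_ptr<FILE, PipeCloser> pipe(popen(command.c_str(), "r"));
#endif
    if (!pipe) {
        return "";
    }
    std::array<char, 256> buffer{};
    std::string output;
    // Read the child's stdout chunk by chunk until EOF.
    while (std::fgets(buffer.data(), static_cast<int>(buffer.size()), pipe.get()) != nullptr) {
        output += buffer.data();
    }
    return output;
}
```

The command string is passed to the shell unchanged, so any user-supplied portion should be validated before it reaches a helper like this.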

overview.mdx:
- Mark streaming config field as planned/not yet implemented

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the documentation and cpp labels Mar 2, 2026
Karim13014 and others added 4 commits March 2, 2026 12:09
…tion guide

- cpp.mdx: Replace inline FetchContent snippet with link to integration
  guide; add LLM Backend section noting Lemonade is tested/recommended
- overview.mdx: Update baseUrl description to recommend Lemonade and
  note other servers are untested

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace getenv with _dupenv_s in types.h and test_integration_llm.cpp
to fix C4996 warnings-as-errors on MSVC. Split multi-command code
blocks in quickstart, wifi-agent, and integration docs so each
command can be copied individually.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create unified tests_integration binary with interactive menu + CLI flags
  (--llm, --mcp, --wifi, --health, --all, --model, --url)
- Add MCP integration tests: connection, tool discovery, reconnect, prompt rebuild
- Add WiFi agent tests: real PowerShell diagnostics (netsh, ipconfig, DNS, connectivity)
- Add Health agent tests: full LLM + MCP + PowerShell stack (memory, CPU, disk)
- Rename test binaries: tests_mock (158 mock) + tests_integration (17 integration)
- Enable GAIA_BUILD_INTEGRATION_TESTS=ON by default
- Increase integration test timeout to 300s for agent tests

…n STX

- Disable integration tests on cloud runners (no Lemonade available)
- Use Qwen3-4B-Instruct-2507-GGUF model (matches wifi/health agent defaults)
- Add uvx verification step for MCP/Health tests
- Increase STX timeout to 20 minutes for broader test coverage
- Update job names and summary labels to reflect full scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the devops label Mar 3, 2026
Karim13014 and others added 8 commits March 2, 2026 16:35
New files referenced by CMakeLists.txt but not previously committed:
- clean_console.h/.cpp: ANSI color TUI with shared gaia::color namespace
- health_agent.cpp: Windows system health agent using MCP
- test_clean_console.cpp: CleanConsole unit tests
- test_tool_integration.cpp: Tool registry integration tests
- Remove simple_agent.cpp (replaced by health_agent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The loop detector only compared tool names, causing false positives when
the same MCP tool (e.g. mcp_windows_Shell) was called with different
arguments. Now requires both name AND args to match 3 consecutive times
before triggering the loop break.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_main.cpp: Use _WIN32 guard instead of _MSC_VER for _putenv_s
  (MinGW on STX defines _WIN32 but not _MSC_VER)
- health_agent.cpp: Replace clipboard+paste Notepad approach with direct
  file write + open. Use array-of-lines pattern to avoid literal \n chars.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add _MSC_VER guards to wifi/mcp/health integration tests for _dupenv_s
  (MinGW uses getenv instead)
- Increase health agent contextSize to 32768 for "Run ALL diagnostics"
- Tighten tool result truncation (4K chars) to prevent context overflow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CtxSize 8192 -> 32768 for health agent multi-step tests
- Timeout 20 -> 30 minutes for model loading + 17 integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integration tests run short queries (1-3 tool calls each), not the
multi-step "Run ALL" flow. 32K context was slowing model loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5 is full Notepad report

- Option 1: renamed to "Quick health check (console summary)" - gathers 4 core
  metrics (memory, disk, CPU, GPU) and gives a text summary in terminal only
- Option 15: unchanged "Run ALL diagnostics + generate report" - gathers all 12
  metrics and writes a formatted plain-text report to Notepad
- Removed HTML report generation from system prompt
- Moved Notepad report pattern to comprehensive diagnostics protocol

…e diagram

- Rename simple_agent references to health_agent across all C++ guide pages
- Fix health_agent architecture diagram: replace clipboard/paste approach with
  file-write + Notepad open, add GPU check step
- Fix Get-WmiObject to Get-CimInstance (correct PowerShell API)
- Update project structure: add clean_console and test_tool_integration files
- Update test suite description: six modules to eight
- Update health_agent description: console summary or Notepad report
@kovtcharov kovtcharov self-assigned this Mar 3, 2026
@kovtcharov kovtcharov changed the title from "Fix C++ framework doc accuracy issues from pre-release audit" to "C++ framework: docs, CI integration tests, health agent improvements" Mar 3, 2026
Health and WiFi integration tests had maxSteps=10, but each test only
needs 1-2 tool calls. With 17 tests each creating a new agent (and MCP
connection for health tests), the cumulative LLM inference time exceeded
the 20-minute CI timeout on STX hardware.

Also clean up C++ intro page (remove redundant LLM Backend and
Integration sections already covered in subpages).

The STX integration tests were timing out because:
1. Each MCP test spawns uvx windows-mcp which downloads 117 Python
   packages (~30s per test, 8 MCP-dependent tests = ~4 min wasted)
2. Health/WiFi tests had maxSteps=10 but only need 1-2 tool calls

Fixes:
- Add pre-warm step in CI to install windows-mcp packages once before
  tests run, so subsequent uvx invocations use the cached environment
- Reduce maxSteps from 10 to 3 for health and WiFi integration tests

uvx creates a temporary venv and installs 117 packages on every
invocation. Using uv tool install puts windows-mcp in a persistent
location so uvx reuses it instead of reinstalling each time.

uv writes progress messages to stderr, which PowerShell treats as
NativeCommandError and kills the step. Capture output and check
$LASTEXITCODE instead.

PowerShell 5.1 treats any stderr output from native commands as a
terminating NativeCommandError, even when captured with 2>&1. Use
$ErrorActionPreference = SilentlyContinue and 2>$null instead.
@kovtcharov-amd (Collaborator) commented:

@claude review this PR.

github-actions Bot (Contributor) commented Mar 3, 2026

Claude finished @kovtcharov-amd's task


PR Review: C++ Framework Updates

Overall Assessment: 🟢 Approved — This is a well-structured PR with comprehensive improvements to the C++ framework.


Summary

This PR delivers significant enhancements across three areas:

  1. Documentation improvements — Renamed simple_agent to health_agent, fixed code snippets to match source, added shell injection prevention section
  2. CI/CD enhancements — Added STX integration tests (LLM + MCP + WiFi + Health) with proper model selection
  3. Health Agent rewrite — Simplified menu, direct file-write approach, improved reliability with small LLMs

Code Review Findings

🟢 Well-Done Sections

1. Loop Detection Fix (cpp/src/agent.cpp:507-521)
The loop detection now correctly compares both tool name AND arguments:

if (toolCallHistory[i].first != toolName ||
    toolCallHistory[i].second != toolArgs) {
    allSame = false;
    break;
}

This prevents false positives when consecutive mcp_windows_Shell calls have different arguments — a common pattern for health diagnostics.
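
To make the comparison concrete, the excerpt can be wrapped in a small standalone function. This is a sketch under assumptions: the helper name `isRepeatedCall`, the pair-of-strings history type, and the convention that the history already contains the current call are illustrative; only the inner comparison and the threshold of 3 consecutive matches come from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns true when the last `window` recorded calls all
// match (toolName, toolArgs), comparing the name AND the serialized arguments.
bool isRepeatedCall(
    const std::vector<std::pair<std::string, std::string>>& toolCallHistory,
    const std::string& toolName,
    const std::string& toolArgs,
    std::size_t window = 3) {
    if (toolCallHistory.size() < window) {
        return false;  // not enough history to call it a loop
    }
    bool allSame = true;
    for (std::size_t i = toolCallHistory.size() - window;
         i < toolCallHistory.size(); ++i) {
        if (toolCallHistory[i].first != toolName ||
            toolCallHistory[i].second != toolArgs) {
            allSame = false;
            break;
        }
    }
    return allSame;
}
```

With a name-only check, three `mcp_windows_Shell` calls running three different commands would look like a loop; with the argument comparison they do not, while three byte-identical calls still do.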

2. Tool Result Truncation (cpp/src/agent.cpp:541-543)
Reduced from 20K to 4K characters to prevent context overflow. Good balance between preserving useful output and managing context limits for models like Qwen3-4B.
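
A truncation step of this kind might look roughly like the sketch below. The helper name, the exact limit of 4096 (the PR only says "4K chars"), and the marker text are assumptions, not the code at the cited lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed limit: the PR text says "4K chars"; 4096 is a guess at the value.
constexpr std::size_t kMaxToolResultChars = 4096;

// Hypothetical helper: caps a tool result and appends a marker so the model
// can tell that the output was cut rather than complete.
std::string truncateToolResult(const std::string& result,
                               std::size_t maxChars = kMaxToolResultChars) {
    if (result.size() <= maxChars) {
        return result;  // short results pass through unchanged
    }
    return result.substr(0, maxChars) + "\n... [truncated]";
}
```

Note that `std::string::size()` counts bytes, so a cut like this can land in the middle of a multi-byte UTF-8 character; whether that matters depends on how the result is serialized afterwards.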

3. Shell Injection Prevention (cpp/examples/wifi_agent.cpp:87-96)
The isSafeShellArg() function properly rejects dangerous metacharacters:

if (c == ';' || c == '|' || c == '&' || c == '`' || c == '$'
    || c == '(' || c == ')' || c == '{' || c == '}' || c == '<'
    || c == '>' || c == '"' || c == '\n' || c == '\r') {
    return false;
}
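
For context, the condition above could sit inside a function shaped like the following sketch. The character list is copied from the excerpt; the signature and the surrounding loop are assumptions about the source.

```cpp
#include <cassert>
#include <string>

// Sketch: rejects any argument containing a shell metacharacter from the
// deny-list quoted in the review. Signature is assumed, not confirmed.
bool isSafeShellArg(const std::string& arg) {
    for (char c : arg) {
        if (c == ';' || c == '|' || c == '&' || c == '`' || c == '$'
            || c == '(' || c == ')' || c == '{' || c == '}' || c == '<'
            || c == '>' || c == '"' || c == '\n' || c == '\r') {
            return false;
        }
    }
    return true;
}
```

A caller would check the argument before building the command line, for example refusing to ping when `isSafeShellArg(host)` is false. A deny-list is only as complete as its character set, so an allow-list of hostname characters would be a stricter alternative.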

4. CleanConsole Implementation (cpp/src/clean_console.cpp)

  • Well-designed word-wrapping with ANSI color preservation
  • Proper **bold** marker parsing for styled output
  • Output preview truncation with line count indicators
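
The core difficulty in wrapping colored text is that ANSI escape sequences occupy bytes but no screen columns. The sketch below illustrates one way to handle that; the function names are hypothetical, it is not the CleanConsole implementation, and it counts bytes rather than Unicode code points.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Counts printable bytes, skipping ANSI CSI sequences such as "\x1b[1m".
std::size_t visibleWidth(const std::string& text) {
    std::size_t width = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\x1b' && i + 1 < text.size() && text[i + 1] == '[') {
            i += 2;
            // Skip parameter bytes until the final byte of the sequence (@ to ~).
            while (i < text.size() && !(text[i] >= '@' && text[i] <= '~')) {
                ++i;
            }
            continue;  // the loop increment steps past the final byte
        }
        ++width;
    }
    return width;
}

// Greedy word wrap that measures lines by visible width, so color codes do
// not cause early line breaks. Words longer than maxWidth stay unbroken.
std::vector<std::string> wordWrap(const std::string& text, std::size_t maxWidth) {
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word;
    std::string current;
    while (words >> word) {
        const std::size_t needed =
            visibleWidth(current) + (current.empty() ? 0 : 1) + visibleWidth(word);
        if (!current.empty() && needed > maxWidth) {
            lines.push_back(current);
            current.clear();
        }
        if (!current.empty()) {
            current += ' ';
        }
        current += word;
    }
    if (!current.empty()) {
        lines.push_back(current);
    }
    return lines;
}
```

This sketch does not re-apply an active color at the start of a wrapped line, which a full implementation that preserves colors across breaks would need to do.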

5. CI Workflow (build_cpp.yml:267-294)
Good handling of PowerShell 5.1 stderr quirk with uv tool install:

$ErrorActionPreference = "SilentlyContinue"
uv tool install windows-mcp 2>$null
$installExit = $LASTEXITCODE
$ErrorActionPreference = "Stop"

6. MinGW Portability
Proper _dupenv_s guards with #ifdef _MSC_VER in test files, allowing MinGW builds.
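
The two guards serve different purposes: `_MSC_VER` selects the MSVC-only secure getter, while `_WIN32` covers both MSVC and MinGW for the setter. A minimal sketch of that split, with hypothetical helper names and an assumed fallback parameter, could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: reads an environment variable, returning `fallback`
// when it is unset.
std::string getEnvVar(const char* name, const std::string& fallback = "") {
#ifdef _MSC_VER
    // MSVC flags getenv with C4996, so use the secure variant there.
    char* buffer = nullptr;
    std::size_t length = 0;
    if (_dupenv_s(&buffer, &length, name) == 0 && buffer != nullptr) {
        std::string value(buffer);
        std::free(buffer);  // _dupenv_s allocates; the caller must free
        return value;
    }
    return fallback;
#else
    // MinGW and POSIX toolchains: plain getenv.
    const char* value = std::getenv(name);
    return value != nullptr ? std::string(value) : fallback;
#endif
}

// Hypothetical helper: sets an environment variable for the current process.
bool setEnvVar(const char* name, const char* value) {
#ifdef _WIN32
    // _WIN32 is defined by both MSVC and MinGW, unlike _MSC_VER.
    return _putenv_s(name, value) == 0;
#else
    return setenv(name, value, 1) == 0;  // POSIX; 1 means overwrite
#endif
}
```

Keeping both guards inside one pair of helpers means test files only call `getEnvVar` and `setEnvVar` and never repeat the preprocessor logic.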


🟡 Suggestions (Non-Blocking)

1. Context Size Hardcoded
In cpp/examples/health_agent.cpp:198, context size is hardcoded to 32K:

config.contextSize = 32768; // 32K needed for "Run ALL diagnostics"

Consider making this configurable or at least documenting why this value was chosen.
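
One low-effort way to act on this suggestion is to accept an optional override string (from a command-line argument, for instance) and fall back to the current default when it is missing or invalid. The helper name, the upper bound, and the override source below are all assumptions; only `config.contextSize` and the value 32768 come from the PR.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Default taken from the excerpt above: 32K for "Run ALL diagnostics".
constexpr int kDefaultContextSize = 32768;

// Hypothetical helper: parses an override such as "8192" and returns
// `fallback` when the text is null, empty, non-numeric, or out of range.
int resolveContextSize(const char* overrideText,
                       int fallback = kDefaultContextSize) {
    if (overrideText == nullptr || *overrideText == '\0') {
        return fallback;
    }
    errno = 0;
    char* end = nullptr;
    const long parsed = std::strtol(overrideText, &end, 10);
    const bool consumedAll = (end != overrideText) && (*end == '\0');
    // The 1048576 ceiling is an arbitrary sanity bound for this sketch.
    if (errno != 0 || !consumedAll || parsed <= 0 || parsed > 1048576) {
        return fallback;
    }
    return static_cast<int>(parsed);
}
```

In the agent this could be used as `config.contextSize = resolveContextSize(argc > 1 ? argv[1] : nullptr);`, which keeps the 32K default while letting integration tests request a smaller context.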

2. Test Integration Module Count
The documentation mentions "eight modules" in test suite description. Verify this matches the actual count in CMakeLists.txt:184-192 (currently lists 8 files: test_types, test_tool_registry, test_json_utils, test_agent, test_mcp_client, test_console, test_clean_console, test_tool_integration).


🟢 Test Coverage

The test_tool_integration.cpp file provides excellent coverage:

  • WiFi agent tool registration and execution
  • Input validation (safe vs unsafe hostnames)
  • Health agent JSON parsing for all PowerShell outputs
  • CleanConsole rendering pipeline
  • Full diagnostic tool chains

The integration tests in test_integration_health.cpp properly skip on non-Windows platforms.


Files Reviewed

| File | Status |
| --- | --- |
| .github/workflows/build_cpp.yml | ✅ CI enhancements look good |
| cpp/src/agent.cpp | ✅ Loop detection and truncation fixes |
| cpp/examples/health_agent.cpp | ✅ Clean rewrite with proper MCP integration |
| cpp/examples/wifi_agent.cpp | ✅ Shell injection prevention |
| cpp/include/gaia/clean_console.h | ✅ Well-designed API |
| cpp/src/clean_console.cpp | ✅ Proper ANSI handling |
| cpp/tests/test_tool_integration.cpp | ✅ Comprehensive tests |
| cpp/tests/integration/test_integration_health.cpp | ✅ Good LLM+MCP coverage |
| cpp/CMakeLists.txt | ✅ Proper build structure |
| docs/guides/cpp/overview.mdx | ✅ Clear architecture docs |
| docs/guides/cpp/wifi-agent.mdx | ✅ Shell injection section added |

Conclusion

This PR is ready to merge. The code is well-structured, properly tested, and the documentation accurately reflects the implementation changes.


uv tool install creates a broken Python venv on the STX runner
(pyvenv.cfg missing), causing all MCP and Health tests to fail.
Revert to letting uvx manage its own temporary environments and
uninstall any broken persistent installation from previous runs.
@kovtcharov kovtcharov added this pull request to the merge queue Mar 3, 2026
Merged via the queue into main with commit 1ed7c00 Mar 3, 2026
56 of 57 checks passed
@kovtcharov kovtcharov deleted the feature/cpp-framework-updates branch March 3, 2026 16:15
itomek pushed a commit that referenced this pull request Mar 12, 2026
…425)
Co-authored-by: Claude Code <claude-code@anthropic.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>