Problem
The installer matrix in Child 3 (#989) verifies install-time correctness, that is, whether GAIA installs and reaches state: ready across the OS / UV / Lemonade combinations. It does not exercise what users actually do with GAIA after install: run a custom agent, optionally connected to an MCP server.
We have unit tests for MCP under tests/mcp/servers/test_* but no test that drives an installed GAIA binary running a custom agent through real MCP and non-MCP paths. A regression in the MCP wiring or the agent runtime would not be caught by Child 3.
Scope
Build a custom-agent + MCP test harness that runs against the installed binary on the same self-hosted runners as Child 3:
Dummy MCP server — minimal HTTP/stdio server that exposes a known set of fake tools (e.g., echo, add_two_numbers, mock_search). Lives in tests/fixtures/mcp/dummy_server/ and is launched as a subprocess by the test.
Custom agent fixture — a small custom agent definition (skill spec or equivalent) that the test installs into the running GAIA instance.
Three test cases:
Custom agent + dummy MCP — agent invokes a dummy MCP tool, harness asserts the tool was called with the right args and the result reached the agent
Custom agent + real MCP (one of the bundled servers, e.g., the file MCP) — sanity check that the dummy harness isn't masking a real-MCP-only failure
Custom agent + no MCP — agent runs a code-only path, harness asserts no MCP traffic
Run on every cell of Child 3 (or a smaller subset — TBD) so we know the agent runtime works on every OS the matrix covers.
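A minimal sketch of what the stdio variant of the dummy server could look like, assuming newline-delimited JSON-RPC 2.0 messages and the MCP method names tools/list and tools/call. It skips the initialize handshake and is not a complete MCP implementation. handle_request, serve, and the call log are hypothetical names, and the tool bodies are placeholders.

```python
import json
import sys
from typing import Any, Callable

# Hypothetical fake tools, named after the examples in the scope list.
TOOLS: dict[str, Callable[..., Any]] = {
    "echo": lambda text: text,
    "add_two_numbers": lambda a, b: a + b,
    "mock_search": lambda query: [f"result for {query}"],
}


def _error(req_id: Any, code: int, message: str) -> dict:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle_request(request: dict, call_log: list) -> dict:
    """Answer one request and record every known-tool call in call_log."""
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}

    if method == "tools/list":
        tools = [{"name": name} for name in sorted(TOOLS)]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}

    if method == "tools/call":
        name = params.get("name")
        arguments = params.get("arguments") or {}
        if name not in TOOLS:
            return _error(req_id, -32602, f"unknown tool: {name}")
        # Logged before execution so the harness sees the call even if it fails.
        call_log.append({"name": name, "arguments": arguments})
        try:
            value = TOOLS[name](**arguments)
        except TypeError as exc:
            # Wrong or missing arguments (or a type error inside the tool).
            return _error(req_id, -32602, f"bad arguments for {name}: {exc}")
        content = [{"type": "text", "text": json.dumps(value)}]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"content": content, "isError": False}}

    return _error(req_id, -32601, f"method not found: {method}")


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON request per line and write one JSON response per line."""
    call_log: list = []
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        response = handle_request(json.loads(line), call_log)
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```

A CLI entry point would call serve() under an if __name__ == "__main__" guard, and a pytest fixture could start the same file as a subprocess. Keeping handle_request free of I/O lets the fixture's own sanity test call it directly.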
Acceptance criteria
Dummy MCP server fixture exists, has its own pytest test for sanity, and is invokable both as a CLI and as a pytest fixture
Custom agent fixture exists and can be installed into a running GAIA instance via the supported install flow (skill install, agent registration, whatever the supported entry point is — not hand-editing files)
All three test cases (dummy-MCP, real-MCP, no-MCP) pass on Windows + Ubuntu
An intentional break (return wrong type from dummy MCP, remove the agent, kill the MCP server mid-run) produces a clear failure with diagnosable logs
Each test cell uploads agent logs + MCP server logs as artifacts on failure
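One way the "clear failure" criterion could be met on the harness side is an assertion helper that prints expected and recorded tool calls side by side. This is a sketch only: assert_tool_calls is a hypothetical name, and it assumes the dummy server exposes its recorded calls as a list of {"name", "arguments"} dicts.

```python
import json


def _render(calls: list) -> list:
    """Format calls one per line, or a placeholder when there are none."""
    if not calls:
        return ["  <none>"]
    return [
        f"  {index}: {json.dumps(call, sort_keys=True, default=repr)}"
        for index, call in enumerate(calls)
    ]


def assert_tool_calls(call_log: list, expected: list) -> None:
    """Raise AssertionError with a readable report if the calls differ."""
    if call_log == expected:
        return
    lines = ["MCP tool calls did not match expectation."]
    lines.append(f"expected ({len(expected)}):")
    lines.extend(_render(expected))
    lines.append(f"recorded ({len(call_log)}):")
    lines.extend(_render(call_log))
    raise AssertionError("\n".join(lines))
```

Passing an empty expected list covers the no-MCP case: any recorded call then fails the test and shows up in the message.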
Open questions
Subset or full coverage? Running the agent harness on all 8 Child 3 cells doubles matrix runtime. Default proposal: run only on the {UV installed, Lemonade installed} cell of each OS — since this issue is about agent runtime, not install preconditions. Open for discussion.
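To make the default proposal concrete, the sketch below enumerates the cells and applies the filter. It assumes the 8 cells are 2 OSes x UV installed or not x Lemonade installed or not, with Windows and Ubuntu taken from the acceptance criteria; the dict keys and function name are hypothetical.

```python
from itertools import product

# Assumption: two OSes, each with UV and Lemonade either installed or not.
OSES = ["windows", "ubuntu"]

ALL_CELLS = [
    {"os": os_name, "uv": uv, "lemonade": lemonade}
    for os_name, uv, lemonade in product(OSES, [True, False], [True, False])
]


def agent_harness_cells(cells: list) -> list:
    """Keep only the cells where both UV and Lemonade are installed."""
    return [cell for cell in cells if cell["uv"] and cell["lemonade"]]


SELECTED = agent_harness_cells(ALL_CELLS)
```

Under these assumptions the harness would run on 2 of the 8 cells, one per OS.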
Parent: #989
Blocked by: #989 Child 1 (pre-release artifact handling) and Child 2 (self-hosted runners)