Rectify: TestRunner Protocol Returns Lossy Bare Tuple; Non-Pytest Runners Produce False Negatives #520
Conversation
- Add TestResult dataclass to core/_type_results.py (passed, stdout, stderr)
- Export TestResult from core/__init__.py and types.py
- Update TestRunner Protocol return type from tuple[bool, str] to TestResult
- Invert check_test_passed() CWA: no pytest summary → trust exit code (non-pytest runners)
- Add stderr parameter to check_test_passed(); combine stdout+stderr for pytest detection
- DefaultTestRunner.run() now returns TestResult with full stdout and stderr
- test_check MCP handler returns {passed, stdout, stderr} instead of {passed, output}
- perform_merge() uses TestResult attribute access at both test gate call sites
- Update StatefulMockTester to accept list[TestResult] and return TestResult
- Update all StatefulMockTester construction sites in test_git.py to use TestResult
- Flip two CWA test assertions (empty output / log-only now expect True)
- Add 9 new tests: non-pytest runner paths, TestResult fields, schema, cargo nextest style
- Add test_test_runner_return_type_is_test_result() contract test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
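The first five bullets can be sketched as follows (a hedged sketch: the field names and the Protocol return-type change come from the list above, while `frozen=True` and the `cwd` parameter name are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import Protocol


# Named result type replacing the bare tuple[bool, str].
# frozen=True is an assumption, not confirmed by the change list.
@dataclass(frozen=True)
class TestResult:
    passed: bool
    stdout: str
    stderr: str


class TestRunner(Protocol):
    # Previously: async def run(self, cwd) -> tuple[bool, str]
    async def run(self, cwd: str) -> TestResult: ...
```

Because every consumer now names the fields it reads (`result.stderr` rather than unpacking a positional tuple), dropping stderr again would require an explicit code change rather than a silent omission.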
Trecek left a comment
AutoSkillit PR Review — Verdict: changes_requested
6 blocking findings (1 critical, 5 warnings). See inline comments.
```python
if returncode != 0:
    return False
counts = parse_pytest_summary(stdout)
combined = stdout + ("\n" + stderr if stderr else "")
```
[warning] bugs: stderr concatenation may cause incorrect regex match across the join boundary — if stdout does not end with a newline, the concatenated string could form a spurious pytest summary line. Safer to call parse_pytest_summary on stdout and stderr independently and merge the resulting count dicts.
Investigated — this is intentional. The concatenation always inserts a literal `"\n"` between stdout and stderr when stderr is non-empty (`stdout + ("\n" + stderr if stderr else "")`). Since `parse_pytest_summary` processes lines via `splitlines()`, the inserted `"\n"` guarantees stdout and stderr always land on separate lines, so a cross-boundary spurious match cannot form. Pass 1 requires a line that both starts and ends with `=`; Pass 2 requires an `in N.Ns` timing suffix. Neither can be synthesized by concatenating the tail of stdout with the head of stderr across a newline boundary.
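The newline-boundary argument can be demonstrated with a toy detector (an illustrative stand-in: the real `parse_pytest_summary` regexes are not reproduced here, only the `splitlines()`-based line scanning it is described as using):

```python
import re

# Toy summary detector: a line that starts with "=" and contains
# "N passed" before a trailing run of "=". Not the real parser.
SUMMARY = re.compile(r"^=+\s*\d+ passed.*=+$")


def has_summary_line(text: str) -> bool:
    return any(SUMMARY.match(line) for line in text.splitlines())


# Adversarial halves: stdout ends with what looks like a summary prefix,
# stderr begins with what looks like its suffix.
stdout = "collected 5 items\n== 5"
stderr = " passed in 1.2s =="

# Naive concatenation would fuse the halves into a full summary line...
assert has_summary_line(stdout + stderr) is True
# ...but the PR's concatenation inserts "\n", keeping them on separate
# lines, so no spurious match can form across the join boundary.
combined = stdout + ("\n" + stderr if stderr else "")
assert has_summary_line(combined) is False
```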
```python
assert check_test_passed(1, "", "") is False


def test_check_test_passed_parses_pytest_summary_in_stderr() -> None:
```
[warning] tests: test_check_test_passed_parses_pytest_summary_in_stderr overlaps with test_check_test_passed_true_when_no_summary_stderr_only (L283) — both call `check_test_passed(0, "", …)` with only a stderr payload and assert the result is True. Consider merging into a parametrized test to avoid silent divergence.
Investigated — this is intentional. These two tests cover distinct code branches: test_check_test_passed_true_when_no_summary_stderr_only (L283) passes "test result: ok. 42 passed; 0 failed\n" (cargo/nextest-style, no pytest summary pattern) — it verifies the non-pytest fallback path where no summary is found and exit code is trusted. test_check_test_passed_parses_pytest_summary_in_stderr (L304) passes "= 5 passed in 1.2s =\n" — a canonical pytest summary line, verifying that pytest summary detection works when the summary appears in stderr. Merging them would collapse coverage of two separate code paths into one test.
```python
assert "PASS [0.5s] 5 tests" in result["stderr"]


@pytest.mark.anyio
async def test_response_schema_includes_stderr(self, tool_ctx):
```
[info] tests: test_response_schema_includes_stderr and test_does_not_expose_summary (L154) both assert set(result.keys()) == {"passed", "stdout", "stderr"}. The schema assertion is now duplicated across two tests.
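One way to deduplicate the flagged assertion (a sketch only; the helper name `assert_schema` and the module-level constant are hypothetical, not part of the PR):

```python
# Single source of truth for the test_check response contract,
# shared by both tests that currently repeat the key set inline.
EXPECTED_KEYS = {"passed", "stdout", "stderr"}


def assert_schema(result: dict) -> None:
    # Raises AssertionError if the response schema drifts.
    assert set(result.keys()) == EXPECTED_KEYS
```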
```python
)


@pytest.mark.anyio
async def test_cargo_nextest_style_passes(self, tool_ctx):
```
[info] tests: test_cargo_nextest_style_passes checks result["stderr"] with `in` rather than equality. If the implementation appends extra text, the assertion still passes. Use `==` for deterministic output validation.
Summary
The `TestRunner` Protocol (`core/_type_protocols.py`) returned a bare positional `tuple[bool, str]` — the only Protocol in the codebase that used a positional tuple for complex return data rather than a named dataclass. This structural gap made it impossible to carry `stderr` through the testing pipeline without a breaking change, which is why `DefaultTestRunner.run()` silently discarded `result.stderr`. The downstream consequence was twofold: (1) `check_test_passed()` operated on stdout-only data, and (2) its closed-world assumption (CWA) — returning `False` when no pytest summary was found — fired unconditionally for all non-pytest runners (cargo nextest, `go test`, custom scripts) that pass via exit code 0 with no pytest-format output.

The fix introduces a named `TestResult` dataclass (matching the established pattern of `SkillResult`, `CleanupResult`, `CIRunScope` in `_type_results.py`) and promotes it to the Protocol return type. This makes it structurally impossible to silently drop stderr, forces every caller to name the fields it consumes, and eliminates the implicit "bare tuple = stdout-only" assumption. The adjudication logic is inverted to be runner-agnostic: trust the exit code when no recognized parser (pytest) matches.

Closes #513
Architecture Impact
Process Flow Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([test_check MCP call])
    PASS([PASS])
    FAIL([FAIL])

    subgraph Entry ["MCP Entry — ● tools_workspace.py"]
        TESTER{"tester<br/>configured?"}
        NOCFG["return error<br/>━━━━━━━━━━<br/>not configured"]
    end

    subgraph Execution ["Subprocess Execution — ● testing.py"]
        DTR["● DefaultTestRunner.run(cwd)<br/>━━━━━━━━━━<br/>reads config.test_check.command<br/>reads config.test_check.timeout"]
        RUNNER["SubprocessRunner<br/>━━━━━━━━━━<br/>executes command in cwd<br/>captures stdout + stderr"]
        SPR["SubprocessResult<br/>━━━━━━━━━━<br/>{returncode, stdout, stderr}"]
    end

    subgraph Adjudication ["● Adjudication — check_test_passed()"]
        RC{"returncode != 0?"}
        COMBINE["build combined<br/>━━━━━━━━━━<br/>stdout + '\n' + stderr"]
        PPS["● parse_pytest_summary(combined)<br/>━━━━━━━━━━<br/>2-pass: =-delimited + bare -q<br/>searches stdout AND stderr"]
        MATCH{"pytest summary<br/>found?"}
        TRUST["● Trust exit code<br/>━━━━━━━━━━<br/>returncode == 0 → True<br/>runner-agnostic path"]
        COUNTS{"failed > 0<br/>or error > 0?"}
    end

    subgraph Result ["● TestResult Contract — core/_type_results.py"]
        TR["● TestResult<br/>━━━━━━━━━━<br/>{passed: bool,<br/>stdout: str,<br/>stderr: str}"]
    end

    subgraph Response ["● MCP Response — tools_workspace.py"]
        RESP["● test_check response<br/>━━━━━━━━━━<br/>{passed, stdout, stderr}<br/>truncated text"]
    end

    subgraph MergeGate ["perform_merge() Test Gates — ● git.py"]
        GATE{"test_gate_on_merge?"}
        PRETEST["● Pre-rebase gate<br/>━━━━━━━━━━<br/>tester.run(worktree_path)<br/>blocks on failure"]
        REBASE["git rebase<br/>━━━━━━━━━━<br/>onto base branch"]
        POSTTEST["● Post-rebase gate<br/>━━━━━━━━━━<br/>tester.run(worktree_path)<br/>blocks on regression"]
        MERGE["git merge --ff-only<br/>━━━━━━━━━━<br/>fast-forward merge"]
    end

    MERGEPASS([MERGE SUCCESS])
    MERGEFAIL([MERGE BLOCKED])

    START --> TESTER
    TESTER -->|"yes"| DTR
    TESTER -->|"no"| NOCFG
    DTR --> RUNNER
    RUNNER --> SPR
    SPR -->|"stdout + stderr forwarded"| TR
    TR -->|"passed=?"| RC
    RC -->|"yes"| FAIL
    RC -->|"no"| COMBINE
    COMBINE --> PPS
    PPS --> MATCH
    MATCH -->|"no match"| TRUST
    MATCH -->|"pytest found"| COUNTS
    TRUST -->|"rc == 0"| PASS
    TRUST -->|"rc != 0"| FAIL
    COUNTS -->|"clean"| PASS
    COUNTS -->|"failures"| FAIL
    PASS --> TR
    FAIL --> TR
    TR --> RESP
    GATE -->|"yes"| PRETEST
    GATE -->|"no"| MERGE
    PRETEST -->|"passed"| REBASE
    PRETEST -->|"failed"| MERGEFAIL
    REBASE --> POSTTEST
    POSTTEST -->|"passed"| MERGE
    POSTTEST -->|"failed"| MERGEFAIL
    MERGE --> MERGEPASS

    class START terminal;
    class PASS,FAIL terminal;
    class MERGEPASS,MERGEFAIL terminal;
    class NOCFG detector;
    class DTR handler;
    class RUNNER,SPR stateNode;
    class TR newComponent;
    class RC,MATCH,TESTER,GATE,COUNTS stateNode;
    class COMBINE,PPS phase;
    class TRUST newComponent;
    class RESP output;
    class PRETEST,POSTTEST handler;
    class REBASE,MERGE phase;
```

Module Dependency Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph L0 ["L0 — CORE (Zero autoskillit imports)"]
        direction LR
        RESULTS["● core/_type_results.py<br/>━━━━━━━━━━<br/>TestResult dataclass added<br/>SkillResult, CleanupResult, etc."]
        PROTOCOLS["● core/_type_protocols.py<br/>━━━━━━━━━━<br/>TestRunner Protocol<br/>returns TestResult (was tuple)"]
        INIT["● core/__init__.py<br/>━━━━━━━━━━<br/>re-exports TestResult<br/>public surface"]
    end

    subgraph L1 ["L1 — EXECUTION / PIPELINE"]
        direction LR
        TESTING["● execution/testing.py<br/>━━━━━━━━━━<br/>DefaultTestRunner<br/>check_test_passed()"]
        CTX["pipeline/context.py<br/>━━━━━━━━━━<br/>tester: TestRunner | None<br/>DI container"]
    end

    subgraph L3 ["L3 — SERVER"]
        direction LR
        WORKSPACE["● server/tools_workspace.py<br/>━━━━━━━━━━<br/>test_check MCP handler<br/>returns {passed, stdout, stderr}"]
        GIT["● server/git.py<br/>━━━━━━━━━━<br/>perform_merge()<br/>pre + post rebase gates"]
        FACTORY["server/_factory.py<br/>━━━━━━━━━━<br/>wires DefaultTestRunner<br/>into ToolContext.tester"]
    end

    subgraph Tests ["TESTS (cross-layer)"]
        direction LR
        CONFTEST["● tests/conftest.py<br/>━━━━━━━━━━<br/>StatefulMockTester<br/>uses TestResult"]
        TTEST["● tests/execution/test_testing.py<br/>━━━━━━━━━━<br/>check_test_passed tests<br/>non-pytest runner tests"]
        CONTRACTS["● tests/contracts/test_protocol_satisfaction.py<br/>━━━━━━━━━━<br/>TestRunner return type<br/>verified as TestResult"]
        WTEST["● tests/server/test_tools_workspace.py<br/>━━━━━━━━━━<br/>schema: {passed,stdout,stderr}<br/>cargo nextest style test"]
        GTEST["● tests/server/test_git.py<br/>━━━━━━━━━━<br/>StatefulMockTester calls<br/>updated to TestResult"]
        TGTEST["● tests/server/test_tools_git.py<br/>━━━━━━━━━━<br/>ls-files result added<br/>merge gate tests"]
    end

    RESULTS -->|"imported by"| PROTOCOLS
    RESULTS -->|"re-exported via"| INIT
    RESULTS -->|"TestResult"| TESTING
    PROTOCOLS -->|"TestRunner Protocol"| CTX
    CTX -->|"ToolContext.tester"| WORKSPACE
    CTX -->|"ToolContext.tester"| GIT
    TESTING -->|"wired as tester"| FACTORY
    FACTORY -->|"injects"| CTX
    RESULTS -->|"TestResult"| CONFTEST
    PROTOCOLS -->|"TestRunner"| CONTRACTS
    TESTING -->|"check_test_passed"| TTEST
    CONFTEST -->|"StatefulMockTester"| WTEST
    CONFTEST -->|"StatefulMockTester"| GTEST
    GIT -->|"perform_merge"| TGTEST

    class RESULTS newComponent;
    class PROTOCOLS,INIT handler;
    class TESTING handler;
    class CTX stateNode;
    class WORKSPACE,GIT phase;
    class FACTORY phase;
    class CONFTEST,TTEST,CONTRACTS,WTEST,GTEST,TGTEST output;
```

State Lifecycle Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph SubprocessContract ["SUBPROCESS RESULT CONTRACT (immutable source)"]
        direction LR
        RC["SubprocessResult.returncode<br/>━━━━━━━━━━<br/>int — exit code<br/>INIT_ONLY: read once"]
        STDOUT["SubprocessResult.stdout<br/>━━━━━━━━━━<br/>str — process stdout<br/>INIT_ONLY: forwarded as-is"]
        STDERR["SubprocessResult.stderr<br/>━━━━━━━━━━<br/>str — process stderr<br/>INIT_ONLY: was DROPPED; now forwarded"]
    end

    subgraph AdjudicationGate ["● ADJUDICATION GATE — check_test_passed()"]
        direction TB
        RCGATE{"returncode != 0?<br/>━━━━━━━━━━<br/>FAIL-FAST gate"}
        COMBINE["combined = stdout + stderr<br/>━━━━━━━━━━<br/>DERIVED: ephemeral union<br/>used for parsing only"]
        PARSEGATE{"parse_pytest_summary(combined)<br/>━━━━━━━━━━<br/>counts found?"}
        TRUSTGATE["● trust exit code path<br/>━━━━━━━━━━<br/>no counts → rc==0 means PASS<br/>replaces old CWA: return False"]
        COUNTGATE{"failed > 0 or error > 0?<br/>━━━━━━━━━━<br/>pytest count validation"}
    end

    subgraph TestResultContract ["● TESTRESULT CONTRACT — core/_type_results.py"]
        direction LR
        TR_PASSED["● TestResult.passed<br/>━━━━━━━━━━<br/>bool: INIT_ONLY<br/>set by adjudication gate"]
        TR_STDOUT["● TestResult.stdout<br/>━━━━━━━━━━<br/>str: INIT_ONLY<br/>raw subprocess stdout"]
        TR_STDERR["● TestResult.stderr<br/>━━━━━━━━━━<br/>str: INIT_ONLY (was ABSENT)<br/>raw subprocess stderr — now carried"]
    end

    subgraph ProtocolGate ["● PROTOCOL CONFORMANCE GATE — core/_type_protocols.py"]
        PROTO["● TestRunner Protocol<br/>━━━━━━━━━━<br/>async run(cwd) → TestResult<br/>(was tuple[bool, str])"]
        MYPY["mypy type checker<br/>━━━━━━━━━━<br/>enforces return type<br/>at static analysis time"]
    end

    subgraph ConsumerContracts ["● CONSUMER FIELD CONTRACTS"]
        direction LR
        TC_RESP["● test_check response<br/>━━━━━━━━━━<br/>{passed, stdout, stderr}<br/>stderr: was ABSENT in response"]
        MERGE_PRE["● perform_merge pre-gate<br/>━━━━━━━━━━<br/>{test_stdout, test_stderr}<br/>stderr: was ABSENT in error dict"]
        MERGE_POST["● perform_merge post-gate<br/>━━━━━━━━━━<br/>test_result.passed<br/>(was: passed, _ = ...)"]
    end

    PASS_STATE(["passed = True"])
    FAIL_STATE(["passed = False"])

    RC -->|"read once"| RCGATE
    STDOUT -->|"forwarded"| COMBINE
    STDERR -->|"NOW forwarded"| COMBINE
    RCGATE -->|"yes: fail-fast"| FAIL_STATE
    RCGATE -->|"no"| COMBINE
    COMBINE --> PARSEGATE
    PARSEGATE -->|"no match"| TRUSTGATE
    PARSEGATE -->|"pytest found"| COUNTGATE
    TRUSTGATE -->|"rc == 0"| PASS_STATE
    TRUSTGATE -->|"rc != 0"| FAIL_STATE
    COUNTGATE -->|"clean"| PASS_STATE
    COUNTGATE -->|"failures"| FAIL_STATE
    PASS_STATE -->|"passed=True"| TR_PASSED
    FAIL_STATE -->|"passed=False"| TR_PASSED
    STDOUT -->|"stdout="| TR_STDOUT
    STDERR -->|"stderr="| TR_STDERR
    TR_PASSED --> PROTO
    TR_STDOUT --> PROTO
    TR_STDERR --> PROTO
    PROTO --> MYPY
    TR_PASSED -->|"test_result.passed"| TC_RESP
    TR_STDOUT -->|"truncated"| TC_RESP
    TR_STDERR -->|"truncated (NEW)"| TC_RESP
    TR_PASSED -->|"pre-gate check"| MERGE_PRE
    TR_STDOUT -->|"test_stdout"| MERGE_PRE
    TR_STDERR -->|"test_stderr (NEW)"| MERGE_PRE
    TR_PASSED -->|"post-gate check"| MERGE_POST

    class RC,STDOUT stateNode;
    class STDERR gap;
    class RCGATE,PARSEGATE,COUNTGATE stateNode;
    class COMBINE phase;
    class TRUSTGATE newComponent;
    class TR_PASSED,TR_STDOUT newComponent;
    class TR_STDERR newComponent;
    class PROTO,MYPY detector;
    class TC_RESP,MERGE_PRE,MERGE_POST output;
    class PASS_STATE,FAIL_STATE cli;
```

Implementation Plan
Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260326-162603-381072/.autoskillit/temp/rectify/rectify_test_runner_protocol_lossy_tuple_2026-03-26_000000.md`

🤖 Generated with Claude Code via AutoSkillit