Fix Misleading Test Names and Stale Logic (groupF) by Trecek · Pull Request #560 · TalonT-Org/AutoSkillit

Trecek · 2026-03-28T16:52:06Z

Summary

Eight targeted fixes across seven test files: four test/class renames that resolve name–assertion mismatches (2.19, 2.20, 2.21, 2.23), a rewrite of SmokeExecutor._run_with_retry and its routing logic to use the current flat retries: / on_exhausted: schema (2.24/2.30) along with updates to the two tests that exercised the old schema, deletion of a single stale set-equality assertion (2.25), and strengthening of two contract-test assertions that use overly permissive string matching (2.27, 2.28). No production code changes; only test code.

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([SmokeExecutor.run])

    subgraph Loop ["Step Execution Loop (max_steps=30)"]
        direction TB
        ACTION{"action?<br/>━━━━━━━━━━<br/>stop / route / python / default"}
        STOP["stop<br/>━━━━━━━━━━<br/>return terminal_step + message"]
        ROUTE_ACTION["route (action)<br/>━━━━━━━━━━<br/>jump to on_success"]
        PYTHON["python:<br/>━━━━━━━━━━<br/>_execute_python → run_python MCP"]
    end

    subgraph Dispatch ["● _execute (test_smoke_pipeline.py)"]
        direction TB
        HAS_RETRIES{"retries in step_def?<br/>━━━━━━━━━━<br/>flat schema check"}
        DIRECT_CALL["direct call<br/>━━━━━━━━━━<br/>_TOOL_MAP[tool](**args)"]
        DELEGATE["delegate<br/>━━━━━━━━━━<br/>→ _run_with_retry"]
    end

    subgraph Retry ["● _run_with_retry (test_smoke_pipeline.py)"]
        direction TB
        FIRST_CALL["initial attempt<br/>━━━━━━━━━━<br/>tool_fn(**args)"]
        SUCCESS_CHECK{"● _is_success?<br/>━━━━━━━━━━<br/>tool-specific check"}
        RETRY_LOOP["retry loop<br/>━━━━━━━━━━<br/>for _ in range(max_retries)"]
        RETRY_SUCCESS{"_is_success?<br/>━━━━━━━━━━<br/>after re-attempt"}
        EXHAUSTED["● _retries_exhausted<br/>━━━━━━━━━━<br/>result[_retries_exhausted]=True"]
    end

    subgraph Routing ["● _route (test_smoke_pipeline.py)"]
        direction TB
        ON_RESULT{"on_result block?<br/>━━━━━━━━━━<br/>field-value dispatch"}
        EXHAUSTION{"● on_exhausted<br/>━━━━━━━━━━<br/>_retries_exhausted flagged?"}
        SF_CHECK{"_is_success?<br/>━━━━━━━━━━<br/>success / failure"}
    end

    CAPTURE["_capture<br/>━━━━━━━━━━<br/>extract key=value into context"]
    NEXT_STEP["advance to next_step<br/>━━━━━━━━━━<br/>loop iteration"]

    DONE([DONE / terminal_step])

    START --> ACTION
    ACTION -->|"stop"| STOP
    ACTION -->|"route"| ROUTE_ACTION
    ACTION -->|"python:"| PYTHON
    ACTION -->|"default"| HAS_RETRIES
    STOP --> DONE
    ROUTE_ACTION --> NEXT_STEP
    PYTHON --> CAPTURE

    HAS_RETRIES -->|"no"| DIRECT_CALL
    HAS_RETRIES -->|"yes"| DELEGATE
    DIRECT_CALL --> CAPTURE
    DELEGATE --> FIRST_CALL

    FIRST_CALL --> SUCCESS_CHECK
    SUCCESS_CHECK -->|"pass"| CAPTURE
    SUCCESS_CHECK -->|"fail"| RETRY_LOOP
    RETRY_LOOP --> RETRY_SUCCESS
    RETRY_SUCCESS -->|"pass"| CAPTURE
    RETRY_SUCCESS -->|"fail → loop"| RETRY_LOOP
    RETRY_LOOP -->|"exhausted"| EXHAUSTED
    EXHAUSTED --> CAPTURE

    CAPTURE --> ON_RESULT
    ON_RESULT -->|"match → target step"| NEXT_STEP
    ON_RESULT -->|"no match"| EXHAUSTION
    EXHAUSTION -->|"yes → on_exhausted"| NEXT_STEP
    EXHAUSTION -->|"no"| SF_CHECK
    SF_CHECK -->|"pass → on_success"| NEXT_STEP
    SF_CHECK -->|"fail → on_failure"| NEXT_STEP
    NEXT_STEP --> ACTION

    class START,DONE terminal;
    class ACTION,HAS_RETRIES,SUCCESS_CHECK,RETRY_SUCCESS,ON_RESULT,EXHAUSTION,SF_CHECK stateNode;
    class STOP,ROUTE_ACTION,PYTHON,DIRECT_CALL,DELEGATE,FIRST_CALL,RETRY_LOOP,NEXT_STEP handler;
    class EXHAUSTED,CAPTURE phase;

Color Legend:

Color	Category	Description
Dark Blue	Terminal	Start and completion states
Teal	Decision	Routing and branching decisions
Orange	Handler	Tool invocation and step execution
Purple	Phase	State mutation (capture, exhaustion sentinel)

Scenarios Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph S1 ["SCENARIO 1: Test Name Accuracy (REQ 2.19–2.23)"]
        direction LR
        S1A["● test_core.py<br/>━━━━━━━━━━<br/>test_atomic_write_<br/>docstring_contains_<br/>atomic_keyword"]
        S1B["● test_config.py<br/>━━━━━━━━━━<br/>class TestRunSkill<br/>ConfigFields"]
        S1C["● test_skills.py<br/>━━━━━━━━━━<br/>test_58_skills_in_<br/>skills_extended"]
        S1D["● test_headless_add_dirs.py<br/>━━━━━━━━━━<br/>test_raw_skills_extended_<br/>excluded_from_run_skill_<br/>add_dirs"]
        S1E["pytest assertion<br/>━━━━━━━━━━<br/>name == assertion content"]
    end

    subgraph S2 ["SCENARIO 2: SmokeExecutor Retry — Flat Schema (REQ 2.24, 2.25, 2.30)"]
        direction LR
        S2A["step_def<br/>━━━━━━━━━━<br/>retries: 3<br/>on_exhausted: escalate"]
        S2B["● _execute<br/>━━━━━━━━━━<br/>checks 'retries'<br/>in step_def"]
        S2C["● _run_with_retry<br/>━━━━━━━━━━<br/>initial attempt<br/>+ retry loop"]
        S2D["● _is_success<br/>━━━━━━━━━━<br/>tool-specific<br/>pass check"]
        S2E["● _retries_exhausted<br/>━━━━━━━━━━<br/>sentinel flag<br/>in result"]
        S2F["● _route<br/>━━━━━━━━━━<br/>on_exhausted path<br/>or on_success/failure"]
    end

    subgraph S3 ["SCENARIO 3: Contract Assertion Strength (REQ 2.27, 2.28)"]
        direction LR
        S3A["● test_pr_traceability<br/>_contracts.py<br/>━━━━━━━━━━<br/>reads SKILL.md text"]
        S3B["● heading check<br/>━━━━━━━━━━<br/>'## requirements'<br/>in normalized"]
        S3C["● test_triage<br/>_contracts.py<br/>━━━━━━━━━━<br/>reads SKILL.md text"]
        S3D["● scoped bypass<br/>━━━━━━━━━━<br/>'already has' AND<br/>'## requirements' in lower"]
        S3E["assert has_req<br/>━━━━━━━━━━<br/>heading-structure<br/>validated"]
    end

    %% SCENARIO 1 FLOW %%
    S1A --> S1E
    S1B --> S1E
    S1C --> S1E
    S1D --> S1E

    %% SCENARIO 2 FLOW %%
    S2A --> S2B
    S2B -->|"yes"| S2C
    S2C --> S2D
    S2D -->|"fail"| S2C
    S2D -->|"exhausted"| S2E
    S2D -->|"pass"| S2F
    S2E --> S2F

    %% SCENARIO 3 FLOW %%
    S3A --> S3B
    S3B --> S3E
    S3C --> S3D
    S3D --> S3E

    %% CLASS ASSIGNMENTS %%
    class S1A,S1B,S1C,S1D handler;
    class S1E output;
    class S2A cli;
    class S2B,S2C phase;
    class S2D stateNode;
    class S2E detector;
    class S2F output;
    class S3A,S3C handler;
    class S3B,S3D phase;
    class S3E output;

Color Legend:

Color	Category	Description
Dark Blue	Entry	Step definition inputs
Orange	Handler	Modified test files
Purple	Process	Modified execution/assertion logic
Teal	Decision	Success/state checks
Red	Sentinel	Exhaustion detection
Dark Teal	Output	Assertion outcomes

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-groupF-20260328-084624-134986/.autoskillit/temp/make-plan/fix_misleading_tests_groupF_plan_2026-03-28_084900.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step	input	output	cached	count	time
group	2.7k	25.0k	721.4k	1	9m 3s
plan	462	84.7k	14.8M	9	41m 2s
verify	98	55.7k	3.3M	6	18m 12s
implement	536	45.2k	10.3M	9	25m 25s
fix	60	10.0k	1.6M	2	11m 53s
Total	3.8k	220.7k	30.7M		1h 45m

Trecek

AutoSkillit PR Review — Verdict: changes_requested (cannot REQUEST_CHANGES on own PR, posting as COMMENT)

Trecek · 2026-03-28T17:03:27Z

        result = json.loads(raw_result)
-        if not result.get(retry_field, False):
-            return result  # succeeded on first try
+        if self._is_success(step_def, result):


[warning] tests: _is_success called in _run_with_retry but its implementation is not shown in the diff. If _is_success has a trivial/always-true impl, retry logic will not be exercised meaningfully. Verify _is_success actually checks failure conditions.

Investigated — this is intentional. _is_success is fully defined at lines 178-185: returns result.get('passed', False) for test_check, 'error' not in result for merge_worktree/classify_fix, and result.get('success', True) as fallback. The test_executor_retry_logic test (line 324) mocks run_skill to return success:False on the first call and success:True on the second, confirming retry logic is exercised. The implementation was present but not in the diff hunk shown to the reviewer (false_positive_intentional_pattern).

Trecek · 2026-03-28T17:03:27Z

    text = _read("pipeline-summary")
-    has_req = "## Requirements" in text or "requirements" in text.lower()
+    normalized = text.lower()
+    has_req = "## requirements" in normalized or "# requirements" in normalized


[warning] tests: Requirement check tightened to require a markdown heading (## requirements or # requirements). This could produce false negatives if the skill documents requirements without a heading. Confirm the actual skill content matches this stricter expectation.

Investigated — this is intentional. pipeline-summary/SKILL.md uses ## Requirements as an explicit heading in three distinct locations: line 28 (argument description referencing extraction), line 135 (Step 5b: 'Extract the ## Requirements section'), and line 167 (PR body template includes '## Requirements'). The normalized heading check ('## requirements' in normalized) correctly matches all occurrences. Commit f1b5644 intentionally replaced the bare 'requirements' in text.lower() fallback with this stricter heading check, and the skill content was already conformant. No false negative risk exists — skill documents requirements exclusively via the ## heading format (false_positive_intentional_pattern).

Trecek · 2026-03-28T17:03:27Z

@@ -376,12 +348,13 @@ async def mock_run_skill(**kwargs: object) -> str:
        assert result["success"] is True

    async def test_executor_max_attempts_zero_routes_to_on_exhausted(self) -> None:


[warning] tests: test_executor_max_attempts_zero_routes_to_on_exhausted: the diff shows test setup but not the assertions. If the test does not assert the route returned is 'retry_wt' (on_exhausted), the test title is misleading and the behavior is untested.

Investigated — this is intentional. Lines 388-391 contain explicit assertions: assert len(call_log) == 1 (verifying exactly one call with retries=0) and assert terminal_step == 'retry_wt' (verifying the on_exhausted route 'retry_wt' is returned). The reviewer claimed assertions were missing, which is factually incorrect — they are present in the actual file at lines 388-391 (stale_comment — assertions were present but outside the diff hunk shown to the reviewer).

Trecek · 2026-03-28T17:03:27Z

                return result

-        return result  # exhausted — always defined
+        result["_retries_exhausted"] = True


[warning] cohesion: _retries_exhausted sentinel uses a private-style underscore prefix but is consumed in _route() outside _run_with_retry. Rename to 'retries_exhausted' to match the flat public schema convention used by 'retries' and 'on_exhausted'.

Valid observation — flagged for design decision. _retries_exhausted (line 126) is set in _run_with_retry and consumed only in _route (line 171). The underscore prefix distinguishes it as an internal inter-method sentinel injected into the result dict, not a user-facing recipe schema field. The reviewer's point about naming consistency with flat schema fields ('retries', 'on_exhausted') is valid but debatable: those are recipe YAML keys, while _retries_exhausted is a runtime dict annotation. Whether to expose it without underscore or keep it private-style is a deliberate design choice that warrants human review before changing.

Trecek

AutoSkillit review found 3 blocking issues. See inline comments. Verdict: changes_requested

- Rename test_core_io_module_has_docstring → test_atomic_write_docstring_contains_atomic_keyword - Rename class TestRunSkillRetryConfigFields → TestRunSkillConfigFields - Rename test_57_skills_in_skills_extended → test_58_skills_in_skills_extended; fix docstring count - Rename test_run_skill_passes_ephemeral_session_dir_as_add_dir → test_raw_skills_extended_excluded_from_run_skill_add_dirs

…quality assertion (REQ 2.24, 2.25, 2.30) - _execute: check "retries" in step_def instead of "retry" - _run_with_retry: rewrite using flat retries:/on_exhausted with _retries_exhausted sentinel - _route: use step_def.get("on_exhausted") + result["_retries_exhausted"] instead of nested retry block - test_executor_retry_logic: update step_def to flat retries: 3 / on_exhausted format - test_executor_max_attempts_zero_routes_to_on_exhausted: update to retries: 0; fix docstring - Remove expected_steps set definition and set-equality assertion; retain all tool-type assertions

- test_pr_traceability_contracts: replace bare "requirements" in text.lower() fallback with heading-structure check ("## requirements" or "# requirements" in normalized) - test_triage_contracts: scope "already has" bypass to require "## requirements" context

Trecek commented Mar 28, 2026

View reviewed changes

Trecek enabled auto-merge March 28, 2026 17:24

Trecek added 3 commits March 28, 2026 10:34

Trecek force-pushed the test-audit/552/groupF branch from f1b5644 to 0976de5 Compare March 28, 2026 17:36

Trecek added this pull request to the merge queue Mar 28, 2026

Merged via the queue into integration with commit 7a2fd4f Mar 28, 2026
2 checks passed

Trecek deleted the test-audit/552/groupF branch March 28, 2026 17:41

This was referenced Apr 2, 2026

Promote integration to main (66 PRs, 56 issues, 39 fixes, 30 features) #575

Closed

Promote integration to main (68 PRs, 57 issues, 40 fixes, 31 features) #580

Merged

Trecek mentioned this pull request Apr 14, 2026

Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs) #925

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Misleading Test Names and Stale Logic (groupF)#560

Fix Misleading Test Names and Stale Logic (groupF)#560
Trecek merged 3 commits intointegrationfrom
test-audit/552/groupF

Trecek commented Mar 28, 2026 •

edited

Loading

Uh oh!

Trecek left a comment

Uh oh!

Trecek Mar 28, 2026

Uh oh!

Trecek Mar 28, 2026

Uh oh!

Trecek Mar 28, 2026

Uh oh!

Trecek Mar 28, 2026

Uh oh!

Trecek Mar 28, 2026

Uh oh!

Trecek Mar 28, 2026

Uh oh!

Trecek Mar 28, 2026

Uh oh!

Trecek Mar 28, 2026

Uh oh!

Trecek left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		@@ -376,12 +348,13 @@ async def mock_run_skill(**kwargs: object) -> str:
		assert result["success"] is True

		async def test_executor_max_attempts_zero_routes_to_on_exhausted(self) -> None:

Conversation

Trecek commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture Impact

Process Flow Diagram

Scenarios Diagram

Implementation Plan

Token Usage Summary

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Trecek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Trecek commented Mar 28, 2026 •

edited

Loading