Fix Misleading Test Names and Stale Logic (groupF)#560
Conversation
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit PR Review — Verdict: changes_requested (cannot REQUEST_CHANGES on own PR, posting as COMMENT)
| result = json.loads(raw_result) | ||
| if not result.get(retry_field, False): | ||
| return result # succeeded on first try | ||
| if self._is_success(step_def, result): |
There was a problem hiding this comment.
[warning] tests: _is_success called in _run_with_retry but its implementation is not shown in the diff. If _is_success has a trivial/always-true impl, retry logic will not be exercised meaningfully. Verify _is_success actually checks failure conditions.
There was a problem hiding this comment.
Investigated — this is intentional. _is_success is fully defined at lines 178-185: returns result.get('passed', False) for test_check, 'error' not in result for merge_worktree/classify_fix, and result.get('success', True) as fallback. The test_executor_retry_logic test (line 324) mocks run_skill to return success:False on the first call and success:True on the second, confirming retry logic is exercised. The implementation was present but not in the diff hunk shown to the reviewer (false_positive_intentional_pattern).
| text = _read("pipeline-summary") | ||
| has_req = "## Requirements" in text or "requirements" in text.lower() | ||
| normalized = text.lower() | ||
| has_req = "## requirements" in normalized or "# requirements" in normalized |
There was a problem hiding this comment.
[warning] tests: Requirement check tightened to require a markdown heading (## requirements or # requirements). This could produce false negatives if the skill documents requirements without a heading. Confirm the actual skill content matches this stricter expectation.
There was a problem hiding this comment.
Investigated — this is intentional. pipeline-summary/SKILL.md uses ## Requirements as an explicit heading in three distinct locations: line 28 (argument description referencing extraction), line 135 (Step 5b: 'Extract the ## Requirements section'), and line 167 (PR body template includes '## Requirements'). The normalized heading check ('## requirements' in normalized) correctly matches all occurrences. Commit f1b5644 intentionally replaced the bare 'requirements' in text.lower() fallback with this stricter heading check, and the skill content was already conformant. No false negative risk exists — skill documents requirements exclusively via the ## heading format (false_positive_intentional_pattern).
| @@ -376,12 +348,13 @@ async def mock_run_skill(**kwargs: object) -> str: | |||
| assert result["success"] is True | |||
|
|
|||
| async def test_executor_max_attempts_zero_routes_to_on_exhausted(self) -> None: | |||
There was a problem hiding this comment.
[warning] tests: test_executor_max_attempts_zero_routes_to_on_exhausted: the diff shows test setup but not the assertions. If the test does not assert the route returned is 'retry_wt' (on_exhausted), the test title is misleading and the behavior is untested.
There was a problem hiding this comment.
Investigated — this is intentional. Lines 388-391 contain explicit assertions: assert len(call_log) == 1 (verifying exactly one call with retries=0) and assert terminal_step == 'retry_wt' (verifying the on_exhausted route 'retry_wt' is returned). The reviewer claimed assertions were missing, which is factually incorrect — they are present in the actual file at lines 388-391 (stale_comment — assertions were present but outside the diff hunk shown to the reviewer).
| return result | ||
|
|
||
| return result # exhausted — always defined | ||
| result["_retries_exhausted"] = True |
There was a problem hiding this comment.
[warning] cohesion: _retries_exhausted sentinel uses a private-style underscore prefix but is consumed in _route() outside _run_with_retry. Rename to 'retries_exhausted' to match the flat public schema convention used by 'retries' and 'on_exhausted'.
There was a problem hiding this comment.
Valid observation — flagged for design decision. _retries_exhausted (line 126) is set in _run_with_retry and consumed only in _route (line 171). The underscore prefix distinguishes it as an internal inter-method sentinel injected into the result dict, not a user-facing recipe schema field. The reviewer's point about naming consistency with flat schema fields ('retries', 'on_exhausted') is valid but debatable: those are recipe YAML keys, while _retries_exhausted is a runtime dict annotation. Whether to expose it without underscore or keep it private-style is a deliberate design choice that warrants human review before changing.
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit review found 3 blocking issues. See inline comments. Verdict: changes_requested
- Rename test_core_io_module_has_docstring → test_atomic_write_docstring_contains_atomic_keyword - Rename class TestRunSkillRetryConfigFields → TestRunSkillConfigFields - Rename test_57_skills_in_skills_extended → test_58_skills_in_skills_extended; fix docstring count - Rename test_run_skill_passes_ephemeral_session_dir_as_add_dir → test_raw_skills_extended_excluded_from_run_skill_add_dirs
…quality assertion (REQ 2.24, 2.25, 2.30)
- _execute: check "retries" in step_def instead of "retry"
- _run_with_retry: rewrite using flat retries:/on_exhausted with _retries_exhausted sentinel
- _route: use step_def.get("on_exhausted") + result["_retries_exhausted"] instead of nested retry block
- test_executor_retry_logic: update step_def to flat retries: 3 / on_exhausted format
- test_executor_max_attempts_zero_routes_to_on_exhausted: update to retries: 0; fix docstring
- Remove expected_steps set definition and set-equality assertion; retain all tool-type assertions
- test_pr_traceability_contracts: replace bare "requirements" in text.lower() fallback
with heading-structure check ("## requirements" or "# requirements" in normalized)
- test_triage_contracts: scope "already has" bypass to require "## requirements" context
f1b5644 to
0976de5
Compare
Summary
Eight targeted fixes across seven test files: four test/class renames that resolve name–assertion mismatches (2.19, 2.20, 2.21, 2.23), a rewrite of
SmokeExecutor._run_with_retryand its routing logic to use the current flatretries:/on_exhausted:schema (2.24/2.30) along with updates to the two tests that exercised the old schema, deletion of a single stale set-equality assertion (2.25), and strengthening of two contract-test assertions that use overly permissive string matching (2.27, 2.28). No production code changes; only test code.Architecture Impact
Process Flow Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB %% CLASS DEFINITIONS %% classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; START([SmokeExecutor.run]) subgraph Loop ["Step Execution Loop (max_steps=30)"] direction TB ACTION{"action?<br/>━━━━━━━━━━<br/>stop / route / python / default"} STOP["stop<br/>━━━━━━━━━━<br/>return terminal_step + message"] ROUTE_ACTION["route (action)<br/>━━━━━━━━━━<br/>jump to on_success"] PYTHON["python:<br/>━━━━━━━━━━<br/>_execute_python → run_python MCP"] end subgraph Dispatch ["● _execute (test_smoke_pipeline.py)"] direction TB HAS_RETRIES{"retries in step_def?<br/>━━━━━━━━━━<br/>flat schema check"} DIRECT_CALL["direct call<br/>━━━━━━━━━━<br/>_TOOL_MAP[tool](**args)"] DELEGATE["delegate<br/>━━━━━━━━━━<br/>→ _run_with_retry"] end subgraph Retry ["● _run_with_retry (test_smoke_pipeline.py)"] direction TB FIRST_CALL["initial attempt<br/>━━━━━━━━━━<br/>tool_fn(**args)"] SUCCESS_CHECK{"● _is_success?<br/>━━━━━━━━━━<br/>tool-specific check"} RETRY_LOOP["retry loop<br/>━━━━━━━━━━<br/>for _ in range(max_retries)"] RETRY_SUCCESS{"_is_success?<br/>━━━━━━━━━━<br/>after re-attempt"} EXHAUSTED["● _retries_exhausted<br/>━━━━━━━━━━<br/>result[_retries_exhausted]=True"] end subgraph Routing ["● _route (test_smoke_pipeline.py)"] direction TB ON_RESULT{"on_result block?<br/>━━━━━━━━━━<br/>field-value dispatch"} EXHAUSTION{"● on_exhausted<br/>━━━━━━━━━━<br/>_retries_exhausted flagged?"} SF_CHECK{"_is_success?<br/>━━━━━━━━━━<br/>success / failure"} end CAPTURE["_capture<br/>━━━━━━━━━━<br/>extract key=value into context"] NEXT_STEP["advance to next_step<br/>━━━━━━━━━━<br/>loop iteration"] DONE([DONE / terminal_step]) START --> ACTION ACTION -->|"stop"| STOP ACTION -->|"route"| ROUTE_ACTION ACTION -->|"python:"| PYTHON ACTION -->|"default"| HAS_RETRIES STOP --> DONE ROUTE_ACTION --> NEXT_STEP PYTHON --> CAPTURE HAS_RETRIES -->|"no"| DIRECT_CALL HAS_RETRIES -->|"yes"| DELEGATE DIRECT_CALL --> CAPTURE DELEGATE --> FIRST_CALL FIRST_CALL --> SUCCESS_CHECK SUCCESS_CHECK -->|"pass"| CAPTURE SUCCESS_CHECK -->|"fail"| RETRY_LOOP RETRY_LOOP --> RETRY_SUCCESS RETRY_SUCCESS -->|"pass"| CAPTURE RETRY_SUCCESS -->|"fail → loop"| RETRY_LOOP RETRY_LOOP -->|"exhausted"| EXHAUSTED EXHAUSTED --> CAPTURE CAPTURE --> ON_RESULT ON_RESULT -->|"match → target step"| NEXT_STEP ON_RESULT -->|"no match"| EXHAUSTION EXHAUSTION -->|"yes → on_exhausted"| NEXT_STEP EXHAUSTION -->|"no"| SF_CHECK SF_CHECK -->|"pass → on_success"| NEXT_STEP SF_CHECK -->|"fail → on_failure"| NEXT_STEP NEXT_STEP --> ACTION class START,DONE terminal; class ACTION,HAS_RETRIES,SUCCESS_CHECK,RETRY_SUCCESS,ON_RESULT,EXHAUSTION,SF_CHECK stateNode; class STOP,ROUTE_ACTION,PYTHON,DIRECT_CALL,DELEGATE,FIRST_CALL,RETRY_LOOP,NEXT_STEP handler; class EXHAUSTED,CAPTURE phase;Color Legend:
Scenarios Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart LR %% CLASS DEFINITIONS %% classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; subgraph S1 ["SCENARIO 1: Test Name Accuracy (REQ 2.19–2.23)"] direction LR S1A["● test_core.py<br/>━━━━━━━━━━<br/>test_atomic_write_<br/>docstring_contains_<br/>atomic_keyword"] S1B["● test_config.py<br/>━━━━━━━━━━<br/>class TestRunSkill<br/>ConfigFields"] S1C["● test_skills.py<br/>━━━━━━━━━━<br/>test_58_skills_in_<br/>skills_extended"] S1D["● test_headless_add_dirs.py<br/>━━━━━━━━━━<br/>test_raw_skills_extended_<br/>excluded_from_run_skill_<br/>add_dirs"] S1E["pytest assertion<br/>━━━━━━━━━━<br/>name == assertion content"] end subgraph S2 ["SCENARIO 2: SmokeExecutor Retry — Flat Schema (REQ 2.24, 2.25, 2.30)"] direction LR S2A["step_def<br/>━━━━━━━━━━<br/>retries: 3<br/>on_exhausted: escalate"] S2B["● _execute<br/>━━━━━━━━━━<br/>checks 'retries'<br/>in step_def"] S2C["● _run_with_retry<br/>━━━━━━━━━━<br/>initial attempt<br/>+ retry loop"] S2D["● _is_success<br/>━━━━━━━━━━<br/>tool-specific<br/>pass check"] S2E["● _retries_exhausted<br/>━━━━━━━━━━<br/>sentinel flag<br/>in result"] S2F["● _route<br/>━━━━━━━━━━<br/>on_exhausted path<br/>or on_success/failure"] end subgraph S3 ["SCENARIO 3: Contract Assertion Strength (REQ 2.27, 2.28)"] direction LR S3A["● test_pr_traceability<br/>_contracts.py<br/>━━━━━━━━━━<br/>reads SKILL.md text"] S3B["● heading check<br/>━━━━━━━━━━<br/>'## requirements'<br/>in normalized"] S3C["● test_triage<br/>_contracts.py<br/>━━━━━━━━━━<br/>reads SKILL.md text"] S3D["● scoped bypass<br/>━━━━━━━━━━<br/>'already has' AND<br/>'## requirements' in lower"] S3E["assert has_req<br/>━━━━━━━━━━<br/>heading-structure<br/>validated"] end %% SCENARIO 1 FLOW %% S1A --> S1E S1B --> S1E S1C --> S1E S1D --> S1E %% SCENARIO 2 FLOW %% S2A --> S2B S2B -->|"yes"| S2C S2C --> S2D S2D -->|"fail"| S2C S2D -->|"exhausted"| S2E S2D -->|"pass"| S2F S2E --> S2F %% SCENARIO 3 FLOW %% S3A --> S3B S3B --> S3E S3C --> S3D S3D --> S3E %% CLASS ASSIGNMENTS %% class S1A,S1B,S1C,S1D handler; class S1E output; class S2A cli; class S2B,S2C phase; class S2D stateNode; class S2E detector; class S2F output; class S3A,S3C handler; class S3B,S3D phase; class S3E output;Color Legend:
Implementation Plan
Plan file:
/home/talon/projects/autoskillit-runs/impl-groupF-20260328-084624-134986/.autoskillit/temp/make-plan/fix_misleading_tests_groupF_plan_2026-03-28_084900.md🤖 Generated with Claude Code via AutoSkillit
Token Usage Summary