
Implementation Plan: Ensure Research Experiments Include Test Infrastructure #797

Merged
Trecek merged 8 commits into integration from ensure-research-experiments-include-test-infrastructure/786 on Apr 13, 2026



Trecek commented Apr 13, 2026

Summary

The research recipe's test_check gate runs after generate_report and expects pytest-discoverable tests to exist in the experiment worktree. Neither plan-experiment nor implement-experiment currently directs the agent to create test files, so test_check reliably fails on research projects and routes through fix_tests → retest → escalate_stop.

The fix is entirely upstream of test_check. Two SKILL.md files need additions:

  1. plan-experiment/SKILL.md — add a tests/ folder to the directory layout template, add pytest to the environment specification, and add a "Phase 5: Test Infrastructure" to the Implementation Phases template.
  2. implement-experiment/SKILL.md — add test-file creation to the ALWAYS list and to Step 4, and allow pytest --collect-only as a verification step (but not the full suite).

No recipe YAML changes are needed — the test_check gate is already correctly placed.
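
The pairing the plan asks for (one `tests/test_{script}.py` per experiment script) can be illustrated with a small check. This is a minimal sketch under assumptions, not part of the PR: the `scripts/` subdirectory name and the `missing_test_files` helper are hypothetical, chosen only to make the example self-contained.

```python
import tempfile
from pathlib import Path


def missing_test_files(experiment_dir: Path, scripts_subdir: str = "scripts") -> list[str]:
    """Return expected test files (relative paths) that do not exist yet.

    Assumption (hypothetical layout): scripts live in
    <experiment_dir>/<scripts_subdir>/ and each script foo.py is paired
    with <experiment_dir>/tests/test_foo.py.
    """
    missing = []
    for script in sorted((experiment_dir / scripts_subdir).glob("*.py")):
        if script.name == "__init__.py":
            continue  # package markers do not need a test file
        expected = experiment_dir / "tests" / f"test_{script.stem}.py"
        if not expected.is_file():
            missing.append(expected.relative_to(experiment_dir).as_posix())
    return missing


def demo() -> list[str]:
    """Build a toy experiment folder with one untested script and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "scripts").mkdir()
        (root / "tests").mkdir()
        (root / "scripts" / "analysis.py").write_text("def run():\n    return 1\n")
        (root / "scripts" / "plots.py").write_text("def draw():\n    return None\n")
        (root / "tests" / "test_analysis.py").write_text(
            "def test_placeholder():\n    assert True\n"
        )
        return missing_test_files(root)
```

On the toy layout, `demo()` reports `tests/test_plots.py` as the only missing file, which is the situation that previously left test_check with nothing to discover.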

Architecture Impact

Scenarios Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ─────────────────────────────────────────── %%
    %% SCENARIO 1: Plan Experiment                 %%
    %% ─────────────────────────────────────────── %%
    subgraph S1 ["SCENARIO 1: Plan Experiment → Test Infrastructure Plan"]
        direction LR
        S1_IN["Scope Report<br/>━━━━━━━━━━<br/>Input to skill"]
        S1_SKILL["● plan-experiment<br/>━━━━━━━━━━<br/>SKILL.md"]
        S1_ASSESS["Feasibility<br/>Subagents<br/>━━━━━━━━━━<br/>measure · data · env"]
        S1_LAYOUT["Dir Layout<br/>━━━━━━━━━━<br/>tests/<br/>conftest.py<br/>test_{script}.py"]
        S1_PHASE["Test Infrastructure<br/>Phase<br/>━━━━━━━━━━<br/>create · write · collect"]
        S1_ENV["environment.yml<br/>━━━━━━━━━━<br/>pytest required"]
        S1_OUT["experiment_plan<br/>━━━━━━━━━━<br/>temp/ token emitted"]
    end

    S1_IN -->|reads| S1_SKILL
    S1_SKILL -->|launches| S1_ASSESS
    S1_ASSESS -->|informs plan| S1_LAYOUT
    S1_LAYOUT -->|includes| S1_PHASE
    S1_PHASE -->|declares dep| S1_ENV
    S1_ENV -->|saves| S1_OUT

    %% ─────────────────────────────────────────── %%
    %% SCENARIO 2: Implement Experiment with Tests %%
    %% ─────────────────────────────────────────── %%
    subgraph S2 ["SCENARIO 2: Implement Experiment → Test File Creation"]
        direction LR
        S2_PLAN["experiment_plan<br/>━━━━━━━━━━<br/>from Scenario 1"]
        S2_SKILL["● implement-experiment<br/>━━━━━━━━━━<br/>SKILL.md"]
        S2_WT["Git Worktree<br/>━━━━━━━━━━<br/>isolated branch"]
        S2_SCRIPT["Scripts<br/>━━━━━━━━━━<br/>analysis.py etc."]
        S2_TESTS["test_{script}.py<br/>━━━━━━━━━━<br/>per script in tests/"]
        S2_COLLECT["pytest --collect-only<br/>━━━━━━━━━━<br/>verify discovery"]
        S2_COMMIT["Commit<br/>━━━━━━━━━━<br/>test suite + scripts"]
    end

    S2_PLAN -->|inputs to| S2_SKILL
    S2_SKILL -->|creates| S2_WT
    S2_WT -->|implements| S2_SCRIPT
    S2_SCRIPT -->|also creates| S2_TESTS
    S2_TESTS -->|validated by| S2_COLLECT
    S2_COLLECT -->|on success| S2_COMMIT

    %% ─────────────────────────────────────────── %%
    %% SCENARIO 3: Contract Guard — plan-experiment%%
    %% ─────────────────────────────────────────── %%
    subgraph S3 ["SCENARIO 3: Contract Guard — plan-experiment"]
        direction LR
        S3_PYTEST["pytest<br/>━━━━━━━━━━<br/>task test-check"]
        S3_CONTRACT["● test_plan_experiment<br/>_contracts.py<br/>━━━━━━━━━━<br/>6 contract tests"]
        S3_SKILL["plan-experiment<br/>━━━━━━━━━━<br/>SKILL.md (on disk)"]
        S3_CHECKS["Assertions<br/>━━━━━━━━━━<br/>data_manifest · tests/<br/>Test Infra phase · pytest"]
        S3_RESULT["PASS / FAIL<br/>━━━━━━━━━━<br/>contract enforced"]
    end

    S3_PYTEST -->|runs| S3_CONTRACT
    S3_CONTRACT -->|reads| S3_SKILL
    S3_SKILL -->|content checked by| S3_CHECKS
    S3_CHECKS -->|adjudicates| S3_RESULT

    %% ───────────────────────────────────────────────── %%
    %% SCENARIO 4: Contract Guard — implement-experiment %%
    %% ───────────────────────────────────────────────── %%
    subgraph S4 ["SCENARIO 4: Contract Guard — implement-experiment (NEW)"]
        direction LR
        S4_PYTEST["pytest<br/>━━━━━━━━━━<br/>task test-check"]
        S4_CONTRACT["★ test_implement_experiment<br/>_contracts.py<br/>━━━━━━━━━━<br/>3 contract tests"]
        S4_SKILL["implement-experiment<br/>━━━━━━━━━━<br/>SKILL.md (on disk)"]
        S4_CHECKS["Assertions<br/>━━━━━━━━━━<br/>ALWAYS write tests<br/>test_ naming<br/>--collect-only"]
        S4_RESULT["PASS / FAIL<br/>━━━━━━━━━━<br/>contract enforced"]
    end

    S4_PYTEST -->|runs| S4_CONTRACT
    S4_CONTRACT -->|reads| S4_SKILL
    S4_SKILL -->|content checked by| S4_CHECKS
    S4_CHECKS -->|adjudicates| S4_RESULT

    %% CLASS ASSIGNMENTS %%
    class S1_IN,S2_PLAN,S3_PYTEST,S4_PYTEST cli;
    class S1_SKILL,S2_SKILL handler;
    class S1_ASSESS,S1_PHASE,S2_COLLECT phase;
    class S1_LAYOUT,S1_ENV,S2_WT,S2_SCRIPT stateNode;
    class S1_OUT,S2_TESTS,S2_COMMIT output;
    class S3_CONTRACT,S4_CHECKS,S3_CHECKS detector;
    class S3_SKILL,S4_SKILL stateNode;
    class S3_RESULT,S4_RESULT output;
    class S4_CONTRACT newComponent;

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START([SKILL INVOCATION])

    subgraph PlanDirectives ["● PLAN-EXPERIMENT SKILL.MD — DIRECTIVE STATE (INIT_ONLY)"]
        direction TB
        TestsDir["● tests/ in Directory Layout<br/>━━━━━━━━━━<br/>INIT_ONLY: planned once<br/>enforce: test_plan_experiment_includes_tests_directory_in_layout"]
        TestPhase["● Test Infrastructure Phase<br/>━━━━━━━━━━<br/>INIT_ONLY: planned once<br/>enforce: test_plan_experiment_has_test_infrastructure_phase"]
        PytestEnv["● pytest in Environment Spec<br/>━━━━━━━━━━<br/>INIT_ONLY: environment.yml<br/>enforce: test_plan_experiment_environment_mentions_pytest"]
        DataManifest["● data_manifest Schema<br/>━━━━━━━━━━<br/>INIT_ONLY: source_type, acquisition<br/>verification, hypothesis<br/>enforce: test_data_manifest_required_fields"]
    end

    subgraph PlanGates ["PLAN VALIDATION GATES (V1–V9)"]
        direction TB
        V1V4["V1–V4 ERROR Gates<br/>━━━━━━━━━━<br/>baseline present, contrast set<br/>statistical_plan present, spec_path set<br/>FAIL → suppress frontmatter, emit errors"]
        V5V8["V5–V8 WARNING Gates<br/>━━━━━━━━━━<br/>primary metric uniqueness<br/>NEW canonical names, numeric thresholds<br/>CONTINUE → inline YAML comments"]
        V9["V9 ERROR Gate<br/>━━━━━━━━━━<br/>data_manifest completeness<br/>hypothesis coverage<br/>FAIL → suppress frontmatter"]
    end

    subgraph PlanOutput ["PLAN OUTPUT STATE"]
        direction TB
        FrontmatterBlock["Frontmatter Block<br/>━━━━━━━━━━<br/>MUTABLE → written iff V1-V4/V9 pass<br/>INIT_ONLY once written to file"]
        ExperimentPlanFile["experiment_plan token<br/>━━━━━━━━━━<br/>INIT_PRESERVE: last line of output<br/>pipeline captures for handoff"]
    end

    subgraph ImplDirectives ["● IMPLEMENT-EXPERIMENT SKILL.MD — DIRECTIVE STATE"]
        direction TB
        AlwaysWrite["● ALWAYS: write test_*.py<br/>━━━━━━━━━━<br/>INIT_ONLY directive<br/>enforce: test_implement_experiment_always_includes_test_creation"]
        Step4Tests["● Step 4: test_*.py per script<br/>━━━━━━━━━━<br/>MUTABLE: one file per script created<br/>enforce: test_implement_experiment_step4_mentions_test_files"]
        CollectOnly["● pytest --collect-only gate<br/>━━━━━━━━━━<br/>MUTABLE: runs after each test file<br/>enforce: test_implement_experiment_allows_pytest_collect_only"]
    end

    subgraph CheckpointState ["IMPLEMENT RESUME CHECKPOINT STATE"]
        direction TB
        EarlyTokens["worktree_path + branch_name tokens<br/>━━━━━━━━━━<br/>INIT_PRESERVE: emitted Step 1 cont.<br/>captured by pipeline before context limit"]
        BaseBranchFile["base-branch checkpoint file<br/>━━━━━━━━━━<br/>INIT_PRESERVE: written immediately<br/>survives context exhaustion"]
        WorktreeOnDisk["Worktree on disk<br/>━━━━━━━━━━<br/>INIT_PRESERVE: never deleted by skill<br/>orchestrator discovers via tokens"]
    end

    subgraph TestsState ["APPEND_ONLY TEST ARTIFACT STATE"]
        direction TB
        TestsDirectory["tests/ directory<br/>━━━━━━━━━━<br/>APPEND_ONLY: grows per script<br/>NEVER removed or overwritten"]
        GitCommits["Git commit history<br/>━━━━━━━━━━<br/>APPEND_ONLY: per-phase commits<br/>NEVER force-pushed by skill"]
    end

    subgraph ImplGates ["IMPLEMENT VALIDATION GATES"]
        direction TB
        DiscoveryGate["pytest --collect-only<br/>━━━━━━━━━━<br/>MUST pass before each commit<br/>FAIL → fix imports, re-run"]
        PrecommitGate["pre-commit run --all-files<br/>━━━━━━━━━━<br/>Step 6 final gate<br/>FAIL → fix, re-stage, retry"]
    end

    subgraph ContractEnforcement ["CONTRACT ENFORCEMENT LAYER"]
        direction TB
        PlanContracts["● test_plan_experiment_contracts.py<br/>━━━━━━━━━━<br/>6 tests: data_manifest schema<br/>tests/ layout, test phase, pytest env"]
        ImplContracts["★ test_implement_experiment_contracts.py<br/>━━━━━━━━━━<br/>3 tests: ALWAYS write tests<br/>test_ filenames, collect-only gate"]
    end

    END([HANDOFF REPORT])

    %% FLOW %%
    START --> PlanDirectives
    PlanDirectives --> PlanGates
    V1V4 --> FrontmatterBlock
    V5V8 --> FrontmatterBlock
    V9 --> FrontmatterBlock
    FrontmatterBlock --> ExperimentPlanFile
    ExperimentPlanFile --> ImplDirectives

    AlwaysWrite --> EarlyTokens
    Step4Tests --> TestsDirectory
    CollectOnly --> DiscoveryGate
    EarlyTokens --> BaseBranchFile
    BaseBranchFile --> WorktreeOnDisk
    DiscoveryGate --> TestsDirectory
    TestsDirectory --> GitCommits
    GitCommits --> PrecommitGate
    PrecommitGate --> END

    %% CONTRACT LAYER VALIDATES DIRECTIVE STATE %%
    PlanContracts -.->|validates directives present| PlanDirectives
    ImplContracts -.->|validates directives present| ImplDirectives

    %% CLASS ASSIGNMENTS %%
    class START,END terminal;
    class TestsDir,TestPhase,PytestEnv,DataManifest gap;
    class V1V4,V9 detector;
    class V5V8 handler;
    class FrontmatterBlock,ExperimentPlanFile stateNode;
    class AlwaysWrite,Step4Tests,CollectOnly gap;
    class EarlyTokens,BaseBranchFile,WorktreeOnDisk phase;
    class TestsDirectory,GitCommits handler;
    class DiscoveryGate,PrecommitGate detector;
    class PlanContracts output;
    class ImplContracts newComponent;

Closes #786

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-786-20260413-002705-648780/.autoskillit/temp/make-plan/ensure_research_experiments_test_infra_plan_2026-04-13_000000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|---|---|---|---|---|---|---|
| plan | 199 | 11.4k | 1.1M | 76.9k | 1 | 3m 44s |
| verify | 60 | 7.8k | 213.5k | 51.0k | 1 | 2m 53s |
| implement | 298 | 15.0k | 1.7M | 49.7k | 1 | 4m 51s |
| prepare_pr | 68 | 5.0k | 200.5k | 22.0k | 1 | 2m 20s |
| run_arch_lenses | 2.0k | 13.0k | 319.9k | 48.4k | 2 | 5m 22s |
| compose_pr | 59 | 7.2k | 188.1k | 27.4k | 1 | 1m 53s |
| Total | 2.7k | 59.4k | 3.7M | 275.5k | | 21m 5s |

Comment thread tests/skills_extended/test_implement_experiment_contracts.py Outdated
    """plan-experiment/SKILL.md Experiment Directory Layout must include a tests/ folder."""
    content = SKILL_PATH.read_text()
    # The directory layout template must show a tests/ directory
    assert "tests/" in content, (

[info] tests: assert 'tests/' in content will pass trivially if the string 'tests/' appears anywhere in the file (prose, path reference, or prior section), not necessarily in the Experiment Directory Layout template. Consider scoping to the layout section by splitting on a known heading.
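
One way to scope such an assertion is sketched below, under assumptions: the helper returns only the text under a given Markdown heading, so a stray mention elsewhere in the file cannot satisfy the check. The `section_body` helper and the sample headings are hypothetical; the real SKILL.md headings may differ, and the sketch does not skip `#` lines inside fenced code blocks.

```python
import re


def section_body(markdown: str, heading: str) -> str:
    """Return the text under `heading`, up to the next heading of the same or higher level.

    Raises ValueError when the heading is absent, so a renamed section
    fails loudly instead of passing vacuously.
    """
    lines = markdown.splitlines()
    start = None
    level = 0
    for i, line in enumerate(lines):
        m = re.match(r"^(#{1,6})\s+(.*?)\s*$", line)
        if m and m.group(2) == heading:
            start, level = i + 1, len(m.group(1))
            break
    if start is None:
        raise ValueError(f"heading not found: {heading!r}")
    end = len(lines)
    for j in range(start, len(lines)):
        m = re.match(r"^(#{1,6})\s+", lines[j])
        if m and len(m.group(1)) <= level:
            end = j
            break
    return "\n".join(lines[start:end])


# Hypothetical sample document: 'pytest' appears only outside the layout section.
SAMPLE = """\
## Overview
Results are compared against tests/ from an earlier run.

## Experiment Directory Layout
research/
  scripts/
  tests/
    conftest.py

## Environment
pytest is required.
"""

layout = section_body(SAMPLE, "Experiment Directory Layout")
assert "tests/" in layout          # scoped check on the layout section only
assert "pytest" not in layout      # a document-wide check would have matched
```

The same helper could back the environment-section check, with the heading name swapped.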

def test_plan_experiment_environment_mentions_pytest() -> None:
    """plan-experiment/SKILL.md must mention pytest in the environment specification."""
    content = SKILL_PATH.read_text()
    assert "pytest" in content, (

[info] tests: assert 'pytest' in content is document-wide and passes as long as 'pytest' appears anywhere in the file. The docstring says this validates the environment specification section, but the test does not enforce that. Scope to the environment section for the assertion to be meaningful.

def test_implement_experiment_allows_pytest_collect_only() -> None:
    """implement-experiment must allow running pytest --collect-only as a verification step."""
    text = SKILL_PATH.read_text()
    assert "collect-only" in text or "collect_only" in text, (

[info] tests: assert 'collect-only' in text or 'collect_only' in text checks two spellings, but the SKILL.md only uses the canonical CLI spelling 'collect-only'. The 'collect_only' branch is dead code that can never be triggered by the current SKILL.md content.

Comment thread tests/contracts/test_plan_experiment_contracts.py
Comment thread tests/contracts/test_implement_experiment_contracts.py
- Verify metrics are collected correctly
- Confirm end-to-end pipeline works before committing to full runs

### Phase 5: Test Infrastructure

[info] cohesion: plan-experiment labels test work 'Phase 5: Test Infrastructure' (a separate dedicated phase), but implement-experiment's detail section calls it 'Test creation (required alongside each script phase)' and ties it to Phases 2 and 3. The two skills use different mental models for when test infrastructure is produced (separate phase vs. inline with each script phase), creating a planning/implementation mismatch.

- Put all experiment artifacts in one self-contained `research/` subfolder
- Commit per phase with descriptive messages
- Leave the worktree intact when done
- Write `tests/test_{script_name}.py` alongside each experiment script created in Step 4

[info] slop: The ALWAYS bullet at L62-L63 duplicates instructions restated in full detail in the 'Test creation' block at L192-L205. Both deliver the same directive; the ALWAYS bullet is redundant given the expanded block below.

def test_plan_experiment_includes_tests_directory_in_layout() -> None:
    """plan-experiment/SKILL.md Experiment Directory Layout must include a tests/ folder."""
    content = SKILL_PATH.read_text()
    # The directory layout template must show a tests/ directory

[info] slop: Inline comment '# The directory layout template must show a tests/ directory' restates exactly what the assertion and the function name already communicate. It adds no information.

def test_implement_experiment_always_includes_test_creation() -> None:
    """implement-experiment ALWAYS list must include a directive to write test files."""
    text = SKILL_PATH.read_text()
    # The ALWAYS block must reference writing test files

[info] slop: Inline comment '# The ALWAYS block must reference writing test files' restates the obvious — the assertion immediately below it checks exactly that. The comment is redundant.

)


def test_plan_experiment_environment_mentions_pytest() -> None:

[info] cohesion: test_plan_experiment_environment_mentions_pytest checks that 'pytest' appears anywhere in the file. The parallel implement-experiment contract test_implement_experiment_allows_pytest_collect_only checks for 'collect-only' specifically. The two tests use different assertion granularity for analogous coverage, which is asymmetric across the paired skills.

)


def test_implement_experiment_step4_mentions_test_files() -> None:

[info] cohesion: test_implement_experiment_step4_mentions_test_files asserts 'test_' appears anywhere in the SKILL.md. This does not verify the string is in or near Step 4, contrary to the test name and docstring. The plan-experiment tests use more precise section-scoped assertions.

- Merge the worktree branch into any branch
- Delete or remove the worktree
- Run the full test suite (the orchestrator handles testing)
- Run the full test suite — `pytest` with no args or targeting the entire repo

[info] slop: The NEVER list item at L48-L49 rewording adds a parenthetical that duplicates reasoning stated at L205 ('Do NOT run the full test suite — that is the orchestrator's responsibility via test_check'). The constraint is now stated in two places within close proximity.
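
The distinction the NEVER item draws (a discovery check is allowed, a full run is not) can be sketched as follows. This is an illustration under assumptions, not the skill's actual tooling: `discovery_command` and `run_discovery` are hypothetical names, and the only pytest options used are the `--collect-only` and `-q` flags.

```python
import subprocess
import sys
from pathlib import Path


def discovery_command(tests_dir: Path) -> list[str]:
    """Build a pytest invocation that only collects tests under tests_dir.

    Scoping to an explicit directory avoids the bare `pytest` call that
    would target the entire repository.
    """
    return [sys.executable, "-m", "pytest", "--collect-only", "-q", str(tests_dir)]


def run_discovery(tests_dir: Path) -> bool:
    """Return True if the collection-only run exits cleanly.

    Any nonzero exit code is treated as a discovery failure; the captured
    output is printed so import errors can be fixed before committing.
    """
    result = subprocess.run(discovery_command(tests_dir), capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout + result.stderr)
    return result.returncode == 0
```

Because the command never executes test bodies, it stays within the verification step the skill permits and leaves the full run to the orchestrator's test_check gate.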

Comment thread src/autoskillit/skills_extended/implement-experiment/SKILL.md Outdated
**Option A — No custom environment needed:**
{The project's existing toolchain is sufficient because {reason}. No
environment.yml will be created.}
Verify that `pytest` is available in the existing toolchain (`pytest --version`). If not, note that test_check requires pytest to pass.

[info] slop: The sentence 'Verify that pytest is available in the existing toolchain (pytest --version). If not, note that test_check requires pytest to pass.' is appended inline to the Option A prose block without a line break. The same requirement is already covered by Phase 5 (L264-L270) and the Option B dependency list. The inline sentence is a redundant clarification.

Trecek left a comment

AutoSkillit review found 4 blocking issues (verdict: changes_requested). See inline comments for details.

Actionable findings (requires fix):

  • tests/skills_extended/test_implement_experiment_contracts.py L21: weak compound assertion in test_implement_experiment_always_includes_test_creation
  • tests/skills_extended/test_implement_experiment_contracts.py L30: vacuous assert 'test_' in text in test_implement_experiment_step4_mentions_test_files
  • tests/skills_extended/test_implement_experiment_contracts.py L1: _contracts naming convention mismatched to tests/skills_extended/ placement
  • src/autoskillit/skills_extended/implement-experiment/SKILL.md L205: 'Do NOT run the full test suite' stated three times (NEVER/ALWAYS/inline)

Decision findings (human review requested):

  • tests/contracts/test_plan_experiment_contracts.py L34: asymmetric placement — plan-experiment contracts in tests/contracts/, implement-experiment contracts in tests/skills_extended/
  • src/autoskillit/skills_extended/plan-experiment/SKILL.md L264: planning/implementation mismatch — plan-experiment models tests as Phase 5 (separate), implement-experiment models as inline with Phases 2-3

Trecek enabled auto-merge April 13, 2026 08:49
Trecek and others added 7 commits April 13, 2026 02:05
Add test_implement_experiment_contracts.py (3 tests) and 3 new functions
to test_plan_experiment_contracts.py. These verify that both SKILL.md files
include test infrastructure directives — tests directory layout, pytest env
mention, test infrastructure phase, ALWAYS list directive, Step 4 test_
references, and pytest --collect-only verification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tests/ folder to the Experiment Directory Layout template (with conftest.py
and test_{script}.py stubs), add pytest to the Option B environment.yml
dependencies and an availability note to Option A, and add Phase 5: Test
Infrastructure to the Implementation Phases template.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…L.md

Add two ALWAYS bullets directing the agent to write test_*.py files alongside
each experiment script and verify discovery with pytest --collect-only. Add
Step 4 test creation section with concrete instructions for tests/ layout,
conftest.py, per-script test files, and collect-only verification. Narrow
the NEVER bullet from "full test suite" to clarify pytest --collect-only and
per-directory verification runs are permitted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tracts/

SKILL.md contract tests belong in tests/contracts/ per CLAUDE.md convention.
Move from tests/skills_extended/ to tests/contracts/ to match placement of
test_plan_experiment_contracts.py and align with the established pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…contracts

Replace broad token-membership checks with specific pattern assertions:
- test_implement_experiment_always_includes_test_creation: assert 'tests/test_'
  in the ALWAYS section instead of loose 'test'+'write/creat' compound check
- test_implement_experiment_step4_mentions_test_files: scope to Step 4 text
  and assert 'tests/test_' instead of document-wide 'test_' check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ment-experiment SKILL.md

The constraint appeared three times (NEVER list, Step 4 footer, Step 6 footer).
Remove the two redundant inline repetitions; the NEVER list at the top is the
single authoritative statement, eliminating divergence risk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek force-pushed the ensure-research-experiments-include-test-infrastructure/786 branch from 1dbecec to 443b057 on April 13, 2026 09:05
Trecek added this pull request to the merge queue Apr 13, 2026
github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 13, 2026
Trecek added this pull request to the merge queue Apr 13, 2026
github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 13, 2026
Trecek enabled auto-merge April 13, 2026 09:22
Trecek added this pull request to the merge queue Apr 13, 2026
Merged via the queue into integration with commit ee9b1eb Apr 13, 2026
2 checks passed
Trecek deleted the ensure-research-experiments-include-test-infrastructure/786 branch April 13, 2026 09:31