
Implementation Plan: Generalize Scope Skill for Non-Code Research #794

Merged
Trecek merged 2 commits into integration from generalize-scope-skill-for-non-code-research/784 on Apr 13, 2026

Trecek (Collaborator) commented Apr 13, 2026

Summary

The scope and plan-experiment skills are software-centric in three ways: (1) scope has a
fixed, mandatory list of 5 subagents, 4 of which assume a software codebase; (2) both skills
hardcode src/metrics.rs paths; and (3) scope's report-template section names and Known/Unknown
matrix rows assume a software context. This plan replaces the fixed subagent list with a
suggested menu (the agent selects at least 5), removes all src/metrics.rs hardcoding, renames
report sections to domain-agnostic language, and makes Metric Context conditional. One test
contract assertion must be updated to reflect the section rename.
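The "suggested menu, at least 5" rule can be pictured as a small check. This is a hypothetical Python sketch, not code from the PR (the skill states the rule as instructions to the agent in SKILL.md): the menu entries are taken from the State Lifecycle diagram, Data Availability is the off-menu subagent listed in Scenario 2, and `SUGGESTED_MENU`, `MIN_SUBAGENTS`, and `check_subagent_selection` are illustrative names.

```python
# Hypothetical sketch of scope's subagent rule: a suggested menu plus a minimum count.
# Menu entries follow the State Lifecycle diagram; all names here are illustrative.
SUGGESTED_MENU = (
    "Prior Art",
    "External Research",
    "Domain Context",
    "Evaluation Framework",
    "Computational Complexity",
)
MIN_SUBAGENTS = 5


def check_subagent_selection(selected: list[str]) -> list[str]:
    """Validate a selection and return the subagents chosen from outside the menu."""
    distinct = list(dict.fromkeys(selected))  # drop duplicates, keep order
    if len(distinct) < MIN_SUBAGENTS:
        raise ValueError(
            f"scope needs at least {MIN_SUBAGENTS} distinct subagents, got {len(distinct)}"
        )
    # Menu entries are suggestions, so off-menu subagents are allowed and reported.
    return [name for name in distinct if name not in SUGGESTED_MENU]


# Scenario 2's five subagents include Data Availability, which is not on the menu.
non_code = ["Prior Art", "Domain Context", "Evaluation Framework",
            "Data Availability", "Computational Complexity"]
print(check_subagent_selection(non_code))  # ['Data Availability']
```

Treating the menu as advisory, rather than as an allow-list, is what lets a non-code question swap in domain-specific subagents while still meeting the minimum.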

Architecture Impact

Scenarios Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph S1 ["SCENARIO 1: Code Research — Recipe Happy Path"]
        direction LR
        S1_RECIPE["research recipe<br/>━━━━━━━━━━<br/>orchestrates all phases<br/>scope → plan → review"]
        S1_SCOPE["● scope/SKILL.md<br/>━━━━━━━━━━<br/>parse question<br/>fetch GitHub issue<br/>≥5 parallel subagents"]
        S1_REPORT["scope_report<br/>━━━━━━━━━━<br/>known/unknown matrix<br/>captured as context token"]
        S1_PLAN["● plan-experiment/SKILL.md<br/>━━━━━━━━━━<br/>feasibility subagents A/B/C<br/>YAML frontmatter V1–V9"]
        S1_EXPPLAN["experiment_plan<br/>━━━━━━━━━━<br/>research design<br/>+ implementation phases"]
        S1_REVIEW["review-design<br/>━━━━━━━━━━<br/>verdict: GO / REVISE / STOP"]
    end

    subgraph S2 ["SCENARIO 2: Non-Code Domain Research (Generalized)"]
        direction LR
        S2_USER["domain question<br/>━━━━━━━━━━<br/>e.g. biology / chemistry<br/>social science"]
        S2_SCOPE["● scope/SKILL.md<br/>━━━━━━━━━━<br/>generic subagent menu<br/>domain-aware branches"]
        S2_SUBAGENTS["parallel subagents<br/>━━━━━━━━━━<br/>Prior Art (literature)<br/>Domain Context (structures)<br/>Eval Framework (scales/rubrics)<br/>Data Availability<br/>Complexity"]
        S2_SYNTH["synthesis<br/>━━━━━━━━━━<br/>scope_report written to<br/>temp/scope/"]
        S2_PLAN["● plan-experiment/SKILL.md<br/>━━━━━━━━━━<br/>generic measurement feasibility<br/>no hardcoded metrics.rs path"]
        S2_ARTIFACT["experiment_plan<br/>━━━━━━━━━━<br/>domain-agnostic design<br/>+ data_manifest"]
    end

    subgraph S3 ["SCENARIO 3: Design Revision Loop"]
        direction LR
        S3_REVIEW["review-design<br/>━━━━━━━━━━<br/>verdict = REVISE"]
        S3_GUIDANCE["revision_guidance<br/>━━━━━━━━━━<br/>feedback path token<br/>2nd positional arg"]
        S3_PLAN["● plan-experiment/SKILL.md<br/>━━━━━━━━━━<br/>reads 2nd path token<br/>incorporates feedback<br/>re-runs frontmatter V1–V9"]
        S3_NEW["revised experiment_plan<br/>━━━━━━━━━━<br/>retries ≤ 2"]
    end

    subgraph S4 ["SCENARIO 4: Contract Test Enforcement (CI Gate)"]
        direction LR
        S4_PYTEST["task test-check<br/>━━━━━━━━━━<br/>pytest -n4 --dist worksteal"]
        S4_CONTRACTS["● test_scope_contracts.py<br/>━━━━━━━━━━<br/>Computational Complexity<br/>section present + structured<br/>baseline instruction present"]
        S4_GENERIC["● test_skill_genericization.py<br/>━━━━━━━━━━<br/>no src/metrics.rs<br/>no test_metrics_assess<br/>no AutoSkillit-internal paths"]
        S4_SKILL["● scope/SKILL.md<br/>● plan-experiment/SKILL.md<br/>━━━━━━━━━━<br/>read via pkg_root() or pathlib"]
        S4_PASS(["CI: PASS / FAIL"])
    end

    %% SCENARIO 1 FLOW %%
    S1_RECIPE -->|"triggers scope step"| S1_SCOPE
    S1_SCOPE -->|"writes"| S1_REPORT
    S1_REPORT -->|"$context.scope_report"| S1_PLAN
    S1_PLAN -->|"writes"| S1_EXPPLAN
    S1_EXPPLAN -->|"feeds"| S1_REVIEW

    %% SCENARIO 2 FLOW %%
    S2_USER -->|"invokes"| S2_SCOPE
    S2_SCOPE -->|"launches ≥5"| S2_SUBAGENTS
    S2_SUBAGENTS -->|"consolidates into"| S2_SYNTH
    S2_SYNTH -->|"scope_report path"| S2_PLAN
    S2_PLAN -->|"writes"| S2_ARTIFACT

    %% SCENARIO 3 FLOW %%
    S3_REVIEW -->|"emits"| S3_GUIDANCE
    S3_GUIDANCE -->|"2nd path token"| S3_PLAN
    S3_PLAN -->|"revised output"| S3_NEW
    S3_NEW -.->|"re-review (≤2x)"| S3_REVIEW

    %% SCENARIO 4 FLOW %%
    S4_PYTEST -->|"runs"| S4_CONTRACTS
    S4_PYTEST -->|"runs"| S4_GENERIC
    S4_CONTRACTS -->|"reads via pkg_root()"| S4_SKILL
    S4_GENERIC -->|"reads via pathlib"| S4_SKILL
    S4_CONTRACTS -->|"asserts"| S4_PASS
    S4_GENERIC -->|"asserts"| S4_PASS

    %% CLASS ASSIGNMENTS %%
    class S1_RECIPE,S2_USER,S4_PYTEST cli;
    class S1_REPORT,S2_SYNTH,S2_ARTIFACT,S3_GUIDANCE,S3_NEW,S1_EXPPLAN stateNode;
    class S1_SCOPE,S2_SCOPE,S1_PLAN,S2_PLAN,S3_PLAN,S3_REVIEW,S1_REVIEW handler;
    class S2_SUBAGENTS phase;
    class S4_CONTRACTS,S4_GENERIC,S4_SKILL output;
    class S4_PASS terminal;
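Scenario 3, together with the RESUME DETECTION tiers in the State Lifecycle diagram, reduces to two small decisions: which mode plan-experiment runs in, and how many times a design is re-reviewed. The Python below is a hypothetical sketch of that control flow; the positional-argument list, the `FIRST_PASS`/`REVISE` strings, and the injected `review`/`revise` callables are assumptions, not AutoSkillit APIs. In the real pipeline this logic lives in the recipe and the skill prompts.

```python
from pathlib import Path
from typing import Callable, Sequence

MAX_REVISIONS = 2  # Scenario 3: "retries ≤ 2"


def detect_mode(positional_args: Sequence[str]) -> str:
    """Tier 1: second path token present and the file exists -> REVISE.
    Tier 2: token absent, empty, or pointing at a missing file -> FIRST_PASS."""
    if len(positional_args) >= 2:
        guidance = positional_args[1].strip()
        if guidance and Path(guidance).is_file():
            return "REVISE"
    return "FIRST_PASS"


def revision_loop(review: Callable[[], str], revise: Callable[[], None]) -> tuple[str, int]:
    """Re-review after each revision until the verdict is not REVISE or the cap is hit."""
    verdict = review()
    revisions = 0
    while verdict == "REVISE" and revisions < MAX_REVISIONS:
        revise()
        revisions += 1
        verdict = review()
    return verdict, revisions


print(detect_mode(["temp/scope/scope_report.md"]))  # FIRST_PASS
verdicts = iter(["REVISE", "REVISE", "REVISE"])
print(revision_loop(lambda: next(verdicts), lambda: None))  # ('REVISE', 2)
```

Treating a missing guidance file the same as an absent argument keeps a stale path from silently forcing REVISE mode.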

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([INVOKE])

    subgraph Inputs ["INIT_ONLY — Set Once, Never Modified"]
        direction LR
        RQ["● research_question<br/>━━━━━━━━━━<br/>scope: free-text, issue ref,<br/>or domain topic<br/>[INIT_ONLY]"]
        SRP["scope_report_path<br/>━━━━━━━━━━<br/>plan-experiment: required<br/>path to scope output<br/>[INIT_ONLY]"]
    end

    subgraph ResumeTiers ["RESUME DETECTION — INIT_PRESERVE Gate"]
        direction TB
        RevGuid["revision_guidance path<br/>━━━━━━━━━━<br/>INIT_PRESERVE: read once,<br/>absent OR non-existent<br/>→ FIRST PASS<br/>present + exists → REVISE"]
        T1["Tier 1: Explicit<br/>━━━━━━━━━━<br/>revision_guidance present<br/>AND file exists<br/>→ REVISE mode"]
        T2["Tier 2: Default<br/>━━━━━━━━━━<br/>absent / empty / missing<br/>→ FIRST PASS mode"]
    end

    subgraph ScopePhase ["● scope/SKILL.md — Parallel Exploration (≥5 subagents)"]
        direction TB
        S1["● Prior Art / Literature<br/>━━━━━━━━━━<br/>codebase or domain survey<br/>[generic: any domain]"]
        S2["● External Research<br/>━━━━━━━━━━<br/>web search — tools,<br/>methods, papers"]
        S3["● Domain Context<br/>━━━━━━━━━━<br/>software arch OR domain<br/>structures/mechanisms<br/>[generalized]"]
        S4["● Evaluation Framework<br/>━━━━━━━━━━<br/>metrics/rubrics/scales;<br/>explicit absent-flag<br/>if none found"]
        S5["● Computational Complexity<br/>━━━━━━━━━━<br/>dominant op, scaling,<br/>bottlenecks, gotchas<br/>[conditional — skip if N/A]"]
    end

    subgraph ScopeOutput ["DERIVED — scope Output Artifact"]
        SR["scope_report.md<br/>━━━━━━━━━━<br/>Known/Unknown Matrix<br/>Hypotheses, Directions<br/>Metric Context (if found)<br/>[write-once token: scope_report]"]
    end

    subgraph MutablePlan ["MUTABLE — Frontmatter Fields (assembled in plan-experiment Step 3)"]
        direction LR
        ET["experiment_type<br/>━━━━━━━━━━<br/>benchmark | config_study |<br/>causal_inference |<br/>robustness_audit | exploratory"]
        EST["estimand<br/>━━━━━━━━━━<br/>treatment, outcome,<br/>population, contrast"]
        MET["metrics[]<br/>━━━━━━━━━━<br/>name, unit, canonical_name,<br/>collection_method, threshold,<br/>direction, primary"]
        BAS["baselines[]<br/>━━━━━━━━━━<br/>name, version, tuning_budget<br/>[required: benchmark/causal]"]
        SP["statistical_plan<br/>━━━━━━━━━━<br/>test, alpha, power_target,<br/>correction_method, MDE<br/>[waived: exploratory]"]
        ENV["environment<br/>━━━━━━━━━━<br/>type: standard | custom<br/>spec_path (when custom)"]
        SC["success_criteria<br/>━━━━━━━━━━<br/>conclusive_positive,<br/>conclusive_negative,<br/>inconclusive"]
        DM["● data_manifest[]<br/>━━━━━━━━━━<br/>hypothesis, source_type,<br/>description, acquisition,<br/>location, verification<br/>[generalized: any domain]"]
    end

    subgraph ValidationGates ["VALIDATION GATES — Applied in Order Before Frontmatter Write"]
        direction TB
        V1["V1 — baselines required<br/>━━━━━━━━━━<br/>benchmark/causal_inference<br/>→ len(baselines)≥1 + version<br/>[ERROR: abort frontmatter]"]
        V2["V2 — contrast required<br/>━━━━━━━━━━<br/>causal_inference<br/>→ estimand.contrast not null<br/>[ERROR: abort frontmatter]"]
        V3["V3 — statistical_plan<br/>━━━━━━━━━━<br/>!exploratory<br/>→ plan present, test not null<br/>[ERROR: abort frontmatter]"]
        V4["V4 — spec_path required<br/>━━━━━━━━━━<br/>environment.type=custom<br/>→ spec_path not null<br/>[ERROR: abort frontmatter]"]
        V5["V5 — primary metric<br/>━━━━━━━━━━<br/>len(metrics)≥2<br/>→ exactly one primary:true<br/>[WARNING: YAML comment]"]
        V6["V6 — NEW metrics<br/>━━━━━━━━━━<br/>any canonical_name=NEW<br/>→ flag unregistered metric<br/>[WARNING: YAML comment]"]
        V7["V7 — H1 threshold<br/>━━━━━━━━━━<br/>hypothesis_h1<br/>→ must have numeric threshold<br/>[WARNING: YAML comment]"]
        V8["V8 — criteria→metrics link<br/>━━━━━━━━━━<br/>conclusive_positive<br/>→ references ≥1 metric.name<br/>[WARNING: YAML comment]"]
        V9["V9 — data_manifest complete<br/>━━━━━━━━━━<br/>every hypothesis has entry;<br/>external has location+depends_on<br/>[ERROR: abort frontmatter]"]
    end

    subgraph ErrorAccum ["APPEND_ONLY — Error & Warning Accumulation"]
        direction LR
        ERRACC["## Frontmatter Validation Errors<br/>━━━━━━━━━━<br/>V1–V4, V9 ERRORs appended;<br/>NEVER overwritten;<br/>frontmatter OMITTED on any ERROR"]
        WARNACC["# WARNING: YAML comments<br/>━━━━━━━━━━<br/>V5–V8 inline on field lines;<br/>frontmatter still written;<br/>APPEND_ONLY per field"]
    end

    subgraph PlanOutput ["DERIVED — plan-experiment Output Artifact"]
        direction TB
        FMOUT["● experiment_plan.md (with frontmatter)<br/>━━━━━━━━━━<br/>YAML frontmatter + prose plan<br/>written to AUTOSKILLIT_TEMP/<br/>plan-experiment/<br/>[write-once token: experiment_plan]"]
        FMERR["experiment_plan.md (error-only)<br/>━━━━━━━━━━<br/>Prose plan ONLY<br/>+ ## Frontmatter Validation Errors<br/>[on V1–V4 or V9 failure]"]
    end

    subgraph ContractTests ["● Contract Tests — Validation Gates on Skill Content"]
        direction LR
        CT1["● test_scope_contracts.py<br/>━━━━━━━━━━<br/>Asserts Computational Complexity<br/>section exists with all 4 fields;<br/>baseline computation instruction<br/>present [APPEND_ONLY guard]"]
        CT2["● test_skill_genericization.py<br/>━━━━━━━━━━<br/>Asserts no src/metrics.rs,<br/>no test_metrics_assess,<br/>no AutoSkillit-specific paths<br/>[INIT_ONLY content guard]"]
    end

    %% FLOW %%
    START --> RQ
    START --> SRP
    START --> RevGuid

    RevGuid --> T1
    RevGuid --> T2

    RQ --> S1 & S2 & S3 & S4 & S5
    S1 & S2 & S3 & S4 & S5 --> SR

    T1 --> MutablePlan
    T2 --> MutablePlan
    SRP --> MutablePlan
    SR --> MutablePlan

    ET & EST & MET & BAS & SP & ENV & SC & DM --> V1
    V1 --> V2 --> V3 --> V4 --> V9
    V4 --> V5
    V9 --> V5
    V5 --> V6 --> V7 --> V8

    V1 & V2 & V3 & V4 & V9 -->|"ERROR"| ERRACC
    V5 & V6 & V7 & V8 -->|"WARNING"| WARNACC

    ERRACC --> FMERR
    WARNACC --> FMOUT
    V8 -->|"PASS: all gates"| FMOUT

    SR --> CT1
    ET & MET & DM --> CT2

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class RQ,SRP detector;
    class RevGuid gap;
    class T1,T2 cli;
    class S1,S2,S3,S4,S5 phase;
    class SR output;
    class ET,EST,MET,BAS,SP,ENV,SC,DM stateNode;
    class V1,V2,V3,V4,V9 detector;
    class V5,V6,V7,V8 gap;
    class ERRACC,WARNACC handler;
    class FMOUT,FMERR output;
    class CT1,CT2 newComponent;
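The validation-gate ordering in the State Lifecycle diagram can be read as a pure function over the assembled frontmatter. The Python below is a minimal, hypothetical sketch, not plan-experiment's implementation (the skill applies V1–V9 as prompt instructions). Field names come from the diagram; the dict shapes, the separate `hypotheses` id list, and the "contains a digit" reading of V7 are assumptions. ERROR gates block the frontmatter, while WARNING gates only annotate it.

```python
import re


def validate_frontmatter(fm: dict, hypotheses: list[str]) -> tuple[list[str], list[str]]:
    """Apply gates V1-V9 in diagram order and return (errors, warnings).

    Per the diagram, any error omits the frontmatter; warnings become YAML comments.
    """
    errors: list[str] = []
    warnings: list[str] = []
    etype = fm.get("experiment_type")
    baselines = fm.get("baselines") or []
    metrics = fm.get("metrics") or []
    manifest = fm.get("data_manifest") or []

    # V1: benchmark and causal_inference need at least one baseline, each with a version.
    if etype in ("benchmark", "causal_inference") and (
        not baselines or not all(b.get("version") for b in baselines)
    ):
        errors.append("V1: versioned baselines required")
    # V2: causal_inference needs estimand.contrast.
    if etype == "causal_inference" and not (fm.get("estimand") or {}).get("contrast"):
        errors.append("V2: estimand.contrast required")
    # V3: non-exploratory designs need a statistical_plan with a test.
    if etype != "exploratory" and not (fm.get("statistical_plan") or {}).get("test"):
        errors.append("V3: statistical_plan.test required")
    # V4: a custom environment needs spec_path.
    env = fm.get("environment") or {}
    if env.get("type") == "custom" and not env.get("spec_path"):
        errors.append("V4: environment.spec_path required")
    # V9: each hypothesis has a manifest entry; external entries need location and depends_on.
    covered = {entry.get("hypothesis") for entry in manifest}
    missing = [h for h in hypotheses if h not in covered]
    incomplete = [
        e for e in manifest
        if e.get("source_type") == "external" and not (e.get("location") and e.get("depends_on"))
    ]
    if missing or incomplete:
        errors.append(f"V9: data_manifest incomplete (missing={missing}, bad_external={len(incomplete)})")

    # V5: two or more metrics need exactly one primary.
    if len(metrics) >= 2 and sum(bool(m.get("primary")) for m in metrics) != 1:
        warnings.append("V5: expected exactly one primary metric")
    # V6: canonical_name NEW flags a metric outside any known framework.
    warnings.extend(
        f"V6: {m.get('name')} is unregistered (NEW)"
        for m in metrics if m.get("canonical_name") == "NEW"
    )
    # V7 (assumed reading): an H1 statement should carry a numeric threshold.
    h1 = fm.get("hypothesis_h1")
    if h1 is not None and not re.search(r"\d", h1):
        warnings.append("V7: hypothesis_h1 has no numeric threshold")
    # V8: conclusive_positive should reference at least one metric name.
    positive = (fm.get("success_criteria") or {}).get("conclusive_positive") or ""
    if not any(m.get("name") and m["name"] in positive for m in metrics):
        warnings.append("V8: conclusive_positive references no metric")
    return errors, warnings


# Illustrative non-code plan: a rubric-scored study with no existing metric framework.
plan = {
    "experiment_type": "exploratory",
    "metrics": [{"name": "inter_rater_kappa", "canonical_name": "NEW", "primary": True}],
    "hypothesis_h1": "kappa >= 0.6",
    "success_criteria": {"conclusive_positive": "inter_rater_kappa >= 0.6"},
    "environment": {"type": "standard"},
    "data_manifest": [{"hypothesis": "H1", "source_type": "synthetic"}],
}
errors, warnings = validate_frontmatter(plan, ["H1"])
print(errors)    # []
print(warnings)  # ['V6: inter_rater_kappa is unregistered (NEW)']
```

Splitting errors from warnings mirrors the diagram's two accumulators: the caller writes the frontmatter only when `errors` is empty and attaches each warning as a comment.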

Module Dependency Diagram

%%{init: {'flowchart': {'nodeSpacing': 55, 'rankSpacing': 75, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ─── SKILL ASSETS ─────────────────────────────────────────────── %%
    subgraph SkillAssets ["SKILL ASSETS (skills_extended/)"]
        direction LR
        SCOPE["● scope/SKILL.md<br/>━━━━━━━━━━<br/>Research scoping<br/>Generalized: non-code research<br/>+Computational Complexity §<br/>+Known/Unknown matrix"]
        PLAN["● plan-experiment/SKILL.md<br/>━━━━━━━━━━<br/>Experiment plan generator<br/>Generalized: non-code research<br/>YAML frontmatter extraction<br/>V1-V9 validation rules"]
    end

    %% ─── L0: CORE ─────────────────────────────────────────────────── %%
    subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
        direction LR
        CORE["autoskillit.core<br/>━━━━━━━━━━<br/>pkg_root()<br/>io, types, paths<br/>Fan-in: ~104 files"]
    end

    %% ─── L1: WORKSPACE ────────────────────────────────────────────── %%
    subgraph L1 ["L1 — WORKSPACE"]
        direction LR
        RESOLVER["workspace/skills.py<br/>━━━━━━━━━━<br/>DefaultSkillResolver<br/>list_all() → scans skills/ + skills_extended/<br/>resolve(name) → SKILL.md path<br/>Source: BUNDLED_EXTENDED"]
    end

    %% ─── L2: RECIPE VALIDATION ────────────────────────────────────── %%
    subgraph L2 ["L2 — RECIPE VALIDATION"]
        direction LR
        RSCONTENT["recipe/rules_skill_content.py<br/>━━━━━━━━━━<br/>@semantic_rule validators<br/>no-autoskillit-import-in-skill<br/>output-section-no-markdown-directive<br/>hardcoded-origin-remote"]
        RSSKILLS["recipe/rules_skills.py<br/>━━━━━━━━━━<br/>@semantic_rule validators<br/>unknown-skill-command<br/>subset-disabled-skill"]
        RECIPE["recipes/research.yaml<br/>━━━━━━━━━━<br/>run_skill: scope (phase 0)<br/>run_skill: plan-experiment (phase 1)<br/>Chains the two modified skills"]
    end

    %% ─── TESTS ────────────────────────────────────────────────────── %%
    subgraph Tests ["TESTS"]
        direction LR
        TCONTRACTS["● test_scope_contracts.py<br/>━━━━━━━━━━<br/>TestComputationalComplexitySection<br/>5 tests: section exists, 4 fields<br/>present, ordering, regex checks<br/>Imports: autoskillit.core.pkg_root"]
        TGENERIC["● test_skill_genericization.py<br/>━━━━━━━━━━<br/>7 tests: no hardcoded paths<br/>no project-specific metrics<br/>no internal gate references<br/>Imports: pathlib only (stdlib)"]
    end

    %% ─── DEPENDENCY EDGES ─────────────────────────────────────────── %%

    %% Core provides pkg_root to workspace and tests
    CORE -->|"pkg_root()"| RESOLVER
    CORE -->|"pkg_root() import"| TCONTRACTS

    %% Workspace discovers skill assets from filesystem
    RESOLVER -->|"scans filesystem<br/>BUNDLED_EXTENDED"| SCOPE
    RESOLVER -->|"scans filesystem<br/>BUNDLED_EXTENDED"| PLAN

    %% Recipe validation uses workspace (deferred) to resolve skills
    RSCONTENT -.->|"deferred import<br/>DefaultSkillResolver"| RESOLVER
    RSSKILLS -.->|"deferred import<br/>DefaultSkillResolver.list_all()"| RESOLVER

    %% Recipe validation reads skill SKILL.md content (via resolver)
    RSCONTENT -->|"reads SKILL.md content<br/>to apply rules"| SCOPE
    RSCONTENT -->|"reads SKILL.md content<br/>to apply rules"| PLAN

    %% research.yaml references the two skills
    RECIPE -->|"run_skill: scope"| SCOPE
    RECIPE -->|"run_skill: plan-experiment"| PLAN

    %% rules_skills validates recipe skill references
    RSSKILLS -->|"validates skill names<br/>in recipe steps"| RECIPE

    %% Tests read SKILL.md files directly (pathlib / pkg_root)
    TCONTRACTS -->|"reads SKILL.md<br/>via pkg_root()"| SCOPE
    TGENERIC -->|"reads SKILL.md<br/>via pathlib"| SCOPE
    TGENERIC -->|"reads SKILL.md<br/>via pathlib"| PLAN

    %% recipe validation rules import from core
    RSCONTENT -->|"imports from"| CORE
    RSSKILLS -->|"imports from"| CORE

    %% CLASS ASSIGNMENTS %%
    class CORE stateNode;
    class RESOLVER handler;
    class RSCONTENT,RSSKILLS phase;
    class RECIPE output;
    class SCOPE,PLAN gap;
    class TCONTRACTS,TGENERIC detector;

Closes #784

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-784-20260412-214908-319157/.autoskillit/temp/make-plan/generalize_scope_skill_plan_2026-04-12_000000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|---|---|---|---|---|---|---|
| plan | 161 | 15.6k | 685.9k | 71.4k | 1 | 6m 58s |
| verify | 132 | 12.0k | 740.6k | 49.4k | 1 | 3m 49s |
| implement | 206 | 13.4k | 1.4M | 60.6k | 1 | 4m 26s |
| prepare_pr | 60 | 6.2k | 182.5k | 25.3k | 1 | 1m 40s |
| run_arch_lenses | 218 | 25.7k | 724.8k | 172.7k | 3 | 11m 47s |
| compose_pr | 67 | 10.0k | 253.2k | 35.2k | 1 | 2m 31s |
| Total | 844 | 82.9k | 4.0M | 414.6k | | 31m 14s |

Trecek (Collaborator, Author) left a comment

AutoSkillit PR Review — Verdict: changes_requested

)


def test_scope_has_no_hardcoded_metrics_rs() -> None:

[warning] cohesion: test_scope_has_no_hardcoded_metrics_rs and test_plan_experiment_has_no_hardcoded_metrics_rs are scattered in test_skill_genericization.py but they validate skill-specific content that is already covered by the contracts layer (test_scope_contracts.py). Hardcoded-reference checks for individual skills belong either in a dedicated per-skill contract test file or in test_scope_contracts.py alongside the other scope structural assertions, not mixed into the genericization test module.


Investigated — this is intentional. The tests/CLAUDE.md placement convention defines tests/skills/ as covering 'skill contract and compliance tests' and test_skill_genericization.py (line 1 docstring) is explicitly for verifying SKILL.md files contain no project-specific AutoSkillit internals. All existing tests (REQ-GEN-001 through REQ-GEN-004) follow the identical pattern: checking SKILL.md content for forbidden strings. The new tests extend REQ-GEN-005 and belong here. test_scope_contracts.py covers structural section-layout assertions for scope, not cross-cutting forbidden-string regression guards — there is no overlap.

"scope/SKILL.md hardcodes 'src/metrics.rs'. "
"Use generic evaluation framework search (REQ-GEN-005)."
)
assert "test_metrics_assess" not in content, (

[info] tests: test_scope_has_no_hardcoded_metrics_rs asserts that 'test_metrics_assess' is absent from scope/SKILL.md, but there is no corresponding assertion for 'test_metrics_assess' in test_plan_experiment_has_no_hardcoded_metrics_rs. If 'test_metrics_assess' is a forbidden hardcoded identifier (REQ-GEN-005), the coverage is inconsistent across the two tests.
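One way to close this gap is to drive both skills through a single list of forbidden identifiers, so their coverage cannot drift apart. This is a hypothetical Python sketch: `FORBIDDEN`, `forbidden_hits`, and `assert_skills_generic` are illustrative names, and a `test_*` wrapper in test_skill_genericization.py would pass in the module-level `SKILLS_EXTENDED_DIR` constant introduced by the follow-up commit.

```python
from pathlib import Path

# Identifiers this review thread treats as forbidden under REQ-GEN-005.
FORBIDDEN = ("src/metrics.rs", "test_metrics_assess")
# The two skills this PR generalizes.
GENERALIZED_SKILLS = ("scope", "plan-experiment")


def forbidden_hits(content: str) -> list[str]:
    """Return each forbidden identifier that occurs in a SKILL.md body."""
    return [needle for needle in FORBIDDEN if needle in content]


def assert_skills_generic(skills_extended_dir: Path) -> None:
    """Check both skills against the same list of forbidden identifiers."""
    for skill in GENERALIZED_SKILLS:
        content = (skills_extended_dir / skill / "SKILL.md").read_text(encoding="utf-8")
        hits = forbidden_hits(content)
        assert not hits, (
            f"{skill}/SKILL.md hardcodes {hits}. "
            "Use generic evaluation framework search (REQ-GEN-005)."
        )


print(forbidden_hits("Read thresholds from src/metrics.rs"))  # ['src/metrics.rs']
```

Reporting every hit at once, rather than failing on the first substring, makes a regression that reintroduces both identifiers visible in a single run.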

)

def test_section_between_technical_context_and_hypotheses(self) -> None:
def test_section_between_domain_context_and_hypotheses(self) -> None:

[info] tests: test_section_between_domain_context_and_hypotheses relies on str.index() which raises ValueError (not AssertionError) if a section heading is missing. A missing '## Domain Context', '## Computational Complexity', or '## Hypotheses' heading would produce an unhandled exception rather than a clear test failure message, making failures harder to diagnose.
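A minimal sketch of the fix this comment points at, assuming the test keeps its substring-based heading lookup: `str.find` returns -1 instead of raising, so a missing heading can be turned into an `AssertionError` with a readable message. `heading_offset` and `assert_section_order` are hypothetical helper names, not existing test code.

```python
def heading_offset(content: str, heading: str) -> int:
    """Return the offset of `heading`, failing as a test assertion if it is absent."""
    offset = content.find(heading)
    assert offset != -1, f"scope/SKILL.md is missing the {heading!r} section heading"
    return offset


def assert_section_order(content: str, *headings: str) -> None:
    """Assert that every heading exists and that they appear in the given order."""
    offsets = [heading_offset(content, h) for h in headings]
    assert all(a < b for a, b in zip(offsets, offsets[1:])), (
        f"sections out of order: {dict(zip(headings, offsets))}"
    )


template = "## Domain Context\n...\n## Computational Complexity\n...\n## Hypotheses\n"
assert_section_order(
    template, "## Domain Context", "## Computational Complexity", "## Hypotheses"
)
```

Under pytest, a missing heading then surfaces as an ordinary assertion failure that names the absent section, rather than as an unhandled ValueError.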

{Which canonical metrics from src/metrics.rs apply to this research question.
List each metric name, quality dimension (Accuracy/Parity/Performance), and
current threshold value. Note any gaps where no canonical metric exists.}
## Metric Context *(include only when an evaluation framework was found)*

[warning] cohesion: '## Metric Context' in scope output template is conditionally emitted (include only when evaluation framework found), but plan-experiment SKILL.md always emits a metrics table and WARNING for NEW metrics. The two skills handle evaluation-framework absence asymmetrically — scope silently omits the section while plan-experiment always renders it. This will cause confusion when both skills are composed in the same workflow.


Investigated — this is intentional. The asymmetry serves different pipeline stages: scope is an early-discovery step where omitting an empty Metric Context section is correct (commit 502d2d9 explicitly lists 'make Metric Context conditional' as a design goal). Plan-experiment always renders a Dependent Variables table because an experiment must define what it measures — using 'NEW' as the canonical name handles the no-framework case gracefully. Plan-experiment Subagent A explicitly instructs 'Cross-reference against the scope report's Metric Context section if present; if absent, proceed without it and note the gap', demonstrating deliberate awareness of the conditional handoff between the two skills.
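The conditional handoff described in this reply (use Metric Context when present, otherwise proceed and note the gap) can be sketched as a tolerant section lookup. This hypothetical Python helper is illustrative only: plan-experiment's Subagent A follows the instruction in prose, and `extract_section`, `metric_context_or_gap`, and the gap wording are assumptions.

```python
from typing import Optional


def extract_section(report: str, heading: str) -> Optional[str]:
    """Return the body under a level-2 heading, or None if the section was omitted."""
    lines = report.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == heading:
            body = []
            for following in lines[i + 1:]:
                if following.startswith("## "):  # the next level-2 section ends this one
                    break
                body.append(following)
            return "\n".join(body).strip()
    return None


def metric_context_or_gap(scope_report: str) -> str:
    """Use scope's Metric Context when present; otherwise record the gap and continue."""
    section = extract_section(scope_report, "## Metric Context")
    if section is None:
        return "GAP: scope found no evaluation framework; metrics will be defined as NEW."
    return section


with_framework = (
    "## Domain Context\nsignalling pathways\n"
    "## Metric Context\nLikert 1-5 rubric\n"
    "## Hypotheses\nH1"
)
print(metric_context_or_gap(with_framework))      # Likert 1-5 rubric
print(metric_context_or_gap("## Hypotheses\nH1"))  # prints the GAP note
```

Returning `None` rather than an empty string keeps "section omitted" distinct from "section present but empty", which is the distinction the conditional template relies on.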

> specific structures, relationships, mechanisms, and processes that are central to
> the research question.

**[EVALUATION FRAMEWORK — Metrics or Assessment]**

[info] cohesion: Subagent menu label '[EVALUATION FRAMEWORK — Metrics or Assessment]' differs from the output section it populates ('## Metric Context'). All other menu entries map consistently (e.g. '[PRIOR ART …]' → '## Prior Art', '[DOMAIN CONTEXT …]' → '## Domain Context'), so this mismatch breaks traceability symmetry.
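The traceability concern can be made mechanical. This is a hypothetical sketch that assumes menu entries follow the bold, bracketed `**[LABEL ...]**` form of the excerpt above, with a capitalized label before a dash, and that output sections are `## Title` headings. A label counts as traceable when its title-cased form matches a heading. `untraceable_menu_labels` is an illustrative name, not an existing test.

```python
import re

# Bold bracketed menu label: capitals, then whitespace and an em dash (U+2014).
MENU_LABEL = re.compile(r"^\*\*\[([A-Z][A-Z ]*?)\s+\u2014", re.MULTILINE)


def untraceable_menu_labels(skill_md: str) -> list[str]:
    """Return menu labels whose title-cased name has no matching '## ' heading."""
    headings = {
        line[3:].split("*")[0].strip()  # drop trailing notes like *(include only ...)*
        for line in skill_md.splitlines()
        if line.startswith("## ")
    }
    return [label for label in MENU_LABEL.findall(skill_md) if label.title() not in headings]


sample = (
    "**[PRIOR ART \u2014 Literature]**\n"
    "**[EVALUATION FRAMEWORK \u2014 Metrics or Assessment]**\n"
    "## Prior Art\n"
    "## Metric Context *(include only when an evaluation framework was found)*\n"
)
print(untraceable_menu_labels(sample))  # ['EVALUATION FRAMEWORK']
```

Menu entries that intentionally feed a differently named section would need an explicit allow-list, so a check like this is best treated as a prompt for a naming decision rather than a hard rule.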

Trecek (Collaborator, Author) left a comment

AutoSkillit review found 2 blocking issues. See inline comments.

Verdict: changes_requested

Actionable (warning, clear fix):

  • tests/skills/test_skill_genericization.py:72 [cohesion] — new hardcoded-reference tests belong in contracts layer, not genericization module
  • tests/skills/test_skill_genericization.py:74 [tests] — duplicate inline skill_dir path computation across both new tests (brittle path)

Needs decision (warning, ambiguous intent):

  • src/autoskillit/skills_extended/scope/SKILL.md:172 [cohesion] — scope omits ## Metric Context when no framework found, but plan-experiment always renders it; asymmetric behavior needs alignment

Info only:

  • tests/skills/test_skill_genericization.py:80 [tests] — missing test_metrics_assess assertion in plan-experiment test
  • tests/contracts/test_scope_contracts.py:41 [tests] — str.index() raises ValueError not AssertionError on missing headings
  • src/autoskillit/skills_extended/scope/SKILL.md:96 [cohesion] — EVALUATION FRAMEWORK menu label → Metric Context output section name mismatch

Trecek added this pull request to the merge queue Apr 13, 2026
github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 13, 2026
Trecek force-pushed the generalize-scope-skill-for-non-code-research/784 branch from 6817f65 to 8950095 April 13, 2026 06:41
Trecek enabled auto-merge April 13, 2026 06:54
Trecek and others added 2 commits April 13, 2026 00:12
…ode research

Replace scope's fixed 5-subagent list with a suggested menu (≥5 required),
rename software-centric report sections to domain-agnostic equivalents
(Technical Context→Domain Context, Prior Art in Codebase→Prior Art, make
Metric Context conditional), remove all src/metrics.rs hardcoding from
both scope and plan-experiment skills, and add regression guards to
test_skill_genericization.py enforcing REQ-GEN-005.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… inline path duplication

Removes the duplicated Path(__file__).parent.parent.parent / 'src/autoskillit/skills_extended'
expression in both new tests, replacing it with a module-level SKILLS_EXTENDED_DIR constant
following the existing SKILLS_DIR pattern. Addresses reviewer comment #3071115612 (REQ-GEN-005).
Trecek force-pushed the generalize-scope-skill-for-non-code-research/784 branch from 8950095 to 58a5cd7 April 13, 2026 07:12
Trecek added this pull request to the merge queue Apr 13, 2026
github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 13, 2026
Trecek added this pull request to the merge queue Apr 13, 2026
Merged via the queue into integration with commit 81bdef3 Apr 13, 2026
2 checks passed
Trecek deleted the generalize-scope-skill-for-non-code-research/784 branch April 13, 2026 07:26