diff --git a/src/skills/daily-maintenance/skill.md b/src/skills/daily-maintenance/skill.md
index c176006..6dac319 100644
--- a/src/skills/daily-maintenance/skill.md
+++ b/src/skills/daily-maintenance/skill.md
@@ -60,6 +60,38 @@ for (tool, prefix), n in groups.most_common(30):
 
 **The predicate.** An actionable audit finding is **friction-pattern × not-already-covered**, not raw frequency. Frequency alone is noise — it just says Sam used a tool a lot. Friction × not-covered says Sam paid time-cost from the lack of documentation and no existing rule helps.
 
+### Skill usage scan
+
+A separate aggregation — independent of the friction-pattern scan above. Goal: see which skills Sam actually *discovered* yesterday (i.e., `read_file` on `src/skills//skill.md`). A skill never read across multiple sessions where its `when_to_use` plausibly applied is either undiscovered (catalog problem) or unneeded (delete-it candidate).
+
+```bash
+gcloud storage cat gs://${GCS_DATA_BUCKET:-dembrane-sameer-cli-sam-data}/tool_calls/$(date -u -d yesterday +%F).jsonl \
+  | python3 -c "
+import sys, json, re, collections
+pat = re.compile(r'src/skills/([a-z0-9-]+)/skill\.md')
+reads = collections.Counter()
+for line in sys.stdin:
+    line = line.strip()
+    if not line: continue
+    try: d = json.loads(line)
+    except: continue
+    if d.get('tool') != 'read_file': continue
+    fp = (d.get('args') or {}).get('file_path') or ''
+    m = pat.search(fp)
+    if m: reads[m.group(1)] += 1
+for name, n in reads.most_common(30):
+    print(f'{n:3d} {name}')
+"
+```
+
+Compare against the list of all skills (`ls src/skills/`). Skills with zero reads yesterday are not automatically a problem — many skills only fire on specific triggers. But over a multi-day window, a skill with consistent zeros while its `when_to_use` triggers obviously fired in the audit log is a real signal:
+
+- Catalog issue → the skill's `name` / `description` / `when_to_use` isn't surfacing the trigger Sam should pattern-match on. Open a Tier 1 PR refining the frontmatter.
+- Skill obsolete → the pattern it codifies has been absorbed into a capability, or the workflow no longer exists. Open a Tier 1 PR to remove the skill (skills accumulate; deleting them is fine).
+- Sam genuinely missed it → the skill is right but Sam didn't catch the trigger. Less common; harder to fix with prose. Note in journal, watch over a week before acting.
+
+§4's daily synthesis appends the per-skill count under `### Skill usage` so future-Sam (and the operator) can see the trend.
+
 ## 2. Propose changes if there's substance
 
 If reflection surfaces something concrete to codify, first decide where it belongs using `src/capabilities/self-maintenance.md` ("Where does a change belong?"), then open self-PRs via the same file's flow. **No artificial cap on how many** — open one per distinct concept.
@@ -138,6 +170,7 @@ Then append a `## Daily synthesis` section to **yesterday's** journal entry (clo
 - proposed PRs (from §2 and §3 combined) — one line each. Lead with the behavior change Sam is proposing so future-Sam can grep on intent. The PR number/title is the reference at the end of the line, not the headline.
 - open threads to pick up today — one line each, named by what Sam should do today, not what happened yesterday.
 - if the active-blocker count changed yesterday, one line: `blockers: active (see Linear)`. Don't re-list blocker details — they're in Linear.
+- a sub-section **`### Skill usage`** — the per-skill `read_file` counts from §1's skill-usage scan, plus any notable zero-usage skills with their flag (catalog issue / obsolete / Sam missed it). One short paragraph; if every skill got expected use, just `(typical usage shape)`.
 
 If none of the items above has content, skip the section entirely. Don't write a "nothing to say" placeholder.
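The comparison step in the skill-usage scan (all skills versus yesterday's reads, then consistent zeros across a multi-day window) can be sketched in plain Python. This is a minimal sketch, not part of the PR: it assumes each day's `tool_calls/*.jsonl` has already been fetched into a list of lines with the same record shape the one-liner reads (`tool`, `args.file_path`), and the helpers `skill_reads` and `consistent_zero_skills` are hypothetical names.

```python
import json
import re
from collections import Counter
from typing import Iterable, Sequence

# Same path pattern the bash one-liner uses to spot skill-body reads.
SKILL_READ = re.compile(r"src/skills/([a-z0-9-]+)/skill\.md")


def skill_reads(jsonl_lines: Iterable[str]) -> Counter:
    """Count read_file calls per skill name for one day's tool_calls log."""
    reads = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines, as the one-liner does
        if not isinstance(record, dict) or record.get("tool") != "read_file":
            continue
        file_path = (record.get("args") or {}).get("file_path") or ""
        match = SKILL_READ.search(file_path)
        if match:
            reads[match.group(1)] += 1
    return reads


def consistent_zero_skills(
    all_skills: Iterable[str], days: Sequence[Iterable[str]]
) -> list:
    """Skills with zero reads on every day in the window.

    `all_skills` stands in for `ls src/skills/`; `days` holds one list of
    JSONL lines per log file (hypothetical input shape).
    """
    if not days:
        return []  # no logs means no signal, not "every skill is unused"
    per_day = [skill_reads(lines) for lines in days]
    # Counter returns 0 for missing keys, so unread skills compare as zero.
    return sorted(
        name for name in all_skills
        if all(counts[name] == 0 for counts in per_day)
    )


if __name__ == "__main__":
    day1 = [
        '{"tool": "read_file", "args": {"file_path": "src/skills/exa-search/skill.md"}}',
        '{"tool": "shell", "args": {"command": "ls"}}',
        "not json",
    ]
    day2 = [
        '{"tool": "read_file", "args": {"file_path": "/repo/src/skills/skill-creator/skill.md"}}',
    ]
    skills = ["daily-maintenance", "exa-search", "skill-creator"]
    print(consistent_zero_skills(skills, [day1, day2]))  # → ['daily-maintenance']
```

A consistent zero only flags a skill for review; deciding whether it is a catalog issue, obsolete, or a missed trigger still needs the audit log, as the three bullets in the scan describe.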
diff --git a/tests/eval/test_structural.py b/tests/eval/test_structural.py
index 28c119c..50b5465 100644
--- a/tests/eval/test_structural.py
+++ b/tests/eval/test_structural.py
@@ -156,6 +156,11 @@ def test_skill_creator_visible_in_catalog():
     skill catalog. If Sam can't find the skill, Sam can't apply its
     sample-before-codifying rule — and the next refactored predicate
     hallucinates the way PR #46 did.
+
+    Kept as a named test (not just covered by the parametrized
+    `test_every_skill_visible_in_catalog` below) because skill-creator
+    is load-bearing for *other* skills — losing it from the catalog
+    would silently degrade the whole skill-authoring loop.
     """
     sp = assemble_system_prompt()
     assert "skill-creator" in sp, (
@@ -164,6 +169,51 @@ def test_skill_creator_visible_in_catalog():
     )
 
 
+def _all_skill_dirs() -> list:
+    """Enumerate every src/skills// with a skill.md. Used by the
+    parametrized catalog-presence test below. Computed from `__file__`
+    (not from SAM_SRC) because pytest parametrization runs at
+    collection time, before the autouse SAM_SRC fixture fires."""
+    from pathlib import Path
+    repo_root = Path(__file__).resolve().parent.parent.parent
+    skills_dir = repo_root / "src" / "skills"
+    if not skills_dir.exists():
+        return []
+    return sorted(
+        p for p in skills_dir.iterdir()
+        if p.is_dir()
+        and not p.name.startswith("_")
+        and (p / "skill.md").exists()
+    )
+
+
+@pytest.mark.parametrize("skill_dir", _all_skill_dirs(), ids=lambda p: p.name)
+def test_every_skill_visible_in_catalog(skill_dir):
+    """Every skill in `src/skills//` must appear in the assembled
+    system prompt's catalog. A skill that's been added to the file
+    system but doesn't surface in the prompt is functionally invisible
+    — Sam can't decide to apply a skill it doesn't know exists.
+
+    Catalog discoverability is the load-bearing invariant: skills are
+    catalog-only by design (bodies are read on demand) so the catalog
+    entry IS the discovery surface. Parametrized so adding a new skill
+    automatically gets defended — no manual test addition needed.
+
+    The trigger for adding this test: PR #70 shipped a new `exa-search`
+    skill that wasn't covered by any catalog-presence test. Without
+    this parametrized check, the next new skill could ship invisible
+    the same way.
+    """
+    sp = assemble_system_prompt()
+    name = skill_dir.name
+    assert name in sp, (
+        f"skill {name!r} exists at {skill_dir} but doesn't appear in "
+        "the assembled system prompt's catalog — Sam won't discover it. "
+        "Check the skill's frontmatter (name/description/when_to_use) "
+        "and the catalog assembly in `src/runtime/prompts.py`."
+    )
+
+
 def test_skill_creator_still_carries_sampling_rule():
     """The body of `src/skills/skill-creator/skill.md` must still
     contain the sample-before-codifying rule. The catalog only carries the