feat(skill): exa-search, semantic web search + fetch_url decision rule (SAM-43) #70
Merged
… decision rule

Operator added EXA_API_KEY to .env; this skill is the operating manual. The decision rule lives in 'when_to_use' so Sam sees it from the catalog without reading the skill body: have the URL → fetch_url; need to find URLs → exa-search → fetch_url on the chosen one.

API shape verified against https://exa.ai/docs/reference/search:
- POST https://api.exa.ai/search
- Auth via 'x-api-key' header (or Bearer)
- numResults param is camelCase (corrected from the initial draft)
- contents.highlights: true returns snippets in the search response itself, so no separate contents request is needed

Cost discipline section is load-bearing: Exa charges per query, and fan-out is the most common token-trap. One query per task; refine, don't paginate; use includeDomains when the source family is known.

Dual-path: src/skills/exa-search/skill.md + .agents/skills/exa-search/SKILL.md symlink (SAM-45 dual-compat pattern from PR #66).

Closes SAM-43.
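The 'when_to_use' rule is prose that Sam reads from the catalog, but its branching can be sketched as a tiny planner. This is a hypothetical illustration, not code from the skill: `ToolStep`, `plan_lookup`, and the `<chosen result URL>` placeholder are invented here. Only the tool names `fetch_url` and `exa-search` and the `includeDomains` narrowing come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToolStep:
    tool: str                              # "fetch_url" or "exa-search"
    target: str                            # a URL for fetch_url, a query for exa-search
    include_domains: tuple[str, ...] = ()  # only used by exa-search


def plan_lookup(known_url: Optional[str], need: str,
                source_domains: Sequence[str] = ()) -> list[ToolStep]:
    """Have the URL -> fetch_url; need to find URLs -> exa-search -> fetch_url."""
    if known_url:
        # Nothing to discover, so no Exa query is spent.
        return [ToolStep("fetch_url", known_url)]
    # A single search, narrowed by domain when the source family is known.
    search = ToolStep("exa-search", need, tuple(source_domains))
    # Which URL to read is decided only after looking at the results.
    return [search, ToolStep("fetch_url", "<chosen result URL>")]
```

Keeping the search to a single planned step mirrors the cost discipline. A poor result set calls for a sharper query or tighter domains, not a fan-out of parallel searches.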
spashii added a commit that referenced this pull request on May 24, 2026
…ized catalog test

Two pieces of skill-usage evaluation, both small:

1. Daily-maintenance §1 'Skill usage scan' subsection. Aggregates yesterday's audit log for read_file calls on src/skills/<name>/skill.md paths → per-skill discovery counts. Operator decision rule: consistent zero reads with obvious triggers = catalog issue (refine frontmatter), obsolete skill (delete), or genuinely missed (note + watch). Counts land in §4 journal synthesis under a new '### Skill usage' sub-section so the trend is queryable.

2. Parametrized test: every skill in src/skills/<name>/ must appear in the assembled system prompt's catalog. Added at the eval-harness structural layer. Trigger: PR #70 shipped exa-search without any catalog-presence test; under the prior single-skill check (test_skill_creator_visible_in_catalog), a new skill could ship invisible. The parametrization fixes this for every future skill automatically.

Plus a .gitignore entry for mining/ so the blog scratch from session-jsonl mining doesn't keep leaking into PRs (the entry on the ask_operator branch in PR #71 hasn't merged yet).

Together, (1) + (2) cover the operator's catalog-presence and discovery-observability questions on SAM-43. The deeper Opus-as-judge eval ('did Sam apply the right skill') stays as a separate follow-up.

Tests: 9 skills defended by the parametrization. 16 eval tests pass (was 8). Full suite: 143 passed locally.
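The §1 scan could be approximated as below. Several details are assumptions: the audit-log schema (one JSON object per line with `tool` and `args.path` fields), the function names, and the exact line format inside the `### Skill usage` block. Only the `read_file` tool name and the `src/skills/<name>/skill.md` path pattern come from the commit.

```python
import json
import re
from collections import Counter
from typing import Iterable

# src/skills/<name>/skill.md, optionally under an absolute or repo-relative prefix.
SKILL_PATH = re.compile(r"(?:^|/)src/skills/([^/]+)/skill\.md$")


def count_skill_reads(audit_lines: Iterable[str], known_skills: Iterable[str]) -> dict[str, int]:
    """Per-skill discovery counts from one day of audit log (assumed JSON-lines schema)."""
    counts = Counter({name: 0 for name in known_skills})  # zero-read skills stay visible
    for raw in audit_lines:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines instead of failing the scan
        if not isinstance(event, dict) or event.get("tool") != "read_file":
            continue
        args = event.get("args")
        path = args.get("path", "") if isinstance(args, dict) else ""
        match = SKILL_PATH.search(str(path))
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


def render_skill_usage(counts: dict[str, int]) -> str:
    """Journal block for §4: most-read first, ties broken by name."""
    lines = ["### Skill usage"]
    for name, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        lines.append(f"- {name}: {n}")
    return "\n".join(lines)
```

The key design choice is seeding every known skill with zero. A skill that is never read produces no log lines, so counting only what appears in the log would hide exactly the zero-read cases the decision rule is meant to catch.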
spashii added a commit that referenced this pull request on May 24, 2026
## What this enables

Adds `EXA_API_KEY` to the `secrets:` map in `infra/config.yaml`. Terraform creates the GCP Secret Manager resource; the CI deploy binds it as a Cloud Run env var via the existing `--set-secrets` flag.

## Why

PR #70 shipped the `exa-search` skill. It calls `api.exa.ai` using `$EXA_API_KEY` from env. Without this binding, the skill loads in the catalog but the bash call fails at runtime; Sam would see the missing env var and report it as a blocker.

## Runbook for the operator (you)

Two commands, run locally from the repo root after merge:

```bash
(cd infra && terraform apply)       # creates the GCP Secret Manager resource (idempotent); subshell keeps cwd at repo root
./infra/scripts/upload-secrets.sh   # uploads the EXA_API_KEY value from local .env
```

Then the next deploy (any merge to main) picks it up.

## Tier

Tier 3 (`infra/`). One-line config addition. No code change.

## Test plan

- [x] Diff is one line: declarative config, no logic
- [ ] After merge + your two commands + next deploy: Sam invokes the exa-search skill in a test thread and the curl returns valid JSON (not 401)

Part of SAM-43.
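The failure mode described under "Why" can be made explicit with a preflight check before the skill's Exa call. `require_secret` and `MissingSecretError` are hypothetical names, not part of the daemon. The sketch only assumes the key arrives as the `EXA_API_KEY` environment variable, as described above.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """A skill's required secret is not bound into the process environment."""


def require_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the secret's value, or fail with an actionable message instead of a 401."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise MissingSecretError(
            f"{name} is not set in this environment; it must be listed under "
            "secrets: in infra/config.yaml, uploaded, and bound on the next deploy"
        )
    return value
```

Passing `env` explicitly keeps the check testable without mutating `os.environ`.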
spashii added a commit that referenced this pull request on May 24, 2026
## What this enables

Two pieces of skill-usage evaluation, addressing the SAM-43 questions on catalog verification + audit-log-based usage measurement:

### 1. Skill usage scan in daily-maintenance §1

New subsection that aggregates yesterday's audit log for `read_file` calls on `src/skills/<name>/skill.md` paths → per-skill discovery counts. Counts land in §4 journal synthesis under a new `### Skill usage` sub-section so the trend is queryable over time.

Decision rule when a skill has consistent zero reads: catalog issue (refine frontmatter), obsolete skill (delete), or genuinely missed (note + watch). Sam-as-LLM makes the call.

### 2. Parametrized catalog-presence test

`test_every_skill_visible_in_catalog` in `tests/eval/test_structural.py` parametrizes over every `src/skills/<name>/` directory and asserts the skill name appears in the assembled system prompt. A newly added skill is defended automatically; no manual test addition needed.

**Trigger:** PR #70 shipped exa-search without any catalog-presence test. Under the prior single-skill check, a new skill could ship invisible. This parametrization fixes that for every future skill.

## Consequences

- Operator sees per-skill usage trends daily without doing any querying.
- Any future skill that doesn't appear in the catalog fails CI immediately (the parametrization includes its case automatically).
- The deeper "did Sam apply the right skill" eval (Opus-as-judge) stays as a follow-up: it's a meta-eval rubric design problem, not a build task.

## What this doesn't cover

- Doesn't measure whether Sam applied the skill *correctly* once it was read. Just discovery.
- Doesn't fire alerts when a skill is consistently zero-used; that's the operator's daily-synthesis call.

## How to verify

- `pytest tests/eval/test_structural.py`: 16 passed (was 8); 9 new parametrized cases, one per skill.
- After merge + a real cron fire: §4 journal synthesis includes a `### Skill usage` block with per-skill counts.

## Bonus

`.gitignore` adds `mining/` so blog-scratch files from session-jsonl mining don't keep leaking into PRs (the entry on PR #71 hasn't merged yet).

## Tier

Tier 1 (skill prose + tests). No runtime changes.

Closes the catalog-verification + skill-usage-observability part of SAM-43.
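A test of this shape could be built from two small helpers. The sketch rests on a few assumptions. It runs from the repo root. It skips `_`- and `.`-prefixed directories such as `__pycache__`. The commented pytest wiring calls a hypothetical `assemble_system_prompt()`, because the real prompt-assembly entry point isn't shown here.

```python
from pathlib import Path

SKILLS_ROOT = Path("src/skills")  # assumes the tests run from the repo root


def skill_names(root: Path = SKILLS_ROOT) -> list[str]:
    """One entry per src/skills/<name>/ directory, skipping __pycache__-style dirs."""
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and not p.name.startswith(("_", "."))
    )


def assert_skill_in_catalog(name: str, system_prompt: str) -> None:
    """The structural check: the skill's name must appear in the assembled prompt."""
    assert name in system_prompt, f"skill {name!r} is missing from the catalog"


# Wiring in tests/eval/test_structural.py might look like this
# (assemble_system_prompt is a hypothetical stand-in for the real entry point):
#
# @pytest.mark.parametrize("name", skill_names())
# def test_every_skill_visible_in_catalog(name):
#     assert_skill_in_catalog(name, assemble_system_prompt())
```

Because `skill_names()` is evaluated when pytest collects the module, a new `src/skills/<name>/` directory becomes a new test case with no edits to the test file. The plain substring check matches the assertion described above. It would also pass when the name appears only inside a longer skill name, so a stricter variant might match whole catalog entries.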
spashii added a commit that referenced this pull request on May 24, 2026
## Why

PR #70 (merged) landed a single `.agents/skills/exa-search/SKILL.md` symlink as a preview of the OpenCode dual-path pattern, ahead of the runtime support that was supposed to come in via #66. Review of #66 found the SKILL.md files there aren't real symlinks (see #66 (comment)), so #66 needs rework. In the meantime this lone entry is:

- **Inconsistent**: the only `.agents/skills/` member on main, while every other skill lives only at `src/skills/<name>/skill.md`.
- **Inert**: `src/runtime/prompts.py::_build_skill_catalog` globs `src/skills/` only, so the `.agents/skills/` symlink isn't read by the daemon.

Remove it now to keep the tree consistent. Reintroduce the full dual-path set in one go when #66 is reworked with real symlinks + matching runtime support.

## Diff

One file: `.agents/skills/exa-search/SKILL.md` (symlink) deleted. `src/skills/exa-search/skill.md` (the real source) is untouched, so the skill itself still works exactly as on main today.

## How to verify

- `git ls-tree HEAD .agents/` → empty; no exa-search dir
- `pytest tests/runtime/test_source_integrity.py`: exa-search picked up via `src/skills/exa-search/skill.md` like every other skill

## Bypass note

Admin-merging to land before #66 is reworked; the change is a one-file deletion of an inert symlink.
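The symlink's inertness can be seen from a catalog builder that, like `_build_skill_catalog` as described above, only globs `src/skills/`. This is a hypothetical stand-in, not the real function. `build_skill_catalog`, the minimal `---` frontmatter parser, and the `- name: when_to_use` line format are all assumptions.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Minimal parser for a leading '---' block of 'key: value' lines (assumed format)."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_skill_catalog(repo_root: Path) -> str:
    """One catalog line per src/skills/<name>/skill.md.

    .agents/skills/ is never globbed, so a SKILL.md symlink there has no effect.
    """
    entries = []
    for skill_md in sorted((repo_root / "src" / "skills").glob("*/skill.md")):
        meta = read_frontmatter(skill_md)
        name = meta.get("name", skill_md.parent.name)
        entries.append(f"- {name}: {meta.get('when_to_use', '')}")
    return "\n".join(entries)
```

Under this model, adding `.agents/skills/` entries changes nothing the daemon sees until the glob itself is extended. That is why the dual-path set is deferred until runtime support lands.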
## What this enables

Sam learns when to reach for Exa (semantic URL discovery) vs `fetch_url` (read a URL you already have). The decision rule lives in the skill's `when_to_use` field so Sam sees it from the catalog without reading the body.

## Consequences

- … `fetch_url` against URLs it doesn't have: that pattern doesn't actually work.
- … `includeDomains` when the source family is known.
- `src/skills/exa-search/skill.md` + `.agents/skills/exa-search/SKILL.md` symlink for OpenCode compat (SAM-45 pattern).

## How to verify

- `pytest tests/runtime/test_source_integrity.py`: 27 passed (the new skill picks up parametrized frontmatter + cron tests).

## API shape verified

`POST https://api.exa.ai/search`, `x-api-key` header, `numResults` (camelCase), `contents.highlights: true`. Confirmed against https://exa.ai/docs/reference/search. The initial draft I had used `num_results` (snake_case) and would have failed at runtime.

## Tier

Tier 1 (skill + frontmatter). No runtime changes.

Closes SAM-43.
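The verified request shape can be sketched with the standard library. The `x-api-key` header and the `numResults`, `contents.highlights`, and `includeDomains` fields are the ones listed in this PR, and `query` is Exa's search-text field. The function name `build_exa_search_request` and the default of 5 results are illustrative choices. The function only builds the request and never sends it.

```python
import json
import urllib.request
from typing import Optional, Sequence

EXA_SEARCH_URL = "https://api.exa.ai/search"


def build_exa_search_request(api_key: str, query: str, num_results: int = 5,
                             include_domains: Optional[Sequence[str]] = None
                             ) -> urllib.request.Request:
    body: dict[str, object] = {
        "query": query,
        "numResults": num_results,         # camelCase; snake_case was the draft bug
        "contents": {"highlights": True},  # snippets come back in the search response
    }
    if include_domains:
        # Narrow one query to a known source family instead of fanning out.
        body["includeDomains"] = list(include_domains)
    return urllib.request.Request(
        EXA_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a single `urllib.request.urlopen(...)` call, which is one billed Exa query.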