feat(skill): exa-search — semantic web search + fetch_url decision rule (SAM-43)#70

Merged
spashii merged 2 commits into main from sam/exa-search-skill on May 24, 2026

@spashii spashii commented May 24, 2026

What this enables

Sam learns when to reach for Exa (semantic URL discovery) vs fetch_url (read a URL you already have). The decision rule lives in the skill's when_to_use field so Sam sees it from the catalog without reading the body.
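The rule is small enough to write down as a function. This is only an illustrative sketch: `choose_tools` is a hypothetical helper, not something in the repo, and the real rule is prose in the skill's when_to_use field rather than code.

```python
from typing import Optional


def choose_tools(known_url: Optional[str]) -> list[str]:
    """Return the ordered tool sequence for a lookup task.

    Hypothetical helper that mirrors the when_to_use rule:
    a URL in hand means fetch_url alone; otherwise discover
    candidate URLs with exa-search first, then fetch the chosen one.
    """
    if known_url:
        return ["fetch_url"]
    return ["exa-search", "fetch_url"]
```

Calling `choose_tools(None)` yields the two-step path, which is the case this skill exists for.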

Consequences

  • Sam stops trying to grep the open web by calling fetch_url on URLs it doesn't have; that pattern doesn't work, because fetch_url only reads a URL Sam already holds.
  • New cost-discipline rules in the skill body keep Exa queries lean: one query per task, refine rather than paginate, and set includeDomains when the source family is known.
  • Dual-path: src/skills/exa-search/skill.md + .agents/skills/exa-search/SKILL.md symlink for OpenCode compat (SAM-45 pattern).

How to verify

  • pytest tests/runtime/test_source_integrity.py — 27 passed (the new skill picks up parametrized frontmatter + cron tests).
  • The next time Sam needs a doc page it has no URL for, the catalog should surface this skill.

API shape verified

POST https://api.exa.ai/search, x-api-key header, numResults (camelCase), contents.highlights: true. Confirmed against https://exa.ai/docs/reference/search. My initial draft used num_results (snake_case) and would have failed at runtime.
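For reference, a minimal sketch of a request with that shape, built with the standard library only. The endpoint, the x-api-key header, numResults, contents.highlights and includeDomains come from this PR; the `build_exa_request` helper, the `query` field name, the default of 5 results and the Content-Type header are assumptions for illustration. The skill itself makes the call from bash, so this is not the shipped implementation.

```python
import json
import urllib.request

EXA_SEARCH_URL = "https://api.exa.ai/search"


def build_exa_request(query, api_key, num_results=5, include_domains=None):
    """Build (but do not send) a POST request for Exa search.

    Hypothetical helper: the "query" key and the default of 5
    results are assumptions, not taken from the skill body.
    """
    payload = {
        "query": query,
        "numResults": num_results,          # camelCase, not num_results
        "contents": {"highlights": True},   # snippets in the same response
    }
    if include_domains:
        # Narrow the search when the source family is already known.
        payload["includeDomains"] = list(include_domains)
    return urllib.request.Request(
        EXA_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_exa_request(q, os.environ["EXA_API_KEY"]))`; the builder is kept separate from the network call so the payload shape can be checked without spending a query.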

Tier

Tier 1 (skill + frontmatter). No runtime changes.

Closes SAM-43.

… decision rule

Operator added EXA_API_KEY to .env; this skill is the operating manual.

The decision rule lives in 'when_to_use' so Sam sees it from the
catalog without reading the skill body: have the URL → fetch_url;
need to find URLs → exa-search → fetch_url on the chosen one.

API shape verified against https://exa.ai/docs/reference/search:
- POST https://api.exa.ai/search
- Auth via 'x-api-key' header (or Bearer)
- numResults param is camelCase (corrected from the initial draft)
- contents.highlights: true gives snippets without a separate /contents
  endpoint (which doesn't exist — confirmed via docs)

Cost discipline section is load-bearing: Exa charges per query, and
fan-out is the most common token-trap. One query per task; refine
don't paginate; includeDomains when the source family is known.

Dual-path: src/skills/exa-search/skill.md + .agents/skills/exa-search/
SKILL.md symlink (SAM-45 dual-compat pattern from PR #66).

Closes SAM-43.
@spashii spashii enabled auto-merge May 24, 2026 16:35
linear Bot commented May 24, 2026

SAM-43

spashii added a commit that referenced this pull request May 24, 2026
…ized catalog test

Two pieces of skill-usage evaluation, both small:

1. Daily-maintenance §1 'Skill usage scan' subsection. Aggregates
   yesterday's audit log for read_file calls on src/skills/<name>/skill.md
   paths → per-skill discovery counts. Operator decision rule:
   consistent zero-reads with obvious triggers = catalog issue
   (refine frontmatter), obsolete skill (delete), or genuinely missed
   (note + watch). Counts land in §4 journal synthesis under a new
   '### Skill usage' sub-section so the trend is queryable.

2. Parametrized 'every skill in src/skills/<name>/' must appear in
   the assembled system prompt's catalog. Added at the eval-harness
   structural layer. Trigger: PR #70 shipped exa-search without any
   catalog-presence test. Under the prior single-skill check
   (test_skill_creator_visible_in_catalog), a new skill could ship
   invisible. Parametrizing fixes it for every future skill automatically.

Plus .gitignore entry for mining/ so the blog scratch from session-jsonl
mining doesn't keep leaking into PRs (the entry on the ask_operator
branch in PR #71 hasn't merged yet).

Together (1) + (2) cover the operator's catalog-presence and
discovery-observability questions on SAM-43. The deeper Opus-as-judge
eval ('did Sam apply the right skill') stays as a separate follow-up.

Tests: 9 skills defended by the parametrize. 16 eval tests pass
(was 8). Full suite: 143 passed locally.
spashii added a commit that referenced this pull request May 24, 2026
## What this enables

Adds `EXA_API_KEY` to `infra/config.yaml`'s `secrets:` map. Terraform
creates the GCP Secret Manager resource; the CI deploy binds it as a
Cloud Run env var via the existing `--set-secrets` flag.

## Why

PR #70 shipped the `exa-search` skill. It calls `api.exa.ai` using
`$EXA_API_KEY` from env. Without this binding, the skill loads in the
catalog but the bash call would fail at runtime — Sam would see the
missing env var and report it as a blocker.

## Runbook for the operator (you)

Two commands locally after merge:

```bash
(cd infra && terraform apply)         # creates GCP Secret Manager resource (idempotent); subshell keeps cwd at the repo root
./infra/scripts/upload-secrets.sh     # uploads EXA_API_KEY value from local .env
```

Then the next deploy (any merge to main) picks it up.

## Tier

Tier 3 (`infra/`). One-line config addition. No code change.

## Test plan

- [x] Diff is one line — declarative config, no logic
- [ ] After merge + your two commands + next deploy: Sam invokes the
exa-search skill in a test thread and the curl returns valid JSON (not
401)

Part of SAM-43.
@spashii spashii disabled auto-merge May 24, 2026 17:28
@spashii spashii merged commit 82760c0 into main May 24, 2026
2 checks passed
@spashii spashii deleted the sam/exa-search-skill branch May 24, 2026 17:29
spashii added a commit that referenced this pull request May 24, 2026
)

## What this enables

Two pieces of skill-usage evaluation, addressing the SAM-43 questions on
catalog verification + audit-log-based usage measurement:

### 1. Skill usage scan in daily-maintenance §1

New subsection that aggregates yesterday's audit log for `read_file`
calls on `src/skills/<name>/skill.md` paths → per-skill discovery
counts. Counts land in §4 journal synthesis under a new `### Skill
usage` sub-section so the trend is queryable over time.

Decision rule when a skill has consistent zero reads: catalog issue
(refine frontmatter), obsolete skill (delete), or genuinely missed (note
+ watch). Sam-as-LLM makes the call.
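The aggregation described above could look roughly like the sketch below. The audit-log format is an assumption: it is treated here as JSON Lines with hypothetical `tool` and `path` fields, and filtering to yesterday's entries is left out. `count_skill_reads` and `zero_read_skills` are illustrative names, not functions from the repo.

```python
import json
import re
from collections import Counter

# Matches both relative and absolute paths ending in src/skills/<name>/skill.md.
SKILL_PATH = re.compile(r"(?:^|/)src/skills/([^/]+)/skill\.md$")


def count_skill_reads(audit_lines):
    """Count read_file calls per skill from JSON Lines audit entries.

    Assumes each entry has "tool" and "path" keys (hypothetical schema).
    """
    counts = Counter()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("tool") != "read_file":
            continue
        match = SKILL_PATH.search(event.get("path", ""))
        if match:
            counts[match.group(1)] += 1
    return counts


def zero_read_skills(all_skills, counts):
    """Skills with no recorded reads, i.e. candidates for the decision rule."""
    return sorted(name for name in all_skills if counts.get(name, 0) == 0)
```

The zero-read list is only an input to the decision rule; whether a skill is mis-catalogued, obsolete or simply not needed that day still takes judgment.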

### 2. Parametrized catalog-presence test

`test_every_skill_visible_in_catalog` in `tests/eval/test_structural.py`
parametrizes over every `src/skills/<name>/` directory and asserts that
the skill name appears in the assembled system prompt. A newly added
skill is defended automatically, with no manual test addition needed.

**Trigger:** PR #70 shipped exa-search without any catalog-presence
test. Under the prior single-skill check, a new skill could ship
invisible. This parametrize fixes that for every future skill.
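A rough sketch of the check behind that test, kept free of pytest so it runs standalone. `skill_names` and `missing_from_catalog` are hypothetical helpers, the temporary directory stands in for `src/skills/`, and the prompt string stands in for the assembled system prompt; the real test may be structured differently.

```python
import tempfile
from pathlib import Path


def skill_names(skills_root: Path) -> list[str]:
    """One name per <name>/ directory directly under the skills root."""
    return sorted(p.name for p in skills_root.iterdir() if p.is_dir())


def missing_from_catalog(names, system_prompt: str) -> list[str]:
    """Skill names that never appear in the prompt text.

    Plain substring match: a name that is a prefix of another
    skill's name would pass even if its own entry were missing.
    """
    return [name for name in names if name not in system_prompt]


# Demo on a throwaway tree: two skills on disk, one listed in the prompt.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "exa-search").mkdir()
    (root / "skill-creator").mkdir()
    names = skill_names(root)
    missing = missing_from_catalog(names, "Skills available: skill-creator")
```

In the test suite, passing the discovered names to `pytest.mark.parametrize` gives one case per skill, so a failure names the invisible skill directly.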

## Consequences

- Operator sees per-skill usage trends daily without doing any querying.
- Any future skill that doesn't appear in the catalog fails CI
immediately (parametrize includes its case automatically).
- The deeper "did Sam apply the right skill" eval (Opus-as-judge) stays
as a follow-up — it's a meta-eval rubric design problem, not a build
task.

## What this doesn't cover

- Doesn't measure whether Sam applied the skill *correctly* once it was
read. Just discovery.
- Doesn't fire alerts when a skill is consistently zero-used; that's the
operator's daily-synthesis call.

## How to verify

- `pytest tests/eval/test_structural.py` — 16 passed (was 8); 9 new
parametrized cases, one per skill.
- After merge + a real cron fire: §4 journal synthesis includes a `###
Skill usage` block with per-skill counts.

## Bonus

`.gitignore` adds `mining/` so blog-scratch files from session-jsonl
mining don't keep leaking into PRs (the entry on PR #71 hasn't merged
yet).

## Tier

Tier 1 (skill prose + tests). No runtime changes.

Closes the catalog-verification + skill-usage-observability part of
SAM-43.
spashii added a commit that referenced this pull request May 24, 2026
## Why

PR #70 (merged) landed a single `.agents/skills/exa-search/SKILL.md`
symlink as a preview of the OpenCode dual-path pattern, ahead of the
runtime support that was supposed to come in via #66. Review of #66
found the SKILL.md files there aren't real symlinks (see the review
comment on #66), so #66 needs rework.

In the meantime this lone entry is:
- **Inconsistent** — the only `.agents/skills/` member on main, while
every other skill lives only at `src/skills/<name>/skill.md`.
- **Inert** — `src/runtime/prompts.py::_build_skill_catalog` globs
`src/skills/` only, so the `.agents/skills/` symlink isn't read by the
daemon.
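To illustrate the "inert" point, here is a sketch of a catalog builder that globs `src/skills/` only. `catalog_sources` is a hypothetical stand-in for `_build_skill_catalog`, whose actual body is not shown in this PR, and a plain file stands in for the symlink so the demo also runs where symlinks are unavailable.

```python
import tempfile
from pathlib import Path


def catalog_sources(repo_root: Path) -> list[str]:
    """Skill files visible to a builder that only globs src/skills/."""
    skills_root = repo_root / "src" / "skills"
    return sorted(
        p.relative_to(repo_root).as_posix()
        for p in skills_root.glob("*/skill.md")
    )


# Demo: the same skill present under both trees.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    real = root / "src" / "skills" / "exa-search" / "skill.md"
    real.parent.mkdir(parents=True)
    real.write_text("# exa-search\n")

    alias = root / ".agents" / "skills" / "exa-search" / "SKILL.md"
    alias.parent.mkdir(parents=True)
    alias.write_text("# exa-search\n")  # plain copy standing in for the symlink

    visible = catalog_sources(root)
```

Only the `src/skills/` entry is returned, so deleting the `.agents/skills/` copy changes nothing a builder of this shape would see.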

Remove it now to keep the tree consistent. Reintroduce the full
dual-path set in one go when #66 is reworked with real symlinks +
matching runtime support.

## Diff

One file: `.agents/skills/exa-search/SKILL.md` (symlink) deleted.
`src/skills/exa-search/skill.md` (the real source) is untouched, so the
skill itself still works exactly as on main today.

## How to verify

- `git ls-tree HEAD .agents/` → empty / no exa-search dir
- `pytest tests/runtime/test_source_integrity.py` — exa-search picked up
via `src/skills/exa-search/skill.md` like every other skill

## Bypass note

Admin-merging to land before #66 is reworked; the change is a one-file
deletion of an inert symlink.