discovery mining
github-actions[bot] edited this page Jun 13, 2026 · 2 revisions
---
parent_design: ../../memoryvault.md
part_slug: discovery-mining
title: "Discovery + mining — transcript reflection + personal-skills indexer + internet skill-scan"
status: pending
visibility: published
author: Alex Herrero
contributors: []
created: 2026-05-15
updated: 2026-05-15
last_major_revision: 2026-05-15
dependencies: [write-primitives, recall-loop, reflection-and-recovery, idea-ledger, seed-pass]
estimated_scope: L
plan: "7b"
prd:
project:
---
Parent design: MemoryVault — see Detailed Design §8 (Discovery + mining) for full architectural context. This is the single part for plan #7b — ships after #7a has been dogfooded for 1-2 weeks of real use, so designs can absorb lessons from #7a before adding the discovery layer.
Three sub-components, all in plan #7b:
1. Transcript reflection pass (one-time + ongoing):
- One-time pass: run the reflection-and-recovery logic against the historical Claude Code transcripts at `~/.claude/projects/*/`. Mines months of past sessions for the 3 extraction categories + idea candidates, retroactively populating MemoryVault from work that pre-dates the skill.
- Ongoing pass: the existing Stop + idle hooks (shipped in `reflection-and-recovery` part) already handle each new session. This sub-component just extends them to optionally scan transcripts from sessions that ran on other machines or other Claude Code installations the user might have.
- Seed manifest awareness: this pass reads `MemoryVault/_meta/seed-manifest-20260515.md` (written by the `seed-pass` part) and skips re-capturing content that was already seeded, which avoids duplication.
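The seed-manifest check above can be sketched as a digest lookup. This is a minimal sketch under a loud assumption: the manifest format is defined by the `seed-pass` part and is not specified here, so the code hypothetically treats it as a text file containing one SHA-256 hex digest per already-seeded item. The function names (`content_digest`, `load_seeded_digests`, `filter_unseeded`) are illustrative only.

```python
import hashlib
import re

# Assumption: the manifest contains 64-char lowercase hex digests somewhere in its text.
HEX64 = re.compile(r"\b[0-9a-f]{64}\b")


def content_digest(text: str) -> str:
    """Digest of whitespace-normalized text, so trivial reflow does not defeat dedup."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_seeded_digests(manifest_text: str) -> set[str]:
    """Collect every digest listed in the (hypothetical) manifest format."""
    return set(HEX64.findall(manifest_text))


def filter_unseeded(candidates: list[str], seeded: set[str]) -> list[str]:
    """Drop candidates already in the manifest, and duplicates within this batch."""
    seen = set(seeded)
    fresh = []
    for text in candidates:
        digest = content_digest(text)
        if digest in seen:
            continue
        seen.add(digest)
        fresh.append(text)
    return fresh
```

Hashing normalized text only catches exact re-captures. If the real pass needs fuzzy matching against seeded content, that would be a different technique from the one sketched here.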
2. Personal-skills auto-indexer:
- Walks every `SKILL.md` in `crickets/skills/` + `agentm/.claude/skills/` (plus any other sibling-installed repos with skill directories).
- Writes one MemoryVault entry per SKILL.md to `MemoryVault/personal-skills/<repo>/<skill-name>.md` with frontmatter (`kind: skill-pointer`, `source_path`, `source_repo`, `last_indexed`, `skill_version`).
- Entry body = skill manifest summary + the skill's `description` + key sub-commands + tool allowlist.
- Runs at toolkit install time + on `/release` events + on-demand via `/memory index-skills`.
- Pre-hook injection (from `recall-loop` part) merges entries from BOTH `personal-private/` (preferences) AND `personal-skills/` (available skills) at query time, so the agent learns "we have a `/design author` skill" without being told every session.
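The file walk and pointer write described above might look roughly like the following. This is a sketch, not the implementation: it assumes a `<skills-dir>/<skill-name>/SKILL.md` layout, takes `skill_version` as a caller-supplied value because the source of that field is not specified, uses a one-line stand-in for the entry body, and writes files directly even though the real indexer is expected to go through `/memory save`. `render_pointer` and `index_skills` are hypothetical names.

```python
from datetime import date
from pathlib import Path


def render_pointer(skill_md: Path, repo: str, version: str, today: date) -> str:
    """Build a skill-pointer entry: frontmatter plus a one-line stand-in summary."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    # Stand-in summary (assumption): first non-empty line that is not a heading.
    summary = next(
        (ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")), ""
    )
    return "\n".join([
        "---",
        "kind: skill-pointer",
        f"source_path: {skill_md.as_posix()}",
        f"source_repo: {repo}",
        f"last_indexed: {today.isoformat()}",
        f"skill_version: {version}",
        "---",
        summary,
        "",
    ])


def index_skills(repo_root: Path, skills_subdir: str, vault_root: Path,
                 today: date, version: str = "unknown") -> list[Path]:
    """Write one pointer entry per SKILL.md found under repo_root/skills_subdir."""
    repo = repo_root.name
    written = []
    for skill_md in sorted((repo_root / skills_subdir).glob("*/SKILL.md")):
        skill_name = skill_md.parent.name
        out = vault_root / "personal-skills" / repo / f"{skill_name}.md"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(render_pointer(skill_md, repo, version, today), encoding="utf-8")
        written.append(out)
    return written
```

Calling `index_skills(Path("crickets"), "skills", Path("MemoryVault"), date.today())` and then the same for `agentm` with `.claude/skills` would cover the two repos named above. Overwriting on each run keeps re-indexing idempotent.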
3. Internet skill-discovery scan:
- Periodic scan (cadence TBD; weekly default; configurable via `memory.skill_discovery_cadence`) of curated sources for SKILL.md-shaped patterns worth adopting.
- Source whitelist (TBD precise list, to settle during implementation): GitHub trending with `claude-code` / `agent-skills` / `claude-skills` tags + named awesome-lists (curated by the user during implementation) + Anthropic Cookbook + named blog feeds. The source whitelist is itself a MemoryVault entry at `MemoryVault/personal-private/skill-discovery-sources.md` so the user can edit it without code changes.
- Adapt-don't-import principle: when a relevant pattern is found, the agent does NOT fork the SKILL.md into `crickets/skills/`. Instead:
  - The agent surfaces the pattern as an idea candidate (similar shape to the idea-ledger flow).
  - Writes a `personal-skill-watchlist` MemoryVault entry capturing what's interesting about the pattern + what would need adapting for personal use + a link to the source.
  - The user (not the agent) reviews the entry + decides whether to author an actual skill in `crickets/` that adapts the pattern.
- Surface location for the watchlist (TBD, to settle during implementation): one candidate is `MemoryVault/personal-private/_skill-watchlist/<source-slug>/<pattern-slug>.md` (dedicated dir, distinct from `_idea-incubator/` because skill discoveries are a specific shape).
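To make the candidate watchlist layout concrete, here is a small sketch that derives the `<source-slug>/<pattern-slug>.md` path and renders an entry with the three things the adapt-don't-import workflow asks for. The location is still TBD, so the path is only the candidate named above. The slug rule, the `kind: personal-skill-watchlist` frontmatter key, and the function names are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Candidate location from the design; still TBD.
WATCHLIST_ROOT = PurePosixPath("MemoryVault/personal-private/_skill-watchlist")


def slugify(text: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', trim edge dashes (assumed rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "untitled"


def watchlist_path(source: str, pattern: str) -> PurePosixPath:
    """Path of the watchlist entry for a pattern found at a given source."""
    return WATCHLIST_ROOT / slugify(source) / f"{slugify(pattern)}.md"


def render_watchlist_entry(pattern: str, interesting: str,
                           adaptation: str, source_url: str) -> str:
    """Entry text: what is interesting, what needs adapting, and the source link."""
    return "\n".join([
        "---",
        "kind: personal-skill-watchlist",  # assumed frontmatter key/value
        f"source: {source_url}",
        "---",
        f"# {pattern}",
        "",
        f"What's interesting: {interesting}",
        f"What would need adapting: {adaptation}",
        f"Source: {source_url}",
        "",
    ])
```

Note that nothing in this sketch touches `crickets/skills/`. The entry is a note for the user to review, not a forked skill.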
Cadence + behavior:
- Internet skill-scan runs in the background via the idle-time hook (extends the existing idle infrastructure from `reflection-and-recovery`).
- High-confidence patterns (strong match against user's existing skill preferences in MemoryVault) → surface immediately in interactive review.
- Medium/low-confidence patterns → land in `_skill-watchlist/` for batch review (`/memory watchlist` command shows the current backlog).
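The cadence check and the confidence routing above reduce to two small decisions. The sketch below assumes a numeric confidence score in `[0, 1]` and a hypothetical high-confidence threshold of `0.8`; the design does not fix either, and the weekly default is the only value taken from the text. `scan_due`, `route_pattern`, and `Route` are illustrative names.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Route(Enum):
    INTERACTIVE_REVIEW = "interactive-review"
    WATCHLIST = "_skill-watchlist"


DEFAULT_CADENCE = timedelta(weeks=1)  # weekly default from the design
HIGH_CONFIDENCE = 0.8                 # hypothetical threshold, to be tuned


def scan_due(last_scan: Optional[datetime], now: datetime,
             cadence: timedelta = DEFAULT_CADENCE) -> bool:
    """True if no scan has run yet or at least one cadence interval has elapsed."""
    return last_scan is None or now - last_scan >= cadence


def route_pattern(confidence: float, high: float = HIGH_CONFIDENCE) -> Route:
    """High-confidence patterns surface immediately; everything else is batched."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Route.INTERACTIVE_REVIEW if confidence >= high else Route.WATCHLIST
```

The idle hook would call `scan_due` first and only run the scan when it returns true. Passing a short `cadence` such as `timedelta(minutes=1)` gives the fast schedule useful for testing.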
Dependencies: all five #7a parts.
- `write-primitives`: every component writes via `/memory save` (skill-pointer entries + watchlist entries + transcript-mined entries).
- `recall-loop`: personal-skills entries become recall-targets at query time; the indexer needs the recall infrastructure to validate that entries surface correctly.
- `reflection-and-recovery`: the transcript-mining sub-component reuses the reflection logic; internet skill-scan extends the idle-time hook.
- `idea-ledger`: the adapt-don't-import workflow produces watchlist entries that look like idea-ledger entries (deep research at capture time, promotion path) and reuses the idea-ledger primitives.
- `seed-pass`: the transcript-reflection one-time pass needs the seed manifest to know what's already captured.
#7b also depends on #7a being dogfooded for 1-2 weeks before this part starts — gives time for lessons to feed back into the discovery-mining design.
- Transcript reflection one-time pass works: point at `~/.claude/projects/*/`; verify it processes historical sessions; verify entries land in MemoryVault per tri-modal routing (HIGH/MEDIUM/LOW); verify the pass respects the seed manifest and skips already-seeded content.
- Transcript reflection respects rate limits: large transcript backlogs shouldn't blow through the embedding API; verify the pass batches + paces; verify it can be paused + resumed.
- Personal-skills indexer runs at install time: install crickets fresh; verify `personal-skills/` populated with one entry per SKILL.md in toolkit + harness; verify entries have correct frontmatter.
- Personal-skills indexer runs on `/release`: simulate a `/release` event; verify the indexer re-scans + updates entries for any skill that changed version.
- Personal-skills entries surface in recall: submit a prompt that should match a skill capability; verify the personal-skills entry surfaces alongside personal-private entries.
- Internet skill-scan runs on schedule: set cadence to "every 1 min" for testing; verify the idle hook fires the scan; verify it pulls from the source whitelist; verify it produces watchlist entries for matching patterns.
- Source whitelist editable via vault file: modify `MemoryVault/personal-private/skill-discovery-sources.md`; verify the next scan respects the change (no code change needed).
- Adapt-don't-import workflow: fixture a discovered pattern; verify the agent writes a `_skill-watchlist/` entry (NOT a fork into `crickets/skills/`); verify the entry captures what to adapt + source link.
- Watchlist review command: `/memory watchlist` outputs the current backlog; user can promote / dismiss / defer entries.
- All 3 OS CI workflows green on the commit that lands this part.
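The "batches + paces" and "paused + resumed" checks for transcript reflection suggest a loop with a persisted cursor. The following is a minimal sketch under assumptions: the checkpoint is a small JSON file holding the index of the next unprocessed item, `handle_batch` stands in for whatever calls the embedding API, and the batch size and pause are placeholder values. None of these names come from the design.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional, Sequence


def process_backlog(items: Sequence[str],
                    handle_batch: Callable[[list], None],
                    checkpoint: Path,
                    batch_size: int = 16,
                    pause_s: float = 1.0,
                    max_batches: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Process items in paced batches, resuming from the checkpoint if one exists.

    Returns the index of the next unprocessed item.
    """
    pos = json.loads(checkpoint.read_text())["next"] if checkpoint.exists() else 0
    done = 0
    while pos < len(items):
        if max_batches is not None and done >= max_batches:
            break  # "pause": stop early, the checkpoint keeps our place
        if done:
            sleep(pause_s)  # pace between batches to stay under rate limits
        batch = list(items[pos:pos + batch_size])
        handle_batch(batch)
        pos += len(batch)
        # Persist progress only after the batch succeeded.
        checkpoint.write_text(json.dumps({"next": pos}))
        done += 1
    return pos
```

Because the checkpoint is written after each successful batch, an interrupted run repeats at most one batch on resume, which is another reason the downstream writes should be idempotent or deduplicated.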
- This is plan #7b's single part. It's large in scope (3 sub-components) but each sub-component is bounded. Consider whether to ship as one PR or three smaller PRs; that is the user's call when starting the implementing session.
- The personal-skills indexer is the simplest sub-component (file walks + writes); ship it first to validate the indexer pattern, then layer transcript-mining and internet-scan on top.
- The internet skill-scan source whitelist is load-bearing for not-flooding-the-watchlist. Start narrow (Anthropic Cookbook + 1-2 specific awesome-lists) and expand based on actual signal quality.
- The adapt-don't-import principle is the architectural commitment that protects against agent-driven skill bloat in `crickets/`. The agent should NEVER author `crickets/skills/<x>/SKILL.md` files autonomously; only the user does that, after reviewing a watchlist entry. Bake this hard rule into the implementing skill body.
- Transcript-mining for historical sessions will likely produce thousands of candidates. The tri-modal routing means most will land in `_inbox/` for batch review, so plan for inbox-management UX (`/memory inbox --bulk-review` or similar) before kicking off the historical pass.
- Plan #7b ships after #7a is dogfooded, so by the time this part is implemented, the user has 1-2 weeks of real recall-quality feedback. Use that data to tune any defaults in this part (cadence, confidence thresholds, source whitelist).
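One way to back the "agent never authors `crickets/skills/`" rule with code, in addition to stating it in the skill body, is a path guard on agent-initiated writes. This is a sketch of that idea only; the design does not say where such a check would live, and `assert_agent_may_write` is a hypothetical helper.

```python
from pathlib import PurePosixPath

PROTECTED = ("crickets", "skills")


def assert_agent_may_write(target: str) -> None:
    """Raise PermissionError if an agent write would land under crickets/skills/."""
    parts = PurePosixPath(target).parts
    if ".." in parts:
        # Refuse rather than try to resolve traversal in a purely lexical check.
        raise PermissionError(f"refusing path with '..': {target}")
    for i in range(len(parts) - 1):
        if parts[i:i + 2] == PROTECTED:
            raise PermissionError(f"agent may not write under crickets/skills/: {target}")
```

The check is lexical, so it does not follow symlinks. It still allows watchlist and `personal-skills/` writes inside MemoryVault, which is where agent output is meant to go.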