A design-structure idea for large Skill libraries: sparse activation + missed-case sweep + budgeted references #1288

9s5bz2jvd2-lang · 2026-06-08T14:23:00Z


Hi, thank you for publishing and maintaining this Skills repository. The recent public writing from Anthropic and Perplexity helped me understand Skills as more than prompt snippets: a Skill can be a structured folder with SKILL.md, scripts, references, assets, configuration, and progressive disclosure.

I would like to share a design-structure idea for large Skill libraries. This is not a bug report, and I am not asking for an immediate Claude Code runtime change. It is a discussion proposal about how Skills might be structured when a library grows from a few Skills to many domain-specific Skills.

My inspiration comes from two places:

DeepSeek / MoE-style sparse activation: many experts exist, but each input activates only a few relevant experts while a shared path remains available.
Clinical nutrition workflow experience: in patient-facing nutrition work, I first prioritize the most likely problem, but still run a short red-flag / contraindication / special-population / evidence sweep to avoid missing important details.

I am not suggesting that Skills are equivalent to model-weight experts. The analogy is at the Skill-design level: many Skills may exist, but each task should activate only a few relevant ones, while a small shared layer and a missed-case checklist help preserve safety and completeness.

The current foundation

From the public guidance, the foundation already seems clear:

A Skill is a folder, not just a Markdown file.
SKILL.md should act as the entry point.
Heavy references should be progressively disclosed.
Scripts should handle deterministic work.
Assets/templates/schemas can make outputs more reliable.
Skill quality should be improved through review and usage feedback.

The question I am interested in is:

Once a Skill library becomes large, can the Skill folder structure also make routing, adjacency, context budget, and missed-case checks more explicit?

Proposed structure: Skill as a graph node

For large libraries, each Skill could optionally be treated as a node in a Skill graph rather than only as an isolated folder.

One possible structure:

my-skill/
  SKILL.md            # minimal executable entry: when to use, first steps, must-not-miss items
  ROUTING.yaml        # triggers, anti-triggers, neighbors, budget hints
  CACHE.md            # stable-prefix vs variable-suffix guidance
  GRAPH.md            # upstream/downstream/adjacent/mutually-exclusive/safety-gate relationships
  scripts/            # deterministic tools
  references/         # heavy docs loaded only when needed
  assets/             # templates, schemas, examples
  evals/              # trigger accuracy, omission, and safety tests

These file names are only illustrative. The deeper idea is that a Skill can declare:

when it should be triggered,
when it should not be triggered,
which neighboring Skills should be checked,
what content can stay lightweight,
what references should only load on demand,
what must-not-miss cases should be swept before final output.
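Here is a minimal Python sketch of one Skill node that makes the six declarations above concrete. It assumes the routing metadata has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`). The field names, the `SkillNode` class, and `node_from_metadata` are hypothetical illustrations, not an existing Skills API.

```python
from dataclasses import dataclass, field

# Hypothetical edge kinds, mirroring the GRAPH.md relationships listed above.
NEIGHBOR_KINDS = {"upstream", "downstream", "adjacent", "mutually_exclusive", "safety_gate"}


@dataclass
class SkillNode:
    name: str
    triggers: list[str]                                     # when it should be triggered
    anti_triggers: list[str] = field(default_factory=list)  # when it should not be triggered
    neighbors: dict[str, list[str]] = field(default_factory=dict)  # edge kind -> Skill names
    summary: str = ""                                       # lightweight text that may stay loaded
    references: list[str] = field(default_factory=list)     # heavy files, loaded only on demand
    must_not_miss: list[str] = field(default_factory=list)  # items swept before final output


def node_from_metadata(meta: dict) -> SkillNode:
    """Build a SkillNode from an already-parsed ROUTING.yaml-style dict (hypothetical schema)."""
    node = SkillNode(
        name=meta["name"],
        triggers=list(meta.get("triggers", [])),
        anti_triggers=list(meta.get("anti_triggers", [])),
        neighbors={kind: list(names) for kind, names in meta.get("neighbors", {}).items()},
        summary=meta.get("summary", ""),
        references=list(meta.get("references", [])),
        must_not_miss=list(meta.get("must_not_miss", [])),
    )
    # Reject relationship names outside the agreed vocabulary.
    unknown = set(node.neighbors) - NEIGHBOR_KINDS
    if unknown:
        raise ValueError(f"{node.name}: unknown neighbor kinds {sorted(unknown)}")
    if not node.triggers:
        raise ValueError(f"{node.name}: at least one trigger is required")
    return node


node = node_from_metadata({
    "name": "diet-record-analysis",
    "triggers": ["diet record", "food diary"],
    "anti_triggers": ["recipe"],
    "neighbors": {"safety_gate": ["eating-disorder-risk"]},
    "references": ["references/portion-sizes.md"],
    "must_not_miss": ["allergies", "pregnancy"],
})
print(node.neighbors["safety_gate"])  # ['eating-disorder-risk']
```

Validating neighbor kinds at load time is one way to keep a growing library from drifting into ad-hoc relationship names.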

Two-stage Skill use pattern

Stage 1: sparse-first activation

Use trigger terms, descriptions, semantic matching, or a router Skill to activate only the most likely Skill region(s).

user task -> route candidates -> top-k selected Skills

The leading Skill(s) receive the main attention/context budget.
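The Stage 1 arrow `route candidates -> top-k selected Skills` can be sketched in a few lines of Python. This assumes plain keyword overlap as a stand-in for whatever matcher a real router would use (descriptions, semantic matching, or a router Skill). The `route` function and the library entries are hypothetical.

```python
import heapq


def route(task: str, library: dict[str, dict], k: int = 2) -> list[str]:
    """Return up to k Skill names whose trigger terms best match the task.

    Keyword overlap is only a stand-in for a real matcher.
    """
    text = task.lower()
    scored = []
    for name, meta in library.items():
        # Any anti-trigger hit vetoes the Skill outright.
        if any(term.lower() in text for term in meta.get("anti_triggers", [])):
            continue
        score = sum(term.lower() in text for term in meta.get("triggers", []))
        if score > 0:
            scored.append((score, name))
    # Keep only the top-k: highest score first, ties broken alphabetically.
    best = heapq.nsmallest(k, scored, key=lambda item: (-item[0], item[1]))
    return [name for _, name in best]


library = {
    "diet-record-analysis": {"triggers": ["diet record", "food diary", "intake"]},
    "diabetes-education": {"triggers": ["diabetes", "blood sugar", "insulin"]},
    "poster-generation": {"triggers": ["poster", "flyer"], "anti_triggers": ["diet record"]},
    "food-table-ocr": {"triggers": ["food composition table", "scanned label"]},
}
task = "Review this diet record from a patient with diabetes and suggest intake changes"
print(route(task, library))  # ['diet-record-analysis', 'diabetes-education']
```

Skills that never match are not loaded at all, which is the "sparse" part. The anti-trigger veto is what keeps a nearby but wrong Skill (here, the poster Skill) from riding along.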

Stage 2: associative missed-case sweep

After selecting the main route, run a short checklist over neighboring Skills and safety gates.

Examples:

red flags
contraindications
special populations
evidence/citation requirements
adjacent tasks commonly confused with the current task
“must not miss” edge cases

This comes from clinical workflow: first consider the most likely problem, but still check the details that would be costly or unsafe to miss.
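A minimal Python sketch of Stage 2 as a post-draft check. It assumes each checklist item carries cue words that suggest it applies, plus words suggesting the draft already handled it. This is a crude keyword stand-in for what would more likely be a short model-driven checklist. `SweepItem` and `missed_case_sweep` are hypothetical names. An item with no cues is always checked, which mirrors running red flags on every case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepItem:
    name: str                       # e.g. "pregnancy / lactation"
    cues: tuple[str, ...]           # task words suggesting the item applies; empty = always check
    addressed_by: tuple[str, ...]   # draft words suggesting the item was handled


def missed_case_sweep(task: str, draft: str, checklist: list[SweepItem]) -> list[str]:
    """Return checklist items that seem relevant to the task but unaddressed in the draft.

    A cheap second pass after the main route has produced a draft: it does not
    re-run the main Skill, it only asks "did we skip something costly to miss?"
    """
    task_text, draft_text = task.lower(), draft.lower()
    missed = []
    for item in checklist:
        relevant = not item.cues or any(cue in task_text for cue in item.cues)
        handled = any(word in draft_text for word in item.addressed_by)
        if relevant and not handled:
            missed.append(item.name)
    return missed


checklist = [
    SweepItem("pregnancy / lactation", ("pregnan", "breastfeed"), ("pregnan", "lactation")),
    SweepItem("allergies / medications", ("allerg", "medication"), ("allerg", "medication")),
    SweepItem("red flags", (), ("seek medical", "referral")),
]
print(missed_case_sweep("I'm pregnant and want to cut carbs fast",
                        "Try swapping white rice for brown rice.", checklist))
# ['pregnancy / lactation', 'red flags']
```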

Budgeted references and role-based context

Not every Skill or reference needs the same amount of context.

| Role | Purpose | Suggested loading strategy |
|---|---|---|
| `shared-core` | always-on safety / evidence / routing rules | tiny, stable, reusable |
| `routed-high` | primary task Skill | medium/high budget |
| `routed-low` | adjacent helper Skill | summary or frontmatter only |
| `missed-case-sweep` | red flags / contraindications / edge cases | small checklist budget |
| `heavy-reference` | long docs, examples, assets | explicit on-demand loading only |

This is slightly different from only saying “make Skills shorter.” It is a design habit for deciding how much of each Skill is needed for a specific task.
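Here is a small Python sketch of how the role table could drive a per-request context plan. The token numbers in `ROLE_BUDGETS` are placeholders made up for illustration, not recommendations, and `plan_context` is a hypothetical helper. The point is only that each piece's cost follows from its role, and heavy references cost nothing until something explicitly asks for them.

```python
# Hypothetical per-role token budgets; placeholder numbers, not recommendations.
ROLE_BUDGETS = {
    "shared-core": 300,          # tiny, stable, reusable
    "routed-high": 2000,         # primary task Skill
    "routed-low": 150,           # summary or frontmatter only
    "missed-case-sweep": 250,    # small checklist
    "heavy-reference": 0,        # nothing unless explicitly requested
}


def plan_context(items, requested_refs=frozenset(), ref_budget=4000):
    """Map each (name, role) pair to a token budget.

    Heavy references get ref_budget only when named in requested_refs;
    otherwise they stay unloaded.
    """
    plan = {}
    for name, role in items:
        if role not in ROLE_BUDGETS:
            raise ValueError(f"unknown role {role!r} for {name}")
        budget = ROLE_BUDGETS[role]
        if role == "heavy-reference" and name in requested_refs:
            budget = ref_budget
        plan[name] = budget
    return plan


items = [
    ("safety-and-evidence-rules", "shared-core"),
    ("diet-record-analysis", "routed-high"),
    ("diabetes-education", "routed-low"),
    ("pregnancy-allergy-sweep", "missed-case-sweep"),
    ("portion-size-atlas", "heavy-reference"),
]
print(sum(plan_context(items).values()))                                         # 2700
print(sum(plan_context(items, requested_refs={"portion-size-atlas"}).values()))  # 6700
```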

Example: nutrition / medical education Skills

A nutrition Skill library might have:

Shared always-on layer

evidence hierarchy
no fabricated citations
no diagnosis or treatment substitution
referral / emergency red lines
non-shaming language rules

Routed expert Skills

childhood obesity nutrition
diet-record analysis
diabetes education
food composition table OCR
poster/copy generation

Missed-case sweep

age / special population
pregnancy / lactation
allergies / medications
eating-disorder risk
overclaiming / citation check

This keeps the safety and evidence rules active on every request without loading every disease- or task-specific Skill each time.
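To make the nutrition example concrete, the Python sketch below decides what gets loaded for one request: the shared layer always, routed Skills only on a trigger hit, and sweep items when their cues appear (or always, if they have none). The names come from the lists above, but the trigger and cue words and the `assemble` function are hypothetical.

```python
# Illustrative library for the nutrition example; trigger and cue words are hypothetical.
SHARED_LAYER = [
    "evidence hierarchy",
    "no fabricated citations",
    "no diagnosis or treatment substitution",
    "referral / emergency red lines",
    "non-shaming language rules",
]
ROUTED_SKILLS = {
    "childhood obesity nutrition": ["childhood obesity", "overweight child"],
    "diet-record analysis": ["diet record", "food diary"],
    "diabetes education": ["diabetes", "blood glucose"],
    "food composition table OCR": ["composition table", "scanned label"],
    "poster/copy generation": ["poster", "flyer"],
}
# An empty cue list means the sweep item is always checked.
SWEEP_ITEMS = {
    "age / special population": ["child", "infant", "elderly"],
    "pregnancy / lactation": ["pregnan", "breastfeed"],
    "allergies / medications": ["allerg", "medication"],
    "eating-disorder risk": ["binge", "purging", "skipping meals"],
    "overclaiming / citation check": [],
}


def assemble(task: str) -> dict:
    """Decide what to load for one request: shared layer + routed Skills + sweep items."""
    text = task.lower()
    routed = [name for name, cues in ROUTED_SKILLS.items() if any(c in text for c in cues)]
    sweep = [name for name, cues in SWEEP_ITEMS.items()
             if not cues or any(c in text for c in cues)]
    return {
        "shared": list(SHARED_LAYER),  # always on
        "routed": routed,
        "sweep": sweep,
        "not_loaded": [name for name in ROUTED_SKILLS if name not in routed],
    }


plan = assemble("Please analyze my child's food diary; she has a peanut allergy.")
print(plan["routed"])  # ['diet-record analysis']
print(plan["sweep"])   # ['age / special population', 'allergies / medications', 'overclaiming / citation check']
```

In this request, four of the five routed Skills stay unloaded, while the age and allergy checks still fire because the request mentions a child and an allergy.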

Possible documentation-only MVP

This repository might be able to explore the idea without any runtime change:

- Add a best-practice note for large Skill libraries.
- Provide an example Skill with optional routing/adjacency metadata.
- Show a “must-not-miss” checklist pattern.
- Show how SKILL.md can remain a minimal entry while heavy material lives in references/.
- Add eval examples for:
  - correct trigger,
  - non-trigger,
  - adjacent-case sweep,
  - safety red-line catch,
  - under-trigger / over-trigger analysis.
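For the routing-related eval items (correct trigger, non-trigger, adjacent confusion, and under-/over-trigger analysis), a minimal Python harness might look like the sketch below. The `EvalCase` shape, `score_routing`, and `toy_router` are hypothetical, and the toy router merely stands in for whatever system is under test. Safety red-line catches would need checks on the output, which this sketch does not cover.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class EvalCase:
    task: str
    expected: frozenset   # Skills that should fire; empty means a non-trigger case
    kind: str             # e.g. "trigger", "non-trigger", "adjacent"


def score_routing(cases: Iterable[EvalCase], router: Callable[[str], set]) -> dict:
    """Count exact matches, under-triggers (missed Skills) and over-triggers (extra Skills)."""
    exact = under = over = 0
    failures = []
    for case in cases:
        got = set(router(case.task))
        missing = case.expected - got   # under-trigger
        extra = got - case.expected     # over-trigger
        under += len(missing)
        over += len(extra)
        if missing or extra:
            failures.append((case.kind, case.task, sorted(missing), sorted(extra)))
        else:
            exact += 1
    return {"exact": exact, "under_trigger": under, "over_trigger": over, "failures": failures}


def toy_router(task: str) -> set:
    """Hypothetical keyword router, standing in for the system under test."""
    text = task.lower()
    hits = set()
    if "diet record" in text:
        hits.add("diet-record-analysis")
    if "poster" in text:
        hits.add("poster-generation")
    return hits


cases = [
    EvalCase("Analyze this diet record", frozenset({"diet-record-analysis"}), "trigger"),
    EvalCase("What is the capital of France?", frozenset(), "non-trigger"),
    EvalCase("Make a poster about my diet record", frozenset({"poster-generation"}), "adjacent"),
    EvalCase("Review the food diary of my pregnant patient",
             frozenset({"diet-record-analysis"}), "trigger"),
]
report = score_routing(cases, toy_router)
print(report["exact"], report["under_trigger"], report["over_trigger"])  # 2 1 1
```

Keeping under- and over-trigger counts separate matters because they point to different fixes. Under-triggers suggest missing triggers, while over-triggers suggest missing anti-triggers.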

Why I am sharing this

I noticed that many discussions around Skills focus on context pressure, progressive disclosure, and router Skills. Those are important. My suggestion is just a design-structure framing that connects those concerns with sparse activation and clinical-style missed-case checking.

In one sentence:

Large Skill libraries might benefit from being designed less like a flat folder list and more like a sparse, budgeted, safety-aware Skill graph.

This may already overlap with internal best practices or planned work. If so, I would appreciate any pointer. Thank you again for the public Skills examples and guidance.

9s5bz2jvd2-lang · 2026-06-09T11:20:02Z


A small follow-up after thinking more about this design idea.

The part that may be more useful than a simple "top-k Skills" framing is the networked, point-to-point neighbor activation layer.

In other words, a large Skill library does not have to behave like either:

a flat list where the model scans many Skill descriptions, or
a tree where the system walks down one rigid branch.

A more scalable structure may be:

shared always-on core
  -> lightweight routing signature
  -> top-k likely Skill regions
  -> local point-to-point neighbor sweep
  -> selected detailed references / scripts / assets
  -> validation
  -> route log / gotcha feedback
  -> improved cache hit next time

The key is that the graph is not traversed broadly. Each Skill or Skill-region can expose only a small set of typed neighbors, for example:

must_check: red flags, contraindications, safety boundaries;
often_confused_with: nearby Skills that are easy to mis-trigger;
requires_before: prerequisite checks;
pairs_with: complementary templates/scripts;
fallback_to: cheaper or safer alternatives;
evidence_boundary: references that should be loaded only when the answer depends on them.

This is closer to a clinical workflow pattern: first identify the most likely problem, but still run a low-budget check for red flags and adjacent missed cases. It is also analogous to sparse expert activation: the expensive experts/references are not all loaded; only the local neighborhood around the routed hit is inspected.

So the proposal is not just "add more metadata" or "retrieve fewer skills." It is a runtime-aware Skill graph structure:

Skill entry = compressed executable core
Routing metadata = lightweight gate
Graph metadata = local point-to-point adjacency
References/assets = heavy material loaded only on demand
Evals/logs = feedback loop that improves future routing
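Here is a Python sketch of the difference between a local point-to-point neighbor sweep and broad traversal. It assumes a hypothetical typed adjacency map that uses the edge types listed above (`must_check`, `often_confused_with`, `pairs_with`, `fallback_to`, `evidence_boundary`). `neighbor_sweep` and `broad_reachable` are illustrative names, and the graph contents are made up.

```python
from collections import deque

# Hypothetical typed adjacency: Skill -> {edge type: [neighboring Skills]}.
GRAPH = {
    "diet-record-analysis": {
        "must_check": ["eating-disorder-risk", "allergy-medication-check"],
        "often_confused_with": ["food-table-ocr"],
        "pairs_with": ["report-template"],
        "evidence_boundary": ["portion-size-reference"],
    },
    "food-table-ocr": {"pairs_with": ["ocr-script"], "often_confused_with": ["poster-generation"]},
    "poster-generation": {"pairs_with": ["brand-assets"]},
    "eating-disorder-risk": {"fallback_to": ["referral-guidance"]},
}


def neighbor_sweep(graph, hit, edge_types, max_per_type=3):
    """One hop only: direct neighbors of the routed hit along the requested edge types."""
    picked = []
    for edge_type in edge_types:
        for neighbor in graph.get(hit, {}).get(edge_type, [])[:max_per_type]:
            if neighbor not in picked:
                picked.append(neighbor)
    return picked


def broad_reachable(graph, start):
    """For comparison: every Skill reachable by following all edges (breadth-first)."""
    seen, queue = {start}, deque([start])
    while queue:
        for neighbors in graph.get(queue.popleft(), {}).values():
            for neighbor in neighbors:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    seen.discard(start)
    return seen


print(neighbor_sweep(GRAPH, "diet-record-analysis", ["must_check", "often_confused_with"]))
# ['eating-disorder-risk', 'allergy-medication-check', 'food-table-ocr']
print(len(broad_reachable(GRAPH, "diet-record-analysis")))  # 9
```

Even in this tiny graph, the typed one-hop sweep inspects 3 Skills, while following every edge reaches 9. The `max_per_type` cap keeps the sweep bounded as the library grows.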

I tried to sketch this as a concrete public Skill package here:

https://github.com/9s5bz2jvd2-lang/sparse-book-to-skill-distillation

Structure diagram:

https://github.com/9s5bz2jvd2-lang/sparse-book-to-skill-distillation/blob/main/docs/structure-diagram.md

The most important design principle from this experiment is:

The token saving does not come from merely having a graph. It comes from using cross-linked point-to-point adjacency to avoid broad traversal.

Or, phrased another way:

First hit the main route, then sweep only the nearby missed cases; load heavy references last; feed routing mistakes back into the graph.

This may be especially useful for large, high-risk Skill libraries where missing a nearby exception is worse than spending a few extra tokens on a focused neighbor sweep.
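The pipeline at the top of this comment ends with "improved cache hit next time", and the proposed CACHE.md separates stable-prefix from variable-suffix material. A toy Python simulation of why that ordering matters is shown below. It only models the general idea that a cache keyed on an exact prefix can be reused when stable material comes first. It is not any provider's caching API, and `split_prompt` and `PrefixCacheSim` are hypothetical.

```python
import hashlib


def split_prompt(shared_core: str, skill_entry: str, task: str) -> tuple[str, str]:
    """Stable material first (shared core, Skill entry), variable material last (the task)."""
    return shared_core + "\n\n" + skill_entry, task


class PrefixCacheSim:
    """Toy simulation: counts how often an identical prefix has been seen before."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix: str) -> bool:
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self.seen:
            self.hits += 1
            return True
        self.seen.add(key)
        self.misses += 1
        return False


core = "shared safety + evidence rules"
entry = "diet-record-analysis entry: when to use, first steps"
tasks = ["analyze Monday's food diary", "analyze Tuesday's food diary"]

good = PrefixCacheSim()
for t in tasks:
    prefix, _ = split_prompt(core, entry, t)
    good.lookup(prefix)

bad = PrefixCacheSim()
for t in tasks:
    bad.lookup(t + "\n\n" + core + "\n\n" + entry)  # task first: no two prefixes match

print(good.hits, bad.hits)  # 1 0
```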


1159773558-sudo · 2026-06-09T11:22:21Z


Hello, this is Liu Sisi. I have received your email and will reply as soon as possible. Thank you!


9s5bz2jvd2-lang · 2026-06-09T11:23:47Z


One more extension of the analogy “large Skill ≈ small external task model”:

A mature Skill should probably have a proposal-based self-evolution loop.

This should not be fully automatic mutation. That would be risky, because a Skill contains executable judgment: triggers, safety boundaries, references, scripts, and templates. But every invocation can leave a small piece of evidence about how the Skill behaved:

Skill call
  -> selected route / neighbor sweep / loaded references
  -> output validation
  -> route log + gotcha + missed case
  -> proposed patch to ROUTING / GRAPH / CACHE / evals
  -> maintainer review
  -> merge only after eval/safety checks

So the Skill can “learn” from use, but in a reviewable way:

ROUTING.yaml improves when triggers or anti-triggers were wrong;
GRAPH.md improves when a point-to-point neighbor was missing;
CACHE.md improves when stable vs variable material was poorly separated;
eval cases improve when a missed-case sweep caught or failed to catch something important.

This makes the Skill graph cyclic rather than one-way: use produces logs; logs produce patches; patches make the next route/cache hit cheaper and safer.
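Here is a minimal Python sketch of the reviewable loop. Repeated log observations become patch *proposals*, and a proposal can merge only after maintainer approval plus passing eval and safety checks. It covers only the GRAPH and ROUTING cases, and `RouteLog`, `Patch`, `propose_patches`, `can_merge`, and the `min_evidence` threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteLog:
    skill: str
    missed_neighbor: Optional[str] = None  # a neighbor the sweep should have checked
    wrong_trigger: Optional[str] = None    # a phrase that fired this Skill by mistake


@dataclass
class Patch:
    target: str            # which file the proposal would edit
    change: str            # human-readable description for the reviewer
    evidence: int          # how many logs support it
    approved: bool = False


def propose_patches(logs, min_evidence=2):
    """Distill repeated log observations into patch proposals. Nothing is applied here."""
    missed = Counter((log.skill, log.missed_neighbor) for log in logs if log.missed_neighbor)
    wrong = Counter((log.skill, log.wrong_trigger) for log in logs if log.wrong_trigger)
    patches = [
        Patch("GRAPH.md", f"{skill}: add must_check neighbor {nb}", n)
        for (skill, nb), n in missed.items() if n >= min_evidence
    ]
    patches += [
        Patch("ROUTING.yaml", f"{skill}: add anti-trigger '{phrase}'", n)
        for (skill, phrase), n in wrong.items() if n >= min_evidence
    ]
    return patches


def can_merge(patch, evals_pass, safety_pass):
    """Merge only after maintainer approval and passing eval/safety checks."""
    return patch.approved and evals_pass and safety_pass


logs = [
    RouteLog("diet-record-analysis", missed_neighbor="eating-disorder-risk"),
    RouteLog("diet-record-analysis", missed_neighbor="eating-disorder-risk"),
    RouteLog("poster-generation", wrong_trigger="diet record"),
]
patches = propose_patches(logs)
print([p.change for p in patches])
# ['diet-record-analysis: add must_check neighbor eating-disorder-risk']
print(can_merge(patches[0], evals_pass=True, safety_pass=True))  # False until approved
```

The evidence threshold keeps a single odd log from producing a patch. The approval flag keeps the Skill from rewriting itself.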

I added a short note and diagram extension in the example repo:

https://github.com/9s5bz2jvd2-lang/sparse-book-to-skill-distillation/blob/main/docs/self-evolution-loop.md

The principle I would use is:

A Skill should not secretly rewrite itself; it should distill its usage experience into reviewable patches.

