skilldigest is a static analyzer for AI coding-assistant skill libraries (`SKILL.md`, `AGENTS.md`, `.cursorrules`, `CLAUDE.md`, agent plugins, etc.). It walks a directory of skills, measures per-skill token cost with a tiktoken-compatible BPE, builds a reference graph, and reports dead, bloated, conflicting, stale, and cyclic skills, plus a recommended loadout for a given task tag. Single static Rust binary. SARIF output drops straight into GitHub code-scanning.
- Why this exists
- Features
- Installation
- Quick start
- CLI reference
- Output formats
- Exit codes
- Configuration file
- Tokenizers
- Rule catalogue
- CI integration (GitHub Actions)
- Performance
- Determinism and reproducibility
- Security and robustness
- Comparison with other JSLEEKR tools
- Architecture
- Development
- Roadmap
- Contributing
- License
## Why this exists

AI coding-assistant skill libraries have exploded in 2026. A partial list:

| Project | Skills | Stars |
|---|---|---|
| antigravity-awesome-skills | 1,400+ | 33,455 |
| Vibe-Skills | 340+ | 1,535 |
| claude-skills | 232+ | 11,401 |
| awesome-claude-code | 190+ | 39,123 |
| oh-my-claudecode | many | 29,372 |

Every one of them ships as a giant directory of markdown. Nobody knows:
- Which skills are actually referenced by an index/manifest and which are dead code?
- Which skills exceed the token budget of the target model?
- Which skills contradict each other (e.g. one says "MUST use `Bash(jq)`", another says "MUST NOT")?
- Which skills link to files that no longer exist?
- Given a task tag `refactor-tests`, which minimal loadout fits in 10k tokens?

skilldigest answers all five. Adjacent tools do not:
- `skillpack`: packages/locks skills, doesn't audit them.
- `agentlint`: validates agent config files (YAML/JSON), not skill bodies (markdown).
- `tokencost`: counts tokens per prompt, not per skill-library entry.
- `rtk`: runtime token reducer, not a static analyzer.

skilldigest is the missing piece. One Rust binary, no runtime deps, ships a SARIF report your CI already knows how to upload.
## Features

- Deterministic: same input → byte-identical output.
- Offline-first: cl100k tokenizer data ships inside the binary.
- Fast: ~1,400 skills in < 2 s on an 8-core laptop (rayon parallel tokenization).
- Multi-format: text, JSON, SARIF 2.1, Markdown (PR comment), GraphViz dot.
- Library-format agnostic: detects `SKILL.md`, `AGENT.md`, `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, `.cursorrules`, `.cursor/rules/**`, `.claude/skills/**`, `plugin.toml`.
- Rule catalogue: 12 distinct issue classes with SARIF `ruleId`s (`SKILL001`–`SKILL012`).
- Robust: tolerates BOM, CRLF, mixed indent, malformed frontmatter, non-UTF-8 bytes.
- Configurable: `.skilldigest.toml` with per-skill budget overrides and ignore globs.
- Zero `unsafe`: `#![forbid(unsafe_code)]` at the crate root.

## Installation

Build from source:

```bash
git clone https://github.com/JSLEEKR/skilldigest
cd skilldigest
cargo build --release
./target/release/skilldigest --help
```

Or install with cargo:

```bash
cargo install --path .
# or, once published:
cargo install skilldigest
```

Minimum supported Rust version: `rust-version = "1.75"`. Any newer stable toolchain works.

Supported platforms: Linux (x86_64, aarch64), macOS (x86_64, aarch64), Windows (x86_64). One static binary per platform. No runtime dependencies.

## Quick start

```bash
# Audit a skill library
skilldigest scan ./my-skills

# Token count for a single file
skilldigest tokens ./my-skills/git/commit/SKILL.md

# Recommend a loadout for the "refactor" task tag
skilldigest loadout ./my-skills --tag refactor --max-tokens 8000

# Emit the skill reference graph as GraphViz dot
skilldigest graph ./my-skills --format dot | dot -Tsvg > skills.svg
```

## CLI reference

| Flag | Default | Description |
|---|---|---|
| `-f, --format <FORMAT>` | `text` | Output format: `text`, `json`, `sarif`, `markdown`, `dot` |
| `-o, --output <FILE>` | stdout | Write output to a file |
| `-t, --tokenizer <NAME>` | `cl100k` | Tokenizer: `cl100k`, `o200k`, `llama3` |
| `-b, --budget <N>` | `2000` | Per-skill token budget |
| `--total-budget <N>` | none | Aggregate token budget across the library |
| `--offline` | off | No-op retained for forward compatibility. skilldigest is always fully offline (tokenizer data is bundled in the binary; no network I/O at scan time) |
| `--follow-symlinks` | off | Follow symlinks during scan |
| `--max-file-size <B>` | `1048576` | Skip files larger than this many bytes |
| `--config <FILE>` | auto | Path to `.skilldigest.toml` |
| `--no-color` | off | Disable ANSI color in text output |
| `-v, --verbose` | off | Log to stderr |
| `-q, --quiet` | off | Suppress non-error output |
| `--version` | - | Print version and exit |
| `--help` | - | Print help and exit |

`skilldigest scan <DIR> [OPTIONS]`

Runs a full audit and emits a report in the chosen format. Exits with code 1 when any error-severity issue is found.

```bash
skilldigest scan ./skills
skilldigest scan ./skills --format json --output report.json
skilldigest scan ./skills --format sarif --output skills.sarif.json
skilldigest scan ./skills --budget 3000 --no-color
skilldigest scan ./skills --fix-hint        # emit rm hints to stderr
```

`skilldigest tokens <FILE> [OPTIONS]`

Counts tokens in a single file.

```bash
skilldigest tokens ./skills/git/commit/SKILL.md
skilldigest tokens ./skills/git/commit/SKILL.md --by-section --format json
skilldigest tokens ./CLAUDE.md --tokenizer o200k
```

`skilldigest loadout <DIR> --tag <TAG> [--max-tokens <N>] [OPTIONS]`

Score every skill for the tag and greedily select the highest-scoring
subset that fits in --max-tokens. Ties broken deterministically by skill ID.
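
The greedy selection described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not skilldigest's actual implementation: the `Skill` tuple, its integer `score` field, the rule that zero-score skills are skipped, and the `greedy_loadout` name are all hypothetical. How the tool computes per-tag scores is not covered.

```python
from typing import NamedTuple


class Skill(NamedTuple):
    id: str      # e.g. "git/commit-style"
    score: int   # hypothetical integer relevance score for the requested tag
    tokens: int  # token cost of the skill


def greedy_loadout(skills: list[Skill], max_tokens: int) -> list[Skill]:
    """Pick skills by descending score while they still fit the token budget."""
    # Highest score first; equal scores fall back to the skill ID, so the
    # result never depends on the order of the input list.
    ranked = sorted(skills, key=lambda s: (-s.score, s.id))
    chosen, used = [], 0
    for skill in ranked:
        if skill.score <= 0:
            continue  # assumption: skills irrelevant to the tag are never picked
        if used + skill.tokens <= max_tokens:
            chosen.append(skill)
            used += skill.tokens
    return chosen


skills = [
    Skill("git/rebase", score=5, tokens=600),
    Skill("git/commit-style", score=5, tokens=846),
    Skill("testing/pytest", score=2, tokens=400),
    Skill("legacy/old-thing", score=0, tokens=1204),
]
picked = greedy_loadout(skills, max_tokens=1500)
print([s.id for s in picked])  # ['git/commit-style', 'git/rebase']
```

Scores and token counts are plain integers, so the ordering involves no floating-point comparison, and the ID tie-break makes the output independent of input order.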
```bash
skilldigest loadout ./skills --tag git --max-tokens 10000
skilldigest loadout ./skills --tag refactor --max-tokens 5000 --format json
```

`skilldigest graph <DIR> [OPTIONS]`

Emits the skill reference graph.

```bash
skilldigest graph ./skills --format dot | dot -Tsvg -o graph.svg
skilldigest graph ./skills --format json
skilldigest graph ./skills --format markdown   # embedded code-block
```

## Output formats

JSON output is pretty-printed, uses stable snake_case keys, and is versioned via `schema_version`.

```json
{
  "schema_version": "skilldigest-report/1",
  "tokenizer": "cl100k_base",
  "tool_version": "1.0.0",
  "scan_root": "./skills",
  "total_skills": 12,
  "total_tokens": 18432,
  "budget": { "per_skill": 2000, "total": null },
  "skills": [
    {
      "id": "git/commit-style",
      "name": "commit-style",
      "path": "git/commit-style/SKILL.md",
      "tokens": { "frontmatter": 32, "body": 814, "total": 846 },
      "tags": ["git", "commit"],
      "refs_out": 2,
      "refs_in": 1,
      "issue_kinds": ["bloated"]
    }
  ],
  "issues": [
    {
      "kind": "dead",
      "severity": "warning",
      "skill": "legacy/old-thing",
      "message": "skill 'legacy/old-thing' is never referenced by any index or other skill",
      "location": { "path": "legacy/old-thing/SKILL.md", "line": 1, "column": 1 },
      "related": []
    }
  ],
  "loadout": null
}
```

The SARIF emitter is designed to be accepted by GitHub code-scanning
(`github/codeql-action/upload-sarif@v3`). Each issue class has its own rule
(`SKILL001`–`SKILL012`) with a stable `id`, `name`, `shortDescription`,
`fullDescription`, `defaultConfiguration.level`, and `helpUri`.
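
To make the relationship between the JSON report and SARIF concrete, here is a rough sketch that maps one issue from the JSON report onto a SARIF 2.1.0 `result`. It only illustrates the general SARIF shape (`runs[].tool.driver.rules` and `runs[].results`); the exact document skilldigest emits may differ, and the `to_sarif_result` helper, the `RULE_IDS` subset, and the rule `name` value are assumptions made for this example.

```python
import json

# Rule IDs for a few issue kinds, copied from the rule catalogue (subset).
RULE_IDS = {
    "dead": "SKILL001",
    "bloated": "SKILL002",
    "conflict": "SKILL003",
    "stale": "SKILL004",
}


def to_sarif_result(issue: dict) -> dict:
    """Map one issue from the JSON report to a SARIF 2.1.0 result object."""
    loc = issue["location"]
    return {
        "ruleId": RULE_IDS[issue["kind"]],
        "level": issue["severity"],  # SARIF levels include "error", "warning", "note"
        "message": {"text": issue["message"]},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": loc["path"]},
                "region": {"startLine": loc["line"], "startColumn": loc["column"]},
            }
        }],
    }


issue = {
    "kind": "dead",
    "severity": "warning",
    "skill": "legacy/old-thing",
    "message": "skill 'legacy/old-thing' is never referenced by any index or other skill",
    "location": {"path": "legacy/old-thing/SKILL.md", "line": 1, "column": 1},
}

sarif = {
    "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
    "version": "2.1.0",
    "runs": [{
        # The rule "name" below is a placeholder, not the tool's real metadata.
        "tool": {"driver": {"name": "skilldigest",
                            "rules": [{"id": "SKILL001", "name": "dead"}]}},
        "results": [to_sarif_result(issue)],
    }],
}
print(json.dumps(sarif, indent=2))
```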
```bash
skilldigest scan ./skills --format sarif --output skills.sarif.json
# …then in your GH Actions workflow:
#   - uses: github/codeql-action/upload-sarif@v3
#     with: { sarif_file: skills.sarif.json }
```

Markdown output, suitable for a PR comment:

```markdown
### skilldigest report

**12 skills**, **18,432 tokens** (cl100k_base), **3 issues** (1 error, 2 warning, 0 note)

| Skill | Tokens | Issues |
|-------|-------:|--------|
| `git/commit-style` | 846 | bloated |
| `legacy/old-thing` | 1204 | dead |

#### Issues

- [ERROR] **bloated** `git/commit-style` `git/commit-style/SKILL.md:1` — 846 tokens exceeds budget 500
- [warn] **dead** `legacy/old-thing` `legacy/old-thing/SKILL.md:1` — skill 'legacy/old-thing' is never referenced
```

## Exit codes

| Code | Meaning | Typical CI reaction |
|---|---|---|
| `0` | Scan completed, no error-severity issues | green build |
| `1` | Error-severity issues found | fail the build / block merge |
| `2` | Operational error (bad args, IO, malformed config) | fail the build as infra error |
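
For scripts that post-process a saved JSON report, the 0/1 distinction can be recomputed from the `issues` array. The sketch below assumes only the `severity` field shown in the JSON example; the helper names are hypothetical, and exit code 2 (operational error) is not modelled.

```python
import json
from collections import Counter

# A trimmed-down report in the shape of the JSON example (fields reduced).
REPORT_JSON = """
{
  "schema_version": "skilldigest-report/1",
  "issues": [
    {"kind": "dead", "severity": "warning", "skill": "legacy/old-thing"},
    {"kind": "bloated", "severity": "error", "skill": "git/commit-style"}
  ]
}
"""


def severity_counts(report: dict) -> Counter:
    """Count issues per severity ("error", "warning", "note")."""
    return Counter(issue["severity"] for issue in report["issues"])


def expected_exit_code(report: dict) -> int:
    """1 if the report holds any error-severity issue, otherwise 0."""
    return 1 if severity_counts(report)["error"] > 0 else 0


report = json.loads(REPORT_JSON)
print(dict(severity_counts(report)), expected_exit_code(report))
```

A CI wrapper could use the same counts to choose between blocking a merge and only posting a comment.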
## Configuration file

Drop a `.skilldigest.toml` at the scan root.

```toml
# Global token budgets
[budget]
per_skill = 2000
total = 40000

# Default tokenizer (CLI flag still wins)
[tokenizer]
default = "cl100k"

# Gitignore-style globs to skip
[ignore]
globs = ["archive/**", "drafts/**", "*.bak.md"]

# Per-skill overrides
[overrides."git/commit-style"]
budget = 3000

[overrides."onboarding/company-context"]
budget = 5000
```

Precedence, listed from highest to lowest (the most specific override beats the more global setting):
- Frontmatter `budget:` on an individual skill (most specific)
- `[overrides]` section in `.skilldigest.toml` (per-skill, by id)
- `--budget` CLI flag (sets the global per-skill default for this run)
- `[budget] per_skill` config section
- Built-in default (2000)

The same shape applies to the global [budget] total cap: --total-budget
on the CLI overrides [budget] total in the config file. There is no
per-skill override for the aggregate cap.
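
The per-skill precedence chain can be written as a first-match lookup. This is a minimal sketch of the documented order, not the tool's code: the `effective_budget` function and the plain-dict representation of the parsed config are assumptions for illustration.

```python
from typing import Optional

DEFAULT_PER_SKILL_BUDGET = 2000  # built-in default from the CLI reference

# The example config above, as it might look after TOML parsing.
config = {
    "budget": {"per_skill": 2000, "total": 40000},
    "overrides": {
        "git/commit-style": {"budget": 3000},
        "onboarding/company-context": {"budget": 5000},
    },
}


def effective_budget(skill_id: str,
                     frontmatter_budget: Optional[int],
                     config: dict,
                     cli_budget: Optional[int]) -> int:
    """Return the first budget that is set, from most to least specific."""
    candidates = [
        frontmatter_budget,                                      # frontmatter `budget:`
        config.get("overrides", {}).get(skill_id, {}).get("budget"),  # [overrides]
        cli_budget,                                              # --budget
        config.get("budget", {}).get("per_skill"),               # [budget] per_skill
        DEFAULT_PER_SKILL_BUDGET,                                # built-in default
    ]
    return next(c for c in candidates if c is not None)


print(effective_budget("git/commit-style", None, config, cli_budget=1500))  # 3000
print(effective_budget("misc/other", None, config, cli_budget=1500))        # 1500
print(effective_budget("git/commit-style", 800, config, cli_budget=1500))   # 800
```

Note that a per-skill `[overrides]` entry still wins over `--budget`, because the flag only changes the global per-skill default for the run.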
## Tokenizers

| Name | Backed by | Offline? | Notes |
|---|---|---|---|
| `cl100k` | `tiktoken_rs::cl100k_base` | Yes (bundled) | GPT-4; only an approximation for Claude models. Default. |
| `o200k` | `tiktoken_rs::o200k_base` | Yes (bundled) | GPT-4o. |
| `llama3` | Deterministic word-piece approximation | Yes (algorithmic) | Within ~10% of real Llama 3 counts on English prose. Useful for relative comparisons. |

The `llama3` backend is intentionally an approximation: we do not ship the
full HuggingFace `tokenizer.json`, which would require either a network
fetch or roughly 20 MB of extra binary size. The approximation is
deterministic and side-effect free. It is documented as approximate so that
downstream tooling knows not to trust it for absolute billing.
## Rule catalogue

| Rule ID | Issue kind | Default severity | Description |
|---|---|---|---|
| `SKILL001` | dead | warning | Skill never referenced by any index or other skill |
| `SKILL002` | bloated | error | Skill exceeds per-skill token budget |
| `SKILL003` | conflict | error | Two skills contain opposing rules about the same subject |
| `SKILL004` | stale | warning | A link or file reference points to a missing file |
| `SKILL005` | cycle | error | Reference cycle in the skill graph |
| `SKILL006` | oversize | error | File exceeds `--max-file-size` |
| `SKILL007` | non-utf8 | warning | File contained bytes that could not be decoded as UTF-8 |
| `SKILL008` | bad-frontmatter | warning | YAML frontmatter failed to parse |
| `SKILL009` | symlink | note | Symlink skipped (use `--follow-symlinks`) |
| `SKILL010` | duplicate | error | Two files produced the same normalized skill identifier |
| `SKILL011` | path-escape | warning | Discovered file canonicalised to a path outside the scan root (e.g. via a symlink target) |
| `SKILL012` | total-bloated | error | Aggregate library token cost exceeds `--total-budget` / `[budget] total` |
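
As an illustration of what a purely structural `conflict` check (SKILL003) might look like, the sketch below pairs "MUST use X" statements with "MUST NOT use X" statements across skills. The regular expression, the `find_conflicts` helper, and the restriction to the verb "use" are assumptions made for this example; the tool's real matching rules are not documented here.

```python
import re
from collections import defaultdict

# Matches "MUST use <subject>" and "MUST NOT use <subject>".
# Group 1 is non-empty only for the negated form.
RULE = re.compile(r"\bMUST(\s+NOT)?\s+use\s+([\w()/-]+)")


def find_conflicts(skills: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return sorted (subject, requiring_skill, forbidding_skill) triples."""
    required, forbidden = defaultdict(set), defaultdict(set)
    for skill_id, body in skills.items():
        for negated, subject in RULE.findall(body):
            (forbidden if negated else required)[subject].add(skill_id)

    conflicts = []
    # Only subjects that are both required and forbidden can conflict.
    for subject in sorted(required.keys() & forbidden.keys()):
        for a in sorted(required[subject]):
            for b in sorted(forbidden[subject]):
                if a != b:
                    conflicts.append((subject, a, b))
    return conflicts


skills = {
    "json/query": "You MUST use Bash(jq) to query JSON.",
    "security/no-shell": "Agents MUST NOT use Bash(jq).",
    "git/commit": "You MUST use Bash(git) for commits.",
}
print(find_conflicts(skills))
# [('Bash(jq)', 'json/query', 'security/no-shell')]
```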
## CI integration (GitHub Actions)

```yaml
name: skill-digest
on:
  pull_request:
    paths:
      - '.claude/skills/**'
      - '.cursor/rules/**'
      - 'AGENTS.md'
      - 'CLAUDE.md'
jobs:
  skilldigest:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # required for upload-sarif
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Install skilldigest
        run: |
          curl -L https://github.com/JSLEEKR/skilldigest/releases/latest/download/skilldigest-linux-amd64 -o /usr/local/bin/skilldigest
          chmod +x /usr/local/bin/skilldigest
      - name: Run skilldigest (SARIF)
        run: skilldigest scan . --format sarif --output skills.sarif.json || true
      - name: Upload SARIF to GitHub code-scanning
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: skills.sarif.json
          category: skilldigest
      - name: Fail on any error-severity issue
        run: skilldigest scan . --no-color
```

Or drop it straight into a PR comment:

```yaml
- name: Render Markdown report
  id: digest
  run: skilldigest scan . --format markdown > digest.md
- name: Comment on PR
  uses: marocchino/sticky-pull-request-comment@v2
  with:
    path: digest.md
```

## Performance

On an 8-core x86_64 laptop with warm filesystem cache:

| Library size | Wall time |
|---|---|
| 20 skills | ~5 ms |
| 200 skills | ~35 ms |
| 1,400 skills | < 2 s |
Run the bench yourself:

```bash
cargo bench --bench bench_scan
cargo bench --bench bench_tokenize
```

## Determinism and reproducibility

- All collections sorted before emit.
- Tokenizer version and schema version are stamped into every JSON/SARIF output.
- No timestamps anywhere in the output, so runs at different times produce byte-identical files.
- Deterministic tie-breakers in the loadout recommender (integer math, no floats).
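
The effect of "sort everything, stamp versions, no timestamps" can be demonstrated with a tiny emitter. This is a sketch of the general technique only: the `render_report` function and the chosen sort keys are assumptions, and the real emitter's key order is whatever the tool defines.

```python
import json


def render_report(skills: list[dict], issues: list[dict]) -> bytes:
    """Serialize a report so that equal inputs always yield equal bytes."""
    report = {
        "schema_version": "skilldigest-report/1",  # version stamp, no timestamp
        # Sort every collection on a stable key before emitting.
        "skills": sorted(skills, key=lambda s: s["id"]),
        "issues": sorted(issues, key=lambda i: (i["skill"], i["kind"])),
    }
    # sort_keys fixes the order of object keys as well.
    return json.dumps(report, indent=2, sort_keys=True).encode("utf-8")


skills = [{"id": "git/rebase", "tokens": 600}, {"id": "git/commit-style", "tokens": 846}]
issues = [
    {"skill": "legacy/old-thing", "kind": "dead"},
    {"skill": "git/commit-style", "kind": "bloated"},
]

first = render_report(skills, issues)
# Same data discovered in a different order, e.g. by a parallel directory walk.
second = render_report(list(reversed(skills)), list(reversed(issues)))
print(first == second)  # True
```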
```bash
skilldigest scan ./skills --format json > a.json
skilldigest scan ./skills --format json > b.json
diff -u a.json b.json   # → empty
```

## Security and robustness

- `#![forbid(unsafe_code)]` at the crate root.
- File-size cap (1 MiB default) prevents memory blowup on malicious inputs.
- Symlinks are skipped by default, and path traversal is rejected via canonicalization.
- UTF-8 strict on the fast path (`simdutf8`); a graceful fallback flags non-UTF-8 files instead of panicking.
- No network I/O at scan time: tokenizer data is bundled inside the binary.
- No shell-outs: no subprocess execution at any point.
- Frontmatter YAML is parsed in a bounded mode with `serde_yaml`, and failures produce `bad-frontmatter` issues rather than halting the scan.
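
The canonicalization check behind `path-escape` can be illustrated as follows. This is a sketch of the general technique (resolve symlinks, then compare against the resolved scan root); the `escapes_root` helper is hypothetical, and the demo assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def escapes_root(path: Path, root: Path) -> bool:
    """True if `path`, after resolving symlinks, lies outside `root`."""
    real_root = root.resolve()
    real_path = path.resolve()
    # The common prefix of both paths must be the root itself.
    return os.path.commonpath([real_root, real_path]) != str(real_root)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "skills"
    outside = Path(tmp) / "secrets"
    root.mkdir()
    outside.mkdir()
    (root / "SKILL.md").write_text("# ok\n")
    (outside / "key.md").write_text("# private\n")
    # A symlink inside the scan root whose target is outside it.
    os.symlink(outside / "key.md", root / "link.md")
    results = {p.name: escapes_root(p, root) for p in sorted(root.iterdir())}

print(results)  # {'SKILL.md': False, 'link.md': True}
```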
## Comparison with other JSLEEKR tools

| Tool | Round | Language | Scope | Unique to skilldigest |
|---|---|---|---|---|
| `skillpack` | R81 | Go | Lockfile + install for skills | Token audit, dead-code detection |
| `agentlint` | R83 | TypeScript | Validate agent config files (JSON/YAML) | Operates on skill bodies (markdown) |
| `tokencost` | R54 | - | Tokens per prompt | Tokens per skill + library audit |
| `mcpbench` | R84 | Go | Benchmark MCP servers | Different category |
| `ragcheck` | R82 | Python | RAG eval harness | Different category |
| `agentmem` | - | - | Agent memory persistence | Different category |

Together, skillpack (R81) + agentlint (R83) + skilldigest (R85) cover
packaging, config validation, and content analysis of AI-agent skill
libraries — three non-overlapping quality gates.
## Architecture

```text
+--------------------+
|   CLI (clap v4)    |
+---------+----------+
          |
          v
+---------+----------+     +--------------------+
| Scanner (walkdir)  |---->|  Parser (md+yaml)  |
+---------+----------+     +---------+----------+
          |                          |
          |                          v
          |                   +------+------+
          |                   |  Skill AST  |
          |                   +------+------+
          |                          |
          v                          v
+---------+----------+     +---------+----------+
|   Tokenizer pool   |<--->|  Graph (petgraph)  |
|   (tiktoken-rs)    |     +---------+----------+
+---------+----------+               |
          |                          v
          |                +---------+----------+
          |                |    Audit rules     |
          |                +---------+----------+
          |                          |
          v                          v
+---------+----------+     +---------+----------+
|   Output emitter   |<----+     Issue list     |
|(text/json/sarif/md)|     +--------------------+
+--------------------+
```

Module layout (`src/`):

| Module | Purpose |
|---|---|
| `cli.rs` | clap v4 derive, subcommand dispatch |
| `scan.rs` | directory walk, file classification |
| `parse.rs` | markdown + frontmatter parser |
| `model.rs` | core data types |
| `tokenize.rs` | cl100k / o200k / llama3-approx tokenizers |
| `graph.rs` | petgraph-backed reference graph |
| `rules.rs` | bloat / conflict / stale / duplicate / dead detectors |
| `audit.rs` | orchestration |
| `loadout.rs` | task-tag loadout recommender |
| `config.rs` | `.skilldigest.toml` loader |
| `output/*` | text / json / sarif / markdown / dot renderers |
| `error.rs` | canonical error type + exit codes |
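
To show what the reference graph is used for, here is a small sketch of the two graph-based checks, `dead` and `cycle`, over an adjacency list. It is an illustration in Python rather than the petgraph-based implementation; the function names and the idea of passing index files as `roots` are assumptions for this example.

```python
from typing import Optional


def find_dead(graph: dict[str, list[str]], roots: set[str]) -> list[str]:
    """Nodes with no incoming reference that are not index/manifest roots."""
    referenced = {dst for dsts in graph.values() for dst in dsts}
    return sorted(n for n in graph if n not in referenced and n not in roots)


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one reference cycle as a list of IDs, or None (recursive DFS)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = {node: WHITE for node in graph}
    stack: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, [])):  # sorted for stable output
            if color.get(nxt) == GREY:           # back edge closes a cycle
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


graph = {
    "index": ["git/commit-style", "git/rebase"],
    "git/commit-style": ["git/rebase"],
    "git/rebase": ["git/commit-style"],
    "legacy/old-thing": [],
}
print(find_dead(graph, roots={"index"}))  # ['legacy/old-thing']
print(find_cycle(graph))  # ['git/commit-style', 'git/rebase', 'git/commit-style']
```

The recursive walk is fine for a sketch; a library with very deep reference chains would call for an iterative traversal.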
## Development

```bash
# Full test suite
cargo test --all-features

# Clippy: strict, warnings = errors
cargo clippy --all-targets --all-features -- -D warnings

# Format check
cargo fmt --check

# Benchmarks
cargo bench
```

Test count at v1.0.0: 200+ tests (unit + integration + doc).

## Roadmap

Out of scope for v1 (tracked for future rounds):
- LLM-assisted conflict detection: v1 is structural only.
- `--fix` auto-repair: v1 only emits shell hints via `--fix-hint`.
- VS Code / Cursor extension: may ship as a separate project.
- Integration with `skillpack` lockfile: cross-reference pinned skill versions.
- Language-specific rule packs: currently tool detection is hard-coded to Claude-style tool names; a plugin system would allow Cursor/Copilot tool-name dictionaries.

## Contributing

- Fork the repo.
- Create a topic branch (`git checkout -b feat/your-feature`).
- Make sure `cargo fmt --check`, `cargo clippy -- -D warnings`, and `cargo test --all-features` all pass.
- Add tests for any new behavior.
- Open a PR with a clear description of the change.

Commit messages loosely follow conventional-commits (feat:, fix:,
docs:, refactor:). The pre-commit checklist is simply the three
commands above.
## License

MIT © 2026 JSLEEKR. See LICENSE.