Your skill could be much better. But better how? Which part? In what order?
| | |
|---|---|
| What it is | An evaluation-driven skill evolution engine for Claude Code / OpenClaw — six-dimension scoring, directed improvement, version management. |
| Pain it solves | Turns "tweak and hope" into diagnose → targeted fix → verified improvement. |
| Use in 30 seconds | `/skill-compass evaluate {skill}` — an instant quality report showing exactly what's weakest and what to improve next. |
Find the weakest link → fix it → prove it worked → next weakness → repeat.
| For | Not For |
|---|---|
Prerequisites: Claude Opus 4.6 (complex reasoning + consistent scoring) · Node.js v18+ (local validators)
```bash
git clone https://github.com/Evol-ai/SkillCompass.git
cd SkillCompass && npm install

# User-level (all projects)
rsync -a --exclude='.git' . ~/.claude/skills/skill-compass/

# Or project-level (current project only)
rsync -a --exclude='.git' . .claude/skills/skill-compass/
```

First run: Claude Code will request permission for `node -e` and `bash` commands. Select "Allow always" to avoid repeated prompts.
```bash
git clone https://github.com/Evol-ai/SkillCompass.git
cd SkillCompass && npm install

# Follow OpenClaw skill installation docs for your setup
rsync -a --exclude='.git' . <your-openclaw-skills-path>/skill-compass/
```

Two ways to invoke SkillCompass:
```
/skill-compass evaluate ./my-skill/SKILL.md
/skill-compass improve the nano-banana skill
/skill-compass security scan ./my-skill/SKILL.md
/skill-compass audit all skills in .claude/skills/
/skill-compass compare my-skill 1.0.0 vs 1.0.0-evo.2
/skill-compass roll back my-skill to previous version
```
No slash command needed — Claude automatically recognizes the intent:
```
Evaluate the nano-banana skill for me
Improve this skill — fix the weakest dimension
Scan all skills in .claude/skills/ for security issues
```
Capability reference
| Intent | Maps to |
|---|---|
| Evaluate / score / review a skill | `eval-skill` |
| Improve / fix / upgrade a skill | `eval-improve` |
| Security scan a skill | `eval-security` |
| Batch audit a directory | `eval-audit` |
| Compare two versions | `eval-compare` |
| Merge with upstream | `eval-merge` |
| Rollback to previous version | `eval-rollback` |
The score isn't the point — the direction is. You instantly see which dimension is the bottleneck and what to do about it.
Each `/eval-improve` round follows a closed loop: fix the weakest → re-evaluate → verify improvement → next weakest. No fix is saved unless the re-evaluation confirms it actually helped.
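The closed loop above can be sketched in plain Node.js. `evaluate` and `applyFix` here are hypothetical stand-ins for SkillCompass's internal steps, not its real API:

```javascript
// Minimal sketch of one closed improvement round: target the weakest
// dimension, re-evaluate, and keep the change only if that dimension
// improved and nothing else regressed.
function improveRound(scores, applyFix, evaluate) {
  // Find the weakest dimension (lowest score wins).
  const weakest = Object.keys(scores)
    .reduce((a, b) => (scores[b] < scores[a] ? b : a));

  const candidate = applyFix(weakest);   // propose a targeted fix
  const newScores = evaluate(candidate); // re-evaluate the candidate

  // Accept only if the weakest dimension improved and no dimension dropped.
  const improved = newScores[weakest] > scores[weakest];
  const regressed = Object.keys(scores)
    .some((d) => newScores[d] < scores[d]);

  return improved && !regressed
    ? { accepted: true, scores: newScores }
    : { accepted: false, scores };       // discard the candidate
}
```

A fix that lifts the weakest dimension but dents another one is rejected, which is what keeps multi-round evolution monotonic.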
| ID | Dimension | Weight | What it evaluates |
|---|---|---|---|
| D1 | Structure | 10% | Frontmatter validity, markdown format, declarations |
| D2 | Trigger | 15% | Activation quality, rejection accuracy, discoverability |
| D3 | Security | 20% | Secrets, injection, permissions, exfiltration |
| D4 | Functional | 30% | Core quality, edge cases, output stability, error handling |
| D5 | Comparative | 15% | Value over direct prompting (with vs without skill) |
| D6 | Uniqueness | 10% | Overlap with similar skills, model supersession risk |
```
overall_score = round((D1×0.10 + D2×0.15 + D3×0.20 + D4×0.30 + D5×0.15 + D6×0.10) × 10)
```
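As a sanity check, the weighted formula can be reproduced directly (dimension scores are assumed to be on a 0–10 scale, as the ×10 scaling implies):

```javascript
// Weighted overall score: each dimension is 0-10, the weights sum to 1,
// and the ×10 scaling maps the result onto a 0-100 scale.
const WEIGHTS = { D1: 0.10, D2: 0.15, D3: 0.20, D4: 0.30, D5: 0.15, D6: 0.10 };

function overallScore(dims) {
  const weighted = Object.entries(WEIGHTS)
    .reduce((sum, [d, w]) => sum + dims[d] * w, 0);
  return Math.round(weighted * 10);
}
```

For example, a skill scoring 8 on every dimension lands at 80, and a 1-point swing in D4 moves the overall score three times as much as the same swing in D1.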
| Verdict | Condition |
|---|---|
| PASS | score ≥ 70 AND D3 pass |
| CAUTION | 50–69, or D3 High findings |
| FAIL | score < 50, or D3 Critical (gate override) |
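The verdict table reads as a small decision function. The D3 states are modeled here as an explicit `"pass"` / `"high"` / `"critical"` argument — assumed labels, since the README doesn't specify how the gate reports severity:

```javascript
// Verdict from the overall score plus the D3 security gate.
// d3Severity: "pass", "high", or "critical" (assumed labels).
function verdict(score, d3Severity) {
  if (d3Severity === "critical" || score < 50) return "FAIL"; // gate override
  if (d3Severity === "high" || score < 70) return "CAUTION";
  return "PASS"; // score >= 70 AND D3 pass
}
```

Note the ordering: a D3 Critical finding forces FAIL even on a high-scoring skill, which is the "gate override" in the table.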
| Feature | Description |
|---|---|
| Directed Evolution | Diagnose → targeted fix → verify → next weakness. Not random patching. |
| Closed-Loop Improve | `/eval-improve` auto re-evaluates after each fix. Only saves if improved and nothing regressed. |
| Scope Control | `--scope gate` = D1+D3 (~8K tokens). `--scope target --dimension D4` = single dim + gate. |
| Tiered Verification | L0 syntax → L1 single dimension → L2 full re-eval → L3 cross-skill. |
| D1+D2 Grouping | Both metadata dimensions weak (≤5)? Improved together — they share the frontmatter layer. |
| Feature | Description |
|---|---|
| Pre-Accept Gate | Hooks auto-scan every SKILL.md write. D1 + D3 checks. Zero config. Warns, never blocks. |
| Pre-Eval Scan | Static analysis blocks malicious code, exfiltration, prompt injection before LLM eval. |
| Output Guard | Validates improvement output for URL injection, dangerous commands, size anomalies. |
| Auto-Rollback | Any dimension drops >2 points after improvement? Changes discarded. |
| Local Validators | JS-based D1/D2/D3 validators run locally. Saves ~60% tokens on clear-cut issues. |
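The Auto-Rollback rule in the table above reduces to one comparison per dimension. A sketch, assuming score objects keyed D1–D6:

```javascript
// Auto-rollback check: discard an improvement if any single dimension
// dropped by more than `maxDrop` points versus the pre-fix baseline.
function shouldRollback(before, after, maxDrop = 2) {
  return Object.keys(before)
    .some((d) => before[d] - after[d] > maxDrop);
}
```

This is stricter than checking the overall score: a fix that nets +3 overall but costs one dimension 3 points is still rolled back.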
| Feature | Description |
|---|---|
| Correction Tracking | Detects repeated manual fixes, maps to dimensions, prompts update at next invocation. |
| Feedback Integration | Real usage data fuses into scores: 60% static + 40% feedback signals. |
| Multi-Language Triggers | Detects your language, tests trigger accuracy in it, fixes multilingual gaps. |
| Obsolescence Detection | Compares skill vs base model. Tracks supersession risk across model updates. |
| Skill Type Detection | Auto-classifies atom / composite / meta. Evaluation adapts accordingly. |
| Feature | Description |
|---|---|
| Version Management | SHA-256 hashed snapshots. Rollback to any version anytime. |
| Three-Way Merge | Merges upstream updates region-by-region. Local improvements preserved. |
| Multi-Round Evolution | `/eval-evolve` runs up to 6 rounds autonomously. Stops at PASS or plateau. |
| Batch Audit + Auto-Fix | `/eval-audit --fix --budget 3` scans worst-first, auto-fixes within budget. |
| CI Mode | `--ci` flag, exit codes: 0=PASS, 1=CAUTION, 2=FAIL. |
No point-to-point integration needed. The Pre-Accept Gate intercepts all SKILL.md edits regardless of source.
| Tool | How it works together | Guide |
|---|---|---|
| Auto-Updater | Pulls new version → Gate auto-checks for security regressions → keep or rollback | guide |
| Claudeception | Extracts skill → auto-evaluation catches security holes + redundancy → directed fix | guide |
| Self-Improving Agent | Logs errors → feed as signals → SkillCompass maps to dimensions and fixes | guide |
SkillCompass defines an open `feedback-signal.json` schema for any tool to report skill usage data:

```
/eval-skill ./my-skill/SKILL.md --feedback ./feedback-signals.json
```

Signals: `trigger_accuracy`, `correction_count`, `correction_patterns`, `adoption_rate`, `ignore_rate`, `usage_frequency`. The schema is extensible (`additionalProperties: true`) — any pipeline can produce or consume this format.
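The 60% static + 40% feedback fusion mentioned under Feedback Integration can be sketched as a weighted blend. The mapping from signal fields to a single feedback score below is illustrative, not SkillCompass's actual one:

```javascript
// Blend a static evaluation score (0-100) with a feedback-derived
// score (0-100): 60% static + 40% real-usage signals.
function fuseScores(staticScore, fbScore) {
  return Math.round(staticScore * 0.6 + fbScore * 0.4);
}

// Illustrative mapping from feedback-signal fields to a 0-100 score:
// trigger accuracy and adoption raise it, repeated corrections lower it.
function feedbackScore(signals) {
  const base = 100 * (0.5 * signals.trigger_accuracy + 0.5 * signals.adoption_rate);
  return Math.max(0, Math.round(base - 5 * signals.correction_count));
}
```

With this blend, strong real-world usage can lift a mediocre static score, but never fully mask it — and vice versa.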
MIT — Use, modify, distribute freely. See LICENSE for details.
