Skip to content

Evol-ai/SkillCompass

Repository files navigation

SkillCompass

Your skill could be much better. But better how? Which part? In what order?

GitHub · SKILL.md · Schemas · Changelog

MIT License Node >= 18 Claude Opus 4.6


What it is An evaluation-driven skill evolution engine for Claude Code / OpenClaw — six-dimension scoring, directed improvement, version management.
Pain it solves Turns "tweak and hope" into diagnose → targeted fix → verified improvement.
Use in 30 seconds /skill-compass evaluate {skill} — instant quality report showing exactly what's weakest and what to improve next.

Find the weakest link → fix it → prove it worked → next weakness → repeat.


Who This Is For

For

  • Anyone maintaining agent skills and wanting measurable quality
  • Developers who want directed improvement — not guesswork, but knowing exactly which dimension to fix next
  • Teams needing a quality gate — any tool that edits a skill gets auto-evaluated

Not For

  • General code review or runtime debugging
  • Creating new skills from scratch (use skill-creator)
  • Evaluating non-skill files

Quick Start

Prerequisites: Claude Opus 4.6 (complex reasoning + consistent scoring) · Node.js v18+ (local validators)

Claude Code

git clone https://github.com/Evol-ai/SkillCompass.git
cd SkillCompass && npm install

# User-level (all projects)
rsync -a --exclude='.git'  . ~/.claude/skills/skill-compass/

# Or project-level (current project only)
rsync -a --exclude='.git'  . .claude/skills/skill-compass/

First run: Claude Code will request permission for node -e and bash commands. Select "Allow always" to avoid repeated prompts.

OpenClaw

git clone https://github.com/Evol-ai/SkillCompass.git
cd SkillCompass && npm install
# Follow OpenClaw skill installation docs for your setup
rsync -a --exclude='.git'  . <your-openclaw-skills-path>/skill-compass/

Usage

Two ways to invoke SkillCompass:

/skill-compass + natural language

/skill-compass evaluate ./my-skill/SKILL.md
/skill-compass improve the nano-banana skill
/skill-compass security scan ./my-skill/SKILL.md
/skill-compass audit all skills in .claude/skills/
/skill-compass compare my-skill 1.0.0 vs 1.0.0-evo.2
/skill-compass roll back my-skill to previous version

Or just talk to Claude

No slash command needed — Claude automatically recognizes the intent:

Evaluate the nano-banana skill for me
Improve this skill — fix the weakest dimension
Scan all skills in .claude/skills/ for security issues
Capability reference
Intent Maps to
Evaluate / score / review a skill eval-skill
Improve / fix / upgrade a skill eval-improve
Security scan a skill eval-security
Batch audit a directory eval-audit
Compare two versions eval-compare
Merge with upstream eval-merge
Rollback to previous version eval-rollback

What It Does

SkillCompass — Skill Quality Report

The score isn't the point — the direction is. You instantly see which dimension is the bottleneck and what to do about it.

Each /eval-improve round follows a closed loop: fix the weakest → re-evaluate → verify improvement → next weakest. No fix is saved unless the re-evaluation confirms it actually helped.


Six-Dimension Evaluation Model

ID Dimension Weight What it evaluates
D1 Structure 10% Frontmatter validity, markdown format, declarations
D2 Trigger 15% Activation quality, rejection accuracy, discoverability
D3 Security 20% Secrets, injection, permissions, exfiltration
D4 Functional 30% Core quality, edge cases, output stability, error handling
D5 Comparative 15% Value over direct prompting (with vs without skill)
D6 Uniqueness 10% Overlap with similar skills, model supersession risk
overall_score = round((D1×0.10 + D2×0.15 + D3×0.20 + D4×0.30 + D5×0.15 + D6×0.10) × 10)
Verdict Condition
PASS score ≥ 70 AND D3 pass
CAUTION 50–69, or D3 High findings
FAIL score < 50, or D3 Critical (gate override)

Features

Core Loop

Feature Description
Directed Evolution Diagnose → targeted fix → verify → next weakness. Not random patching.
Closed-Loop Improve /eval-improve auto re-evaluates after each fix. Only saves if improved and nothing regressed.
Scope Control --scope gate = D1+D3 (~8K tokens). --scope target --dimension D4 = single dim + gate.
Tiered Verification L0 syntax → L1 single dimension → L2 full re-eval → L3 cross-skill.
D1+D2 Grouping Both metadata dimensions weak (≤5)? Improved together — they share the frontmatter layer.

Safety

Feature Description
Pre-Accept Gate Hooks auto-scan every SKILL.md write. D1 + D3 checks. Zero config. Warns, never blocks.
Pre-Eval Scan Static analysis blocks malicious code, exfiltration, prompt injection before LLM eval.
Output Guard Validates improvement output for URL injection, dangerous commands, size anomalies.
Auto-Rollback Any dimension drops >2 points after improvement? Changes discarded.
Local Validators JS-based D1/D2/D3 validators run locally. Saves ~60% tokens on clear-cut issues.

Smart Optimization

Feature Description
Correction Tracking Detects repeated manual fixes, maps to dimensions, prompts update at next invocation.
Feedback Integration Real usage data fuses into scores: 60% static + 40% feedback signals.
Multi-Language Triggers Detects your language, tests trigger accuracy in it, fixes multilingual gaps.
Obsolescence Detection Compares skill vs base model. Tracks supersession risk across model updates.
Skill Type Detection Auto-classifies atom / composite / meta. Evaluation adapts accordingly.

Version & Scale

Feature Description
Version Management SHA-256 hashed snapshots. Rollback to any version anytime.
Three-Way Merge Merges upstream updates region-by-region. Local improvements preserved.
Multi-Round Evolution /eval-evolve runs up to 6 rounds autonomously. Stops at PASS or plateau.
Batch Audit + Auto-Fix /eval-audit --fix --budget 3 scans worst-first, auto-fixes within budget.
CI Mode --ci flag, exit codes: 0=PASS, 1=CAUTION, 2=FAIL.

Works With Everything

No point-to-point integration needed. The Pre-Accept Gate intercepts all SKILL.md edits regardless of source.

Tool How it works together Guide
Auto-Updater Pulls new version → Gate auto-checks for security regressions → keep or rollback guide
Claudeception Extracts skill → auto-evaluation catches security holes + redundancy → directed fix guide
Self-Improving Agent Logs errors → feed as signals → SkillCompass maps to dimensions and fixes guide

Feedback Signal Standard

SkillCompass defines an open feedback-signal.json schema for any tool to report skill usage data:

/eval-skill ./my-skill/SKILL.md --feedback ./feedback-signals.json

Signals: trigger_accuracy, correction_count, correction_patterns, adoption_rate, ignore_rate, usage_frequency. The schema is extensible (additionalProperties: true) — any pipeline can produce or consume this format.


License

MIT — Use, modify, distribute freely. See LICENSE for details.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors