adrianmoses/skillcheck

Skillcheck

An agent skill verification framework for testing AI coding assistants.

What is Skillcheck?

Skillcheck is a framework for verifying that AI coding assistants can successfully execute specific skills. It provides:

  • Standardized test definitions - YAML-based skill test specifications
  • Automated verification - Compare agent output against expected criteria
  • Detailed reporting - JSON and Markdown reports for analysis
  • Extensible architecture - Easy to add new skills and verifiers

Supported Skills

Skill         Description                          Status
docx_create   Create Word documents from scratch   Available
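
As an illustration of what checking a docx_create result could involve, here is a minimal sketch using only the Python standard library. It relies on the fact that a .docx file is a ZIP archive whose main part is word/document.xml. The function names docx_paragraphs and contains_text are hypothetical and are not taken from Skillcheck's own verifier, which may work differently.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path: str) -> list[str]:
    """Return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        paragraphs.append(text)
    return paragraphs


def contains_text(path: str, expected: str) -> bool:
    """Hypothetical criterion: does any paragraph contain `expected`?"""
    if not zipfile.is_zipfile(path):
        return False
    try:
        return any(expected in p for p in docx_paragraphs(path))
    except (KeyError, ET.ParseError):
        # KeyError: word/document.xml is missing from the archive.
        # ParseError: the part exists but is not well-formed XML.
        return False
```

A check like this only confirms that the file is structurally a Word document and contains some expected text; criteria such as headings or styles would need further inspection of the XML.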

Supported Agents

Skillcheck can test any AI coding assistant that can execute prompts. It has been tested with:

  • Claude Code
  • Codex CLI
  • Gemini CLI
  • OpenCrawl

Methodology

Skillcheck uses a baseline comparison approach:

  1. Define - Create a test definition with a prompt and expected criteria
  2. Execute - Give the prompt to an AI agent and capture its output
  3. Verify - Compare the output against the expected criteria
  4. Report - Generate detailed pass/fail reports with metrics

Because every agent receives the same prompt and is checked against the same criteria, results are comparable across agents and skills.
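
The four steps can be sketched in a few lines of Python. This is an illustration of the flow only, not Skillcheck's actual code: the SkillTestDefinition class, the verify and report functions, and the example criteria are all hypothetical, and the real data models live in skillcheck/models.py.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SkillTestDefinition:
    """Hypothetical stand-in for a skill test definition."""
    skill: str
    prompt: str
    # Each criterion maps a name to a predicate over the agent's output.
    criteria: dict[str, Callable[[str], bool]] = field(default_factory=dict)


def verify(definition: SkillTestDefinition, output: str) -> dict[str, bool]:
    """Step 3: evaluate every criterion against the captured output."""
    return {name: check(output) for name, check in definition.criteria.items()}


def report(definition: SkillTestDefinition, results: dict[str, bool]) -> str:
    """Step 4: summarize the results as a single pass/fail line."""
    passed = sum(results.values())
    status = "PASS" if passed == len(results) else "FAIL"
    return f"{definition.skill}: {status} ({passed}/{len(results)} criteria met)"


# Step 1: define the prompt and the expected criteria.
definition = SkillTestDefinition(
    skill="docx_create",
    prompt="Create a Word document titled 'Quarterly Report'.",
    criteria={
        "mentions_title": lambda out: "Quarterly Report" in out,
        "non_empty": lambda out: len(out.strip()) > 0,
    },
)

# Step 2: the prompt is given to an agent; this string stands in for its output.
captured_output = "Quarterly Report\nRevenue grew this quarter."

results = verify(definition, captured_output)
summary = report(definition, results)
print(summary)  # docx_create: PASS (2/2 criteria met)
```

Keeping the criteria as named predicates means a failed run can report which specific expectation was missed, not just that the test failed.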

Quick Start

# Install
pip install -e .

# Show available skills
skillcheck --list-skills

# Run a skill test
cd skills/docx_create
python run_test.py --show-prompt  # See what to ask the agent
python run_test.py --output agent_output.docx --agent "Claude Code"

See docs/QUICK_START.md for detailed setup instructions.

Project Structure

skillcheck/
├── skillcheck/          # Core framework
│   ├── models.py        # Data models
│   ├── verifier.py      # Base verification
│   ├── runner.py        # Test orchestration
│   └── reporter.py      # Report generation
│
├── skills/              # Skill test suites
│   └── docx_create/     # DOCX creation skill
│
├── docs/                # Documentation
└── tests/               # Framework tests

Note on Skills Architecture: Each skill in the skills/ directory is designed as a standalone test suite. Skills are intentionally not packaged as a Python module (no __init__.py) to keep them decoupled and independently distributable. Each skill's run_test.py serves as its entry point. The CLI auto-discovers skills by scanning for directories containing test_definition.yaml.
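
The auto-discovery rule described in the note can be approximated as follows. This is a minimal sketch assuming discovery only inspects the immediate children of skills/; the function name discover_skills is hypothetical and the CLI's real implementation may differ.

```python
from pathlib import Path


def discover_skills(skills_root: Path) -> list[str]:
    """Return names of subdirectories that contain a test_definition.yaml."""
    if not skills_root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in skills_root.iterdir()
        if entry.is_dir() and (entry / "test_definition.yaml").is_file()
    )


if __name__ == "__main__":
    # Run from the repository root to list whatever skills are present.
    for name in discover_skills(Path("skills")):
        print(name)
```

Because discovery depends only on the presence of test_definition.yaml, a new skill directory needs no registration step and no __init__.py.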

Writing Skill Tests

See docs/SKILL_TESTS.md for a guide on creating new skill tests.

Reports

Skillcheck generates two report formats:

  • JSON - Machine-readable for automation and dashboards
  • Markdown - Human-readable for review

See docs/REPORT_TEMPLATE.md for example reports.
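
To show how a single set of results can feed both formats, here is a hedged sketch. The field names (agent, skill, passed, criteria) and the Markdown layout are assumptions made for illustration; docs/REPORT_TEMPLATE.md remains the reference for the actual report structure.

```python
import json


def to_json(agent: str, skill: str, results: dict[str, bool]) -> str:
    """Render results as machine-readable JSON (field names are assumed)."""
    payload = {
        "agent": agent,
        "skill": skill,
        "passed": all(results.values()),
        "criteria": results,
    }
    return json.dumps(payload, indent=2)


def to_markdown(agent: str, skill: str, results: dict[str, bool]) -> str:
    """Render the same results as a human-readable Markdown table."""
    lines = [
        f"# {skill} ({agent})",
        "",
        "| Criterion | Result |",
        "| --- | --- |",
    ]
    for name, ok in results.items():
        lines.append(f"| {name} | {'PASS' if ok else 'FAIL'} |")
    return "\n".join(lines)


if __name__ == "__main__":
    example = {"file_is_valid_docx": True, "has_title": False}
    print(to_json("Claude Code", "docx_create", example))
    print(to_markdown("Claude Code", "docx_create", example))
```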

Badge System (Planned)

Future versions will include a badge system for verified skills:

[Claude Code] DOCX Create: VERIFIED
[Codex CLI] DOCX Create: VERIFIED

License

MIT

About

[WIP] Tested Agent Skills
