An agent skill verification framework for testing AI coding assistants.
Skillcheck is a framework for verifying that AI coding assistants can successfully execute specific skills. It provides:
- Standardized test definitions - YAML-based skill test specifications
- Automated verification - Compare agent output against expected criteria
- Detailed reporting - JSON and Markdown reports for analysis
- Extensible architecture - Easy to add new skills and verifiers
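To illustrate what a YAML-based skill test specification might look like, here is a minimal sketch; the field names (`skill`, `prompt`, `expected`, `checks`) are illustrative assumptions, not the framework's actual schema:

```yaml
# Hypothetical test definition sketch -- field names are illustrative,
# not Skillcheck's actual schema.
skill: docx_create
prompt: "Create a Word document with a title page and two sections."
expected:
  output_file: agent_output.docx
  checks:
    - type: file_exists
    - type: paragraph_count
      min: 3
```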
| Skill | Description | Status |
|---|---|---|
| docx_create | Create Word documents from scratch | Available |
Skillcheck can test any AI coding assistant that can execute prompts. Tested with:
- Claude Code
- Codex CLI
- Gemini CLI
- OpenCrawl
Skillcheck uses a baseline comparison approach:
- Define - Create a test definition with a prompt and expected criteria
- Execute - Give the prompt to an AI agent and capture its output
- Verify - Compare the output against the expected criteria
- Report - Generate detailed pass/fail reports with metrics
This allows objective measurement of agent capabilities across different skills.
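The four steps above can be sketched in a few lines of Python. This is a simplified illustration of the baseline comparison idea, not Skillcheck's actual API; the `SkillTest` class and `verify` function are hypothetical names:

```python
"""Minimal sketch of the Define -> Execute -> Verify -> Report loop.
Class and function names are illustrative, not Skillcheck's real API."""
from dataclasses import dataclass


@dataclass
class SkillTest:
    prompt: str     # what to ask the agent (Define)
    expected: dict  # criteria the captured output must satisfy


def verify(test: SkillTest, output: dict) -> dict:
    """Compare agent output against expected criteria (Verify step)."""
    failures = [
        key for key, want in test.expected.items()
        if output.get(key) != want
    ]
    return {"passed": not failures, "failures": failures}


# Define the test; the Execute step (running the agent and capturing
# its output) happens elsewhere and yields the `output` dict.
test = SkillTest(
    prompt="Create a Word document with a title and one paragraph.",
    expected={"file_created": True, "paragraph_count": 1},
)
report = verify(test, {"file_created": True, "paragraph_count": 1})
print(report)  # {'passed': True, 'failures': []}
```

The Report step would then serialize such result dicts to JSON or Markdown.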
```bash
# Install
pip install -e .

# Show available skills
skillcheck --list-skills

# Run a skill test
cd skills/docx_create
python run_test.py --show-prompt  # See what to ask the agent
python run_test.py --output agent_output.docx --agent "Claude Code"
```

See docs/QUICK_START.md for detailed setup instructions.
```
skillcheck/
├── skillcheck/          # Core framework
│   ├── models.py        # Data models
│   ├── verifier.py      # Base verification
│   ├── runner.py        # Test orchestration
│   └── reporter.py      # Report generation
│
├── skills/              # Skill test suites
│   └── docx_create/     # DOCX creation skill
│
├── docs/                # Documentation
└── tests/               # Framework tests
```
Note on Skills Architecture: Each skill in the `skills/` directory is designed as a standalone test suite. Skills are intentionally not packaged as a Python module (no `__init__.py`) to keep them decoupled and independently distributable. Each skill's `run_test.py` serves as its entry point. The CLI auto-discovers skills by scanning for directories containing `test_definition.yaml`.
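The auto-discovery described above can be sketched as follows; `discover_skills` is an illustrative helper, not the framework's actual CLI code, but it follows the stated rule of scanning for directories that contain `test_definition.yaml`:

```python
from pathlib import Path


def discover_skills(skills_root: str = "skills") -> list[str]:
    """Find skill directories by scanning for test_definition.yaml,
    mirroring how the CLI auto-discovers skills. (Illustrative helper.)"""
    root = Path(skills_root)
    if not root.is_dir():
        return []
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "test_definition.yaml").is_file()
    )
```

In the project layout shown above, `discover_skills()` would find `docx_create`; directories without a `test_definition.yaml` are ignored.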
See docs/SKILL_TESTS.md for a guide on creating new skill tests.
Skillcheck generates two report formats:
- JSON - Machine-readable for automation and dashboards
- Markdown - Human-readable for review
See docs/REPORT_TEMPLATE.md for example reports.
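A dual-format reporter might look like the sketch below. The `render_reports` function and the result fields (`skill`, `agent`, `passed`) are illustrative assumptions rather than the framework's real reporter API:

```python
import json


def render_reports(result: dict) -> tuple[str, str]:
    """Render one test result as JSON (machine-readable) and Markdown
    (human-readable). Field names are illustrative, not Skillcheck's schema."""
    as_json = json.dumps(result, indent=2)
    status = "PASS" if result["passed"] else "FAIL"
    as_md = (
        f"## {result['skill']}\n\n"
        f"- Agent: {result['agent']}\n"
        f"- Status: {status}\n"
    )
    return as_json, as_md


json_report, md_report = render_reports(
    {"skill": "docx_create", "agent": "Claude Code", "passed": True}
)
```

The JSON output feeds automation and dashboards, while the Markdown output is meant for human review.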
Future versions will include a badge system for verified skills:
```
[Claude Code] DOCX Create: VERIFIED
[Codex CLI]   DOCX Create: VERIFIED
```
MIT