Skip to content

ddevilz/toolmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

10 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

ToolMark πŸ”¨

ESLint + Jest + npm publish β€” for AI Agent Tools.

Build, test, scan, and ship tools across OpenClaw/ClawHub, Claude Code, Cursor, and Windsurf β€” from a single CLI.

PyPI License: MIT Tests


Why ToolMark?

13,000+ tools are published on ClawHub. 13% contain critical security flaws (Snyk ToxicTools Report, Feb 2026). Tools break silently on platforms other than the one they were tested on. There is no pytest for agent tools β€” until now.

toolmark init my-tool --template github-api
toolmark test          # LLM-as-judge evaluation
toolmark scan          # prompt injection, dynamic fetch, credential leaks
toolmark compat        # check all 4 platforms at once
toolmark publish       # sign with Ed25519, push to ClawHub + Claude Code

Install

pip install toolmark

Requires Python 3.12+.


Quick Start

# 1. Scaffold
toolmark init my-github-tool --template github-api

# 2. Edit tool.md and tests/
cd my-github-tool

# 3. Test
ANTHROPIC_API_KEY=sk-ant-... toolmark test

# 4. Scan
toolmark scan

# 5. Check platform compatibility
toolmark compat

# 6. Publish
toolmark publish --platforms clawhub,claude-code

Commands

Command What it does
toolmark init Scaffold a new tool from a template
toolmark test LLM-as-judge evaluation against YAML test cases
toolmark scan Security scanner (prompt injection, dynamic fetch, creds)
toolmark compat Cross-platform compatibility check (4 platforms)
toolmark bench Benchmark latency, tokens, compute quality score (0–100)
toolmark publish Sign with Ed25519, publish to configured registries

Templates

toolmark init my-tool --template github-api      # GitHub REST API wrapper
toolmark init my-tool --template file-ops         # Local filesystem tool
toolmark init my-tool --template mcp-integration  # Wraps an MCP server tool
toolmark init my-tool --template web-search       # Search API tool
toolmark init my-tool --template loom-query       # Loom knowledge graph tool
toolmark init my-tool --template blank            # Minimal scaffold

Test Cases (YAML)

# tests/test_search.yaml
- id: search_open_prs
  input: "find my open pull requests"
  expect_invoked: true
  expect_tool: search_pull_requests
  expect_params:
    state: open
    assignee: "@me"
  tolerance: fuzzy     # strict | fuzzy | invoked
  tags: [smoke]

Run: toolmark test --tags smoke


Security

toolmark catches:

  • SF001 β€” Dynamic fetch (curl | bash, eval(fetch(...)))
  • SF002 β€” Hardcoded credentials (API keys, passwords)
  • SF003 β€” Prompt injection phrases in tool descriptions
  • SF004 β€” Undeclared network endpoints
  • SNYK-* β€” 138 rules via Snyk agent-scan (if installed)

Provenance Signing

Every published tool is signed with Ed25519:

toolmark keygen              # creates ~/.toolmark/signing.key
toolmark publish --sign      # signs + publishes
toolmark verify my-tool     # verify any published tool

GitHub Actions

Every toolmark init project includes a ready-to-use workflow:

# .github/workflows/toolmark.yml β€” already in your project
- toolmark compat    # platform check
- toolmark scan      # security gate
- toolmark test      # LLM evaluation (needs ANTHROPIC_API_KEY secret)

Quality Leaderboard

See how your tool ranks: toolmark.dev/leaderboard

Quality Score = test pass rate (50%) + security score (30%) + compat score (20%).


Roadmap

  • init β€” scaffold with 6 templates
  • test β€” LLM-as-judge evaluation
  • scan β€” built-in security rules + Snyk integration
  • compat β€” 4-platform compatibility matrix
  • bench β€” composite quality score
  • publish β€” Ed25519 signing + ClawHub
  • watch β€” re-run tests on save
  • VS Code extension
  • Rust benchmark runner
  • Claude Code + Cursor + Windsurf publish

Contributing

See CONTRIBUTING.md. We always have good first issues.

License

MIT β€” see LICENSE.


Built by @ddevilz as part of the Loom AI tooling ecosystem.

About

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages