Kasper

Kasper is a plugin for opencode that monitors agent sessions, scores adherence to user instructions via LLM-as-judge, and injects corrective instructions into AGENTS.md and per-agent prompt files.

Unofficial plugin: This is an independent project and is not affiliated with, endorsed by, or maintained by the opencode team.

Features

LLM-as-Judge Scoring — Evaluates every session on 5 dimensions: instruction following, completeness, proactiveness, code quality, and communication
Automatic Improvements — Detects recurring weaknesses and injects fixes into AGENTS.md or per-agent prompts (auto or manual approval)
Idle-Aware Evaluation — Sessions scored only when idle or complete, preventing partial-turn scoring
Per-Agent Scoring — Separate aggregates and weakness profiles per agent
Batch & Retroactive Scoring — Score past sessions via /kasper score session <id> or bulk with last N
Subagent Tracking — Tracks subagent calls and evaluates child sessions independently
Compaction Feedback — Top weaknesses injected into session compaction for ongoing agent awareness
Backups & Safety — Timestamped backups before every change; atomic writes with file locks

Installation

npm install @atonev/opencode-kasper

Add to your opencode config:

{
  "plugin": ["@atonev/opencode-kasper"]
}

With options:

{
  "plugin": [
    ["@atonev/opencode-kasper", { "auto_update": true }]
  ]
}

Verify: Start a session and run /kasper status.

Commands

| Command | Description |
| --- | --- |
| `/kasper status [agent]` | Aggregate scores, top weaknesses, recent sessions, sparkline trend |
| `/kasper score session <id>` | Evaluate a past session by ID, or several sessions with `last N`, `since YYYY-MM-DD`, or `range X Y` in place of the ID |
| `/kasper improve [agent]` | Numbered table of improvement suggestions |
| `/kasper apply [n\|all]` | Apply pending improvement `n`, or all pending improvements |
| `/kasper history [agent]` | Session history with score breakdowns |
| `/kasper auto on\|off` | Toggle auto-apply for improvements |
| `/kasper config` | Display current configuration |
| `/kasper reset` | Clear all state |
| `/kasper help` | Show all commands |

Tools

| Tool | Description |
| --- | --- |
| `kasper_status` | Aggregate scores, per-agent breakdown, weaknesses |
| `kasper_improve` | Numbered improvement suggestions |
| `kasper_apply` | Apply by `[N]` index |
| `kasper_history` | Adherence history and trends |
| `kasper_score_session` | Evaluate one or more sessions |
| `kasper_reset` | Clear all state |

Configuration

Configuration is loaded from ~/.config/opencode/kasper.jsonc, .opencode/kasper.jsonc, or the kasper key in opencode.json.

{
  "enabled": true,
  "auto_update": true,              // Auto-apply improvements
  "scoring_threshold": 0.6,         // Score below this triggers suggestions
  "model": "opencode/deepseek-v4-flash-free",
  "weakness_decay_days": 30,
  "detail_level": "standard",       // minimal | standard | thorough
  "quiet": false,
  "evaluate_subagents": false,
  "min_session_messages": 3,
  "debug": false,
  "state_dir": "",                  // Custom state directory
  "evaluation_poll_interval_ms": 10000,
  "scoring_retries": 2,
  "scoring_timeout_ms": 120000,
  "max_score_input_chars": 10000
}
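For readers who want a typed view of these options, here is a minimal TypeScript sketch. It is not taken from Kasper's source: the `KasperConfig` interface, `EXAMPLE_CONFIG`, and `resolveConfig` are hypothetical names. It assumes the values shown above are the defaults (this README confirms that only for `evaluate_subagents`) and that later config sources override earlier ones, which this README does not state.

```typescript
// Hypothetical typed view of the options listed above; not Kasper's real API.
interface KasperConfig {
  enabled: boolean;
  auto_update: boolean;
  scoring_threshold: number;
  model: string;
  weakness_decay_days: number;
  detail_level: "minimal" | "standard" | "thorough";
  quiet: boolean;
  evaluate_subagents: boolean;
  min_session_messages: number;
  debug: boolean;
  state_dir: string;
  evaluation_poll_interval_ms: number;
  scoring_retries: number;
  scoring_timeout_ms: number;
  max_score_input_chars: number;
}

// Values copied from the example above; treating them as defaults is an assumption.
const EXAMPLE_CONFIG: KasperConfig = {
  enabled: true,
  auto_update: true,
  scoring_threshold: 0.6,
  model: "opencode/deepseek-v4-flash-free",
  weakness_decay_days: 30,
  detail_level: "standard",
  quiet: false,
  evaluate_subagents: false,
  min_session_messages: 3,
  debug: false,
  state_dir: "",
  evaluation_poll_interval_ms: 10000,
  scoring_retries: 2,
  scoring_timeout_ms: 120000,
  max_score_input_chars: 10000,
};

// Merge partial config layers over the example values.
// Assumption: layers passed later take precedence over earlier ones.
function resolveConfig(...layers: Partial<KasperConfig>[]): KasperConfig {
  return layers.reduce<KasperConfig>(
    (merged, layer) => ({ ...merged, ...layer }),
    EXAMPLE_CONFIG,
  );
}

// Example: a global file raises the threshold, a project file silences output.
const resolved = resolveConfig({ scoring_threshold: 0.7 }, { quiet: true });
console.log(resolved.scoring_threshold, resolved.quiet);
```

Typing `detail_level` as a union lets the compiler reject values outside the three documented levels.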

Scoring

Each session is scored 0.0–1.0 across five dimensions:

| Dimension | Description |
| --- | --- |
| `instruction_following` | Did the agent do exactly what was asked? |
| `completeness` | Did the agent fully complete the task? |
| `proactiveness` | Did the agent act appropriately? |
| `code_quality` | Quality and maintainability of code produced |
| `communication` | Clarity and helpfulness of explanations |

Scores display as 🟢 ≥80%, 🟡 ≥60%, 🔴 <60%. The /kasper status command shows an ASCII sparkline of the last 7 session scores.
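The display rule above can be sketched in a few lines of TypeScript. This is an illustration, not Kasper's rendering code: `scoreBadge`, `sparkline`, and the character ramp are hypothetical, and applying the cutoffs to the rounded percentage is an assumption.

```typescript
// Hypothetical helpers illustrating the display rule; not Kasper's real API.

// Map a 0.0 to 1.0 score to a traffic-light badge.
// Assumption: thresholds are applied to the rounded percentage.
function scoreBadge(score: number): string {
  const pct = Math.round(score * 100);
  if (pct >= 80) return `🟢 ${pct}%`;
  if (pct >= 60) return `🟡 ${pct}%`;
  return `🔴 ${pct}%`;
}

// ASCII ramp from lowest to highest; the exact characters are illustrative.
const RAMP = ["_", ".", "-", "=", "#"];

// Render the most recent `window` scores as one ASCII character each.
function sparkline(scores: number[], window = 7): string {
  return scores
    .slice(-window)
    .map((score) => {
      const clamped = Math.min(1, Math.max(0, score));
      const index = Math.min(RAMP.length - 1, Math.floor(clamped * RAMP.length));
      return RAMP[index];
    })
    .join("");
}

console.log(scoreBadge(0.85)); // 🟢 85%
console.log(sparkline([0, 0.25, 0.5, 0.75, 1])); // _.-=#
```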

How It Works

Observe — Hooks on chat.message and session.created accumulate session data; 10s polling catches idle sessions
Evaluate — LLM-as-judge scores each session across 5 dimensions; large sessions split into per-pair evaluation
Improve — Recurring weaknesses trigger suggestions (AGENTS.md or per-agent prompt); auto-applied or queued for review
Measure — Score delta tracking shows before/after improvement impact in /kasper history
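The Evaluate and Improve steps can be illustrated with a small TypeScript sketch. Everything here is an assumption made for illustration: the function names, the unweighted mean used for the overall score, and the `minOccurrences` cutoff for what counts as "recurring" are not documented in this README.

```typescript
// Illustrative sketch only; names, the unweighted mean, and minOccurrences
// are assumptions, not Kasper's actual implementation.

type Dimension =
  | "instruction_following"
  | "completeness"
  | "proactiveness"
  | "code_quality"
  | "communication";

type SessionScores = Record<Dimension, number>;

const DIMENSIONS: Dimension[] = [
  "instruction_following",
  "completeness",
  "proactiveness",
  "code_quality",
  "communication",
];

// Overall session score as the unweighted mean of the five dimensions.
function overallScore(scores: SessionScores): number {
  const total = DIMENSIONS.reduce((sum, d) => sum + scores[d], 0);
  return total / DIMENSIONS.length;
}

// Dimensions that fell below `threshold` in at least `minOccurrences` sessions.
function recurringWeaknesses(
  sessions: SessionScores[],
  threshold = 0.6,
  minOccurrences = 2,
): Dimension[] {
  return DIMENSIONS.filter((d) => {
    const lowCount = sessions.filter((s) => s[d] < threshold).length;
    return lowCount >= minOccurrences;
  });
}

const history: SessionScores[] = [
  { instruction_following: 0.9, completeness: 0.5, proactiveness: 0.7, code_quality: 0.8, communication: 0.9 },
  { instruction_following: 0.8, completeness: 0.4, proactiveness: 0.5, code_quality: 0.9, communication: 0.7 },
  { instruction_following: 0.9, completeness: 0.9, proactiveness: 0.9, code_quality: 0.9, communication: 0.9 },
];

// Only completeness is below 0.6 in two sessions; proactiveness dips once.
console.log(recurringWeaknesses(history));
```

Counting occurrences per dimension, rather than averaging, keeps a single bad session from triggering a suggestion.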

Limitations

Forward-looking only. Only sessions created after plugin start are auto-scored. Use /kasper score session last <N> for retroactive batch scoring.
Current config only. Scoring uses today's AGENTS.md and prompts, not the versions active when the session originally ran.
Subagents. Auto-scoring of subagent sessions is controlled by evaluate_subagents (default: false). Child sessions are still evaluated when you score a session manually with /kasper score session.

Development

bun install       # Install dependencies
bun run build     # Compile TypeScript
bun run typecheck # Type-check only
bun test          # 308 tests, all passing

License

MIT — see LICENSE.
