feat(targets): remove workspace_template, add target-level hooks#1095
Merged
Remove the unused `workspace_template` field from target schema and add per-target hooks support in eval files. Target hooks let a single eval file compare different harness configurations (e.g., baseline vs with-plugins) by running setup/teardown scripts per target variant.

- Remove `workspace_template` from BASE_TARGET_SCHEMA, TargetDefinition, all provider resolved configs, orchestrator, and validators
- Add `TargetHooksConfig` and `EvalTargetRef` types
- Extend `execution.targets` in eval files to accept objects with hooks
- Execute target hooks at 4 lifecycle points: before_all, before_each, after_each, after_all (nested inside workspace hooks)
- Update docs and regenerate eval-schema.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying agentv

| Latest commit: | db1a83a |
|---|---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://df322415.agentv.pages.dev |
| Branch Preview URL: | https://feat-1094-target-hooks.agentv.pages.dev |
- Accept string command shorthand in WorkspaceHookSchema (matches docs and parseHookConfig behavior)
- Restore deprecation warning when workspace_template appears in targets.yaml (downgraded from error to warning for migration help)
- Fix misleading "runs ONCE" comment on target before_all

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
christso added a commit that referenced this pull request Apr 14, 2026
Replace the removed workspace_template field with the target-level hooks pattern from #1095. A single base 'claude' target is defined in targets.yaml, and the eval file's execution.targets uses before_each hooks to copy variant-specific plugin configs into the workspace.

Also fixes:
- Use 'id' instead of deprecated 'case' in test definitions
- Use full commit hash with resolve: local for base_commit
- Remove shallow clone (depth: 1) that prevented commit checkout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
christso added a commit that referenced this pull request Apr 14, 2026
…luation (#1091)

* docs(showcase): add bug-fix-benchmark example for SWE-bench style evaluation

  Add a showcase example demonstrating how to evaluate coding agents on real-world bug fixes using public GitHub repositories with Docker workspace isolation and commit-pinned repos. Includes:
  - EVAL.yaml with example test cases (null-check, fallback, property-access bugs)
  - targets.yaml showing all auth options (subscription, API key, mock)
  - mock-agent.sh for testing without API keys
  - import-swebench.sh for importing SWE-bench dataset instances

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(showcase): add multi-plugin benchmark with baseline comparison

  Add workspace templates for comparing agent performance with and without engineering plugins: superpowers, compound-engineering, agent-skills.
  - Add workspaces/ with per-plugin .claude/settings.json configs
  - Update targets.yaml with claude-baseline, claude-superpowers, claude-compound, claude-agent-skills targets
  - Replace hypothetical test cases with real issue #912 bug fix task
  - Add scripts/setup-plugins.sh for plugin installation
  - Update README with comparison workflow and plugin details

  Closes #919

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(showcase): use bypassPermissions in all workspace settings

  Use defaultMode: bypassPermissions instead of listing individual Bash allow rules, matching how the agentv dev environment is configured.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(showcase): use target-level hooks instead of workspace_template

  Replace the removed workspace_template field with the target-level hooks pattern from #1095. A single base 'claude' target is defined in targets.yaml, and the eval file's execution.targets uses before_each hooks to copy variant-specific plugin configs into the workspace.

  Also fixes:
  - Use 'id' instead of deprecated 'case' in test definitions
  - Use full commit hash with resolve: local for base_commit
  - Remove shallow clone (depth: 1) that prevented commit checkout

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(showcase): use claude-cli provider with configurable executable

  Switch from provider: claude to provider: claude-cli with an executable field that reads from CLAUDE_EXECUTABLE env var (defaults to "claude"). This allows using custom CLI binaries like claude-zai.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(showcase): remove unused import-swebench.sh script

  The script was speculative and non-functional (used deprecated fields, hardcoded docker config, broken template variables). Not needed for the benchmark showcase.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(showcase): remove local targets.yaml, use inline rubric assertions

  - Remove local .agentv/targets.yaml and use repo root targets instead (targets don't merge, closest shadows; local one forced duplicating grader targets unnecessarily)
  - Replace llm-grader assertion with inline rubric strings (auto-unwrapped to rubrics evaluator)
  - Remove unused scripts: mock-agent.sh (broken with workspace repos), setup-plugins.sh (orphaned, settings.json already checked in)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(showcase): use AGENT_TARGET, reset workspace, drop fragile contains assertion

  - Replace hardcoded use_target: claude with ${{ AGENT_TARGET }} so the benchmark works with any provider via env var
  - Add workspace.hooks.before_each.reset: fast for proper isolation between pool slot reuse across plugin variants
  - Remove contains: effectiveCwd assertion (checks response text, not the diff); rubrics already validate the fix via file_changes

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Closes #1094
Summary
- Remove `workspace_template` from target schema: unused field (zero references in any eval file). Removed from `BASE_TARGET_SCHEMA`, `TargetDefinition`, all provider resolved configs, orchestrator, validators, and docs.
- Add target-level hooks in `execution.targets` using object form, enabling single-file harness variant comparison (e.g., baseline vs with-plugins vs with-guidelines).

Target hooks example
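A minimal sketch of how the object form of `execution.targets` might look once parsed, written in TypeScript rather than eval-file YAML. Only the type names `TargetHooksConfig` and `EvalTargetRef`, the four hook names, and the `(string | EvalTargetRef)[]` shape come from this PR. The fields `name`, `use_target`, and `command`, the helper `normalizeTargetRefs`, and the `workspaces/with-plugins` path are assumptions made for illustration and may not match the real schema.

```typescript
// Hypothetical hook shape: a command given as an argv array (assumed).
interface HookCommand {
  command: string[];
}

// Type name from the PR; the four lifecycle points are the ones it lists.
interface TargetHooksConfig {
  before_all?: HookCommand;
  before_each?: HookCommand;
  after_each?: HookCommand;
  after_all?: HookCommand;
}

// Type name from the PR; every field below is an assumption.
interface EvalTargetRef {
  name: string; // label for this variant
  use_target?: string; // base target to reuse
  hooks?: TargetHooksConfig;
}

// execution.targets accepts plain names and objects side by side.
type TargetEntry = string | EvalTargetRef;

// Hypothetical helper: turn the mixed array into objects only, so that
// later stages do not need to branch on the entry type.
function normalizeTargetRefs(entries: TargetEntry[]): EvalTargetRef[] {
  return entries.map((entry) =>
    typeof entry === "string" ? { name: entry } : entry,
  );
}

// A baseline target plus a variant that copies plugin settings before each test.
const targets: TargetEntry[] = [
  "claude",
  {
    name: "claude-with-plugins",
    use_target: "claude",
    hooks: {
      before_each: {
        command: ["cp", "-r", "workspaces/with-plugins/.claude", "."],
      },
    },
  },
];

const refs = normalizeTargetRefs(targets);
console.log(refs.map((r) => `${r.name} hooks=${r.hooks ? "yes" : "no"}`));
```

Normalizing early keeps the string shorthand for simple cases while giving the variant entry a place to carry its own setup step.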
Hook execution order
Target hooks nest inside workspace hooks (standard setup/teardown nesting):
1. Workspace `before_all` → Target `before_all`
2. Workspace `before_each` → Target `before_each` → test → Target `after_each` → Workspace `after_each`
3. Target `after_all` → Workspace `after_all`

Changes
- `packages/core/src/evaluation/types.ts`: new `TargetHooksConfig`, `EvalTargetRef` types
- `packages/core/src/evaluation/validation/eval-file.schema.ts`: `execution.targets` accepts `(string | EvalTargetRef)[]`
- `packages/core/src/evaluation/loaders/config-loader.ts`: new `extractTargetRefsFromSuite()`
- `packages/core/src/evaluation/orchestrator.ts`: 4 lifecycle hook execution points
- `apps/cli/src/commands/eval/targets.ts`: synthetic target injection, hooks threading
- `apps/cli/src/commands/eval/run-eval.ts`: pass `targetRefs` and `targetHooks` through

Test plan
- `bun run build`: passes
- `bun run typecheck`: passes
- `bun run lint`: passes
- `bun run test`: 2157 tests pass (1642 core + 67 eval + 448 cli)
- `bun run validate:examples`: 56/56 valid

🤖 Generated with Claude Code
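The nesting described under "Hook execution order" can be illustrated with a small stand-alone runner. This is a sketch, not the orchestrator's code: `runSuite`, `recordingHooks`, and the `Hooks` type are hypothetical names, it handles a single target only, and it does not model failure handling or how often `before_all` fires across several targets.

```typescript
type HookPoint = "before_all" | "before_each" | "after_each" | "after_all";
type Hooks = Partial<Record<HookPoint, () => void>>;

// Hypothetical runner: workspace hooks wrap target hooks, which wrap each test.
function runSuite(workspace: Hooks, target: Hooks, tests: Array<() => void>): void {
  workspace.before_all?.();
  target.before_all?.();
  for (const test of tests) {
    workspace.before_each?.();
    target.before_each?.();
    test();
    target.after_each?.(); // teardown runs in reverse order of setup
    workspace.after_each?.();
  }
  target.after_all?.();
  workspace.after_all?.();
}

// Build hooks that only record when they were called.
function recordingHooks(scope: string, log: string[]): Hooks {
  const points: HookPoint[] = ["before_all", "before_each", "after_each", "after_all"];
  const hooks: Hooks = {};
  for (const point of points) {
    hooks[point] = () => {
      log.push(`${scope} ${point}`);
    };
  }
  return hooks;
}

const log: string[] = [];
runSuite(recordingHooks("workspace", log), recordingHooks("target", log), [
  () => {
    log.push("test 1");
  },
  () => {
    log.push("test 2");
  },
]);
console.log(log.join("\n"));
```

Running it prints the workspace hook before the target hook on the way in and after it on the way out, which is the setup/teardown nesting the PR describes.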