
feat(targets): remove workspace_template, add target-level hooks #1095

Merged
christso merged 2 commits into main from feat/1094-target-hooks
Apr 14, 2026


@christso
Collaborator

Closes #1094

Summary

  • Remove workspace_template from target schema — unused field (zero references in any eval file). Removed from BASE_TARGET_SCHEMA, TargetDefinition, all provider resolved configs, orchestrator, validators, and docs.
  • Add target-level hooks — eval files can now define per-target setup/teardown hooks in execution.targets using object form, enabling single-file harness variant comparison (e.g., baseline vs with-plugins vs with-guidelines).

Target hooks example

```yaml
execution:
  targets:
    - baseline                          # string shorthand (no hooks)
    - name: with-skills                 # object form with hooks
      use_target: default
      hooks:
        before_each:
          command: ["setup-plugins.sh", "skills"]
```
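
A minimal TypeScript sketch of how the mixed execution.targets list could be normalized before execution. The interface shapes are modeled on the YAML example but are hypothetical, as is the helper name normalizeTargetEntries; the real TargetHooksConfig and EvalTargetRef in types.ts may differ. It also assumes that an entry without use_target runs the target of the same name.

```typescript
// Hypothetical shapes modeled on the YAML example; the real TargetHooksConfig
// and EvalTargetRef in packages/core/src/evaluation/types.ts may differ.
interface HookCommand {
  command: string[];
}

interface TargetHooksConfig {
  before_all?: HookCommand;
  before_each?: HookCommand;
  after_each?: HookCommand;
  after_all?: HookCommand;
}

interface EvalTargetRef {
  name: string;
  use_target?: string;
  hooks?: TargetHooksConfig;
}

// One execution.targets entry: a bare target name or the object form.
type TargetEntry = string | EvalTargetRef;

// Normalize the mixed list so later stages only see objects.
// Assumption: a string entry means "run this target with no hooks".
function normalizeTargetEntries(entries: TargetEntry[]): EvalTargetRef[] {
  return entries.map((entry) =>
    typeof entry === "string" ? { name: entry } : entry,
  );
}

// Assumption: an object without use_target runs the target of the same name.
function baseTargetOf(ref: EvalTargetRef): string {
  return ref.use_target ?? ref.name;
}

const refs = normalizeTargetEntries([
  "baseline",
  {
    name: "with-skills",
    use_target: "default",
    hooks: { before_each: { command: ["setup-plugins.sh", "skills"] } },
  },
]);

for (const ref of refs) {
  console.log(`${ref.name} runs ${baseTargetOf(ref)}, hooks: ${ref.hooks ? "yes" : "no"}`);
}
// baseline runs baseline, hooks: no
// with-skills runs default, hooks: yes
```

Normalizing early keeps the string shorthand a pure convenience: everything downstream handles a single object shape.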

Hook execution order

Target hooks nest inside workspace hooks (standard setup/teardown nesting):

  1. Workspace before_all → Target before_all
  2. Per test: Workspace before_each → Target before_each → test → Target after_each → Workspace after_each
  3. Target after_all → Workspace after_all
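
The nesting above can be sketched as a simplified, synchronous TypeScript function in which each hook is a plain callback instead of a shell command. runSuite and HookSet are illustrative names, not the orchestrator's API. Running teardown in finally blocks, so after_each and after_all still fire when a test throws, is a choice made for this sketch and is not confirmed by the PR.

```typescript
// Simplified, synchronous sketch of the nesting order described above.
// Real hooks run shell commands; here each hook is a plain callback.
type Hook = () => void;

interface HookSet {
  before_all?: Hook;
  before_each?: Hook;
  after_each?: Hook;
  after_all?: Hook;
}

function runSuite(workspace: HookSet, target: HookSet, tests: Hook[]): void {
  workspace.before_all?.();
  target.before_all?.();
  try {
    for (const test of tests) {
      workspace.before_each?.();
      target.before_each?.();
      try {
        test();
      } finally {
        // Teardown unwinds in reverse: the inner (target) hook runs first.
        target.after_each?.();
        workspace.after_each?.();
      }
    }
  } finally {
    target.after_all?.();
    workspace.after_all?.();
  }
}

// Record every call so the order can be inspected.
const log: string[] = [];
const recorder = (prefix: string): HookSet => ({
  before_all: () => log.push(`${prefix}.before_all`),
  before_each: () => log.push(`${prefix}.before_each`),
  after_each: () => log.push(`${prefix}.after_each`),
  after_all: () => log.push(`${prefix}.after_all`),
});

runSuite(recorder("workspace"), recorder("target"), [() => log.push("test-1")]);
console.log(log.join(" -> "));
```

With one test this logs workspace.before_all, target.before_all, workspace.before_each, target.before_each, test-1, target.after_each, workspace.after_each, target.after_all, workspace.after_all: the target hooks always sit strictly inside the workspace hooks.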

Changes

  • packages/core/src/evaluation/types.ts: new TargetHooksConfig, EvalTargetRef types
  • packages/core/src/evaluation/validation/eval-file.schema.ts: execution.targets accepts (string | EvalTargetRef)[]
  • packages/core/src/evaluation/loaders/config-loader.ts: new extractTargetRefsFromSuite()
  • packages/core/src/evaluation/orchestrator.ts: 4 lifecycle hook execution points
  • apps/cli/src/commands/eval/targets.ts: synthetic target injection, hooks threading
  • apps/cli/src/commands/eval/run-eval.ts: pass targetRefs and targetHooks through
  • Docs, tests, eval-schema.json updated
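
One plausible reading of "synthetic target injection", sketched in TypeScript under stated assumptions: each object-form entry clones the definition of its use_target under its own name and carries its hooks along, so results are reported per variant. injectSyntheticTargets, ResolvedTarget, and the claude-cli provider values in the demo are hypothetical; the actual logic in targets.ts is not shown here.

```typescript
// Hypothetical sketch of synthetic target injection; names are illustrative and
// the real logic in apps/cli/src/commands/eval/targets.ts may differ.
interface TargetDefinition {
  name: string;
  provider: string;
  [key: string]: unknown;
}

interface HookCommand {
  command: string[];
}

type HookPoint = "before_all" | "before_each" | "after_each" | "after_all";

interface EvalTargetRef {
  name: string;
  use_target?: string;
  hooks?: Partial<Record<HookPoint, HookCommand>>;
}

interface ResolvedTarget {
  definition: TargetDefinition;
  hooks?: EvalTargetRef["hooks"];
}

function injectSyntheticTargets(
  defined: Map<string, TargetDefinition>,
  refs: EvalTargetRef[],
): ResolvedTarget[] {
  return refs.map((ref) => {
    const baseName = ref.use_target ?? ref.name;
    const base = defined.get(baseName);
    if (!base) {
      throw new Error(`target "${ref.name}" uses unknown target "${baseName}"`);
    }
    // Clone the base definition under the variant's name so each variant is
    // reported separately; the base definition itself is left untouched.
    return { definition: { ...base, name: ref.name }, hooks: ref.hooks };
  });
}

// Demo: two definitions from targets.yaml and two execution.targets entries.
const defined = new Map<string, TargetDefinition>([
  ["baseline", { name: "baseline", provider: "claude-cli" }],
  ["default", { name: "default", provider: "claude-cli" }],
]);

const resolved = injectSyntheticTargets(defined, [
  { name: "baseline" },
  {
    name: "with-skills",
    use_target: "default",
    hooks: { before_each: { command: ["setup-plugins.sh", "skills"] } },
  },
]);

for (const target of resolved) {
  console.log(target.definition.name, target.definition.provider, target.hooks ? "hooks" : "no hooks");
}
```

Failing fast on an unknown use_target surfaces typos in the eval file before any agent runs.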

Test plan

  • bun run build — passes
  • bun run typecheck — passes
  • bun run lint — passes
  • bun run test — 2157 tests pass (1642 core + 67 eval + 448 cli)
  • bun run validate:examples — 56/56 valid
  • Manual UAT with real eval file using target hooks

🤖 Generated with Claude Code

Remove the unused `workspace_template` field from target schema and add
per-target hooks support in eval files. Target hooks let a single eval
file compare different harness configurations (e.g., baseline vs
with-plugins) by running setup/teardown scripts per target variant.

- Remove `workspace_template` from BASE_TARGET_SCHEMA, TargetDefinition,
  all provider resolved configs, orchestrator, and validators
- Add `TargetHooksConfig` and `EvalTargetRef` types
- Extend `execution.targets` in eval files to accept objects with hooks
- Execute target hooks at 4 lifecycle points: before_all, before_each,
  after_each, after_all (nested inside workspace hooks)
- Update docs and regenerate eval-schema.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Apr 14, 2026

Deploying agentv with Cloudflare Pages

Latest commit: db1a83a
Status: ✅ Deploy successful!
Preview URL: https://df322415.agentv.pages.dev
Branch Preview URL: https://feat-1094-target-hooks.agentv.pages.dev

- Accept string command shorthand in WorkspaceHookSchema (matches docs
  and parseHookConfig behavior)
- Restore deprecation warning when workspace_template appears in
  targets.yaml (downgraded from error to warning for migration help)
- Fix misleading "runs ONCE" comment on target before_all

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
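
The second commit downgrades workspace_template in targets.yaml from an error to a deprecation warning. A minimal TypeScript sketch of that error/warning split follows; validateTargetEntry, ValidationResult, and the warning text are hypothetical, since the actual validator is not shown in this PR.

```typescript
// Hypothetical sketch: the actual targets.yaml validator is not shown in this PR.
interface ValidationResult {
  errors: string[];
  warnings: string[];
}

function validateTargetEntry(raw: Record<string, unknown>): ValidationResult {
  const result: ValidationResult = { errors: [], warnings: [] };
  const name = typeof raw.name === "string" ? raw.name : "<unnamed>";

  if (typeof raw.name !== "string") {
    result.errors.push("target is missing a string 'name'");
  }

  if ("workspace_template" in raw) {
    // A warning rather than an error, so older targets.yaml files still load
    // while pointing users at the replacement.
    result.warnings.push(
      `target "${name}": workspace_template is no longer supported; ` +
        "use target-level hooks in the eval file's execution.targets instead",
    );
  }

  return result;
}

const legacy = validateTargetEntry({
  name: "claude-superpowers",
  provider: "claude-cli",
  workspace_template: "./workspaces/superpowers",
});
console.log(legacy.errors.length, legacy.warnings.length); // 0 1
```

Keeping errors and warnings in separate lists lets a caller fail only on errors while still printing migration hints.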
christso marked this pull request as ready for review April 14, 2026 04:31
christso merged commit b3760ce into main Apr 14, 2026
4 checks passed
christso deleted the feat/1094-target-hooks branch April 14, 2026 04:31
christso added a commit that referenced this pull request Apr 14, 2026
Replace the removed workspace_template field with the target-level hooks
pattern from #1095. A single base 'claude' target is defined in
targets.yaml, and the eval file's execution.targets uses before_each
hooks to copy variant-specific plugin configs into the workspace.

Also fixes:
- Use 'id' instead of deprecated 'case' in test definitions
- Use full commit hash with resolve: local for base_commit
- Remove shallow clone (depth: 1) that prevented commit checkout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
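
The showcase pattern copies a variant-specific plugin config into the workspace from a before_each hook. A minimal TypeScript sketch of such a helper, assuming the workspaces/<variant>/.claude/settings.json layout from the showcase commit and assuming the workspace directory is passed in explicitly. copyVariantSettings is a hypothetical name, and how the hook actually learns the workspace path (argument or working directory) is not specified in the PR.

```typescript
import { copyFileSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical before_each helper. The workspaces/<variant>/.claude/settings.json
// layout follows the showcase commit; passing the workspace directory as an
// argument (rather than relying on the hook's working directory) is an assumption.
function copyVariantSettings(variantsRoot: string, variant: string, workspaceDir: string): string {
  const source = join(variantsRoot, variant, ".claude", "settings.json");
  const destinationDir = join(workspaceDir, ".claude");
  mkdirSync(destinationDir, { recursive: true });
  const destination = join(destinationDir, "settings.json");
  // Overwrites any settings left behind by a previous variant in the same slot.
  copyFileSync(source, destination);
  return destination;
}

// Demo against throwaway directories.
const variantsRoot = mkdtempSync(join(tmpdir(), "variants-"));
mkdirSync(join(variantsRoot, "superpowers", ".claude"), { recursive: true });
writeFileSync(
  join(variantsRoot, "superpowers", ".claude", "settings.json"),
  JSON.stringify({ permissions: { defaultMode: "bypassPermissions" } }),
);
const workspaceDir = mkdtempSync(join(tmpdir(), "workspace-"));
const copied = copyVariantSettings(variantsRoot, "superpowers", workspaceDir);
console.log(readFileSync(copied, "utf8"));
// {"permissions":{"defaultMode":"bypassPermissions"}}
```

Because copyFileSync overwrites by default, reusing a pool slot for a different variant replaces the previous settings file rather than merging with it.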
christso added a commit that referenced this pull request Apr 14, 2026
…luation (#1091)

* docs(showcase): add bug-fix-benchmark example for SWE-bench style evaluation

Add a showcase example demonstrating how to evaluate coding agents on
real-world bug fixes using public GitHub repositories with Docker workspace
isolation and commit-pinned repos.

Includes:
- EVAL.yaml with example test cases (null-check, fallback, property-access bugs)
- targets.yaml showing all auth options (subscription, API key, mock)
- mock-agent.sh for testing without API keys
- import-swebench.sh for importing SWE-bench dataset instances

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(showcase): add multi-plugin benchmark with baseline comparison

Add workspace templates for comparing agent performance with and without
engineering plugins: superpowers, compound-engineering, agent-skills.

- Add workspaces/ with per-plugin .claude/settings.json configs
- Update targets.yaml with claude-baseline, claude-superpowers,
  claude-compound, claude-agent-skills targets
- Replace hypothetical test cases with real issue #912 bug fix task
- Add scripts/setup-plugins.sh for plugin installation
- Update README with comparison workflow and plugin details

Closes #919

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(showcase): use bypassPermissions in all workspace settings

Use defaultMode: bypassPermissions instead of listing individual
Bash allow rules, matching how the agentv dev environment is configured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(showcase): use target-level hooks instead of workspace_template

Replace the removed workspace_template field with the target-level hooks
pattern from #1095. A single base 'claude' target is defined in
targets.yaml, and the eval file's execution.targets uses before_each
hooks to copy variant-specific plugin configs into the workspace.

Also fixes:
- Use 'id' instead of deprecated 'case' in test definitions
- Use full commit hash with resolve: local for base_commit
- Remove shallow clone (depth: 1) that prevented commit checkout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(showcase): use claude-cli provider with configurable executable

Switch from provider: claude to provider: claude-cli with an executable
field that reads from CLAUDE_EXECUTABLE env var (defaults to "claude").
This allows using custom CLI binaries like claude-zai.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(showcase): remove unused import-swebench.sh script

The script was speculative and non-functional (used deprecated fields,
hardcoded docker config, broken template variables). Not needed for the
benchmark showcase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(showcase): remove local targets.yaml, use inline rubric assertions

- Remove local .agentv/targets.yaml — use repo root targets instead
  (targets don't merge, closest shadows; local one forced duplicating
  grader targets unnecessarily)
- Replace llm-grader assertion with inline rubric strings (auto-unwrapped
  to rubrics evaluator)
- Remove unused scripts: mock-agent.sh (broken with workspace repos),
  setup-plugins.sh (orphaned, settings.json already checked in)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(showcase): use AGENT_TARGET, reset workspace, drop fragile contains assertion

- Replace hardcoded use_target: claude with ${{ AGENT_TARGET }} so the
  benchmark works with any provider via env var
- Add workspace.hooks.before_each.reset: fast for proper isolation
  between pool slot reuse across plugin variants
- Remove contains: effectiveCwd assertion (checks response text, not
  the diff); rubrics already validate the fix via file_changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

feat: remove workspace_template from targets, add target-level hooks for harness variant testing