feat(core): expose {{ tool_calls }} template variable for LLM graders#1123

Merged
christso merged 7 commits into main from feat/1121-tool-calls-template-var
Apr 16, 2026
@christso christso commented Apr 16, 2026

Summary

  • Adds {{ tool_calls }} template variable so LLM graders can see tool call details in their evaluation prompts
  • Previously LLM graders were blind to tool calls — {{ output }} only contains plain text
  • Formats each tool call as a compact line: tool name + key input fields (skill, file_path, command, pattern)
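For reviewers who want a concrete picture of the "compact line" format, here is a minimal sketch of how such a formatter could work. It is not the code in format-tool-calls.ts: the `ToolCall` shape, the `KEY_FIELD` table, the function name `formatToolCallsSketch`, and the `Name: value` line layout are all assumptions made for illustration. Only the tool-to-field mapping comes from this PR.

```typescript
// Hypothetical shape of a recorded tool call; the real type in AgentV may differ.
interface ToolCall {
  tool: string;
  input?: Record<string, unknown>;
}

// Which input field to surface for each tool, following the mapping in this PR.
const KEY_FIELD: Record<string, string> = {
  Skill: "skill",
  Read: "file_path",
  Write: "file_path",
  Edit: "file_path",
  Bash: "command",
  Grep: "pattern",
  Glob: "pattern",
};

// Render one line per tool call: "Name: value" when a key field is present,
// otherwise just the tool name. The exact layout is an assumption.
function formatToolCallsSketch(calls: ToolCall[]): string {
  return calls
    .map((call) => {
      const field = KEY_FIELD[call.tool];
      const value = field ? call.input?.[field] : undefined;
      return typeof value === "string" ? `${call.tool}: ${value}` : call.tool;
    })
    .join("\n");
}

const example = formatToolCallsSketch([
  { tool: "Skill", input: { skill: "acme-deploy" } },
  { tool: "Read", input: { file_path: "config.yaml" } },
  { tool: "Bash", input: { command: "npm test" } },
  { tool: "WebSearch", input: { query: "rollback" } }, // no key field known
]);
console.log(example);
```

In this sketch, tools without a known key field fall back to the bare tool name, so the grader still sees that the call happened.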

Changes

| File | Type |
|---|---|
| packages/core/src/evaluation/graders/format-tool-calls.ts | NEW: formatToolCalls() utility |
| packages/core/src/evaluation/graders/types.ts | Add toolCalls to EvaluationContext |
| packages/core/src/evaluation/template-variables.ts | Add TOOL_CALLS constant |
| packages/core/src/evaluation/orchestrator.ts | Thread toolCalls through pipeline (~15 sites) |
| packages/core/src/evaluation/graders/llm-grader.ts | Wire into all prompt builders (~8 sites) |
| packages/core/src/evaluation/graders/llm-grader-prompt.ts | Wire into prompt assembly functions |
| packages/core/src/evaluation/graders/index.ts | Export formatToolCalls |
| packages/core/test/evaluation/graders/format-tool-calls.test.ts | NEW: 12 unit tests |
| apps/web/src/content/docs/docs/graders/llm-graders.mdx | Add tool_calls to template vars docs |
| apps/web/src/content/docs/docs/evaluation/rubrics.mdx | Add "Context Available to Rubric Graders" section |
| examples/features/tool-calls-template/ | NEW: e2e example with flat rubric assertions |

Example usage

assertions:
  - The agent invoked the acme-deploy skill
  - The agent used Read to inspect the config file before editing

Test plan

  • All 2198 existing tests pass (1659 core + 67 eval + 472 cli)
  • 12 new formatToolCalls tests pass
  • TypeScript type check passes
  • Biome lint passes
  • Pre-push hooks pass (build, typecheck, lint, test, validate)
  • Manual UAT: e2e example with copilot — 3/3 tests pass at 100% using flat rubric assertions with {{ tool_calls }} context

Closes #1121

Add a new `{{ tool_calls }}` template variable that provides LLM graders
with a formatted summary of tool calls from agent execution. Previously,
LLM graders were blind to tool call details — only `{{ output }}` was
available (plain text).

The new variable formats each tool call as a compact line with the tool
name and key input fields (skill name for Skill, file_path for
Read/Write/Edit, command for Bash, pattern for Grep/Glob).

Changes:
- New `formatToolCalls()` utility in format-tool-calls.ts
- Add `toolCalls` field to EvaluationContext interface
- Add TOOL_CALLS to TEMPLATE_VARIABLES constants
- Thread toolCalls through orchestrator pipeline (~15 sites)
- Wire into all LLM grader prompt builders (~8 sites)
- Auto-append `[[ ## tool_calls ## ]]` section in default templates
- 12 new unit tests for formatToolCalls
- Update docs site and skill references
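
The two ways the variable reaches a prompt (explicit `{{ tool_calls }}` in a custom template, and the auto-appended `[[ ## tool_calls ## ]]` section in default templates) can be sketched roughly as follows. The helper names `renderTemplate` and `appendToolCallsSection` are hypothetical, and skipping the section when there are no tool calls is an assumption, not confirmed behaviour of the grader.

```typescript
// Replace {{ name }} placeholders; unknown names are left untouched.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
    return value ?? match;
  });
}

// Assumed behaviour: append the section only when there is something to show.
function appendToolCallsSection(prompt: string, toolCalls: string): string {
  if (toolCalls.trim() === "") return prompt;
  return `${prompt}\n\n[[ ## tool_calls ## ]]\n${toolCalls}`;
}

const vars = { output: "Deployed to staging.", tool_calls: "Skill: acme-deploy" };

// Custom template: the author places the variable explicitly.
const custom = renderTemplate("Answer: {{ output }}\nTools: {{tool_calls}}", vars);

// Default template: the section is appended after the existing prompt text.
const withSection = appendToolCallsSection("Grade the answer.", vars.tool_calls);

console.log(custom);
console.log(withSection);
```

Using a replacer function rather than a replacement string keeps any `$` characters in tool call text (for example in Bash commands) from being interpreted as replacement patterns.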

Closes #1121

cloudflare-workers-and-pages bot commented Apr 16, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 8a201c3
Status: ✅  Deploy successful!
Preview URL: https://b6a691ff.agentv.pages.dev
Branch Preview URL: https://feat-1121-tool-calls-templat.agentv.pages.dev

christso and others added 2 commits April 16, 2026 03:45
…variable

Demonstrates using {{ tool_calls }} in LLM grader prompts to verify
skill invocation — an alternative to the deterministic skill-trigger
grader when LLM reasoning is needed.

Includes:
- Mock CLI agent returning Skill/Read/Edit/Bash tool calls
- LLM grader prompts using {{ tool_calls }} for positive/negative cases
- 3 test cases: deploy skill, review-pr skill, no-skill bugfix
…ol-calls example

Move mock_agent and openrouter_grader targets to root .agentv/targets.yaml
instead of a per-example targets file. Fix prompt references to use
file:// prefix so they're resolved as file paths rather than inline text.
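
The distinction this commit relies on (a `file://` reference is read from disk, anything else is treated as inline prompt text) can be illustrated with a small sketch. The function name `resolvePromptSketch` and the choice to resolve the path relative to a base directory are assumptions. The actual resolution logic in AgentV is not shown in this PR.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, resolve } from "node:path";

const FILE_PREFIX = "file://";

// Hypothetical resolver: file:// references are read from disk (relative to
// baseDir); any other string is returned unchanged as inline prompt text.
function resolvePromptSketch(ref: string, baseDir: string): string {
  if (!ref.startsWith(FILE_PREFIX)) return ref;
  const relativePath = ref.slice(FILE_PREFIX.length);
  return readFileSync(resolve(baseDir, relativePath), "utf8");
}

// Demo in a throwaway directory so the sketch is self-contained.
const dir = mkdtempSync(join(tmpdir(), "prompt-sketch-"));
writeFileSync(join(dir, "grader.md"), "Check {{ tool_calls }} for the deploy skill.");

const fromFile = resolvePromptSketch("file://grader.md", dir);
const inline = resolvePromptSketch("grader.md", dir);

console.log(fromFile); // contents of grader.md
console.log(inline); // the literal string, treated as prompt text
```

Without the prefix, a value such as `grader.md` would be sent to the grader as the prompt itself, which is the failure mode the commit fixes.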

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@christso christso marked this pull request as ready for review April 16, 2026 04:02
christso and others added 4 commits April 16, 2026 04:31
- Remove openrouter_grader target, use shared grader (via GRADER_TARGET)
- Rename dataset.eval.yaml to eval.yaml
- Verified with both mock_agent (3/3 pass) and copilot (tool_calls populated)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace mock CLI agent with real copilot-compatible workspace template
containing acme-deploy skill in all provider directories. Verified 3/3
pass with copilot target (skill triggered, rollback triggered, no skill
for unrelated). Remove mock_agent target from root targets.yaml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ssertions

- Keep only .agents/skills/acme-deploy/SKILL.md as single source of truth
- Add before_all hook to copy skills to .claude/skills/ in workspace
- Switch from llm-grader with custom prompts to rubric assertions
- Remove prompts/ directory and mock-agent.ts
- Remove mock_agent target from root targets.yaml
- Verified 3/3 pass with copilot at 100%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e assertions

Add "Context Available to Rubric Graders" section to rubrics.mdx
documenting that rubric assertions receive tool_calls and file_changes
context. Flatten example eval assertions from `type: rubrics` with
`criteria:` to plain string shorthand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@christso christso merged commit 4038218 into main Apr 16, 2026
4 checks passed
@christso christso deleted the feat/1121-tool-calls-template-var branch April 16, 2026 05:30