feat(core): expose {{ tool_calls }} template variable for LLM graders by christso · Pull Request #1123 · EntityProcess/agentv

christso · 2026-04-16T03:39:12Z

Summary

- Adds a `{{ tool_calls }}` template variable so LLM graders can see tool call details in their evaluation prompts
- Previously, LLM graders were blind to tool calls: `{{ output }}` contains only the plain-text response
- Formats each tool call as a compact line: the tool name plus key input fields (`skill`, `file_path`, `command`, `pattern`)
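The summary describes one compact line per tool call, made of the tool name and a few key input fields. The sketch below shows that formatting in TypeScript. It assumes a hypothetical `ToolCall` shape (`tool` plus an `input` record), a `name field="value"` line layout, and numbered lines. All three are illustrative assumptions, and the real `formatToolCalls()` in `format-tool-calls.ts` may use different types and output:

```typescript
// Hypothetical shape of a recorded tool call; the real agentv type may differ.
interface ToolCall {
  tool: string;
  input?: Record<string, unknown>;
}

// Input fields the PR names as worth surfacing to a grader.
const KEY_FIELDS = ["skill", "file_path", "command", "pattern"] as const;

// Render one call as a single line: the tool name followed by any key fields
// present in its input, e.g. `Read file_path="src/app.ts"`. Other fields are dropped.
function formatToolCall(call: ToolCall): string {
  const parts: string[] = [call.tool];
  for (const field of KEY_FIELDS) {
    const value = call.input?.[field];
    if (value !== undefined && value !== null) {
      parts.push(`${field}=${JSON.stringify(value)}`);
    }
  }
  return parts.join(" ");
}

// One numbered line per call (numbering is an assumption) so a grader can
// reason about ordering, e.g. "Read before Edit". No calls yields "".
function formatToolCalls(calls: readonly ToolCall[]): string {
  return calls.map((call, i) => `${i + 1}. ${formatToolCall(call)}`).join("\n");
}

console.log(
  formatToolCalls([
    { tool: "Skill", input: { skill: "acme-deploy" } },
    { tool: "Read", input: { file_path: "deploy.yaml" } },
    { tool: "Bash", input: { command: "npm test", timeout: 60000 } },
  ]),
);
```

For the three calls above, this prints `1. Skill skill="acme-deploy"`, `2. Read file_path="deploy.yaml"`, and `3. Bash command="npm test"`. The `timeout` field is omitted because it is not one of the key fields.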

Changes

| File | Change |
| --- | --- |
| `packages/core/src/evaluation/graders/format-tool-calls.ts` | NEW: `formatToolCalls()` utility |
| `packages/core/src/evaluation/graders/types.ts` | Add `toolCalls` to `EvaluationContext` |
| `packages/core/src/evaluation/template-variables.ts` | Add `TOOL_CALLS` constant |
| `packages/core/src/evaluation/orchestrator.ts` | Thread `toolCalls` through pipeline (~15 sites) |
| `packages/core/src/evaluation/graders/llm-grader.ts` | Wire into all prompt builders (~8 sites) |
| `packages/core/src/evaluation/graders/llm-grader-prompt.ts` | Wire into prompt assembly functions |
| `packages/core/src/evaluation/graders/index.ts` | Export `formatToolCalls` |
| `packages/core/test/evaluation/graders/format-tool-calls.test.ts` | NEW: 12 unit tests |
| `apps/web/src/content/docs/docs/graders/llm-graders.mdx` | Add `tool_calls` to template vars docs |
| `apps/web/src/content/docs/docs/evaluation/rubrics.mdx` | Add "Context Available to Rubric Graders" section |
| `examples/features/tool-calls-template/` | NEW: e2e example with flat rubric assertions |
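The changes add a `toolCalls` field to `EvaluationContext` and a `TOOL_CALLS` constant. The sketch below shows one way such a variable could be substituted into a grader prompt. The `renderGraderPrompt` helper, the regex-based `{{ name }}` matching, and storing `toolCalls` as pre-formatted text are all assumptions for illustration, not agentv's actual code:

```typescript
// Hypothetical mirror of TEMPLATE_VARIABLES; only two entries are shown.
const TEMPLATE_VARIABLES = {
  OUTPUT: "output",
  TOOL_CALLS: "tool_calls",
} as const;

// Hypothetical slice of EvaluationContext. toolCalls is optional and assumed
// to hold already-formatted text (one line per call).
interface EvaluationContext {
  output: string;
  toolCalls?: string;
}

// Substitute `{{ name }}` placeholders, tolerating whitespace inside the braces.
// Known variables without a value become ""; unknown placeholders are left as-is.
function renderGraderPrompt(template: string, ctx: EvaluationContext): string {
  const values: Record<string, string> = {
    [TEMPLATE_VARIABLES.OUTPUT]: ctx.output,
    [TEMPLATE_VARIABLES.TOOL_CALLS]: ctx.toolCalls ?? "",
  };
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match,
  );
}

const prompt = renderGraderPrompt(
  "Response:\n{{ output }}\n\nTool calls:\n{{tool_calls}}",
  { output: "Deployed v2.", toolCalls: '1. Skill skill="acme-deploy"' },
);
console.log(prompt);
```

Using a replacer function rather than a replacement string keeps `$` characters in agent output from being interpreted as special replacement patterns.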

Example usage

```yaml
assertions:
  - The agent invoked the acme-deploy skill
  - The agent used Read to inspect the config file before editing
```

Test plan

- All 2198 existing tests pass (1659 core + 67 eval + 472 cli)
- 12 new formatToolCalls tests pass
- TypeScript type check passes
- Biome lint passes
- Pre-push hooks pass (build, typecheck, lint, test, validate)
- Manual UAT: e2e example with copilot, 3/3 tests pass at 100% using flat rubric assertions with {{ tool_calls }} context

Closes #1121

Add a new `{{ tool_calls }}` template variable that provides LLM graders with a formatted summary of the tool calls made during agent execution. Previously, LLM graders were blind to tool call details: only `{{ output }}` (plain text) was available. The new variable formats each tool call as a compact line with the tool name and key input fields (`skill` for Skill, `file_path` for Read/Write/Edit, `command` for Bash, `pattern` for Grep/Glob).

Changes:
- New `formatToolCalls()` utility in format-tool-calls.ts
- Add `toolCalls` field to the EvaluationContext interface
- Add TOOL_CALLS to the TEMPLATE_VARIABLES constants
- Thread toolCalls through the orchestrator pipeline (~15 sites)
- Wire into all LLM grader prompt builders (~8 sites)
- Auto-append a `[[ ## tool_calls ## ]]` section in default templates
- 12 new unit tests for formatToolCalls
- Update docs site and skill references

Closes #1121
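The commit message mentions auto-appending a `[[ ## tool_calls ## ]]` section in default templates. The sketch below shows that idea. It assumes a hypothetical `buildDefaultGraderPrompt` helper, hypothetical `criteria`/`output` section names, and a rule (also assumed) that the section is added only when at least one tool call was recorded:

```typescript
// Render one section in the `[[ ## name ## ]]` header style.
function section(name: string, body: string): string {
  return `[[ ## ${name} ## ]]\n${body}`;
}

// Hypothetical inputs to a default grader template.
interface DefaultPromptInput {
  criteria: string;
  output: string;
  toolCalls?: string; // formatted tool calls, possibly empty
}

// Build a default grader prompt. The tool_calls section goes last and is
// skipped entirely when there is nothing to show.
function buildDefaultGraderPrompt(input: DefaultPromptInput): string {
  const parts = [section("criteria", input.criteria), section("output", input.output)];
  if (input.toolCalls !== undefined && input.toolCalls.trim() !== "") {
    parts.push(section("tool_calls", input.toolCalls));
  }
  return parts.join("\n\n");
}

console.log(
  buildDefaultGraderPrompt({
    criteria: "The agent invoked the acme-deploy skill",
    output: "Deployment started.",
    toolCalls: '1. Skill skill="acme-deploy"',
  }),
);
```

Skipping an empty section leaves prompts for tool-free runs unchanged. An explicit "no tool calls" line may be preferable when a rubric asserts that no skill was invoked, as in the no-skill bugfix case.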

cloudflare-workers-and-pages · 2026-04-16T03:39:46Z

Deploying agentv with Cloudflare Pages

Latest commit:	`8a201c3`
Status:	✅ Deploy successful!
Preview URL:	https://b6a691ff.agentv.pages.dev
Branch Preview URL:	https://feat-1121-tool-calls-templat.agentv.pages.dev


…variable

Demonstrates using {{ tool_calls }} in LLM grader prompts to verify skill invocation, as an alternative to the deterministic skill-trigger grader when LLM reasoning is needed. Includes:
- Mock CLI agent returning Skill/Read/Edit/Bash tool calls
- LLM grader prompts using {{ tool_calls }} for positive/negative cases
- 3 test cases: deploy skill, review-pr skill, no-skill bugfix

…ol-calls example

Move mock_agent and openrouter_grader targets to root .agentv/targets.yaml instead of a per-example targets file. Fix prompt references to use the file:// prefix so they are resolved as file paths rather than inline text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove openrouter_grader target, use shared grader (via GRADER_TARGET)
- Rename dataset.eval.yaml to eval.yaml
- Verified with both mock_agent (3/3 pass) and copilot (tool_calls populated)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace mock CLI agent with real copilot-compatible workspace template containing acme-deploy skill in all provider directories. Verified 3/3 pass with copilot target (skill triggered, rollback triggered, no skill for unrelated). Remove mock_agent target from root targets.yaml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ssertions

- Keep only .agents/skills/acme-deploy/SKILL.md as single source of truth
- Add before_all hook to copy skills to .claude/skills/ in workspace
- Switch from llm-grader with custom prompts to rubric assertions
- Remove prompts/ directory and mock-agent.ts
- Remove mock_agent target from root targets.yaml
- Verified 3/3 pass with copilot at 100%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
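The before_all hook copies skills from the single source of truth, `.agents/skills/`, into `.claude/skills/` in the workspace. The Node/TypeScript sketch below shows that copy step. It assumes a hypothetical `syncSkills` helper that receives the workspace path. How agentv actually runs before_all hooks and passes the path is not covered here. The demo uses a throwaway temp directory so no real workspace is touched:

```typescript
import { cpSync, existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helper: mirror .agents/skills into .claude/skills.
// Returns false (and creates nothing) when there are no skills to copy.
function syncSkills(workspaceDir: string): boolean {
  const src = join(workspaceDir, ".agents", "skills");
  const dest = join(workspaceDir, ".claude", "skills");
  if (!existsSync(src)) return false;
  mkdirSync(dirname(dest), { recursive: true }); // ensure .claude/ exists
  cpSync(src, dest, { recursive: true }); // copy every skill directory
  return true;
}

// Demo: build a temp workspace containing one skill, then sync it.
const workspace = mkdtempSync(join(tmpdir(), "skills-demo-"));
const skillDir = join(workspace, ".agents", "skills", "acme-deploy");
mkdirSync(skillDir, { recursive: true });
writeFileSync(join(skillDir, "SKILL.md"), "# acme-deploy\n");
syncSkills(workspace);
console.log(readFileSync(join(workspace, ".claude", "skills", "acme-deploy", "SKILL.md"), "utf8"));
```

Copying at hook time, rather than committing duplicate skill directories, keeps `.agents/skills/` as the only file that needs editing.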

…e assertions

Add "Context Available to Rubric Graders" section to rubrics.mdx documenting that rubric assertions receive tool_calls and file_changes context. Flatten example eval assertions from `type: rubrics` with `criteria:` to plain string shorthand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
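The eval assertions were flattened from `type: rubrics` with `criteria:` to the plain-string shorthand. If both forms end up as the same list of criteria for the rubric grader, a normalizing step could look like the sketch below. The `Assertion` types and the `collectCriteria` helper are hypothetical, and `criteria` entries are assumed to be plain strings:

```typescript
// Hypothetical longer form: `type: rubrics` with a list of criteria.
interface RubricAssertion {
  type: "rubrics";
  criteria: string[];
}

// An assertion is either the plain-string shorthand or the longer form.
type Assertion = string | RubricAssertion;

// Flatten both forms into one ordered list of criteria.
function collectCriteria(assertions: readonly Assertion[]): string[] {
  return assertions.flatMap((a) => (typeof a === "string" ? [a] : a.criteria));
}

const criteria = collectCriteria([
  "The agent invoked the acme-deploy skill",
  { type: "rubrics", criteria: ["The agent used Read to inspect the config file before editing"] },
]);
console.log(criteria);
```

Under this assumption, the shorthand is purely a convenience: both spellings hand the grader the same criteria in the same order.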

christso and others added 2 commits April 16, 2026 03:45

christso marked this pull request as ready for review April 16, 2026 04:02

christso and others added 4 commits April 16, 2026 04:31

christso merged commit 4038218 into main Apr 16, 2026
4 checks passed

christso deleted the feat/1121-tool-calls-template-var branch April 16, 2026 05:30

christso merged 7 commits into main from feat/1121-tool-calls-template-var
