feat(core): expose {{ tool_calls }} template variable for LLM graders #1123
Merged
Add a new `{{ tool_calls }}` template variable that gives LLM graders
a formatted summary of the tool calls made during agent execution.
Previously, LLM graders could not see tool call details; only `{{ output }}`
(plain text) was available.
The new variable formats each tool call as a compact line with the tool
name and key input fields (skill name for Skill, file_path for
Read/Write/Edit, command for Bash, pattern for Grep/Glob).
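A minimal sketch of the formatting rule described above. The `ToolCall` shape, the `skill` input key for the `Skill` tool, the `- Tool: field=value` line format, and the name `formatToolCallsSketch` are all assumptions for illustration; the real `formatToolCalls()` in `format-tool-calls.ts` may differ.

```typescript
// Hypothetical shape of a recorded tool call; agentv's real type may differ.
interface ToolCall {
  tool: string;
  input?: Record<string, unknown>;
}

// Key input field surfaced for each tool, per the PR description.
// The "skill" key for the Skill tool is an assumption.
const KEY_FIELDS = new Map<string, string>([
  ["Skill", "skill"],
  ["Read", "file_path"],
  ["Write", "file_path"],
  ["Edit", "file_path"],
  ["Bash", "command"],
  ["Grep", "pattern"],
  ["Glob", "pattern"],
]);

// Render one compact line per call. Tools without a known key field,
// or calls where that field is missing, fall back to the bare tool name.
function formatToolCallsSketch(calls: readonly ToolCall[]): string {
  return calls
    .map((call) => {
      const field = KEY_FIELDS.get(call.tool);
      const value = field === undefined ? undefined : call.input?.[field];
      return field === undefined || value === undefined
        ? `- ${call.tool}`
        : `- ${call.tool}: ${field}=${String(value)}`;
    })
    .join("\n");
}

// Prints:
// - Skill: skill=acme-deploy
// - Read: file_path=src/app.ts
// - Bash: command=npm test
console.log(
  formatToolCallsSketch([
    { tool: "Skill", input: { skill: "acme-deploy" } },
    { tool: "Read", input: { file_path: "src/app.ts" } },
    { tool: "Bash", input: { command: "npm test" } },
  ]),
);
```

Using a `Map` instead of a plain object keeps tool names such as `toString` from accidentally matching inherited object properties.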
Changes:
- New `formatToolCalls()` utility in format-tool-calls.ts
- Add `toolCalls` field to EvaluationContext interface
- Add TOOL_CALLS to TEMPLATE_VARIABLES constants
- Thread toolCalls through orchestrator pipeline (~15 sites)
- Wire into all LLM grader prompt builders (~8 sites)
- Auto-append `[[ ## tool_calls ## ]]` section in default templates
- 12 new unit tests for formatToolCalls
- Update docs site and skill references
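The default-template change (auto-appending a `[[ ## tool_calls ## ]]` section) can be pictured with the sketch below. `EvaluationContextSketch`, `buildDefaultPromptSketch`, the `[[ ## output ## ]]` section, the prompt wording, and the rule of skipping the section when there are no tool calls are all assumptions, not agentv's actual prompt builder.

```typescript
// Hypothetical, trimmed-down context; the real EvaluationContext has more fields.
interface EvaluationContextSketch {
  output: string;
  // Assumed to hold text already produced by formatToolCalls();
  // absent when the provider reported no tool calls.
  toolCalls?: string;
}

// Build a default grader prompt and append a [[ ## tool_calls ## ]] section.
// Omitting the section when toolCalls is empty is an assumption of this sketch.
function buildDefaultPromptSketch(ctx: EvaluationContextSketch): string {
  let prompt = `Grade the agent's answer.\n\n[[ ## output ## ]]\n${ctx.output}`;
  const calls = ctx.toolCalls?.trim();
  if (calls) {
    prompt += `\n\n[[ ## tool_calls ## ]]\n${calls}`;
  }
  return prompt;
}

console.log(
  buildDefaultPromptSketch({
    output: "Deployed v2 to staging.",
    toolCalls: "- Skill: skill=acme-deploy",
  }),
);
```

Appending the section automatically means existing default templates pick up tool call context without every user editing their prompts.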
Closes #1121
Deploying agentv with

| Latest commit: | 8a201c3 |
|:---|:---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://b6a691ff.agentv.pages.dev |
| Branch Preview URL: | https://feat-1121-tool-calls-templat.agentv.pages.dev |
…variable
Demonstrates using {{ tool_calls }} in LLM grader prompts to verify
skill invocation, as an alternative to the deterministic skill-trigger
grader when LLM reasoning is needed.
Includes:
- Mock CLI agent returning Skill/Read/Edit/Bash tool calls
- LLM grader prompts using {{ tool_calls }} for positive/negative cases
- 3 test cases: deploy skill, review-pr skill, no-skill bugfix
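To show how a custom grader prompt for the deploy-skill case might consume the variable, here is a minimal `{{ name }}` substitution sketch. `renderPrompt` and the prompt wording are hypothetical; agentv's real template engine and the example's actual prompts may differ.

```typescript
// Replace {{ name }} placeholders (spaces inside the braces are optional).
// Names missing from `vars` are left as-is so typos stay visible.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    const known = Object.prototype.hasOwnProperty.call(vars, name);
    return known ? (vars[name] ?? match) : match;
  });
}

// Hypothetical positive-case prompt: the agent should have used the deploy skill.
const deployPrompt = [
  "The user asked the agent to deploy the service.",
  "Tool calls made by the agent:",
  "{{ tool_calls }}",
  "",
  "Final answer:",
  "{{ output }}",
  "",
  "Pass only if the tool calls include the acme-deploy skill.",
].join("\n");

console.log(
  renderPrompt(deployPrompt, {
    tool_calls: "- Skill: skill=acme-deploy\n- Bash: command=./deploy.sh",
    output: "Deployment started.",
  }),
);
```

A negative case (the no-skill bugfix) would use the same mechanism with the pass condition inverted, e.g. "Pass only if no Skill call appears".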
…ol-calls example

Move mock_agent and openrouter_grader targets to root `.agentv/targets.yaml` instead of a per-example targets file. Fix prompt references to use `file://` prefix so they're resolved as file paths rather than inline text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove openrouter_grader target, use shared grader (via GRADER_TARGET)
- Rename dataset.eval.yaml to eval.yaml
- Verified with both mock_agent (3/3 pass) and copilot (tool_calls populated)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace mock CLI agent with real copilot-compatible workspace template containing acme-deploy skill in all provider directories. Verified 3/3 pass with copilot target (skill triggered, rollback triggered, no skill for unrelated). Remove mock_agent target from root targets.yaml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ssertions

- Keep only .agents/skills/acme-deploy/SKILL.md as single source of truth
- Add before_all hook to copy skills to .claude/skills/ in workspace
- Switch from llm-grader with custom prompts to rubric assertions
- Remove prompts/ directory and mock-agent.ts
- Remove mock_agent target from root targets.yaml
- Verified 3/3 pass with copilot at 100%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
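The `before_all` hook's job (mirroring `.agents/skills/` into `.claude/skills/`) could be done by a small Node script like the one below. This is a hypothetical equivalent: the commit does not show how the hook is implemented, and `copySkills` is an illustrative name.

```typescript
import { cpSync, existsSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Mirror <workspace>/.agents/skills (the single source of truth)
// into <workspace>/.claude/skills.
function copySkills(workspace: string): void {
  const source = join(workspace, ".agents", "skills");
  const target = join(workspace, ".claude", "skills");
  if (!existsSync(source)) {
    throw new Error(`No skills directory at ${source}`);
  }
  // Ensure .claude/ exists, then copy the whole tree (e.g. acme-deploy/SKILL.md).
  mkdirSync(join(workspace, ".claude"), { recursive: true });
  cpSync(source, target, { recursive: true });
}
```

A hook script would call `copySkills(workspaceDir)` once before the test cases run; copying at run time keeps the repository free of duplicated `SKILL.md` files.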
…e assertions

Add "Context Available to Rubric Graders" section to rubrics.mdx documenting that rubric assertions receive tool_calls and file_changes context. Flatten example eval assertions from `type: rubrics` with `criteria:` to plain string shorthand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
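One way to read the flattened shorthand is as a bare string standing in for a one-criterion rubric assertion. The hypothetical normalizer below makes that reading explicit; `RubricAssertion`, `AssertionInput`, and `normalizeAssertion` are illustrative, not agentv's schema code, and agentv may group shorthand strings differently.

```typescript
// Hypothetical long form used by the example before this commit.
interface RubricAssertion {
  type: "rubrics";
  criteria: string[];
}

type AssertionInput = string | RubricAssertion;

// Treat a bare string as shorthand for a single rubric criterion;
// long-form assertions pass through unchanged.
function normalizeAssertion(input: AssertionInput): RubricAssertion {
  return typeof input === "string"
    ? { type: "rubrics", criteria: [input] }
    : input;
}

const assertions: AssertionInput[] = [
  "The agent invoked the acme-deploy skill",
  { type: "rubrics", criteria: ["The final answer mentions the deployed version"] },
];
console.log(assertions.map(normalizeAssertion));
```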
Summary
- Adds a `{{ tool_calls }}` template variable so LLM graders can see tool call details in their evaluation prompts
- Previously, `{{ output }}` only contained plain text

Changes
| File | Change |
|---|---|
| `packages/core/src/evaluation/graders/format-tool-calls.ts` | New `formatToolCalls()` utility |
| `packages/core/src/evaluation/graders/types.ts` | Add `toolCalls` to `EvaluationContext` |
| `packages/core/src/evaluation/template-variables.ts` | Add `TOOL_CALLS` constant |
| `packages/core/src/evaluation/orchestrator.ts` | Thread `toolCalls` through pipeline (~15 sites) |
| `packages/core/src/evaluation/graders/llm-grader.ts` | |
| `packages/core/src/evaluation/graders/llm-grader-prompt.ts` | |
| `packages/core/src/evaluation/graders/index.ts` | Export `formatToolCalls` |
| `packages/core/test/evaluation/graders/format-tool-calls.test.ts` | |
| `apps/web/src/content/docs/docs/graders/llm-graders.mdx` | Add `tool_calls` to template vars docs |
| `apps/web/src/content/docs/docs/evaluation/rubrics.mdx` | |
| `examples/features/tool-calls-template/` | Example usage |
Test plan
- `formatToolCalls` tests pass
- … `{{ tool_calls }}` context

Closes #1121