
Add 4 judge providers and TrajectoryOptimality metric#9

Merged
pratyush618 merged 3 commits into main from feat/p5-providers-and-metrics
Mar 13, 2026

Conversation

@pratyush618
Collaborator

Summary

  • Google Gemini judge provider — generateContent API with x-goog-api-key header auth and responseMimeType: application/json
  • Azure OpenAI judge provider — Azure deployments endpoint with api-key header, configurable apiVersion
  • Amazon Bedrock judge provider — AWS SigV4 request signing, Anthropic Messages API format for Claude models
  • Custom HTTP judge provider — OpenAI-compatible endpoint for vLLM, LiteLLM, LocalAI, and other self-hosted servers
  • TrajectoryOptimality metric — LLM-as-judge metric evaluating agent path efficiency, penalizing redundant tool calls, circular reasoning, and unnecessary steps; configurable maxSteps
  • JudgeModels factory updated with google(), azure(), bedrock(), custom() methods
  • 38 new tests, 632 total (all passing)
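
As a rough illustration of the Gemini provider's request shape, here is a self-contained sketch. The endpoint path, `x-goog-api-key` header, and `responseMimeType` field follow the public Gemini `generateContent` REST API; the class, method names, and model name are hypothetical and not the PR's actual `GoogleJudgeModel` code.

```java
// Illustrative sketch of a Gemini generateContent request, not the PR's
// actual GoogleJudgeModel. The URL and body shapes follow the public
// Gemini REST API; class/method names here are hypothetical.
public class GeminiRequestSketch {

    static String buildUrl(String model) {
        // The API key travels in the x-goog-api-key header, not the URL
        return "https://generativelanguage.googleapis.com/v1beta/models/"
                + model + ":generateContent";
    }

    static String buildBody(String prompt) {
        // responseMimeType asks Gemini to return strict JSON, which makes
        // judge verdicts easy to parse
        return "{"
                + "\"contents\":[{\"parts\":[{\"text\":\"" + prompt + "\"}]}],"
                + "\"generationConfig\":{\"responseMimeType\":\"application/json\"}"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("gemini-1.5-pro"));
        System.out.println(buildBody("Rate this answer from 0 to 1."));
    }
}
```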

Test plan

  • mvn clean install — all 17 modules build, 632 tests pass
  • Checkstyle + SpotBugs: 0 violations
  • Google: request URL, auth header, content/token extraction, empty candidates
  • Azure: deployment URL with api-version, api-key header, default version fallback
  • Bedrock: SigV4 auth headers, region extraction, session token, content parsing
  • Custom: optional Bearer auth, OpenAI-compatible format, no-key mode
  • TrajectoryOptimality: optimal/suboptimal scoring, maxSteps, validation, defaults
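
The Bedrock tests above exercise SigV4 auth headers. The core of that signature is a four-step HMAC-SHA256 key-derivation chain defined by AWS (date, then region, then service, then the literal `aws4_request`). A minimal self-contained sketch of that chain, not the PR's `BedrockJudgeModel` implementation:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the AWS SigV4 signing-key derivation chain, as
// documented by AWS. This is not the PR's BedrockJudgeModel code.
public class SigV4Sketch {

    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Derives the signing key: secret -> date -> region -> service -> "aws4_request". */
    static byte[] signingKey(String secret, String date, String region, String service) {
        byte[] kDate = hmacSha256(("AWS4" + secret).getBytes(StandardCharsets.UTF_8), date);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    public static void main(String[] args) {
        byte[] key = signingKey("EXAMPLEKEY", "20260313", "us-east-1", "bedrock-runtime");
        System.out.println("signing key length: " + key.length); // HMAC-SHA256 -> 32 bytes
    }
}
```

The derived key then signs the canonical request's string-to-sign; region and service must match the endpoint being called.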

Commits

- GoogleJudgeModel: Gemini generateContent API with x-goog-api-key auth
- AzureOpenAiJudgeModel: Azure deployments endpoint with api-key header
- BedrockJudgeModel: AWS SigV4 signing with Anthropic Messages format
- CustomHttpJudgeModel: OpenAI-compatible endpoint for vLLM/LiteLLM/LocalAI
- JudgeModels factory updated with google(), azure(), bedrock(), custom()
- 30 new tests across all providers
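
For the custom provider, "OpenAI-compatible" and "no-key mode" have a concrete shape: a chat-completions JSON body, with the `Authorization: Bearer` header sent only when a key is configured. A hypothetical sketch under those assumptions (not the PR's `CustomHttpJudgeModel`):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an OpenAI-compatible chat request for a self-hosted server
// (vLLM, LiteLLM, LocalAI). Hypothetical helper, not CustomHttpJudgeModel.
public class CustomHttpSketch {

    /** Builds headers; Authorization is omitted when no key is set ("no-key mode"). */
    static Map<String, String> headers(String apiKeyOrNull) {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Content-Type", "application/json");
        if (apiKeyOrNull != null && !apiKeyOrNull.isEmpty()) {
            h.put("Authorization", "Bearer " + apiKeyOrNull);
        }
        return h;
    }

    static String body(String model, String prompt) {
        // Standard OpenAI chat-completions body; self-hosted servers accept
        // whatever model name they were launched with
        return "{\"model\":\"" + model + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"" + prompt + "\"}]}";
    }

    public static void main(String[] args) {
        System.out.println(headers(null));        // no Authorization header
        System.out.println(headers("sk-local"));  // Bearer auth
        System.out.println(body("llama-3-8b", "Score this trajectory."));
    }
}
```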
- LLM-as-judge metric that penalizes redundant tool calls, circular
  reasoning, and unnecessary steps
- Configurable maxSteps parameter for step count bounds
- Validates on reasoningTrace or toolCalls (actualOutput optional)
- 8 tests covering optimal/suboptimal trajectories, validation, defaults
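
The metric itself is LLM-judged, but "redundant tool calls" has one concrete reading worth illustrating: the same tool invoked again with identical arguments. A deterministic sketch of counting such repeats in a trajectory (illustrative only; this is not the PR's scoring logic):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: the PR's TrajectoryOptimality metric delegates scoring
// to an LLM judge. This sketch shows one concrete notion of "redundant tool
// call" -- the same tool invoked twice with identical arguments.
public class TrajectorySketch {

    record ToolCall(String tool, String args) { }

    static int redundantCalls(List<ToolCall> trajectory) {
        Set<ToolCall> seen = new HashSet<>();
        int redundant = 0;
        for (ToolCall call : trajectory) {
            if (!seen.add(call)) {
                redundant++; // exact repeat of an earlier call
            }
        }
        return redundant;
    }

    public static void main(String[] args) {
        List<ToolCall> path = List.of(
                new ToolCall("search", "{\"q\":\"weather\"}"),
                new ToolCall("search", "{\"q\":\"weather\"}"), // redundant
                new ToolCall("fetch", "{\"url\":\"a\"}"));
        System.out.println(redundantCalls(path)); // prints 1
    }
}
```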
- Gradle plugin tests: replace findByName/findByType with getByName/getByType
  to eliminate nullable dereference warnings (AssertJ isNotNull doesn't
  satisfy IDE null analysis)
- Remove unused AZURE_OPENAI_API_KEY_ENV constant from JudgeModels
pratyush618 merged commit 08754f4 into main on Mar 13, 2026.
pratyush618 deleted the feat/p5-providers-and-metrics branch on Mar 31, 2026.