Close spec compliance gaps #13

Merged
pratyush618 merged 14 commits into main from feat/spec-compliance-gaps
Mar 28, 2026
@pratyush618
Collaborator

Summary

  • Add missing JUnit 5 annotations: @GoldenSet, @JudgeModelConfig, @EvalTimeout with full extension support
  • Add agenteval-bom module for centralized dependency management
  • Add YAML dataset read support (YamlDatasetLoader, .yaml/.yml detection)
  • Implement judge result caching (CachingJudgeModel decorator, auto-wraps when cacheJudgeResults=true)
  • Add Maven wrapper (3.9.9) for reproducible builds
  • Use Java 21 sealed classes (AbstractHttpJudgeModel, LLMConversationMetric) and pattern matching (McpSchemaValidator)
  • Align metric default thresholds with spec: ToolResultUtilization 0.6→0.7, TopicDriftDetection 0.6→0.5, ConversationResolution 0.7→0.5
  • Add toolCallCount() and outputMatchesSchema() to fluent assertion API
  • Add .editorconfig, .pre-commit-config.yaml, GitHub Actions CI workflows, Dependabot
  • Fix SpotBugs exclusions and EditorConfig compliance across XML files
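
For reviewers unfamiliar with the decorator approach behind the judge result caching item, here is a minimal sketch of the idea. It is not the code in this PR: `JudgeModelSketch`, `CachingJudgeSketch`, and `wrapIfEnabled` are hypothetical names, the real judge interface likely takes richer inputs than a single prompt string, and the real cache key may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Demo entry point; all types in this file are illustrative, not the PR's classes.
final class CachingJudgeDemo {
    // Assumed shape of the "auto-wraps when cacheJudgeResults=true" behavior.
    static JudgeModelSketch wrapIfEnabled(JudgeModelSketch judge, boolean cacheJudgeResults) {
        return cacheJudgeResults ? new CachingJudgeSketch(judge) : judge;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fake judge that counts how often it is actually invoked.
        JudgeModelSketch counting = prompt -> "verdict #" + calls.incrementAndGet();
        JudgeModelSketch judge = wrapIfEnabled(counting, true);

        System.out.println(judge.judge("Is the answer faithful?"));
        System.out.println(judge.judge("Is the answer faithful?"));
        System.out.println("delegate calls: " + calls.get());
    }
}

// Hypothetical stand-in for the judge abstraction (assumption: prompt in, verdict out).
interface JudgeModelSketch {
    String judge(String prompt);
}

// Decorator that memoizes verdicts keyed by the exact prompt text.
final class CachingJudgeSketch implements JudgeModelSketch {
    private final JudgeModelSketch delegate;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingJudgeSketch(JudgeModelSketch delegate) {
        this.delegate = delegate;
    }

    @Override
    public String judge(String prompt) {
        // The delegate is only called on a cache miss.
        return cache.computeIfAbsent(prompt, delegate::judge);
    }
}
```

Because the wrapper implements the same interface as the judge it wraps, callers do not need to know whether caching is on, which is presumably why a decorator fits the `cacheJudgeResults` flag well.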

Test plan

  • Full test suite passes (579 tests, 0 failures)
  • Pre-commit hooks pass (checkstyle, editorconfig, trailing whitespace)
  • Verify YAML dataset loading with a sample .yaml file
  • Verify @EvalTimeout annotation triggers timeout on slow evaluations
  • Verify @JudgeModelConfig correctly overrides judge at class/method level

@github-advanced-security

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

@pratyush618 pratyush618 merged commit df0b5e5 into main Mar 28, 2026
8 checks passed
2 participants