
feat(replay): add quality assertions #685

Merged
XingYu-Zhong merged 1 commit into KunAgent:develop from luoye520ww:codex/replay-quality-assertions
Jul 1, 2026

@luoye520ww

Collaborator

Summary

  • extend the existing HTTP/SSE replay benchmark with correctness assertions
  • score required output, forbidden behavior, changed files, and cost budgets
  • include quality violations in the existing run failure path
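
A minimal sketch of how the four checks above might be scored and folded into the run's pass/fail decision. Only the expectation field names (`requiredOutputs`, `forbiddenBehaviors`, `expectedChangedFiles`, `maxCostUsd`) come from this PR description. The observation shape, the weights, the matching rules, and the function names `evaluateQuality` and `runPassed` are assumptions made for illustration, not the merged implementation.

```typescript
// Hypothetical shapes: only the four expectation field names are from the PR.
interface ReplayExpectations {
  requiredOutputs?: string[];
  forbiddenBehaviors?: string[];
  expectedChangedFiles?: string[];
  maxCostUsd?: number;
}

interface ReplayObservations {
  assistantText: string;   // concatenated assistant output
  behaviors: string[];     // assumed labels for observed behaviors, e.g. tool names
  changedFiles: string[];
  costUsd: number;
}

interface QualityResult {
  score: number;           // fraction of weighted checks that passed, 0..1
  violations: string[];
}

// Assumed integer weights; the real weighting is not stated in the PR.
const WEIGHTS = { requiredOutputs: 4, forbiddenBehaviors: 3, expectedChangedFiles: 2, maxCostUsd: 1 };

function evaluateQuality(expected: ReplayExpectations, observed: ReplayObservations): QualityResult {
  const violations: string[] = [];
  let earned = 0;
  let possible = 0;

  // A check earns its full weight only when it produces no failures.
  const record = (weight: number, failures: string[]): void => {
    possible += weight;
    if (failures.length === 0) earned += weight;
    else violations.push(...failures);
  };

  if (expected.requiredOutputs) {
    record(WEIGHTS.requiredOutputs, expected.requiredOutputs
      .filter((s) => observed.assistantText.indexOf(s) === -1)
      .map((s) => `missing required output: ${s}`));
  }
  if (expected.forbiddenBehaviors) {
    record(WEIGHTS.forbiddenBehaviors, expected.forbiddenBehaviors
      .filter((b) => observed.behaviors.indexOf(b) !== -1)
      .map((b) => `forbidden behavior observed: ${b}`));
  }
  if (expected.expectedChangedFiles) {
    // Assumption: every expected file must appear; extra changed files are tolerated.
    record(WEIGHTS.expectedChangedFiles, expected.expectedChangedFiles
      .filter((f) => observed.changedFiles.indexOf(f) === -1)
      .map((f) => `expected file not changed: ${f}`));
  }
  const budget = expected.maxCostUsd;
  if (budget !== undefined) {
    record(WEIGHTS.maxCostUsd, observed.costUsd > budget
      ? [`cost ${observed.costUsd} USD exceeds budget ${budget} USD`]
      : []);
  }

  // With no quality expectations configured, the score defaults to 1.
  return { score: possible === 0 ? 1 : earned / possible, violations };
}

// The run fails when either the pre-existing checks or the quality checks report a problem.
function runPassed(existingFailures: string[], quality: QualityResult): boolean {
  return existingFailures.length === 0 && quality.violations.length === 0;
}
```

In this sketch the weighted score is informational, while any single violation still fails the run, which is one way to keep a binary pass/fail contract alongside a graded result.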

Why

The experimental v2 branch had a separate pure replay evaluator that was never connected to a runnable benchmark. This change folds the useful scoring behavior into the existing replay runner from #645 instead of introducing a parallel evaluation system.

Changes

  • add requiredOutputs, forbiddenBehaviors, expectedChangedFiles, and maxCostUsd to replay expectations
  • derive observations from real assistant and tool-call SSE items
  • use file_change tool arguments for changed-file assertions
  • emit a weighted quality result while retaining the current pass/fail report contract
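
The observation step described above could look roughly like the following sketch. The SSE item shapes, the `args` field, the `path` argument key, and the function name `deriveObservations` are all hypothetical; the PR description only states that observations come from real assistant and tool-call SSE items and that changed files are read from `file_change` tool arguments.

```typescript
// Hypothetical item shapes for a parsed SSE stream; the real event schema may differ.
type SseItem =
  | { type: "assistant_message"; text: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> };

interface Observations {
  assistantText: string;
  toolNames: string[];
  changedFiles: string[];
}

function deriveObservations(items: SseItem[]): Observations {
  const textParts: string[] = [];
  const toolNames: string[] = [];
  const changedFiles: string[] = [];

  for (const item of items) {
    if (item.type === "assistant_message") {
      textParts.push(item.text);
      continue;
    }
    // Everything else is a tool call in this sketch.
    toolNames.push(item.name);
    // Assumption: a file_change call carries the target path under the "path" key.
    const path = item.args["path"];
    if (item.name === "file_change" && typeof path === "string" && changedFiles.indexOf(path) === -1) {
      changedFiles.push(path);
    }
  }

  return { assistantText: textParts.join("\n"), toolNames, changedFiles };
}
```

Reading changed files from tool-call arguments rather than from the working tree means the assertion only sees what the replayed stream reports, so a `file_change` call with a missing or non-string path is ignored here instead of being guessed at.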

Tests

  • npm.cmd --prefix kun test -- replay-benchmark.test.ts (8 passed)
  • npm.cmd --prefix kun run typecheck
  • npm.cmd --prefix kun run build
  • focused ESLint
  • git diff --check

@luoye520ww luoye520ww marked this pull request as ready for review July 1, 2026 13:56

@XingYu-Zhong XingYu-Zhong left a comment

Collaborator

Reviewed and approved for merge.

@XingYu-Zhong XingYu-Zhong merged commit d031113 into KunAgent:develop Jul 1, 2026

2 participants