[bench] pipeline: fix citation counting, oracle validation, and defensive improvements#4

Merged
dburks-svg merged 1 commit into main from bench/deepseek-timeout-increase on May 15, 2026

@dburks-svg (Contributor)

Summary

  • Fix broken citation counting in cli/commands.py and utils/viewer.py — oracle citations are dicts with constraint_id, not plain strings; both sites now handle both shapes with fallback logging for unknown types
  • Fix citation display in _print_entry_line — extracts constraint_id from dicts instead of rendering raw dict repr
  • Reject VETO with empty citations in oracle validation — a veto without citing any violated constraint is semantically invalid
  • Add early OPENROUTER_API_KEY validation — surfaces a clear error instead of an opaque SDK exception when the env var is missing
  • Fix token accounting consistency: _accumulate_tokens now runs unconditionally after the challenger CLEAR/else branch, so tokens are recorded on both paths
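The citation fix can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names citation_id, count_citations, and format_citations are hypothetical, and the only shapes taken from the description are dict citations carrying a constraint_id key and plain-string citations. Skipping unrecognized entries (rather than counting them) after logging them is an assumption.

```python
import logging
from typing import Any, Iterable, Optional

logger = logging.getLogger(__name__)


def citation_id(citation: Any) -> Optional[str]:
    """Return the constraint ID for a citation in either supported shape.

    Hypothetical helper. Oracle citations are dicts with a "constraint_id"
    key; plain strings are accepted as-is. Anything else is logged and
    skipped instead of crashing the caller.
    """
    if isinstance(citation, dict):
        cid = citation.get("constraint_id")
        if isinstance(cid, str):
            return cid
        logger.warning("citation dict without a string constraint_id: %r", citation)
        return None
    if isinstance(citation, str):
        return citation
    logger.warning("unknown citation type %s: %r", type(citation).__name__, citation)
    return None


def count_citations(citations: Iterable[Any]) -> int:
    """Count citations that resolve to a constraint ID (assumed counting rule)."""
    return sum(1 for c in citations if citation_id(c) is not None)


def format_citations(citations: Iterable[Any]) -> str:
    """Render citations as comma-separated constraint IDs, never dict reprs."""
    ids = (citation_id(c) for c in citations)
    return ", ".join(cid for cid in ids if cid is not None)


if __name__ == "__main__":
    # Hypothetical mixed input: a dict citation, a string citation, and junk.
    mixed = [{"constraint_id": "C-01"}, "C-02", 42]
    print(count_citations(mixed))   # 2 (a warning is logged for 42)
    print(format_citations(mixed))  # C-01, C-02
```

Routing both the counting site and the display site through one normalizer keeps them from drifting apart again if a third citation shape ever appears.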

Test plan

  • Full test suite passes (264/264)
  • No exposed API keys or secrets
  • All changes audited for correctness and backward compatibility

🤖 Generated with Claude Code

…sive improvements

- Fix broken citation counting in cli/commands.py and utils/viewer.py:
  citations are dicts, not strings; handle both shapes with fallback logging
- Fix _print_entry_line rendering raw dicts instead of constraint IDs
- Reject VETO verdicts with empty constraint_citations in oracle validation
- Add early OPENROUTER_API_KEY validation before SDK client construction
- Move _accumulate_tokens outside else block for consistent token accounting
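The two new validation checks could look something like this. It is a hedged sketch: OracleVerdict, OracleValidationError, validate_verdict, and require_openrouter_key are hypothetical names, and the verdict strings are assumed. Only the rules themselves come from the commit message: a VETO with empty constraint_citations is rejected, and OPENROUTER_API_KEY is checked before any SDK client is constructed.

```python
import os
from dataclasses import dataclass, field
from typing import Any, List


class OracleValidationError(ValueError):
    """Hypothetical error for structurally or semantically invalid verdicts."""


@dataclass
class OracleVerdict:
    verdict: str  # e.g. "VETO" or "CLEAR" (assumed values)
    constraint_citations: List[Any] = field(default_factory=list)


def validate_verdict(v: OracleVerdict) -> OracleVerdict:
    # A VETO must name at least one violated constraint; an empty list is invalid.
    if v.verdict == "VETO" and not v.constraint_citations:
        raise OracleValidationError(
            "VETO verdict must cite at least one violated constraint"
        )
    return v


def require_openrouter_key() -> str:
    # Run this before building the SDK client so a missing key yields a clear
    # message rather than an opaque exception from inside the SDK.
    key = os.environ.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it before running the benchmark"
        )
    return key
```

Treating a whitespace-only value as missing is an extra assumption of this sketch; the description only says the check catches a missing variable.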

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dburks-svg merged commit d24ac7f into main on May 15, 2026
3 checks passed
dburks-svg deleted the bench/deepseek-timeout-increase branch on May 15, 2026 at 04:59

Labels

None yet

Projects

None yet

1 participant