
feat: corroboration scoring with diff-size correction (#432) #437

Merged
justn-hyeok merged 1 commit into main from feat/corroboration-scoring-432
Apr 1, 2026

Conversation

@justn-hyeok
Collaborator

@justn-hyeok justn-hyeok commented Apr 1, 2026

Summary

  • Single-reviewer penalty: Findings reported by only 1 out of N reviewers (N>=3) get confidence reduced by 0.5x (small diffs) or 0.7x (large diffs >500 lines), targeting likely hallucinations
  • Triple+ corroboration boost: Findings confirmed by 3+ reviewers get a 1.2x confidence boost (capped at 100)
  • Diff-size correction: Large diffs are treated more leniently for single-reviewer findings since they may contain legitimate unique issues
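
The three rules above can be sketched as a small standalone helper. This is an illustrative sketch only: the name `applyCorroboration` and its signature are assumptions, not the actual `computeL1Confidence` implementation in `confidence.ts`.

```typescript
// Illustrative sketch — applyCorroboration is a hypothetical name; the real
// logic lives inside computeL1Confidence in confidence.ts.
function applyCorroboration(
  base: number,           // blended confidence before corroboration, 0–100
  agreeing: number,       // number of reviewers reporting this finding
  totalReviewers: number,
  totalDiffLines?: number,
): number {
  if (agreeing === 1 && totalReviewers >= 3) {
    // Single-reviewer penalty, softened for large diffs (>500 lines)
    const isLargeDiff = (totalDiffLines ?? 0) > 500;
    base = Math.round(base * (isLargeDiff ? 0.7 : 0.5));
  } else if (agreeing >= 3) {
    // Triple+ corroboration boost, capped at 100
    base = Math.min(100, Math.round(base * 1.2));
  }
  // agreeing === 2, or totalReviewers < 3: left unchanged
  return Math.max(0, Math.min(100, base));
}
```

Under these assumptions, a base score of 80 from a lone reviewer in a five-reviewer run drops to 40 on a small diff and to 56 on a large one.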

Changes

  • packages/core/src/pipeline/confidence.ts — Extended computeL1Confidence with corroboration penalty/boost logic and optional totalDiffLines parameter
  • packages/core/src/pipeline/orchestrator.ts — Pass totalDiffLines (from filtered diff content) to computeL1Confidence
  • packages/core/src/tests/parser-bilingual.test.ts — Added 6 new test cases covering all corroboration scoring scenarios

Test plan

  • Single reviewer (1/5), small diff: confidence x 0.5
  • Single reviewer (1/5), large diff (>500 lines): confidence x 0.7
  • Triple corroboration (3/5): confidence x 1.2
  • All reviewers agree (5/5): confidence x 1.2 (capped at 100)
  • 2 reviewers agree: no penalty/boost (middle ground)
  • totalReviewers < 3: no penalty (not enough data)
  • All 26 tests pass (20 existing + 6 new)

Closes #432

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Enhanced confidence scoring to account for reviewer agreement and corroboration levels.
    • Penalties applied for single-reviewer scenarios; boosts applied when consensus is strong (3+ reviewers).
  • Tests

    • Added comprehensive test coverage for the updated confidence-scoring logic with various reviewer agreement scenarios.

Single-reviewer findings (1/N) get confidence penalty:
- Small diff: × 0.5 (high hallucination probability)
- Large diff (>500 lines): × 0.7 (may be legitimate)

Triple+ corroboration (3+/N) gets × 1.2 boost.

This is the final layer of the 4-layer hallucination filter,
strengthening the signal that MAD's majority voting provides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/M <200 lines label Apr 1, 2026
@coderabbitai

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

This PR implements Layer 2 corroboration scoring with diff-size correction for the confidence calculation system. The computeL1Confidence function now accepts an optional totalDiffLines parameter to apply penalties for single-reviewer agreement (scaled by diff size) and boosts for multi-reviewer agreement, with the result clamped to [0, 100].

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Confidence Computation Logic**<br>`packages/core/src/pipeline/confidence.ts` | Updated `computeL1Confidence` to accept an optional `totalDiffLines` parameter. Added corroboration scoring: penalizes confidence (×0.5 or ×0.7 based on diff size) when 1 reviewer agrees and ≥3 total reviewers exist; boosts confidence (×1.2, capped at 100) when ≥3 reviewers agree. Final value clamped to [0, 100]. |
| **Orchestrator Integration**<br>`packages/core/src/pipeline/orchestrator.ts` | Updated `runPipeline` to compute `totalDiffLines` from filtered diff content and pass it as an argument to `computeL1Confidence` for non-rule evidence documents. |
| **Test Coverage**<br>`packages/core/src/tests/parser-bilingual.test.ts` | Added a test suite ("corroboration scoring (#432)") validating penalty application for single-reviewer scenarios (small vs. large diffs), boost logic for 3+ agreers, boundary conditions when totalReviewers < 3, and mid-level agreement scenarios. |
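
On the orchestrator side, `totalDiffLines` might plausibly be derived by counting changed lines in the filtered unified diff. The helper name `countDiffLines` and the exact counting rule below are assumptions for illustration, not the actual `orchestrator.ts` code:

```typescript
// Hypothetical helper — the real orchestrator computes totalDiffLines from the
// filtered diff content; the exact counting rule may differ.
function countDiffLines(diff: string): number {
  return diff.split("\n").filter(
    (line) =>
      // count added/removed lines, skipping the +++/--- file headers
      (line.startsWith("+") || line.startsWith("-")) &&
      !line.startsWith("+++") &&
      !line.startsWith("---"),
  ).length;
}
```

Under this counting rule, a diff with more than 500 changed lines would put single-reviewer findings on the lenient ×0.7 penalty path.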

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size/M

Poem

🐰 Whiskers twitch with glee,
Three reviewers now agree,
Confidence blooms bright,
Corroboration's might,
Large diffs show their honesty! 🌟

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: implementing corroboration scoring with diff-size correction as described in issue #432. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #432 are met: confidence adjustments for 1/N (×0.5/×0.7), 2/N (no change), 3+/N (×1.2), diff-size correction, and test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #432 scope: confidence computation logic, orchestrator integration, and comprehensive test coverage with no extraneous modifications. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; check skipped. |




@codeagora-bot codeagora-bot bot left a comment


CodeAgora Review

📋 Triage: 3 verify · 3 ignore

Verdict: ✅ ACCEPT · 1 critical · 6 warning

The only flagged issue (d001) was unanimously dismissed by the reviewers after discussion, leaving zero unresolved or confirmed problems of any severity. With no CRITICAL/HARSHLY_CRITICAL findings remaining and no escalated disagreements, the change has been vetted and deemed safe to merge.

Blocking Issues

| Severity | File | Line | Issue | Confidence |
| --- | --- | --- | --- | --- |
| 🔴 CRITICAL | packages/core/src/pipeline/confidence.ts | 15–48 | Inconsistent Corroboration Boost | 🟡 40% |

5 warning(s)

| Severity | File | Line | Issue | Confidence |
| --- | --- | --- | --- | --- |
| 🟡 WARNING | packages/core/src/pipeline/confidence.ts | 21 | Potential Division by Zero Error | 🟡 60% |
| 🟡 WARNING | packages/core/src/pipeline/confidence.ts | 37 | Lack of Input Validation for totalDiffLines | 🔴 38% |
| 🟡 WARNING | packages/core/src/pipeline/confidence.ts | 20 | Potential division by zero in computeL1Confidence | 🟡 45% |
| 🟡 WARNING | packages/core/src/pipeline/confidence.ts | 26 | Potential loss of precision in computeL1Confidence | 🔴 34% |
| 🟡 WARNING | packages/core/src/pipeline/orchestrator.ts | 753 | Missing error handling in runPipeline | 🟡 56% |
Issue distribution (2 file(s))

| File | Issues |
| --- | --- |
| packages/core/src/pipeline/confidence.ts | ████████████ 5 |
| packages/core/src/pipeline/orchestrator.ts | ██ 1 |
Agent consensus log (1 discussion(s))
✅ d001 — 1 round(s), consensus → DISMISSED

Verdict: DISMISSED — Majority rejected (2/3 disagree)


CodeAgora · Session: 2026-04-01/001

```diff
@@ -15,7 +15,8 @@ export interface DiscussionVerdictLike {
 export function computeL1Confidence(
   doc: EvidenceDocument,
```

🔴 CRITICAL — Inconsistent Corroboration Boost

Confidence: 🟡 40%

Problem: In packages/core/src/pipeline/confidence.ts:15-48

The corroboration boost logic does not correctly handle cases where agreeing is exactly 2. According to the comments, a boost should be applied when agreeing >= 3, but the current implementation only applies a penalty when agreeing is 1.

Evidence:

  1. The condition for applying a boost is agreeing >= 3, but there's no specific handling for agreeing == 2.
  2. The current implementation only applies a penalty when agreeing is 1 and totalReviewers >= 3.
  3. The function computeL1Confidence should apply a boost when there are at least 3 agreeing reviewers.
Suggested change

```diff
-  doc: EvidenceDocument,
+  if (agreeing === 1 && totalReviewers >= 3) {
+    // Diff-size correction: large diffs may have legitimate single-reviewer finds
+    const isLargeDiff = (totalDiffLines ?? 0) > 500;
+    const penalty = isLargeDiff ? 0.7 : 0.5;
+    base = Math.round(base * penalty);
+  } else if (agreeing >= 2) { // Modified condition
+    // Apply a smaller boost for 2 or more agreeing reviewers
+    base = Math.min(100, Math.round(base * 1.1));
+  } else if (agreeing >= 3) {
+    // Strong corroboration boost (capped at 100)
+    base = Math.min(100, Math.round(base * 1.2));
+  }
```

Flagged by: r-scout  |  CodeAgora

```diff
   totalDiffLines?: number,
 ): number {
   if (totalReviewers <= 0) return 50;
   const agreeing = allDocs.filter(d =>
```

🟡 WARNING — Potential Division by Zero Error

Confidence: 🟡 60%

Problem: In packages/core/src/pipeline/confidence.ts:21-25

The computeL1Confidence function may throw a division by zero error when totalReviewers is zero. Although the function checks if totalReviewers is less than or equal to zero and returns 50 in such cases, this handling may not be sufficient or clear.

Evidence:

  1. The function returns a fixed value of 50 when totalReviewers is less than or equal to zero, which might not accurately represent the confidence level in such scenarios.
  2. The check for totalReviewers being less than or equal to zero is present but does not handle negative values explicitly.
  3. There's no explicit documentation or comment explaining why 50 is chosen as the default confidence level when totalReviewers is zero or negative.
Suggested change

```diff
-  const agreeing = allDocs.filter(d =>
+  if (totalReviewers <= 0) {
+    // Consider throwing an error or returning a more meaningful default
+    // For example:
+    throw new Error("Total reviewers must be a positive number.");
+    // or
+    return 0; // with clear documentation on why 0 is chosen
+  }
```

Flagged by: r-llama33  |  CodeAgora

```diff
 // Corroboration scoring (#432)
 // Single-reviewer findings are more likely hallucinations
 if (agreeing === 1 && totalReviewers >= 3) {
   // Diff-size correction: large diffs may have legitimate single-reviewer finds
```

🟡 WARNING — Lack of Input Validation for totalDiffLines

Confidence: 🔴 38%

Problem: In packages/core/src/pipeline/confidence.ts:37-41

The computeL1Confidence function uses totalDiffLines to determine if a diff is large or small, but it does not validate if totalDiffLines is a positive number. This could lead to unexpected behavior if totalDiffLines is negative or zero.

Evidence:

  1. The function uses totalDiffLines to calculate the penalty for single-reviewer findings.
  2. There's no check to ensure totalDiffLines is a positive number.
  3. The logic for determining a large diff (totalDiffLines > 500) assumes totalDiffLines is always non-negative.
Suggested change

```diff
-// Diff-size correction: large diffs may have legitimate single-reviewer finds
+if (totalDiffLines !== undefined && totalDiffLines < 0) {
+  throw new Error("Total diff lines must be a non-negative number.");
+}
```

Flagged by: r-llama33  |  CodeAgora

```diff
   totalReviewers: number,
   totalDiffLines?: number,
 ): number {
   if (totalReviewers <= 0) return 50;
```

🟡 WARNING — Potential division by zero in computeL1Confidence

Confidence: 🟡 45%

Problem: In packages/core/src/pipeline/confidence.ts:20-22

If totalReviewers is zero, the agreeing count is calculated by filtering allDocs, which could potentially return an empty array. Then, the agreementRate is calculated as Math.abs(agreeing / totalReviewers) * 100, which could result in a division by zero.

Evidence:

  1. totalReviewers is set to zero in the else clause of the function.
  2. agreeing is calculated by filtering allDocs, which could return an empty array.
Suggested change

```diff
-if (totalReviewers <= 0) return 50;
+const agreementRate = agreeing !== null && agreeing !== undefined ? Math.round((agreeing / totalReviewers) * 100) : 0;
```

Flagged by: r-llama31  |  CodeAgora

```diff
@@ -24,10 +25,26 @@ export function computeL1Confidence(
 ).length;
 const agreementRate = Math.round((agreeing / totalReviewers) * 100);
```

🟡 WARNING — Potential loss of precision in computeL1Confidence

Confidence: 🔴 34%

Problem: In packages/core/src/pipeline/confidence.ts:26-30

The base value is calculated by multiplying the confidence of the current document by 0.6 and adding the agreementRate multiplied by 0.4. This calculation can result in a potential loss of precision due to the float arithmetic.

Evidence:

  1. The calculation involves multiplying and adding floating-point numbers.
  2. The result is rounded to the nearest integer, which can lose precision.
Suggested change

```diff
+const base = confidence * 36 + agreementRate * 4;
```

Flagged by: r-llama31  |  CodeAgora

```diff
@@ -752,9 +752,10 @@ export async function runPipeline(input: PipelineInput, progress?: ProgressEmitt
 
 // === CONFIDENCE: Compute L1 confidence for non-rule docs ===
 const totalReviewers = allReviewerInputs.length;
```

🟡 WARNING — Missing error handling in runPipeline

Confidence: 🟡 56%

Problem: In packages/core/src/pipeline/orchestrator.ts:753-761

The runPipeline function assumes that computeL1Confidence will always return a value. However, in the case where totalDiffLines is null or undefined, a TypeError can be thrown. Additionally, the function does not handle any potential errors that may occur during the execution of computeL1Confidence.

Evidence:

  1. The function does not include any error handling for potential errors in computeL1Confidence.
  2. A TypeError can be thrown if totalDiffLines is null or undefined.
Suggested change

```diff
-const totalReviewers = allReviewerInputs.length;
+try {
+  // computeL1Confidence call
+} catch (error) {
+  // handle error
+}
```

Flagged by: r-llama31  |  CodeAgora


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
packages/core/src/tests/parser-bilingual.test.ts (1)

213-225: Good test coverage for corroboration scoring.

The test suite comprehensively covers all corroboration scenarios from the PR objectives:

  • Single-reviewer penalty (small/large diff variants)
  • Multi-reviewer boost with cap
  • Middle-ground (2 reviewers) no-change case
  • Guard condition (totalReviewers < 3)

Minor note: The makeDoc helper is duplicated from the earlier describe block (lines 148-159). Consider extracting it to module scope to reduce duplication, though this is optional since the scoping provides isolation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/tests/parser-bilingual.test.ts` around lines 213 - 225,
There is a duplicated helper function makeDoc used in two describe blocks;
extract makeDoc to module scope so both computeL1Confidence — corroboration
scoring tests and the earlier describe block reuse the same helper, then remove
the duplicate definitions inside the describe blocks; ensure the extracted
function signature and optional confidence handling exactly match the existing
implementations so tests referencing makeDoc still work.
packages/core/src/pipeline/confidence.ts (1)

11-14: JSDoc is outdated after the corroboration scoring changes.

The function description only mentions the basic agreement calculation but doesn't document the new corroboration penalty/boost logic or the totalDiffLines parameter behavior. Consider updating to reflect:

  • Single-reviewer penalty (0.5×/0.7× based on diff size) when agreeing === 1 and totalReviewers >= 3
  • Corroboration boost (1.2×) when agreeing >= 3
  • The optional totalDiffLines parameter and its default behavior
📝 Suggested JSDoc update

```diff
 /**
- * L1 confidence: (agreeing reviewers / total reviewers) * 100
- * "Agreeing" = docs at same filePath + similar lineRange (within ±5 lines)
+ * L1 confidence: blends reviewer confidence (60%) with agreement rate (40%).
+ * "Agreeing" = docs at same filePath + similar lineRange (within ±5 lines).
+ *
+ * Corroboration scoring (#432):
+ * - Single reviewer (1/N, N≥3): penalty 0.5× (small diff) or 0.7× (large diff >500 lines)
+ * - Two reviewers: no change
+ * - Three+ reviewers: boost 1.2× (capped at 100)
+ *
+ * @param totalDiffLines - Total lines in diff; if omitted, defaults to 0 (small diff behavior)
  */
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/pipeline/confidence.ts` around lines 11 - 14, Update the
JSDoc above the confidence calculation in this file to describe the full
corroboration scoring rules: keep the base L1 formula (agreeing/totalReviewers *
100) but also document that when agreeing === 1 and totalReviewers >= 3 a
single-reviewer penalty is applied (0.5× for small diffs, 0.7× for larger diffs
— controlled by the totalDiffLines threshold), that when agreeing >= 3 a
corroboration boost of 1.2× is applied, and that totalDiffLines is an optional
parameter with its default behavior (explain the default threshold and how it
affects the penalty choice); reference the parameter name totalDiffLines and the
values/conditions for agreeing and totalReviewers so callers understand the
adjusted final score.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/core/src/pipeline/confidence.ts`:
- Around line 11-14: Update the JSDoc above the confidence calculation in this
file to describe the full corroboration scoring rules: keep the base L1 formula
(agreeing/totalReviewers * 100) but also document that when agreeing === 1 and
totalReviewers >= 3 a single-reviewer penalty is applied (0.5× for small diffs,
0.7× for larger diffs — controlled by the totalDiffLines threshold), that when
agreeing >= 3 a corroboration boost of 1.2× is applied, and that totalDiffLines
is an optional parameter with its default behavior (explain the default
threshold and how it affects the penalty choice); reference the parameter name
totalDiffLines and the values/conditions for agreeing and totalReviewers so
callers understand the adjusted final score.

In `@packages/core/src/tests/parser-bilingual.test.ts`:
- Around line 213-225: There is a duplicated helper function makeDoc used in two
describe blocks; extract makeDoc to module scope so both computeL1Confidence —
corroboration scoring tests and the earlier describe block reuse the same
helper, then remove the duplicate definitions inside the describe blocks; ensure
the extracted function signature and optional confidence handling exactly match
the existing implementations so tests referencing makeDoc still work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93f40695-97b0-41a2-82e9-2026f6419ebf

📥 Commits

Reviewing files that changed from the base of the PR and between 30e12eb and 8f86311.

📒 Files selected for processing (3)
  • packages/core/src/pipeline/confidence.ts
  • packages/core/src/pipeline/orchestrator.ts
  • packages/core/src/tests/parser-bilingual.test.ts

@justn-hyeok justn-hyeok merged commit 8fc6d82 into main Apr 1, 2026
4 of 6 checks passed

Labels

size/M <200 lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Layer 2 — corroboration scoring with diff-size correction

1 participant