Add `finding_source: Literal["review", "synthesized"] | None = None` to ReviewFinding so callers can distinguish human review findings from findings synthesized from transcript tool failures. Default is None for backward compatibility with existing serialized data. Add comprehensive tests for the new field including roundtrip serialization.
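A minimal sketch of the new field and the backward-compatible roundtrip, for illustration only. It uses a stdlib dataclass so it stays self-contained; the real `ReviewFinding` model may be built differently, and the `description`, `severity`, and `reviewer` fields as well as the `dumps`/`loads` helpers are assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal, Optional


@dataclass
class ReviewFinding:
    # Hypothetical minimal field set; the real model carries more fields.
    description: str
    severity: str = "major"
    reviewer: str = ""
    # None means "unknown", which is what pre-existing serialized data gets.
    finding_source: Optional[Literal["review", "synthesized"]] = None


def dumps(finding: ReviewFinding) -> str:
    """Serialize a finding to JSON (illustrative helper)."""
    return json.dumps(asdict(finding))


def loads(raw: str) -> ReviewFinding:
    """Deserialize a finding; a missing finding_source falls back to None."""
    return ReviewFinding(**json.loads(raw))
```

Because the default is `None`, a payload written before this change still loads, and callers can treat `None` as "source not recorded".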
Add AlcoveAdapter._synthesize_findings() that walks the tool sequence and creates ReviewFinding objects (finding_source='synthesized', severity='major', reviewer='synthesized') from every testing-phase tool call that ended in a failure. Also set finding_source='review' on all explicitly parsed findings from the JSON 'findings' array so consumers can distinguish the two kinds. Deduplication: same failure text across repeated test runs → 1 finding. Synthesis only runs when no explicit findings are present in the JSON.
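The synthesis rules above can be sketched roughly as follows. This is not the actual `AlcoveAdapter._synthesize_findings()` code: the `ToolCall` shape (`phase`, `failed`, `output`) and the `Finding` stand-in are hypothetical, chosen only to show the filter, the deduplication, and the "explicit findings win" guard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolCall:
    # Hypothetical transcript entry; real field names may differ.
    phase: str
    failed: bool
    output: str


@dataclass
class Finding:
    # Stand-in for ReviewFinding with only the fields this sketch needs.
    description: str
    severity: str
    reviewer: str
    finding_source: Optional[str] = None


def synthesize_findings(
    tool_sequence: List[ToolCall], explicit_findings: List[Finding]
) -> List[Finding]:
    # Synthesis only runs when the JSON carried no explicit findings.
    if explicit_findings:
        return []
    seen = set()
    synthesized = []
    for call in tool_sequence:
        # Only failed tool calls from the testing phase qualify.
        if call.phase != "testing" or not call.failed:
            continue
        text = call.output.strip()
        # Same failure text across repeated test runs yields one finding.
        if text in seen:
            continue
        seen.add(text)
        synthesized.append(
            Finding(
                description=text,
                severity="major",
                reviewer="synthesized",
                finding_source="synthesized",
            )
        )
    return synthesized
```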
…thesized findings

Skip findings with finding_source='synthesized' in three metric computations:
- knowledge_gap_rate._compute_with_doc_chunks(): synthesized findings contain raw tool output that matches doc chunks too broadly, polluting gap rate.
- knowledge_miss_rate._compute_with_doc_chunks(): same reason, same guard.
- self_correction_rate.compute(): synthesized findings are not actionable review feedback; excluding them from the denominator prevents repeated test failures from artificially lowering the correction rate.

Severity metric intentionally unchanged: synthesized findings still count toward review_severity_distribution (real code quality signal).

Knowledge metrics legacy path (context_source='synthesized') was already guarded since ticket #183; this completes the doc_chunks path.
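To illustrate the effect of the guard on the correction-rate denominator, here is a rough sketch under stated assumptions. Findings are plain dicts, the `corrected` flag is a hypothetical stand-in for however the real metric decides a finding was addressed, and the function names do not mirror the real modules.

```python
from collections import Counter
from typing import Dict, List, Optional


def self_correction_rate(findings: List[dict]) -> Optional[float]:
    # Drop synthesized findings: they are not actionable review feedback.
    # Findings with finding_source None (legacy data) are still counted.
    actionable = [f for f in findings if f.get("finding_source") != "synthesized"]
    if not actionable:
        return None  # nothing to correct, so the rate is undefined here
    corrected = sum(1 for f in actionable if f.get("corrected"))
    return corrected / len(actionable)


def severity_distribution(findings: List[dict]) -> Dict[str, int]:
    # Intentionally unfiltered: synthesized findings still count here.
    return dict(Counter(f["severity"] for f in findings))
```

With one corrected and one uncorrected review finding plus two synthesized failures, the guarded rate is 1/2; without the guard the same data would give 1/4.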
…report

CLI summary sentence (generate_summary_sentence):
- When both review and synthesized findings exist, appends '(N synthesized from test failures)' as a parenthetical note.
- When only synthesized findings exist, uses a distinct sentence 'N issue(s) synthesized from test/lint failures'.
- Synthesized findings are excluded from the per-severity count so that 'Reviewers found X major issues' stays accurate.

HTML report (report.html.j2):
- Synthesized findings show a grey 'synthesized' badge beside the severity badge so they are visually distinguishable.
- A footnote under the findings list explains what synthesized findings are and which metrics exclude them.
- Reviewer name is suppressed for synthesized findings (shows 'synthesized' already via badge).
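The three summary-sentence cases could look roughly like this. It is a sketch, not the real `generate_summary_sentence`: findings are plain dicts, and apart from the two phrases quoted in the commit message the exact wording (including the "No issues found." fallback) is an assumption.

```python
from collections import Counter
from typing import List


def summary_sentence(findings: List[dict]) -> str:
    synthesized = [f for f in findings if f.get("finding_source") == "synthesized"]
    reviewed = [f for f in findings if f.get("finding_source") != "synthesized"]
    n = len(synthesized)

    if not reviewed and not synthesized:
        return "No issues found."  # hypothetical fallback wording
    if not reviewed:
        # Only synthesized findings: use the distinct sentence.
        return f"{n} issue(s) synthesized from test/lint failures."

    # Per-severity counts cover review findings only.
    counts = Counter(f["severity"] for f in reviewed)
    parts = ", ".join(f"{count} {sev}" for sev, count in sorted(counts.items()))
    sentence = f"Reviewers found {parts} issue(s)"
    if n:
        # Both kinds present: append the parenthetical note.
        sentence += f" ({n} synthesized from test failures)"
    return sentence + "."
```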
Summary
- Synthesize `ReviewFinding` objects from Alcove/bridge session transcripts (test failures, lint errors)
- Add `finding_source: Literal["review", "synthesized"] | None` field to `ReviewFinding` model
- Knowledge metrics and `self_correction_rate` exclude synthesized findings (different quality signal)
- Add `alcove-with-failures.json` for end-to-end testing

Test plan
- `uv run pytest tests/ -v -m "not slow"`: 1164 passed
- `finding_source` field correctly set on all findings
- `self_correction_rate` only counts real review findings

Refs #186
🤖 Generated with SODA + Claude Code