feat: self-reflection FP filtering, indicatif progress bars, 4 new languages#2
feat: self-reflection FP filtering, indicatif progress bars, 4 new languages#2
Conversation
After the initial LLM review generates comments, a second LLM pass evaluates each comment against the original diff, scoring it 1-10 for relevance and accuracy. Comments scoring below the threshold (default: 7) are filtered out. This directly addresses the #1 user complaint: noise. Changes: - New config fields: self_reflection (bool, default: true) and self_reflection_score_threshold (u8, default: 7) in ReviewConfig - New build_self_reflection_prompt() and parse_self_reflection_response() in prompt.rs with comprehensive tests - Self-reflection step integrated in pipeline between deduplication and final filtering - CLI flag: --no-self-reflection to skip the second pass - Stats: comments_reflected_out tracked and displayed in all output formats (text, markdown, JSON, verbose) - Graceful fallback: if self-reflection LLM call fails, all comments are kept (no data loss) - Severity can be revised by self-reflection (e.g., downgrade bug to suggestion if the LLM judges it less severe) Inspired by PR-Agent's score_mechanism and BugBot's 70%+ fix rate achieved through aggressive false positive filtering. Entire-Checkpoint: b934bccc5ebd
Replace raw eprintln! progress output with proper indicatif progress bars and spinners for a polished CLI experience. Changes: - Add indicatif 0.17 dependency to workspace and argus-review - Multi-group reviews show a progress bar with group count + elapsed time - Each group gets a spinner showing its label while the LLM processes - Single-group reviews show a spinner with elapsed time - Self-reflection and summary generation get their own spinners - All spinners finish with a result message (e.g., '→ 3 comments') - Non-TTY mode: no output (clean for CI/piped usage) - Helper function make_spinner() centralizes TTY-aware spinner creation Entire-Checkpoint: b934bccc5ebd
Extend language support from 5 languages (Rust, Python, TypeScript, JavaScript, Go) to 9 languages by adding Java, C, C++, and Ruby. Changes: - Add tree-sitter-java, tree-sitter-c, tree-sitter-cpp, tree-sitter-ruby dependencies to workspace and argus-repomap - Language enum: new Java, C, Cpp, Ruby variants - Extension mapping: .java, .c, .h, .cpp, .cc, .cxx, .hpp, .hxx, .hh, .rb - Symbol extraction (parser.rs): new collectors for each language - Java: classes, interfaces, enums, methods, constructors - C: functions, structs, enums (via function_declarator for names) - C++: classes, structs, enums, methods, functions - Ruby: classes, modules, methods - Code chunking (chunker.rs): matching chunk collectors for codelens semantic search on the new languages - 4 new parser tests + updated walker test covering all 9 languages - Total test count: 362 (up from 275) Entire-Checkpoint: b934bccc5ebd
📝 WalkthroughWalkthroughThis PR implements three major enhancements to the Argus code review system: self-reflection (a second LLM pass scoring and filtering comments), progress bars for visual feedback, and support for Java, C, C++, and Ruby languages. Changes span dependency management, language detection, symbol parsing, LLM prompting, and the review pipeline orchestration. Changes
Sequence DiagramsequenceDiagram
participant Pipeline as Review Pipeline
participant Collector as Comment Collector
participant Deduplicator as Deduplicator
participant LLM as LLM Service
participant Parser as Response Parser
participant Filter as Filter/Sort
Collector->>Pipeline: initial comments
Pipeline->>Deduplicator: deduplicated comments
Deduplicator->>Pipeline: comments (deduplicated)
Pipeline->>LLM: self_reflection_prompt(comments, diff)
LLM->>LLM: evaluate each comment
LLM->>Parser: JSON response (scores, severities)
Parser->>Parser: parse & clamp scores
Parser->>Parser: map revised_severity
Parser->>Pipeline: (index, score, severity) tuples
Pipeline->>Pipeline: filter by threshold
Pipeline->>Filter: reflected comments
Filter->>Pipeline: final sorted comments
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/argus-repomap/src/parser.rs`:
- Around line 722-754: The parser currently treats "struct_specifier" like
"enum_specifier" and doesn't set inside_class when recursing, so methods
declared inside C++ structs are misclassified; update the collect_cpp_symbols
handling for "struct_specifier" to mirror "class_specifier": after pushing the
Struct Symbol (use the same name/sig logic) recurse into the struct body by
calling collect_cpp_symbols with inside_class: true and return early (like
class_specifier does) to prevent the generic fall-through recursion from using
the parent's inside_class value.
🧹 Nitpick comments (5)
crates/argus-codelens/src/chunker.rs (2)
532-602: Java interface/enum declarations don't recurse into children for method extraction.
class_declarationcorrectly recurses to extract nested methods, butinterface_declarationandenum_declarationdo not. In Java, interfaces can have default methods (Java 8+) and enums can have methods — those won't be individually chunked.This is fine for initial support but worth noting for a follow-up.
♻️ Potential improvement for interface_declaration
"interface_declaration" => { - if let Some(name) = find_child_text(&node, "identifier", source) { + let name = find_child_text(&node, "identifier", source); + if let Some(name) = &name { chunks.push(make_chunk( file_path, &node, source, - &name, + name, "interface", language, scope, )); } + let scope_name = name.as_deref(); + let mut cursor = node.walk(); + for child in node.children(&mut cursor) { + collect_java_chunks(child, source, file_path, language, scope_name, chunks); + } + return; }
604-671: C++ struct methods won't be individually chunked.
class_specifierrecurses into children (Line 644-648) to extract nested methods, butstruct_specifierdoes not. In C++, structs can contain methods just like classes. For C-only usage this is fine, but since this collector handles both C and C++, C++ struct methods will be missed.Low priority since the struct itself is still chunked as a whole.
crates/argus-review/src/prompt.rs (1)
361-390: All-or-nothing deserialization could discard valid evaluations if one entry has an out-of-range score.
SelfReflectionEval.scoreisu8, so if the LLM returns a score like-1or256, serde will fail to deserialize the entire response, discarding all evaluations. Since the pipeline retains all comments when self-reflection fails (graceful fallback), this is safe but could reduce the effectiveness of self-reflection.Consider using a more permissive type (e.g.,
i32orserde_json::Value) for thescorefield and clamping/validating per-entry, so one malformed evaluation doesn't lose all the others.♻️ More resilient score parsing
#[derive(Deserialize)] struct SelfReflectionEval { index: usize, - score: u8, + score: serde_json::Value, #[allow(dead_code)] reason: Option<String>, revised_severity: Option<String>, }Then in
parse_self_reflection_response:for eval in parsed.evaluations { - let score = eval.score.min(10); + let Some(score_raw) = eval.score.as_u64().or_else(|| eval.score.as_i64().map(|v| v.max(0) as u64)) else { + continue; // skip this entry + }; + let score = (score_raw.min(10)) as u8;crates/argus-review/src/pipeline.rs (1)
268-285:MultiProgressallocated unconditionally even in non-TTY environments.
MultiProgress::new()is created on line 268 before the TTY check. While harmless (no bars are added in non-TTY), moving it inside theis_ttybranch would be slightly cleaner. Low priority.♻️ Optional: gate MultiProgress on TTY
- let mp = MultiProgress::new(); - - let main_pb = if is_tty { + let mp = if is_tty { + Some(MultiProgress::new()) + } else { + None + }; + + let main_pb = if let Some(mp) = &mp { let file_count: usize = groups.iter().map(|g| g.len()).sum(); let pb = mp.add(ProgressBar::new(group_count as u64));(This ripples into
group_pbcreation — only worthwhile if you want zero allocation in CI.)crates/argus-repomap/src/parser.rs (1)
1256-1294: C++ test doesn't cover methods inside structs.Since C++ structs can have methods, consider adding a test case with a method inside
Pointto catch theinside_classclassification issue noted above.💡 Suggested additional test coverage
struct Point { - int x, y; + int x, y; + double distance() { return 0.0; } };Then assert:
let distance = symbols.iter().find(|s| s.name == "distance").unwrap(); assert_eq!(distance.kind, SymbolKind::Method);
| "struct_specifier" => { | ||
| if let Some(name) = find_child_text(&node, "type_identifier", source) { | ||
| let sig = extract_signature(&node, source); | ||
| symbols.push(Symbol { | ||
| name, | ||
| kind: SymbolKind::Struct, | ||
| file: file.clone(), | ||
| line: node.start_position().row as u32 + 1, | ||
| token_cost: sig.len() / 4, | ||
| signature: sig, | ||
| }); | ||
| } | ||
| } | ||
| "enum_specifier" => { | ||
| if let Some(name) = find_child_text(&node, "type_identifier", source) { | ||
| let sig = extract_signature(&node, source); | ||
| symbols.push(Symbol { | ||
| name, | ||
| kind: SymbolKind::Enum, | ||
| file: file.clone(), | ||
| line: node.start_position().row as u32 + 1, | ||
| token_cost: sig.len() / 4, | ||
| signature: sig, | ||
| }); | ||
| } | ||
| } | ||
| _ => {} | ||
| } | ||
|
|
||
| let mut cursor = node.walk(); | ||
| for child in node.children(&mut cursor) { | ||
| collect_cpp_symbols(child, source, file, inside_class, symbols); | ||
| } |
There was a problem hiding this comment.
C++ struct_specifier does not set inside_class = true for its body, misclassifying struct methods.
Unlike class_specifier (line 704) which recurses with inside_class: true and returns early, struct_specifier falls through to the general recursion at lines 751–754 with the parent's inside_class value. In C++, structs can contain methods identical to class methods, so methods defined inside a struct will be incorrectly classified as SymbolKind::Function.
🐛 Proposed fix: treat struct body the same as class body
"struct_specifier" => {
if let Some(name) = find_child_text(&node, "type_identifier", source) {
let sig = extract_signature(&node, source);
symbols.push(Symbol {
name,
kind: SymbolKind::Struct,
file: file.clone(),
line: node.start_position().row as u32 + 1,
token_cost: sig.len() / 4,
signature: sig,
});
}
+ let mut cursor = node.walk();
+ for child in node.children(&mut cursor) {
+ collect_cpp_symbols(child, source, file, true, symbols);
+ }
+ return;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "struct_specifier" => { | |
| if let Some(name) = find_child_text(&node, "type_identifier", source) { | |
| let sig = extract_signature(&node, source); | |
| symbols.push(Symbol { | |
| name, | |
| kind: SymbolKind::Struct, | |
| file: file.clone(), | |
| line: node.start_position().row as u32 + 1, | |
| token_cost: sig.len() / 4, | |
| signature: sig, | |
| }); | |
| } | |
| } | |
| "enum_specifier" => { | |
| if let Some(name) = find_child_text(&node, "type_identifier", source) { | |
| let sig = extract_signature(&node, source); | |
| symbols.push(Symbol { | |
| name, | |
| kind: SymbolKind::Enum, | |
| file: file.clone(), | |
| line: node.start_position().row as u32 + 1, | |
| token_cost: sig.len() / 4, | |
| signature: sig, | |
| }); | |
| } | |
| } | |
| _ => {} | |
| } | |
| let mut cursor = node.walk(); | |
| for child in node.children(&mut cursor) { | |
| collect_cpp_symbols(child, source, file, inside_class, symbols); | |
| } | |
| "struct_specifier" => { | |
| if let Some(name) = find_child_text(&node, "type_identifier", source) { | |
| let sig = extract_signature(&node, source); | |
| symbols.push(Symbol { | |
| name, | |
| kind: SymbolKind::Struct, | |
| file: file.clone(), | |
| line: node.start_position().row as u32 + 1, | |
| token_cost: sig.len() / 4, | |
| signature: sig, | |
| }); | |
| } | |
| let mut cursor = node.walk(); | |
| for child in node.children(&mut cursor) { | |
| collect_cpp_symbols(child, source, file, true, symbols); | |
| } | |
| return; | |
| } | |
| "enum_specifier" => { | |
| if let Some(name) = find_child_text(&node, "type_identifier", source) { | |
| let sig = extract_signature(&node, source); | |
| symbols.push(Symbol { | |
| name, | |
| kind: SymbolKind::Enum, | |
| file: file.clone(), | |
| line: node.start_position().row as u32 + 1, | |
| token_cost: sig.len() / 4, | |
| signature: sig, | |
| }); | |
| } | |
| } | |
| _ => {} | |
| } | |
| let mut cursor = node.walk(); | |
| for child in node.children(&mut cursor) { | |
| collect_cpp_symbols(child, source, file, inside_class, symbols); | |
| } |
🤖 Prompt for AI Agents
In `@crates/argus-repomap/src/parser.rs` around lines 722 - 754, The parser
currently treats "struct_specifier" like "enum_specifier" and doesn't set
inside_class when recursing, so methods declared inside C++ structs are
misclassified; update the collect_cpp_symbols handling for "struct_specifier" to
mirror "class_specifier": after pushing the Struct Symbol (use the same name/sig
logic) recurse into the struct body by calling collect_cpp_symbols with
inside_class: true and return early (like class_specifier does) to prevent the
generic fall-through recursion from using the parent's inside_class value.
There was a problem hiding this comment.
Pull request overview
This PR implements three significant features for the Argus code review tool: self-reflection false positive filtering, interactive progress bars using indicatif, and tree-sitter support for 4 new programming languages (Java, C, C++, Ruby). The self-reflection feature addresses the #1 user complaint about noisy AI-generated comments by adding a second LLM evaluation pass that scores and filters low-quality suggestions. The changes expand language support from 5 to 9 languages and significantly improve user experience with professional progress indicators.
Changes:
- Self-reflection filtering system with configurable threshold (default: 7/10) that performs a second LLM pass to evaluate and filter generated review comments, with graceful error fallback
- Interactive progress bars using indicatif library with TTY-aware display, showing elapsed time for LLM operations with multi-group progress tracking
- Tree-sitter support for Java, C, C++, and Ruby including language detection, symbol extraction, and code chunking
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/main.rs | Added CLI flag --no-self-reflection, updated config template with self-reflection settings, and enhanced stats output to include reflected-out comments count |
| crates/argus-review/src/prompt.rs | Implemented self-reflection prompt builder and response parser with comprehensive unit tests for scoring and severity revision |
| crates/argus-review/src/pipeline.rs | Integrated self-reflection pass in review pipeline between deduplication and filtering, replaced eprintln with indicatif progress bars, and added comments_reflected_out stats tracking |
| crates/argus-review/src/sarif.rs | Updated SARIF output to include comments_reflected_out field in stats |
| crates/argus-review/Cargo.toml | Added indicatif dependency for progress bars |
| crates/argus-core/src/config.rs | Added self_reflection and self_reflection_score_threshold config fields with defaults (true, 7) |
| crates/argus-repomap/src/walker.rs | Extended Language enum with Java, C, Cpp, Ruby variants and added extension mappings |
| crates/argus-repomap/src/parser.rs | Implemented symbol collection functions for Java, C, C++, Ruby with comprehensive tests for each language |
| crates/argus-codelens/src/chunker.rs | Implemented code chunking functions for Java, C, C++, Ruby to extract functions, classes, and methods |
| crates/argus-repomap/Cargo.toml | Added tree-sitter dependencies for Java, C, C++, Ruby |
| Cargo.toml | Added workspace dependencies for tree-sitter-java, tree-sitter-c, tree-sitter-cpp, tree-sitter-ruby, and indicatif |
| Cargo.lock | Updated with new dependencies including indicatif, console, portable-atomic, number_prefix, and new tree-sitter language grammars |
| PLAN.md | Added sprint planning document describing the three features and implementation approach |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| [review.noise] | ||
| # skip_patterns = ["*.lock", "*.min.js", "vendor/**"] | ||
| # min_confidence = 90 | ||
| # include_suggestions = false | ||
| # self_reflection = true | ||
| # self_reflection_score_threshold = 7 |
There was a problem hiding this comment.
The config template incorrectly shows self_reflection settings under a [review.noise] section that doesn't exist in the actual config structure. According to the ReviewConfig struct in argus-core/src/config.rs, all fields including self_reflection, self_reflection_score_threshold, skip_patterns, min_confidence, and include_suggestions should be directly under [review], not under a nested [review.noise] section. This will cause user confusion when they try to configure these settings.
…nguages (#2) * feat(review): add self-reflection pass to filter false positives After the initial LLM review generates comments, a second LLM pass evaluates each comment against the original diff, scoring it 1-10 for relevance and accuracy. Comments scoring below the threshold (default: 7) are filtered out. This directly addresses the #1 user complaint: noise. Changes: - New config fields: self_reflection (bool, default: true) and self_reflection_score_threshold (u8, default: 7) in ReviewConfig - New build_self_reflection_prompt() and parse_self_reflection_response() in prompt.rs with comprehensive tests - Self-reflection step integrated in pipeline between deduplication and final filtering - CLI flag: --no-self-reflection to skip the second pass - Stats: comments_reflected_out tracked and displayed in all output formats (text, markdown, JSON, verbose) - Graceful fallback: if self-reflection LLM call fails, all comments are kept (no data loss) - Severity can be revised by self-reflection (e.g., downgrade bug to suggestion if the LLM judges it less severe) Inspired by PR-Agent's score_mechanism and BugBot's 70%+ fix rate achieved through aggressive false positive filtering. Entire-Checkpoint: b934bccc5ebd * feat(review): replace eprintln progress with indicatif spinners Replace raw eprintln! progress output with proper indicatif progress bars and spinners for a polished CLI experience. Changes: - Add indicatif 0.17 dependency to workspace and argus-review - Multi-group reviews show a progress bar with group count + elapsed time - Each group gets a spinner showing its label while the LLM processes - Single-group reviews show a spinner with elapsed time - Self-reflection and summary generation get their own spinners - All spinners finish with a result message (e.g., '→ 3 comments') - Non-TTY mode: no output (clean for CI/piped usage) - Helper function make_spinner() centralizes TTY-aware spinner creation Entire-Checkpoint: b934bccc5ebd * feat(repomap): add tree-sitter support for Java, C, C++, and Ruby Extend language support from 5 languages (Rust, Python, TypeScript, JavaScript, Go) to 9 languages by adding Java, C, C++, and Ruby. Changes: - Add tree-sitter-java, tree-sitter-c, tree-sitter-cpp, tree-sitter-ruby dependencies to workspace and argus-repomap - Language enum: new Java, C, Cpp, Ruby variants - Extension mapping: .java, .c, .h, .cpp, .cc, .cxx, .hpp, .hxx, .hh, .rb - Symbol extraction (parser.rs): new collectors for each language - Java: classes, interfaces, enums, methods, constructors - C: functions, structs, enums (via function_declarator for names) - C++: classes, structs, enums, methods, functions - Ruby: classes, modules, methods - Code chunking (chunker.rs): matching chunk collectors for codelens semantic search on the new languages - 4 new parser tests + updated walker test covering all 9 languages - Total test count: 362 (up from 275) Entire-Checkpoint: b934bccc5ebd --------- Co-authored-by: Meru Patel <meru@meru.page>
Sprint 2026-02-16 — Three Features
1. Self-Reflection / False Positive Filtering (
875a826)After the initial LLM review, a second LLM pass evaluates each comment (scoring 1-10), filtering out low-quality ones. This directly addresses the #1 user complaint: noise.
self_reflection = true(default),self_reflection_score_threshold = 7--no-self-reflectionto skip2. Progress Bars with indicatif (
60f2bd3)Replace raw
eprintln!with proper spinners and progress bars.3. Tree-sitter: Java, C, C++, Ruby (
f2cb07a)Language support: 5 → 9 languages.
Tests
362 tests passing (up from ~275). All new features have comprehensive unit tests.
Summary by CodeRabbit
Release Notes
New Features
--no-self-reflectionCLI flag to disable filteringConfiguration
self_reflection(enabled by default) andself_reflection_score_threshold(default: 7)