Add JSON output format, structured blocks, and progress bar control#38
Conversation
- Add show_progress field to AnalysisConfig (defaults to True) - Pass disable=not show_progress to all tqdm calls in embedding backends and scorer - Add --quiet / -q CLI flag that sets show_progress=False - Useful for CI pipelines and library usage where progress bars add noise to output Co-authored-by: Cursor <cursoragent@cursor.com>
|
Warning Review limit reached
Your plan currently allows 2 reviews/hour. Refill in 19 minutes and 1 second. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
📝 WalkthroughWalkthroughThis PR adds output-format selection (JSON or XML) and a --quiet toggle for progress bars, introduces JsonFormatter, wires new fields into AnalysisConfig, updates SemanticLogAnalyzer to choose the formatter and populate AnalysisResult.blocks, and applies progress control across embedders and scorer. ChangesOutput Configuration and Result Structure
CLI Argument Integration
Progress Bar Visibility Control
JSON Output Formatting
Pipeline Integration and Unit Tests
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
b953b3e to
158354d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_cli.py (1)
75-86: ⚡ Quick winAdd parser coverage for
--quietbehavior.This file now covers
--format, but not the new--quiet/-qflag. Add tests for both default (False) and enabled (True) so CLI/config wiring regressions are caught early.Proposed patch
class TestParseArgs: @@ def test_format_default_xml(self, monkeypatch: pytest.MonkeyPatch) -> None: """Test that --format defaults to xml.""" monkeypatch.setattr(sys, "argv", ["cordon", "test.log"]) args = parse_args() assert args.output_format == "xml" + + def test_quiet_flag(self, monkeypatch: pytest.MonkeyPatch) -> None: + """Test --quiet / -q flag.""" + monkeypatch.setattr(sys, "argv", ["cordon", "--quiet", "test.log"]) + args = parse_args() + assert args.quiet is True + + def test_quiet_default_false(self, monkeypatch: pytest.MonkeyPatch) -> None: + """Test that --quiet defaults to False.""" + monkeypatch.setattr(sys, "argv", ["cordon", "test.log"]) + args = parse_args() + assert args.quiet is False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cli.py` around lines 75 - 86, Add two tests mirroring the existing format tests to cover the new --quiet/-q flag: use monkeypatch.setattr(sys, "argv", [...]) to simulate CLI invocation, call parse_args(), and assert args.quiet is False for the default case (e.g., ["cordon", "test.log"]) and True for the enabled cases (e.g., ["cordon", "--quiet", "test.log"] and optionally ["cordon", "-q", "test.log"]); reference the existing test functions test_format_json and test_format_default_xml and the parse_args() function to locate where to add these new tests.src/cordon/core/config.py (1)
93-94: ⚡ Quick winValidate new config fields in
__post_init__to fail fast.
output_formatandshow_progressare now public config inputs, but they are not validated. Add explicit checks so invalid values are rejected at construction time.Proposed patch
def _validate_core_params(self) -> None: """Validate core analysis parameters.""" @@ if self.scoring_batch_size is not None and self.scoring_batch_size < 1: raise ValueError("scoring_batch_size must be >= 1 or None for auto-detect") if self.device is not None and self.device not in ("cuda", "mps", "cpu"): raise ValueError("device must be 'cuda', 'mps', 'cpu', or None") + if self.output_format not in ("xml", "json"): + raise ValueError("output_format must be 'xml' or 'json'") + if not isinstance(self.show_progress, bool): + raise ValueError("show_progress must be a boolean")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cordon/core/config.py` around lines 93 - 94, Add validation inside the dataclass __post_init__ to fail fast: verify show_progress is a bool and output_format is one of the allowed literals ("xml","json"); if either check fails, raise a ValueError with a clear message. Update the __post_init__ method that runs after construction to perform these checks for the fields show_progress and output_format so invalid config values are rejected at instantiation time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cordon/core/types.py`:
- Around line 92-94: The docstring for AnalysisResult.output currently describes
only XML; update it to be format-agnostic by changing the description to
something like "Formatted output string (e.g., XML or JSON) representing the
analysis results" so it no longer implies XML-only; modify the docstring on the
AnalysisResult.output attribute in src/cordon/core/types.py and ensure
neighboring descriptions (blocks, total_lines) remain unchanged.
---
Nitpick comments:
In `@src/cordon/core/config.py`:
- Around line 93-94: Add validation inside the dataclass __post_init__ to fail
fast: verify show_progress is a bool and output_format is one of the allowed
literals ("xml","json"); if either check fails, raise a ValueError with a clear
message. Update the __post_init__ method that runs after construction to perform
these checks for the fields show_progress and output_format so invalid config
values are rejected at instantiation time.
In `@tests/test_cli.py`:
- Around line 75-86: Add two tests mirroring the existing format tests to cover
the new --quiet/-q flag: use monkeypatch.setattr(sys, "argv", [...]) to simulate
CLI invocation, call parse_args(), and assert args.quiet is False for the
default case (e.g., ["cordon", "test.log"]) and True for the enabled cases
(e.g., ["cordon", "--quiet", "test.log"] and optionally ["cordon", "-q",
"test.log"]); reference the existing test functions test_format_json and
test_format_default_xml and the parse_args() function to locate where to add
these new tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9419a413-1fb7-4df3-a831-717c2ec662a9
📒 Files selected for processing (16)
benchmark/evaluate.pysrc/cordon/analysis/scorer.pysrc/cordon/cli.pysrc/cordon/core/config.pysrc/cordon/core/types.pysrc/cordon/embedding/llama_cpp.pysrc/cordon/embedding/remote.pysrc/cordon/embedding/transformer.pysrc/cordon/pipeline.pysrc/cordon/postprocess/__init__.pysrc/cordon/postprocess/json_formatter.pytests/test_cli.pytests/test_core.pytests/test_integration.pytests/test_json_formatter.pytests/test_pipeline_unit.py
| output: Formatted output string with XML tags. | ||
| blocks: Structured list of merged anomaly blocks. | ||
| total_lines: Total number of lines in the input file. |
There was a problem hiding this comment.
Update AnalysisResult.output docstring to reflect JSON support.
output is documented as XML-only, but this PR introduces JSON output too. Please make the attribute description format-agnostic.
Proposed patch
- output: Formatted output string with XML tags.
+ output: Formatted output string (XML or JSON, depending on configuration).
blocks: Structured list of merged anomaly blocks.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| output: Formatted output string with XML tags. | |
| blocks: Structured list of merged anomaly blocks. | |
| total_lines: Total number of lines in the input file. | |
| output: Formatted output string (XML or JSON, depending on configuration). | |
| blocks: Structured list of merged anomaly blocks. | |
| total_lines: Total number of lines in the input file. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cordon/core/types.py` around lines 92 - 94, The docstring for
AnalysisResult.output currently describes only XML; update it to be
format-agnostic by changing the description to something like "Formatted output
string (e.g., XML or JSON) representing the analysis results" so it no longer
implies XML-only; modify the docstring on the AnalysisResult.output attribute in
src/cordon/core/types.py and ensure neighboring descriptions (blocks,
total_lines) remain unchanged.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/cordon/core/config.py (1)
94-119:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd runtime validation for
output_formatinAnalysisConfig.Line 94 introduces constrained values, but invalid strings can still be passed at runtime (type hints alone won’t enforce this). Please fail fast in
_validate_core_paramsto keep behavior consistent with other validated enum-like fields.Proposed patch
def _validate_core_params(self) -> None: """Validate core analysis parameters.""" @@ if self.device is not None and self.device not in ("cuda", "mps", "cpu"): raise ValueError("device must be 'cuda', 'mps', 'cpu', or None") + if self.output_format not in ("xml", "json"): + raise ValueError("output_format must be 'xml' or 'json'")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cordon/core/config.py` around lines 94 - 119, The output_format field on AnalysisConfig isn't being validated at runtime; update the _validate_core_params method to check that self.output_format is either "xml" or "json" and raise a ValueError with a clear message if not. Locate the validation logic inside AnalysisConfig::_validate_core_params and add a check similar to the other field validations (e.g., if self.output_format not in ("xml", "json"): raise ValueError(...)) so invalid strings fail fast.src/cordon/pipeline.py (1)
79-79:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate docstring to reflect JSON output support.
The docstring states "Formatted string with XML-tagged significant blocks" but the method can now return JSON-formatted output when
output_format="json". Consider updating to "Formatted string with significant blocks" or "Formatted string with significant blocks (XML or JSON format)".📝 Suggested docstring update
- Formatted string with XML-tagged significant blocks. + Formatted string with significant blocks (XML or JSON format).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cordon/pipeline.py` at line 79, Update the docstring that currently reads "Formatted string with XML-tagged significant blocks." to reflect JSON support; mention the output_format parameter (e.g., "Formatted string with significant blocks (XML or JSON format)" or "Formatted string with significant blocks") in the docstring for the function/method that returns formatted output and accepts output_format, so callers see that JSON output is supported.
🧹 Nitpick comments (1)
tests/test_cli.py (1)
75-86: ⚡ Quick winAdd a parsing test for
--quietto cover progress-toggle CLI behavior.This file now covers
--format; adding one--quietassertion would round out coverage for the paired CLI feature in this PR.Proposed patch
class TestParseArgs: @@ def test_format_default_xml(self, monkeypatch: pytest.MonkeyPatch) -> None: """Test that --format defaults to xml.""" monkeypatch.setattr(sys, "argv", ["cordon", "test.log"]) args = parse_args() assert args.output_format == "xml" + + def test_quiet_flag(self, monkeypatch: pytest.MonkeyPatch) -> None: + """Test --quiet flag.""" + monkeypatch.setattr(sys, "argv", ["cordon", "--quiet", "test.log"]) + args = parse_args() + assert args.quiet is True🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cli.py` around lines 75 - 86, Add a new unit test in tests/test_cli.py that verifies the --quiet flag is parsed: create a test function (e.g. test_quiet_flag) that monkeypatches sys.argv to ["cordon", "--quiet", "test.log"], calls parse_args(), and asserts the resulting namespace has args.quiet == True (or the appropriate attribute indicating quiet/progress off). Place it alongside the existing test_format_* tests and reuse monkeypatch and parse_args to keep style consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/cordon/core/config.py`:
- Around line 94-119: The output_format field on AnalysisConfig isn't being
validated at runtime; update the _validate_core_params method to check that
self.output_format is either "xml" or "json" and raise a ValueError with a clear
message if not. Locate the validation logic inside
AnalysisConfig::_validate_core_params and add a check similar to the other field
validations (e.g., if self.output_format not in ("xml", "json"): raise
ValueError(...)) so invalid strings fail fast.
In `@src/cordon/pipeline.py`:
- Line 79: Update the docstring that currently reads "Formatted string with
XML-tagged significant blocks." to reflect JSON support; mention the
output_format parameter (e.g., "Formatted string with significant blocks (XML or
JSON format)" or "Formatted string with significant blocks") in the docstring
for the function/method that returns formatted output and accepts output_format,
so callers see that JSON output is supported.
---
Nitpick comments:
In `@tests/test_cli.py`:
- Around line 75-86: Add a new unit test in tests/test_cli.py that verifies the
--quiet flag is parsed: create a test function (e.g. test_quiet_flag) that
monkeypatches sys.argv to ["cordon", "--quiet", "test.log"], calls parse_args(),
and asserts the resulting namespace has args.quiet == True (or the appropriate
attribute indicating quiet/progress off). Place it alongside the existing
test_format_* tests and reuse monkeypatch and parse_args to keep style
consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 52cc56fc-bf49-46bc-affd-83fc6313c994
📒 Files selected for processing (11)
src/cordon/cli.pysrc/cordon/core/config.pysrc/cordon/core/types.pysrc/cordon/pipeline.pysrc/cordon/postprocess/__init__.pysrc/cordon/postprocess/json_formatter.pytests/test_cli.pytests/test_core.pytests/test_integration.pytests/test_json_formatter.pytests/test_pipeline_unit.py
- Add blocks field to AnalysisResult for direct structured access to anomaly data without parsing formatted output - Create JsonFormatter producing JSON with anomalies array - Add output_format config field and --format CLI flag - Pipeline selects formatter based on config when none injected - Add JSON formatter tests and update existing tests Co-authored-by: Cursor <cursoragent@cursor.com>
158354d to
958f2db
Compare
|



Summary
show_progressconfig option (bool, defaultTrue) to control tqdm progress bars across all embedding backends and scorer--quiet/-qCLI flag that setsshow_progress=Falseoutput_formatconfig option (Literal["xml", "json"], default"xml")--format {xml,json}CLI flagJsonFormatterproducing JSON with ananomaliesarray (start_line, end_line, score, content)blocks: list[MergedBlock]field toAnalysisResultfor structured access to anomaly data without parsing formatted outputoutput_formatwhen none is injectedSummary by CodeRabbit
New Features
--formatCLI option (xml|json) and--quietto suppress progress display--formatBehavior
--quietand are now consistently respected across embedding and scoringTests