Add JSON output format, structured blocks, and progress bar control by calebevans · Pull Request #38 · calebevans/cordon

calebevans · 2026-05-23T18:22:21Z

Summary

Add show_progress config option (bool, default True) to control tqdm progress bars across all embedding backends and scorer
Add --quiet / -q CLI flag that sets show_progress=False
Add output_format config option (Literal["xml", "json"], default "xml")
Add --format {xml,json} CLI flag
Create JsonFormatter producing JSON with an anomalies array (start_line, end_line, score, content)
Add blocks: list[MergedBlock] field to AnalysisResult for structured access to anomaly data without parsing formatted output
Pipeline selects formatter automatically based on output_format when none is injected

Summary by CodeRabbit

New Features
- Added --format CLI option (xml|json) and --quiet to suppress progress display
- New JSON output formatter and automatic formatter selection based on --format
- Analysis results now include a materialized list of anomaly blocks in detailed output
Behavior
- Progress bars can be disabled via configuration/--quiet and are now consistently respected across embedding and scoring
Tests
- Added tests for CLI format, JSON formatter, and blocks presence in results
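The summary describes the JSON shape: an `anomalies` array whose entries carry `start_line`, `end_line`, `score`, and `content`. For CI consumers, that output can be read with the standard library alone. The sketch below assumes that shape. The `high_scoring_anomalies` helper and its threshold are hypothetical, and the example payload is illustrative rather than real cordon output.

```python
import json


def high_scoring_anomalies(raw: str, min_score: float) -> list[tuple[int, int]]:
    """Return (start_line, end_line) ranges for anomalies scoring >= min_score."""
    payload = json.loads(raw)
    return [
        (item["start_line"], item["end_line"])
        for item in payload["anomalies"]
        if item["score"] >= min_score
    ]


# Illustrative payload in the documented shape (not real cordon output).
example = json.dumps(
    {
        "anomalies": [
            {"start_line": 10, "end_line": 14, "score": 0.82, "content": "ERROR ..."},
            {"start_line": 40, "end_line": 41, "score": 0.31, "content": "WARN ..."},
        ]
    }
)
print(high_scoring_anomalies(example, 0.5))  # [(10, 14)]
```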

- Add show_progress field to AnalysisConfig (defaults to True)
- Pass disable=not show_progress to all tqdm calls in embedding backends and scorer
- Add --quiet / -q CLI flag that sets show_progress=False
- Useful for CI pipelines and library usage where progress bars add noise to output

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-05-23T18:22:32Z

Warning

Review limit reached

@calebevans, we couldn't start this review because you've used your available PR reviews for now.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 40731634-22ea-42c8-99b3-779a6730bcaf

📥 Commits

Reviewing files that changed from the base of the PR and between 158354d and 958f2db.

📒 Files selected for processing (11)

src/cordon/cli.py
src/cordon/core/config.py
src/cordon/core/types.py
src/cordon/pipeline.py
src/cordon/postprocess/__init__.py
src/cordon/postprocess/json_formatter.py
tests/test_cli.py
tests/test_core.py
tests/test_integration.py
tests/test_json_formatter.py
tests/test_pipeline_unit.py

📝 Walkthrough

Walkthrough

This PR adds output-format selection (JSON or XML) and a --quiet toggle for progress bars, introduces JsonFormatter, wires new fields into AnalysisConfig, updates SemanticLogAnalyzer to choose the formatter and populate AnalysisResult.blocks, and applies progress control across embedders and scorer.

Changes

Output Configuration and Result Structure

Layer / File(s)	Summary
Configuration and Result `src/cordon/core/config.py`, `src/cordon/core/types.py`	`AnalysisConfig` gains `show_progress: bool = True` and `output_format: Literal["xml", "json"] = "xml"`; `AnalysisResult` gains `blocks: list[MergedBlock]`.
Configuration and Result Tests `tests/test_core.py`	Tests added to verify `output_format` defaults to `"xml"` and accepts `"json"`; `TestAnalysisResult` validates `blocks` storage and empty list handling.

CLI Argument Integration

Layer / File(s)	Summary
CLI Arguments and Configuration Wiring `src/cordon/cli.py`, `tests/test_cli.py`	CLI parser adds `--format` (xml/json, default xml) and `--quiet`/`-q`; CLI wiring passes `show_progress=not args.quiet` and `output_format=args.output_format` into `AnalysisConfig`; tests verify parsing.
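The following argparse sketch shows the two flags. It makes these assumptions:

- `--format` uses `dest="output_format"`, because the tests read `args.output_format`.
- `--quiet`/`-q` uses `store_true`, because the tests expect `args.quiet` to default to `False`.

The positional `logfile` name and the `build_parser` helper are hypothetical. Cordon's real `parse_args()` defines more options than these.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cordon")
    parser.add_argument("logfile")  # hypothetical positional name
    parser.add_argument(
        "--format",
        dest="output_format",
        choices=["xml", "json"],
        default="xml",
        help="Output format for detected anomalies",
    )
    parser.add_argument(
        "--quiet",
        "-q",
        action="store_true",
        help="Suppress progress bars",
    )
    return parser


args = build_parser().parse_args(["--format", "json", "-q", "test.log"])
show_progress = not args.quiet  # value that would be passed as AnalysisConfig(show_progress=...)
print(args.output_format, show_progress)  # json False
```

With `choices`, argparse itself rejects an unsupported `--format` value at the CLI. A config constructed directly from library code still needs its own check.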

Progress Bar Visibility Control

Layer / File(s)	Summary
Embedder Progress Bar Control `src/cordon/embedding/llama_cpp.py`, `src/cordon/embedding/remote.py`, `src/cordon/embedding/transformer.py`	Each embedder’s batch tqdm call now uses `disable=not self.config.show_progress`.
Scorer and Benchmark Progress Control `src/cordon/analysis/scorer.py`, `benchmark/evaluate.py`	`DensityAnomalyScorer` scoring loop uses `disable=not config.show_progress`; benchmark `run_cordon_analysis(return_intermediates=True)` now returns embeddings built from embedder outputs (`embedded`) instead of scored windows.

JSON Output Formatting

Layer / File(s)	Summary
JsonFormatter Implementation and Export `src/cordon/postprocess/json_formatter.py`, `src/cordon/postprocess/__init__.py`	Adds `JsonFormatter.format_blocks()` that serializes merged blocks to indented JSON `{"anomalies": [...]}` sorted by `start_line`, includes rounded `score` and concatenated multiline `content`; package exports updated.
Pipeline Formatter Selection and Blocks Population `src/cordon/pipeline.py`	`SemanticLogAnalyzer` selects `JsonFormatter` when `config.output_format == "json"`, otherwise `XmlFormatter`; `analyze_file_detailed` materializes merged blocks into `AnalysisResult.blocks` as a list.
JSON Formatter Tests `tests/test_json_formatter.py`	Tests verify single/multiple/empty blocks, score rounding, content inclusion, special-character preservation (no XML escaping), and that output parses as valid JSON.
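A sketch of the serialization the walkthrough describes:

- blocks are sorted by `start_line`;
- they are emitted as indented JSON under an `anomalies` key;
- each entry has a rounded `score`;
- each block's lines are joined into a single `content` string.

The `Block` dataclass, its `lines` field, the 4-decimal rounding, and `indent=2` are all assumptions. Cordon's `MergedBlock` and `JsonFormatter` may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for cordon's MergedBlock."""

    start_line: int
    end_line: int
    score: float
    lines: list[str]


def format_blocks(blocks: list[Block]) -> str:
    anomalies = [
        {
            "start_line": b.start_line,
            "end_line": b.end_line,
            "score": round(b.score, 4),  # rounding precision is an assumption
            "content": "\n".join(b.lines),
        }
        for b in sorted(blocks, key=lambda b: b.start_line)
    ]
    # json.dumps applies JSON escaping only; <, > and & pass through unchanged (no XML escaping).
    return json.dumps({"anomalies": anomalies}, indent=2)
```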

Pipeline Integration and Unit Tests

Layer / File(s)	Summary
Pipeline Analysis Tests `tests/test_integration.py`, `tests/test_pipeline_unit.py`	Integration and unit tests assert `result.blocks` is a `list`, that `result.blocks` is populated from merged blocks, that `JsonFormatter` is selected when `output_format == "json"`, and that custom formatter overrides work.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

calebevans/cordon#35: Overlaps on benchmark/evaluate.py change to how run_cordon_analysis(return_intermediates=True) builds returned embeddings from embedder output.
calebevans/cordon#34: Modifies DensityAnomalyScorer._score_windows, related to scorer/progress changes.
calebevans/cordon#37: Related refactor of SemanticLogAnalyzer and formatter initialization that intersects with formatter selection changes.

Poem

🐰 I munched some carrots, then wrote some code so fine,
Formats now JSON or XML, progress bars decline,
Configs snug in dataclasses, formatters take their place,
Blocks listed, tests passing — a tidy little space! 🥕✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the three main changes: JSON output format support, structured blocks field, and progress bar control—matching the PR's primary objectives.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

tests/test_cli.py (1)

75-86: ⚡ Quick win

Add parser coverage for --quiet behavior.

This file now covers --format, but not the new --quiet/-q flag. Add tests for both default (False) and enabled (True) so CLI/config wiring regressions are caught early.

Proposed patch

 class TestParseArgs:
@@
     def test_format_default_xml(self, monkeypatch: pytest.MonkeyPatch) -> None:
         """Test that --format defaults to xml."""
         monkeypatch.setattr(sys, "argv", ["cordon", "test.log"])
         args = parse_args()
         assert args.output_format == "xml"
+
+    def test_quiet_flag(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        """Test --quiet / -q flag."""
+        monkeypatch.setattr(sys, "argv", ["cordon", "--quiet", "test.log"])
+        args = parse_args()
+        assert args.quiet is True
+
+    def test_quiet_default_false(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        """Test that --quiet defaults to False."""
+        monkeypatch.setattr(sys, "argv", ["cordon", "test.log"])
+        args = parse_args()
+        assert args.quiet is False

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cli.py` around lines 75 - 86, Add two tests mirroring the existing
format tests to cover the new --quiet/-q flag: use monkeypatch.setattr(sys,
"argv", [...]) to simulate CLI invocation, call parse_args(), and assert
args.quiet is False for the default case (e.g., ["cordon", "test.log"]) and True
for the enabled cases (e.g., ["cordon", "--quiet", "test.log"] and optionally
["cordon", "-q", "test.log"]); reference the existing test functions
test_format_json and test_format_default_xml and the parse_args() function to
locate where to add these new tests.

src/cordon/core/config.py (1)

93-94: ⚡ Quick win

Validate new config fields in __post_init__ to fail fast.

output_format and show_progress are now public config inputs, but they are not validated. Add explicit checks so invalid values are rejected at construction time.

Proposed patch

 def _validate_core_params(self) -> None:
     """Validate core analysis parameters."""
@@
     if self.scoring_batch_size is not None and self.scoring_batch_size < 1:
         raise ValueError("scoring_batch_size must be >= 1 or None for auto-detect")
     if self.device is not None and self.device not in ("cuda", "mps", "cpu"):
         raise ValueError("device must be 'cuda', 'mps', 'cpu', or None")
+    if self.output_format not in ("xml", "json"):
+        raise ValueError("output_format must be 'xml' or 'json'")
+    if not isinstance(self.show_progress, bool):
+        raise ValueError("show_progress must be a boolean")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cordon/core/config.py` around lines 93 - 94, Add validation inside the
dataclass __post_init__ to fail fast: verify show_progress is a bool and
output_format is one of the allowed literals ("xml","json"); if either check
fails, raise a ValueError with a clear message. Update the __post_init__ method
that runs after construction to perform these checks for the fields
show_progress and output_format so invalid config values are rejected at
instantiation time.
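Both reviews propose fail-fast validation. Here is a minimal sketch using a trimmed-down, hypothetical `AnalysisConfigSketch` that has only the two new fields. The real `AnalysisConfig` validates more fields in `_validate_core_params`; this stand-in runs its checks directly from `__post_init__`. Because `show_progress` is checked with `isinstance`, a truthy non-bool such as `"false"` is rejected rather than treated as `True`.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class AnalysisConfigSketch:
    """Hypothetical subset of AnalysisConfig with the two fields added in this PR."""

    show_progress: bool = True
    output_format: Literal["xml", "json"] = "xml"

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so check the value explicitly.
        if self.output_format not in ("xml", "json"):
            raise ValueError("output_format must be 'xml' or 'json'")
        # Reject truthy non-bools such as "false" instead of accepting them.
        if not isinstance(self.show_progress, bool):
            raise ValueError("show_progress must be a boolean")
```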

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9419a413-1fb7-4df3-a831-717c2ec662a9

📥 Commits

Reviewing files that changed from the base of the PR and between a818f0a and b953b3e.

📒 Files selected for processing (16)

benchmark/evaluate.py
src/cordon/analysis/scorer.py
src/cordon/cli.py
src/cordon/core/config.py
src/cordon/core/types.py
src/cordon/embedding/llama_cpp.py
src/cordon/embedding/remote.py
src/cordon/embedding/transformer.py
src/cordon/pipeline.py
src/cordon/postprocess/__init__.py
src/cordon/postprocess/json_formatter.py
tests/test_cli.py
tests/test_core.py
tests/test_integration.py
tests/test_json_formatter.py
tests/test_pipeline_unit.py

coderabbitai · 2026-05-23T18:27:25Z

        output: Formatted output string with XML tags.
+        blocks: Structured list of merged anomaly blocks.
        total_lines: Total number of lines in the input file.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update AnalysisResult.output docstring to reflect JSON support.

output is documented as XML-only, but this PR introduces JSON output too. Please make the attribute description format-agnostic.

Proposed patch

-        output: Formatted output string with XML tags.
+        output: Formatted output string (XML or JSON, depending on configuration).
         blocks: Structured list of merged anomaly blocks.


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cordon/core/types.py` around lines 92 - 94, The docstring for AnalysisResult.output currently describes only XML; update it to be format-agnostic by changing the description to something like "Formatted output string (e.g., XML or JSON) representing the analysis results" so it no longer implies XML-only; modify the docstring on the AnalysisResult.output attribute in src/cordon/core/types.py and ensure neighboring descriptions (blocks, total_lines) remain unchanged.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cordon/core/config.py (1)

94-119: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add runtime validation for output_format in AnalysisConfig.

Line 94 introduces constrained values, but invalid strings can still be passed at runtime (type hints alone won’t enforce this). Please fail fast in _validate_core_params to keep behavior consistent with other validated enum-like fields.

Proposed patch

 def _validate_core_params(self) -> None:
     """Validate core analysis parameters."""
@@
     if self.device is not None and self.device not in ("cuda", "mps", "cpu"):
         raise ValueError("device must be 'cuda', 'mps', 'cpu', or None")
+    if self.output_format not in ("xml", "json"):
+        raise ValueError("output_format must be 'xml' or 'json'")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cordon/core/config.py` around lines 94 - 119, The output_format field on
AnalysisConfig isn't being validated at runtime; update the
_validate_core_params method to check that self.output_format is either "xml" or
"json" and raise a ValueError with a clear message if not. Locate the validation
logic inside AnalysisConfig::_validate_core_params and add a check similar to
the other field validations (e.g., if self.output_format not in ("xml", "json"):
raise ValueError(...)) so invalid strings fail fast.

src/cordon/pipeline.py (1)

79-79: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update docstring to reflect JSON output support.

The docstring states "Formatted string with XML-tagged significant blocks" but the method can now return JSON-formatted output when output_format="json". Consider updating to "Formatted string with significant blocks" or "Formatted string with significant blocks (XML or JSON format)".

📝 Suggested docstring update

-            Formatted string with XML-tagged significant blocks.
+            Formatted string with significant blocks (XML or JSON format).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cordon/pipeline.py` at line 79, Update the docstring that currently reads
"Formatted string with XML-tagged significant blocks." to reflect JSON support;
mention the output_format parameter (e.g., "Formatted string with significant
blocks (XML or JSON format)" or "Formatted string with significant blocks") in
the docstring for the function/method that returns formatted output and accepts
output_format, so callers see that JSON output is supported.

🧹 Nitpick comments (1)

tests/test_cli.py (1)

75-86: ⚡ Quick win

Add a parsing test for --quiet to cover progress-toggle CLI behavior.

This file now covers --format; adding one --quiet assertion would round out coverage for the paired CLI feature in this PR.

Proposed patch

 class TestParseArgs:
@@
     def test_format_default_xml(self, monkeypatch: pytest.MonkeyPatch) -> None:
         """Test that --format defaults to xml."""
         monkeypatch.setattr(sys, "argv", ["cordon", "test.log"])
         args = parse_args()
         assert args.output_format == "xml"
+
+    def test_quiet_flag(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        """Test --quiet flag."""
+        monkeypatch.setattr(sys, "argv", ["cordon", "--quiet", "test.log"])
+        args = parse_args()
+        assert args.quiet is True

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cli.py` around lines 75 - 86, Add a new unit test in
tests/test_cli.py that verifies the --quiet flag is parsed: create a test
function (e.g. test_quiet_flag) that monkeypatches sys.argv to ["cordon",
"--quiet", "test.log"], calls parse_args(), and asserts the resulting namespace
has args.quiet == True (or the appropriate attribute indicating quiet/progress
off). Place it alongside the existing test_format_* tests and reuse monkeypatch
and parse_args to keep style consistent.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 52cc56fc-bf49-46bc-affd-83fc6313c994

📥 Commits

Reviewing files that changed from the base of the PR and between b953b3e and 158354d.

📒 Files selected for processing (11)

src/cordon/cli.py
src/cordon/core/config.py
src/cordon/core/types.py
src/cordon/pipeline.py
src/cordon/postprocess/__init__.py
src/cordon/postprocess/json_formatter.py
tests/test_cli.py
tests/test_core.py
tests/test_integration.py
tests/test_json_formatter.py
tests/test_pipeline_unit.py

- Add blocks field to AnalysisResult for direct structured access to anomaly data without parsing formatted output
- Create JsonFormatter producing JSON with anomalies array
- Add output_format config field and --format CLI flag
- Pipeline selects formatter based on config when none injected
- Add JSON formatter tests and update existing tests

Co-authored-by: Cursor <cursoragent@cursor.com>

sonarqubecloud · 2026-05-23T18:33:50Z

Quality Gate passed

Issues
0 New issues
4 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

calebevans self-assigned this May 23, 2026

calebevans added the enhancement New feature or request label May 23, 2026

calebevans force-pushed the feat/output-formats branch from b953b3e to 158354d on May 23, 2026 at 18:26

coderabbitai Bot reviewed May 23, 2026


calebevans force-pushed the feat/output-formats branch from 158354d to 958f2db on May 23, 2026 at 18:33

calebevans merged commit 006d26d into main May 23, 2026
22 of 23 checks passed

calebevans deleted the feat/output-formats branch May 23, 2026 22:57

calebevans merged 2 commits into main from feat/output-formats

calebevans commented May 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 23, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 23, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

sonarqubecloud Bot commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

calebevans commented May 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented May 23, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

calebevans commented May 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 23, 2026 •

edited

Loading