Add JSON output format, structured blocks, and progress bar control#38

Merged
calebevans merged 2 commits into main from feat/output-formats
May 23, 2026

Conversation

@calebevans
Owner

@calebevans calebevans commented May 23, 2026

Summary

  • Add show_progress config option (bool, default True) to control tqdm progress bars across all embedding backends and scorer
  • Add --quiet / -q CLI flag that sets show_progress=False
  • Add output_format config option (Literal["xml", "json"], default "xml")
  • Add --format {xml,json} CLI flag
  • Create JsonFormatter producing JSON with an anomalies array (start_line, end_line, score, content)
  • Add blocks: list[MergedBlock] field to AnalysisResult for structured access to anomaly data without parsing formatted output
  • Pipeline selects formatter automatically based on output_format when none is injected
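A minimal sketch of what the new `blocks` field is meant to enable: reading anomaly spans directly instead of parsing the formatted output. The two dataclasses below are stand-ins written for this example, not cordon's real types, and the `MergedBlock` field names are assumed from the JSON keys listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MergedBlock:
    """Stand-in for cordon's MergedBlock; field names are assumptions."""

    start_line: int
    end_line: int
    score: float


@dataclass
class AnalysisResult:
    """Stand-in holding only the two fields discussed here."""

    output: str
    blocks: list[MergedBlock] = field(default_factory=list)


def top_blocks(result: AnalysisResult, min_score: float) -> list[tuple[int, int]]:
    """Return (start_line, end_line) spans whose score meets the threshold."""
    return [(b.start_line, b.end_line) for b in result.blocks if b.score >= min_score]


result = AnalysisResult(
    output="<not inspected in this sketch>",
    blocks=[MergedBlock(10, 14, 0.91), MergedBlock(40, 42, 0.35)],
)
spans = top_blocks(result, min_score=0.5)
```

Because the caller works with typed objects, the same filtering logic applies whether `output_format` is `"xml"` or `"json"`.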

Summary by CodeRabbit

  • New Features

    • Added --format CLI option (xml|json) and --quiet to suppress progress display
    • New JSON output formatter and automatic formatter selection based on --format
    • Analysis results now include a materialized list of anomaly blocks in detailed output
  • Behavior

    • Progress bars can be disabled via configuration/--quiet and are now consistently respected across embedding and scoring
  • Tests

    • Added tests for CLI format, JSON formatter, and blocks presence in results

Review Change Stack

- Add show_progress field to AnalysisConfig (defaults to True)
- Pass disable=not show_progress to all tqdm calls in embedding
  backends and scorer
- Add --quiet / -q CLI flag that sets show_progress=False
- Useful for CI pipelines and library usage where progress bars
  add noise to output
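The `disable=not show_progress` pattern from this commit message can be sketched as follows. `embed_all` and its toy "embedding" (string lengths) are hypothetical and only illustrate how the flag reaches `tqdm`; the import fallback is there solely so the sketch runs where tqdm is not installed.

```python
from collections.abc import Iterator

try:
    from tqdm import tqdm
except ImportError:  # fallback shim: iterate without a progress bar
    def tqdm(iterable, **kwargs):
        return iterable


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i : i + size]


def embed_all(lines: list[str], batch_size: int, show_progress: bool = True) -> list[int]:
    """Process batches, showing a progress bar only when show_progress is True."""
    batches = list(batched(lines, batch_size))
    out: list[int] = []
    # tqdm's `disable` flag turns the bar off entirely, so the config value is inverted.
    for batch in tqdm(batches, desc="Embedding", disable=not show_progress):
        out.extend(len(line) for line in batch)
    return out


lengths = embed_all(["a", "bb", "ccc"], batch_size=2, show_progress=False)
```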

Co-authored-by: Cursor <cursoragent@cursor.com>
@calebevans calebevans self-assigned this May 23, 2026
@calebevans calebevans added the enhancement New feature or request label May 23, 2026
@coderabbitai

coderabbitai Bot commented May 23, 2026

Warning

Review limit reached

@calebevans, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 19 minutes and 1 second.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 40731634-22ea-42c8-99b3-779a6730bcaf

📥 Commits

Reviewing files that changed from the base of the PR and between 158354d and 958f2db.

📒 Files selected for processing (11)
  • src/cordon/cli.py
  • src/cordon/core/config.py
  • src/cordon/core/types.py
  • src/cordon/pipeline.py
  • src/cordon/postprocess/__init__.py
  • src/cordon/postprocess/json_formatter.py
  • tests/test_cli.py
  • tests/test_core.py
  • tests/test_integration.py
  • tests/test_json_formatter.py
  • tests/test_pipeline_unit.py
📝 Walkthrough


This PR adds output-format selection (JSON or XML) and a --quiet toggle for progress bars, introduces JsonFormatter, wires new fields into AnalysisConfig, updates SemanticLogAnalyzer to choose the formatter and populate AnalysisResult.blocks, and applies progress control across embedders and scorer.

Changes

Output Configuration and Result Structure

| Layer | File(s) | Summary |
| --- | --- | --- |
| Configuration and Result | `src/cordon/core/config.py`, `src/cordon/core/types.py` | AnalysisConfig gains show_progress: bool = True and output_format: Literal["xml", "json"] = "xml"; AnalysisResult gains blocks: list[MergedBlock]. |
| Configuration and Result Tests | `tests/test_core.py` | Tests added to verify output_format defaults to "xml" and accepts "json"; TestAnalysisResult validates blocks storage and empty list handling. |

CLI Argument Integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| CLI Arguments and Configuration Wiring | `src/cordon/cli.py`, `tests/test_cli.py` | CLI parser adds --format (xml/json, default xml) and --quiet/-q; CLI wiring passes show_progress=not args.quiet and output_format=args.output_format into AnalysisConfig; tests verify parsing. |
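The CLI wiring described above could look roughly like this with `argparse`. This is a sketch under assumptions: the positional argument name `logfile` and the helper `config_kwargs` are invented for illustration, while the flag names, the `output_format` destination, and the `not args.quiet` inversion come from the summary.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cordon")
    parser.add_argument("logfile")  # hypothetical positional name
    # dest="output_format" matches the args.output_format attribute used in the tests.
    parser.add_argument(
        "--format", dest="output_format", choices=["xml", "json"], default="xml"
    )
    parser.add_argument("-q", "--quiet", action="store_true")
    return parser


def config_kwargs(argv: list[str]) -> dict[str, object]:
    """Translate parsed CLI flags into the two new config fields."""
    args = build_parser().parse_args(argv)
    return {"show_progress": not args.quiet, "output_format": args.output_format}


kwargs = config_kwargs(["--format", "json", "-q", "test.log"])
```

Using `choices` means an unsupported value such as `--format yaml` is rejected by the parser before any config object is built.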

Progress Bar Visibility Control

| Layer | File(s) | Summary |
| --- | --- | --- |
| Embedder Progress Bar Control | `src/cordon/embedding/llama_cpp.py`, `src/cordon/embedding/remote.py`, `src/cordon/embedding/transformer.py` | Each embedder’s batch tqdm call now uses disable=not self.config.show_progress. |
| Scorer and Benchmark Progress Control | `src/cordon/analysis/scorer.py`, `benchmark/evaluate.py` | DensityAnomalyScorer scoring loop uses disable=not config.show_progress; benchmark run_cordon_analysis(return_intermediates=True) now returns embeddings built from embedder outputs (embedded) instead of scored windows. |

JSON Output Formatting

| Layer | File(s) | Summary |
| --- | --- | --- |
| JsonFormatter Implementation and Export | `src/cordon/postprocess/json_formatter.py`, `src/cordon/postprocess/__init__.py` | Adds JsonFormatter.format_blocks() that serializes merged blocks to indented JSON {"anomalies": [...]} sorted by start_line, includes rounded score and concatenated multiline content; package exports updated. |
| Pipeline Formatter Selection and Blocks Population | `src/cordon/pipeline.py` | SemanticLogAnalyzer selects JsonFormatter when config.output_format == "json", otherwise XmlFormatter; analyze_file_detailed materializes merged blocks into AnalysisResult.blocks as a list. |
| JSON Formatter Tests | `tests/test_json_formatter.py` | Tests verify single/multiple/empty blocks, score rounding, content inclusion, special-character preservation (no XML escaping), and that output parses as valid JSON. |
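A rough sketch of a formatter with the behavior summarized above: sorted by `start_line`, rounded score, newline-joined content, indented JSON. The `format_blocks(blocks, lines)` signature, the 1-based inclusive line numbering, the rounding to 4 decimals, and the 2-space indent are all assumptions for illustration; the real `JsonFormatter` may differ on each point.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MergedBlock:
    """Stand-in type; line numbers assumed 1-based and inclusive."""

    start_line: int
    end_line: int
    score: float


class JsonFormatter:
    def format_blocks(self, blocks: list[MergedBlock], lines: list[str]) -> str:
        anomalies = [
            {
                "start_line": b.start_line,
                "end_line": b.end_line,
                "score": round(b.score, 4),  # precision is an assumption
                "content": "\n".join(lines[b.start_line - 1 : b.end_line]),
            }
            for b in sorted(blocks, key=lambda b: b.start_line)
        ]
        return json.dumps({"anomalies": anomalies}, indent=2)


log_lines = ["ok", "disk <full> & failing", "retrying", "ok"]
blocks = [MergedBlock(4, 4, 0.123456), MergedBlock(2, 3, 0.987654)]
payload = json.loads(JsonFormatter().format_blocks(blocks, log_lines))
```

Characters such as `<` and `&` pass through `json.dumps` unchanged, which is the "no XML escaping" property the tests check for.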

Pipeline Integration and Unit Tests

| Layer | File(s) | Summary |
| --- | --- | --- |
| Pipeline Analysis Tests | `tests/test_integration.py`, `tests/test_pipeline_unit.py` | Integration and unit tests assert result.blocks is a list, that result.blocks is populated from merged blocks, that JsonFormatter is selected when output_format == "json", and that custom formatter overrides work. |
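The selection rule these tests exercise (an injected formatter wins, otherwise the config value decides) can be sketched as a small function. `select_formatter` and the two placeholder formatter classes are hypothetical; in the PR the logic lives inside `SemanticLogAnalyzer`.

```python
from typing import Optional, Protocol


class Formatter(Protocol):
    def format_blocks(self, blocks: list) -> str: ...


class XmlFormatter:
    """Placeholder; returns a fixed string instead of real XML output."""

    def format_blocks(self, blocks: list) -> str:
        return "<anomalies/>"


class JsonFormatter:
    """Placeholder; returns a fixed string instead of real JSON output."""

    def format_blocks(self, blocks: list) -> str:
        return '{"anomalies": []}'


def select_formatter(output_format: str, injected: Optional[Formatter] = None) -> Formatter:
    # An explicitly injected formatter always takes precedence over the config value.
    if injected is not None:
        return injected
    return JsonFormatter() if output_format == "json" else XmlFormatter()


custom = XmlFormatter()
chosen = select_formatter("json", injected=custom)
```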

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • calebevans/cordon#35: Overlaps on benchmark/evaluate.py change to how run_cordon_analysis(return_intermediates=True) builds returned embeddings from embedder output.
  • calebevans/cordon#34: Modifies DensityAnomalyScorer._score_windows, related to scorer/progress changes.
  • calebevans/cordon#37: Related refactor of SemanticLogAnalyzer and formatter initialization that intersects with formatter selection changes.

Poem

🐰 I munched some carrots, then wrote some code so fine,
Formats now JSON or XML, progress bars decline,
Configs snug in dataclasses, formatters take their place,
Blocks listed, tests passing — a tidy little space! 🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the three main changes: JSON output format support, structured blocks field, and progress bar control, matching the PR's primary objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@calebevans calebevans force-pushed the feat/output-formats branch from b953b3e to 158354d Compare May 23, 2026 18:26

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_cli.py (1)

75-86: ⚡ Quick win

Add parser coverage for --quiet behavior.

This file now covers --format, but not the new --quiet/-q flag. Add tests for both default (False) and enabled (True) so CLI/config wiring regressions are caught early.

Proposed patch
 class TestParseArgs:
@@
     def test_format_default_xml(self, monkeypatch: pytest.MonkeyPatch) -> None:
         """Test that --format defaults to xml."""
         monkeypatch.setattr(sys, "argv", ["cordon", "test.log"])
         args = parse_args()
         assert args.output_format == "xml"
+
+    def test_quiet_flag(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        """Test --quiet / -q flag."""
+        monkeypatch.setattr(sys, "argv", ["cordon", "--quiet", "test.log"])
+        args = parse_args()
+        assert args.quiet is True
+
+    def test_quiet_default_false(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        """Test that --quiet defaults to False."""
+        monkeypatch.setattr(sys, "argv", ["cordon", "test.log"])
+        args = parse_args()
+        assert args.quiet is False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cli.py` around lines 75 - 86, Add two tests mirroring the existing
format tests to cover the new --quiet/-q flag: use monkeypatch.setattr(sys,
"argv", [...]) to simulate CLI invocation, call parse_args(), and assert
args.quiet is False for the default case (e.g., ["cordon", "test.log"]) and True
for the enabled cases (e.g., ["cordon", "--quiet", "test.log"] and optionally
["cordon", "-q", "test.log"]); reference the existing test functions
test_format_json and test_format_default_xml and the parse_args() function to
locate where to add these new tests.
src/cordon/core/config.py (1)

93-94: ⚡ Quick win

Validate new config fields in __post_init__ to fail fast.

output_format and show_progress are now public config inputs, but they are not validated. Add explicit checks so invalid values are rejected at construction time.

Proposed patch
 def _validate_core_params(self) -> None:
     """Validate core analysis parameters."""
@@
     if self.scoring_batch_size is not None and self.scoring_batch_size < 1:
         raise ValueError("scoring_batch_size must be >= 1 or None for auto-detect")
     if self.device is not None and self.device not in ("cuda", "mps", "cpu"):
         raise ValueError("device must be 'cuda', 'mps', 'cpu', or None")
+    if self.output_format not in ("xml", "json"):
+        raise ValueError("output_format must be 'xml' or 'json'")
+    if not isinstance(self.show_progress, bool):
+        raise ValueError("show_progress must be a boolean")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cordon/core/config.py` around lines 93 - 94, Add validation inside the
dataclass __post_init__ to fail fast: verify show_progress is a bool and
output_format is one of the allowed literals ("xml","json"); if either check
fails, raise a ValueError with a clear message. Update the __post_init__ method
that runs after construction to perform these checks for the fields
show_progress and output_format so invalid config values are rejected at
instantiation time.
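A minimal sketch of the fail-fast validation requested here, assuming `AnalysisConfig` is a dataclass with only the two new fields. The real class has more fields and, per the proposed patch, performs its checks in `_validate_core_params`; the `is_valid` helper is invented for the example.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class AnalysisConfig:
    """Reduced stand-in containing only the fields added in this PR."""

    show_progress: bool = True
    output_format: Literal["xml", "json"] = "xml"

    def __post_init__(self) -> None:
        # Literal is only a type hint, so invalid strings must be rejected at runtime.
        if self.output_format not in ("xml", "json"):
            raise ValueError("output_format must be 'xml' or 'json'")
        if not isinstance(self.show_progress, bool):
            raise ValueError("show_progress must be a boolean")


def is_valid(**kwargs: object) -> bool:
    """Return True if AnalysisConfig accepts the given keyword arguments."""
    try:
        AnalysisConfig(**kwargs)
    except ValueError:
        return False
    return True


accepted = is_valid(output_format="json")
rejected = is_valid(output_format="yaml")
```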
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cordon/core/types.py`:
- Around line 92-94: The docstring for AnalysisResult.output currently describes
only XML; update it to be format-agnostic by changing the description to
something like "Formatted output string (e.g., XML or JSON) representing the
analysis results" so it no longer implies XML-only; modify the docstring on the
AnalysisResult.output attribute in src/cordon/core/types.py and ensure
neighboring descriptions (blocks, total_lines) remain unchanged.

---

Nitpick comments:
In `@src/cordon/core/config.py`:
- Around line 93-94: Add validation inside the dataclass __post_init__ to fail
fast: verify show_progress is a bool and output_format is one of the allowed
literals ("xml","json"); if either check fails, raise a ValueError with a clear
message. Update the __post_init__ method that runs after construction to perform
these checks for the fields show_progress and output_format so invalid config
values are rejected at instantiation time.

In `@tests/test_cli.py`:
- Around line 75-86: Add two tests mirroring the existing format tests to cover
the new --quiet/-q flag: use monkeypatch.setattr(sys, "argv", [...]) to simulate
CLI invocation, call parse_args(), and assert args.quiet is False for the
default case (e.g., ["cordon", "test.log"]) and True for the enabled cases
(e.g., ["cordon", "--quiet", "test.log"] and optionally ["cordon", "-q",
"test.log"]); reference the existing test functions test_format_json and
test_format_default_xml and the parse_args() function to locate where to add
these new tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9419a413-1fb7-4df3-a831-717c2ec662a9

📥 Commits

Reviewing files that changed from the base of the PR and between a818f0a and b953b3e.

📒 Files selected for processing (16)
  • benchmark/evaluate.py
  • src/cordon/analysis/scorer.py
  • src/cordon/cli.py
  • src/cordon/core/config.py
  • src/cordon/core/types.py
  • src/cordon/embedding/llama_cpp.py
  • src/cordon/embedding/remote.py
  • src/cordon/embedding/transformer.py
  • src/cordon/pipeline.py
  • src/cordon/postprocess/__init__.py
  • src/cordon/postprocess/json_formatter.py
  • tests/test_cli.py
  • tests/test_core.py
  • tests/test_integration.py
  • tests/test_json_formatter.py
  • tests/test_pipeline_unit.py

Comment thread src/cordon/core/types.py
Comment on lines 92 to 94
output: Formatted output string with XML tags.
blocks: Structured list of merged anomaly blocks.
total_lines: Total number of lines in the input file.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update AnalysisResult.output docstring to reflect JSON support.

output is documented as XML-only, but this PR introduces JSON output too. Please make the attribute description format-agnostic.

Proposed patch
-        output: Formatted output string with XML tags.
+        output: Formatted output string (XML or JSON, depending on configuration).
         blocks: Structured list of merged anomaly blocks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cordon/core/types.py` around lines 92 - 94, The docstring for
AnalysisResult.output currently describes only XML; update it to be
format-agnostic by changing the description to something like "Formatted output
string (e.g., XML or JSON) representing the analysis results" so it no longer
implies XML-only; modify the docstring on the AnalysisResult.output attribute in
src/cordon/core/types.py and ensure neighboring descriptions (blocks,
total_lines) remain unchanged.


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cordon/core/config.py (1)

94-119: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add runtime validation for output_format in AnalysisConfig.

Line 94 introduces constrained values, but invalid strings can still be passed at runtime (type hints alone won’t enforce this). Please fail fast in _validate_core_params to keep behavior consistent with other validated enum-like fields.

Proposed patch
 def _validate_core_params(self) -> None:
     """Validate core analysis parameters."""
@@
     if self.device is not None and self.device not in ("cuda", "mps", "cpu"):
         raise ValueError("device must be 'cuda', 'mps', 'cpu', or None")
+    if self.output_format not in ("xml", "json"):
+        raise ValueError("output_format must be 'xml' or 'json'")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cordon/core/config.py` around lines 94 - 119, The output_format field on
AnalysisConfig isn't being validated at runtime; update the
_validate_core_params method to check that self.output_format is either "xml" or
"json" and raise a ValueError with a clear message if not. Locate the validation
logic inside AnalysisConfig::_validate_core_params and add a check similar to
the other field validations (e.g., if self.output_format not in ("xml", "json"):
raise ValueError(...)) so invalid strings fail fast.
src/cordon/pipeline.py (1)

79-79: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update docstring to reflect JSON output support.

The docstring states "Formatted string with XML-tagged significant blocks" but the method can now return JSON-formatted output when output_format="json". Consider updating to "Formatted string with significant blocks" or "Formatted string with significant blocks (XML or JSON format)".

📝 Suggested docstring update
-            Formatted string with XML-tagged significant blocks.
+            Formatted string with significant blocks (XML or JSON format).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cordon/pipeline.py` at line 79, Update the docstring that currently reads
"Formatted string with XML-tagged significant blocks." to reflect JSON support;
mention the output_format parameter (e.g., "Formatted string with significant
blocks (XML or JSON format)" or "Formatted string with significant blocks") in
the docstring for the function/method that returns formatted output and accepts
output_format, so callers see that JSON output is supported.
🧹 Nitpick comments (1)
tests/test_cli.py (1)

75-86: ⚡ Quick win

Add a parsing test for --quiet to cover progress-toggle CLI behavior.

This file now covers --format; adding one --quiet assertion would round out coverage for the paired CLI feature in this PR.

Proposed patch
 class TestParseArgs:
@@
     def test_format_default_xml(self, monkeypatch: pytest.MonkeyPatch) -> None:
         """Test that --format defaults to xml."""
         monkeypatch.setattr(sys, "argv", ["cordon", "test.log"])
         args = parse_args()
         assert args.output_format == "xml"
+
+    def test_quiet_flag(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        """Test --quiet flag."""
+        monkeypatch.setattr(sys, "argv", ["cordon", "--quiet", "test.log"])
+        args = parse_args()
+        assert args.quiet is True
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cli.py` around lines 75 - 86, Add a new unit test in
tests/test_cli.py that verifies the --quiet flag is parsed: create a test
function (e.g. test_quiet_flag) that monkeypatches sys.argv to ["cordon",
"--quiet", "test.log"], calls parse_args(), and asserts the resulting namespace
has args.quiet == True (or the appropriate attribute indicating quiet/progress
off). Place it alongside the existing test_format_* tests and reuse monkeypatch
and parse_args to keep style consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cordon/core/config.py`:
- Around line 94-119: The output_format field on AnalysisConfig isn't being
validated at runtime; update the _validate_core_params method to check that
self.output_format is either "xml" or "json" and raise a ValueError with a clear
message if not. Locate the validation logic inside
AnalysisConfig::_validate_core_params and add a check similar to the other field
validations (e.g., if self.output_format not in ("xml", "json"): raise
ValueError(...)) so invalid strings fail fast.

In `@src/cordon/pipeline.py`:
- Line 79: Update the docstring that currently reads "Formatted string with
XML-tagged significant blocks." to reflect JSON support; mention the
output_format parameter (e.g., "Formatted string with significant blocks (XML or
JSON format)" or "Formatted string with significant blocks") in the docstring
for the function/method that returns formatted output and accepts output_format,
so callers see that JSON output is supported.

---

Nitpick comments:
In `@tests/test_cli.py`:
- Around line 75-86: Add a new unit test in tests/test_cli.py that verifies the
--quiet flag is parsed: create a test function (e.g. test_quiet_flag) that
monkeypatches sys.argv to ["cordon", "--quiet", "test.log"], calls parse_args(),
and asserts the resulting namespace has args.quiet == True (or the appropriate
attribute indicating quiet/progress off). Place it alongside the existing
test_format_* tests and reuse monkeypatch and parse_args to keep style
consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 52cc56fc-bf49-46bc-affd-83fc6313c994

📥 Commits

Reviewing files that changed from the base of the PR and between b953b3e and 158354d.

📒 Files selected for processing (11)
  • src/cordon/cli.py
  • src/cordon/core/config.py
  • src/cordon/core/types.py
  • src/cordon/pipeline.py
  • src/cordon/postprocess/__init__.py
  • src/cordon/postprocess/json_formatter.py
  • tests/test_cli.py
  • tests/test_core.py
  • tests/test_integration.py
  • tests/test_json_formatter.py
  • tests/test_pipeline_unit.py

- Add blocks field to AnalysisResult for direct structured access
  to anomaly data without parsing formatted output
- Create JsonFormatter producing JSON with anomalies array
- Add output_format config field and --format CLI flag
- Pipeline selects formatter based on config when none injected
- Add JSON formatter tests and update existing tests

Co-authored-by: Cursor <cursoragent@cursor.com>
@calebevans calebevans force-pushed the feat/output-formats branch from 158354d to 958f2db Compare May 23, 2026 18:33
@sonarqubecloud

@calebevans calebevans merged commit 006d26d into main May 23, 2026
22 of 23 checks passed
@calebevans calebevans deleted the feat/output-formats branch May 23, 2026 22:57

Labels

enhancement New feature or request
