Restore bash-parity regressions and review fixes by simonsmallchua · Pull Request #15 · Good-Native/paperbark

simonsmallchua · 2026-05-04T22:30:31Z

Summary

Bundles every issue surfaced by the v0.1.0 review and the bash-parity audit against ~/Documents/GitHub/hover/scripts/logs.sh. All changes land behind passing CI; no schema breaks beyond reverting the v0.1.0 filename-pattern drift (which itself was the contract break — see point 4).

Bash-parity regressions restored

Per-iter flat CSV side-output — dispatcher now passes flat_csv_path to summarise_log_file, so <YYYYMMDDTHHMMSSZ>_iter<N>.csv lands next to the JSON for every iteration. v0.1.0 silently dropped it.
Cleanup / rotation — new [monitor] keys cleanup_enabled (default true), cleanup_days (default 1), cleanup_mode (default "zip"), plus CLI flags --cleanup / --no-cleanup, --cleanup-days N, --cleanup-mode {zip,delete}. "zip" archives each <app>/raw/ to a sibling raw.zip (verified readable via testzip() before the original tree is removed) and strips per-iter *_iter*.{json,csv} files; summaries and time-series CSVs are preserved. The cleanup pass runs once at loop start.
samples line cap on flyctl — [[sources]].samples (default 400, matches reference/logs.sh's --samples). flyctl logs itself has no native flag for this (-n is the short form of --no-tail), so the bound is enforced inside FlyctlSource.capture() via a deque(maxlen=samples) — same behaviour as the bash dispatcher's flyctl logs … | tail -n <samples> pipe. An earlier draft of this PR tried to pass -n <samples> directly to flyctl, which silently broke captures (flyctl interpreted -n as --no-tail and 400 as a positional arg → 0 lines per iter); fixed in 79f0411 before merge after live smoke against the hover Fly.io app caught the regression.
Per-iter filename pattern reverted — <YYYYMMDDTHHMMSSZ>_iter<N>.{log,json,csv} (timestamp first, no zero-padded iter index), as originally documented. v0.1.0 used iter_<NNNN>_<YYYYMMDDTHHMMSSZ>, which broke the run-dir contract and any tooling sorting by iteration filename.
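
The samples cap described above can be sketched as a bounded tail over any line stream. This is a minimal illustration, not the code in FlyctlSource.capture(): the function name `tail_lines` is hypothetical, and only the `deque(maxlen=samples)` technique and the default of 400 come from the description.

```python
from collections import deque
from typing import Iterable, List


def tail_lines(lines: Iterable[str], samples: int = 400) -> List[str]:
    """Keep only the last `samples` lines, like `... | tail -n <samples>`.

    Hypothetical helper for illustration; the real capture() may differ.
    """
    if samples <= 0:
        raise ValueError("samples must be > 0")
    # A bounded deque drops its oldest entry once full, so memory stays
    # proportional to `samples` however long the upstream stream is.
    buffer = deque(lines, maxlen=samples)
    return list(buffer)


if __name__ == "__main__":
    stream = (f"line {i}" for i in range(1000))
    print(tail_lines(stream, samples=3))
```

Because the trimming happens after the subprocess output is read, no extra flag has to be passed to flyctl, which sidesteps the -n / --no-tail clash noted above.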

Review fixes

Per-source format selection — [[sources]].format_keys table (flyctl) overrides JSON key tuples for canonical fields (timestamp, level, message, component). Each value may be a string or list of strings; unknown field names are rejected. Threaded through FlyctlSource → capture_iteration → summarise_log_file → summarise_lines. Non-JSON formats (regex presets) remain on the v0.2 list.
Format-mismatch warning — when an iteration's parse rate drops below 50% (with at least 5 captured lines), monitor.log records the failure every iteration and stderr surfaces a one-time warning per source per run. Closes the silent-failure case where every probe printed "(no matches)" with no diagnostic. Live-tested against hover-analysis (19/100 parsed → fired).
Database probe heading renamed — Database / external → External errors and timeouts. Toggle key stays database (config back-compat); the heading now matches what the default pattern set actually catches (generic Go context timeouts and outbound HTTP failures, not just DB driver errors). Pattern set unchanged; users who want a DB-only matcher override under [probes.patterns].database.
Search ANSI strip — paperbark search now strips ANSI escapes from matched lines by default (Fly's coloured timestamp prefix becomes noise when piped). New --keep-ansi (BooleanOptionalAction, with --no-keep-ansi to clear) flag preserves them for TTY-aware viewers; corresponding [search].keep_ansi TOML key for full flag↔TOML parity per CLAUDE.md.
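
One way the --keep-ansi / [search].keep_ansi parity could be wired is sketched below. It assumes a tri-state CLI value (None when the flag is absent) so the TOML value only applies when the user gave no flag. The names `build_parser`, `resolve_keep_ansi`, and `render` are hypothetical, and the regex is a generic CSI matcher rather than the project's own ANSI_RE.

```python
import argparse
import re
from typing import Optional

# Generic matcher for CSI escape sequences such as "\x1b[32m".
# Assumption: the project's ANSI_RE may cover more sequence types.
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="search-sketch")
    # default=None distinguishes "flag not given" from an explicit
    # --keep-ansi or --no-keep-ansi.
    parser.add_argument(
        "--keep-ansi", action=argparse.BooleanOptionalAction, default=None
    )
    return parser


def resolve_keep_ansi(cli_value: Optional[bool], toml_value: bool = False) -> bool:
    """An explicit CLI choice wins; otherwise fall back to the TOML key."""
    return toml_value if cli_value is None else cli_value


def render(line: str, keep_ansi: bool) -> str:
    """Strip ANSI escapes unless the caller asked to keep them."""
    return line if keep_ansi else ANSI_RE.sub("", line)


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-keep-ansi"])
    keep = resolve_keep_ansi(args.keep_ansi, toml_value=True)
    print(repr(render("\x1b[32mINFO\x1b[0m ok", keep)))
```

The tri-state default is what lets --no-keep-ansi clear a keep_ansi = true set in TOML, which is the case the test list names explicitly.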

Tests, lint, types

407 tests pass (was 382): 24 new test functions (some parametrised) across cleanup pass, parse-rate warning, samples knob, format_keys validation, search ANSI handling (default + opt-out + TOML-drives + --no-keep-ansi-clears-TOML), zip archive content, and config-loader cleanup-field validation.
ruff check, ruff format --check, mypy, and pip-audit all clean.
Live smoke against the hover Fly.io app confirms every new behaviour works end-to-end: filename pattern, CSV side-output, format_keys override (visible in summary.md per app), parse-rate warning fires for hover-analysis, probe rename in analysis.md, search ANSI strip + --keep-ansi, and cleanup zip/delete modes against seeded older run dirs.

Test plan

Unit tests cover every new public function and config field.
mypy src tests clean across 60 source files.
pre-commit run --all-files clean (ruff + ruff-format + prettier).
pip-audit -r <(uv export --no-emit-project) clean.
Live smoke against Fly.io app for every new behaviour.
Reviewer should sanity-check the cleanup / _zip_rotate_run paths once more — they're the largest new code path and the ones with the most filesystem side effects.
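
For reviewers looking at the zip path, the verify-before-delete ordering can be sketched as follows. This is an illustration under stated assumptions, not the body of _zip_rotate_run: the function name `zip_then_remove` is hypothetical, and error logging is omitted.

```python
import shutil
import zipfile
from pathlib import Path
from typing import Optional


def zip_then_remove(raw_dir: Path) -> Optional[Path]:
    """Archive raw_dir to a sibling <name>.zip, then delete the tree.

    The tree is removed only if the archive opens and passes its CRC
    checks; otherwise the source directory is left untouched.
    """
    raw_dir = raw_dir.resolve()
    base = raw_dir.parent / raw_dir.name  # make_archive appends ".zip"
    zip_path = Path(
        shutil.make_archive(
            str(base), "zip", root_dir=str(raw_dir.parent), base_dir=raw_dir.name
        )
    )
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() returns the first corrupt member name, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        return None
    if bad_member is not None:
        return None
    shutil.rmtree(raw_dir)
    return zip_path
```

Returning None on a failed check keeps the destructive step last, so a truncated or corrupt archive never costs the original logs.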

Closes the bash-parity audit punch list and the v0.1.0 review findings (items #1–8 from prior review).

Summary by CodeRabbit

New Features
- Flyctl source: configurable sampling (default 400) and per-field JSON key overrides; per-iteration flat CSV side‑outputs and restored timestamp_iterN filename shape; run cleanup with retention days and modes (zip/delete); search strips ANSI by default with --keep-ansi.
Bug Fixes
- Monitor now emits a one‑time warning and logs diagnostics when a source’s parse rate is unexpectedly low.
Documentation
- Renamed probe heading to "External errors and timeouts" and expanded monitor/search/Flyctl/config docs.
Tests
- Added tests for sampling, format-key handling, cleanup/rotation, parse warnings and ANSI behaviour.

coderabbitai · 2026-05-04T22:30:44Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

This PR adds monitor run‑dir cleanup/rotation; per‑source samples and format_keys for flyctl captures; per‑iteration flat CSV outputs and parse‑failure one‑time warnings; default ANSI‑stripping in search with --keep-ansi; and renames a probe heading to "External errors and timeouts."

Changes

Monitor Cleanup and Rotation

Layer / File(s)	Summary
Configuration Constants `src/paperbark/config.py`	Adds `DEFAULT_CLEANUP_DAYS`, `DEFAULT_CLEANUP_MODE`, and `CLEANUP_MODES`.
Config Parsing / Dataclass `src/paperbark/config.py`	Extends `MonitorConfig` with `cleanup_enabled`, `cleanup_days`, `cleanup_mode`; `_parse_monitor` validates these fields.
CLI Wiring `src/paperbark/cli.py`	Adds mutually exclusive `--no-cleanup`/`--cleanup`, `--cleanup-days`, `--cleanup-mode`, and merges them into `MonitorConfig`.
Dispatcher Implementation `src/paperbark/dispatcher.py`	Adds `cleanup_old_runs(...)` and helpers; `run_monitor_loop` invokes cleanup when enabled; implements `zip`/`delete` modes and retention cutoff logic.
Starter Template `src/paperbark/init.py`	Updates starter TOML with `cleanup_enabled`, `cleanup_days`, `cleanup_mode` and inline comments.
Tests & Docs `tests/test_config.py`, `tests/test_dispatcher.py`, `docs/CONFIG.md`	Adds tests and docs for parsing/validation, retention semantics, and cleanup behaviour.

Source Configuration and Log Parsing Enhancements

Layer / File(s)	Summary
Format-key Parsing `src/paperbark/dispatcher.py`	Adds `_parse_format_keys()` with allowlist validation and normalisation; `build_source` validates `samples` and `format_keys` and rejects unknown keys.
Flyctl Source `src/paperbark/sources/flyctl.py`	Adds `DEFAULT_SAMPLES = 400`; `FlyctlSource.__init__` accepts `samples` and `format_keys` (validated); `capture()` buffers with `deque(..., maxlen=samples)` and yields only the last N lines.
Iteration Key Resolution `src/paperbark/iteration.py`	Replaces fixed key constants with `DEFAULT_*` sets and `_DEFAULT_FORMAT_KEYS`; adds `_resolved_format_keys()`; `summarise_lines`/`summarise_log_file` accept `format_keys` and compute core fields dynamically.
Capture Integration & Filenames `src/paperbark/dispatcher.py`	`capture_iteration` writes `raw/<timestamp>_iter<N>.log`, `<timestamp>_iter<N>.json` and `<timestamp>_iter<N>.csv`; passes `format_keys` into summariser.
Parse Mismatch Warnings `src/paperbark/dispatcher.py`	Adds `_maybe_warn_parse_failure()` and `parse_warned` tracking; computes parsed/total ratios from iteration summaries and emits one‑time stderr warnings plus `monitor.log` entries for low parse rates.
Tests & Docs `tests/test_sources.py`, `tests/test_dispatcher.py`, `docs/SOURCES.md`, `docs/CONFIG.md`	Adds tests for `samples` enforcement, `format_keys` validation, per‑iteration CSV presence, parse‑warning behaviour; documents flyctl options and run‑dir layout.
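
The format_keys behaviour summarised in this table (string-or-list values, allowlisted field names, defaults for anything not overridden) might look roughly like the sketch below. The default key tuples and the helper names `parse_format_keys` and `extract` are assumptions for illustration; only the four canonical field names come from the PR description.

```python
import json
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple, Union

CANONICAL_FIELDS = ("timestamp", "level", "message", "component")

# Hypothetical defaults, for illustration only.
DEFAULT_KEYS: Dict[str, Tuple[str, ...]] = {
    "timestamp": ("timestamp", "time", "ts"),
    "level": ("level", "severity"),
    "message": ("message", "msg"),
    "component": ("component", "logger"),
}


def parse_format_keys(
    raw: Mapping[str, Union[str, Sequence[str]]],
) -> Dict[str, Tuple[str, ...]]:
    """Validate overrides and merge them over the defaults."""
    resolved = dict(DEFAULT_KEYS)
    for field, value in raw.items():
        if field not in CANONICAL_FIELDS:
            raise ValueError(f"unknown format_keys field: {field!r}")
        # Accept either a single string or a list of strings.
        keys = (value,) if isinstance(value, str) else tuple(value)
        if not keys or not all(isinstance(k, str) and k for k in keys):
            raise ValueError(f"format_keys.{field} must be non-empty strings")
        resolved[field] = keys
    return resolved


def extract(line: str, keys: Mapping[str, Sequence[str]]) -> Optional[Dict[str, Any]]:
    """Return canonical fields from one JSON log line, or None if unparsable."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    # For each canonical field, take the first candidate key that is present.
    return {
        field: next((record[k] for k in candidates if k in record), None)
        for field, candidates in keys.items()
    }
```

Lines for which `extract` returns None are the ones that would count against the parse rate used by the format-mismatch warning.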

Search ANSI Escape Handling

Layer / File(s)	Summary
Search Functionality `src/paperbark/search.py`	Adds `keep_ansi: bool = False` parameter; strips ANSI escapes by default using `ANSI_RE.sub("", line)` and preserves when requested.
CLI Wiring `src/paperbark/cli.py`	`search` subparser adds `--keep-ansi` / `--no-keep-ansi`; `_run_search` forwards flag into `search.run`.
Tests `tests/test_search.py`	Adds tests asserting default ANSI‑stripping and `--keep-ansi` preserves escape sequences; TOML default and CLI override cases covered.

Probe Label Documentation Update

Layer / File(s)	Summary
Probe Definition `src/paperbark/probes/__init__.py`	Renames display label for the `database` probe from `"Database / external"` to `"External errors and timeouts"` and adds explanatory comment; probe key remains `database`.
Documentation `docs/PROBES.md`	Updates heading and clarifies default patterns and override guidance under `[probes.patterns].database`.
Tests `tests/test_probes_config.py`, `tests/test_probes_regex_bucket.py`	Updates tests to expect the renamed probe label across assertions.

Possibly related PRs

Good-Native/paperbark#5: Prior dispatcher/monitor work this PR extends (build_source, capture/iteration wiring).
Good-Native/paperbark#8: Earlier monitor/dispatcher additions that this PR builds upon (MonitorConfig and loop wiring).
Good-Native/paperbark#1: Related search/CLI wiring changes affecting search_runs and CLI argument threading.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Restore bash-parity regressions and review fixes' accurately describes the primary purpose of the PR: reverting v0.1.0 contract breaks and addressing review feedback through features like filename pattern restoration, cleanup rotation, format-key selection, and ANSI stripping.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/CONFIG.md (1)
112-126: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document --keep-ansi and its matching [search] key

Line 125-Line 126 list search flags, but the new ANSI-preservation option is not documented, and the [search] key table does not show a corresponding boolean key. This leaves the config reference incomplete for a shipped flag.

As per coding guidelines, docs/CONFIG.md: "Maintain docs/CONFIG.md with TOML reference documenting every key".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/CONFIG.md` around lines 112 - 126, Add documentation for the
ANSI-preservation option: add a boolean key `keep_ansi` to the `[search]` TOML
table (default `false`) with a one-line description like "Preserve ANSI color
codes in matches", and update the CLI flags list to include `--keep-ansi` (and
note its effect or that it's mutually exclusive with any `--strip-ansi` if
applicable) so the `[search]` table and the flags section both document the new
option.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/CONFIG.md`:
- Around line 251-253: The docs show the run-dir placeholder `<HHMMSSZ>_iter<N>`
which doesn't match the dispatcher/test contract; update the examples and any
occurrences to the exact UTC timestamp shape `YYYYMMDDTHHMMSSZ_iter<N>` (e.g.,
use the full date+time token), ensuring the README entries that currently list
`<HHMMSSZ>_iter<N>` are replaced and any explanatory text clarifies the exact
format expected by the dispatcher/test contract.

In `@docs/ROADMAP.md`:
- Around line 10-13: Update the v0.1.1 roadmap blurb to list the omitted
features: mention the new format_keys option for per-source field mapping, the
format-mismatch warnings feature, the probe heading rename to "External errors
and timeouts", and the ANSI-stripping behavior change for paperbark search (add
the --keep-ansi flag to preserve ANSI). Reference the existing v0.1.1 sentence
and append these features succinctly so the release summary includes
format_keys, format-mismatch warnings, the renamed probe heading, and the
paperbark search --keep-ansi behavior.
- Line 17: Update the test count text in the ROADMAP entry that currently reads
"**Tests:** 402 passing across 27 test modules; CI has been green on" to reflect
the actual number of tests (change 402 to 342) and ensure any nearby mentions of
test counts are consistent; edit the line containing that exact sentence to
replace the incorrect number.

In `@src/paperbark/dispatcher.py`:
- Around line 477-518: In _zip_rotate_run, avoid removing raw_dir immediately
after shutil.make_archive succeeds; instead, after calling
shutil.make_archive(base, "zip", root_dir=raw_dir.parent, base_dir="raw"),
verify the created zip_path (raw_dir.parent / "raw.zip") is a valid, readable
archive (e.g., attempt opening it with zipfile.ZipFile and calling .testzip() or
reading the file header) and only call shutil.rmtree(raw_dir) when that
verification passes; on verification failure log the error and do not delete the
source directory (keep existing exception handling for make_archive and unlink
failures).

In `@src/paperbark/search.py`:
- Around line 202-203: The CLI-only keep_ansi handling must be plumbed through
the search config and parser so TOML [search].keep_ansi can be used and CLI
still overrides it: add a keep_ansi field to SearchConfig (bool, default False),
read the TOML value in _parse_search into that field, and modify the code path
that currently does keep_ansi = bool(getattr(args, "keep_ansi", False)) to
instead compute keep_ansi = args.keep_ansi if hasattr(args, "keep_ansi") and
args.keep_ansi is not None else parsed_search_config.keep_ansi; ensure the
computed keep_ansi (bool) is passed into search_runs and update starter/docs
surfaces to document [search].keep_ansi with the same semantics.

In `@tests/test_dispatcher.py`:
- Around line 757-770: Extend the
test_cleanup_zip_archives_raw_and_strips_iter_files test to open the produced
raw.zip (created by cleanup_old_runs/_zip_rotate_run) and assert the archived
file list contains the raw artefacts that were removed from disk (e.g., entries
under raw/ such as the iter JSON and CSV filenames) while the originals on disk
are gone and summary.md remains on disk; use the same generated filenames (from
_seed_old_run) to check presence inside the zip and absence on the filesystem so
the test verifies actual zip contents, not just existence of raw.zip.
- Around line 739-740: The assertion is order-dependent because it checks only
the first bucket from component_counts; instead, ensure the test asserts that at
least one bucket contains "api" by scanning all values in component_counts
(e.g., use any(...) or set union over component_counts.values()) so the
assertion is stable regardless of ordering; update the assertion that currently
reads using next(iter(component_counts.values())) to a check over all values of
component_counts (referencing the variables component_counts and summary).
- Line 646: The test's assertion is too loose because it only checks that "750"
appears in source.command; change it to assert the exact flag/value pairing by
checking for "-n 750" (or a regex like r"\b-n\s+750\b") in source.command so the
samples count is verified explicitly; update the assertion referring to
source.command accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 45b7fe35-c91b-4a9d-943b-6a5d9eef58ad

📥 Commits

Reviewing files that changed from the base of the PR and between 75cd478 and 7bd992c.

📒 Files selected for processing (19)

CHANGELOG.md
docs/CONFIG.md
docs/PROBES.md
docs/ROADMAP.md
docs/SOURCES.md
src/paperbark/cli.py
src/paperbark/config.py
src/paperbark/dispatcher.py
src/paperbark/init.py
src/paperbark/iteration.py
src/paperbark/probes/__init__.py
src/paperbark/search.py
src/paperbark/sources/flyctl.py
tests/test_config.py
tests/test_dispatcher.py
tests/test_probes_config.py
tests/test_probes_regex_bucket.py
tests/test_search.py
tests/test_sources.py

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 46: The Unreleased changelog line "New tests: 20 covering the cleanup
pass (`zip`/`delete` modes," is inconsistent with the PR summary which reports
+21 tests; update that string in CHANGELOG.md to "New tests: 21 covering the
cleanup pass (`zip`/`delete` modes," (or adjust the PR summary instead) so the
Unreleased notes and PR description match.
- Around line 14-16: Update the CHANGELOG entry so it reflects that sampling is
performed by passing a `-n <samples>` argument from FlyctlSource.command rather
than an internal bounded deque in capture(); locate the text referring to
`flyctl -n`/`--no-tail`, `capture()`, and the bounded `deque` and change it to
state that the implementation now emits `-n <samples>` via FlyctlSource.command
to perform sampling, removing the incorrect claim about enforcing the bound
inside capture() with a deque.

In `@tests/test_dispatcher.py`:
- Around line 840-848: The test uses a finite iterator mono_seq for the
monotonic parameter which can raise StopIteration if the loop calls monotonic
more times; replace mono_seq with the reusable helper _FakeMonotonic and pass
_FakeMonotonic(start=0.0, step=0.1) (or equivalent args used elsewhere) into
run_monitor_loop's monotonic kwarg so the monitor loop gets a stable,
non-exhausting monotonic source; update the test invocation that currently sets
monotonic=lambda: next(mono_seq) to use _FakeMonotonic instead and keep the rest
of the run_monitor_loop call the same.

In `@tests/test_sources.py`:
- Around line 60-62: The test name claims it covers both zero and negative
samples but only checks zero; update
test_flyctl_rejects_zero_or_negative_samples to either parametrize over samples
values (e.g., 0 and -1) and assert both raise ValueError matching "samples must
be > 0", or if you prefer a narrower check, rename the test to
test_flyctl_rejects_zero_samples; locate the instantiation of FlyctlSource in
the test and add the additional negative sample case (or rename the test
function accordingly).
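
The non-exhausting clock suggested for tests/test_dispatcher.py above can be illustrated with a small callable. This is a sketch only: the repository's `_FakeMonotonic` is not shown in this thread, so the class `FakeMonotonic` and the toy loop `poll_until` below are hypothetical stand-ins.

```python
from typing import Callable


class FakeMonotonic:
    """Callable clock that advances by a fixed step on every call."""

    def __init__(self, start: float = 0.0, step: float = 0.1) -> None:
        self._next = start
        self._step = step

    def __call__(self) -> float:
        value = self._next
        self._next += self._step
        return value


def poll_until(deadline: float, monotonic: Callable[[], float]) -> int:
    """Toy loop: count clock reads that happen before the deadline."""
    reads = 0
    while monotonic() < deadline:
        reads += 1
    return reads


if __name__ == "__main__":
    # Never runs dry, however many times the loop reads the clock.
    print(poll_until(1.0, FakeMonotonic(start=0.0, step=0.25)))

    # A finite iterator raises StopIteration once the loop outlives it.
    mono_seq = iter([0.0, 0.25])
    try:
        poll_until(1.0, lambda: next(mono_seq))
    except StopIteration:
        print("finite clock exhausted")
```

The callable form keeps the test independent of how many times the loop happens to consult the clock, which is the fragility the review comment points at.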

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 03762036-2370-4c51-af9e-1ba5d1917892

📥 Commits

Reviewing files that changed from the base of the PR and between 7bd992c and 79f0411.

📒 Files selected for processing (5)

CHANGELOG.md
src/paperbark/dispatcher.py
src/paperbark/sources/flyctl.py
tests/test_dispatcher.py
tests/test_sources.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

CHANGELOG.md (2)

12-19: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Sampling mechanism is documented incorrectly in two places.

These entries still say sampling is enforced via a bounded deque and that flyctl has no native -n path, but this PR’s stated behaviour is that FlyctlSource.command emits -n <samples>. Please align both sections to avoid publishing the wrong contract.

Also applies to: 68-71

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` around lines 12 - 19, Update the changelog wording to reflect
the actual sampling implementation: replace claims that sampling is enforced by
a bounded deque inside capture() and that flyctl lacks a native -n option with a
statement that FlyctlSource.command emits "-n <samples>" (so sampling is
performed by passing -n to flyctl), and mirror this correction in the other
occurrence referencing the same behavior (the section that currently claims
deque-based truncation).

47-51: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

New test count appears inconsistent with the PR summary.

The changelog says 24 new tests, while the PR notes say 21. Please reconcile to a single verified number before merge.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` around lines 47 - 51, The CHANGELOG entry currently states "24
new tests covering the cleanup pass..." which conflicts with the PR summary
stating 21; verify the actual number of new tests and update the changelog
string to the verified count (e.g., change "24 new tests" to "21 new tests" if
PR summary is correct) so both the CHANGELOG and the PR notes report the same,
verified test count; edit the literal text "24 new tests covering the cleanup
pass (`zip`/`delete` modes, retention window, idempotency, missing-root no-op,
invalid mode, zip-content verification), parse-rate warning, `samples` knob,
`format_keys` validation, search ANSI handling (default + opt-out + TOML drives
+ `--no-keep-ansi` clears TOML)" to reflect the reconciled number.

tests/test_dispatcher.py (1)

637-655: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reinstate explicit -n <samples> command assertion for Flyctl parity.

At Line 638, the docstring says sampling is not via a Flyctl flag, but this PR’s contract requires FlyctlSource.command to emit -n <samples>. The current test can still pass if -n disappears because the mocked runner ignores command args and truncation happens post-capture.

Suggested tightening

 def test_build_source_flyctl_threads_samples_through() -> None:
-    """``samples`` is enforced inside ``capture()`` via a bounded deque (the
-    bash dispatcher's ``| tail -n <samples>`` analogue), not via a flyctl
-    flag. The attribute round-trips so a probe of ``source.samples`` confirms
-    the TOML value reached the source instance.
-    """
+    """``samples`` round-trips from config and is reflected in Flyctl args."""
@@
     source = build_source(spec)
     assert isinstance(source, FlyctlSource)
     assert source.samples == 750
+    assert "-n" in source.command
+    assert source.command[source.command.index("-n") + 1] == "750"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_dispatcher.py` around lines 637 - 655, The test
test_build_source_flyctl_threads_samples_through must also assert that the
Flyctl command includes the explicit '-n <samples>' flag to ensure
FlyctlSource.command preserves the samples contract; locate where the test
builds the SourceConfig and call build_source(...) (symbols: SourceConfig,
build_source, FlyctlSource) and add an assertion that source.command (or the
attribute/method that returns the command string) contains '-n 750' (or the
samples value from spec.options) before overriding source._runner and checking
capture behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Around line 33-38: The release notes claim a warning triggers at "≤50% parsed"
but the implemented behavior in `paperbark monitor` is the format-mismatch case
only (conditions: captured >= 5 and parsed == 0) with a one-time stderr warning
and a per-iteration entry in `monitor.log`; update the CHANGELOG text to reflect
those exact conditions and the one-time stderr / per-iteration log behavior so
it matches the implementation (reference `paperbark monitor`, `monitor.log`, and
the conditions `captured >= 5` and `parsed == 0`).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 05dd0aa8-9c7e-491c-a26b-d4678a59583b

📥 Commits

Reviewing files that changed from the base of the PR and between a3b0466 and b8fa7c8.

📒 Files selected for processing (3)

CHANGELOG.md
tests/test_dispatcher.py
tests/test_sources.py

simonsmallchua · 2026-05-05T00:10:29Z

Triage of the latest review pass — all four comments rest on a stale walkthrough cached from the first commit on the branch. The implementation, CHANGELOG, and PR body are all consistent at the current head (b8fa7c8). Skipping all four with verification:

CHANGELOG L33-38 (warning threshold) — bot says the warning fires at "captured ≥5 and parsed == 0". That was the first commit's threshold (_PARSE_WARN_RATE = 0.0). The current code at dispatcher.py:555 has _PARSE_WARN_RATE = 0.5, raised in a3b0466 after the 0% threshold failed to trip live on hover-analysis (19/100 parsed). CHANGELOG correctly says ≤50%.
CHANGELOG L12-19 / L68-71 (flyctl -n <samples> claim) — bot says the implementation emits -n <samples>. That was the draft approach in the first commit on this branch; it broke captures live because -n is the flyctl short form of --no-tail. Fixed in 79f0411 to enforce the bound via deque(maxlen=samples) inside capture() — i.e., exactly what the CHANGELOG documents. FlyctlSource.command is ["flyctl", "logs", "-a", self.app, "--no-tail"] — no -n.
CHANGELOG L47-51 (24 vs 21 new tests) — verified count: git diff 75cd478 HEAD -- 'tests/*.py' | grep -c '^+def test_' returns 24. The figure of 21 (or any other number) in the bot's auto-walkthrough is also a stale-cache artefact. CHANGELOG is correct.
tests/test_dispatcher.py:637-655 (assert -n <samples> in command) — would invert correctness. The flyctl source intentionally does not emit -n (would conflict with --no-tail short form); samples are enforced via deque trim inside capture(). The current test asserts len(list(source.capture())) == 750, which directly verifies the trimming contract. Adding assert "-n" in source.command would fail with the current command shape, which is the deliberate design.
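
To make the threshold point concrete, here is a minimal sketch of the warning predicate. It assumes the comparison is "parse rate at or below the threshold, with at least 5 captured lines"; the function name `should_warn` and the `MIN_CAPTURED` constant are hypothetical, and only the 0.5 and 0.0 rates, the 5-line floor, and the 19/100 figure come from this thread.

```python
PARSE_WARN_RATE = 0.5  # current threshold per the triage above
MIN_CAPTURED = 5       # assumed name for the "at least 5 lines" floor


def should_warn(
    parsed: int,
    captured: int,
    *,
    rate: float = PARSE_WARN_RATE,
    min_captured: int = MIN_CAPTURED,
) -> bool:
    """Return True when an iteration's parse rate is suspiciously low."""
    if captured < min_captured:
        # Too few lines to say anything meaningful about the format.
        return False
    return parsed / captured <= rate


if __name__ == "__main__":
    # hover-analysis: 19 of 100 lines parsed.
    print(should_warn(19, 100))            # trips at the 0.5 threshold
    print(should_warn(19, 100, rate=0.0))  # the first commit's 0.0 threshold
```

With the original 0.0 rate, a source that parses 19% of its lines never trips the check, which matches the live behaviour that motivated raising the threshold.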

No code or doc changes; the implementation is consistent with the description. Happy to take any of these forward if I'm misreading the request.

coderabbitai

♻️ Duplicate comments (1)

docs/ROADMAP.md (1)
24-24: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the documented test count to the current pytest total.

Line 24 still says 405 passing, but this PR’s stated head status is 407 passing. Please refresh this number from pytest -q (or pytest --collect-only -q) so the roadmap stays accurate.

Based on learnings: test-case counts in this repo must be sourced from pytest collection/results, not by grepping def test_, because parametrisation expands collected cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ROADMAP.md` at line 24, Update the documented test count string "**405
passing**" in ROADMAP.md to match the current pytest total (e.g., "**407
passing**"); run pytest -q (or pytest --collect-only -q) to get the
authoritative collected/passed case count and replace the hard-coded number so
the roadmap reflects pytest's collected/summary count rather than a grep-based
estimate.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 7fe77f70-e2d8-42d4-8c3a-63a487df42df

📥 Commits

Reviewing files that changed from the base of the PR and between b8fa7c8 and 3956a62.

📒 Files selected for processing (2)

README.md
docs/ROADMAP.md

Restore bash-parity regressions

7bd992c

coderabbitai Bot requested changes May 4, 2026

Fix samples knob and tighten warn threshold

79f0411

coderabbitai Bot requested changes May 4, 2026

simonsmallchua added 2 commits May 5, 2026 10:01

Address coderabbit review feedback

a3b0466

Tighten review nits and CHANGELOG accuracy

b8fa7c8

coderabbitai Bot requested changes May 5, 2026

coderabbitai Bot approved these changes May 5, 2026

Document format-layer scope

3956a62

coderabbitai Bot reviewed May 5, 2026

Bump roadmap test count to 407

8d6ad03

simonsmallchua merged commit 11d3dff into main May 5, 2026
5 checks passed
