Restore bash-parity regressions and review fixes#15
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds monitor run‑dir cleanup/rotation; per‑source ChangesMonitor Cleanup and Rotation
Source Configuration and Log Parsing Enhancements
Search ANSI Escape Handling
Probe Label Documentation Update
Possibly related PRs
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/CONFIG.md (1)
112-126:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument
--keep-ansiand its matching[search]keyLine 125-Line 126 list search flags, but the new ANSI-preservation option is not documented, and the
[search]key table does not show a corresponding boolean key. This leaves the config reference incomplete for a shipped flag.As per coding guidelines,
docs/CONFIG.md: "Maintaindocs/CONFIG.mdwith TOML reference documenting every key".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/CONFIG.md` around lines 112 - 126, Add documentation for the ANSI-preservation option: add a boolean key `keep_ansi` to the `[search]` TOML table (default `false`) with a one-line description like "Preserve ANSI color codes in matches", and update the CLI flags list to include `--keep-ansi` (and note its effect or that it's mutually exclusive with any `--strip-ansi` if applicable) so the `[search]` table and the flags section both document the new option.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/CONFIG.md`:
- Around line 251-253: The docs show the run-dir placeholder `<HHMMSSZ>_iter<N>`
which doesn't match the dispatcher/test contract; update the examples and any
occurrences to the exact UTC timestamp shape `YYYYMMDDTHHMMSSZ_iter<N>` (e.g.,
use the full date+time token), ensuring the README entries that currently list
`<HHMMSSZ>_iter<N>` are replaced and any explanatory text clarifies the exact
format expected by the dispatcher/test contract.
In `@docs/ROADMAP.md`:
- Around line 10-13: Update the v0.1.1 roadmap blurb to list the omitted
features: mention the new format_keys option for per-source field mapping, the
format-mismatch warnings feature, the probe heading rename to "External errors
and timeouts", and the ANSI-stripping behavior change for paperbark search (add
the --keep-ansi flag to preserve ANSI). Reference the existing v0.1.1 sentence
and append these features succinctly so the release summary includes
format_keys, format-mismatch warnings, the renamed probe heading, and the
paperbark search --keep-ansi behavior.
- Line 17: Update the test count text in the ROADMAP entry that currently reads
"**Tests:** 402 passing across 27 test modules; CI has been green on" to reflect
the actual number of tests (change 402 to 342) and ensure any nearby mentions of
test counts are consistent; edit the line containing that exact sentence to
replace the incorrect number.
In `@src/paperbark/dispatcher.py`:
- Around line 477-518: In _zip_rotate_run, avoid removing raw_dir immediately
after shutil.make_archive succeeds; instead, after calling
shutil.make_archive(base, "zip", root_dir=raw_dir.parent, base_dir="raw"),
verify the created zip_path (raw_dir.parent / "raw.zip") is a valid, readable
archive (e.g., attempt opening it with zipfile.ZipFile and calling .testzip() or
reading the file header) and only call shutil.rmtree(raw_dir) when that
verification passes; on verification failure log the error and do not delete the
source directory (keep existing exception handling for make_archive and unlink
failures).
In `@src/paperbark/search.py`:
- Around line 202-203: The CLI-only keep_ansi handling must be plumbed through
the search config and parser so TOML [search].keep_ansi can be used and CLI
still overrides it: add a keep_ansi field to SearchConfig (bool, default False),
read the TOML value in _parse_search into that field, and modify the code path
that currently does keep_ansi = bool(getattr(args, "keep_ansi", False)) to
instead compute keep_ansi = args.keep_ansi if hasattr(args, "keep_ansi") and
args.keep_ansi is not None else parsed_search_config.keep_ansi; ensure the
computed keep_ansi (bool) is passed into search_runs and update starter/docs
surfaces to document [search].keep_ansi with the same semantics.
In `@tests/test_dispatcher.py`:
- Around line 757-770: Extend the
test_cleanup_zip_archives_raw_and_strips_iter_files test to open the produced
raw.zip (created by cleanup_old_runs/_zip_rotate_run) and assert the archived
file list contains the raw artefacts that were removed from disk (e.g., entries
under raw/ such as the iter JSON and CSV filenames) while the originals on disk
are gone and summary.md remains on disk; use the same generated filenames (from
_seed_old_run) to check presence inside the zip and absence on the filesystem so
the test verifies actual zip contents, not just existence of raw.zip.
- Around line 739-740: The assertion is order-dependent because it checks only
the first bucket from component_counts; instead, ensure the test asserts that at
least one bucket contains "api" by scanning all values in component_counts
(e.g., use any(...) or set union over component_counts.values()) so the
assertion is stable regardless of ordering; update the assertion that currently
reads using next(iter(component_counts.values())) to a check over all values of
component_counts (referencing the variables component_counts and summary).
- Line 646: The test's assertion is too loose because it only checks that "750"
appears in source.command; change it to assert the exact flag/value pairing by
checking for "-n 750" (or a regex like r"\b-n\s+750\b") in source.command so the
samples count is verified explicitly; update the assertion referring to
source.command accordingly.
---
Outside diff comments:
In `@docs/CONFIG.md`:
- Around line 112-126: Add documentation for the ANSI-preservation option: add a
boolean key `keep_ansi` to the `[search]` TOML table (default `false`) with a
one-line description like "Preserve ANSI color codes in matches", and update the
CLI flags list to include `--keep-ansi` (and note its effect or that it's
mutually exclusive with any `--strip-ansi` if applicable) so the `[search]`
table and the flags section both document the new option.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 45b7fe35-c91b-4a9d-943b-6a5d9eef58ad
📒 Files selected for processing (19)
CHANGELOG.mddocs/CONFIG.mddocs/PROBES.mddocs/ROADMAP.mddocs/SOURCES.mdsrc/paperbark/cli.pysrc/paperbark/config.pysrc/paperbark/dispatcher.pysrc/paperbark/init.pysrc/paperbark/iteration.pysrc/paperbark/probes/__init__.pysrc/paperbark/search.pysrc/paperbark/sources/flyctl.pytests/test_config.pytests/test_dispatcher.pytests/test_probes_config.pytests/test_probes_regex_bucket.pytests/test_search.pytests/test_sources.py
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Line 46: The Unreleased changelog line "New tests: 20 covering the cleanup
pass (`zip`/`delete` modes," is inconsistent with the PR summary which reports
+21 tests; update that string in CHANGELOG.md to "New tests: 21 covering the
cleanup pass (`zip`/`delete` modes," (or adjust the PR summary instead) so the
Unreleased notes and PR description match.
- Around line 14-16: Update the CHANGELOG entry so it reflects that sampling is
performed by passing a `-n <samples>` argument from FlyctlSource.command rather
than an internal bounded deque in capture(); locate the text referring to
`flyctl -n`/`--no-tail`, `capture()`, and the bounded `deque` and change it to
state that the implementation now emits `-n <samples>` via FlyctlSource.command
to perform sampling, removing the incorrect claim about enforcing the bound
inside capture() with a deque.
In `@tests/test_dispatcher.py`:
- Around line 840-848: The test uses a finite iterator mono_seq for the
monotonic parameter which can raise StopIteration if the loop calls monotonic
more times; replace mono_seq with the reusable helper _FakeMonotonic and pass
_FakeMonotonic(start=0.0, step=0.1) (or equivalent args used elsewhere) into
run_monitor_loop's monotonic kwarg so the monitor loop gets a stable,
non-exhausting monotonic source; update the test invocation that currently sets
monotonic=lambda: next(mono_seq) to use _FakeMonotonic instead and keep the rest
of the run_monitor_loop call the same.
In `@tests/test_sources.py`:
- Around line 60-62: The test name claims it covers both zero and negative
samples but only checks zero; update
test_flyctl_rejects_zero_or_negative_samples to either parametrize over samples
values (e.g., 0 and -1) and assert both raise ValueError matching "samples must
be > 0", or if you prefer a narrower check, rename the test to
test_flyctl_rejects_zero_samples; locate the instantiation of FlyctlSource in
the test and add the additional negative sample case (or rename the test
function accordingly).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 03762036-2370-4c51-af9e-1ba5d1917892
📒 Files selected for processing (5)
CHANGELOG.mdsrc/paperbark/dispatcher.pysrc/paperbark/sources/flyctl.pytests/test_dispatcher.pytests/test_sources.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
CHANGELOG.md (2)
12-19:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSampling mechanism is documented incorrectly in two places.
These entries still say sampling is enforced via a bounded
dequeand thatflyctlhas no native-npath, but this PR’s stated behaviour is thatFlyctlSource.commandemits-n <samples>. Please align both sections to avoid publishing the wrong contract.Also applies to: 68-71
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CHANGELOG.md` around lines 12 - 19, Update the changelog wording to reflect the actual sampling implementation: replace claims that sampling is enforced by a bounded deque inside capture() and that flyctl lacks a native -n option with a statement that FlyctlSource.command emits "-n <samples>" (so sampling is performed by passing -n to flyctl), and mirror this correction in the other occurrence referencing the same behavior (the section that currently claims deque-based truncation).
47-51:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winNew test count appears inconsistent with the PR summary.
The changelog says 24 new tests, while the PR notes say 21. Please reconcile to a single verified number before merge.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CHANGELOG.md` around lines 47 - 51, The CHANGELOG entry currently states "24 new tests covering the cleanup pass..." which conflicts with the PR summary stating 21; verify the actual number of new tests and update the changelog string to the verified count (e.g., change "24 new tests" to "21 new tests" if PR summary is correct) so both the CHANGELOG and the PR notes report the same, verified test count; edit the literal text "24 new tests covering the cleanup pass (`zip`/`delete` modes, retention window, idempotency, missing-root no-op, invalid mode, zip-content verification), parse-rate warning, `samples` knob, `format_keys` validation, search ANSI handling (default + opt-out + TOML drives + `--no-keep-ansi` clears TOML)" to reflect the reconciled number.tests/test_dispatcher.py (1)
637-655:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReinstate explicit
-n <samples>command assertion for Flyctl parity.At Line 638, the docstring says sampling is not via a Flyctl flag, but this PR’s contract requires
FlyctlSource.commandto emit-n <samples>. The current test can still pass if-ndisappears because the mocked runner ignores command args and truncation happens post-capture.Suggested tightening
def test_build_source_flyctl_threads_samples_through() -> None: - """``samples`` is enforced inside ``capture()`` via a bounded deque (the - bash dispatcher's ``| tail -n <samples>`` analogue), not via a flyctl - flag. The attribute round-trips so a probe of ``source.samples`` confirms - the TOML value reached the source instance. - """ + """``samples`` round-trips from config and is reflected in Flyctl args.""" @@ source = build_source(spec) assert isinstance(source, FlyctlSource) assert source.samples == 750 + assert "-n" in source.command + assert source.command[source.command.index("-n") + 1] == "750"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_dispatcher.py` around lines 637 - 655, The test test_build_source_flyctl_threads_samples_through must also assert that the Flyctl command includes the explicit '-n <samples>' flag to ensure FlyctlSource.command preserves the samples contract; locate where the test builds the SourceConfig and call build_source(...) (symbols: SourceConfig, build_source, FlyctlSource) and add an assertion that source.command (or the attribute/method that returns the command string) contains '-n 750' (or the samples value from spec.options) before overriding source._runner and checking capture behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Around line 33-38: The release notes claim a warning triggers at "≤50% parsed"
but the implemented behavior in `paperbark monitor` is the format-mismatch case
only (conditions: captured >= 5 and parsed == 0) with a one-time stderr warning
and a per-iteration entry in `monitor.log`; update the CHANGELOG text to reflect
those exact conditions and the one-time stderr / per-iteration log behavior so
it matches the implementation (reference `paperbark monitor`, `monitor.log`, and
the conditions `captured >= 5` and `parsed == 0`).
---
Duplicate comments:
In `@CHANGELOG.md`:
- Around line 12-19: Update the changelog wording to reflect the actual sampling
implementation: replace claims that sampling is enforced by a bounded deque
inside capture() and that flyctl lacks a native -n option with a statement that
FlyctlSource.command emits "-n <samples>" (so sampling is performed by passing
-n to flyctl), and mirror this correction in the other occurrence referencing
the same behavior (the section that currently claims deque-based truncation).
- Around line 47-51: The CHANGELOG entry currently states "24 new tests covering
the cleanup pass..." which conflicts with the PR summary stating 21; verify the
actual number of new tests and update the changelog string to the verified count
(e.g., change "24 new tests" to "21 new tests" if PR summary is correct) so both
the CHANGELOG and the PR notes report the same, verified test count; edit the
literal text "24 new tests covering the cleanup pass (`zip`/`delete` modes,
retention window, idempotency, missing-root no-op, invalid mode, zip-content
verification), parse-rate warning, `samples` knob, `format_keys` validation,
search ANSI handling (default + opt-out + TOML drives + `--no-keep-ansi` clears
TOML)" to reflect the reconciled number.
In `@tests/test_dispatcher.py`:
- Around line 637-655: The test test_build_source_flyctl_threads_samples_through
must also assert that the Flyctl command includes the explicit '-n <samples>'
flag to ensure FlyctlSource.command preserves the samples contract; locate where
the test builds the SourceConfig and call build_source(...) (symbols:
SourceConfig, build_source, FlyctlSource) and add an assertion that
source.command (or the attribute/method that returns the command string)
contains '-n 750' (or the samples value from spec.options) before overriding
source._runner and checking capture behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 05dd0aa8-9c7e-491c-a26b-d4678a59583b
📒 Files selected for processing (3)
CHANGELOG.mdtests/test_dispatcher.pytests/test_sources.py
|
Triage of the latest review pass — all four comments rest on a stale walkthrough cached from the first commit on the branch. The implementation, CHANGELOG, and PR body are all consistent at the current head (b8fa7c8). Skipping all four with verification:
No code or doc changes; the implementation is consistent with the description. Happy to take any of these forward if I'm misreading the request. |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
docs/ROADMAP.md (1)
24-24:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the documented test count to the current pytest total.
Line 24 still says 405 passing, but this PR’s stated head status is 407 passing. Please refresh this number from
pytest -q(orpytest --collect-only -q) so the roadmap stays accurate.Based on learnings: test-case counts in this repo must be sourced from pytest collection/results, not by grepping
def test_, because parametrisation expands collected cases.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ROADMAP.md` at line 24, Update the documented test count string "**405 passing**" in ROADMAP.md to match the current pytest total (e.g., "**407 passing**"); run pytest -q (or pytest --collect-only -q) to get the authoritative collected/passed case count and replace the hard-coded number so the roadmap reflects pytest's collected/summary count rather than a grep-based estimate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@docs/ROADMAP.md`:
- Line 24: Update the documented test count string "**405 passing**" in
ROADMAP.md to match the current pytest total (e.g., "**407 passing**"); run
pytest -q (or pytest --collect-only -q) to get the authoritative
collected/passed case count and replace the hard-coded number so the roadmap
reflects pytest's collected/summary count rather than a grep-based estimate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 7fe77f70-e2d8-42d4-8c3a-63a487df42df
📒 Files selected for processing (2)
README.mddocs/ROADMAP.md
Summary
Bundles every issue surfaced by the v0.1.0 review and the bash-parity audit against
~/Documents/GitHub/hover/scripts/logs.sh. All changes land behind passing CI; no schema breaks beyond reverting the v0.1.0 filename-pattern drift (which itself was the contract break — see point 4).Bash-parity regressions restored
flat_csv_pathtosummarise_log_file, so<YYYYMMDDTHHMMSSZ>_iter<N>.csvlands next to the JSON for every iteration. v0.1.0 silently dropped it.[monitor]keyscleanup_enabled(defaulttrue),cleanup_days(default1),cleanup_mode(default"zip"), plus CLI flags--cleanup/--no-cleanup,--cleanup-days N,--cleanup-mode {zip,delete}."zip"archives each<app>/raw/to a siblingraw.zip(verified readable viatestzip()before the original tree is removed) and strips per-iter*_iter*.{json,csv}files; summaries and time-series CSVs are preserved. Pass runs once at loop start.samplesline cap on flyctl —[[sources]].samples(default400, matchesreference/logs.sh's--samples).flyctl logsitself has no native flag for this (-nis the short form of--no-tail), so the bound is enforced insideFlyctlSource.capture()via adeque(maxlen=samples)— same behaviour as the bash dispatcher'sflyctl logs … | tail -n <samples>pipe. An earlier draft of this PR tried to pass-n <samples>directly to flyctl, which silently broke captures (flyctl interpreted-nas--no-tailand400as a positional arg → 0 lines per iter); fixed in 79f0411 before merge after live smoke against the hover Fly.io app caught the regression.<YYYYMMDDTHHMMSSZ>_iter<N>.{log,json,csv}(timestamp first, no zero-padded iter index), as originally documented. v0.1.0 usediter_<NNNN>_<YYYYMMDDTHHMMSSZ>, which broke the run-dir contract and any tooling sorting by iteration filename.Review fixes
[[sources]].format_keystable (flyctl) overrides JSON key tuples for canonical fields (timestamp,level,message,component). Each value may be a string or list of strings; unknown field names are rejected. Threaded throughFlyctlSource → capture_iteration → summarise_log_file → summarise_lines. Non-JSON formats (regex presets) remain on the v0.2 list.monitor.logrecords the failure every iteration and stderr surfaces a one-time warning per source per run. Closes the silent-failure case where every probe printed "(no matches)" with no diagnostic. Live-tested againsthover-analysis(19/100 parsed → fired).Database / external→External errors and timeouts. Toggle key staysdatabase(config back-compat); the heading now matches what the default pattern set actually catches (generic Go context timeouts and outbound HTTP failures, not just DB driver errors). Pattern set unchanged; users who want a DB-only matcher override under[probes.patterns].database.paperbark searchnow strips ANSI escapes from matched lines by default (Fly's coloured timestamp prefix becomes noise when piped). New--keep-ansi(BooleanOptionalAction, with--no-keep-ansito clear) flag preserves them for TTY-aware viewers; corresponding[search].keep_ansiTOML key for full flag↔TOML parity per CLAUDE.md.Tests, lint, types
samplesknob,format_keysvalidation, search ANSI handling (default + opt-out + TOML-drives +--no-keep-ansi-clears-TOML), zip archive content, and config-loader cleanup-field validation.ruff check,ruff format --check,mypy, andpip-auditall clean.--keep-ansi, and cleanupzip/deletemodes against seeded older run dirs.Test plan
mypy src testsclean across 60 source files.pre-commit run --all-filesclean (ruff + ruff-format + prettier).pip-audit -r <(uv export --no-emit-project)clean.cleanup/_zip_rotate_runpaths once more — they're the largest new code path and the ones with the most filesystem side effects.Closes the bash-parity audit punch list and the v0.1.0 review findings (items #1–8 from prior review).
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests