Skip to content

test: deflake logs -f follow tests and kubo restart-race detect window#78

Merged
Rinse12 merged 3 commits into
masterfrom
test/harden-flaky-follow-restart-tests
Jun 8, 2026
Merged

test: deflake logs -f follow tests and kubo restart-race detect window#78
Rinse12 merged 3 commits into
masterfrom
test/harden-flaky-follow-restart-tests

Conversation

@Rinse12

@Rinse12 Rinse12 commented Jun 8, 2026

Copy link
Copy Markdown
Member

Fixes the two flaky CI failures seen on master run 27133947713 (macos + ubuntu). Both tests pass locally; the failures were CI-runner-load timing races, confirmed by reading the job logs.

Changes

logs -f follow tests (logs.test.ts) — macos failure

The three bitsocial logs -f log file rotation tests killed the CLI on a fixed 8s timer anchored to process spawn. That window had to cover CLI cold-start (oclif/pkc-js import, several seconds) plus printing the initial marker, detecting the file change, and flushing the second marker. On a slow runner, startup ate the budget and the process was SIGINT'd before the awaited line was emitted — macos saw the initial line but never APPENDED_LINE.

Replaced the inline duplicated promise logic with a shared runFollowUntil() helper that:

  • kills the process as soon as the awaited output is observed (doneWhen), decoupling correctness from startup latency;
  • supports an optional graceMs so the --stdout filter test can still catch an erroneously-unfiltered stderr line before killing (negative assertion stays meaningful);
  • has a generous 25s safety cap (under the 30s testTimeout so a genuine hang still fails as an assertion, not a timeout).

These tests now finish in ~1–5s locally instead of always 8s, while tolerating up to 25s of startup slowness on CI.

kubo restart-race test (daemon-kubo-restart-race.test.ts) — ubuntu failure

The failing line (expect(kuboRestarted).toBe(true)) is a setup precondition, not the assertion under test: after /shutdown, the daemon must auto-restart kubo before we can verify it dies on SIGTERM. The 30s detect window was too tight given the test injects PKC_CLI_TEST_IPFS_READY_DELAY_MS=5000 and the runner may be loaded with parallel kubo nodes. It ran 43.8s before failing on ubuntu CI but lands in ~13s locally. Widened to 60s — still well within the test's 120s cap.

No production code changed; test-harness robustness only.

Verification

Both files pass locally after the change:

  • logs.test.ts: 29/29 passed (file 51s → 36s)
  • daemon-kubo-restart-race.test.ts: 4/4 passed

Refs #77

Summary by CodeRabbit

  • Tests
    • Improved daemon shutdown test reliability by expanding timing detection windows to reduce CI flakiness
    • Enhanced log-following tests with output-based detection mechanisms for better test robustness

The logs -f rotation tests killed the CLI on a fixed 8s timer anchored to
process spawn, so the budget for the awaited output had to share that window
with CLI cold-start (oclif/pkc-js import). On a slow CI runner startup ate the
budget and the process was killed before the second marker was emitted (macos
failure: stdout had the initial line but not APPENDED_LINE).

Replace the fixed window with a shared runFollowUntil() helper that kills as
soon as the awaited output is observed (with an optional grace period and a
generous 25s safety cap), decoupling correctness from startup latency. Tests
now finish in ~1-5s locally instead of always 8s.

Also widen the restart-race setup precondition (restarted-kubo detect) from 30s
to 60s: it must absorb the injected 5s ready-delay plus restart under runner
load. Ran 43.8s before failing on ubuntu CI; ~13s locally. Still well within
the test's 120s cap.

Refs #77
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@Rinse12, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 28 minutes and 57 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ae191b21-9e63-41ae-85c6-16e8b8444fcb

📥 Commits

Reviewing files that changed from the base of the PR and between 89371c6 and c26e21f.

📒 Files selected for processing (2)
  • test/cli/daemon.test.ts
  • test/cli/logs.test.ts
📝 Walkthrough

Walkthrough

This PR hardens two test suites to reduce CI timing flakiness: the kubo restart-race test gains a documented, expanded detection window, and three log-follow tests replace fixed-timeout kill logic with output-anchored termination via a new runFollowUntil helper.

Changes

Test Timing Hardening

Layer / File(s) Summary
Kubo Restart Detection Window Expansion
test/cli/daemon-kubo-restart-race.test.ts
Adds explanatory comment describing the /shutdown precondition and restart timing requirements, then increases the waitForCondition timeout for detecting kubo restart from 30s to 60s to reduce slow-CI flakiness.
Output-Anchored Log-Follow Test Helper
test/cli/logs.test.ts
Introduces runFollowUntil helper to spawn bitsocial logs -f and monitor accumulated stdout/stderr, terminating via SIGINT when a doneWhen predicate is satisfied (with optional graceMs and maxMs safety). Replaces fixed 8s timeouts in three log-follow tests: the first rotation test now detects "NEW_FILE_MARKER" in stdout and "switched to new log file" in stderr; the stdout-filtered rotation test watches for "new stdout msg" with a grace period to catch stderr leaks; the file-append test terminates when "APPENDED_LINE" appears.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

  • bitsocialnet/bitsocial-cli#77: The changes implement the exact fixes proposed: expanding the restart-detection timeout and replacing fixed-delay log-follow kill timers with output-anchored runFollowUntil completion.
  • bitsocialnet/bitsocial-cli#75: The runFollowUntil refactoring directly addresses the Windows CI flakiness by driving subprocess shutdown off observed output rather than a fixed wall-clock deadline.

Possibly related PRs

  • bitsocialnet/bitsocial-cli#43: Updates test/cli/logs.test.ts log-follow tests using output-anchored completion, exercising the follow-mode file-reading behavior in src/cli/commands/logs.ts.
  • bitsocialnet/bitsocial-cli#71: Both PRs modify test/cli/daemon-kubo-restart-race.test.ts kubo restart/shutdown timing for issue #70; the main PR's expanded restart-detection window builds on the regression test coverage from the retrieved PR.
  • bitsocialnet/bitsocial-cli#41: The main PR updates test/cli/logs.test.ts to follow bitsocial logs -f by observing output markers, complementing the retrieved PR's watchFile → userspace polling implementation change.

Poem

🐰 A log file watched by timing's cruel hand,
Now springs to life on markers it can understand.
No more the clock; let output be the guide—
CI flakes flee fast when polls turn the tide!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the two main changes: deflaking the logs -f follow tests and increasing the kubo restart-race detection window, matching the PR objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/harden-flaky-follow-restart-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Rinse12 added 2 commits June 8, 2026 11:59
The 'stops kubo when daemon exits during a restart cycle' test failed on
ubuntu CI at line 418 (expect(kuboRestarted).toBe(true)) — the same
restart-detect precondition class as the sibling test in
daemon-kubo-restart-race.test.ts.

Two problems found by reading the CI job log:

1. Diagnostics gap: the daemon DEBUG-log dump only fired when the LATER
   assertion (kuboStoppedAfterKill, line 451) failed, not the restart-detect
   at line 418 that actually fails. So the CI log contained only the bare
   assertion with no daemon timeline — couldn't see what kubo did. Extracted
   the dump into dumpDiagnostics() and call it on BOTH precondition failures.

2. Window too tight under CI contention: vitest runs test files in parallel
   forks (fileParallelism) and a 2-vCPU ubuntu runner spawns many kubo nodes
   concurrently (this test ran alongside daemon-kubo-restart-race.test.ts's 4
   kubo nodes). Restart lands in ~13s locally but was observed taking >20s on
   CI. Widened the detect window 20s -> 45s and raised the test timeout
   60s -> 120s to match the sibling test.

If it still fails, dumpDiagnostics() now prints the daemon's kubo
restart/kill timeline so we can distinguish contention from a real bug.

Refs #77
The 'continues watching old file' test failed again on ubuntu CI, this time
waiting the full 25s safety cap with APPENDED_LINE never appearing — so it is
NOT a timing-window flake. The two sibling follow tests (which detect a NEW
file via the 3s full-reread poll) passed fast; only this one, which appends to
the file already being followed (surfaced by the incremental byte-offset poll),
fails. Cannot reproduce locally even under CPU saturation (32 cores).

Add targeted diagnostics that fire when APPENDED_LINE is missing, to pin down
which of three causes it is on the next CI run:
  - append never reached disk      (appendTriggered / appendError + on-disk content)
  - logs -f never detected it      (on-disk has the line, captured stdout does not)
  - output truncated on SIGINT exit (timedOut + line on disk but not captured)

Also surface appendFile rejections (previously swallowed) and have the helper
report whether it hit the safety cap.

Refs #77
@Rinse12 Rinse12 merged commit 9767222 into master Jun 8, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant