test: deflake logs -f follow tests and kubo restart-race detect window #78
The logs -f rotation tests killed the CLI on a fixed 8s timer anchored to process spawn, so the budget for the awaited output had to share that window with CLI cold-start (oclif/pkc-js import). On a slow CI runner startup ate the budget and the process was killed before the second marker was emitted (macos failure: stdout had the initial line but not APPENDED_LINE).

Replace the fixed window with a shared runFollowUntil() helper that kills as soon as the awaited output is observed (with an optional grace period and a generous 25s safety cap), decoupling correctness from startup latency. Tests now finish in ~1-5s locally instead of always 8s.

Also widen the restart-race setup precondition (restarted-kubo detect) from 30s to 60s: it must absorb the injected 5s ready-delay plus restart under runner load. Ran 43.8s before failing on ubuntu CI; ~13s locally. Still well within the test's 120s cap.

Refs #77
CodeRabbit walkthrough: This PR hardens two test suites to reduce CI timing flakiness: the kubo restart-race test gains a documented, expanded detection window, and three log-follow tests replace fixed-timeout kill logic with output-anchored termination via a new helper.
The 'stops kubo when daemon exits during a restart cycle' test failed on ubuntu CI at line 418 (expect(kuboRestarted).toBe(true)), the same restart-detect precondition class as the sibling test in daemon-kubo-restart-race.test.ts.

Two problems found by reading the CI job log:

1. Diagnostics gap: the daemon DEBUG-log dump only fired when the LATER assertion (kuboStoppedAfterKill, line 451) failed, not on the restart-detect at line 418 that actually fails. So the CI log contained only the bare assertion with no daemon timeline, and we couldn't see what kubo did. Extracted the dump into dumpDiagnostics() and call it on BOTH precondition failures.

2. Window too tight under CI contention: vitest runs test files in parallel forks (fileParallelism) and a 2-vCPU ubuntu runner spawns many kubo nodes concurrently (this test ran alongside daemon-kubo-restart-race.test.ts's 4 kubo nodes). Restart lands in ~13s locally but was observed taking >20s on CI. Widened the detect window 20s -> 45s and raised the test timeout 60s -> 120s to match the sibling test.

If it still fails, dumpDiagnostics() now prints the daemon's kubo restart/kill timeline so we can distinguish contention from a real bug.

Refs #77
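For reviewers, the "poll a precondition, dump diagnostics if it never holds" pattern described above could look roughly like the sketch below. This is not the code in the test file: waitFor and requirePrecondition are hypothetical names, and the dump callback only stands in for dumpDiagnostics(), whose real signature is not shown in this PR.

```typescript
// Hypothetical sketch: names and signatures are assumptions, not the test's code.

// Poll `check` until it returns true or `timeoutMs` elapses.
async function waitFor(
  check: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Wrap a setup precondition so a failure always leaves a timeline in the log,
// no matter which precondition is the one that fails.
async function requirePrecondition(
  name: string,
  check: () => boolean | Promise<boolean>,
  timeoutMs: number,
  dump: (failedPrecondition: string) => void,
  intervalMs = 500,
): Promise<boolean> {
  const ok = await waitFor(check, timeoutMs, intervalMs);
  if (!ok) dump(name); // stand-in for dumpDiagnostics()
  return ok;
}
```

Routing every precondition through one wrapper is what closes the diagnostics gap: the dump no longer depends on which assertion happens to fail first.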
The 'continues watching old file' test failed again on ubuntu CI, this time waiting the full 25s safety cap with APPENDED_LINE never appearing, so it is NOT a timing-window flake. The two sibling follow tests (which detect a NEW file via the 3s full-reread poll) passed fast; only this one, which appends to the file already being followed (surfaced by the incremental byte-offset poll), fails. Cannot reproduce locally even under CPU saturation (32 cores).

Add targeted diagnostics that fire when APPENDED_LINE is missing, to pin down which of three causes it is on the next CI run:
- append never reached disk (appendTriggered / appendError + on-disk content)
- logs -f never detected it (on-disk has the line, captured stdout does not)
- output truncated on SIGINT exit (timedOut + line on disk but not captured)

Also surface appendFile rejections (previously swallowed) and have the helper report whether it hit the safety cap.

Refs #77
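One way to read the three-way triage above is as a small decision function over the collected evidence. The sketch below is illustrative only: FollowEvidence, Cause, and classifyMissingLine are hypothetical names, and using timedOut to separate the second and third causes is my reading of the list, not something the test is confirmed to do.

```typescript
// Hypothetical sketch of the triage; the field names mirror the diagnostics
// named in the commit message, everything else is an assumption.
interface FollowEvidence {
  appendTriggered: boolean; // did the test reach the appendFile call?
  appendError?: string;     // rejection message from appendFile, if any
  onDisk: string;           // log file content read back after the run
  captured: string;         // stdout captured from logs -f
  timedOut: boolean;        // did the helper hit its safety cap?
}

type Cause =
  | "line-was-captured"
  | "append-never-reached-disk"
  | "follow-never-detected"
  | "output-truncated-on-exit";

function classifyMissingLine(marker: string, e: FollowEvidence): Cause {
  if (e.captured.includes(marker)) return "line-was-captured";
  const appendFailed = !e.appendTriggered || e.appendError !== undefined;
  if (appendFailed || !e.onDisk.includes(marker)) {
    return "append-never-reached-disk";
  }
  // The line is on disk but not in the captured output.
  return e.timedOut ? "output-truncated-on-exit" : "follow-never-detected";
}
```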
Fixes the two flaky CI failures seen on master run 27133947713 (macos + ubuntu). Both tests pass locally; the failures were CI-runner-load timing races, confirmed by reading the job logs.
Changes
logs -f follow tests (logs.test.ts): macos failure

The three bitsocial logs -f log file rotation tests killed the CLI on a fixed 8s timer anchored to process spawn. That window had to cover CLI cold-start (oclif/pkc-js import, several seconds) plus printing the initial marker, detecting the file change, and flushing the second marker. On a slow runner, startup ate the budget and the process was SIGINT'd before the awaited line was emitted: macos saw the initial line but never APPENDED_LINE.

Replaced the inline duplicated promise logic with a shared runFollowUntil() helper that:
- kills as soon as the awaited output is observed (doneWhen), decoupling correctness from startup latency;
- accepts an optional graceMs so the --stdout filter test can still catch an erroneously-unfiltered stderr line before killing (negative assertion stays meaningful);
- enforces a 25s safety cap (kept below testTimeout so a genuine hang still fails as an assertion, not a timeout).

These tests now finish in ~1–5s locally instead of always 8s, while tolerating up to 25s of startup slowness on CI.
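A minimal sketch of the output-anchored pattern, for readers who have not opened the diff. It is an approximation written for this description, not the helper in logs.test.ts: only the names runFollowUntil, doneWhen, graceMs and timedOut come from this PR, while FollowOptions, FollowResult, safetyCapMs and the exact signature are assumptions.

```typescript
import { spawn } from "node:child_process";

// Assumed option and result shapes; the real helper may differ.
interface FollowOptions {
  doneWhen: (stdout: string) => boolean; // condition on the output seen so far
  graceMs?: number;     // keep capturing this long after doneWhen matches
  safetyCapMs?: number; // hard upper bound on the whole run
}

interface FollowResult {
  stdout: string;
  stderr: string;
  timedOut: boolean; // true when the safety cap fired before doneWhen matched
}

function runFollowUntil(
  cmd: string,
  args: string[],
  opts: FollowOptions,
): Promise<FollowResult> {
  const { doneWhen, graceMs = 0, safetyCapMs = 25_000 } = opts;
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    let stdout = "";
    let stderr = "";
    let timedOut = false;
    let graceTimer: ReturnType<typeof setTimeout> | undefined;

    const stop = () => child.kill("SIGINT");

    // Safety cap: a genuine hang ends here and surfaces as a failed assertion.
    const capTimer = setTimeout(() => {
      timedOut = true;
      stop();
    }, safetyCapMs);

    child.stdout.on("data", (chunk: Buffer) => {
      stdout += chunk.toString();
      // The kill is anchored to the output, not to process spawn.
      if (graceTimer === undefined && doneWhen(stdout)) {
        graceTimer = setTimeout(stop, graceMs);
      }
    });
    child.stderr.on("data", (chunk: Buffer) => {
      stderr += chunk.toString();
    });

    child.on("error", (err) => {
      clearTimeout(capTimer);
      if (graceTimer !== undefined) clearTimeout(graceTimer);
      reject(err);
    });
    child.on("close", () => {
      clearTimeout(capTimer);
      if (graceTimer !== undefined) clearTimeout(graceTimer);
      resolve({ stdout, stderr, timedOut });
    });
  });
}
```

A test would then assert on the result, for example that stdout contains APPENDED_LINE and that timedOut is false, so a slow cold-start only delays the match instead of eating a fixed budget.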
kubo restart-race test (daemon-kubo-restart-race.test.ts): ubuntu failure

The failing line (expect(kuboRestarted).toBe(true)) is a setup precondition, not the assertion under test: after /shutdown, the daemon must auto-restart kubo before we can verify it dies on SIGTERM. The 30s detect window was too tight given the test injects PKC_CLI_TEST_IPFS_READY_DELAY_MS=5000 and the runner may be loaded with parallel kubo nodes. It ran 43.8s before failing on ubuntu CI but lands in ~13s locally. Widened to 60s, still well within the test's 120s cap.

No production code changed; test-harness robustness only.
Verification
Both files pass locally after the change:
- logs.test.ts: 29/29 passed (file 51s → 36s)
- daemon-kubo-restart-race.test.ts: 4/4 passed

Refs #77