fix(gc): add random delay to avoid a cluster of nodes run GC and reboot RPC services at the same time#6639
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a small randomized delay to the Snapshot GC scheduling check so each GC cycle samples a jitter and requires an extra epoch gap before running snapshot GC. Changes touch the scheduler logic, user docs, and changelog; references issue Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files
... and 2 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
e5a5c3b to
e278108
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/db/gc/snapshot.rs (1)
211-211: Consider logging the random delay for observability.The scheduled GC log message includes
snap_gc_interval_epochsbut not thegc_interval_random_delay_epochsthat was applied. Logging this value would help operators understand why different nodes trigger GC at different times.💡 Proposed enhancement
- tracing::info!(%car_db_head_epoch, %head_epoch, %network_head_epoch, %snap_gc_interval_epochs, "Snap GC scheduled"); + tracing::info!(%car_db_head_epoch, %head_epoch, %network_head_epoch, %snap_gc_interval_epochs, %gc_interval_random_delay_epochs, "Snap GC scheduled");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/db/gc/snapshot.rs` at line 211, The log line using tracing::info! currently emits car_db_head_epoch, head_epoch, network_head_epoch and snap_gc_interval_epochs but omits the actual random delay applied; update the tracing::info! call in snapshot.rs (the site that logs "Snap GC scheduled") to include gc_interval_random_delay_epochs (or whatever local variable holds the random delay) so operators can see the applied delay value; ensure the log fields include %gc_interval_random_delay_epochs alongside the existing %... fields and keep the message text unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/db/gc/snapshot.rs`:
- Line 211: The log line using tracing::info! currently emits car_db_head_epoch,
head_epoch, network_head_epoch and snap_gc_interval_epochs but omits the actual
random delay applied; update the tracing::info! call in snapshot.rs (the site
that logs "Snap GC scheduled") to include gc_interval_random_delay_epochs (or
whatever local variable holds the random delay) so operators can see the applied
delay value; ensure the log fields include %gc_interval_random_delay_epochs
alongside the existing %... fields and keep the message text unchanged.
e278108 to
a46482c
Compare
…ot RPC services at the same time
a46482c to
0c711ec
Compare
LesnyRumcajs
left a comment
There was a problem hiding this comment.
Should we add a note in the GC docs? And a changelog entry?
@LesnyRumcajs Fixed. |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
docs/docs/users/guides/gc.md (2)
8-8: Fix spacing in the default interval sentence.Minor formatting: add a space in “20160 epochs(7 days)”.
Proposed tweak
-The default GC interval is 20160 epochs(7 days). +The default GC interval is 20160 epochs (7 days).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/docs/users/guides/gc.md` at line 8, The sentence "The default GC interval is 20160 epochs(7 days)." has a missing space before the parenthesis; update the text to "The default GC interval is 20160 epochs (7 days)." and ensure the environment variable name FOREST_SNAPSHOT_GC_INTERVAL_EPOCHS remains unchanged.
9-9: Tighten grammar and keep the docs’ friendly tone.“avoid a cluster of nodes run GC” is ungrammatical. Consider a smoother, approachable phrasing.
Proposed tweak
-Note that, an extra random small delay is added to the GC interval on every GC cycle to avoid a cluster of nodes run GC and reboot RPC services at the same time. +Note: an extra small random delay is added to each GC cycle so a whole cluster doesn’t all run GC (and reboot RPC) at once.Based on learnings: “Maintain a playful, approachable tone in Forest project documentation (Markdown files) rather than strictly formal/corporate language.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/docs/users/guides/gc.md` at line 9, Fix the ungrammatical phrase in the sentence that reads "avoid a cluster of nodes run GC and reboot RPC services at the same time" by rewording it to something smooth and friendly (e.g., "avoid many nodes running GC and rebooting RPC services at the same time" or "prevent a cluster of nodes from all running GC and restarting RPC services simultaneously") while preserving the playful, approachable tone used elsewhere in the Forest docs; update the sentence in the GC guide (the line describing the extra random small delay added to the GC interval) to one of these clearer phrasings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@docs/docs/users/guides/gc.md`:
- Line 8: The sentence "The default GC interval is 20160 epochs(7 days)." has a
missing space before the parenthesis; update the text to "The default GC
interval is 20160 epochs (7 days)." and ensure the environment variable name
FOREST_SNAPSHOT_GC_INTERVAL_EPOCHS remains unchanged.
- Line 9: Fix the ungrammatical phrase in the sentence that reads "avoid a
cluster of nodes run GC and reboot RPC services at the same time" by rewording
it to something smooth and friendly (e.g., "avoid many nodes running GC and
rebooting RPC services at the same time" or "prevent a cluster of nodes from all
running GC and restarting RPC services simultaneously") while preserving the
playful, approachable tone used elsewhere in the Forest docs; update the
sentence in the GC guide (the line describing the extra random small delay added
to the GC interval) to one of these clearer phrasings.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.mddocs/docs/users/guides/gc.md
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes #6594
Other information and links
Change checklist
Outside contributions
Summary by CodeRabbit