Skip to content

Conversation

@hanabi1224
Copy link
Contributor

@hanabi1224 hanabi1224 commented Nov 21, 2025

Summary of changes

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes #6259

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Enhanced error handling in daemon startup to gracefully log warnings instead of hard failures
    • Enabled actor loading in stateless node mode for proper system initialization

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

The pull request removes the --skip-load-actors flag from Forest node startup in test scripts and adds conditional guards in the daemon startup logic to prevent calling maybe_rewind_heaviest_tipset when in stateless mode or when actor loading is skipped. Error handling for the rewind call is also changed from fatal to non-fatal logging.

Changes

Cohort / File(s) Summary
Test script flag removals
scripts/tests/calibnet_stateless_rpc_check.sh, scripts/tests/harness.sh
Removed --skip-load-actors flag from Forest binary invocations in node startup commands
Daemon startup guard logic
src/daemon/mod.rs
Added conditional guards to maybe_rewind_heaviest_tipset call to skip execution when stateless mode is enabled or skip_load_actors is true; changed error handling from propagating to logging as warning

Sequence Diagram(s)

sequenceDiagram
    participant Main as Forest Main
    participant Daemon as Daemon
    participant Rewind as maybe_rewind_heaviest_tipset
    participant Store as Actor Store

    Main->>Daemon: start_services()
    
    rect rgb(240, 248, 255)
    Note over Daemon: NEW: Check stateless && skip_load_actors flags
    end
    
    alt NOT (stateless OR skip_load_actors)
        Daemon->>Rewind: Call rewind
        alt Success
            Rewind->>Store: Access actor data
            Rewind-->>Daemon: Success
        else Error (e.g., missing actors)
            rect rgb(255, 240, 240)
            Note over Daemon: NEW: Log warning instead of fail
            end
            Rewind-->>Daemon: Error
            Daemon->>Daemon: Log error as warning
            Daemon-->>Main: Continue
        end
    else Stateless OR skip_load_actors
        rect rgb(255, 250, 205)
        Note over Daemon: NEW: Skip rewind call
        end
        Daemon-->>Main: Continue
    end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • The test script changes are purely cosmetic flag removal with consistent patterns across files.
  • The daemon logic change is straightforward: adding guards and converting an error handler from fatal to non-fatal.
  • Specific areas for attention:
    • Verify that the conditional guards (!stateless && !skip_load_actors) correctly reflect the intended behavior when actors are unavailable.
    • Confirm that converting the rewind error to a warning is the appropriate resolution and doesn't mask legitimate issues.
    • Validate that removing the flag from test scripts won't cause unintended side effects in the stateless test scenarios.

Possibly related PRs

Suggested reviewers

  • LesnyRumcajs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the primary fix: addressing an error caused by the --skip-load-actors flag.
Linked Issues check ✅ Passed The PR removes the --skip-load-actors flag from startup commands and modifies rewind logic to skip execution when skip_load_actors is set, directly addressing issue #6259's requirement to fix Forest's failure with this flag.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the --skip-load-actors error; modifications are limited to relevant test scripts and daemon initialization logic.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/fix-error-in-skip-load-actors

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 264c99b and a686bca.

📒 Files selected for processing (3)
  • scripts/tests/calibnet_stateless_rpc_check.sh (1 hunks)
  • scripts/tests/harness.sh (1 hunks)
  • src/daemon/mod.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.947Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
📚 Learning: 2025-09-16T12:55:26.955Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6079
File: src/chain_sync/metrics.rs:6-7
Timestamp: 2025-09-16T12:55:26.955Z
Learning: HEAD_EPOCH references in shell scripts (like scripts/tests/calibnet_eth_mapping_check.sh) that extract data from `forest-cli info show` are unrelated to Rust metrics constants with the same name and should not be flagged when metrics cleanup is performed.

Applied to files:

  • scripts/tests/calibnet_stateless_rpc_check.sh
  • scripts/tests/harness.sh
📚 Learning: 2025-08-07T09:37:03.079Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.

Applied to files:

  • scripts/tests/harness.sh
🧬 Code graph analysis (1)
src/daemon/mod.rs (3)
src/tool/subcommands/api_cmd/test_snapshot.rs (1)
  • ctx (131-182)
src/tool/subcommands/api_cmd/generate_test_snapshot.rs (1)
  • ctx (104-157)
src/rpc/methods/sync.rs (1)
  • ctx (177-252)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: All lint checks
🔇 Additional comments (3)
scripts/tests/harness.sh (1)

89-89: LGTM - Redundant flag removed correctly.

The removal of --skip-load-actors is correct since --stateless implies --skip-load-actors (as noted in past review comments). This eliminates redundancy and aligns with the daemon's updated logic that guards actor loading when in stateless mode.

scripts/tests/calibnet_stateless_rpc_check.sh (1)

13-13: LGTM - Consistent redundancy removal.

This change mirrors the update in scripts/tests/harness.sh, removing the redundant --skip-load-actors flag since --stateless mode inherently skips actor loading.

src/daemon/mod.rs (1)

598-603: Verify that non-fatal error handling for state validation is appropriate in this context.

The review concern is valid. The function's documentation indicates it validates that the state tree exists in the blockstore and the actor bundle version matches configuration. If this validation fails, converting the error to a warning contradicts the codebase's philosophy on state errors (per learnings: state corruption should be fatal, not logged).

Key inconsistencies identified:

  • The guard conditions prevent calling the function in stateless mode—correct. But they don't justify ignoring validation failures when the function IS called.
  • All other state_manager calls in daemon startup use ? (fail-fast), except this one and non-critical background cache warmup.
  • No documentation or comments explain why this particular state validation error is recoverable.
  • No downstream state validation found to catch issues that rewind failures might mask.

The behavior change warrants either: (1) propagating the error with ?, (2) adding explicit comments justifying why state validation failures are non-fatal, or (3) implementing additional validation to ensure daemon state remains consistent after rewind failure.


Comment @coderabbitai help to get the list of available commands and usage tips.

pkill -9 forest || true
local filter_list=$1

$FOREST_PATH --chain calibnet --encrypt-keystore false --log-dir "$LOG_DIRECTORY" --skip-load-actors --stateless --rpc-filter-list "$filter_list" &
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--stateless implies --skip-load-actors

@hanabi1224 hanabi1224 marked this pull request as ready for review November 21, 2025 10:20
@hanabi1224 hanabi1224 requested a review from a team as a code owner November 21, 2025 10:20
@LesnyRumcajs LesnyRumcajs added this pull request to the merge queue Nov 21, 2025
Merged via the queue into main with commit d1ec2c8 Nov 21, 2025
46 checks passed
@LesnyRumcajs LesnyRumcajs deleted the hm/fix-error-in-skip-load-actors branch November 21, 2025 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Forest fails on skip-load-actors on devnet

3 participants