@LesnyRumcajs
Member

@LesnyRumcajs LesnyRumcajs commented Dec 10, 2025

Summary of changes

This fixes a critical issue in the mpool selection logic that was caught by the MpoolSelect Killer.

2025-12-09T23:29:21.665081Z  WARN forest::message_pool::msgpool::selection: optimal selection failed to pack a block; picked 31 messages with random selection

thread 'tokio-runtime-worker' (2435) panicked at src/message_pool/msgpool/selection.rs:550:32:
index out of bounds: the len is 3 but the index is 3
...
from scripts/mpool_select_killer.rb:17:in `<main>'

It seems the original author made an off-by-one-line mistake when translating the Go code in 2021: the swap ended up one line below the loop's closing brace, and Rust's variable shadowing rules let it compile anyway.

		if chains[last].valid {
			for i := last; i < len(chains)-1; i++ {
				if chains[i].BeforeEffective(chains[i+1]) {
					break
				}
				chains[i], chains[i+1] = chains[i+1], chains[i]
			}
		}

vs

            if chains[last].valid {
                for i in last..chains.len() - 1 {
                    if chains[i].cmp_effective(&chains[i + 1]) == Ordering::Greater {
                        break;
                    }
                }

                chains.key_vec.swap(i, i + 1);
            }
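
To illustrate why the Rust snippet above compiles at all, here is a minimal sketch of the shadowing effect. It is a hypothetical reduction, not Forest's code: `misplaced_swap`, the `scores` slice, and the outer `i` passed in as a parameter are all assumptions made for illustration, and plain integer comparison stands in for `cmp_effective`.

```rust
/// Hypothetical reduction of the bug: the `i` bound by the `for` loop shadows
/// an outer `i` (here a parameter), so a swap placed after the loop silently
/// uses the outer value instead of the loop index.
fn misplaced_swap(scores: &mut [u32], last: usize, i: usize) {
    // The loop variable `i` only exists inside the loop body.
    for i in last..scores.len().saturating_sub(1) {
        if scores[i] > scores[i + 1] {
            break;
        }
    }
    // This `i` is the outer one: the swap runs exactly once, regardless of
    // what the loop decided, and panics if `i + 1` is out of bounds.
    scores.swap(i, i + 1);
}
```

With an outer `i` equal to `len - 1`, the swap indexes one past the end of the slice, which is the kind of failure reported in the panic message above.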

Changes introduced in this pull request:

  • Fixed the faulty logic to match Lotus. Sadly, the panic is not easily reproducible.
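
A minimal sketch of the corrected shape, mirroring the Go loop quoted earlier: the swap sits inside the loop body, after the ordering check. This is an illustration under assumptions, not the actual patch. `bubble_down` and the `scores` slice are hypothetical names, and integer comparison stands in for `cmp_effective`.

```rust
use std::cmp::Ordering;

/// Bubble the element at `last` down a list kept in descending order.
/// `scores` is a hypothetical stand-in for the chain list.
fn bubble_down(scores: &mut [u32], last: usize) {
    // `saturating_sub` guards the empty-slice case in this sketch only.
    for i in last..scores.len().saturating_sub(1) {
        // Stop as soon as the element already sorts before its successor.
        if scores[i].cmp(&scores[i + 1]) == Ordering::Greater {
            break;
        }
        // Otherwise move it one position down and keep going.
        scores.swap(i, i + 1);
    }
}
```

For example, calling `bubble_down` on `[9, 2, 7, 5]` with `last = 1` moves the `2` to the end and leaves the slice in descending order.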

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed a panic that could occur under certain message pool conditions.
    • Fixed the Filecoin.MpoolSelect RPC method.


@LesnyRumcajs LesnyRumcajs requested a review from a team as a code owner December 10, 2025 11:15
@LesnyRumcajs LesnyRumcajs requested review from akaladarshi and hanabi1224 and removed request for a team December 10, 2025 11:15
@LesnyRumcajs LesnyRumcajs marked this pull request as draft December 10, 2025 11:15
@coderabbitai
Contributor

coderabbitai bot commented Dec 10, 2025

Walkthrough

Fixes a panic that could occur under certain message pool conditions by moving a swap operation inside a loop in the selection logic. The change is a minimal structural adjustment that corrects the underlying issue referenced in PR #6325.

Changes

Cohort / File(s) and summary:

  • Documentation Updates (CHANGELOG.md): Adds a "fixed" entry documenting the panic resolution in the message pool's MpoolSelect RPC method handling.
  • Bug Fix (src/message_pool/msgpool/selection.rs): Relocates the swap operation inside the loop body, after the ordering verification; preserves per-iteration behavior and the loop exit condition.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • The change is highly localized with minimal logic alteration
  • Structural relocation preserves existing behavior, reducing cognitive load
  • No new complexity or control flow modifications introduced

Suggested reviewers

  • akaladarshi
  • hanabi1224

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name (status): explanation

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title 'fix: panic in mpool selection' directly and clearly summarizes the main change, fixing a panic in the message pool selection logic, which is the primary objective of this PR.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4008dd and e9ec5f4.

📒 Files selected for processing (2)
  • CHANGELOG.md (1 hunks)
  • src/message_pool/msgpool/selection.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.
Learnt from: LesnyRumcajs
Repo: ChainSafe/forest PR: 5907
File: src/rpc/methods/state.rs:523-570
Timestamp: 2025-08-06T15:44:33.467Z
Learning: LesnyRumcajs prefers to rely on BufWriter's Drop implementation for automatic flushing rather than explicit flush() calls in Forest codebase.
📚 Learning: 2025-09-02T10:05:34.350Z
Learnt from: akaladarshi
Repo: ChainSafe/forest PR: 5923
File: src/rpc/registry/actors/miner.rs:221-223
Timestamp: 2025-09-02T10:05:34.350Z
Learning: For miner actor ChangeOwnerAddress and ChangeOwnerAddressExported methods: versions 8-10 use bare Address as parameter type, while versions 11+ use ChangeOwnerAddressParams. This reflects the evolution of the Filecoin miner actor parameter structures across versions.

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.

Applied to files:

  • src/message_pool/msgpool/selection.rs
🧬 Code graph analysis (1)
src/message_pool/msgpool/selection.rs (1)
src/chain_sync/chain_follower.rs (1)
  • chains (555-581)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: All lint checks
🔇 Additional comments (3)
CHANGELOG.md (1)

44-44: CHANGELOG entry is clear and well-placed.

The fix is documented appropriately in the unreleased section with the PR reference and affected RPC method. No changes needed.

src/message_pool/msgpool/selection.rs (2)

544-549: Fix correctly relocates swap operation inside the loop to prevent out-of-bounds access.

The swap of chains.key_vec[i] and chains.key_vec[i+1] now executes only when the effective performance ordering check does not break out of the loop (i.e., when cmp_effective does not return Ordering::Greater and the chain keeps bubbling down). This prevents the out-of-bounds panic that occurred when the swap ran unconditionally after the loop, outside its body.

Loop bounds are safe: the loop iterates i from last to chains.len() - 2, so i + 1 never exceeds chains.len() - 1.
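
The bounds argument can be checked with a small sketch. `max_touched_index` is a hypothetical helper written for this illustration and is not part of the change; it computes the largest index the loop body can touch for a given `last` and length.

```rust
/// Largest index the bubble-down loop can touch (`i + 1`), or `None` when
/// the loop body never runs. Hypothetical helper for illustration only.
fn max_touched_index(last: usize, len: usize) -> Option<usize> {
    // `last..len - 1` is a half-open range, so `i` stops at `len - 2`.
    // `saturating_sub` only avoids underflow for `len == 0` in this sketch.
    (last..len.saturating_sub(1)).map(|i| i + 1).max()
}
```

For a list of length 3 and `last = 0`, the loop touches index 2 at most; for `last = 2`, the body never runs.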


768-776: Consistent bubble-down pattern confirmed.

The similar swap logic in the merge_and_trim function at lines 768–776 follows the same corrected pattern: ordering check followed by conditional swap inside the loop. This consistency indicates the fix has been uniformly applied across the message selection logic.



@LesnyRumcajs LesnyRumcajs marked this pull request as ready for review December 10, 2025 11:21
@LesnyRumcajs LesnyRumcajs added this pull request to the merge queue Dec 10, 2025
Merged via the queue into main with commit 27a4609 Dec 10, 2025
56 of 57 checks passed
@LesnyRumcajs LesnyRumcajs deleted the fix-mempool-crash branch December 10, 2025 12:12