
refactor: msg pool to make more structured part 2 #7006

Draft
akaladarshi wants to merge 4 commits into main from akaladarshi/msgpool-refactor-phase2

Conversation

Collaborator

@akaladarshi akaladarshi commented May 6, 2026

Summary of changes

Changes introduced in this pull request:

  • This PR is part 2 of the msg pool restructuring. It contains:

    • LocalStore, which holds all data related to local state in the msgpool
    • RepublishState, which holds the structures for handling republishing of messages in the mempool
    • A Caches struct, which holds all the caches related to the msgpool
  • This change should be applied on top of refactor: msg pool to make more structured #6965

  • The next part will contain major changes:

    • Use Arc on MessagePool itself rather than on each individual field, which will allow us to:
      • Pass the message pool directly into the head-change trigger, which will become part of the MessagePool instead of being a free function with unlimited params
      • Convert republish_cycle into part of the MessagePool, instead of being a free function with unlimited params
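A rough sketch of the layout this part works toward, using empty placeholder structs in place of the real Forest types (field names follow the PR description; this is an illustration, not the actual implementation):

```rust
use std::sync::Arc;

// Placeholder components standing in for the real structs; the actual
// types hold LRU caches, local address/message sets, and a trigger channel.
struct Caches;
struct LocalStore;
struct RepublishState;

// MessagePool owns its Caches directly and shares LocalStore and
// RepublishState with background tasks through Arc.
struct MessagePool {
    caches: Caches,
    local: Arc<LocalStore>,
    republish: Arc<RepublishState>,
}

impl MessagePool {
    fn new() -> Self {
        let local = Arc::new(LocalStore);
        let republish = Arc::new(RepublishState);
        // The real constructor would clone `local` and `republish` into the
        // head-change and republish background tasks here.
        MessagePool {
            caches: Caches,
            local,
            republish,
        }
    }
}
```

Per the plan above, a later part would move the Arc up to MessagePool itself, so the head-change trigger and republish_cycle can take the whole pool instead of long parameter lists.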

Reference issue to close (if applicable)

Part of #7010

Other information and links

Change checklist

  • I have performed a self-review of my own code.
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards.
  • I have added tests that prove my fix is effective or that my feature works (if possible).
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

Summary by CodeRabbit

  • Refactor
    • Improved internal message pool architecture by consolidating cache management and streamlining message republishing coordination for better code maintainability.

@akaladarshi akaladarshi requested a review from a team as a code owner May 6, 2026 07:19
@akaladarshi akaladarshi requested review from hanabi1224 and sudo-shashank and removed request for a team May 6, 2026 07:19
@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

Review Change Stack

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5663401d-fa80-4f78-a01d-3a2beadeb3d7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This PR consolidates message pool cache and state management by introducing a Caches struct that groups four LRU caches, a LocalStore that centralizes local address/message tracking, and a RepublishState that replaces the ad-hoc channel-and-hashset republish coordination; all call sites and tasks are updated to use the new abstractions.

Changes

Message Pool State Consolidation

Layer / File(s) Summary
New Data Structures
src/message_pool/msgpool/republish.rs, src/message_pool/msgpool/local_store.rs, src/message_pool/msgpool/msg_pool.rs
RepublishState tracks republished CIDs with async trigger channel; LocalStore manages local addresses and messages with thread-safe reads; Caches struct groups bls_sig, sig_val, key, and state_nonce LRU caches.
Module Wiring
src/message_pool/msgpool/mod.rs
Module declarations include new local_store and republish submodules; imports reorganized to expose StateNonceCacheKey, RepublishState, and LocalStore.
MessagePool Struct and Constructor
src/message_pool/msgpool/msg_pool.rs
MessagePool replaces individual cache fields and ad-hoc local/republish tracking with caches: Caches, local: Arc<LocalStore>, and republish: Arc<RepublishState>; constructor instantiates all new structures and spawns background tasks.
Message Republishing Logic
src/message_pool/msgpool/mod.rs
republish_pending_messages and head_change accept RepublishState and LocalStore references; messages marked via republish.mark_republished() and republishing triggered via republish.trigger().await?.
MessagePool Method Updates
src/message_pool/msgpool/msg_pool.rs, src/message_pool/msgpool/selection.rs
All methods access caches via self.caches.*, local messages via self.local.*, and pass updated cache/republish references to internal functions; signature verification, message recovery, state sequencing refactored accordingly.
Async Task Closures
src/message_pool/msgpool/msg_pool.rs
Head-change and republish task closures refactored to capture and clone new shared state (caches, republish, local) instead of standalone cache variables.
Test Updates
src/message_pool/msgpool/mod.rs
Unit tests for LocalStore and RepublishState validate insertion semantics, trigger channels, and set operations; test helper updated to reference pool.caches.sig_val.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • LesnyRumcajs
  • sudo-shashank
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main refactoring work, but lacks specificity about the key structural changes (LocalStore, RepublishState, Caches).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.


@akaladarshi akaladarshi requested a review from LesnyRumcajs May 6, 2026 07:19
Base automatically changed from akaladarshi/msgpool-refactor to main May 6, 2026 15:47
@akaladarshi akaladarshi force-pushed the akaladarshi/msgpool-refactor-phase2 branch from dd74a63 to a86d8c4 Compare May 7, 2026 07:15
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/message_pool/msgpool/local_store.rs`:
- Around line 26-28: The add() method currently unconditionally appends
resolved_from to self.local_addrs causing duplicates; change add() to avoid
duplicates by inserting only if the address is not already present (e.g., check
self.local_addrs.read().contains(&resolved_from) or convert local_addrs to a
HashSet and insert), so known_local_addrs() no longer grows per-message and
republish_pending_messages() won't re-resolve the same sender repeatedly; update
any code that assumes a Vec to handle the new container if you switch to
HashSet.

In `@src/message_pool/msgpool/mod.rs`:
- Around line 275-284: The republish trigger is using
RepublishState::mark_republished (which inserts) causing the logic to wake on
new CIDs instead of on CIDs already republished; change the check to a read-only
membership test by calling a new or existing RepublishState::was_republished
(implement it to return republished.contains(cid) without mutating state) and
use that in both loops (the branches around mpool_ctx.remove_from_selected_msgs
and the repub flag) so you only set repub = true when the CID was already in the
republished set.

In `@src/message_pool/msgpool/msg_pool.rs`:
- Around line 485-493: The load_local() implementation iterates
LocalStore::snapshot_msgs() (a HashSet) in non-deterministic order which causes
add() to fail with sequencing errors (SequenceTooLow, NonceGap,
DuplicateSequence) and may silently drop messages; fix by collecting
snapshot_msgs() into a vector, sort it deterministically by sender and
message().sequence before iterating, then call self.add(...) for each; update
the add() error handling in the closure used in load_local() so SequenceTooLow
still triggers local.remove_msg(&k) but other errors are either logged/warned
(including error kind) and left in local_msgs (or retried) rather than silently
ignored, referencing load_local, LocalStore::snapshot_msgs, add,
local.remove_msg, and the Error variants
SequenceTooLow/NonceGap/DuplicateSequence.

In `@src/message_pool/msgpool/republish.rs`:
- Around line 39-44: The trigger() method currently uses
self.trigger.send_async(()).await which can await and block head_change() when
the 4-slot wakeup buffer is full; replace the await send with a non-blocking
self.trigger.try_send(()) and treat a Full error as a no-op (return Ok(()))
because a full buffer already indicates a pending wake, while mapping other
errors into Error::Other with the error details. Keep the function signature and
callers (head_change(), republish_pending_messages()) unchanged; only change
send_async() -> try_send() and handle TrySendError::Full by dropping the signal
and returning Ok(()) while converting other TrySendError variants into the
existing Error::Other format.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 36a9a2fb-5de4-4a0e-85d3-0e9d2a8e6759

📥 Commits

Reviewing files that changed from the base of the PR and between 868b7c2 and a86d8c4.

📒 Files selected for processing (5)
  • src/message_pool/msgpool/local_store.rs
  • src/message_pool/msgpool/mod.rs
  • src/message_pool/msgpool/msg_pool.rs
  • src/message_pool/msgpool/republish.rs
  • src/message_pool/msgpool/selection.rs

Comment thread src/message_pool/msgpool/local_store.rs Outdated
Comment on lines +26 to +28
pub(in crate::message_pool) fn add(&self, msg: SignedMessage, resolved_from: Address) {
    self.local_addrs.write().push(resolved_from);
    self.local_msgs.write().insert(msg);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Deduplicate local senders on insert.

Line 27 appends every resolved_from, so known_local_addrs() grows with message count rather than sender count. republish_pending_messages() walks this list on every cycle, so one busy local account turns into unbounded duplicate address resolution and snapshot work.

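A minimal, runnable sketch of the deduplicating add() this comment asks for, using std types in place of Forest's Address/SignedMessage and lock types (a hypothetical simplification, not the Forest code):

```rust
use std::collections::HashSet;
use std::sync::RwLock;

// Hypothetical stand-ins for the real Address and SignedMessage types.
type Address = String;
type SignedMessage = String;

struct LocalStore {
    // HashSet instead of Vec: one entry per sender, however many messages.
    local_addrs: RwLock<HashSet<Address>>,
    local_msgs: RwLock<HashSet<SignedMessage>>,
}

impl LocalStore {
    fn new() -> Self {
        Self {
            local_addrs: RwLock::new(HashSet::new()),
            local_msgs: RwLock::new(HashSet::new()),
        }
    }

    // HashSet::insert is a no-op for an already-known sender, so the
    // address set grows with sender count, not message count.
    fn add(&self, msg: SignedMessage, resolved_from: Address) {
        self.local_addrs.write().unwrap().insert(resolved_from);
        self.local_msgs.write().unwrap().insert(msg);
    }

    fn known_local_addrs(&self) -> usize {
        self.local_addrs.read().unwrap().len()
    }
}
```

With this shape, republish_pending_messages() resolving each known local address once per cycle stays bounded by the number of distinct senders.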

Comment on lines 275 to 284
for msg in smsgs {
    mpool_ctx.remove_from_selected_msgs(&msg.from(), msg.sequence(), &mut rmsgs)?;
-   if !repub && republished.write().insert(msg.cid()) {
+   if !repub && republish.mark_republished(msg.cid()) {
        repub = true;
    }
}
for msg in msgs {
    mpool_ctx.remove_from_selected_msgs(&msg.from, msg.sequence, &mut rmsgs)?;
-   if !repub && republished.write().insert(msg.cid()) {
+   if !repub && republish.mark_republished(msg.cid()) {
        repub = true;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The republish trigger is checking the wrong condition.

mark_republished() returns true when the CID was not in the current-cycle set yet. Using that as the trigger condition flips the behavior: any previously unseen block message wakes the republisher, while a block that actually includes one of this cycle's republished messages does not. This needs a read-only membership check instead of an inserting check.

Suggested direction
- if !repub && republish.mark_republished(msg.cid()) {
+ if !repub && republish.was_republished(&msg.cid()) {
     repub = true;
 }

RepublishState::was_republished should read republished.contains(cid) without mutating the set.

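A sketch of the suggested read-only check alongside the inserting one, with std types standing in for the real Cid and lock types (illustrative only):

```rust
use std::collections::HashSet;
use std::sync::RwLock;

// u64 stands in for the real Cid type.
type Cid = u64;

struct RepublishState {
    republished: RwLock<HashSet<Cid>>,
}

impl RepublishState {
    fn new() -> Self {
        Self {
            republished: RwLock::new(HashSet::new()),
        }
    }

    // Inserting check: true when the CID was NOT yet in this cycle's set.
    fn mark_republished(&self, cid: Cid) -> bool {
        self.republished.write().unwrap().insert(cid)
    }

    // Read-only membership test: true when the CID WAS already republished,
    // without mutating the set. This is the condition head_change() wants.
    fn was_republished(&self, cid: &Cid) -> bool {
        self.republished.read().unwrap().contains(cid)
    }
}
```

Note how the two return values are inverted for a known CID: mark_republished() returns false, was_republished() returns true, which is why swapping one for the other flips the trigger condition.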

Comment on lines +485 to 493
pub fn load_local(&self) -> Result<(), Error> {
    for k in self.local.snapshot_msgs() {
        self.add(k.clone()).unwrap_or_else(|err| {
            if err == Error::SequenceTooLow {
                warn!("error adding message: {:?}", err);
-               local_msgs.remove(&k);
+               self.local.remove_msg(&k);
            }
        })
    }
⚠️ Potential issue | 🟠 Major | ⚡ Quick win


Sort local messages by sender and sequence before replaying on startup.

LocalStore::snapshot_msgs() returns messages from a HashSet, which iterates in non-deterministic order. Since add() enforces strict sequencing (if sequence > msg.message().sequence { return Err(SequenceTooLow) }), replaying out-of-sequence messages can fail with errors like NonceGap or DuplicateSequence. The closure only handles SequenceTooLow, so other errors are silently ignored—the message is not added to the pending store but remains in local_msgs, resulting in silent data loss on restart.

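The deterministic ordering step can be sketched as follows, with (sender, sequence) pairs standing in for real SignedMessages (a hypothetical helper, not the Forest code):

```rust
// Sort a snapshot by sender, then by sequence, so replay order is
// deterministic and each sender's nonces arrive in increasing order.
fn sorted_for_replay(mut snapshot: Vec<(String, u64)>) -> Vec<(String, u64)> {
    snapshot.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));
    snapshot
}
```

load_local() would collect snapshot_msgs() into such a vector before calling self.add() on each entry, so per-sender sequencing checks never see messages out of order.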

Comment thread src/message_pool/msgpool/republish.rs Outdated
Comment on lines +39 to +44
pub(in crate::message_pool) async fn trigger(&self) -> Result<(), Error> {
    self.trigger
        .send_async(())
        .await
        .map_err(|e| Error::Other(format!("Republish receiver dropped: {e}")))
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win



Use try_send() instead of send_async() to avoid blocking head-change processing on a full wakeup buffer.

trigger() is called from the head_change() loop to signal the republish task. Since this channel is purely a wakeup mechanism with a 4-slot buffer, using send_async() can unnecessarily block head-change processing if the buffer fills while the republish task is executing republish_pending_messages(). A full buffer already indicates the republish loop is scheduled to wake, so the signal should be dropped rather than block.

Proposed change
 pub(in crate::message_pool) async fn trigger(&self) -> Result<(), Error> {
-    self.trigger
-        .send_async(())
-        .await
-        .map_err(|e| Error::Other(format!("Republish receiver dropped: {e}")))
+    match self.trigger.try_send(()) {
+        Ok(()) | Err(flume::TrySendError::Full(())) => Ok(()),
+        Err(flume::TrySendError::Disconnected(())) => {
+            Err(Error::Other("Republish receiver dropped".to_owned()))
+        }
+    }
 }
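The drop-on-full semantics can also be demonstrated with the standard library's bounded channel (an illustrative stand-in; the actual code uses flume):

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

// Non-blocking wakeup: Ok and Full both count as success, because a full
// buffer already guarantees a pending wake; only a dropped receiver is an
// error worth surfacing.
fn trigger(tx: &SyncSender<()>) -> Result<(), String> {
    match tx.try_send(()) {
        Ok(()) | Err(TrySendError::Full(())) => Ok(()),
        Err(TrySendError::Disconnected(())) => {
            Err("Republish receiver dropped".to_owned())
        }
    }
}
```

Unlike send_async().await, this never parks the caller, so head_change() keeps processing even while the republish task is busy draining the channel.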

@akaladarshi akaladarshi marked this pull request as draft May 7, 2026 07:29
@akaladarshi akaladarshi added the RPC requires calibnet RPC checks to run on CI label May 7, 2026
@codecov

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 95.56962% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.25%. Comparing base (868b7c2) to head (a86d8c4).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/message_pool/msgpool/mod.rs 72.72% 1 Missing and 2 partials ⚠️
src/message_pool/msgpool/msg_pool.rs 92.85% 2 Missing and 1 partial ⚠️
src/message_pool/msgpool/republish.rs 97.91% 0 Missing and 1 partial ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
src/message_pool/msgpool/local_store.rs 100.00% <100.00%> (ø)
src/message_pool/msgpool/selection.rs 86.82% <100.00%> (ø)
src/message_pool/msgpool/republish.rs 97.91% <97.91%> (ø)
src/message_pool/msgpool/mod.rs 91.21% <72.72%> (-0.10%) ⬇️
src/message_pool/msgpool/msg_pool.rs 87.48% <92.85%> (-0.16%) ⬇️

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 868b7c2...a86d8c4. Read the comment docs.



Labels

RPC requires calibnet RPC checks to run on CI


1 participant