Conversation
## Walkthrough

Introduces end-to-end nonce handling: per-sender locking (`MpoolLocker`), global nonce serialization/persistence (`NonceTracker`), ID→key resolution and caching, stricter nonce-gap rules, RPC wiring to use the new signing/sign-and-push flow, plus docs, tests, and CI/script tweaks.

## Changes
## Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant RPC as RPC<br/>MpoolPushMessage
    participant Locker as MpoolLocker
    participant Tracker as NonceTracker
    participant Mpool as MessagePool
    participant Signer as KeyMgmt<br/>sign_message
    participant Store as SettingsStore
    Client->>RPC: mpool_push_message(msg)
    RPC->>RPC: estimate gas / resolve sender→key_addr
    RPC->>Locker: take_lock(key_addr)
    activate Locker
    RPC->>RPC: validate wallet balance
    RPC->>Tracker: sign_and_push(message, key, eth_chain_id)
    activate Tracker
    Tracker->>Tracker: acquire global nonce mutex
    Tracker->>Mpool: get_sequence(from_addr)
    activate Mpool
    Mpool->>Mpool: resolve_to_key & compute state-including-tipset sequence
    Mpool->>Store: read persisted nonce (if needed)
    Mpool-->>Tracker: mpool_nonce
    deactivate Mpool
    Tracker->>Tracker: choose nonce = max(mpool_nonce, persisted)
    Tracker->>Signer: sign_message(key, message, eth_chain_id)
    activate Signer
    Signer-->>Tracker: SignedMessage
    deactivate Signer
    Tracker->>Mpool: push(signed_msg)
    activate Mpool
    Mpool-->>Tracker: push result
    deactivate Mpool
    Tracker->>Store: save_nonce(addr, nonce+1) (log on failure)
    Tracker-->>RPC: SignedMessage / CID
    deactivate Tracker
    deactivate Locker
```
## Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

## Possibly related PRs

## Suggested reviewers
## Pre-merge checks

✅ Passed checks (5 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/tool/subcommands/api_cmd/generate_test_snapshot.rs (1)
**146-160: ⚠️ Potential issue | 🟠 Major — Don't bind `NonceTracker` to the shared snapshot-generation DB.**

`load_db()` gives every snapshot run the same `Arc<ReadOpsTrackingStore<ManyCar<ParityDb>>>`, and this wrapper writes settings through to `inner`, not `tracker`. With `nonce_tracker` pointed at `state_manager.blockstore_owned()`, persisted nonce-cache updates will bleed into later snapshot generations and may still be absent from the exported minimal snapshot unless those keys get read again. This will make stateful fixtures order-dependent once `MpoolPushMessage`/`MpoolGetNonce` snapshots are added.

Based on learnings, "when using ReadOpsTrackingStore to generate minimal snapshots, HEAD_KEY should be written to db.tracker (not db itself) before calling export_forest_car(), because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tool/subcommands/api_cmd/generate_test_snapshot.rs` around lines 146 - 160, NonceTracker is being constructed against state_manager.blockstore_owned() which points at the full DB; instead bind the nonce cache to the ReadOpsTrackingStore tracker used for snapshot generation so persisted nonce writes go into the ephemeral tracker not the underlying DB. Change the NonceTracker creation to use the tracking store (the db.tracker returned by load_db() / the ReadOpsTrackingStore wrapper) rather than state_manager.blockstore_owned(), and ensure HEAD_KEY is written to db.tracker (tracker) before calling export_forest_car() so the export reads the tracked (accessed) keys only.

src/message_pool/msgpool/mod.rs (1)
**324-342: ⚠️ Potential issue | 🟠 Major — Canonicalize the `rmsgs` key as well.**

After this refactor `pending` is keyed by resolved key address, but lines 333-342 still probe `rmsgs` by raw `from`. On a revert/apply where the same actor shows up as `f0…` on one side and in key-address form on the other, the apply path misses the reverted entry, removes the pending entry instead, and then re-adds the reverted message at the end.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/message_pool/msgpool/mod.rs` around lines 324 - 342, The function remove_from_selected_msgs is checking rmsgs by the raw from address while pending is keyed by resolved key addresses; call resolve_to_key(api, key_cache, from, cur_ts) once at the start of the lookup and then use the resolved Address for both rmsgs and pending accesses (i.e., replace rmsgs.get_mut(from) with rmsgs.get_mut(&resolved) and use resolved in the remove(...) calls), ensuring you still propagate the Result error from resolve_to_key and keep the existing logic for removing the sequence from the temp map when present.
🧹 Nitpick comments (3)
scripts/tests/calibnet_wallet_check.sh (1)
**76-76: Avoid scraping the human `list` table for `ADDR_ONE`.**

This script already had to change once because the `list` layout moved. Pulling the address out of `tail | cut` keeps the test coupled to presentation-only output and will break again on the next formatting tweak. Prefer a command or output mode that returns just the address.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/tests/calibnet_wallet_check.sh` at line 76, the current extraction of ADDR_ONE by piping "$FOREST_WALLET_PATH list | tail -1 | cut -d ' ' -f2" scrapes the human-formatted table and is fragile; change the ADDR_ONE assignment to use a machine-readable listing option or output mode from $FOREST_WALLET_PATH (e.g., a flag that emits just addresses or JSON) and parse that output (or use a --quiet/--format option) to reliably select the desired address, so the script uses the wallet CLI's non-presentational output instead of scraping the printed table.

src/message_pool/mpool_locker.rs (1)

**17-22: Document `MpoolLocker::new()`.**

The public constructor is the only exposed `MpoolLocker` API here without rustdoc. As per coding guidelines, "Document public functions and structs with doc comments".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/message_pool/mpool_locker.rs` around lines 17 - 22, add a doc comment for the public constructor MpoolLocker::new() describing what a MpoolLocker represents and what the new() instance contains/initializes (e.g., an empty inner Mutex<HashMap> used to track per-message-pool locks), include any thread-safety or ownership notes relevant to callers, and place the triple-slash /// doc immediately above the impl block or the new() function so rustdoc will document it alongside the MpoolLocker API.

src/message_pool/msgpool/test_provider.rs (1)
**229-244: Consider adding BLS message support for completeness.**

The `messages_for_tipset` implementation only collects signed messages. While this is consistent with `TestApi`'s internal storage (which only stores `SignedMessage`), the production `ChainStore::messages_for_tipset` collects both unsigned BLS and signed SECP messages via `BlockMessages::for_tipset`.

This discrepancy is acceptable for current tests since they primarily use SECP messages, but if future tests require BLS message handling during head changes, this may need enhancement.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/message_pool/msgpool/test_provider.rs` around lines 229 - 244, messages_for_tipset currently only returns signed messages (using inner.bmsgs and ChainMessage::Signed), so add support for BLS/unsigned messages by extending TestApi's storage and collecting them in the same loop: add a container for BLS/unsigned messages (e.g., inner.bls_msgs or similar) and in messages_for_tipset iterate blocks' CIDs to append both signed messages (ChainMessage::Signed(Arc::new(...))) and unsigned/BLS messages (ChainMessage::Unsigned or the appropriate ChainMessage variant for BLS) into msgs before returning; update any TestApi methods that populate messages so tests can insert BLS messages into the new storage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Line 40: The changelog entry currently references the PR number; update the
entry in CHANGELOG.md so it links to the issue `#4899` instead of PR `#6788`
(replace the PR link "[`#6788`](...)" with the issue link "[`#4899`](...)" and keep
the description "Fixed message pool nonce calculation to align with Lotus.") to
follow the repo convention of preferring issue references when both issue and PR
exist.
In `@scripts/tests/calibnet_delegated_wallet_check.sh`:
- Around line 87-100: The loop that polls DELEGATE_ADDR_REMOTE_THREE_BALANCE can
false-pass because it only compares to the prior observed balance and doesn’t
ensure MSG_DELEGATE_FOUR actually landed; change the logic in the block using
MSG_DELEGATE_FOUR, DELEGATE_ADDR_REMOTE_THREE_BALANCE and
DELEGATE_ADDR_THREE_BALANCE to either (A) wait for the message CID in
MSG_DELEGATE_FOUR to be included/confirmed before returning (preferred), or (B)
if keeping balance-polling, fail the script when the loop reaches the 20 retry
timeout instead of continuing silently (exit non-zero and log an error), and
keep updating DELEGATE_ADDR_REMOTE_THREE_BALANCE via $FOREST_WALLET_PATH
--remote-wallet balance each iteration; ensure the chosen approach references
MSG_DELEGATE_FOUR for waiting or uses an explicit exit on timeout.
In `@scripts/tests/calibnet_wallet_check.sh`:
- Around line 157-181: The loops currently only compare balances
(ETH_ADDR_TWO_BALANCE / ETH_ADDR_THREE_BALANCE) and will silently succeed after
retries even if the send commands (MSG_ETH, MSG_ETH_REMOTE) never produced a
message CID; change the logic to wait on the actual returned message CIDs from
MSG_ETH and MSG_ETH_REMOTE (extract the CID from MSG_ETH / MSG_ETH_REMOTE) and
poll the node/wallet for that CID confirmation, or at minimum make the 20-retry
timeout fatal by exiting non-zero if the CID was not observed; update the two
loops that reference ETH_ADDR_TWO_BALANCE and ETH_ADDR_THREE_BALANCE to use the
extracted CID variables and fail with exit 1 on timeout so the script cannot
pass silently.
In `@src/message_pool/mpool_locker.rs`:
- Around line 52-74: Tests use timing sleeps which makes them flaky; replace the
sleep-based handshakes in the two test blocks (the tasks spawned as t1/t2 that
call locker.take_lock and use first_entered/first_released/second_saw_first)
with a deterministic sync primitive such as tokio::sync::Notify or
tokio::sync::Barrier: have the first task signal (notify.notify_one() or
barrier.wait().await) immediately after it acquires the lock (where it currently
sets entered.store) and have the second task await that signal before attempting
its assertion (instead of sleeping), and likewise use a notify or barrier to
signal release instead of the 20ms/100ms sleeps; apply the same change to the
other test block referenced (the one around lines 94-116) so both tests no
longer rely on timing.
In `@src/message_pool/msgpool/msg_pool.rs`:
- Around line 307-315: The code masks resolve_to_key failures by using
unwrap_or(msg.from()), which can undercount nonces; change the scan to propagate
resolution errors instead: call resolve_to_key(api, key_cache, &msg.from(),
cur_ts)? (or otherwise handle the Result and return Err) instead of unwrap_or,
and adjust the surrounding function's signature/return path to propagate the
error; refer to resolve_to_key, messages_for_tipset, msg.from(), and next_nonce
when making this change so the failure during the current tipset scan is not
silently converted to the original address.
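The difference between masking and propagating the resolution failure can be illustrated with a toy resolver. `resolve_to_key` here is a hypothetical stand-in whose failure mode (rejecting `f0…` IDs) is invented purely for the example:

```rust
// Hypothetical resolver: fails for unresolvable ID addresses.
fn resolve_to_key(addr: &str) -> Result<String, String> {
    if addr.starts_with("f0") {
        Err(format!("cannot resolve {addr} to a key address"))
    } else {
        Ok(addr.to_string())
    }
}

// Before: a failure silently falls back to the raw address, so the scan
// attributes the message to the wrong key and can undercount nonces.
fn key_addr_masked(from: &str) -> String {
    resolve_to_key(from).unwrap_or_else(|_| from.to_string())
}

// After: the error is surfaced to the caller (the real scan would use `?`).
fn key_addr_propagated(from: &str) -> Result<String, String> {
    resolve_to_key(from)
}

fn main() {
    let masked = key_addr_masked("f0123");
    let propagated = key_addr_propagated("f0123");
    assert_eq!(masked, "f0123"); // silently kept the unresolved address
    assert!(propagated.is_err()); // failure is visible instead
    println!("masked: {masked}; propagated: {propagated:?}");
}
```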
In `@src/message_pool/nonce_tracker.rs`:
- Around line 81-87: The code currently pushes the signed message with
mpool.push(smsg.clone()).await? and then calls self.save_nonce(&message.from,
nonce)? which can return an error and cause MpoolPushMessage to report failure
despite the message already being in the mempool; change this to make
persistence best-effort: call self.save_nonce(&message.from, nonce) but do not
propagate its error — instead catch/log the error (e.g. with error!/warn!) and
still return Ok(smsg). Keep the same call order (sign_message, mpool.push,
save_nonce) but ensure save_nonce failures do not convert the whole operation
into an error.
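The best-effort persistence shape described above, reduced to a minimal std-only sketch (the real code signs and pushes through the mpool and logs via `error!`/`warn!`; the function bodies here are placeholders):

```rust
// Placeholder for the settings-store write that may fail.
fn save_nonce(fail: bool) -> Result<(), String> {
    if fail {
        Err("settings store write failed".into())
    } else {
        Ok(())
    }
}

fn sign_and_push(persist_fails: bool) -> Result<String, String> {
    // sign_message(...) and mpool.push(...) would run here, each with `?`,
    // because a failure *before* the push should still abort the call.
    let smsg = String::from("signed-message");
    // Persistence after the push is best-effort: the message is already in
    // the mempool, so a save failure is logged, not propagated.
    if let Err(e) = save_nonce(persist_fails) {
        eprintln!("failed to persist nonce (continuing): {e}");
    }
    Ok(smsg)
}

fn main() {
    let ok_despite_persist_failure = sign_and_push(true).is_ok();
    assert!(ok_despite_persist_failure);
    println!("push reported success: {ok_despite_persist_failure}");
}
```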
In `@src/rpc/methods/mpool.rs`:
- Around line 301-306: MpoolGetNonce is reading the nonce directly from mpool
via mpool.get_sequence while WalletSignMessage/MpoolPush use
ctx.nonce_tracker.sign_and_push, causing mismatch after restarts; change
MpoolGetNonce to query the same NonceTracker (e.g., call the nonce-tracking
getter on ctx.nonce_tracker instead of mpool.get_sequence) so both flows use the
same source, and ensure any other code paths that return sequence numbers
(mpool.get_sequence, MpoolPushMessage) are routed through NonceTracker APIs to
keep nonce state consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9c53ddd6-e120-4024-a80f-227d8c503f0b
📒 Files selected for processing (24)
- .github/workflows/forest.yml
- CHANGELOG.md
- docs/docs/developers/guides/nonce_handling.md
- scripts/tests/calibnet_delegated_wallet_check.sh
- scripts/tests/calibnet_wallet_check.sh
- src/daemon/mod.rs
- src/key_management/wallet_helpers.rs
- src/message_pool/errors.rs
- src/message_pool/mod.rs
- src/message_pool/mpool_locker.rs
- src/message_pool/msgpool/mod.rs
- src/message_pool/msgpool/msg_pool.rs
- src/message_pool/msgpool/provider.rs
- src/message_pool/msgpool/selection.rs
- src/message_pool/msgpool/test_provider.rs
- src/message_pool/nonce_tracker.rs
- src/rpc/methods/mpool.rs
- src/rpc/methods/sync.rs
- src/rpc/methods/wallet.rs
- src/rpc/mod.rs
- src/tool/offline_server/server.rs
- src/tool/subcommands/api_cmd/generate_test_snapshot.rs
- src/tool/subcommands/api_cmd/test_snapshot.rs
- src/wallet/subcommands/wallet_cmd.rs
💤 Files with no reviewable changes (1)
- .github/workflows/forest.yml
🧹 Nitpick comments (1)
src/message_pool/nonce_tracker.rs (1)
**193-215: Good test coverage for sequential nonce assignment.**

The test verifies that consecutive `sign_and_push` calls assign sequential nonces (0, 1). Consider adding a concurrent test that spawns multiple tasks calling `sign_and_push` simultaneously to verify the global mutex correctly prevents nonce collisions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/message_pool/nonce_tracker.rs` around lines 193 - 215, Add a new async test alongside test_sign_and_push_assigns_sequential_nonces that spawns multiple concurrent tasks which call NonceTracker::sign_and_push using the same tracker, mpool, wallet key (from make_test_pool_and_wallet) and sender to ensure no two returned SignedMessage instances share the same sequence; use make_test_nonce_store() to create the tracker and await all tasks (e.g., via join_all) then assert that the set of message().sequence values is contiguous and unique (0..n-1) to verify the global mutex prevents nonce collisions under concurrency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ce841c06-e607-4d24-9a46-75e1d2eef067
📒 Files selected for processing (6)
- scripts/tests/calibnet_delegated_wallet_check.sh
- scripts/tests/calibnet_wallet_check.sh
- src/daemon/mod.rs
- src/message_pool/mpool_locker.rs
- src/message_pool/msgpool/msg_pool.rs
- src/message_pool/nonce_tracker.rs
✅ Files skipped from review due to trivial changes (1)
- src/message_pool/msgpool/msg_pool.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- scripts/tests/calibnet_delegated_wallet_check.sh
- src/message_pool/mpool_locker.rs
- scripts/tests/calibnet_wallet_check.sh
> ## State nonce calculation
>
> The **state nonce** is the next expected nonce derived from on-chain data. It is
> computed by `get_state_sequence` in `src/message_pool/msgpool/msg_pool.rs`:
Those should be permalinks in case anything changes within the code, i.e., renaming.

Any reason not to put this information in the message pool Rust module as top-level Rust docs? It improves discoverability and reduces the maintenance burden.

An additional benefit there is that if you mention functions/constants in the docs via links, anyone who modifies them gets immediate feedback from the compiler, which reduces code/docs drift.
@sudo-shashank Can you please summarise what the effective logic changes are?
> proceed in parallel for different senders while the nonce-critical section
> remains serialized.
>
> ## Nonce persistence
Why do we need to persist them?
```diff
 # Fund delegated wallet from preloaded wallet
-DELEGATE_FUND_AMT="2 micro FIL"
+DELEGATE_FUND_AMT="3 micro FIL"
```
```rust
let tma = TestApi::default();
let keystore = KeyStore::new(KeyStoreConfig::Memory).unwrap();
let mut wallet = Wallet::new(keystore);
let key_addr = wallet.generate_addr(SignatureType::Bls).unwrap();
```

Seems like there's a bit of code duplication in the tests; can we extract some common setup logic?
## Summary of changes

Changes introduced in this pull request:

- `forest-wallet list` cmd to include `Nonce` column.

## Reference issue to close (if applicable)
Closes #4899
Closes #3628
Other information and links
Change checklist
Outside contributions
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation