[MEL] - Prevent MEL node startup if have non-MEL entries in ConsensusDB #4449

Merged
rauljordan merged 7 commits into master from verify-consensusdb-startingonMEL
Mar 23, 2026

@ganeshvanahalli
Contributor

This PR makes the node return an error on startup when the MEL flag is enabled but the inbox tracker / reader already have values in the DB (meaning the node was previously started in non-MEL mode).

Resolves NIT-4571
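
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `kvReader` interface, the in-memory `memDB`, the key byte values, and the function name `checkNoLegacyEntries` are all assumptions. Only the idea comes from the description: refuse to start in MEL mode when keys written by the inbox tracker / reader are already present.

```go
package main

import (
	"fmt"
)

// kvReader is a hypothetical stand-in for the read side of the consensus DB.
type kvReader interface {
	Has(key []byte) (bool, error)
}

// memDB is an in-memory kvReader used only for this illustration.
type memDB map[string]struct{}

func (m memDB) Has(key []byte) (bool, error) {
	_, ok := m[string(key)]
	return ok, nil
}

// legacyKeys lists keys that only a non-MEL node would have written.
// The byte values are placeholders, not the real Nitro keys.
var legacyKeys = [][]byte{
	[]byte("hypothetical-sequencer-batch-count"),
	[]byte("hypothetical-delayed-message-count"),
}

// checkNoLegacyEntries returns an error if MEL is enabled and the DB
// already holds any key from legacyKeys. With MEL disabled it is a no-op.
func checkNoLegacyEntries(db kvReader, melEnabled bool) error {
	if !melEnabled {
		return nil
	}
	for _, key := range legacyKeys {
		found, err := db.Has(key)
		if err != nil {
			return err
		}
		if found {
			return fmt.Errorf("MEL enabled but DB already has non-MEL key %q", key)
		}
	}
	return nil
}

func main() {
	fresh := memDB{}
	legacy := memDB{"hypothetical-delayed-message-count": {}}

	fmt.Println("fresh DB, MEL on:", checkNoLegacyEntries(fresh, true))
	fmt.Println("legacy DB, MEL off:", checkNoLegacyEntries(legacy, false))
	fmt.Println("legacy DB, MEL on:", checkNoLegacyEntries(legacy, true))
}
```

Only the last call reports an error; a fresh DB, or a legacy DB with MEL disabled, passes the check.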

Contributor

@bragaigor bragaigor left a comment

LGTM, just a few minor comments

@github-actions
Contributor

github-actions bot commented Mar 2, 2026

❌ 11 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 4527 | 11 | 4516 | 0 |
View the top 3 failed tests by shortest run time
TestAliasingFlaky
Stack Traces | -0.000s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
INFO [03-23|11:15:23.326] Persisted trie from memory database      nodes=110 flushnodes=0 size=13.31KiB flushsize=0.00B time="242.046µs" flushtime=0s gcnodes=0 gcsize=0.00B gctime="3.737µs"   livenodes=110 livesize=20.59KiB
INFO [03-23|11:15:23.326] Writing cached state to disk             block=6  hash=16d613..2ed3c8 root=580204..914557
INFO [03-23|11:15:23.326] Starting work on payload                 id=0x030c2f3ca3872673
INFO [03-23|11:15:23.326] Persisted trie from memory database      nodes=17  flushnodes=0 size=3.31KiB  flushsize=0.00B time="44.367µs"  flushtime=0s gcnodes=0 gcsize=0.00B gctime=0s          livenodes=93  livesize=17.28KiB
INFO [03-23|11:15:23.326] Writing cached state to disk             block=1  hash=fc6caa..6a62a4 root=2b3915..838bec
INFO [03-23|11:15:23.326] Persisted trie from memory database      nodes=24  flushnodes=0 size=4.41KiB  flushsize=0.00B time="65.289µs"  flushtime=0s gcnodes=0 gcsize=0.00B gctime=0s          livenodes=69  livesize=12.87KiB
INFO [03-23|11:15:23.326] Writing snapshot state to disk           root=77ae46..2fbcae
INFO [03-23|11:15:23.326] Persisted trie from memory database      nodes=0   flushnodes=0 size=0.00B    flushsize=0.00B time="19.29µs"   flushtime=0s gcnodes=0 gcsize=0.00B gctime=0s          livenodes=69  livesize=12.87KiB
INFO [03-23|11:15:23.326] Updated payload                          id=0x030c2f3ca3872673                      number=42 hash=355157..748e90 txs=1  withdrawals=0 gas=21000     fees=0.002086177744 root=4e5139..7ec46a elapsed="233.763µs"
INFO [03-23|11:15:23.326] Stopping work on payload                 id=0x030c2f3ca3872673                      reason=delivery
INFO [03-23|11:15:23.326] Blockchain stopped
INFO [03-23|11:15:23.327] Imported new potential chain segment     number=42 hash=355157..748e90 blocks=1  txs=1  mgas=0.021 elapsed="422.925µs" mgasps=49.654   triediffs=205.33KiB triedirty=0.00B
INFO [03-23|11:15:23.327] Chain head was updated                   number=42 hash=355157..748e90 root=4e5139..7ec46a elapsed="27.702µs"
INFO [03-23|11:15:23.327] Ethereum protocol stopped
INFO [03-23|11:15:23.327] Transaction pool stopped
INFO [03-23|11:15:23.327] Persisting dirty state                   head=33 root=10bd30..880db8 layers=33
INFO [03-23|11:15:23.328] Persisted dirty state to disk            size=162.18KiB elapsed="819.098µs"
INFO [03-23|11:15:23.328] Blockchain stopped
INFO [03-23|11:15:23.328] Submitted transaction                    hash=0xb31a044a6e7328e9dc7e368a664b26943305c58a9d4a5669512bc77266bc3a9f from=0xaF24Ca6c2831f4d4F629418b50C227DF0885613A nonce=4  recipient=0xaF24Ca6c2831f4d4F629418b50C227DF0885613A value=1,000,000,000,000
INFO [03-23|11:15:23.329] Starting work on payload                 id=0x03fab9d9863897ad
TestBatchPosterL1SurplusMatchesBatchGasFlaky
Stack Traces | 0.550s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2075dd2]

goroutine 11 [running]:
testing.tRunner.func1.2({0x37d3e80, 0x61d69b0})
	/opt/hostedtoolcache/go/1.25.8/x64/src/testing/testing.go:1872 +0x237
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.25.8/x64/src/testing/testing.go:1875 +0x35b
panic({0x37d3e80?, 0x61d69b0?})
	/opt/hostedtoolcache/go/1.25.8/x64/src/runtime/panic.go:783 +0x132
github.com/offchainlabs/nitro/arbnode.(*InboxTracker).GetBatchCount(0x13087900?)
	/home/runner/work/nitro/nitro/arbnode/inbox_tracker.go:210 +0x12
github.com/offchainlabs/nitro/arbnode.(*InboxTracker).FindInboxBatchContainingMessage(0x0, 0x7)
	/home/runner/work/nitro/nitro/arbnode/inbox_tracker.go:225 +0x2f
github.com/offchainlabs/nitro/system_tests.TestBatchPosterL1SurplusMatchesBatchGasFlaky(0xc0000ffdc0)
	/home/runner/work/nitro/nitro/system_tests/batch_poster_test.go:838 +0x725
testing.tRunner(0xc0000ffdc0, 0x41a2f88)
	/opt/hostedtoolcache/go/1.25.8/x64/src/testing/testing.go:1934 +0xea
created by testing.(*T).Run in goroutine 1
	/opt/hostedtoolcache/go/1.25.8/x64/src/testing/testing.go:1997 +0x465
TestRedisProduceComplex/one_producer,_all_consumers_are_active
Stack Traces | 1.260s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
DEBUG[03-23|15:40:26.192] consumer: xack                           cid=eeed71a9-1372-42ad-b8d2-645a1aab4993 messageId=1774280425074-2
DEBUG[03-23|15:40:26.192] Redis stream consuming                   consumer_id=9b402edc-0263-49ca-a2f8-0f1a58b5f6d5 message_id=1774280425075-1
DEBUG[03-23|15:40:26.192] consumer: setting result                 cid=9b402edc-0263-49ca-a2f8-0f1a58b5f6d5 msgIdInStream=1774280425075-1  resultKeyInRedis=result-key:stream:f5bdbcfd-596f-43cb-ac6a-ceb3e2b0d249.1774280425075-1
DEBUG[03-23|15:40:26.193] consumer: xack                           cid=631dcff8-b245-4cb9-b4d3-1629b2254939 messageId=1774280425052-4
DEBUG[03-23|15:40:26.197] consumer: xack                           cid=8f631fec-04de-492f-ac11-a518c7e0227d messageId=1774280425075-0
DEBUG[03-23|15:40:26.197] consumer: xdel                           cid=631dcff8-b245-4cb9-b4d3-1629b2254939 messageId=1774280425052-4
DEBUG[03-23|15:40:26.197] consumer: xack                           cid=3ba2fbcd-e728-47be-8085-9e4d5551ff72 messageId=1774280425052-5
DEBUG[03-23|15:40:26.197] consumer: xdel                           cid=8f631fec-04de-492f-ac11-a518c7e0227d messageId=1774280425075-0
DEBUG[03-23|15:40:26.197] consumer: xdel                           cid=3ba2fbcd-e728-47be-8085-9e4d5551ff72 messageId=1774280425052-5
DEBUG[03-23|15:40:26.198] consumer: xdel                           cid=eeed71a9-1372-42ad-b8d2-645a1aab4993 messageId=1774280425074-2
DEBUG[03-23|15:40:26.198] consumer: xack                           cid=9b402edc-0263-49ca-a2f8-0f1a58b5f6d5 messageId=1774280425075-1
DEBUG[03-23|15:40:26.198] consumer: xack                           cid=c838c592-2940-4da8-967d-f2904abd22e6 messageId=1774280425052-3
DEBUG[03-23|15:40:26.199] consumer: xdel                           cid=9b402edc-0263-49ca-a2f8-0f1a58b5f6d5 messageId=1774280425075-1
DEBUG[03-23|15:40:26.199] consumer: xdel                           cid=c838c592-2940-4da8-967d-f2904abd22e6 messageId=1774280425052-3
DEBUG[03-23|15:40:26.199] trimming                                 xTrimMinID=1774280425052-3  trimmed=0 trim-err=<nil>
DEBUG[03-23|15:40:26.254] checkResponses                           responded=83 errored=0 checked=99
DEBUG[03-23|15:40:26.264] redis producer: check responses starting
DEBUG[03-23|15:40:26.284] checkResponses                           responded=16 errored=0 checked=16
ERROR[03-23|15:40:26.284] Error from XpendingExt in getting PEL for auto claim err="context canceled" pendingLen=0
--- FAIL: TestRedisProduceComplex/one_producer,_all_consumers_are_active (1.26s)
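
In the `TestBatchPosterL1SurplusMatchesBatchGasFlaky` trace above, `FindInboxBatchContainingMessage` is shown with `0x0` as its first argument. For a pointer-receiver method that argument is the receiver, so the call appears to have been made on a nil `*InboxTracker`. The sketch below reproduces that failure mode in isolation and shows one way to guard against it. The `tracker` type and both helper functions are hypothetical and are not taken from the Nitro codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// tracker is a hypothetical component that may be left nil in some node modes.
type tracker struct {
	batchCount uint64
}

// BatchCount reads a field, so it dereferences the receiver.
func (t *tracker) BatchCount() uint64 {
	return t.batchCount
}

// safeBatchCount checks for a nil tracker and returns an error instead of panicking.
func safeBatchCount(t *tracker) (uint64, error) {
	if t == nil {
		return 0, errors.New("tracker is not available in this mode")
	}
	return t.BatchCount(), nil
}

// panicsOnNil calls BatchCount on a nil pointer and reports whether it panicked.
func panicsOnNil() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var t *tracker
	_ = t.BatchCount()
	return false
}

func main() {
	fmt.Println("unguarded call panicked:", panicsOnNil())

	_, err := safeBatchCount(nil)
	fmt.Println("guarded call error:", err)

	n, err := safeBatchCount(&tracker{batchCount: 7})
	fmt.Println("guarded call on real tracker:", n, err)
}
```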

@codecov

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 35.29412% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.12%. Comparing base (2fd04bf) to head (7a5d00b).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4449      +/-   ##
==========================================
+ Coverage   34.54%   35.12%   +0.58%     
==========================================
  Files         497      497              
  Lines       58907    58934      +27     
==========================================
+ Hits        20347    20701     +354     
+ Misses      34965    34550     -415     
- Partials     3595     3683      +88     

bragaigor previously approved these changes Mar 3, 2026
Contributor

@bragaigor bragaigor left a comment

LGTM

diegoximenes previously approved these changes Mar 6, 2026
@diegoximenes diegoximenes assigned eljobe and unassigned diegoximenes Mar 6, 2026
Member

@eljobe eljobe left a comment

Would it be prohibitively expensive to write some unit tests for this?
I think it would be great to be sure we have the right branches covered.

arbnode/node.go Outdated
return err
}
if hasDelayedMessageCountKey {
return errors.New("MEL being initialized when DB already has stale keys from inbox reader")
Member

It would be better if this error were distinguishable from the one on line 836 when we see them in a log. Like, it would be good to mention that this one found a stale DelayedMessageCountKey and the other one found a SequencerBatchCountKey.

Contributor Author

added the tests

@eljobe eljobe assigned ganeshvanahalli and unassigned eljobe Mar 9, 2026
@amsanghi amsanghi removed their request for review March 13, 2026 08:14
@ganeshvanahalli ganeshvanahalli removed their assignment Mar 19, 2026
eljobe previously approved these changes Mar 23, 2026
Base automatically changed from raul/mel-pr3-core to master March 23, 2026 10:29
@rauljordan rauljordan dismissed stale reviews from eljobe, diegoximenes, and bragaigor March 23, 2026 10:29

The base branch was changed.

@eljobe eljobe enabled auto-merge March 23, 2026 11:00
@eljobe eljobe added this pull request to the merge queue Mar 23, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 23, 2026
@rauljordan rauljordan added this pull request to the merge queue Mar 23, 2026
@rauljordan rauljordan changed the title Prevent MEL node startup if have non-MEL entries in ConsensusDB [MEL] - Prevent MEL node startup if have non-MEL entries in ConsensusDB Mar 23, 2026
Merged via the queue into master with commit 02a5905 Mar 23, 2026
40 of 41 checks passed
@rauljordan rauljordan deleted the verify-consensusdb-startingonMEL branch March 23, 2026 16:23
5 participants