
fix: ntx builder adheres to note limit and store apply_block race condition fixed#1508

Merged
Mirko-von-Leipzig merged 9 commits into main from mirko/fix/ntx-builder
Jan 15, 2026

Conversation

Mirko-von-Leipzig (Collaborator) commented Jan 13, 2026

Caution

This PR targets main

Fixes several issues:

  • Race condition in the store's apply_block
    • A request cancellation (e.g. due to a timeout from the gRPC client) halts the process at some arbitrary location.
    • Solved by moving the process to a spawned task so it isn't cancelled along with the gRPC method.
  • Ntx builder not adhering to the protocol note limit in the checker
  • Ntx builder not marking notes from errored txs as failed
  • Improved telemetry

bobbinth (Contributor) left a comment


Looks good! Thank you! I left a few comments/questions inline.

Comment on lines +147 to 150
// Always mark notes as failed. They can get retried eventually.
state.notes_failed(candidate, notes.as_slice(), block_num);

state.candidate_failed(candidate);
bobbinth (Contributor) commented:

Probably not for this PR, but there seems to be some inconsistency in docs.

Specifically, doc comments for State::candidate_failed() say "All notes in the candidate will be marked as failed" - but I'm not sure that's true (otherwise, we probably wouldn't need to call State::notes_failed() right before State::candidate_failed() - right?).

Also, I'm curious what the rationale was for skipping certain error types before. Was the idea that if, for example, proving failed, then it is likely not the notes' fault and therefore we shouldn't penalize these notes as "failed"? If so, I would probably expand the comment to indicate that regardless of the source of the failure, we mark notes as failed so that they can be removed from the pending note set in case of unexpected failures (which is not ideal, and may need to be revisited later).

Mirko-von-Leipzig (Collaborator, Author) replied:

Probably not for this PR, but there seems to be some inconsistency in docs.

Yes, though on next none of this code exists because it's all been refactored into the actor model. This is still running the centralized version, so I'm... less concerned.

Also, I'm curious what the rationale was for skipping certain error types before. Was the idea that if, for example, proving failed, then it is likely not the notes' fault and therefore we shouldn't penalize these notes as "failed"?

I think it was a conservative implementation, where we only marked notes that actually failed the check, on the assumption that all other errors were caused by external factors.

Comment on lines +73 to +79
// We perform the apply_block work in a separate task. This prevents the caller cancelling
// the request and thereby cancelling the task at an arbitrary point of execution.
//
// Normally this shouldn't be a problem, however our apply_block isn't quite ACID compliant
// so things get a bit messy. This is more a temporary hack-around to minimize this risk.
let this = self.clone();
tokio::spawn(
bobbinth (Contributor) commented:

My understanding is that this makes it so that apply_block() always completes, even if the gRPC request is canceled/times out - right? And the goal of this is to prevent the database getting stuck in the locked state - correct? If so, I'd maybe mention the last part more explicitly in the comment.

Also, AFAICT, this doesn't prevent the block producer from "de-syncing" from the store. Basically, what we could have is:

  1. Block producer builds a block and sends it to the store.
  2. apply_block() takes too long.
  3. Block producer's gRPC request times out.
  4. The task doesn't get canceled and the block gets inserted into the DB.

At this point, the store will be at block $n$ while the block producer will be at block $n-1$. If we set reasonable timeouts, this should be extremely rare, but the block producer should be able to recover from this. I'm assuming that's not the case yet - and if so, let's create an issue for this.

Mirko-von-Leipzig (Collaborator, Author) replied:

My understanding is that this makes it so that apply_block() always completes, even if the gRPC request is canceled/times out - right?

Yes, though it's important to note that this only removes the problem for gRPC request cancellation. It doesn't prevent bugs within the task itself from placing us in a weird state. For example, it's still possible that the in-memory data is updated but the database commit fails, because we don't have ACID guarantees wrapping both. Guaranteeing this requires a bit of a rethink.

And the goal of this is to prevent the database getting stuck in the locked state - correct?

Not quite; this will likely just make the error different. The database "locked" error was a result of multiple block submission requests running concurrently. Request x times out and its task is cancelled, but the database portion keeps running since it hasn't hit an await point yet. Request y is then submitted, and gets rejected since x is still holding the db write lock.

In other words, the error is a result of gRPC timeouts being exceeded, and we can't really change that beyond making the error message more palatable, e.g. "block already in-progress". I'd probably prefer to think this through more so we can incorporate the ACID guarantees as well, somehow.

Also, AFAICT, this doesn't prevent the block producer from "de-syncing" from the store.
...
I'm assuming that's not the case yet - and if so, let's create an issue for this.

Correct; and this is simply an artifact of our decentralized component design here. #1513.

@Mirko-von-Leipzig Mirko-von-Leipzig merged commit 50c9ed1 into main Jan 15, 2026
7 checks passed
@Mirko-von-Leipzig Mirko-von-Leipzig deleted the mirko/fix/ntx-builder branch January 15, 2026 06:04