@yacovm
Collaborator

@yacovm yacovm commented Oct 2, 2025

This commit makes the node request past ancestor blocks and notarizations it doesn't have.

It is needed in case the node notarizes one or more empty blocks while the rest of the nodes notarize blocks in these rounds.

@yacovm yacovm force-pushed the requestPreviousBlocks branch 2 times, most recently from b751a1f to 0571fa8 Compare October 9, 2025 12:58
@yacovm yacovm changed the title ... Request ancestor blocks for previous rounds Oct 9, 2025
@yacovm yacovm force-pushed the requestPreviousBlocks branch 3 times, most recently from 401e2f3 to 3374993 Compare October 9, 2025 17:42
This commit makes the node request past ancestor blocks and notarizations it doesn't have.

It is needed in case the node notarizes one or more empty blocks while the rest of the nodes notarize blocks in these rounds.

Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
@yacovm yacovm force-pushed the requestPreviousBlocks branch from 3374993 to 8cdf1c6 Compare October 10, 2025 20:20
Collaborator

@samliok samliok left a comment

I'm concerned this is a lot of code changes and additional complexity for a corner case. All we are trying to do is resend the notarization if we receive a block with a parent we don't know. Why can't our current logic with timeouts and replication solve this? When our node eventually times out on the round, it will send an empty vote. If that vote is for an old round, nodes will respond with their most recent round/seq as well as the round/seq for the stale empty vote.

Secondly, and I think most importantly, if a node empty-notarizes a round then it shouldn't be swayed by notarizations, right? Only a finalization should be able to change that node's mind; otherwise we can get a quorum of nodes that sign off on both an empty notarization & a finalization for the same round.

recordedMessages := make(chan *Message, 100)
comm := &recordingComm{Communication: testutil.NoopComm(nodes), SentMessages: recordedMessages, BroadcastMessages: recordedMessages}

bb2 := &testutil.TestBlockBuilder{Out: make(chan *testutil.TestBlock, 1), BlockShouldBeBuilt: make(chan struct{}, 1)}

why create a second BB?

t.Log("Last block is", lastBlock.BlockHeader().Seq, "for round", lastBlock.BlockHeader().Round)

leaderIndexOfLastBlock := (int(lastBlock.BlockHeader().Round)) % len(nodes)
voteOnLastBlock, err := testutil.NewTestVote(lastBlock, nodes[leaderIndexOfLastBlock])

this err value never gets checked

return nil, nil
}
seq := msg.NotarizedBlockRequest.Seq
for i, b := range blocks {

ik this is a test so it doesn't really matter as much, but would it be easier to store blocks as a map indexed by their seqs?
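
A minimal sketch of the map-based lookup suggested here. The `testBlock` type and the `indexBySeq` helper are hypothetical stand-ins for illustration; the real test works with `testutil.TestBlock` and its own request handler, which are not reproduced.

```go
package main

import "fmt"

// testBlock is a hypothetical stand-in for the block type used in the test.
type testBlock struct {
	seq  uint64
	data string
}

// indexBySeq builds a map from sequence number to block, so a request
// handler can look a block up directly instead of scanning a slice.
func indexBySeq(blocks []testBlock) map[uint64]testBlock {
	bySeq := make(map[uint64]testBlock, len(blocks))
	for _, b := range blocks {
		bySeq[b.seq] = b
	}
	return bySeq
}

func main() {
	blocks := []testBlock{{seq: 0, data: "a"}, {seq: 1, data: "b"}, {seq: 2, data: "c"}}
	bySeq := indexBySeq(blocks)

	// Look up the requested seq; ok is false when no such block is stored.
	if b, ok := bySeq[1]; ok {
		fmt.Println("found block", b.seq, b.data)
	}
	if _, ok := bySeq[7]; !ok {
		fmt.Println("no block stored for seq 7")
	}
}
```

The trade-off is small in a test: the slice keeps insertion order for free, while the map removes the loop and the index bookkeeping from the handler.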


type NotarizedBlockResponse struct {
	Block         Block
	VerifiedBlock VerifiedBlock

I think we should separate these into NotarizedBlockResponse and VerifiedNotarizedBlockResponse just like the other messages

e.increaseRound()
increasedRound = true
}
if increasedRound {

is this important for this pr, or is it a separate bug fix?

func (e *Epoch) maybeCreateFinalizeVoteForAncestor(digest Digest) Digest {
	for roundNum, round := range e.rounds {
		if round.block.BlockHeader().Digest != digest {
			continue

if we are checking for this, should we log a warn? every round in the rounds map should always have a block

zap.Int("size", len(recordBytes)),
zap.Stringer("digest", finalization.Finalization.BlockHeader.Digest))

e.finalizeAncestors(finalization.Finalization.Prev)

can we add a short comment to why this is important, i feel like I am going to forget down the line

vote := message.Vote
from = vote.Signature.Signer

e.Logger.Debug("Handling block message", zap.Stringer("digest", md.Digest), zap.Uint64("round", md.Round))

why is this log being removed?

return e.Storage.NumBlocks()
}

func (e *Epoch) haveWeTimedOutOnRound(round uint64) bool {

we have a method with a very similar name already. haveWeAlreadyTimedOutOnThisRound checks for timedOut while this checks for an emptyNotarization

return nil
}

if response.Block != nil {

this will always be true since we check the negation above

@yacovm
Collaborator Author

yacovm commented Oct 14, 2025

> Why can't our current logic with timeouts and replication solve this?

Because here we replicate rounds in the past, and the replication logic replicates rounds from the future.

> When our node eventually times out on the round, it will send an empty vote. If that vote is for an old round, nodes will respond with their most recent round/seq as well as the round/seq for the stale empty vote.

Here the assumption is that all nodes are in the latest round. This doesn't apply for nodes that are behind.

> Secondly, and I think most importantly, if a node empty-notarizes a round then it shouldn't be swayed by notarizations, right? Only a finalization should be able to change that node's mind; otherwise we can get a quorum of nodes that sign off on both an empty notarization & a finalization for the same round.

If a node notarizes an empty round, it will not send a finalize vote for that round. However, it should still be receptive to blocks built on a valid alternative chain. There is no problem replicating notarizations as long as we remember not to send finalize votes for them.

Otherwise, a single node that missed one or more blocks may force the rest of the nodes to notarize empty rounds until it is the leader again.
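
The rule described above can be sketched as follows. This is only an illustration under stated assumptions: `roundState`, `acceptNotarization`, and `maySendFinalizeVote` are hypothetical names, not the `Epoch` implementation.

```go
package main

import "fmt"

// roundState is a hypothetical per-round record, not the Epoch's real state.
type roundState struct {
	emptyNotarized    bool // this node signed off on an empty notarization for the round
	blockNotarization bool // this node holds a (possibly replicated) block notarization
}

// acceptNotarization stores a replicated block notarization. It is allowed
// even when the round was empty-notarized locally, so that later blocks
// built on this round's block can still be verified.
func (r *roundState) acceptNotarization() {
	r.blockNotarization = true
}

// maySendFinalizeVote reports whether a finalize vote may be sent for the
// round: only if a block notarization is held and the node never
// empty-notarized the round.
func (r *roundState) maySendFinalizeVote() bool {
	return r.blockNotarization && !r.emptyNotarized
}

func main() {
	timedOut := &roundState{emptyNotarized: true}
	timedOut.acceptNotarization()
	fmt.Println("holds notarization:", timedOut.blockNotarization)
	fmt.Println("may send finalize vote:", timedOut.maySendFinalizeVote())
}
```

Keeping the two flags separate is what lets replication and the finalize-vote restriction coexist: storing the notarization never flips the node's earlier empty-round decision.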

Consider we have 4 nodes and one (node 1) has built and broadcast a block.
Nodes 2 and 3 received the block and sent votes, but only node 2 managed to collect a notarization for the block.
Node 0 hasn't received the block at all, nor the votes or notarization.

The rest of the nodes (0, 1, 3) notarize an empty round for that round.
We are now in the next round where node 2 is the leader.
Now node 2 broadcasts a block built on top of the block that node 1 has built.
At this point, if nodes 0, 1 and 3 don't replicate the notarization and node 0 doesn't replicate the block, then we will notarize an empty block for that round and move to the next node, but this will incur needless wait time.

A bigger problem is if node 3 crashes in this round - then we will notarize an empty round for the block of node 2 and also for the block of node 3.
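
The scenario above can be sketched as a small table of what each node holds once node 2 proposes its block. The round numbers (1 and 2), the `nodeView` type, and both helper functions are assumptions made for illustration; the round-robin leader rule follows the `round % len(nodes)` expression used in the test excerpt earlier in this review.

```go
package main

import "fmt"

// nodeView is a hypothetical summary of what a node holds for the round
// in which node 1 built its block.
type nodeView struct {
	hasBlock        bool
	hasNotarization bool
}

// leaderForRound applies the round-robin rule: round % number of nodes.
func leaderForRound(round uint64, numNodes int) int {
	return int(round) % numNodes
}

// missingForChild lists what a node must fetch before it can accept a
// block built on top of the parent round's block.
func missingForChild(v nodeView) []string {
	var missing []string
	if !v.hasBlock {
		missing = append(missing, "block")
	}
	if !v.hasNotarization {
		missing = append(missing, "notarization")
	}
	return missing
}

func main() {
	const numNodes = 4
	views := []nodeView{
		{hasBlock: false, hasNotarization: false}, // node 0 saw nothing
		{hasBlock: true, hasNotarization: false},  // node 1 built the block
		{hasBlock: true, hasNotarization: true},   // node 2 collected the notarization
		{hasBlock: true, hasNotarization: false},  // node 3 voted but saw no notarization
	}

	fmt.Println("leader of the next round:", leaderForRound(2, numNodes))
	for id, v := range views {
		fmt.Printf("node %d must fetch: %v\n", id, missingForChild(v))
	}
}
```

Under these assumptions only node 2 can accept its own proposal without fetching anything, which is why the other three nodes time out on the round unless they request the ancestor data.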

> I'm concerned this is a lot of code changes and additional complexity for a corner case.

I tend to agree. If we're OK with unwanted and sub-optimal latency in case of a network failure then we can just agree to not address this corner case.

@yacovm yacovm closed this Oct 14, 2025