Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add height voting for chits #2102

Merged
merged 22 commits into from Oct 5, 2023
Merged

Add height voting for chits #2102

merged 22 commits into from Oct 5, 2023

Conversation

StephenButtolph
Copy link
Contributor

@StephenButtolph StephenButtolph commented Sep 27, 2023

Why this should be merged

Reduces the consensus dependency on vote bubbling by explicitly voting for the next block to accept.

How this works

Introduces a new field to PullQuery and PushQuery messages RequestedHeight and correspondingly adds PreferredIDByHeight to Chits messages.

When a node issues a query - it will request the block at height +1 to its last accepted block. When a virtuous node responds, this means that the requesting node can always fall back to apply the vote on top of its last accepted block. This is important because this converts bubbling of votes from a liveness requirement to a performance optimization. This significantly simplifies the requirements around transitive voting.

How this was tested

CI

@StephenButtolph StephenButtolph added consensus This involves consensus networking This involves networking bubble votes labels Sep 27, 2023
@StephenButtolph StephenButtolph added this to the v1.10.12 milestone Sep 27, 2023
Base automatically changed from early-bubble-votes to dev September 28, 2023 21:56
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is the most important file to have 👀 on

@StephenButtolph StephenButtolph marked this pull request as ready for review October 3, 2023 20:16
@StephenButtolph StephenButtolph self-assigned this Oct 3, 2023
zap.Stringer("lastAcceptedID", lastAcceptedID),
zap.Error(err),
)
acceptedAtHeight = lastAcceptedID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the implications of responding with an ID that is not actually the accepted ID at that height? My understanding is that this occurs when the current node does not have any accepted id at height because it has not sync'd to that height yet. Would it make more sense to return some sort of empty ID or does the response effectively not matter in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And similar question for below where it seems like you can return an ID for a height after the requested height.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

height because it has not sync'd to that height yet.

this may also occur if the height is pruned or it was never fetched (bc state synced)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally people should only be requesting heights "close" to tip here. This case specifically handles the case where it is only safe to return Accepted state. It would also be safe to simply drop the requests here. But we do this to avoid causing the peer to hit a timeout.

It's safe for us to return any Accepted blocks here - but the most "useful" blocks to send are the lastAcceptedID + acceptedIDAtRequestedHeight. If we don't have access to the acceptedID at the requested height, we can return any accepted ID - which we just default to the last accepted ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably reduce our bandwidth a bit here by changing the proto format to have some logic for "inheriting" fields. In this code path we return 3 32-byte hashes but we could get away with sending 1-2 32-bytes hashes.

This feels like a follow-up though haha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below - it is safe to return any preferred value.

Copy link
Contributor

@gyuho gyuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just nits on docs, otherwise, lgtm

message/outbound_msg_builder.go Outdated Show resolved Hide resolved
message/outbound_msg_builder.go Outdated Show resolved Hide resolved
snow/networking/sender/sender.go Outdated Show resolved Hide resolved
snow/networking/sender/sender.go Outdated Show resolved Hide resolved
}

// Will record chits once [preferredID] and [preferredIDAtHeight] have been
// issued into consensus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, even if we failed here https://github.com/ava-labs/avalanchego/pull/2102/files#diff-1b154e2d9b5b0909a66199417df6ff79ee9a18e83075b39ef2acf3634558342eR724-R736 (non-populated preferred id at height set to preferred id), this is safe? can we doc regarding v1.11 upgrade?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The invariants for a correct node is that preferredID transitively references both preferredIDAtHeight and acceptedID. So it would be safe to default preferredIDAtHeight to either preferredID or acceptedID in the handler

snow/engine/snowman/transitive.go Outdated Show resolved Hide resolved
snow/engine/snowman/transitive.go Outdated Show resolved Hide resolved
ChainId: chainID[:],
RequestId: requestID,
PreferredId: preferredID[:],
PreferredIdAtHeight: preferredIDAtHeight[:],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also allows us to apply votes on a failed chit, right? Or am I too optimistic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the AcceptedID for failed chits. I don't think it would be safe to re-use the PreferredIDAtHeight in the same way unless we included an additional boolean (to say if PreferredIDAtHeight was accepted).

@@ -303,6 +303,7 @@ message PushQuery {
uint64 deadline = 3;
bytes container = 4;
EngineType engine_type = 5;
uint64 requested_height = 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we populate requestedHeight last in the messages above then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order doesn't actually matter... I felt like the code flowed more naturally with the order I used... The only reason the proto uses this order is because it is required for compatibility

@@ -293,7 +307,7 @@ func (t *Transitive) Chits(ctx context.Context, nodeID ids.NodeID, requestID uin
func (t *Transitive) QueryFailed(ctx context.Context, nodeID ids.NodeID, requestID uint32) error {
lastAccepted, ok := t.acceptedFrontiers.LastAccepted(nodeID)
if ok {
return t.Chits(ctx, nodeID, requestID, lastAccepted, lastAccepted)
return t.Chits(ctx, nodeID, requestID, lastAccepted, lastAccepted, lastAccepted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

} else {
t.Sender.SendChits(ctx, nodeID, requestID, t.Consensus.Preference(), lastAccepted)
var ok bool
preferenceAtHeight, ok = t.Consensus.PreferenceAtHeight(requestedHeight)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the above check be <= last accepted? last accepted is no longer considered processing, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last accepted block is still tracked by the consensus instance (and is faster to access than asking the VM). This is documented on the PreferenceAtHeight method

zap.String("field", "PreferredIDAtHeight"),
zap.Error(err),
)
// TODO: Require this field to be populated correctly after v1.11.x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an issue for tracking this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to #1965

)
// If the requested height is higher than our preferred tip, we
// don't prefer anything at the requested height yet.
preferenceAtHeight = preference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • should we be responding with something empty here?
  • will this require us to go try and fetch this height? based on the logic, as far as I understand it, we shouldn't necessarily go fetch nextHeightToAccept as it may not exist yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. In this case - the peer is going to drop whatever we give them, as they are further ahead than we are (anything we vote for here is already accepted by the peer). So all we really care about is
    a) Respond so that they don't hit a timeout
    a) Don't give a byzantine response (aka, something we don't prefer)
  2. No - we will not attempt to fetch anything based on this logic

@StephenButtolph StephenButtolph mentioned this pull request Oct 4, 2023
11 tasks
// Because we only return accepted state here, it's fairly likely
// that the requested height is higher than the last accepted block.
// That means that this code path is actually quite common.
t.Ctx.Log.Debug("failed fetching accepted block",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we added the accepted block miss metric here too? or do we consider this a different scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt like this was different. This case is generally expected to be triggered... whereas the other case isn't. We could add another metric for this case specifically (and the other cases). But I'm not sure how much we really care about them.

Copy link
Contributor

@gyuho gyuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@StephenButtolph StephenButtolph added this pull request to the merge queue Oct 5, 2023
Merged via the queue into dev with commit 5f9bb08 Oct 5, 2023
16 checks passed
@StephenButtolph StephenButtolph deleted the bubble-vote-messaging branch October 5, 2023 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bubble votes consensus This involves consensus networking This involves networking
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

5 participants