New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add height voting for chits #2102
Conversation
Co-authored-by: Dan Laine <daniel.laine@avalabs.org> Signed-off-by: Stephen Buttolph <stephen@avalabs.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is the most important file to have 👀 on
zap.Stringer("lastAcceptedID", lastAcceptedID), | ||
zap.Error(err), | ||
) | ||
acceptedAtHeight = lastAcceptedID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the implications of responding with an ID that is not actually the accepted ID at that height? My understanding is that this occurs when the current node does not have any accepted id at height because it has not sync'd to that height yet. Would it make more sense to return some sort of empty ID or does the response effectively not matter in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And similar question for below where it seems like you can return an ID for a height after the requested height.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
height because it has not sync'd to that height yet.
this may also occur if the height is pruned or it was never fetched (bc state synced)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally people should only be requesting heights "close" to tip here. This case specifically handles the case where it is only safe to return Accepted state. It would also be safe to simply drop the requests here. But we do this to avoid causing the peer to hit a timeout.
It's safe for us to return any Accepted blocks here - but the most "useful" blocks to send are the lastAcceptedID + acceptedIDAtRequestedHeight. If we don't have access to the acceptedID at the requested height, we can return any accepted ID - which we just default to the last accepted ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably reduce our bandwidth a bit here by changing the proto format to have some logic for "inheriting" fields. In this code path we return 3 32-byte hashes but we could get away with sending 1-2 32-bytes hashes.
This feels like a follow-up though haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below - it is safe to return any preferred value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just nits on docs, otherwise, lgtm
} | ||
|
||
// Will record chits once [preferredID] and [preferredIDAtHeight] have been | ||
// issued into consensus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, even if we failed here https://github.com/ava-labs/avalanchego/pull/2102/files#diff-1b154e2d9b5b0909a66199417df6ff79ee9a18e83075b39ef2acf3634558342eR724-R736 (non-populated preferred id at height set to preferred id), this is safe? can we doc regarding v1.11 upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The invariants for a correct node is that preferredID
transitively references both preferredIDAtHeight
and acceptedID
. So it would be safe to default preferredIDAtHeight
to either preferredID
or acceptedID
in the handler
ChainId: chainID[:], | ||
RequestId: requestID, | ||
PreferredId: preferredID[:], | ||
PreferredIdAtHeight: preferredIDAtHeight[:], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also allows us to apply votes on a failed chit, right? Or am I too optimistic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the AcceptedID
for failed chits. I don't think it would be safe to re-use the PreferredIDAtHeight
in the same way unless we included an additional boolean (to say if PreferredIDAtHeight
was accepted).
@@ -303,6 +303,7 @@ message PushQuery { | |||
uint64 deadline = 3; | |||
bytes container = 4; | |||
EngineType engine_type = 5; | |||
uint64 requested_height = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we populate requestedHeight last in the messages above then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order doesn't actually matter... I felt like the code flowed more naturally with the order I used... The only reason the proto uses this order is because it is required for compatibility
@@ -293,7 +307,7 @@ func (t *Transitive) Chits(ctx context.Context, nodeID ids.NodeID, requestID uin | |||
func (t *Transitive) QueryFailed(ctx context.Context, nodeID ids.NodeID, requestID uint32) error { | |||
lastAccepted, ok := t.acceptedFrontiers.LastAccepted(nodeID) | |||
if ok { | |||
return t.Chits(ctx, nodeID, requestID, lastAccepted, lastAccepted) | |||
return t.Chits(ctx, nodeID, requestID, lastAccepted, lastAccepted, lastAccepted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
} else { | ||
t.Sender.SendChits(ctx, nodeID, requestID, t.Consensus.Preference(), lastAccepted) | ||
var ok bool | ||
preferenceAtHeight, ok = t.Consensus.PreferenceAtHeight(requestedHeight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the above check be <= last accepted? last accepted is no longer considered processing, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last accepted block is still tracked by the consensus instance (and is faster to access than asking the VM). This is documented on the PreferenceAtHeight
method
zap.String("field", "PreferredIDAtHeight"), | ||
zap.Error(err), | ||
) | ||
// TODO: Require this field to be populated correctly after v1.11.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an issue for tracking this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to #1965
) | ||
// If the requested height is higher than our preferred tip, we | ||
// don't prefer anything at the requested height yet. | ||
preferenceAtHeight = preference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- should we be responding with something empty here?
- will this require us to go try and fetch this height? based on the logic, as far as I understand it, we shouldn't necessarily go fetch
nextHeightToAccept
as it may not exist yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- In this case - the peer is going to drop whatever we give them, as they are further ahead than we are (anything we vote for here is already accepted by the peer). So all we really care about is
a) Respond so that they don't hit a timeout
a) Don't give a byzantine response (aka, something we don't prefer) - No - we will not attempt to fetch anything based on this logic
// Because we only return accepted state here, it's fairly likely | ||
// that the requested height is higher than the last accepted block. | ||
// That means that this code path is actually quite common. | ||
t.Ctx.Log.Debug("failed fetching accepted block", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we added the accepted block miss metric here too? or do we consider this a different scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt like this was different. This case is generally expected to be triggered... whereas the other case isn't. We could add another metric for this case specifically (and the other cases). But I'm not sure how much we really care about them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Why this should be merged
Reduces the consensus dependency on vote bubbling by explicitly voting for the next block to accept.
How this works
Introduces a new field to
PullQuery
andPushQuery
messagesRequestedHeight
and correspondingly addsPreferredIDByHeight
toChits
messages.When a node issues a query - it will request the block at height
+1
to its last accepted block. When a virtuous node responds, this means that the requesting node can always fall back to apply the vote on top of its last accepted block. This is important because this converts bubbling of votes from a liveness requirement to a performance optimization. This significantly simplifies the requirements around transitive voting.How this was tested
CI