-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate unreliable peers upwards from the trie accumulator to the block synchronizer #3594
Propagate unreliable peers upwards from the trie accumulator to the block synchronizer #3594
Conversation
Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>
The only time a peer has `Unknown` quality is when it was just added to the peer list and we haven't heard anything from it. If a peer that has `Unknown` quality is promoted we now make it `Reliable` since it means that an item we required was found on that node. Also if a peer with `Unknown` quality is demoted, we make it `Unreliable`. Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>
The trie accumulator uses a snapshot of the peer list provided by the block synchronizer through the global state synchronizer in order to get the required tries. Propagate the unreliable peers (that didn't have the requested data) upwards to the global state synchronizer and then to the block synchronizer. This gives feedback that the original peer list provided should be changed if another global sync request is made. Demote the peers received from the global state synchronizer in the block synchronizer. Also add a test to verify that the peer list is changed based on the feedback from the global state sync components. Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>
bors try |
tryBuild succeeded: |
#[tokio::test] | ||
async fn global_state_sync_wont_stall_with_bad_peers() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test! 👍
Fix a bug that was pointed out in code review related to the number of peers returned when getting qualified peers. Also add some unit tests for the peer list. Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>
0ae4df5
to
1122fd2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor change requested, otherwise LGTM.
TrieAccumulator(TrieAccumulatorError), | ||
#[error("ContractRuntime failed to put a trie into global state: {0}")] | ||
PutTrie(engine_state::Error), | ||
#[error("trie accumulator encountered an error while fetching a trie; unreliable peers {0:?}")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid mixing debug output into display output. There's utils::DisplayIter
which could probably help here.
Same comment for the PutTrie
variant below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 18b802d
Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>
#[derive(Debug, Clone)] | ||
pub(crate) struct Response { | ||
hash: Digest, | ||
unreliable_peers: Vec<NodeId>, | ||
} | ||
|
||
impl Response { | ||
pub(crate) fn new(hash: Digest, unreliable_peers: Vec<NodeId>) -> Self { | ||
Self { | ||
hash, | ||
unreliable_peers, | ||
} | ||
} | ||
|
||
pub(crate) fn hash(&self) -> &Digest { | ||
&self.hash | ||
} | ||
|
||
pub(crate) fn unreliable_peers(self) -> Vec<NodeId> { | ||
self.unreliable_peers | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could do a two state enum for Success / Failure, and flatten the Error as a field on the failure variant; this would allow places that return Result<Response, Error> to be simplified to just return the enum variant (etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could to that though I'm more partial to having separate types that are clearly delimited. What I think we can do is have a trait implemented for both the Response
and Error
that can extract the unreliable peers which would make the code looking at the response type potentially more nicer. WDYT?
assert!(peer_list.is_peer_reliable(&test_peer)); | ||
} | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might as well add the unreliable_peer_remains_unreliable_if_demoted test case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in baa3461
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a nit and a suggestion to consider, but LGTM, gj
bors try |
tryBuild succeeded: |
Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>
bors try |
tryBuild succeeded: |
bors merge |
Build succeeded: |
The trie accumulator uses a snapshot of the peer list provided by the block synchronizer through the global state synchronizer in order to get the required tries.
Because we didn't provide the block synchronizer any feedback on what the peers that dind't have the global state data were, the block synchronizer may re-issue the global state sync request with the same peer set as for the previous failed attempt. This may lead to delays in getting the global state data.
This PR adds:
PeerQuality
is handled when demoting and promoting peers; now peers withReliable
orUnreliable
quality will not be set asUnknown
when demoting or promoting respectively.These changes reduce the risk that the global state synchronizer gets stuck asking the same peers all over again.
Further potential improvements that can be made:
Fixes: #3563