
Peers list should include some unmatched broadhash peers to broadcast to #2025

Closed
jondubois opened this Issue May 11, 2018 · 5 comments

@jondubois (Contributor) commented May 11, 2018

Expected behavior

When broadcasting, we should broadcast not only to peers whose broadhash matches our own but also to some peers whose broadhash does not match (because they might be behind).

Actual behavior

Right now, when doing a broadcast, a node will try to select 100 peers in total; if it finds fewer than 100 peers with a matching broadhash, it fills the remainder with peers whose broadhash does not match.

If our node is able to find 100 peers which match its own broadhash, then it selects 25 of those 100 and never broadcasts to any non-matching peers; this means that the remaining (non-matching) peers in the network may not receive the latest block from our node.
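
For clarity, here is a minimal sketch of the selection described above (a hypothetical helper, not the actual Lisk implementation; peer objects are assumed to carry a broadhash field):

    // Hypothetical sketch of the current selection, not the actual
    // Lisk code. Peers are assumed to look like { broadhash: '...' }.
    function selectBroadcastPeers(peers, ownBroadhash, poolLimit = 100, broadcastLimit = 25) {
      const matched = peers.filter(p => p.broadhash === ownBroadhash);
      const unmatched = peers.filter(p => p.broadhash !== ownBroadhash);

      // Fill a pool of up to 100 peers, preferring matched broadhash;
      // unmatched peers are used only to cover a shortfall.
      const pool = matched.slice(0, poolLimit);
      if (pool.length < poolLimit) {
        pool.push(...unmatched.slice(0, poolLimit - pool.length));
      }

      // Select 25 from the pool: with >= 100 matched peers available,
      // no unmatched peer can ever make it into the broadcast set.
      return pool.slice(0, broadcastLimit);
    }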

Steps to reproduce

Run the latest betanet; some of the nodes fall behind the latest height because some blocks never reach them.

Which version(s) does this affect? (Environment, OS, etc...)

Betanet 7.4

@jondubois jondubois self-assigned this May 11, 2018

@jondubois jondubois changed the title from "Peers list should always include some unmatched broadhash peers to broadcast to" to "Peers list should include some unmatched broadhash peers to broadcast to" May 11, 2018

@nazarhussain (Contributor) commented May 11, 2018

I believe a broadcast should only go to matched-broadhash peers, at least while we have enough of them to fill our required limit. E.g. since we have a limit of 25, we should select 25 from the matched-broadhash peers, and only if we find fewer should we take more from the unmatched-broadhash peers. This is a good way to optimize network load if there is a real fork going on in the network. Consider that a broadcast is almost a realtime event in the network, and we should optimize it for network capacity.

The safeguard and fork recovery should rely on the syncing mechanism. During syncing, a node should look at every peer in the network to build the histogram, irrespective of broadhash, so it can pull blocks and perform fork recovery.

There can be edge cases in the above syncing mechanism that we should discuss further and consider thoroughly with actual cases.
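
A minimal sketch of the histogram idea, assuming peers expose a height field (hypothetical shape, not the actual module API):

    // Build a height histogram over ALL known peers, irrespective of
    // broadhash, and report the height shared by the most peers.
    function buildHeightHistogram(peers) {
      const histogram = new Map();
      for (const peer of peers) {
        histogram.set(peer.height, (histogram.get(peer.height) || 0) + 1);
      }
      let bestHeight = 0;
      let bestCount = 0;
      for (const [height, count] of histogram) {
        if (count > bestCount) {
          bestHeight = height;
          bestCount = count;
        }
      }
      // Syncing would then pull blocks toward bestHeight for fork recovery.
      return { histogram, bestHeight };
    }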

@shuse2 (Contributor) commented May 11, 2018

I think the main purpose of the broadcast is to push whatever new information a node gets to the peers it knows of.
I think that for optimizing network capacity, choosing the peers that we connect to would be the better lever.

I agree that syncing, on the other hand, has the main purpose of catching the node up with the network when it falls behind. It should only take information from a "good peer" from the network's perspective (not the node's perspective).
However, all of the RPC calls should use "good peers" from the network's perspective, too.

I think, in theory, we should only be connecting outbound to "good peers" and let "bad peers" connect to us in order to become "good".
So broadcasting should go to both good and bad peers, while RPC calls should go only to "good peers".
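
To make the distinction concrete, here is a rough sketch of that routing policy (hypothetical helper, not the Lisk API):

    // "Good" peers are the ones we chose to connect to outbound;
    // "bad" peers merely connected to us inbound. Broadcasts go to
    // every peer we know of; RPC calls go only to good peers.
    function peersFor(messageKind, goodPeers, badPeers) {
      return messageKind === 'broadcast'
        ? [...goodPeers, ...badPeers]
        : goodPeers;
    }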

@4miners (Member) commented May 15, 2018

There is some inconsistency here, and possibly not the best logic either.

0.9.x flow:

  • save block
  • fire broadcast event:
    library.bus.message('newBlock', block, broadcast);
  • update broadhash:
    Transport.prototype.onNewBlock = function (block, broadcast) {
  • broadcast, but using the previous broadhash
  • peers are notified about our new headers when they receive the broadcast
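
A condensed, hypothetical replay of that ordering (stubbed values and functions, not the verbatim module code):

    // The broadcast carries the broadhash captured before the update,
    // so peers learn our new headers only via the broadcast itself.
    const state = { broadhash: 'previous-broadhash' };

    function broadcastBlock(block, broadhash) {
      console.log(`broadcasting block ${block.id} under broadhash ${broadhash}`);
    }

    function onNewBlock(block) {
      const previousBroadhash = state.broadhash; // captured before the update
      state.broadhash = 'new-broadhash';         // update broadhash
      broadcastBlock(block, previousBroadhash);  // broadcast uses the previous one
    }

    onNewBlock({ id: 123 });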

1.0.0 flow:

Verify.prototype.processBlock = function(block, broadcast, saveBlock, cb) {

  • broadcast block using current broadhash
  • save block
  • update broadhash
  • notify peers about new headers
    In that step we're calling Transport.broadcastHeaders, which calls Peers.list without conditions. That function uses the new broadhash; that behavior was probably not intended, as we should use the same broadhash as when broadcasting the block.

So the behavior is slightly different between versions. In 0.9.x we notify peers about our new broadhash at the moment we broadcast to them, so every peer we broadcast to knows about our new headers. In 1.0.0 we broadcast the block prior to the save step, then save and notify about our new headers, but we're notifying different peers than the ones we broadcast to (a random 100 because of the default limit). Notice that we're broadcasting prior to applying the block and then notifying about headers after applying it; other peers don't know that we already have that block and will broadcast it back to us, and there can also be some race conditions.
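
If the intent is to reuse the broadcast-time broadhash, a sketch of the adjustment could look like this (hypothetical signatures, not the actual Transport/Peers API):

    // Capture the broadhash used for the block broadcast and reuse it
    // when selecting peers for the header notification, instead of
    // letting Peers.list pick up the freshly updated value.
    function processBlockSketch(block, modules) {
      const broadcastBroadhash = modules.system.getBroadhash(); // pre-update value
      modules.transport.broadcastBlock(block, broadcastBroadhash);
      modules.blocks.save(block);
      modules.system.updateBroadhash();                         // broadhash changes here
      modules.transport.broadcastHeaders({ broadhash: broadcastBroadhash });
    }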

@jondubois (Contributor) commented May 15, 2018

We were able to find a fix for the betanet block propagation issue without having to change the broadcast algorithm. See #2031

Since the current broadcast algorithm is essentially the same as the one used in 0.9, I think that it makes sense to keep it as it is in order to minimise change/risk.

I agree with Shu, though, that it would be nice not to have to take the broadhash into account when broadcasting, but I guess it would require changes to peer discovery/selection first. I guess this would be an enhancement, so it could potentially be done after 1.0.

@jondubois (Contributor) commented May 22, 2018

This issue was abandoned in favour of #2042 - #2042 goes in a similar direction to this issue (in terms of loosening broadhash restrictions) but demands a more complete set of changes. Recent beta tests have shown this approach to be successful. See #2043

@jondubois jondubois closed this May 22, 2018

Version 1.0.0 automation moved this from Open Issues to Closed Issues May 22, 2018

Sprint Board 07-05-18 automation moved this from New Issues to Closed Issues May 22, 2018
