Trigger sync when unable to forge block due to low consensus #2073
@SargeKhan How would that help? I doubt that sync will be able to complete within the slot window. Even if it does, it will affect the next delegate: the block will be forged at the very end of the forging slot, so it may reach the rest of the network only during the next slot.
We think the problem is not in the sync process or in the node falling behind.
@4miners, after discussing with the team, we've decided that we should start the sync process if consensus is low and:
I think it's still a better option than not forging a block at all and simply missing the slot.
@SargeKhan I don't think the approach proposed here is a good one, for a few reasons:
I agree with point 1: if consensus is low quite often, that's probably an issue with block propagation, and it should be fixed at that layer. But we developed the syncing process as a fail-safe for broadcasting, and it should run regardless of whether broadcasting fails often or not. When our slot comes (and consensus is low), we have to do all of this within a single 10-second slot.
I feel that within that 10-second slot we should only forge and broadcast, and we should be prepared beforehand. We have 109 slots (1090 secs) to prepare for forging, and during that time we should satisfy all the prerequisites for forging. One of them is high consensus, which requires syncing. So if we sync not just one block but even a few blocks ahead of time, there is a good chance that the forging slot goes fine. Your point 3 is an interesting edge case, which we can cover by syncing on the last slot of the round, so that whoever is first in the delegate list of the next round has higher consensus. In summary: syncing was developed as a fail-safe for broadcasting, and it should be triggered in specific, well-defined cases, not periodically.
@nazarhussain Sync was never developed as a fail-safe for broadcasting. The main purpose of sync:
The time after which we trigger sync is not arbitrary. When we don't receive one block, we assume that the delegate failed to forge it, and we don't trigger sync. However, two delegates failing to forge a block in a row is highly unlikely, so we trigger sync after not receiving two blocks. The broadcasting process should be reliable: every forged block should reach all peers in the network as soon as possible. Consensus should also be reliable; it was designed to mitigate possible chain splits by preventing nodes from forging. Syncing on the last block of the round is dangerous and can cause more issues.
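The timing rule described above can be read as a staleness check: with 10-second slots, one missing block is tolerated as a delegate that failed to forge, and sync fires only once the last block receipt is older than two block intervals (20 seconds). This is a minimal sketch under that assumption; `BLOCK_TIME_MS`, `MISSED_BLOCKS_BEFORE_SYNC` and `shouldTriggerSync` are hypothetical names, not taken from the project's code.

```typescript
// Slot length stated in this thread: 10 seconds per block.
const BLOCK_TIME_MS = 10_000;
// One missing block is assumed to be a delegate that failed to forge;
// two missing blocks in a row are unlikely, so that is the sync threshold.
const MISSED_BLOCKS_BEFORE_SYNC = 2;

// Hypothetical helper: true once the last block receipt is older than
// two block intervals (i.e. more than 20 seconds ago).
function shouldTriggerSync(lastReceiptMs: number, nowMs: number): boolean {
  const elapsedMs = nowMs - lastReceiptMs;
  return elapsedMs > BLOCK_TIME_MS * MISSED_BLOCKS_BEFORE_SYNC;
}

console.log(shouldTriggerSync(0, 15_000)); // false: one missed block
console.log(shouldTriggerSync(0, 25_000)); // true: two missed blocks
```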
@4miners, I agree with your opinion. I will investigate further why our block broadcasts do not reach all peers. @MaciejBaj, what's your opinion on this topic?
The 20-second sync interval is only approximate anyway, due to the asynchronous nature of the process. By manually triggering the sync process from time to time, we aren't changing that behaviour. Our main pain point is the broadcast process: blocks and header updates are not reaching enough peers. 100 is our MAX_PEERS_LIMIT, while possibly many more peers are stored in the peers list. As a result, the sync process triggers; in this case, it works as a recovery mechanism for a broadcast process that is not functioning correctly. I see value in performing the sync process one slot before our own when we discover that consensus is poor. It doesn't guarantee that a block will be forged in the next slot, but it significantly increases that probability by getting back on track with the majority of the network. The corner case of the last block of the round might be ignored; the other 100 cases are valid.
It can happen anyway; it would be nice to know more details.
It doesn't really matter whether sync is triggered exactly when the last receipt is 20 seconds old or one second later (even 8 seconds later should be fine). Triggering sync manually because of low consensus (which currently happens too often because of issues with block/header propagation) can cause more new issues than it is expected to fix. In my opinion, sync cannot be used as a recovery or fail-safe mechanism for the broadcast process; that would be a workaround to mitigate bad design/implementation. We should fix the root cause instead of wasting time on countless patches that are not even proven to fix anything. Look at it this way: consensus should be poor only when there is a small fork or chain split. In a large network, consensus will not prevent chain splits (it requires only 51 peers to match our broadhash). I don't agree that sync will catch up with the majority when we're at the beginning of a fork: peers can have equal height, and there may be no longer chain. Triggering sync manually one slot before the delegate's slot will further increase the possibility of a fork / chain split. Possible scenario:
Another scenario:
Those scenarios are meant to demonstrate that syncing on an unstable network will not fix network stability at all. There are other bad scenarios as well. Once again, we should fix the root cause, which is related to poor block propagation or poor exchange of data (headers) between peers. Messing with sync is bad.
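The point that consensus "requires only 51 peers to match our broadhash" follows from scoring a capped sample of peers. A minimal sketch, assuming consensus is the percentage of at most 100 sampled peers (the MAX_PEERS_LIMIT mentioned earlier) whose broadhash equals ours, and that forging requires at least 51%; `computeConsensus`, `canForge`, `makePeers` and the 51% threshold are illustrative assumptions, not the project's actual implementation.

```typescript
// Cap on peers considered for consensus (MAX_PEERS_LIMIT in this thread).
const MAX_PEERS_LIMIT = 100;
// Assumed minimum consensus (percent) required to forge.
const MIN_BROADHASH_CONSENSUS = 51;

// Hypothetical helper: percentage of sampled peers whose broadhash equals ours.
// For simplicity the sample is the first MAX_PEERS_LIMIT entries.
function computeConsensus(ownBroadhash: string, peerBroadhashes: string[]): number {
  const sample = peerBroadhashes.slice(0, MAX_PEERS_LIMIT);
  if (sample.length === 0) {
    return 0;
  }
  const matched = sample.filter((h) => h === ownBroadhash).length;
  // Multiply before dividing so whole-number percentages stay exact.
  return (matched * 100) / sample.length;
}

function canForge(consensus: number): boolean {
  return consensus >= MIN_BROADHASH_CONSENSUS;
}

// Hypothetical test data: builds `count` peers that all report `broadhash`.
function makePeers(count: number, broadhash: string): string[] {
  const result: string[] = [];
  for (let i = 0; i < count; i++) {
    result.push(broadhash);
  }
  return result;
}

// 500 known peers, of which only the first 100 are scored:
// 51 of them match our broadhash, so consensus is 51% and forging is
// allowed, even though 449 of the 500 peers report a different broadhash.
const peers = makePeers(51, "abc").concat(makePeers(449, "def"));
const consensus = computeConsensus("abc", peers);
console.log(consensus, canForge(consensus)); // 51 true
```

Because only the sample is scored, a bare majority of 100 peers can pass the check while most of a larger network disagrees, which is why consensus alone cannot rule out chain splits.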
We will not proceed with that solution; instead, we will adjust the 'broadcastLimit' and 'maxRelays' constants.
Parent #2080
Expected behavior
Before forging a block, if consensus is lower than the minBroadhashConsensus value, the node should trigger a sync operation.
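The expected behavior above adds one branch at the start of the node's forging slot. A minimal sketch, assuming the current consensus percentage is already known; `decideSlotAction`, the `SlotAction` type and the default threshold of 51 are hypothetical and stand in for the real minBroadhashConsensus setting.

```typescript
type SlotAction = "forge" | "sync";

// Hypothetical decision made at the start of the node's own forging slot.
function decideSlotAction(consensus: number, minBroadhashConsensus: number = 51): SlotAction {
  // Proposed behavior: low consensus triggers a sync instead of
  // only logging and waiting for the periodic sync trigger.
  if (consensus < minBroadhashConsensus) {
    return "sync";
  }
  return "forge";
}

console.log(decideSlotAction(40)); // sync
console.log(decideSlotAction(75)); // forge
```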
Actual behavior
The node keeps logging that consensus is low and waits for the sync trigger to happen.
Steps to reproduce
N/A
Which version(s) does this affect? (Environment, OS, etc...)
1.0.0