Double QBFT timers for lead rounds #2092

Closed
corverroos opened this issue Apr 12, 2023 · 2 comments

@corverroos
Contributor

corverroos commented Apr 12, 2023

🎯 Problem to be solved

We currently reset QBFT round timers on receipt of any justified pre-prepare, even for the current (active) round. As a result, non-leaders extend their round durations (up to double that of the leader), so the round 1 leader sometimes gets badly out of sync with the rest, since a leader always resets at the start of the round, which is effectively a no-op. This causes sporadic consensus timeouts when no redundant peers are online.

🛠️ Proposed solution

Refactor the round timer to double the original round duration irrespective of when the "reset" is called.

@github-actions github-actions bot added the protocol Protocol Team tickets label Apr 12, 2023
@corverroos corverroos changed the title Increase QBFT round duration based on probable cause Do not reset QBFT timers for same round Apr 13, 2023
@corverroos
Contributor Author

These logs show how the leader gets out of sync with the others because non-leaders reset their round timers almost at the end of round 1.
image

@corverroos corverroos changed the title Do not reset QBFT timers for same round Double QBFT timers for lead rounds Apr 17, 2023
@corverroos
Contributor Author

corverroos commented Apr 17, 2023

So the idea behind resetting the timer on JustifiedPrePrepare is: "short rounds if leader is down, extended rounds if leader is online"

The problem is that the round extension isn't aligned. A different approach is to say: if I get a PrePrepare, I double the round duration. So instead of resetting, which subtracts the remaining round duration and adds a full round duration, one just adds a round duration. That should mitigate network latency effects.

Previous

 |----1----|
     PP
      |----1----|

Suggested

 |----1----|
     PP
           |----1----|

So as long as the PrePrepare is received anywhere in the round, all nodes "extend" their rounds in the same way, so the rounds remain aligned to their relative start times.

Node1:    |----1----|
          PP
          |----1----|   

Node2:        |----1----|
                       PP
                       |----1----|

Node3:  |----1----|
             PP
             |----1----|

vs

Node1:    |----1----|
          PP
                    |----1----|   

Node2:        |----1----|
                       PP
                        |----1----|

Node3:  |----1----|
             PP
                  |----1----|

This means the round extension doesn't increase "out-of-syncness", unless a node receives the PrePrepare in the next round. Then it does increase "out-of-syncness", but that is also present in the existing strategy and is the trade-off of extended rounds in general.

Node1:    |----1----|
          PP
                    |----1----|   

Node2:        |----1----|
                          PP
                        |----2----|

Node3:  |----1----|
             PP
                  |----1----|

One way to mitigate this is to omit large values from the pre-prepare and send them in a separate second message. But this is out of scope for this ticket.

obol-bulldozer bot pushed a commit that referenced this issue Apr 17, 2023
Introduces alpha feature `QBFTDoubleLeadTimer` that attempts to address sporadic consensus timeouts on block proposals by doubling round timers on receipt of a justified pre-prepare for the current round.

category: feature
ticket: #2092
obol-bulldozer bot pushed a commit that referenced this issue Apr 26, 2023
Refactors the timer selection alpha feature to A/B test 3 different timers using a round-robin selection. This allows a much better 1-to-1 comparison using metrics. Note this introduces a 3rd timer, the exponential timer, which the simulator showed to be very robust and as performant as the increasing timer.

category: feature
ticket: #2092
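
As a rough illustration of round-robin timer selection for A/B testing, the sketch below picks one of three timer types from a counter. It is not taken from the commit: the `timerType` names, the `selectTimer` function, and the idea of keying on a shared duty index are all assumptions.

```go
package main

import "fmt"

// timerType names are hypothetical labels for the three timers being compared.
type timerType string

const (
	timerIncreasing  timerType = "increasing"
	timerDoubleLead  timerType = "double_lead"
	timerExponential timerType = "exponential"
)

var timerTypes = []timerType{timerIncreasing, timerDoubleLead, timerExponential}

// selectTimer picks a timer type round-robin from a hypothetical duty index.
// Keying on a value that every node agrees on (assumed here) would keep all
// nodes on the same timer for the same consensus instance.
func selectTimer(dutyIndex uint64) timerType {
	return timerTypes[dutyIndex%uint64(len(timerTypes))]
}

func main() {
	for i := uint64(0); i < 6; i++ {
		fmt.Println(i, selectTimer(i))
	}
}
```

Cycling through the timers this way gives each one a roughly equal share of instances under similar network conditions, which is what makes the per-timer metrics comparable.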