Check Steem issue 2658: Not producing block because node didn't wake up within 500ms of the slot time #1157

Closed
abitmore opened this Issue Jul 18, 2018 · 4 comments


abitmore commented Jul 18, 2018

steemit/steem#2658

Brief:

  • witness_node wakes up every second to check whether it's time to produce a block; with a 3-second block interval, the node theoretically gets 3 attempts to produce its block within its time slot;
  • if the node missed the first chance (the first second), e.g. because the latency of the previous block was too high, the 2nd and 3rd attempts are guaranteed (by the code) to fail due to "didn't wake up within 500ms":

if( llabs((scheduled_time - now).count()) > fc::milliseconds( 500 ).count() )
{
   capture("scheduled_time", scheduled_time)("now", now);
   return block_production_condition::lag;
}

Generally, if the latency of the previous block is too high, the node will try to produce another block on the first second of its own time slot. But when the witness list gets shuffled after the previous block, the node may "suddenly" find that it's its turn to produce the next block, and it will fail as described above.

Note: this is a minor issue, IMHO low priority.

Possible solution:

  • Loosen the "500ms" limitation, for example to the block interval or so?
  • Or just remove the check?

Please discuss.

@abitmore abitmore added this to To do in Feature release (201810) via automation Aug 17, 2018

abitmore added a commit that referenced this issue Aug 17, 2018


pmconrad commented Aug 23, 2018

We should define desirable behaviour before implementing fixes.

IMO, if a node hasn't received a block from its predecessor when its time slot has come, it should simply produce its own block in time.

  • Waiting until its own slot is nearly over is counterproductive; it only propagates the problem to the next witness in line.
  • Producing a block on time is also desirable from the user's perspective, because it gets their transaction approved more quickly.
  • Not waiting for the previous block increases the chance of a fork between the previous block and the next. Waiting for the previous block increases the chance of a fork between this block and the next. So in the end, waiting doesn't gain us anything (but increases the risk if the cause is not latency but witness failure).

There is one special case though: if the previous block has been received in time but takes a long time to apply (as is often the case in a maintenance block), then it makes sense to produce one even if our slot is nearing its end. I would extend the deadline in that special case only.


abitmore commented Aug 23, 2018

@pmconrad I think your comment is not about the issue in the OP, but more related to #504.

In this issue, the node was not idly waiting until its slot was nearly over; on the contrary, it tried to produce a block on the 2nd and 3rd seconds but failed due to the timeout check. The reason it didn't try to produce on the 1st second is that it was not yet its time slot before it received the high-latency block; that is, the high-latency block caused a schedule change.


pmconrad commented Aug 27, 2018

Yes, I thought of a different scenario. Makes sense for a schedule change as well.

abitmore added a commit that referenced this issue Aug 27, 2018

Merge pull request #1266 from bitshares/1157-block-produce-timeout
Changed block producing timeout to 2500 ms (#1157)

abitmore commented Aug 27, 2018

Fixed by #1266.
