Check Steem issue 2658: Not producing block because node didn't wake up within 500ms of the slot time #1157

Closed
abitmore opened this Issue Jul 18, 2018 · 4 comments


abitmore commented Jul 18, 2018

steemit/steem#2658

Brief:

  • witness_node wakes up every second to check whether it's time to produce a block; with a 3-second block interval, the node theoretically gets 3 attempts to produce its block within its time slot;
  • if the node missed the first chance (the first second), e.g. because the latency of the previous block was too high, the 2nd and 3rd attempts are guaranteed (by the code) to fail due to "didn't wake up within 500ms":

if( llabs((scheduled_time - now).count()) > fc::milliseconds( 500 ).count() )
{
   capture("scheduled_time", scheduled_time)("now", now);
   return block_production_condition::lag;
}

Generally, if the latency of the previous block is too high, the node will try to produce another block on the first second of its own time slot. But when the witness list gets shuffled after the previous block, the node may "suddenly" find that it's its turn to produce the next block, and it will fail as described above.

Note: this is a minor issue, IMHO low priority.

Possible solution:

  • Loosen the "500ms" limitation, for example to the block interval or so?
  • Or just remove the check?

Please discuss.

@abitmore abitmore added this to To do in Feature release (201810) via automation Aug 17, 2018

abitmore added a commit that referenced this issue Aug 17, 2018


pmconrad commented Aug 23, 2018

We should define desirable behaviour before implementing fixes.

IMO, if a node hasn't received a block from its predecessor when its time slot has come, it should simply produce its own block in time.

  • Waiting until its own slot is nearly over is counterproductive; it only propagates the problem to the next witness in line.
  • Producing a block on time is also desirable from the user's perspective, because it gets their transaction approved more quickly.
  • Not waiting for the previous block increases the chance of a fork between the previous block and the next. Waiting for the previous block increases the chance of a fork between this block and the next. So in the end, waiting doesn't gain us anything (but increases the risk if the cause is not latency but witness failure).

There is one special case though: if the previous block has been received in time but takes a long time to apply (as is often the case in a maintenance block), then it makes sense to produce one even if our slot is nearing its end. I would extend the deadline in that special case only.


abitmore commented Aug 23, 2018

@pmconrad I think your comment is not about the issue in the OP, but more related to #504.

In this issue, the node was not idly waiting until its slot was nearly over; on the contrary, it tried to produce a block on the 2nd and 3rd seconds but failed due to the timeout check. The reason it didn't try to produce on the 1st second is that it was not yet its time slot before it received the high-latency block; that is, the high-latency block caused a schedule change.


pmconrad commented Aug 27, 2018

Yes, I thought of a different scenario. Makes sense for a schedule change as well.

abitmore added a commit that referenced this issue Aug 27, 2018

Merge pull request #1266 from bitshares/1157-block-produce-timeout
Changed block producing timeout to 2500 ms (#1157)

abitmore commented Aug 27, 2018

Fixed by #1266.
