-
Notifications
You must be signed in to change notification settings - Fork 0
dbft: skip sealing task awaiting during node sync #176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
feb21ed to
72418a6
Compare
|
Tested with stale watch-only T3 CN, works as expected: |
|
@roman-khimov, ready for review and merge. |
|
@chenquanyu, @txhsl, @nicolegys welcome to review and test, the problem is described in the commit message, but in short there's a possibility that stale CN node will hang during the sync process without this patch. |
roman-khimov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this affect initial sync only or can also happen on a synchronised node that was disconnected for some time?
consensus/dbft/dbft.go
Outdated
| c.requestTxs = f | ||
| } | ||
|
|
||
| // WithMinerInterrupted sets callback to indicate whether miner is interrupted sue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
due
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
consensus/dbft/dbft.go
Outdated
| // sync. In this case dBFT can't react properly on the newcoming blocks since no | ||
| // sealing task is expected from miner. | ||
| if c.minerInterrupted() { | ||
| log.Info("Skip chain block processing since miner is interrupted", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would look scary in logs. And it's a perfectly fine condition, not requiring "info".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
It affects any sync initiated by Downloader, since miner is tightly bound to the Downloader events and miner aways interrupts his work if Downloader is started: Lines 128 to 137 in 9faaba0
Right now I'm checking the latter case, will post the result here. |
OK, the node needs to be too stale for Downloader to be enabled (at least at the previous snap-sync block, AFAIU), so can't check it right now: Will create an issue to check it, but the fix should work in this case as far. |
If node is stale and far behind its peers, mining is aborted by the
signal of Downloader. If mining is aborted, then sealing task won't be
submitted to the consensus engine, and thus, dBFT awaiting for sealing
task may hang forever in case of multiple block batch processing by
Downloader. It may happen because dBFT's block notificaiton channel has
buffer size of 2 and block notification processing is managed by the
same eventLoop as sealing tasks awaiting. So dBFT blocks notifications
processing by sealing task awaiting, and thus, blocks the whole chain
from the further sync process:
```
2024-04-18T10:26:44.856+0300 DEBUG dbft@v0.0.0-20240130131324-6d59675ad370/dbft.go:229 caching message from future {"height": 6470, "view": 0, "cache": {}}
2024-04-18T10:26:45.264+0300 DEBUG dbft@v0.0.0-20240130131324-6d59675ad370/dbft.go:214 received message {"type": "Commit", "from": 4, "height": 6470, "view": 0, "my_height": 1, "my_view": 0}
2024-04-18T10:26:45.265+0300 DEBUG dbft@v0.0.0-20240130131324-6d59675ad370/dbft.go:229 caching message from future {"height": 6470, "view": 0, "cache": {}}
INFO [04-18|10:26:50.148] Imported new chain segment number=192 hash=775270..7b8b06 blocks=192 txs=0 mgas=0.000 elapsed=562.005ms mgasps=0.000 age=19h32m52s triedirty=0.00B
INFO [04-18|10:26:50.149] New block in the chain "dbft index"=1 "chain index"=192 hash=0x77527025ea64a0ceb26235efaf6ddc5430d0e2de15757869da2e90a1c57b8b06 "parent hash"=0xa9cde87fe65f34bf1f40982015e1a7e73b117a80fadf707b0deda67a34855092 primary=3 coinbase=0x1212000000000000000000000000000000000003 "mix digest"=0x23c10fa9c1fae49f6139db9c3ecff715de101c145e56e4d390a16e67d5c32e1b
INFO [04-18|10:26:50.149] Fetching latest sealing proposal "desired number"=193
INFO [04-18|10:26:50.152] Indexed transactions blocks=193 txs=0 tail=0 elapsed=3.786ms
INFO [04-18|10:26:50.770] Imported new chain segment number=384 hash=a5dac5..02375b blocks=192 txs=0 mgas=0.000 elapsed=472.546ms mgasps=0.000 age=18h57m12s triedirty=0.00B
INFO [04-18|10:26:51.344] Imported new chain segment number=576 hash=1d8a8e..3e8def blocks=192 txs=0 mgas=0.000 elapsed=570.798ms mgasps=0.000 age=18h21m29s triedirty=0.00B
INFO [04-18|10:26:52.221] Imported new chain segment number=960 hash=6fde0c..d75d78 blocks=384 txs=0 mgas=0.000 elapsed=868.154ms mgasps=0.000 age=17h10m6s triedirty=0.00B
INFO [04-18|10:27:07.683] Looking for peers peercount=1 tried=1 static=0
```
To fix this problem we have to omit block notifications processing until
the node sync process end, i.e. util the moment miner resumes its work.
Signed-off-by: Anna Shaleva <shaleva.ann@nspcc.ru>
If node is stale and far behind its peers, mining is aborted by the signal of Downloader. If mining is aborted, then sealing task won't be submitted to the consensus engine, and thus, dBFT awaiting for sealing task may hang forever in case of multiple block batch processing by Downloader. It may happen because dBFT's block notificaiton channel has buffer size of 2 and block notification processing is managed by the same eventLoop as sealing tasks awaiting. So dBFT blocks notifications processing by sealing task awaiting, and thus, blocks the whole chain from the further sync process:
To fix this problem we have to omit block notifications processing until the node sync process end, i.e. util the moment miner resumes its work.