View Change process gets stuck at Share phase and restarts whenever one of the MPK miners goes down (or intentionally rejects share) during Share phase #190
Comments
The logs that helped identify this came from https://two.devnet-0chain.net/miner*. The following sequence of logs seems to repeat continuously.
The following is the full log file from miner01.
Round timeout would not move to publish phase. If the miner received
Hi @peterlimg, my concern is not regarding whether a miner can participate again when the DKG process restarts.
allow share phase to partially fail and still move to publish phase #190
Closed by #191
In the `DKGProcess()` method in 0chain.net/miner/protocol_view_change.go, a new phase event is allowed to execute under the following conditions:
- `share` phase
- `start` phase

But this logic breaks the view-change process: in the current implementation, when the `share` phase function keeps returning an error, the current phase does not move. Re-executing the `share` phase is fine, because the periodic phase event forces the miner to re-send its shares to the other MPK miners. But when the phase completion time (in terms of rounds) expires, the MinerSC smart contract moves to the `publish` phase, and a `publish` phase event is received by all DKG miners. The miners that were stuck re-executing the `share` phase, because they could not get verification from at least one of the DKG miners and so could not move the current phase from `contribute` to `share`, cannot proceed because of condition 3 mentioned above. The current phase is at `contribute`, but the phase event received is `publish` (`contribute` + 2). This shows up as a "dkg process -- jumping over a phase" debug log in 0chain.log.

The solution would be to allow the `share` phase to move the current phase from `contribute` to `share` even if the `share` phase function fails to send shares to, and receive verification from, a few of the MPK miners. The `share` phase should also be retried every "repeat" interval (until the next phase starts), so that it might succeed on a later attempt if the earlier failures were caused by network issues. This ensures the current phase has moved from `contribute` to `share` during the `share` phase even if a few MPK miners failed to verify shares (for whatever reason). The new `publish` phase event would then execute as expected under condition 3.

Terminologies:
DKG Miners: All miners selected for DKG process
MPK Miners: All DKG miners who submitted their MPK to the BC
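
The stuck-phase scenario and the proposed fix can be illustrated with a small, self-contained Go sketch. It is not the code from protocol_view_change.go: the `Phase` constants, `canExecute`, `miner`, `handle`, and `simulate` are all hypothetical names, and the gating rule is an assumption inferred from the description above (an event runs if it is the `start` phase, a repeat of the current phase, or the current phase + 1).

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is a hypothetical stand-in for the DKG phase number.
type Phase int

const (
	Start Phase = iota
	Contribute
	Share
	Publish
	Wait
)

// canExecute is an ASSUMED form of the gating rule: an event may run if it
// restarts the process, repeats the current phase, or is the next phase.
func canExecute(current, event Phase) bool {
	return event == Start || event == current || event == current+1
}

// miner holds only the state this sketch needs.
type miner struct {
	current Phase
}

// handle processes one phase event and reports whether it was executed.
// run is the phase function. With advanceOnShareError set, a failed share
// phase still moves the current phase forward (the proposed behavior);
// otherwise an error leaves the current phase unchanged.
func (m *miner) handle(event Phase, run func() error, advanceOnShareError bool) bool {
	if !canExecute(m.current, event) {
		// This is the case the issue reports as "jumping over a phase".
		return false
	}
	err := run()
	if err == nil || (advanceOnShareError && event == Share) {
		m.current = event
	}
	return true
}

// simulate replays the scenario: the share phase fails, is retried and fails
// again, then the publish event arrives. It reports whether publish ran.
func simulate(advanceOnShareError bool) bool {
	m := &miner{current: Contribute}
	failShare := func() error { return errors.New("share not verified by every MPK miner") }
	m.handle(Share, failShare, advanceOnShareError)
	m.handle(Share, failShare, advanceOnShareError) // periodic retry
	return m.handle(Publish, func() error { return nil }, advanceOnShareError)
}

func main() {
	fmt.Println("publish executed, current behavior:", simulate(false))
	fmt.Println("publish executed, proposed behavior:", simulate(true))
}
```

Under these assumptions, the first call leaves the miner at `contribute`, so the `publish` event is skipped, while the second call advances to `share` despite the failures and lets `publish` run. Retries of `share` stay possible in both cases because a repeat of the current phase is allowed.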