Chain rewind and failure to sync with err=missing parent after non-graceful restart of nodes #1117
Comments
In some instances where the node doesn't throw
and the node is unable to push blocks into the
This issue appears to be fixed in ethereum/go-ethereum#20287. There's also a significant rewrite of the chain-repair functionality in |
hi @vdamle I did the following to reproduce the issue: Raft:
Istanbul:
Can you give me the following details about your network so that I could analyse further:
|
Hi @amalrajmani , thank you for taking a look at this. We have hit this error in multiple Kaleido environments running
|
@vdamle can you share the full geth log of the failed node. The log should have all the messages from the time it was restarted. |
Hi @amalrajmani sorry for the confusion caused by my previous response - the
command line arguments in same as earlier (same
|
hi @vdamle |
hi @vdamle If you run geth with gcmode=archive or do a graceful restart (docker stop/start; don't use docker kill), this error won’t occur. |
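A minimal sketch of the graceful restart suggested above, assuming the node runs in a Docker container. The helper name `graceful_restart`, the container name, and the 60-second timeout are hypothetical and not taken from this thread; `docker stop` sends SIGTERM first and only falls back to SIGKILL once the timeout expires, which should give geth a chance to shut down cleanly.

```shell
# Hypothetical helper (not from the issue): restart a geth container gracefully.
# Set DOCKER=echo for a dry run that only prints the docker arguments.
graceful_restart() {
  container="$1"
  timeout="${2:-60}"  # assumed grace period in seconds; tune for your node

  # docker stop sends SIGTERM, then SIGKILL only after the timeout expires.
  "${DOCKER:-docker}" stop -t "$timeout" "$container" &&
    "${DOCKER:-docker}" start "$container"
}
```

Usage would look like `graceful_restart my-quorum-node 120`, where `my-quorum-node` is a placeholder container name.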
Thanks for confirming, @amalrajmani . Correct, I'm aware of the fix in Geth I've checked in Quorum Slack about plans for moving to a release of Geth >= 1.9.20 and did not receive any response. I see a PR opened for moving to As one would expect, most nodes do not run as Do you have any estimate for when Quorum intends to incorporate a newer release of Geth that will address this issue? |
re-opening as we still need to validate on the latest GoQuorum |
Hi @vdamle. Can you test the reproduction of this issue using the |
@ricardolyn - Thanks for the PRs to move the Geth version forward! I will test this in the next day or so and let you know. |
@ricardolyn I've run into an unrelated issue using code from the latest master: https://go-quorum.slack.com/archives/C825QTQ1Z/p1616037777005000 . Would really like to resolve that before testing this, so that I don't have to test again with private transactions enabled. Will keep you posted on my progress. |
@vdamle any update on this testing? thank you |
Hi @ricardolyn - Apologies for the delay. I attempted to reproduce the issue with the changes in master and haven't been able to reproduce it. It seems ok to resolve this issue and re-visit with a new issue if we hit anything of this nature again. Thank you for the updates! |
That's good news, @vdamle! Thank you. We will release this version soon, once we finalise some validation. |
System information
Geth version:
Geth/v1.9.7-stable-c6ed3ed2(quorum-v20.10.0)
OS & Version:
linux-amd64
Branch, Commit Hash or Release:
quorum-v20.10.0
Expected behaviour
On a non-graceful shutdown/restart of a gcmode=full, syncmode=full node, it is expected that the local full block may rewind to a block in the past. However, the node must be able to sync to latest by fetching missing headers/blocks and rebuilding any missing state.
Actual behaviour
The node must be able to fetch missing headers/blocks and rebuild state from the previous full block to catch up with the rest of the chain. Instead, we see that the node
Another instance:
Steps to reproduce the behaviour
Perform a non-graceful restart of a node with blocks in both Ancients/Freezer and LevelDB on Quorum v20.10.0.
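One way the non-graceful restart could be triggered, sketched under the assumption that the node runs in a Docker container. The helper name `hard_restart` and the container name are hypothetical; the thread only says that `docker kill` should be avoided to prevent the error. By default `docker kill` sends SIGKILL, so the process is terminated without any chance to shut down cleanly.

```shell
# Hypothetical reproduction helper (not from the issue).
# Set DOCKER=echo for a dry run that only prints the docker arguments.
hard_restart() {
  container="$1"

  # docker kill sends SIGKILL by default: the process gets no shutdown hook.
  "${DOCKER:-docker}" kill "$container" &&
    "${DOCKER:-docker}" start "$container"
}
```

For example, `hard_restart my-quorum-node` (placeholder name) on a node that has already synced enough blocks for some to have moved into the Ancients/Freezer store.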
Backtrace
None