Can't restart node with cosmos-sdk v0.38.0 #5570
Comments
@AdityaSripal is working on a fix for this.
I agree that the fix will work for the current SDK. However, it does break the generality of MultiStore. A lot of work went into making it an abstract store that can use multiple sub-dbs, like an Ethereum Patricia tree, under one root. This pruning approach only applies to the IAVL substores. In 99+% of cases this is currently the only substore used, so please make the fix and get v0.38.1 out. But also note that this adds tech debt (making rootmultistore only usable with the IAVL substore), so please open an issue on that and start working on a proper design that doesn't couple the two so closely.
I think we can tackle this without introducing tech debt, which I've spent the better part of the last two months trying to reduce, so I know the pain. Instead of introducing changes to the root multistore, can we push the fix down to the IAVL store, most likely in
Edited by @AdityaSripal
Cause of Bug
With the new pruning changes, IAVL only flushes to disk at each snapshot interval defined by the SDK KeepEvery parameter. On restart, the application should replay blocks from the last persisted version (or should replay from an empty state if nothing has been persisted). However, the CommitInfo needs to contain the last persisted commit, rather than the latest commit, so that the Tendermint process can restart the application correctly.
Solution
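The version that CommitInfo should report on restart can be sketched in Go as follows. This is only an illustration, not the SDK's code: lastFlushedVersion is a hypothetical helper, and it assumes that a version reaches disk exactly when it is a multiple of KeepEvery.

```go
package main

import "fmt"

// lastFlushedVersion is a hypothetical helper, not an SDK or IAVL API.
// Assumption: a version is flushed to disk only when it is a multiple of keepEvery.
// It returns 0 when nothing has been persisted yet, in which case the
// application has to replay from an empty state.
func lastFlushedVersion(latest, keepEvery int64) (int64, error) {
	if keepEvery <= 0 {
		return 0, fmt.Errorf("keepEvery must be positive, got %d", keepEvery)
	}
	if latest < 0 {
		return 0, fmt.Errorf("latest version must not be negative, got %d", latest)
	}
	// Round down to the nearest multiple of keepEvery.
	return latest - latest%keepEvery, nil
}

func main() {
	// The node stopped after committing version 25 with KeepEvery = 10.
	const latest, keepEvery = 25, 10
	persisted, err := lastFlushedVersion(latest, keepEvery)
	if err != nil {
		panic(err)
	}
	// CommitInfo should point at the persisted version, not at latest.
	fmt.Printf("CommitInfo version %d, replay blocks %d..%d\n", persisted, persisted+1, latest)
}
```

Reporting the latest in-memory version instead of the persisted one is what leaves Tendermint and the application disagreeing about the state after a restart.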
A couple changes need to be integrated into the SDK
Here {KeepRecent, FlushEvery} form the IAVL PruningOptions{KeepRecent, KeepEvery}.
The SDK will, on each commit of a FlushEvery version, remove the last FlushEvery version unless the last version is a snapshot version, which is defined with the SnapshotEvery parameter.
Thanks to @ethanfrey and @zmanian for help diagnosing the issue and helping with the solution.
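The pruning rule for flushed versions can be sketched like this. It is a rough illustration under stated assumptions, not the actual fix: pruneTarget is a hypothetical helper, a version is treated as a FlushEvery version when it is a multiple of flushEvery, and as a snapshot version when it is a multiple of snapshotEvery (with 0 meaning no snapshots are kept).

```go
package main

import "fmt"

// pruneTarget is a hypothetical helper, not an SDK API.
// Assumptions: a flush version is a multiple of flushEvery, and a snapshot
// version is a multiple of snapshotEvery (0 disables snapshots).
// It returns the previously flushed version that may be removed when
// `version` is committed, or 0 when nothing should be removed.
func pruneTarget(version, flushEvery, snapshotEvery int64) (int64, error) {
	if flushEvery <= 0 {
		return 0, fmt.Errorf("flushEvery must be positive, got %d", flushEvery)
	}
	if snapshotEvery < 0 {
		return 0, fmt.Errorf("snapshotEvery must not be negative, got %d", snapshotEvery)
	}
	if version%flushEvery != 0 {
		return 0, nil // not a flush version, nothing changes on disk
	}
	prev := version - flushEvery
	if prev <= 0 {
		return 0, nil // no earlier flushed version exists
	}
	if snapshotEvery > 0 && prev%snapshotEvery == 0 {
		return 0, nil // the earlier version is a snapshot, keep it
	}
	return prev, nil
}

func main() {
	for _, v := range []int64{30, 35, 110} {
		target, err := pruneTarget(v, 10, 100)
		if err != nil {
			panic(err)
		}
		fmt.Printf("commit %d: remove version %d (0 means keep everything)\n", v, target)
	}
}
```

With flushEvery = 10 and snapshotEvery = 100, committing version 30 removes version 20, while committing version 110 keeps version 100 because it is a snapshot.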
End of edit
Summary of Bug
I started the migration of cyber to the latest SDK v0.38.0.
After refactoring the application and modules, it built and ran, but I found that after a node restart it crashes with a consensus failure every time. I spent the holidays trying to fix this, thinking it was an application problem, but then I tried a Gaia version bumped to 0.38 and hit the same issue.
Upgraded to 0.38.0 code, single node, start, stop, restart -> failure.
Stacktrace, restarting Gaia node
This looks like a storage issue. It first halts in the mint module during BeginBlock, but I checked and the same happens with the other modules in OrderBeginBlockers.
Version
Cosmos-SDK release v0.38.0
Gaia b2f508950d11897fdc89924fad81b1045379a937
Steps to Reproduce
Take the Gaia commit provided in the Version section and
Note
I initially asked @ethanfrey about this and he confirmed that it is an SDK issue in the wasmd project, CosmWasm/wasmd#54
Update
@ethanfrey provided more in-depth details in CosmWasm/wasmd#54