1.8rc2 coredumps on recovering a corrupted state history #7420
Comments
I rolled back the ZFS, repeated the steps, and now it runs smoothly. But I have 60 GB of data in an archive, and I can make it available for download if you need it. |
Hey @cc32d9, could you give some more context on what happened to produce this segfault? |
@johndebord Ubuntu 18.04, gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04). I will make the archive available in a couple of hours. The context is the following: 1.8rc2 (gcc) with state history was running at the head block of Telos. I then produced a snapshot and tried to start 1.8rc2 (clang) from it, with the result described above. Then I rolled back to 1.8rc2 (gcc), waited a bit for it to catch up with the network, made a new snapshot, and 1.8rc2 (clang) started smoothly. So if you unpack the archive and start from the snapshot in it, nodeos will complain that the state history is corrupted, start recovering it, and then crash. |
This is potentially #7436. |
@cc32d9 We were unable to reproduce this problem. Here are the steps we took to test:
|
I don't think it's related to gcc, but rather to an uncleanly closed state history archive. The recovery procedure seems to fail in some cases at the end of recovery. Could you try with the archive I provided? |
Now I have another, probably related, failure on the same server: I generated a snapshot, then stopped nodeos, but it appeared that it didn't stop cleanly. So I restored it from the same snapshot. As a result, nodeos occupies 100% of CPU and is not opening any sockets. It seems to be in an endless loop. When starting from the snapshot, it was also recovering the state history. It seems that the state history recovery is not behaving well. |
@johndebord you don't need to replay the snapshot; just start from the included state file. |
I was running 1.8rc2 compiled with GCC, with state history enabled, and synchronized with the Telos network. While it was at the head block, I created a snapshot via producer_api_plugin. Then I started 1.8rc2 from the deb package with this snapshot, and it detected corruption in the state history, started scanning it, and then coredumped. I'm packing and uploading the whole working directory now.