This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

1.8rc2 coredumps on recovering a corrupted state history #7420

Closed
cc32d9 opened this issue May 27, 2019 · 9 comments
@cc32d9
Contributor

cc32d9 commented May 27, 2019

I was running 1.8rc2 compiled with GCC, with state history enabled, and synchronized with the Telos network. While it was on the head block, I created a snapshot via producer_api_plugin. Then I started 1.8rc2 from the deb package with this snapshot; it detected corruption in the state history, started scanning it, and then coredumped. I'm packing and uploading the whole working directory now.

root@eosio:/srv/telos# rm -r data/state/* data/blocks/
root@eosio:/srv/telos# /usr/bin/nodeos --data-dir /srv/telos/data --config-dir /srv/telos/etc --disable-replay-opts --snapshot=data/snapshots/snapshot-01b12ec80e18b8ee17e0a7f6a31cf7aa1fb2dbd6dc2f9fe870d307f28a14cb49.bin 
info  2019-05-27T09:24:13.693 nodeos    chain_plugin.cpp:556          plugin_initialize    ] initializing chain plugin
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:408          operator()           ] Support for builtin protocol feature 'PREACTIVATE_FEATURE' (with digest of '0ec7e080177b2c02b278d5088611686b49d739925a92d9bfcacd7fc6b74053bd') is enabled without activation restrictions
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'ONLY_LINK_TO_EXISTING_PERMISSION' (with digest of '1a99a59d87e06e09ec5b028a9cbb7749b4a5ad8819004365d02dc4379a8b7241') is enabled with preactivation required
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'FORWARD_SETCODE' (with digest of '2652f5f96006294109b3dd0bbde63693f55324af452b799ee137a81a905eed25') is enabled with preactivation required
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'REPLACE_DEFERRED' (with digest of 'ef43112c6543b88db2283a2e077278c315ae2c84719a8b25f25cc88565fbea99') is enabled with preactivation required
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'NO_DUPLICATE_DEFERRED_ID' (with digest of '4a90c00d55454dc5b059055ca213579c6ea856967712a56017487886a4d4cc0f') is enabled with preactivation required
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'RAM_RESTRICTIONS' (with digest of '4e7bf348da00a945489b2a681749eb56f5de00b900014e137ddae39f48f69d67') is enabled with preactivation required
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'DISALLOW_EMPTY_PRODUCER_SCHEDULE' (with digest of '68dcaa34c0517d19666e6b33add67351d8c5f69e999ca1e37931bc410a297428') is enabled with preactivation required
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'ONLY_BILL_FIRST_AUTHORIZER' (with digest of '8ba52fe7a3956c5cd3a656a3174b931d3bb2abb45578befc59f283ecd816a405') is enabled with preactivation required
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'RESTRICT_ACTION_TO_SELF' (with digest of 'ad9e3d8f650687709fd68f4b90b41f7d825a365b02c23a636cef88ac2ac00c43') is enabled with preactivation required
info  2019-05-27T09:24:13.713 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'FIX_LINKAUTH_RESTRICTION' (with digest of 'e0fb64b1085cc5538970158d05a009c24e276fb94e1a0bf6a528b48fbc4ff526') is enabled with preactivation required
info  2019-05-27T09:24:13.714 nodeos    chain_plugin.cpp:395          operator()           ] Support for builtin protocol feature 'GET_SENDER' (with digest of 'f0af56d2c5a48d60a4a5b5c903edfb7db3a736a94ed589d0b797df33ff9d3e1d') is enabled with preactivation required
info  2019-05-27T09:24:13.950 nodeos    http_plugin.cpp:465           plugin_initialize    ] configured http to listen on 0.0.0.0:8889
warn  2019-05-27T09:24:13.951 nodeos    producer_api_plugin.cp:145    plugin_initialize    ] 
**********SECURITY WARNING**********
*                                  *
* --        Producer API        -- *
* - EXPOSED to the LOCAL NETWORK - *
* - USE ONLY ON SECURE NETWORKS! - *
*                                  *
************************************

info  2019-05-27T09:24:13.951 nodeos    state_history_plugin.c:605    plugin_initialize    ] ip_port: 0.0.0.0:8081 host: 0.0.0.0 port: 8081 
info  2019-05-27T09:24:14.035 nodeos    state_history_log.hpp:217     open_log             ] trace_history.log has blocks 2-28389481
error 2019-05-27T09:24:14.143 nodeos    state_history_log.hpp:150     get_last_block       ] corrupt chain_state_history.log (2)
info  2019-05-27T09:24:14.143 nodeos    state_history_log.hpp:170     recover_blocks       ] recover chain_state_history.log
info  2019-05-27T09:40:31.794 nodeos    state_history_log.hpp:217     open_log             ] chain_state_history.log has blocks 2-28389480
info  2019-05-27T09:40:31.970 nodeos    http_plugin.cpp:412           operator()           ] configured http with Access-Control-Allow-Origin: *
Segmentation fault (core dumped)

@cc32d9
Contributor Author

cc32d9 commented May 27, 2019

I rolled back the ZFS snapshot and repeated the procedure, and now it runs smoothly. But I have 60GB of data in an archive; I will open it for downloading if you need it.

@johndebord
Contributor

Hey @cc32d9, could you give some more context on what happened to produce this segfault?
In particular, which operating system and version of GCC are you using? Getting access to that 60GB archive would also be useful.

@cc32d9
Contributor Author

cc32d9 commented May 28, 2019

@johndebord Ubuntu 18.04, gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04)

I will make the archive available in a couple of hours.

The context is the following: 1.8rc2 (gcc) with state history was running at the head block of Telos; then I produced a snapshot and tried to start 1.8rc2 (clang) from it, with the result described above.

Then I rolled back to 1.8rc2 (gcc), waited a bit for it to catch up with the network, made a new snapshot, and 1.8rc2 (clang) started smoothly.

So if you unpack the archive and start from the snapshot in it, it will complain that the state history is corrupted, start recovering it, and then crash.

@cc32d9
Contributor Author

cc32d9 commented May 28, 2019

@tbfleming
Contributor

There's a chance this is #7436.

@johndebord
Contributor

@cc32d9 We were unable to reproduce this problem. Here are the steps we took to test:

  1. Built EOSIO/eos v1.8.0-rc2 with gcc

  2. Created a snapshot

  3. Started replay with said snapshot; no segfault encountered

  4. Grabbed and built the available debian package of EOSIO/eos v1.8.0-rc2 with clang

  5. Started replay with aforementioned snapshot; no segfault encountered

@cc32d9
Contributor Author

cc32d9 commented Jun 3, 2019

I don't think it's related to gcc, but rather to an uncleanly closed state history archive. The recovery procedure seems to fail in some cases at the end of recovery. Could you try with the archive I provided?

@cc32d9
Contributor Author

cc32d9 commented Jun 3, 2019

Now I have another, probably related, failure on the same server: I generated a snapshot, then stopped nodeos, but it appears that it didn't stop cleanly. So I restored it from the same snapshot. As a result, nodeos occupies 100% CPU and doesn't open any sockets; it looks like an endless loop. When starting from the snapshot, it was also recovering the state history.

It seems like the state history recovery is not behaving well.

@cc32d9
Contributor Author

cc32d9 commented Jun 6, 2019

@johndebord you don't need to replay from the snapshot; you can start from the included state file.

No branches or pull requests

5 participants