crash on exit with error "corrupted size vs. prev_size" #8450

Closed
matthewdarwin opened this issue Jan 16, 2020 · 11 comments

@matthewdarwin matthewdarwin commented Jan 16, 2020

To reproduce

  • Compile 2.0.0.
  • Start nodeos with database-map-mode = locked.
  • Stop nodeos normally and let it shut down.
  • Observe the logs.

Files are stored on ext4 SSD. Please advise if any additional environmental information is needed.

Note: using systemd here to manage nodeos

Jan 16 03:58:34 mar140 nodeos[5779]: info  2020-01-16T03:58:34.812 nodeos    net_plugin.cpp:3521           plugin_shutdown      ] exit shutdown
Jan 16 03:58:34 mar140 nodeos[5779]: CHAINBASE: Writing "reversible" database file, this could take a moment...
Jan 16 03:58:35 mar140 nodeos[5779]:               7% complete...
Jan 16 03:58:35 mar140 nodeos[5779]:            Syncing buffers...
Jan 16 03:58:35 mar140 nodeos[5779]:            Complete
Jan 16 03:58:36 mar140 nodeos[5779]: CHAINBASE: Writing "state" database file, this could take a moment...
Jan 16 03:58:37 mar140 nodeos[5779]:               0% complete...
Jan 16 03:58:38 mar140 nodeos[5779]:               3% complete...
Jan 16 03:58:39 mar140 nodeos[5779]:               6% complete...
Jan 16 03:58:40 mar140 nodeos[5779]:               9% complete...
Jan 16 03:58:41 mar140 nodeos[5779]:               12% complete...
Jan 16 03:58:42 mar140 nodeos[5779]:               14% complete...
Jan 16 03:58:43 mar140 nodeos[5779]:               17% complete...
Jan 16 03:58:44 mar140 nodeos[5779]:               20% complete...
Jan 16 03:58:45 mar140 nodeos[5779]:               23% complete...
Jan 16 03:58:46 mar140 nodeos[5779]:               26% complete...
Jan 16 03:58:47 mar140 nodeos[5779]:               29% complete...
Jan 16 03:58:48 mar140 nodeos[5779]:               32% complete...
Jan 16 03:58:49 mar140 nodeos[5779]:               37% complete...
Jan 16 03:58:50 mar140 nodeos[5779]:               48% complete...
Jan 16 03:58:51 mar140 nodeos[5779]:               58% complete...
Jan 16 03:58:52 mar140 nodeos[5779]:               68% complete...
Jan 16 03:58:53 mar140 nodeos[5779]:               79% complete...
Jan 16 03:58:54 mar140 nodeos[5779]:               89% complete...
Jan 16 03:58:54 mar140 nodeos[5779]:            Syncing buffers...
Jan 16 03:59:12 mar140 nodeos[5779]:            Complete
Jan 16 03:59:25 mar140 nodeos[5779]: corrupted size vs. prev_size
Jan 16 03:59:25 mar140 systemd[1]: nodeos.service: Main process exited, code=killed, status=6/ABRT
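
When the restart is repeated many times, scanning the journal by eye gets tedious. The following is a minimal sketch, not part of nodeos or systemd, that looks for the systemd "code=killed" line shown above and reports the last message logged before it. The function name `abnormal_exits` and the assumption that every line has the "unit[pid]: message" shape seen in this log are mine.

```python
import re

# systemd reports a signal death as "code=killed, status=<number>/<NAME>".
KILLED = re.compile(r"Main process exited, code=killed, status=(\d+)/(\w+)")


def abnormal_exits(journal_lines):
    """Return (signal_name, last_message_before_exit) for each signal death."""
    found = []
    last_message = None
    for line in journal_lines:
        # Lines look like "<date> <host> <unit>[<pid>]: <message>".
        _, _, message = line.partition("]: ")
        match = KILLED.search(message)
        if match:
            found.append((match.group(2), last_message))
        elif message.strip():
            last_message = message.strip()
    return found


# The last three lines of the log above, used as sample input.
SAMPLE = [
    'Jan 16 03:59:12 mar140 nodeos[5779]:            Complete',
    'Jan 16 03:59:25 mar140 nodeos[5779]: corrupted size vs. prev_size',
    'Jan 16 03:59:25 mar140 systemd[1]: nodeos.service: Main process exited, code=killed, status=6/ABRT',
]
print(abnormal_exits(SAMPLE))
```

A clean shutdown produces no "code=killed" line, so the returned list stays empty for it.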

@matthewdarwin matthewdarwin commented Jan 16, 2020

Note: the database is not corrupted and works fine on next start, so this issue is of the "annoyance" type rather than a serious problem that is corrupting data.


@matthewdarwin matthewdarwin commented Jan 16, 2020

The pre-built eosio_2.0.0-1-ubuntu-18.04_amd64.deb works fine and does not crash

Prebuilt:

$ ldd /usr/bin/nodeos
        linux-vdso.so.1 (0x00007fff10689000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fceb3bea000)
        libssl.so.1.1 => /lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007fceb3b58000)
        libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007fceb386f000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fceb3651000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fceb35ce000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fceb35ad000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fceb35a1000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fceb341e000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007fceb3404000)
        libicuuc.so.60 => /lib/x86_64-linux-gnu/libicuuc.so.60 (0x00007fceb304d000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fceb3033000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fceb2e72000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fceb6f89000)
        libicudata.so.60 => /lib/x86_64-linux-gnu/libicudata.so.60 (0x00007fceb12c7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fceb1143000)

My build:

$ ldd /usr/bin/nodeos
        linux-vdso.so.1 (0x00007ffd8b3f8000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f84c12cf000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f84c12ae000)
        libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f84c1282000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f84c1064000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f84c0ee1000)
        libssl.so.1.1 => /lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f84c0e4f000)
        libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f84c0b64000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f84c0b5a000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f84c0ad7000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f84c0abd000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f84c08fc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f84c12da000)

(quite different)
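
To make "quite different" concrete, the two listings can be compared as sets of library names. This is only a sketch: the helper names `linked_libraries` and `compare` are made up for illustration, and the sample strings are shortened excerpts of the two listings above, not the full output.

```python
def linked_libraries(ldd_output):
    """Return the set of library names that appear in `ldd` output."""
    names = set()
    for line in ldd_output.splitlines():
        # Real entries end with a load address such as "(0x00007f...)";
        # this skips shell prompts and blank lines.
        if "(0x" not in line:
            continue
        names.add(line.split()[0])
    return names


def compare(first, second):
    """Return (names only in first, names only in second), both sorted."""
    a = linked_libraries(first)
    b = linked_libraries(second)
    return sorted(a - b), sorted(b - a)


# Shortened excerpts of the two listings above.
PREBUILT = """\
$ ldd /usr/bin/nodeos
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007fceb3404000)
        libicuuc.so.60 => /lib/x86_64-linux-gnu/libicuuc.so.60 (0x00007fceb304d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fceb2e72000)
        libicudata.so.60 => /lib/x86_64-linux-gnu/libicudata.so.60 (0x00007fceb12c7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fceb1143000)
"""
CUSTOM = """\
$ ldd /usr/bin/nodeos
        libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f84c1282000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f84c08fc000)
"""

only_prebuilt, only_custom = compare(PREBUILT, CUSTOM)
print("only in prebuilt:", only_prebuilt)
print("only in my build:", only_custom)
```

Run against the full listings, the same comparison gives libicudata.so.60, libicuuc.so.60, libresolv.so.2 and libstdc++.so.6 only in the prebuilt binary, and libtinfo.so.5 only in my build.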

My build script:

./scripts/eosio_build.sh -s EOS -P -y -i xxxx

on

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ uname -a
Linux build 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 x86_64 x86_64 GNU/Linux

@matthewdarwin matthewdarwin commented Jan 16, 2020

(The Ubuntu build server is running inside a Debian LXC container.)


@matthewdarwin matthewdarwin commented Jan 16, 2020

After further testing, the problem does not occur consistently. Sometimes my custom-compiled binary exits cleanly, so you probably need to start and stop nodeos repeatedly to reproduce it.
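
Because the crash is intermittent, a small harness that repeats the start/stop cycle and records how each run ended may help. This is a rough sketch under assumptions: the function names are hypothetical, the process is assumed to shut down on SIGTERM (the signal systemd sends by default), and the timings are placeholders to adjust.

```python
import signal
import subprocess
import time


def stop_and_classify(cmd, run_seconds=5.0, stop_signal=signal.SIGTERM, timeout=600):
    """Start cmd, let it run, ask it to stop, and describe how it exited."""
    proc = subprocess.Popen(cmd)
    time.sleep(run_seconds)        # give the process time to start up
    proc.send_signal(stop_signal)  # request a normal shutdown
    code = proc.wait(timeout=timeout)
    if code < 0:
        # On POSIX, a negative return code means "killed by signal -code".
        return "killed by " + signal.Signals(-code).name
    return "exited with status %d" % code


def repeat(cmd, attempts, **kwargs):
    """Run the start/stop cycle several times and collect the outcomes."""
    return [stop_and_classify(cmd, **kwargs) for _ in range(attempts)]
```

Calling `repeat` with the nodeos command line as a list of arguments and, say, `attempts=20` returns one outcome string per run; a run that hits this bug would show up as "killed by SIGABRT" (or "killed by SIGSEGV"). The shutdown writes the database files first, so `timeout` should stay generous.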


@heifner heifner commented Jan 24, 2020

I have one report of someone getting this error when not running database-map-mode = locked


@xebb82 xebb82 commented Jan 24, 2020

> I have one report of someone getting this error when not running database-map-mode = locked

This happened on three different machines: one of them in LXC and two directly on the host.

All three crashed after the nodeos instances were restarted because they had stopped syncing blocks.


@matthewdarwin matthewdarwin commented Jan 24, 2020

My crashes happened on a normal nodeos restart; i.e., I wasn't restarting nodeos to fix a problem like the one Eric is reporting, but to update the configuration or something similar.


@matthewdarwin matthewdarwin commented Jan 24, 2020

I have also seen other signals, e.g. SEGV.


@n8d n8d commented Jan 26, 2020

I have seen this error on 1.8 as well, and with the default database-map-mode (mapped). I should be able to get a core file.


@n8d n8d commented Jan 26, 2020

Here is a backtrace from 1.8.9:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/nodeos --data-dir /data/eos/main --config-dir /etc/nodeos --config con'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Thread 1 (Thread 0x7fa7d355d980 (LWP 24973)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fa7d11ee801 in __GI_abort () at abort.c:79
#2  0x00007fa7d1237897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fa7d1364b9a "%s\n")
    at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007fa7d123e90a in malloc_printerr (str=str@entry=0x7fa7d1362c9d "corrupted size vs. prev_size") at malloc.c:5350
#4  0x00007fa7d123eb0c in malloc_consolidate (av=av@entry=0x7f897c000020) at malloc.c:4456
#5  0x00007fa7d124603b in _int_free (have_lock=0, p=<optimized out>, av=0x7f897c000020) at malloc.c:4362
#6  __GI___libc_free (mem=0x7f897c3c4c50) at malloc.c:3124
#7  0x00000000008869f8 in boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<boost::asio::executor_binder<void eosio::http_plugin_impl::handle_http_request<eosio::detail::asio_with_stub_log<websocketpp::transport::asio::basic_socket::endpoint> >(websocketpp::server<eosio::detail::asio_with_stub_log<websocketpp::transport::asio::basic_socket::endpoint> >::connection_ptr)::{lambda()#1}, appbase::execution_priority_queue::executor> >, std::__1::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, std::__1::allocator<void>*, boost::system::error_code const&, unsigned long) ()
#8  0x000000000045a9ad in boost::asio::detail::scheduler::shutdown() ()
#9  0x000000000045dbc9 in std::__1::__shared_ptr_emplace<boost::asio::io_context, std::__1::allocator<boost::asio::io_context> >::__on_zero_shared() ()
#10 0x00000000004553b6 in appbase::application::exec() ()
#11 0x000000000044a1e1 in main ()

@spoonincode changed the title on Jan 26, 2020
  from: v2.0.0 will crash on exit with error "corrupted size vs. prev_size" when database-map-mode = locked
  to: crash on exit with error "corrupted size vs. prev_size"
@spoonincode added the bug label on Jan 26, 2020
@heifner self-assigned this on Jan 27, 2020

@matthewdarwin matthewdarwin commented Jan 27, 2020

Should be fixed now. Closing.
