Witness_node crash when starting ElasticSearch plugin #2490

Closed
abitmore opened this issue Jul 23, 2021 · 2 comments · Fixed by #2495
Comments

@abitmore
Member

abitmore commented Jul 23, 2021

Bug Description

3118325ms th_a       application.cpp:1051          startup_plugins      ] Starting plugin elasticsearch

Thread 1 "witness_node" received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65      ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
#1  0x00007ffff7bade10 in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#2  0x00007ffff7bb73f8 in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#3  0x00007ffff7bb89d1 in curl_multi_perform () from /lib/x86_64-linux-gnu/libcurl.so.4
#4  0x00007ffff7baee4b in curl_easy_perform () from /lib/x86_64-linux-gnu/libcurl.so.4
#5  0x0000555558afd1f8 in graphene::utilities::doCurl[abi:cxx11](graphene::utilities::CurlRequest&) ()
#6  0x0000555558afda00 in graphene::utilities::checkES(graphene::utilities::ES&) ()
#7  0x00005555583d8e31 in graphene::elasticsearch::elasticsearch_plugin::plugin_startup() ()
#8  0x00005555581038cb in graphene::app::detail::application_impl::startup_plugins() const ()
#9  0x000055555810b0ad in graphene::app::detail::application_impl::startup() ()
#10 0x000055555810b3a4 in graphene::app::application::startup() ()
#11 0x00005555580e47f4 in main ()

It's caused by a bug in curl: curl/curl#3548
Update: that bug was already fixed in curl 7.68.0-1ubuntu2.6, so our issue is actually a different one.

Update:

  • The issue cannot be reliably reproduced; it occurs randomly.
  • The curl object is used like a global variable in the program and has a long lifetime. Since we do not clean up after each query, we need to overwrite or reset the options before every query.
    • The crash is on a GET.
    • There may already have been a POST sent before we send the GET (see ElasticSearch plugin startup check is incomplete #2494).
    • We didn't specify CURLOPT_HTTPGET in doCurl() for GET, so it may actually be sending a POST, in which case libcurl may try to access CURLOPT_POSTFIELDS.
    • We didn't reset CURLOPT_POSTFIELDS in doCurl() for GET. It is a pointer, and it was pointing to a temporary variable that had already been destructed, so the memory address may or may not be accessible. Either way, accessing it is wrong.
    • So the correct fix for this issue should be resetting the options (see https://curl.se/libcurl/c/CURLOPT_CUSTOMREQUEST.html):

      ... (CURLOPT_CUSTOMREQUEST) is particularly useful, for example, for performing an HTTP DELETE request.
      To switch to a proper HEAD use CURLOPT_NOBODY, to switch to a proper POST use CURLOPT_POST or CURLOPT_POSTFIELDS and to switch to a proper GET use CURLOPT_HTTPGET.

  • By the way, there are quite a few other design flaws in the plugin; ideally we should refactor it when we have time.

Host Environment
Please provide details about the host environment. Much of this information can be found running: witness_node --version.

  • Host OS: Ubuntu 20.04.2 LTS
  • Host Physical RAM -
  • BitShares Version: 5.2.1
  • OpenSSL Version: 1.1.1f
  • Boost Version: 1.71
  • libcurl4-openssl-dev 7.68.0-1ubuntu2.6

Additional Context
The crash started to happen after I ran sudo apt upgrade on my server to upgrade ElasticSearch to 7.13.4; the kernel and some other packages were upgraded at the same time. After the upgrade but before a reboot, witness_node worked fine. After the reboot, witness_node started to crash.

The pre-built witness_node binary (with different versions of the libraries statically linked) crashes too, so perhaps the issue is triggered by some change in the kernel.

@abitmore abitmore added this to the 5.3.0 - Feature Release milestone Jul 23, 2021
@abitmore abitmore added this to To do in Feature Release (6.1.0) via automation Jul 23, 2021
@abitmore abitmore linked a pull request Jul 23, 2021 that will close this issue
@abitmore abitmore moved this from To do to In testing in Feature Release (6.1.0) Jul 23, 2021
@abitmore
Member Author

I am unable to reliably reproduce this issue.

@abitmore abitmore removed this from In testing in Feature Release (6.1.0) Jul 25, 2021
@abitmore abitmore added this to New -Awaiting Core Team Evaluation in Project Backlog via automation Jul 25, 2021
@abitmore abitmore linked a pull request Jul 25, 2021 that will close this issue
@abitmore abitmore removed this from New -Awaiting Core Team Evaluation in Project Backlog Jul 25, 2021
@abitmore abitmore added this to To Do in Protocol Upgrade Release (6.0.0) via automation Jul 25, 2021
@abitmore abitmore moved this from To Do to In Testing in Protocol Upgrade Release (6.0.0) Jul 25, 2021
@abitmore
Member Author

abitmore commented Aug 5, 2021

Fixed by #2495.
