Websocket server crashes on quit #1784

Closed
1 of 17 tasks
abitmore opened this issue Jun 4, 2019 · 6 comments · Fixed by #2204
Labels
3d Bug Classification indicating the existing implementation does not match the intention of the design

Comments

@abitmore
Member

abitmore commented Jun 4, 2019

Bug Description

Reported in #1782 (comment):

Stack back trace:

Thread 1 "cli_wallet" received signal SIGSEGV, Segmentation fault.
0x0000000000d93f9c in fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}::operator()() const ()

(gdb) bt
#0  0x0000000000d93f9c in fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}::operator()() const ()
#1  0x0000000000d94259 in fc::detail::void_functor_run<fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}>::run(void*, fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}) ()
#2  0x0000000000cd6cd2 in fc::task_base::run_impl() ()
#3  0x0000000000cd4a8a in fc::thread_d::process_tasks() ()
#4  0x0000000000cd50e4 in fc::thread_d::start_process_tasks(long) ()
#5  0x0000000000f10ce1 in make_fcontext ()
#6  0x0000000000000000 in ?? ()

Mentioned in #1782 (comment):

~websocket_server_impl() calls _server.stop_listening(), which triggers the callback registered by set_fail_handler(). That callback may call _server_thread.async().wait(), which won't block the current thread, so the websocket_server_impl object could be destructed first. When the code inside async() then executes, _connection and _close no longer exist, hence the crash.

There is a race condition on the if( _server.is_listening() ) check: sometimes _server_thread.async() is not called at all, and in that case there is no crash.

Actually, there is another race condition: the entire fail handler may execute after the server object is destructed, which may even lead to a crash on the if( _server.is_listening() ) check itself. So IMHO we need to wait for the callback in the destructor.
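
To illustrate the "wait for the callback in the destructor" idea, here is a minimal sketch using only the C++ standard library. It is not the fc code and not the actual fix: server_impl_sketch, stop_listening(), on_fail() and run_sketch() are hypothetical names, and a plain std::thread stands in for the fail handler that runs asynchronously after stop_listening().

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the server implementation (assumption, not the fc API).
class server_impl_sketch {
public:
    explicit server_impl_sketch(std::atomic<int>& closed_count)
        : _closed_count(closed_count) {}

    // Simulates stop_listening() triggering a fail handler that runs
    // asynchronously and touches members of *this some time later.
    void stop_listening() {
        _handler = std::thread([this] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            on_fail();  // must not run after *this has been destroyed
        });
    }

    ~server_impl_sketch() {
        stop_listening();
        // Wait for the handler to finish before any member is destroyed.
        if (_handler.joinable()) {
            _handler.join();
        }
        assert(!_handler.joinable());
    }

private:
    void on_fail() { ++_closed_count; }

    std::atomic<int>& _closed_count;  // outlives the object; used to observe the handler
    std::thread _handler;
};

// Creates and destroys one object, then reports how many times
// the handler ran to completion.
int run_sketch() {
    std::atomic<int> closed_count{0};
    {
        server_impl_sketch server(closed_count);
    }  // the destructor blocks here until the handler is done
    return closed_count.load();
}
```

The only point is that the destructor does not return until the handler is done with the object's members. In fc this would presumably be expressed with its own task and future types rather than std::thread.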

By the way, it would be better to sync the websocket_tls_server_impl implementation details with websocket_server_impl, ideally by refactoring to remove the duplicated code (like bitshares/bitshares-fc#136).

Impacts
Describe which portion(s) of BitShares Core may be impacted by this bug. Please tick at least one box.

  • API (the application programming interface)
  • Build (the build process or something prior to compiled code)
  • CLI (the command line wallet)
  • Deployment (the deployment process after building such as Docker, Travis, etc.)
  • DEX (the Decentralized EXchange, market engine, etc.)
  • P2P (the peer-to-peer network for transaction/block propagation)
  • Performance (system or user efficiency, etc.)
  • Protocol (the blockchain logic, consensus, validation, etc.)
  • Security (the security of system or user data, etc.)
  • UX (the User Experience)
  • Other (please add below)

Steps To Reproduce
Start cli_wallet with -H, then press Ctrl+D after a few seconds.
See #1784 (comment) for a shell script for testing.

Host Environment
Ubuntu 16.04, boost 1.58, openssl 1.0.2g.

Additional Context (optional)
Witness_node is likely impacted by the same issue.

CORE TEAM TASK LIST

  • Evaluate / Prioritize Bug Report
  • Refine User Stories / Requirements
  • Define Test Cases
  • Design / Develop Solution
  • Perform QA/Testing
  • Update Documentation
@abitmore abitmore added the 3d Bug Classification indicating the existing implementation does not match the intention of the design label Jun 4, 2019
@abitmore abitmore added this to the Future Feature Release milestone Jun 4, 2019
@abitmore abitmore self-assigned this Jun 4, 2019
@abitmore
Member Author

abitmore commented Jun 4, 2019

#1303 (comment) is related. Once this issue is fixed, I think we can remove the sleep added in #1626.

@OpenLedgerApp
Contributor

@abitmore, do you have any progress with this task?
If you don’t mind, we can take over this issue.

@abitmore
Member Author

abitmore commented Aug 5, 2019

I've made some progress in my local repository. Waiting for bitshares/bitshares-fc#134 to be merged to avoid conflicts.

@abitmore
Member Author

abitmore commented Jun 21, 2020

Updated OP to describe the issue (at least I think) more accurately.

Update: updated OP again, reverted the last update and added more info from #1782 (comment) to the OP.

@abitmore
Member Author

A simple shell script for testing:

#!/bin/sh
count=0
while :; do
  count=$(($count+1))
  echo count=$count
  (sleep 1; echo) | ./cli_wallet -s ws://127.0.0.1:8090/ -H 127.0.0.1:8099 || break
  echo
done
echo count=$count

@abitmore abitmore linked a pull request Jun 22, 2020 that will close this issue
@abitmore
Member Author

Should be fixed by #2204.
