3.17.0: Assertion `task && "When an ares socket is closed we should have a handle for it"' failed. #22841

Open
m4z opened this issue Aug 3, 2021 · 1 comment

m4z commented Aug 3, 2021

Description:

Our staging server (13 users, ~2 active) crashed for no apparent reason shortly after a user sync script ran. The script runs every 5 minutes, ran fine both before the crash and after the manual restart of RC, and changed/POSTed nothing in this run. After the manual restart, RC works fine again.

(The crash behaviour is similar to #22062, which I also reported, but the error message is different.)

Steps to reproduce:

  1. (Unclear)
  2. Server dies with the log entries posted below
  3. We had to manually restart RC.

Expected behavior:

Run, RC, keep on running! 😇 (Or at least attempt to restart, which could be achieved by systemd's Restart option. Since the systemd unit is created by rocketchatctl, I guess I'll file an issue there, too.)
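
A minimal sketch of what such a restart policy could look like, assuming a drop-in override for the rocketchat.service unit seen in the logs below. The drop-in path and the RestartSec value are assumptions for illustration, not something rocketchatctl is known to ship:

```ini
# Hypothetical drop-in: /etc/systemd/system/rocketchat.service.d/restart.conf
# (file name chosen for illustration; apply with `systemctl daemon-reload`)
[Service]
# Restart when the main process exits uncleanly, e.g. when it is
# killed by a signal such as SIGABRT (status=6/ABRT).
Restart=on-failure
# Wait a few seconds before restarting (value is an assumption).
RestartSec=5
```

With Restart=on-failure, a clean stop via `systemctl stop` would not trigger a restart, while an abort like the one reported here should.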

Actual behavior:

systemd service exited, see logs below.

Server Setup Information:

  • Version of Rocket.Chat Server: 3.17.0 (Commit Hash: d4ce52e, Commit Branch: HEAD)
  • Operating System: Debian GNU/Linux 9.13 (stretch)
  • Deployment Method: rocketchatctl
  • Number of Running Instances: 2 (one on this staging VM, one on a separate production VM)
  • DB Replicaset Oplog: the startup log says "Enabled"
  • NodeJS Version: v12.18.4 - x64
  • MongoDB Version: 4.0.19 (mmapv1)

Client Setup Information

  • Desktop App or Browser Version: n/a (no GUI client involved)
  • Operating System: n/a

Additional context

What's (maybe) special about our setup is that:

  • it's behind a corporate proxy (more precisely, two different proxies, one for HTTP and one for HTTPS) and not reachable from the outside world, which has caused and still causes a couple of problems (e.g. traefik can't request a Let's Encrypt certificate and creates an invalid "[0-9a-f.].traefik.default" cert)
  • the machine has both IPv6 and IPv4 addresses, not all of them connected to the Internet, which has caused (and might still cause) a couple of problems (see e.g. "Problems when using multiple network interfaces", install.sh#41)

Relevant logs:

(last log entry, 8 seconds before the error:)
rocketchat[1678]: API ➔ debug POST: /api/v1/logout

rocketchat[1678]: /usr/local/bin/node[1678]: ../src/cares_wrap.cc:361:void node::cares_wrap::{anonymous}::ares_sockstate_cb(void*, ares_socket_t, int, int): Assertion `task && "When an ares socket is closed we should have a handle for it"' failed.
rocketchat[1678]:  1: 0xa093f0 node::Abort() [/usr/local/bin/node]
rocketchat[1678]:  2: 0xa0946e  [/usr/local/bin/node]
rocketchat[1678]:  3: 0x98e81a  [/usr/local/bin/node]
rocketchat[1678]:  4: 0x185d901 ares__close_sockets [/usr/local/bin/node]
rocketchat[1678]:  5: 0x186649e  [/usr/local/bin/node]
rocketchat[1678]:  6: 0x18668df  [/usr/local/bin/node]
rocketchat[1678]:  7: 0x1866ad5  [/usr/local/bin/node]
rocketchat[1678]:  8: 0x1348228  [/usr/local/bin/node]
rocketchat[1678]:  9: 0x1335e4f uv_run [/usr/local/bin/node]
rocketchat[1678]: 10: 0xa4b665 node::NodeMainInstance::Run() [/usr/local/bin/node]
rocketchat[1678]: 11: 0x9da5a8 node::Start(int, char**) [/usr/local/bin/node]
rocketchat[1678]: 12: 0x7f6b0d8d92e1 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
rocketchat[1678]: 13: 0x979215  [/usr/local/bin/node]
traefik[869]: time="2021-08-03T09:18:11Z" level=error msg="vulcand/oxy/forward/websocket: Error when copying from backend to client: websocket: close 1006 (abnormal closure): unexpected EOF"
traefik[869]: time="2021-08-03T09:18:11Z" level=error msg="vulcand/oxy/forward/websocket: Error when copying from backend to client: websocket: close 1006 (abnormal closure): unexpected EOF"
systemd[1]: rocketchat.service: Main process exited, code=killed, status=6/ABRT
systemd[1]: rocketchat.service: Unit entered failed state.
systemd[1]: rocketchat.service: Failed with result 'signal'.

debdutdeb (Member) commented

Hi m4z

Hmm, thanks for reporting this. I see you have opened other issues in the past as well on the install.sh repo.

I'll take a look at all of them tomorrow. As for adding a restart policy to the systemd service, yes, makes sense. I'll see to it.

Thanks again :)
