MicroOVN built with latest Dqlite crashes on cluster formation. #684

Closed
mkalcok opened this issue Aug 15, 2024 · 5 comments

mkalcok commented Aug 15, 2024

Today we noticed that when MicroOVN is built with dqlite from the master branch, it crashes when the third member joins the cluster, with the following message in the microovn.daemon service logs:

Aug 15 09:31:19 microovn-basic-cluster-3 microovn.daemon[2258]: time="2024-08-15T09:31:19Z" level=info msg=" - binding https socket" network="10.112.216.95:6443"
Aug 15 09:31:19 microovn-basic-cluster-3 microovn.daemon[2258]: microovnd: ./src/unix/core.c:302: uv__finish_close: Assertion `handle->flags & UV_HANDLE_CLOSING' failed.
Aug 15 09:31:19 microovn-basic-cluster-3 systemd[1]: snap.microovn.daemon.service: Main process exited, code=dumped, status=6/ABRT

This problem did not occur before today.

Steps to reproduce:

  • Get MicroOVN code that uses Dqlite from master (rev. e58d1e0)
  • Build and install MicroOVN on three hosts.
    • LXD containers can be used for running MicroOVN; they just need the openvswitch kernel module. You can launch them with lxc launch ubuntu:noble <container_name> --config linux.kernel_modules=openvswitch

Bootstrap the cluster and add new members

# On host 1
microovn cluster bootstrap
microovn cluster add <host2-hostname>
microovn cluster add <host3-hostname>

Join the cluster with 2nd member

# On host 2
microovn cluster join <token>

Join the cluster with 3rd member

# On host 3
microovn cluster join <token>
# Following message will be displayed:
# Error: Post "http://control.socket/core/control": EOF

Observe MicroOVN logs on the 3rd node

# On host 3
journalctl -xfeu snap.microovn.daemon

The service on the 3rd host will be in a restart loop, but the first start attempt will show the error quoted above.
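
Because the service restarts in a loop, the assertion can scroll past quickly in a followed log. A minimal sketch of a filter that checks for it is below; the function name has_uv_close_assertion is hypothetical, and the search string is taken from the log excerpt in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if the libuv assertion
# from this report appears anywhere in the text read from stdin.
has_uv_close_assertion() {
    # -F: match the string literally, -q: print nothing, only set the status
    grep -qF "uv__finish_close: Assertion"
}

# Example usage on host 3 (assumes the snap service name used above):
# journalctl --no-pager -u snap.microovn.daemon | has_uv_close_assertion \
#     && echo "crash reproduced"
```

Piping the full journal rather than following it means the check also catches the first start attempt, which is the one that shows the assertion.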

Workaround:
Pin the dqlite dependency

@cole-miller
Contributor

Thanks, seeing this in the musl CI runs for #682 as well. I'm investigating.

@cole-miller
Contributor

This is almost certainly a regression from #681. I'll prepare a revert.

@cole-miller
Contributor

cole-miller commented Aug 16, 2024

This crash reproduces quite consistently in the CI runs for #682 when running ./unit-test conn/exec/close_while_in_flight (but not locally, frustratingly, even using the identical Docker image). It seems to be an issue with the management of the new thread pool's uv_async handle, maybe a use-after-free (since the affected handle looks corrupted when I peek at it in GDB). It might be related to the test hang that we've been seeing occasionally. But I don't think the thread pool specifically can be what was affecting MicroOVN, because MicroOVN is not using the new thread pool at all; it's ifdef-ed out in the default configuration. So I think we have two bugs that present in superficially the same way: one introduced by #681 and one that concerns the thread pool.
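
For anyone trying to catch the crash outside CI, rerunning the test until it fails is one option. The following is only a sketch: repeat_until_fail is a hypothetical helper, not part of the dqlite test suite, and the run count in the usage comment is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, stop at the first
# failure, and return that failure's exit status (0 if every run passed).
repeat_until_fail() {
    local max_runs=$1
    shift
    local i status
    for ((i = 1; i <= max_runs; i++)); do
        "$@" || {
            status=$?
            echo "run $i of $max_runs failed with status $status" >&2
            return "$status"
        }
    done
    return 0
}

# Example usage, from the directory containing the unit-test binary:
# repeat_until_fail 100 ./unit-test conn/exec/close_while_in_flight
```

A process killed by a failed assertion typically reports a non-zero status to the shell, so the loop stops on the first aborted run and leaves its output on screen for inspection.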

@cole-miller
Contributor

#686 should fix the thread pool side of this; the root cause of the MicroOVN crash remains to be determined.

@cole-miller
Contributor

Fixed by #687
