New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests leak file descriptors #1380

Closed
marmarek opened this Issue Nov 5, 2015 · 10 comments

Comments

Projects
None yet
2 participants
@marmarek
Member

marmarek commented Nov 5, 2015

When running a lot of tests in one call it eventually leads to "Too many open files" error.
Some preliminary debugging with strace reveals that:

  1. Most of leaked FDs are libvirt connections (pipe, socket), but some of them are also QubesDB connections (socket)
  2. Libvirt connections are made only in two places: VMMConnection.init_vmm_connection, SystemTestsMixin.setUp
  3. All leaked libvirt connections are from the second place (SystemTestsMixin)
  4. Theoretically those are cleaned in SystemTestsMixin.tearDown, but they aren't - the function is called, but self.conn.close() does not close them

I guess it's because the connection is still referenced from a couple of QubesVm objects (vm.libvirt_domain), which aren't garbage collected because of some circular dependency (or some other reason). The same probably applies to QubesDB connections.

@marmarek marmarek added this to the Release 3.1 milestone Nov 5, 2015

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 5, 2015

Member

@woju any idea? This wouldn't be the first place where not properly releasing QubesVM objects have some downsides...

Member

marmarek commented Nov 5, 2015

@woju any idea? This wouldn't be the first place where not properly releasing QubesVM objects have some downsides...

@woju

This comment has been minimized.

Show comment
Hide comment
@woju

woju Nov 5, 2015

Member

On Wed, Nov 04, 2015 at 05:44:11PM -0800, Marek Marczykowski-Górecki wrote:

@woju any idea? This wouldn't be the first place where not properly
releasing QubesVM objects have some downsides...

If self.conn.close() won't really close them, that's bug in libvirt or
that python wrapper of theirs...

Member

woju commented Nov 5, 2015

On Wed, Nov 04, 2015 at 05:44:11PM -0800, Marek Marczykowski-Górecki wrote:

@woju any idea? This wouldn't be the first place where not properly
releasing QubesVM objects have some downsides...

If self.conn.close() won't really close them, that's bug in libvirt or
that python wrapper of theirs...

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 5, 2015

Member

self.conn.close() only decrease refcount, by design:
http://libvirt.org/html/libvirt-libvirt-host.html#virConnectClose

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Member

marmarek commented Nov 5, 2015

self.conn.close() only decrease refcount, by design:
http://libvirt.org/html/libvirt-libvirt-host.html#virConnectClose

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

@woju

This comment has been minimized.

Show comment
Hide comment
@woju

woju Nov 5, 2015

Member

On Thu, Nov 05, 2015 at 01:21:58PM -0800, Marek Marczykowski-Górecki wrote:

self.conn.close() only decrease refcount, by design:
http://libvirt.org/html/libvirt-libvirt-host.html#virConnectClose

Ah, if so, then judging by http://libvirt.org/git/?p=libvirt-python.git;a=blob;f=libvirt-override-virConnect.py#l11
the descriptor is freed when the virConnect is garbage collected.
See https://docs.python.org/2/reference/datamodel.html#object.__del__,
particulary the "Note:" there.

Member

woju commented Nov 5, 2015

On Thu, Nov 05, 2015 at 01:21:58PM -0800, Marek Marczykowski-Górecki wrote:

self.conn.close() only decrease refcount, by design:
http://libvirt.org/html/libvirt-libvirt-host.html#virConnectClose

Ah, if so, then judging by http://libvirt.org/git/?p=libvirt-python.git;a=blob;f=libvirt-override-virConnect.py#l11
the descriptor is freed when the virConnect is garbage collected.
See https://docs.python.org/2/reference/datamodel.html#object.__del__,
particulary the "Note:" there.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 5, 2015

Member

Great. So how can we fix that? Preferably in some generic way...
As a workaround, I see we may simply iterate over each QubesVM object
and release vm.libvirt_domain (del?). But this is neither elegant, nor
complete (probably will not handle qubes.xml reload, VM restarts and
such).

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Member

marmarek commented Nov 5, 2015

Great. So how can we fix that? Preferably in some generic way...
As a workaround, I see we may simply iterate over each QubesVM object
and release vm.libvirt_domain (del?). But this is neither elegant, nor
complete (probably will not handle qubes.xml reload, VM restarts and
such).

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

@woju

This comment has been minimized.

Show comment
Hide comment
@woju

woju Nov 6, 2015

Member

On Thu, Nov 05, 2015 at 03:42:04PM -0800, Marek Marczykowski-Górecki wrote:

Great. So how can we fix that? Preferably in some generic way...
As a workaround, I see we may simply iterate over each QubesVM object
and release vm.libvirt_domain (del?). But this is neither elegant, nor
complete (probably will not handle qubes.xml reload, VM restarts and
such).

We could make a wrapper around the object, which will be wrapper's
private attribute. No one else will have reference to it. The wrapper
will pass most of the calls untouched, but the close() call will be done
in such a way that will ensure freeing the descriptor.

Member

woju commented Nov 6, 2015

On Thu, Nov 05, 2015 at 03:42:04PM -0800, Marek Marczykowski-Górecki wrote:

Great. So how can we fix that? Preferably in some generic way...
As a workaround, I see we may simply iterate over each QubesVM object
and release vm.libvirt_domain (del?). But this is neither elegant, nor
complete (probably will not handle qubes.xml reload, VM restarts and
such).

We could make a wrapper around the object, which will be wrapper's
private attribute. No one else will have reference to it. The wrapper
will pass most of the calls untouched, but the close() call will be done
in such a way that will ensure freeing the descriptor.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 6, 2015

Member

Ok, a simple test reveals that libvirt_conn.close() doesn't work at all:

c = libvirt.open('xen:///')
c.close()
del c

And the socket is still open. I'm still not sure where the problem is

  • our usage, or libvirt pythin bindings. But I tend to think the later
    one.

Anyway, can we not open a new libvirt connection for every single test?
Instead, reuse vmm.libvirt_conn for that. Any drawbacks?

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Member

marmarek commented Nov 6, 2015

Ok, a simple test reveals that libvirt_conn.close() doesn't work at all:

c = libvirt.open('xen:///')
c.close()
del c

And the socket is still open. I'm still not sure where the problem is

  • our usage, or libvirt pythin bindings. But I tend to think the later
    one.

Anyway, can we not open a new libvirt connection for every single test?
Instead, reuse vmm.libvirt_conn for that. Any drawbacks?

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 6, 2015

Member

Found it. When some event loop implementation is registered (libvirt.virEventRegisterDefaultImpl), connections are closed as part of that event loop (libvirt.virEventRunDefaultImpl). We do not call it (when running tests).
Actually FD leak is not the only problem - the other is not sending keep-alives - which we had to disable on daemon side because of that.

Background

  1. We need that event loop for handling libvirt events, in Qubes Manager (through QubesWatch)
  2. libvirt.virEventRegisterDefaultImpl (or other libvirt.virEventRegisterImpl) must be called before opening the connection

For now (R3.1 or even backport for R3.0)

I see two options here:

  1. Always start a event loop, not only in Qubes Manager
  2. Try to register event loop only when needed (which probably would involve opening another connection after registering it)

I think I've already tried the second option some time ago (but I maybe not tried with the second connection). That would be better (performance wise) than starting event loop for every qvm-ls. On the other hand, qvm-backup could use keep-alives. Anyway I'll retry the second option and fallback to the first one.

For R4.0/R4.1 (qubesd)

It will not be needed anymore, since "client" tools will not speak to libvirt directly.

Member

marmarek commented Nov 6, 2015

Found it. When some event loop implementation is registered (libvirt.virEventRegisterDefaultImpl), connections are closed as part of that event loop (libvirt.virEventRunDefaultImpl). We do not call it (when running tests).
Actually FD leak is not the only problem - the other is not sending keep-alives - which we had to disable on daemon side because of that.

Background

  1. We need that event loop for handling libvirt events, in Qubes Manager (through QubesWatch)
  2. libvirt.virEventRegisterDefaultImpl (or other libvirt.virEventRegisterImpl) must be called before opening the connection

For now (R3.1 or even backport for R3.0)

I see two options here:

  1. Always start a event loop, not only in Qubes Manager
  2. Try to register event loop only when needed (which probably would involve opening another connection after registering it)

I think I've already tried the second option some time ago (but I maybe not tried with the second connection). That would be better (performance wise) than starting event loop for every qvm-ls. On the other hand, qvm-backup could use keep-alives. Anyway I'll retry the second option and fallback to the first one.

For R4.0/R4.1 (qubesd)

It will not be needed anymore, since "client" tools will not speak to libvirt directly.

@marmarek marmarek self-assigned this Nov 7, 2015

marmarek added a commit to QubesOS/qubes-core-admin that referenced this issue Nov 7, 2015

utils/QubesWatch: register libvirt event loop only when really launched
Registering event implementation in libvirt and then not calling it is
harmful, because libvirt expects it working. Known drawbacks:
- keep-alives are advertised as supported but not really sent (cause
  dropping connections)
- connections are not closed (sockets remains open, effectively leaking
  file descriptors)

So call libvirt.virEventRegisterDefaultImpl only when it will be really
used (libvirt.virEventRunDefaultImpl called), which means calling it in
QubesWatch. Registering events implementation have effect only on new
libvirt connections, so start a new one for QubesWatch.

Fixes QubesOS/qubes-issues#1380

marmarek added a commit to QubesOS/qubes-core-admin that referenced this issue Nov 15, 2015

core: ensure that QubesDB connections are closed when disposing a VM …
…collection

There are some circular dependencies (TemplateVM.appvms,
NetVM.connected_vms, and probably more), which prevents garbage
collector from cleaning them.

Fixes QubesOS/qubes-issues#1380

(cherry picked from commit d388838)

marmarek added a commit to QubesOS/qubes-core-admin that referenced this issue Nov 15, 2015

utils/QubesWatch: register libvirt event loop only when really launched
Registering event implementation in libvirt and then not calling it is
harmful, because libvirt expects it working. Known drawbacks:
- keep-alives are advertised as supported but not really sent (cause
  dropping connections)
- connections are not closed (sockets remains open, effectively leaking
  file descriptors)

So call libvirt.virEventRegisterDefaultImpl only when it will be really
used (libvirt.virEventRunDefaultImpl called), which means calling it in
QubesWatch. Registering events implementation have effect only on new
libvirt connections, so start a new one for QubesWatch.

Fixes QubesOS/qubes-issues#1380

(cherry picked from commit 0695e7b)
@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 15, 2015

Member

Automated announcement from builder-github

The package qubes-core-dom0-3.0.26-1.fc20 has been pushed to the r3.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

Member

marmarek commented Nov 15, 2015

Automated announcement from builder-github

The package qubes-core-dom0-3.0.26-1.fc20 has been pushed to the r3.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 29, 2015

Member

Automated announcement from builder-github

The package qubes-core-dom0-3.0.26-1.fc20 has been pushed to the r3.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

Member

marmarek commented Nov 29, 2015

Automated announcement from builder-github

The package qubes-core-dom0-3.0.26-1.fc20 has been pushed to the r3.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment