
Failure to attach vif to netvm #199

Closed
marmarek opened this Issue Mar 8, 2015 · 8 comments


marmarek commented Mar 8, 2015

Reported by rafal on 6 Apr 2011 10:25 UTC
After many domain create/destroy cycles, Xen is unable to network-attach a vif to netvm. The netvm logs show:
~# udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[add /devices/xen-backend/vif-76-0 (xen-backend)
Apr 6 06:08:00 localhost kernel: [ 2516.786792] vif vif-76-0: 2 writing feature-sg
Apr 6 06:08:00 localhost kernel: [ 2516.787018] vif vif-76-0: xenbus: failed to write error node for backend/vif/76/0 (2 writing feature-sg)
Apr 6 06:08:00 localhost kernel: [ 2516.787532] vif vif-76-0: 2 xenbus_dev_probe on backend/vif/76/0
Apr 6 06:08:00 localhost kernel: [ 2516.787719] vif vif-76-0: xenbus: failed to write error node for backend/vif/76/0 (2 xenbus_dev_probe on backend/vif/76/0)
UDEV [1302084480.848507] add /devices/xen-backend/vif-76-0 (xen-backend)

The hotplug script is not called, and the vif76.0 device is not present.
There is nothing in the Xen logs, nor in the dom0 logs.

Migrated-From: https://wiki.qubes-os.org/ticket/199
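A small sketch for spotting these failures in a backend VM's kernel log. It only assumes the message wording shown in the log above (other kernel versions may phrase it differently); `find_xenbus_errors` is a hypothetical helper, not part of Xen or Qubes. It prints the xenstore path and the errno (on Linux, errno 2 is ENOENT).

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<xenstore path> errno=<n>" for every xenbus "failed to write error node"
# message. The pattern matches the wording seen in the log above.
find_xenbus_errors() {
    sed -n 's/.*xenbus: failed to write error node for \([^ ]*\) (\([0-9][0-9]*\).*/\1 errno=\2/p'
}

# Example (inside the backend VM):
#   find_xenbus_errors < /var/log/messages
```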

marmarek commented Mar 8, 2015

Comment by rafal on 6 Apr 2011 11:17 UTC
Error 2 is ENOENT. However, if I

  1. pause netvm1 (it has xid 1),
  2. manually run xm network-attach test7 backend=netvm1, then /local/domain/1/backend/vif/XID/0 is present, along with its keys;
  3. unpause netvm1, the same error appears.
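The manual check above could be scripted roughly as follows. This is a sketch assuming the Xen 4.0-era `xm` toolstack used in the comment; `run` and `check_attach_while_paused` are hypothetical helpers, and setting `DRY_RUN=1` makes them print the commands instead of executing them.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the three manual steps from the comment:
#   $1 = backend (net) VM name, $2 = its xid, $3 = frontend VM name
check_attach_while_paused() {
    netvm=$1 netvm_xid=$2 client=$3
    run xm pause "$netvm"                               # step 1
    run xm network-attach "$client" backend="$netvm"    # step 2
    # While the backend is paused, its vif directory should be populated.
    run xenstore-ls "/local/domain/$netvm_xid/backend/vif"
    run xm unpause "$netvm"                             # step 3
}

# Example: DRY_RUN=1 check_attach_while_paused netvm1 1 test7
```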

marmarek commented Mar 8, 2015


Modified by rafal on 6 Apr 2011 11:54 UTC


marmarek commented Mar 8, 2015


Modified by joanna on 17 Apr 2011 16:09 UTC


marmarek self-assigned this Mar 8, 2015

marmarek commented Mar 8, 2015

Comment by joanna on 28 May 2011 09:02 UTC
This will likely be gone in Xen 4.1, which we now use in Beta 2. So I'm closing this now; if somebody encounters it on Beta 2, it should be reopened.


marmarek added the notanissue label Mar 8, 2015

marmarek closed this Mar 8, 2015

marmarek commented Mar 8, 2015

Comment by rafal on 20 Jul 2011 15:15 UTC
The issue is still present in beta2.
This time, there is a warning_slowpath message in /var/log/messages in firewallvm, followed by

vif vif-68-0: xenbus: failed to write error node for backend/vif/68/0 (2)

To reproduce, it is enough to start and destroy a domain in a loop, e.g.:

while qvm-run -a personal --pass_io 'echo alive' | grep -q alive ; do echo still alive; qvm-kill personal; done

After about 60 iterations, firewallvm is unable to attach a device, and the VM takes 300 s to boot due to xenbus warnings.
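To measure how many cycles succeed before attaching fails, the loop above can be given an upper bound and a counter. A sketch: `start_and_check`, `kill_vm` and `count_cycles` are hypothetical names; the first two only wrap the `qvm-run`/`qvm-kill` calls from the loop, so they can be swapped for stubs when testing.

```shell
#!/bin/sh
# Start the VM if needed and check that it answers (same as the loop above).
start_and_check() {
    qvm-run -a "$1" --pass_io 'echo alive' | grep -q alive
}

# Destroy the VM so the next iteration creates a fresh domain.
kill_vm() {
    qvm-kill "$1"
}

# Print the number of successful start/kill cycles, stopping at the
# first failure or after $2 cycles, whichever comes first.
count_cycles() {
    vm=$1 max=$2 n=0
    while [ "$n" -lt "$max" ] && start_and_check "$vm"; do
        kill_vm "$vm"
        n=$((n + 1))
    done
    echo "$n"
}

# Example (in dom0): count_cycles personal 200
```

Like the original loop, a failed check leaves the VM running, so its state can be inspected afterwards.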

The problem seems to be caused by two factors:

  1. there is a limit on the number of xenstore keys a domain can create;
  2. if the backend is not dom0, it has no privilege to remove keys such as backend/vif/client-xid
    upon device detach (e.g. upon domain termination).

This issue is likely to affect all non-dom0 backends, not only vifs.
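The interplay of the two factors can be illustrated with a toy model (not xenstored's actual accounting): if every detach leaves some keys the backend cannot delete, and the backend may own only a fixed number of keys, attaching must eventually fail. `cycles_until_quota`, the quota and the per-cycle leak are all hypothetical parameters.

```shell
#!/bin/sh
# Toy model: $1 = key quota of the backend domain (factor 1),
# $2 = keys it owns at start, $3 = keys leaked per create/destroy cycle
# because the backend cannot remove them (factor 2).
# Prints how many full cycles fit before the quota is exceeded.
cycles_until_quota() {
    quota=$1 used=$2 leak=$3
    if [ "$leak" -le 0 ]; then
        echo unbounded          # nothing leaks, so the quota is never hit
    elif [ "$used" -ge "$quota" ]; then
        echo 0
    else
        echo $(( (quota - used) / leak ))
    fi
}
```

With no leak the model never runs out, which is consistent with the comment's choice to fix the key permissions rather than the quota.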

The solution is to run
xenstore-chmod /local/domain/$backend_xid/backend/vif/$client_xid w$backend_xid
when creating the key. The key is created by xl, so the proper place for the patch is libxl. Reassigning to Marek, who already knows libxl :) If possible, do it generically for all backends, not only for vifs.
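A generic version of the proposed command might look like the sketch below. `backend_dir_chmod` is a hypothetical helper that only prints the `xenstore-chmod` command for a backend xid, a device class (e.g. `vif`, `vbd`) and a frontend xid, so it can be reviewed before running it in dom0; the real fix still belongs in libxl. In a xenstore permission list the first entry names the owner and sets the access for unlisted domains. This sketch uses `n` (no access) plus read access for the frontend instead of the `w` in the comment, assuming other domains should not be able to write there.

```shell
#!/bin/sh
# Hypothetical helper: print the command that makes the backend domain the
# owner of backend/<class>/<frontend-xid> in its own xenstore tree, so that
# (as the comment proposes) the backend can remove it on detach.
#   $1 = backend xid, $2 = device class (vif, vbd, ...), $3 = frontend xid
# Permission list: first entry = owner + default for unlisted domains
# (n = none), further entries = per-domain access (r = read).
backend_dir_chmod() {
    printf 'xenstore-chmod /local/domain/%s/backend/%s/%s n%s r%s\n' \
        "$1" "$2" "$3" "$1" "$3"
}

# Example (dom0): backend_dir_chmod 1 vif 76 | sh
```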


marmarek removed the notanissue label Mar 8, 2015

marmarek reopened this Mar 8, 2015


marmarek closed this Mar 8, 2015

whohoho commented Apr 24, 2018

Do you have a copy of the commit that fixes this issue? git.qubes-os.org is not there anymore.

Thanks
