network attach/detach problem with kernel 4.14.13 or above. #3657

Closed
Zrubi opened this issue Mar 5, 2018 · 20 comments
Labels
C: kernel P: critical Priority: critical. Between "major" and "blocker" in severity. r3.2-dom0-stable r4.0-dom0-stable T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@Zrubi
Member

Zrubi commented Mar 5, 2018

Qubes OS version:

R3.2
(and probably latest 4.0-rc)


Steps to reproduce the behavior:

Changing the NetVM of a ProxyVM takes longer than before, but succeeds,
at least according to Qubes Manager (and the qvm tools).
After the change, however, no traffic is visible on the virtual interfaces,
so networking no longer works.

There are no error messages related to this issue.
But if I try to detach this bugged ProxyVM from a connected AppVM
(that is, if I try to change the connected AppVM's NetVM), it fails
with the following error message:

Internal error: libxenlight failed to detach network device

From this point on, you cannot start any new apps in the bugged VM.
(Restarting the affected VMs helps, until you try to change the NetVM again.)

It seems that with kernel 4.14.12 in dom0 and in the VMs
everything works fine.

Any newer kernel causes several Xen-related issues, including the
problem described above.

@andrewdavidwong andrewdavidwong added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. C: kernel labels Mar 6, 2018
@andrewdavidwong andrewdavidwong added this to the Release 3.2 updates milestone Mar 6, 2018
@Zrubi
Member Author

Zrubi commented Mar 29, 2018

This issue also affects the just-released 4.0 (with kernel 4.14.18).
Edit: downgrading to 4.14.12 works on 4.0 too:
sudo qubes-dom0-update --action=downgrade kernel-latest-4.14.12 kernel-latest-qubes-vm-4.14.12

I do not believe this issue is model-specific, but my hardware is a Lenovo T450.

@sorandom

sorandom commented Apr 2, 2018

In qubes 4.0 (rc4 but fully upgraded, kernel 4.14.18-1) I am getting the same issue. VMs will exhibit this behaviour intermittently after a resume from suspend. Hardware is a Librem 13v2.

Network traffic will not pass through them. In this case, I had the particular VM acting as a VPN firewall and its xterm was usable but would not pass traffic. (appVMs -> appfw -> vpn -> fw (buggy) -> sys-net) Thus, I attempted to set my VPN vm's netvm property to "" so I could restart the firewall.

It manifests as "qvm-prefs: error: no such property: 'netvm'". Journald below:

<time> dom0 qubesd[17004]: unhandled exception while calling src=b'dom0' meth=b'admin.vm.property.Set' dest=b'sys-vpn' arg=b'netvm' len(untrusted_payload)=0
<time> dom0 qubesd[17004]: Traceback (most recent call last):
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/api/__init__.py", line 262, in respond
<time> dom0 qubesd[17004]:     untrusted_payload=untrusted_payload)
<time> dom0 qubesd[17004]:   File "/usr/lib64/python3.5/asyncio/futures.py", line 381, in __iter__
<time> dom0 qubesd[17004]:     yield self  # This tells Task to wait for completion.
<time> dom0 qubesd[17004]:   File "/usr/lib64/python3.5/asyncio/tasks.py", line 310, in _wakeup
<time> dom0 qubesd[17004]:     future.result()
<time> dom0 qubesd[17004]:   File "/usr/lib64/python3.5/asyncio/futures.py", line 294, in result
<time> dom0 qubesd[17004]:     raise self._exception
<time> dom0 qubesd[17004]:   File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step
<time> dom0 qubesd[17004]:     result = coro.send(None)
<time> dom0 qubesd[17004]:   File "/usr/lib64/python3.5/asyncio/coroutines.py", line 210, in coro
<time> dom0 qubesd[17004]:     res = func(*args, **kw)
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/api/admin.py", line 243, in vm_property_set
<time> dom0 qubesd[17004]:     untrusted_payload=untrusted_payload)
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/api/admin.py", line 263, in _property_set
<time> dom0 qubesd[17004]:     setattr(dest, self.arg, newvalue)
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/vm/__init__.py", line 569, in __set__
<time> dom0 qubesd[17004]:     super(VMProperty, self).__set__(instance, value)
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/__init__.py", line 260, in __set__
<time> dom0 qubesd[17004]:     name=self.__name__, newvalue=value, oldvalue=oldvalue)
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/events.py", line 198, in fire_event
<time> dom0 qubesd[17004]:     pre_event=pre_event)
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/events.py", line 166, in _fire_event
<time> dom0 qubesd[17004]:     effect = func(self, event, **kwargs)
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/vm/mix/net.py", line 418, in on_property_pre_set_netvm
<time> dom0 qubesd[17004]:     self.detach_network()
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/vm/mix/net.py", line 326, in detach_network
<time> dom0 qubesd[17004]:     vm=self))
<time> dom0 qubesd[17004]:   File "/usr/lib/python3.5/site-packages/qubes/app.py", line 94, in wrapper
<time> dom0 qubesd[17004]:     return attr(*args, **kwargs)
<time> dom0 qubesd[17004]:   File "/usr/lib64/python3.5/site-packages/libvirt.py", line 1170, in detachDevice
<time> dom0 qubesd[17004]:     if ret == -1: raise libvirtError ('virDomainDetachDevice() failed', dom=self)
<time> dom0 qubesd[17004]: libvirt.libvirtError: internal error: libxenlight failed to detach network device
<time> dom0 qrexec[14428]: qubes.NotifyUpdates: sys-net -> dom0: allowed to dom0
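When looking for this failure in a longer journal, it may help to pair each qubesd "unhandled exception" line with the libxenlight error that ends its traceback. The following is only a minimal sketch that assumes the log layout shown above; the function name `failed_detach_calls` is hypothetical and is not part of Qubes.

```python
import re

# Error text as it appears at the end of the traceback above.
DETACH_ERROR = "libxenlight failed to detach network device"

# Matches the call description in qubesd's "unhandled exception" line.
CALL = re.compile(
    r"meth=b'(?P<meth>[^']+)' dest=b'(?P<dest>[^']+)' arg=b'(?P<arg>[^']*)'"
)


def failed_detach_calls(journal_lines):
    """Return (method, dest, arg) for each qubesd call that ended in the detach error."""
    failures = []
    pending = None  # latest "unhandled exception" call not yet matched to an error
    for line in journal_lines:
        if "qubesd[" not in line:
            continue  # ignore other services such as qrexec
        call = CALL.search(line)
        if call and "unhandled exception" in line:
            pending = (call.group("meth"), call.group("dest"), call.group("arg"))
        elif DETACH_ERROR in line and pending is not None:
            failures.append(pending)
            pending = None
    return failures
```

Fed the journal excerpt above, this would report the `admin.vm.property.Set` call on `sys-vpn` for the `netvm` property.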

@sorandom

sorandom commented Apr 2, 2018

sudo qubes-dom0-update --action=downgrade kernel-latest-4.14.12 kernel-latest-qubes-vm-4.14.12

This did not work for me.

[user@dom0 ~]$ sudo qubes-dom0-update --action=downgrade kernel-latest-4.14.12 kernel-latest-qubes-vm-4.14.12
Using <vm> as UpdateVM to download updates for Dom0; this may take some time...
Last metadata expiration check: 0:00:52 ago on Mon Apr  2 11:00:45 2018.
Packages for argument kernel-latest-4.14.12 available, but not installed.
Packages for argument kernel-latest-qubes-vm-4.14.12 available, but not installed.
Error: No packages marked for downgrade.

I can get it to download but not install:

[user@dom0 ~]$ sudo qubes-dom0-update kernel-latest-4.14.12 kernel-latest-qubes-vm-4.14.12
Using <vm> as UpdateVM to download updates for Dom0; this may take some time...
Fedora 25 - x86_64 - Updates                    1.3 MB/s |  24 MB     00:19
Fedora 25 - x86_64                              1.8 MB/s |  50 MB     00:28
Qubes Dom0 Repository (updates)                 513 kB/s | 3.0 MB     00:06
Qubes Templates repository                       19 kB/s | 8.3 kB     00:00
Last metadata expiration check: 0:00:00 ago on Mon Apr  2 11:55:59 2018.
Dependencies resolved.
================================================================================
 Package             Arch   Version                    Repository          Size
================================================================================
Installing:
 kernel-latest       x86_64 1000:4.14.12-1.pvops.qubes qubes-dom0-current  46 M
 kernel-latest-qubes-vm
                     x86_64 1000:4.14.12-1.pvops.qubes qubes-dom0-current  63 M

Transaction Summary
================================================================================
Install  2 Packages

Total download size: 108 M
Installed size: 108 M
DNF will only download packages for the transaction.
Downloading Packages:
(1/2): kernel-latest-4.14.12-1.pvops.qubes.x86_ 881 kB/s |  46 MB     00:53
(2/2): kernel-latest-qubes-vm-4.14.12-1.pvops.q 753 kB/s |  63 MB     01:25
--------------------------------------------------------------------------------
Total                                           1.3 MB/s | 108 MB     01:25
Complete!
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Qubes OS Repository for Dom0                              81 MB/s | 209 kB     00:00
No package kernel-latest-4.14.12 available.
No package kernel-latest-qubes-vm-4.14.12 available.
Error: Unable to find a match.

@Zrubi
Member Author

Zrubi commented Apr 3, 2018

I do not know what is the real problem here, but as a workaround, you can try:
sudo qubes-dom0-update kernel-latest kernel-latest-qubes-vm

Then you can downgrade to the right one:
sudo qubes-dom0-update --action=downgrade kernel-latest-4.14.12 kernel-latest-qubes-vm-4.14.12

@sorandom

sorandom commented Apr 9, 2018

I was able to downgrade the VM kernel to 4.14.12, and now sys-net reliably freezes after every suspend, it's not intermittent as before. However, network attach/detach failure does not seem to be quite as big of an issue anymore, but still does happen. I'll keep testing.

@marmarek
Member

With a fully updated 4.0 (as of today) and a 4.14.34 kernel in the VM, this still doesn't work. The minimal test case I have is simple: detach the network from a VM (qvm-prefs testvm netvm '').
This supposedly completes successfully, but in fact only the backend side is cleaned up; on the frontend side there is still a (now non-working) eth0. Additionally, the xenstored process uses 100% CPU for some time (a minute or so).
Using sysrq+t I found the xenwatch kernel thread waiting on something during device removal:

[  168.361621] xenbus          S    0    21      2 0x80000000
[  168.361630] Call Trace:
[  168.361636]  ? __schedule+0x3df/0x880
[  168.361642]  schedule+0x32/0x80
[  168.361653]  xenbus_thread+0x5d5/0x9b0
[  168.361661]  ? __wake_up_common+0x96/0x180
[  168.361668]  ? remove_wait_queue+0x60/0x60
[  168.361675]  kthread+0xff/0x130
[  168.361686]  ? xb_read+0x1b0/0x1b0
[  168.361694]  ? kthread_create_on_node+0x70/0x70
[  168.361703]  ret_from_fork+0x35/0x40
[  168.361714] xenwatch        D    0    22      2 0x80000000
[  168.361722] Call Trace:
[  168.361727]  ? __schedule+0x3df/0x880
[  168.361735]  schedule+0x32/0x80
[  168.361743]  xennet_remove+0x7b/0x1b0 [xen_netfront]
[  168.361756]  ? remove_wait_queue+0x60/0x60
[  168.361763]  xenbus_dev_remove+0x4f/0xa0
[  168.361771]  device_release_driver_internal+0x157/0x220
[  168.361782]  bus_remove_device+0xe5/0x150
[  168.361793]  device_del+0x1cf/0x300
[  168.361802]  ? xenbus_dev_remove+0xa0/0xa0
[  168.361811]  device_unregister+0x16/0x60
[  168.361821]  xenbus_dev_changed+0x9f/0x1d0
[  168.361829]  ? finish_wait+0x3c/0x80
[  168.361838]  xenwatch_thread+0xcf/0x170
[  168.362019]  ? remove_wait_queue+0x60/0x60
[  168.362026]  kthread+0xff/0x130
[  168.362032]  ? find_watch+0x40/0x40
[  168.362039]  ? kthread_create_on_node+0x70/0x70
[  168.362047]  ret_from_fork+0x35/0x40
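To pick the stuck thread out of a full sysrq+t dump, one option is to keep only tasks in state D (uninterruptible sleep) and list their reliable stack frames. This is a rough sketch that assumes the dmesg layout shown above; the function name `uninterruptible_tasks` is made up for illustration, and frames prefixed with "?" are skipped because the kernel marks them as unreliable.

```python
import re

# Task header, e.g. "[  168.361714] xenwatch        D    0    22      2 0x80000000"
HEADER = re.compile(
    r"^\[\s*[\d.]+\]\s+(\S+)\s+([A-Z])\s+\d+\s+\d+\s+\d+\s+0x[0-9a-f]+\s*$"
)
# Stack frame, e.g. "[  168.361743]  xennet_remove+0x7b/0x1b0 [xen_netfront]"
FRAME = re.compile(r"^\[\s*[\d.]+\]\s+(\?\s+)?(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def uninterruptible_tasks(dump):
    """Map each task in state D to the list of its reliable stack frames."""
    tasks = {}
    current = None  # frame list of the D-state task being read, if any
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            name, state = header.groups()
            current = tasks.setdefault(name, []) if state == "D" else None
            continue
        frame = FRAME.match(line)
        if frame and current is not None and frame.group(1) is None:
            current.append(frame.group(2))
    return tasks
```

On the trace above this would return only `xenwatch`, with `xennet_remove` directly below `schedule`, which matches the observation that the frontend removal is what blocks.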

@HW42

HW42 commented Apr 19, 2018

@marmarek: Can you confirm that this does not happen with 4.16.2? (At least for me that's the case).

@HW42

HW42 commented Apr 19, 2018

Ah, it seems this commit missed stable.

@marmarek
Member

@marmarek: Can you confirm that this does not happen with 4.16.2? (At least for me that's the case).

Yes, on 4.16.2 it works correctly.

HW42 added a commit to HW42/qubes-linux-kernel that referenced this issue Apr 19, 2018
@qubesos-bot

Automated announcement from builder-github

The package kernel-4.14.35-1.pvops.qubes has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package kernel-4.14.35-1.pvops.qubes has been pushed to the r3.2 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package kernel-4.14.35-1.pvops.qubes has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package kernel-4.14.57-1.pvops.qubes has been pushed to the r3.2 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@andrewdavidwong andrewdavidwong added the P: critical Priority: critical. Between "major" and "blocker" in severity. label Aug 3, 2018
@andrewdavidwong
Member

Still a problem on R3.2 stable with kernel-4.14.57-1.pvops.qubes.

@francuss

I do not know what is the real problem here, but as a workaround, you can try:
sudo qubes-dom0-update kernel-latest kernel-latest-qubes-vm

Then you can downgrade to the right one:
sudo qubes-dom0-update --action=downgrade kernel-latest-4.14.12 kernel-latest-qubes-vm-4.14.12

I tried this but it is worse than before, is there a way to go back to the state it was before trying this?

@marmarek
Member

There were two more related fixes in the kernel, the last one in 4.14.72. Can anybody check whether this still happens with kernel-qubes-vm-4.14.74 (the one in current-testing)?

@marmarek
Member

I tried this but it is worse than before, is there a way to go back to the state it was before trying this?

You can set default kernel in global settings.

@francuss

francuss commented Oct 13, 2018 via email

@marmarek
Member

You can set default kernel in global settings. —
This was 3.2, not 4

Still, in qubes manager -> system -> global settings.

@esote

esote commented Nov 23, 2018

@marmarek @andrewdavidwong The commit @HW42 referenced is present in 4.14.74, so I think it's safe to close this issue unless anyone else verifies that they still experience it.
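For anyone checking whether a given kernel falls in the range discussed in this thread, the reports can be condensed into a small version check. This is only a sketch: the bounds are assumptions drawn from the comments above (4.14.12 reported working, 4.14.13 and later reported broken, the last related fix reported in 4.14.72), and the helper names are hypothetical.

```python
import re

# Bounds taken from reports in this thread; treat them as assumptions.
FIRST_BAD = (4, 14, 13)    # 4.14.12 was reported as working
FIRST_FIXED = (4, 14, 72)  # last related fix was reported for 4.14.72


def parse_kernel_version(release):
    """Extract (major, minor, patch) from e.g. '4.14.18-1.pvops.qubes.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def likely_affected(release):
    """Return True if the release lies in the range reported as broken here."""
    return FIRST_BAD <= parse_kernel_version(release) < FIRST_FIXED
```

A release string such as the output of `uname -r` can be passed in directly. The check says nothing about kernels nobody in the thread tested, so a False result is not a guarantee.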
