Make libvirt notice network device disconnection when backend domain is shut down #1426

Closed
marmarek opened this issue Nov 14, 2015 · 3 comments
Assignees
Labels
C: Xen P: major Priority: major. Between "default" and "critical" in severity. r4.0-dom0-stable release notes This issue should be mentioned in the release notes. T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality.

Comments

@marmarek
Member

The Xen toolstack used in R3.0 (libxl + libvirt) does not have any device monitoring mechanism. If a device is detached by the VM itself (for any reason) or by some external tool (like xl), the toolstack will still think the device is present and connected (and because of that, some settings of the Xen network frontend and backend will not be cleaned up). The only way to tell the toolstack that a device is no longer connected is to detach it from the VM (an action initiated by the user, through Qubes Manager or qvm-prefs, not by the backend driver in response to the domain shutting down). But if the device is no longer there, such detach action would fail. More importantly, trying to attach it again (after starting the backend domain again) would also fail, because libvirt still thinks the device is already there (since the detach action failed).

This is the main reason why NetVM restart isn't working.
This is a similar issue to #1082 (which is about block devices).
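
To make the failure mode above concrete, here is a minimal toy model of it. This is only a sketch: `ToyToolstack`, `backend_shutdown` and the device name `vif-work` are hypothetical and do not correspond to the real libxl or libvirt API. The assumption it encodes is the one described above: the toolstack keeps its own record of attached devices and nothing updates that record when the device disappears behind its back.

```python
class DeviceError(Exception):
    """Raised when the toy toolstack refuses an operation."""


class ToyToolstack:
    """Hypothetical stand-in for libxl + libvirt; not the real API."""

    def __init__(self):
        self.recorded = set()  # devices the toolstack believes are attached
        self.actual = set()    # devices that really exist

    def attach(self, dev):
        if dev in self.recorded:
            raise DeviceError(f"{dev} is already attached (per toolstack record)")
        self.recorded.add(dev)
        self.actual.add(dev)

    def detach(self, dev):
        if dev not in self.actual:
            # The record is left untouched here, which is the core problem.
            raise DeviceError(f"{dev} not found")
        self.actual.discard(dev)
        self.recorded.discard(dev)

    def backend_shutdown(self, dev):
        # The device goes away without the toolstack being told.
        self.actual.discard(dev)


def try_call(func, *args):
    """Run func and report success or the failure message as a string."""
    try:
        func(*args)
        return "ok"
    except DeviceError as exc:
        return f"failed: {exc}"


ts = ToyToolstack()
ts.attach("vif-work")
ts.backend_shutdown("vif-work")  # e.g. the NetVM is shut down

detach_result = try_call(ts.detach, "vif-work")
attach_result = try_call(ts.attach, "vif-work")
print(detach_result)
print(attach_result)
```

In this model both calls fail after the backend goes away: the detach because the device no longer exists, and the re-attach because the stale record still lists it.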

@marmarek marmarek added T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality. C: Xen P: major Priority: major. Between "default" and "critical" in severity. release notes This issue should be mentioned in the release notes. labels Nov 14, 2015
@marmarek marmarek added this to the Release 3.2 milestone Nov 14, 2015
marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Nov 15, 2015
This is a workaround for missing libxl/libvirt functionality: QubesOS/qubes-issues#1426

Also it should improve system shutdown time, as this is the situation
where all the VMs are shutting down simultaneously.

Fixes QubesOS/qubes-issues#1425
marmarek added a commit to QubesOS/qubes-core-admin that referenced this issue Nov 15, 2015
This is a workaround for missing libxl/libvirt functionality: QubesOS/qubes-issues#1426

Also it should improve system shutdown time, as this is the situation
where all the VMs are shutting down simultaneously.

Fixes QubesOS/qubes-issues#1425

(cherry picked from commit 7359e39)
andrewdavidwong added a commit that referenced this issue May 31, 2016
@marmarek marmarek modified the milestones: Release 3.2, Far in the future Jun 17, 2016
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jun 6, 2017
With libvirt in place, this isn't enough - libvirt also keeps the VM
configuration in its memory and adjusting xenstore doesn't change that.
In fact, changing xenstore behind its back makes it even worse in some
situations.

QubesOS/qubes-issues#1426
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Dec 14, 2017
There may be cases when the VM providing the network to other VMs is started
later - for example on VM restart. While this is a rare case (and currently
broken because of QubesOS/qubes-issues#1426), do not assume it will
always be the case.
@marmarek marmarek modified the milestones: Far in the future, Release 4.0 updates Feb 23, 2019
@marmarek marmarek self-assigned this Feb 23, 2019
@marmarek
Member Author

Since we have a reliable(*) domain-shutdown event in qubesd, I'm going to emulate this missing libvirt feature in qubesd by detaching network interfaces at (backend) domain shutdown/kill/crash.

(*) Even if libvirt fails to report the shutdown event, it will be triggered just before the subsequent domain start.

> But if the device is no longer there, such detach action would fail.

This part was fixed as part of #3163.
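
A rough sketch of the emulation described in this comment, written as a toy model rather than actual qubesd code. All names here (`ToyVM`, `ToyDaemon`, `on_domain_shutdown`, `crash_unreported`, the `shutdown_handled` flag) are hypothetical. It assumes two things taken from the comment: the shutdown handler detaches the network interfaces of VMs connected to the backend, and a shutdown event that was never reported is delivered just before the next start of that domain.

```python
class ToyVM:
    """Hypothetical VM record; real qubesd objects are much richer."""

    def __init__(self, name, netvm=None):
        self.name = name
        self.netvm = netvm            # backend VM providing the network
        self.running = False
        self.net_attached = False     # what the toolstack believes
        self.shutdown_handled = True  # nothing outstanding before first start


class ToyDaemon:
    """Hypothetical event dispatcher standing in for qubesd."""

    def __init__(self):
        self.vms = {}
        self.log = []

    def add(self, vm):
        self.vms[vm.name] = vm
        return vm

    def on_domain_shutdown(self, vm):
        # Emulate the missing toolstack feature: detach every frontend
        # interface that was connected to the backend that just went away.
        vm.running = False
        vm.shutdown_handled = True
        for other in self.vms.values():
            if other.netvm is vm and other.net_attached:
                other.net_attached = False
                self.log.append(f"detach {other.name} from {vm.name}")

    def crash_unreported(self, vm):
        # The domain dies, but no shutdown event is ever reported.
        vm.running = False

    def start(self, vm):
        # Fallback: deliver a missed shutdown event just before the start.
        if not vm.shutdown_handled:
            self.on_domain_shutdown(vm)
        vm.running = True
        vm.shutdown_handled = False
        # Connect running frontends to the freshly started backend.
        for other in self.vms.values():
            if other.netvm is vm and other.running and not other.net_attached:
                other.net_attached = True
                self.log.append(f"attach {other.name} to {vm.name}")
        # Connect the started VM to its own backend, if that one is running.
        if vm.netvm is not None and vm.netvm.running and not vm.net_attached:
            vm.net_attached = True
            self.log.append(f"attach {vm.name} to {vm.netvm.name}")


daemon = ToyDaemon()
net = daemon.add(ToyVM("sys-net"))
work = daemon.add(ToyVM("work", netvm=net))

daemon.start(net)
daemon.start(work)
daemon.on_domain_shutdown(net)   # reported shutdown: detach happens at once
attached_after_shutdown = work.net_attached
daemon.start(net)                # restart re-attaches the frontend
daemon.crash_unreported(net)     # event lost
daemon.start(net)                # missed event is handled before the start
print("\n".join(daemon.log))
```

The point of the sketch is the ordering: in the normal case the detach happens at shutdown time, so the later attach starts from a clean record, and the detach-just-before-attach path is only a fallback for a lost event.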

marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Feb 24, 2019
Since we have more reliable domain-shutdown event delivery (it is
guaranteed to be delivered before the subsequent domain start, even if
libvirt fails to report it), it's better to move the detach_network call to
the domain-shutdown handler. This way, the frontend domain will see immediately
that the backend is gone. Technically it already knows that, but at least
Linux does not propagate that anywhere, keeping the interface up and
seemingly operational, leading to various timeouts.
Additionally, by avoiding an attach_network call _just_ after a
detach_network call, it avoids various race conditions (like calling
cleanup scripts after the new device has already been connected).

While libvirt itself still doesn't clean up devices when the backend
domain is gone, this will emulate it within qubesd.

Fixes QubesOS/qubes-issues#3642
Fixes QubesOS/qubes-issues#1426
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Feb 25, 2019
Since we have more reliable domain-shutdown event delivery (it is
guaranteed to be delivered before the subsequent domain start, even if
libvirt fails to report it), it's better to move the detach_network call to
the domain-shutdown handler. This way, the frontend domain will see immediately
that the backend is gone. Technically it already knows that, but at least
Linux does not propagate that anywhere, keeping the interface up and
seemingly operational, leading to various timeouts.
Additionally, by avoiding an attach_network call _just_ after a
detach_network call, it avoids various race conditions (like calling
cleanup scripts after the new device has already been connected).

While libvirt itself still doesn't clean up devices when the backend
domain is gone, this will emulate it within qubesd.

Fixes QubesOS/qubes-issues#3642
Fixes QubesOS/qubes-issues#1426
@qubesos-bot

Automated announcement from builder-github

The package qubes-core-dom0-4.0.41-1.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package qubes-core-dom0-4.0.41-1.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update
