sys-usb cannot restart because of qrexec (or RAM fragmentation) #3384

Closed
GAhlekzis opened this Issue Dec 9, 2017 · 2 comments


Qubes OS version:

R3.2

Affected TemplateVMs:

fedora24, fedora-24-minimal, fedora-25-minimal
used/configured as sys-usb templates


Steps to reproduce the behavior:

Get sys-usb to crash hard (qrexec dies and its windows disappear).
Shut down sys-usb, then try to start it again.

Expected behavior:

sys-usb restarts without a hitch.

Actual behavior:

sys-usb tries to start (the indicator is yellow).
Shortly afterwards, sys-usb dies with the message "unable to start sys-usb: qrexec cannot be started".
sys-usb stays dead, and nothing except a reboot changes that.

General notes:

I often use Qubes to handle my external hard drives and USB sticks, because I frequently have to boot from USB sticks or move data around.
sys-usb dies far more often than I'd like to admit while I do this. Often it dies simply because I plug in a perfectly fine USB stick.
I also use an external keyboard and mouse, so each crash forces me to reboot and rebuild my whole working setup.


Related issues:

https://groups.google.com/d/msg/qubes-users/MQEkDjVswNE/HTJWRE8VBQAJ
This is a qubes-users mailing list thread about this issue from February 2016.
I think it's about time a solution to this problem was found.

An Idea:

Why not use a swap file in dom0 instead of real RAM for the PCI initialization?

svenssonaxel Dec 9, 2017

Not a solution to anything, but if you have several USB controllers, creating several sys-usb-* machines, each assigned just one controller, could limit the damage somewhat and at least let you keep using your keyboard.

marmarek Dec 10, 2017

Member

> Why not use a swap file in dom0 instead of real RAM for the PCI initialization?

Because PCI devices (through DMA) can access only RAM.

Generally, this issue is inherent to how Xen PV domains work, specifically to the fact that a PV domain sees real machine addresses and the IOMMU is used only to protect some areas, not to translate addresses. This means that fragmented physical memory (which will happen sooner or later after enough VM startups, regardless of dynamic memory management) can no longer be used for DMA by some devices.

Fixing this issue properly for PV domains requires a substantial change to Xen memory management, such as reserving a special memory pool solely for DMA buffers.
But this is far more work than we're going to put into this, especially in light of ditching PV in 4.0. For HVM domains, the IOMMU is also used for memory translation, so fragmentation is no longer an issue.
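
To illustrate the difference described above, here is a toy model in Python. It is only a sketch under loud assumptions: it is not how Xen's allocator works, the function names are made up for this example, and the 16-page buffer size is arbitrary. It shows why a fragmented machine can have plenty of free pages and still fail a DMA allocation when the device needs a machine-contiguous buffer, while the same request succeeds once an IOMMU can translate addresses.

```python
def largest_contiguous_run(free_map):
    """Return the length of the longest run of consecutive free pages."""
    best = current = 0
    for is_free in free_map:
        current = current + 1 if is_free else 0
        best = max(best, current)
    return best


def can_allocate_dma_untranslated(free_map, pages_needed):
    """PV-like case (assumption of this toy model): the device uses real
    machine addresses, so the buffer must be contiguous in machine memory."""
    return largest_contiguous_run(free_map) >= pages_needed


def can_allocate_dma_translated(free_map, pages_needed):
    """HVM-like case (assumption of this toy model): the IOMMU can map
    scattered pages into one contiguous range for the device, so only
    the number of free pages matters."""
    return sum(free_map) >= pages_needed


# Hypothetical machine with 1024 pages; True means the page is free.
fresh_boot = [True] * 1024
# After many VM startups and shutdowns: every other page is in use.
fragmented = [i % 2 == 0 for i in range(1024)]

PAGES_NEEDED = 16  # arbitrary buffer size chosen for the example

fresh_untranslated = can_allocate_dma_untranslated(fresh_boot, PAGES_NEEDED)
frag_untranslated = can_allocate_dma_untranslated(fragmented, PAGES_NEEDED)
frag_translated = can_allocate_dma_translated(fragmented, PAGES_NEEDED)

print("fresh boot, no translation:", fresh_untranslated)
print("fragmented, no translation:", frag_untranslated)
print("fragmented, with translation:", frag_translated)
```

In this model the fragmented machine still has 512 free pages, but the longest free run is a single page, so the untranslated request fails. That roughly matches the reported symptom: sys-usb starts fine after a reboot and refuses to start later.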

@marmarek marmarek closed this Dec 10, 2017

@marmarek marmarek added the wontfix label Dec 10, 2017
