VM instance of QubesDB (sometimes) crashes on DispVM restore #1389
Comments
marmarek added this to the Release 3.1 milestone on Nov 7, 2015
marmarek added the bug, C: core, P: minor labels on Nov 7, 2015
added a commit to marmarek/old-qubes-gui-agent-linux that referenced this issue on Nov 8, 2015
added a commit to QubesOS/qubes-gui-agent-linux that referenced this issue on Nov 13, 2015
marmarek (Member) commented Nov 26, 2015
The same applies to the qrexec-agent data connection, also on DispVM restore. So it looks like a vchan problem.
marmarek added the C: xen label on Nov 26, 2015
marmarek (Member) commented Dec 13, 2015
It may cause the DispVM IP address to be wrong, because the /usr/lib/qubes/setup-ip script can't get the right one from QubesDB.
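The failure mode here is a script reading a key from a database whose daemon may have crashed or not be up yet. A minimal sketch of a defensive read with retries (the `read_key` stub and the `/qubes-ip` key name stand in for the real QubesDB client; not the actual setup-ip code):

```python
import time

def read_with_retry(read_key, key, attempts=5, delay=0.2):
    """Retry a QubesDB-style read, since the daemon may not be up yet."""
    for _ in range(attempts):
        value = read_key(key)
        if value is not None:
            return value
        time.sleep(delay)
    return None

# Simulated daemon that only answers from the third attempt onward.
calls = {"n": 0}
def flaky_read(key):
    calls["n"] += 1
    return "10.137.2.15" if calls["n"] >= 3 else None

assert read_with_retry(flaky_read, "/qubes-ip", delay=0.01) == "10.137.2.15"
```

This only papers over the symptom, of course; if the QubesDB daemon itself has crashed, no number of retries recovers the address.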
marmarek added the P: critical label and removed the P: minor label on Dec 22, 2015
i7u commented Dec 24, 2015
A side effect of this bug seems to be:
[user@fedora-23-dvm ~]$ cat /etc/resolv.conf
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB[user@fedora-23-dvm ~]$
This happens 100% of the time when I launch a DispVM on a fresh 3.1rc1 installation.
Rudd-O commented Dec 26, 2015
Affects me too. QubesDB crashes. Confirmed via xl console.
m-v-b commented Jan 4, 2016
Given that this issue is intermittent, there appears to be a small race condition.
I found that removing the following lines in 01QubesDisposableVm.py (in /usr/lib64/python2.7/site-packages/qubes/modules) resolves this issue:
# close() is not really needed, because the descriptor is close-on-exec
# anyway, the reason to postpone close() is that possibly xl is not done
# constructing the domain after its main process exits
# so we close() when we know the domain is up
# the successful unpause is some indicator of it
if qmemman_present:
    qmemman_client.close()
As the comments indicate, closing the Qubes memory manager client is used as a synchronization point; however, this does not seem to be necessary, and it appears to cause the issue reported in this bug.
To prove this, I wrote a simple shell script, available at [1].
After removing the aforementioned two lines from 01QubesDisposableVm.py, my script runs to completion, and, more importantly, I have never encountered this issue after the proposed modification.
I also found another option (but a rather unacceptable one): adding a sleep after resuming the virtual machine in the same file, after the following line, also resolves the issue:
self.libvirt_domain.resume()
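The code comment above argues that the explicit close() is redundant because the descriptor is close-on-exec. For readers unfamiliar with that flag, a minimal stdlib-only illustration of the mechanism (this is not Qubes code, just the underlying POSIX behavior exposed by Python's fcntl module):

```python
import fcntl
import os

# Open a descriptor and mark it close-on-exec, as the Qubes comment
# assumes: such descriptors are closed automatically when the process
# exec()s, so an explicit close() before exec is not strictly required.
fd = os.open("/dev/null", os.O_RDONLY)
flags = fcntl.fcntl(fd, fcntl.F_GETFD)
fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

# The flag is now set; any exec*() in this process would close fd.
assert fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC
os.close(fd)
```

Note this only explains why the close() is safe to drop; the reason it was postponed in the first place (waiting for xl to finish constructing the domain) is the synchronization point under discussion.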
marmarek (Member) commented Jan 4, 2016
qmemman_client.close() basically resumes qmemman operations (dynamic management of VM memory). So maybe the problem is that memory used by vchan during the connection is taken away by qmemman? That would be a kernel bug...
marmarek (Member) commented Jan 5, 2016
Progress: that xc_gnttab_map_grant_refs: mmap failed is in fact a failed GNTTABOP_map_grant_ref hypercall. The actual error is GNTST_no_device_space (-7) /* Out of space in I/O MMU. */ (got that using a kernel patch, because it isn't logged anywhere...)
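Grant-table status codes like the -7 above come from Xen's public grant_table.h header. A small decoder (the subset of codes is hand-copied for illustration, so verify against your Xen version's header) makes such hypercall failures easier to read when they do get logged:

```python
# Subset of grant-table status codes from Xen's public grant_table.h.
# Hand-copied here for illustration; check your Xen headers for the
# authoritative list.
GNTST = {
    0: "GNTST_okay",
    -1: "GNTST_general_error",
    -2: "GNTST_bad_domain",
    -3: "GNTST_bad_gntref",
    -4: "GNTST_bad_handle",
    -7: "GNTST_no_device_space",   # Out of space in I/O MMU (the error seen here)
    -8: "GNTST_permission_denied",
}

def decode_gnttab_status(code):
    """Translate a GNTTABOP_* return code into its symbolic name."""
    return GNTST.get(code, "unknown status %d" % code)

assert decode_gnttab_status(-7) == "GNTST_no_device_space"
```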
marmarek (Member) commented Jan 5, 2016
Generally this is a qmemman problem.
When a VM has some memory assigned, it means two things:
- an upper limit on the VM's memory allocation
- a target memory size for the balloon driver
When the memory assignment is changed, both values are changed. But actual memory usage will change only after the balloon driver balloons the VM's memory up/down (gives the memory back to the hypervisor, or takes it from there). Until that happens, memory assigned, but not yet allocated by the VM, is considered "free" from the hypervisor's point of view (as in xl info; xl list likewise displays actual memory usage, not "target memory"). In such a case, the VM is free to allocate that memory (up to the assigned limit) at any time.
Exactly this is happening during DispVM startup:
1. QubesDisposableVM.start requests some memory (the initial memory size for the DispVM template - the *-dvm VM) from qmemman to start the new DispVM (e.g. 400MB)
2. The DispVM is restored from a savefile, using only the memory that was allocated at savefile creation time (e.g. 280MB)
3. Now the DispVM is using some memory (280MB), but is allowed to use the initial size (400MB). The difference (120MB) is considered "free".
4. qmemman redistributes that free memory among other VMs, leaving a 50MB safety margin
5. The DispVM, after some time, allocates the remaining memory, draining all the memory from the Xen free pool
6. Bad Things(tm) happen - in this case grant table operation failures
7. qmemman adjusts memory assignments, so everything looks OK a moment later (making debugging harder)
Note that this is nothing specific to DispVMs, nor to savefile usage. Any VM with a misbehaving balloon driver could trigger such a problem. The problem is that qmemman doesn't handle memory assigned to some VM, but not used yet.
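The accounting gap in the steps above can be sketched numerically. All figures below are hypothetical (the 400MB/280MB/50MB values follow the example in the comment; the 4096MB host total is invented), and the model deliberately mirrors the bug: "free" is computed from allocated memory while ignoring assigned-but-unallocated memory:

```python
def hypervisor_free(total, vms):
    """Free memory as the hypervisor reports it: total minus what the
    VMs have actually *allocated* (not what they are assigned)."""
    return total - sum(vm["allocated"] for vm in vms)

# Hypothetical 4GB host; DispVM restored from savefile with 280MB
# allocated but 400MB assigned (the example sizes from the comment).
total_mb = 4096
dispvm = {"assigned": 400, "allocated": 280}
vms = [dispvm]

free_before = hypervisor_free(total_mb, vms)

# qmemman hands out the "free" memory to other VMs, keeping a 50MB
# margin - without accounting for the DispVM's unallocated 120MB.
handed_out = free_before - 50
vms.append({"assigned": handed_out, "allocated": handed_out})

# Later, the DispVM balloons up to its full assigned size...
dispvm["allocated"] = dispvm["assigned"]

# ...and the free pool is drained past the safety margin (negative
# here means over-commitment: the point where grant ops start failing).
assert hypervisor_free(total_mb, vms) < 50
```

The fix direction implied by the comment is for qmemman to budget against assigned memory, not just currently allocated memory.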
marmarek (Member) commented Jan 5, 2016
Debugging hint:
watch -n 0.2 xl info\;xl list
And carefully observe the screen during DispVM startup.
marmarek self-assigned this on Jan 5, 2016
marmarek closed this in marmarek/old-qubes-core-admin@181eb3e on Jan 5, 2016
marmarek (Member) commented Jan 7, 2016
Automated announcement from builder-github
The package qubes-core-dom0-3.1.9-1.fc20 has been pushed to the r3.1 testing repository for dom0.
To test this update, please install it with the following command:
sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing
marmarek added the r3.1-dom0-cur-test label on Jan 7, 2016
marmarek (Member) commented Jan 12, 2016
Automated announcement from builder-github
The package qubes-core-dom0-3.1.10-1.fc20 has been pushed to the r3.1 stable repository for dom0.
To install this update, please use the standard update command:
sudo qubes-dom0-update
Or update dom0 via Qubes Manager.
marmarek added the r3.1-dom0-stable label and removed the r3.1-dom0-cur-test label on Jan 12, 2016
marmarek (Member) commented Jan 14, 2016
Apparently the race condition mentioned here is more common than I thought.
marmarek reopened this on Jan 14, 2016
marmarek removed the r3.1-dom0-stable label on Jan 14, 2016
added 4 commits to marmarek/old-qubes-core-admin that referenced this issue on Jan 14, 2016
marmarek closed this in marmarek/old-qubes-core-admin@5d36923 on Jan 14, 2016
marmarek (Member) commented Jan 15, 2016
Automated announcement from builder-github
The package qubes-core-dom0-3.1.11-1.fc20 has been pushed to the r3.1 testing repository for dom0.
To test this update, please install it with the following command:
sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing
marmarek added the r3.1-dom0-cur-test label on Jan 15, 2016
marmarek (Member) commented Feb 8, 2016
Automated announcement from builder-github
The package qubes-core-dom0-3.1.11-1.fc20 has been pushed to the r3.1 stable repository for dom0.
To install this update, please use the standard update command:
sudo qubes-dom0-update
Or update dom0 via Qubes Manager.
marmarek added the r3.1-dom0-stable label and removed the r3.1-dom0-cur-test label on Feb 8, 2016
Rudd-O commented Feb 10, 2016
I tried to get the updates but it no workie. :-(
Rudd-O commented Feb 10, 2016
Wait, the package is already installed. Nice. I'm a klutz.
marmarek commented Nov 7, 2015
Additionally, when it happens, the gui-agent spins in an endless loop trying to read the QubesDB watch (waiting for DispVM restore).
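The endless spin described here - retrying a watch read forever when the daemon is gone - can be contrasted with a bounded retry loop. A minimal sketch (the `read_watch` callback is hypothetical, standing in for the real QubesDB watch API; this is not the gui-agent's actual code, which is C):

```python
import time

def wait_for_watch(read_watch, max_attempts=10, delay=0.05):
    """Poll a QubesDB-style watch, but give up after max_attempts
    instead of spinning forever when the daemon has crashed."""
    for _ in range(max_attempts):
        event = read_watch()
        if event is not None:
            return event
        time.sleep(delay)
    raise TimeoutError("QubesDB watch never fired; daemon may have crashed")

# A crashed daemon never delivers an event, so the loop terminates
# with an error instead of busy-spinning.
try:
    wait_for_watch(lambda: None, max_attempts=3, delay=0.01)
except TimeoutError as e:
    print("gave up:", e)
```

Bounding the loop at least surfaces the crashed-daemon state as a diagnosable error rather than a silent busy loop.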