New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DispVM start error - error: internal error: Failed to save domain '30' with libxenlight #2482

Open
adrelanos opened this Issue Dec 4, 2016 · 6 comments

Comments

Projects
None yet
3 participants
@adrelanos
Member

adrelanos commented Dec 4, 2016

Qubes OS version (e.g., R3.2):

R3.2 with testing repository.

Affected TemplateVMs (e.g., fedora-23, if applicable):

whonix-ws (but probably not a template issue)


Actual behavior:

/usr/lib/qubes/qfile-daemon-dvm qubes.VMShell dom0 DEFAULT red
time=1480778855.52, qfile-daemon-dvm init
time=1480778855.52, creating DispVM
Can't get domid of domain name 'whonix-ws', maybe this domain does not exist.
--> Creating volatile image: /var/lib/qubes/appvms/whonix-ws-dvm/volatile.img...
--> Loading the VM (type = AppVM)...
--> Starting Qubes DB...
--> Setting Qubes DB info for the VM...
--> Updating firewall rules...
--> Starting the VM...
--> Starting Qubes GUId...
Connecting to VM's GUI agent: ..............connected
Waiting for DVM whonix-ws-dvm ...
DVM boot complete, memory used=648672. Saving image...
error: Failed to save domain whonix-ws-dvm to /var/lib/qubes/appvms/whonix-ws-dvm/dvm-savefile
error: internal error: Failed to save domain '30' with libxenlight

Already managed to reproduce this twice. Beforehand it was save domain '23' and now save domain '30'. No power loss at any time. No ctrl + c at any time. No hdd disk errors at any time.

Steps to reproduce the behavior:

Playing around with DispVMs a lot to deepen understanding. However, nothing unjustifiable. Steps here:
https://forums.whonix.org/t/qubes-dispvm-technical-discussion/3232/8

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Dec 4, 2016

Member

Take a look at /var/log/libvirt/libxl/libxl-driver.log - there should be more detailed message.

Member

marmarek commented Dec 4, 2016

Take a look at /var/log/libvirt/libxl/libxl-driver.log - there should be more detailed message.

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Dec 4, 2016

Member
2016-12-04 18:10:59 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/28/0
2016-12-04 18:10:59 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 28
2016-12-04 18:16:54 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/29/0
2016-12-04 18:16:54 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 29
2016-12-04 18:17:19 CET xc: error: Failed to get vcpu0 context (12 = Cannot allocate memory): Internal error
2016-12-04 18:17:19 CET xc: error: Save failed (12 = Cannot allocate memory): Internal error
2016-12-04 18:17:19 CET libxl: error: libxl_stream_write.c:329:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Cannot allocate memory
2016-12-04 18:17:21 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/30/0
2016-12-04 18:17:22 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 30
2016-12-04 18:39:43 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/5/0
2016-12-04 18:39:43 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 5
Member

adrelanos commented Dec 4, 2016

2016-12-04 18:10:59 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/28/0
2016-12-04 18:10:59 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 28
2016-12-04 18:16:54 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/29/0
2016-12-04 18:16:54 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 29
2016-12-04 18:17:19 CET xc: error: Failed to get vcpu0 context (12 = Cannot allocate memory): Internal error
2016-12-04 18:17:19 CET xc: error: Save failed (12 = Cannot allocate memory): Internal error
2016-12-04 18:17:19 CET libxl: error: libxl_stream_write.c:329:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Cannot allocate memory
2016-12-04 18:17:21 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/30/0
2016-12-04 18:17:22 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 30
2016-12-04 18:39:43 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/5/0
2016-12-04 18:39:43 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 5
@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Dec 4, 2016

Member

Looks like out of memory. I guess you have a lot of VMs running?

Member

marmarek commented Dec 4, 2016

Looks like out of memory. I guess you have a lot of VMs running?

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Dec 4, 2016

Member

I did. I've just closed 6 VMs or so to make sure there is enough RAM. Same issue. tailed the log in meanwhile to show only what it is saying for another new try (ignoring old messages).

2016-12-04 21:13:46 CET xc: error: Failed to get vcpu0 context (12 = Cannot allocate memory): Internal error
2016-12-04 21:13:46 CET xc: error: Save failed (12 = Cannot allocate memory): Internal error
2016-12-04 21:13:46 CET libxl: error: libxl_stream_write.c:329:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Cannot allocate memory
2016-12-04 21:13:47 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/35/0
2016-12-04 21:13:47 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 35
Member

adrelanos commented Dec 4, 2016

I did. I've just closed 6 VMs or so to make sure there is enough RAM. Same issue. tailed the log in meanwhile to show only what it is saying for another new try (ignoring old messages).

2016-12-04 21:13:46 CET xc: error: Failed to get vcpu0 context (12 = Cannot allocate memory): Internal error
2016-12-04 21:13:46 CET xc: error: Save failed (12 = Cannot allocate memory): Internal error
2016-12-04 21:13:46 CET libxl: error: libxl_stream_write.c:329:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Cannot allocate memory
2016-12-04 21:13:47 CET libxl: error: libxl_device.c:1006:device_backend_callback: unable to remove device with path /local/domain/3/backend/vif/35/0
2016-12-04 21:13:47 CET libxl: error: libxl.c:1699:devices_destroy_cb: libxl__devices_destroy failed for 35
@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Dec 4, 2016

Member

How much free memory xl info reports? Anything about it in xl dmesg (I guess nothing there)?

Member

marmarek commented Dec 4, 2016

How much free memory xl info reports? Anything about it in xl dmesg (I guess nothing there)?

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Dec 5, 2016

Member

Now, after the system was idle for some time, DispVM can be started. I'll provide this debug output again once I run into this again.

xl info
free_memory 3578

There may be something interesting indeed. Excerpt:

(XEN) Bogus DMIBAR [redacted] on [redacted]
(XEN) Scrubbing Free RAM on 1 nodes using 4 CPUs
(XEN) .................................done.
(XEN) Initial low memory virq threshold set at [redacted] pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed [redacted] init memory.
(XEN) Bogus DMIBAR [redacted] on [redacted]

(Full xl dmesg sent privately to @marmarek.)

Member

adrelanos commented Dec 5, 2016

Now, after the system was idle for some time, DispVM can be started. I'll provide this debug output again once I run into this again.

xl info
free_memory 3578

There may be something interesting indeed. Excerpt:

(XEN) Bogus DMIBAR [redacted] on [redacted]
(XEN) Scrubbing Free RAM on 1 nodes using 4 CPUs
(XEN) .................................done.
(XEN) Initial low memory virq threshold set at [redacted] pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed [redacted] init memory.
(XEN) Bogus DMIBAR [redacted] on [redacted]

(Full xl dmesg sent privately to @marmarek.)

@andrewdavidwong andrewdavidwong added the bug label Apr 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment