Old VM's status not fully responding in Qubes widget, yet new VM's work 'mostly' perfectly with an occasional opposite/mirrored issue #3660

Closed
Aekez opened this Issue Mar 5, 2018 · 16 comments

Aekez commented Mar 5, 2018

Qubes OS version:

Qubes 4 RC-3

Affected component(s):

The tools that keep track of whether an App is fully started and running correctly. See the attached picture for an example of what it looks like.
screenshot_2018-03-05_16-48-47

Furthermore the links to the logs are broken and not responsive when clicking on them (see second picture attachment).

screenshot_2018-03-05_17-23-35


Steps to reproduce the behavior:

The biggest clue of them all seems to be that this never happens to new AppVM's created after this bug started happening. It only happens to AppVM's created before the bug started appearing. So is it possibly a config file inside the AppVM, or a qvm-prefs setting perhaps? (See point 5 under Actual behavior below, in relation to this clue.)

Further information:

  • Starting AppVM after recent current-testing update.

  • It happens some 8 out of 10 times simply by starting the VM's, based on the 20 or so AppVM boots so far.

  • As seen in the picture, 3 are showing the wait animation, while the 2 above started correctly, as did sys-net and sys-firewall below.

  • It happens to all VM's, be it fedora, fedora cloned templates, debian, or whonix.

  • The same happens to templates; it's not AppVM exclusive.

  • Oddly, I've never yet seen it happen to sys-net and sys-firewall.

  • Multiple full system restarts, with the machine fully powered down, do not fix the issue.

  • All VM's and dom0 use the same current-testing repositories.

  • Metadata expiration is accounted for with '--refresh' on all fedora templates, and --clean in dom0 (--refresh or --action=refresh in dom0 seems not to work, so using --clean instead for the dom0 cases).

Expected behavior:

1-EB) That the Qubes widget correctly reports the VM's as fully started.
2-EB) A working widget animation icon for VM's during start-up.
3-EB) That the links to the logs during VM boot work.
4-EB) That it does not affect the VM's performance and system reliability.
5-EB) That templates and AppVM's behave the same way.
6-EB) Ability to properly shut down the VM via the Qubes widget.

Actual behavior:

1-AB) The Qubes widget does not correctly report that the VM has fully started (only the case for old VM's made before the bug appeared. In contrast to new AppVM's, it always shows the wait animation icon during boot, but the icon only sometimes disappears after a successful AppVM boot).
2-AB) "Sometimes" no working widget animation icon for the new AppVM's during start-up (only the case for new VM's created after the bug appeared).
3-AB) The links to the logs during VM boot do not work.
4-AB) It does not appear to affect the VM's performance and system reliability; in fact, it seems to be only a visual issue in the widget (for the time being at least).
5-AB) Oddly, even though the old AppVM is based on an old template, a new AppVM based on the same old template seems to have no issues. This is puzzling behavior, taking into account that the template shows similar issues to those of the old AppVM, yet the new AppVM doesn't.
6-AB) There is no way to properly shut down the VM in the Qubes widget; it only shows a kill VM entry in its place, as shown in the second picture attachment. The VM can only be shut down properly from the VM's own terminal, or by using qvm-shutdown in dom0.
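
For anyone who wants to script the qvm-shutdown route from 6-AB, here is a minimal sketch. It only builds the command line; the helper name `shutdown_command` is made up for illustration, and the `--wait` flag is an assumption about the installed qvm-shutdown, so check `qvm-shutdown --help` in dom0 first.

```python
import shlex


def shutdown_command(qubes, wait=True):
    """Build a dom0 command line that asks the given qubes to shut down cleanly.

    `shutdown_command` is a hypothetical helper, not part of Qubes OS.
    """
    if not qubes:
        raise ValueError("need at least one qube name")
    cmd = ["qvm-shutdown"]
    if wait:
        # Assumption: the installed qvm-shutdown accepts --wait
        # (block until the qube has actually halted).
        cmd.append("--wait")
    cmd.extend(qubes)
    return cmd


# Show the command rather than running it; in dom0 it could be passed to
# subprocess.run(cmd, check=True) instead.
print(shlex.join(shutdown_command(["work", "personal"])))
```

The qube names "work" and "personal" are placeholders. This avoids the Kill entry in the widget menu, which is the only option the widget offers for affected VM's.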

General notes:

It seems that 1-AB and 2-AB are more or less direct opposites of each other. Old VM's always have a wait animation icon during VM boot, but it mostly doesn't go away after a successful boot. In contrast, new VM's mostly have an animation icon during boot, but sometimes they don't. There are too few confirmed observations so far to be sure this is the exact behavior; there might be variations.


Related issues:

@Aekez Aekez changed the title from Old VM's not fully responding in Qubes widget, yet new ones work perfectly (possibly a minor issue) to Old VM's status not fully responding in Qubes widget, yet new VM's work 'mostly' perfectly with an occasional opposite/mirrored issue (possibly minor) Mar 5, 2018

@andrewdavidwong andrewdavidwong added this to the Release 4.0 milestone Mar 6, 2018

tasket Mar 8, 2018

After upgrading (post rc5 release) yesterday using qubes*testing I have the same problem, although old/new VM status seems to have nothing to do with it.

At times about 80% of my running VMs have the spinner and incorrect menu entries.

Aekez Mar 8, 2018

How odd that you don't have the new/old issue, well at least we can quantify most of it.

Making a circular reference back to Qubes users, so that marmarta has more information to go on. Others might post in that mail thread later too. https://groups.google.com/forum/#!topic/qubes-users/3zmeoR82JRs

mirrorway Mar 13, 2018

I have this issue too on current-testing, usually 1-2 VMs will be stuck in the swirly state, and the applet menu entries non-functional.

pugege Mar 15, 2018

Persistent nuisance bug requiring selective shutdown with qube manager in order to avoid potential data loss. On my system numerous random qubes are affected.

awokd Mar 16, 2018

Seeing it on 4.0 current but if I use a combination of command line and Qube Manager instead of the widget, it doesn't seem to hurt anything. Not sure I've fully exercised it, though.

mossy-nw Mar 18, 2018

I have this too since upgrading from rc4 via qubes*testing -- likewise I can use the command-line tools qvm-ls and qvm-shutdown with no apparent drawbacks. Shutting down from the command line also sometimes gives me the popup "Domain qubename is starting" or, inversely, starting from the command line sometimes gives the popup "Domain qubename is halting".
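
A rough sketch of how the widget could be cross-checked against qvm-ls. The assumptions are loud ones: that qvm-ls in dom0 can print `name|state` lines (for instance via `--raw-data --fields NAME,STATE`, which should be verified with `qvm-ls --help`), and that the widget states are noted down by hand, since nothing in this thread says they can be read programmatically. The function names and sample data are made up.

```python
def parse_states(raw):
    """Map qube name -> state from 'name|state' lines (assumed qvm-ls output format)."""
    states = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, state = line.partition("|")
        states[name] = state
    return states


def mismatches(widget, actual):
    """Return {qube: (widget_state, actual_state)} for every qube where the two differ."""
    return {
        name: (shown, actual.get(name))
        for name, shown in widget.items()
        if shown != actual.get(name)
    }


# Made-up sample data: what qvm-ls might print, and what the widget shows.
sample = "sys-net|Running\nwork|Running\nvault|Halted\n"
seen_in_widget = {"sys-net": "Running", "work": "Starting"}
print(mismatches(seen_in_widget, parse_states(sample)))
```

A qube that shows up in the result is one where the widget and the command-line tools disagree, which is the symptom described in this issue.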

shunju Mar 19, 2018

I have exactly the same as @mossy-nw since an update of Debian forced me to also update dom0 (just ran sudo qubes-dom0-update). Everything worked fine in rc4, now most of the time I have a VM that is reported starting or shutting down while it’s just normally running. And messages for starting and halting are often inverted.

pugege Mar 23, 2018

Issue persists following 22 Mar cur-testing upgrades. There may be a pattern in which qubes display the animation. On my system, using stretch templates, if a qube first displays "qube is halted" upon startup with cli or menu, the animation appears. Qubes displaying only the expected "qube is starting" then "qube is started" do not display the animation.

pugege Mar 23, 2018

Qubes displaying the expected "qube is starting" then "qube is started" only do not display the animation.
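
To make the pattern easier to check on other systems, it can be written down as a tiny classifier. This is only a sketch of my observation, not of how the widget works internally; the function name and the lower-case labels are made up for illustration.

```python
EXPECTED_SEQUENCE = ["starting", "started"]


def predicts_stuck_animation(notifications):
    """Return True if the notification order matches the pattern observed above.

    `notifications` lists the popups seen while one qube starts, in order,
    e.g. ["halted", "started"]. The labels are informal shorthand for the
    "qube is ..." popups, not identifiers taken from Qubes OS.
    """
    seen = [n.strip().lower() for n in notifications]
    # Observed pattern: a "halted" popup arriving first goes with a stuck animation.
    return bool(seen) and seen[0] == "halted"


print(predicts_stuck_animation(["halted", "started"]))
print(predicts_stuck_animation(EXPECTED_SEQUENCE))
```

If others log their popup order per qube, comparing it against this rule would show whether the pattern holds beyond stretch templates.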

opposablebrain Apr 4, 2018

Seeing the same issue. No apparent correlation between "old"/"new" VMs.
However: Any VMs I set to "start automatically on boot" behave as expected, until I shut down and restart them.
Also, my win7x64 VM behaves correctly every time, regardless of how it is started.
Hope this helps.

pugege Apr 4, 2018

Is the errant animation really necessary even when functioning normally? If not, wouldn't removing it from the code resolve this issue?

opposablebrain Apr 4, 2018

Presumably, the code is looking for successful completion of something, in which case I'd prefer someone to investigate rather than just comment out the symptoms.

mossy-nw Apr 4, 2018

I'm still on R4_rc5 but it's definitely not just an animation issue -- VMs affected by the animation issue have no option to Shut Down, so the only choice from the menu is to Kill the VM.

Aekez Apr 23, 2018

I can confirm anecdotally that on this particular setup of Qubes 4, RC4, as of the current-testing update today (updates released yesterday), I "almost" no longer experience this bug (at least by all appearances). It still happens on occasion, and I haven't yet discovered the trigger for the few times when it still happens, but it's rare enough to make a big positive difference compared to before.

Also a pleasant surprise: in addition to being able to properly shut down the VM's in the menu, the new links introduced the previous week, i.e. for terminal, also work now. The log links during VM startup still don't work, though.

@andrewdavidwong andrewdavidwong changed the title from Old VM's status not fully responding in Qubes widget, yet new VM's work 'mostly' perfectly with an occasional opposite/mirrored issue (possibly minor) to Old VM's status not fully responding in Qubes widget, yet new VM's work 'mostly' perfectly with an occasional opposite/mirrored issue Apr 24, 2018

Aekez Apr 24, 2018

Update for some persistent VM's:
Found that some VM's still have the issue consistently, while other VM's have it very rarely now (or maybe triggered by rare events?). I found a work-around which is a bit dramatic: simply delete the old VM and make a new one. It solved my issue on 3 out of 7 remaining fedora AppVM's that still have this issue.

  • I focused on the 3 out of 7 VM's where it still happened on a consistent basis, meaning they were more likely to trigger the menu bug than not.

  • I made new VM's for the same purposes as the VM's that persistently had the menu bug, and then copied or moved all the data (e.g. firefox and thunderbird folders, needed caches, hidden folders, and other necessities) to the new VM. Copying the whole Home folder to the new AppVM might work too, but I didn't test it.

  • I tested that all the data worked as it should and hash verified the important bits. Then I deleted the old VM.

  • The bug now rarely happens for these re-made VM's too; so far I've not seen it happen (it's been half a day).

  • Only tested on fedora VM's. New Whonix VM's still seem to trigger the bug more easily, even when freshly made.

  • Chain-starting sys-net or sys-whonix, by starting a dependent VM while these two are shut down, also still seems to provoke the bug more often than starting VM's individually. Starting them individually doesn't always avoid it; it just seems to happen less frequently. So starting a chain of dependent network VM's seems to provoke it.

Needed confirmation:
I've got a few VM's left to test on. Any suggestions for what I can delete or modify to hypothetically remove old caches or similar, so that I get the same behaviour as when making a fresh new AppVM? Where does the service data go for the mechanism that the bug relies on: is it somewhere in dom0 or inside the AppVM? /rw? Having such a work-around could be a neat short-term solution, I think, for those who do not want to re-make AppVM's, or have many AppVM's to re-make. If deleting or replacing a file or cache folder can fix it manually, I imagine some people might find a quick work-around useful. I'm also not sure if this work-around is lasting or not; I've only tested it for half a day so far.

marmarek Jul 11, 2018

Member

This is fixed with QubesOS/updates-status#562 (in current-testing right now).

@marmarek marmarek closed this Jul 11, 2018
