New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM frequently doesn't go beyond yellow (=transient) state when private storage contains many files. #2583

Closed
mycase opened this Issue Jan 15, 2017 · 2 comments

Comments

Projects
None yet
3 participants
@mycase

mycase commented Jan 15, 2017

Qubes OS version:

R3.2

Affected TemplateVMs:

fedora-24, probably also fedora-23


Expected behavior:

  • User starts VM
  • After a while the Qubes VM Manager shows a green State symbol

Actual behavior:

  • User starts VM
  • Depending on private storage size (number of files), the VM State symbol ends up green or yellow:
    -- small size: green
    -- large size: yellow
  • More specifically, qvm-start gives a timeout while waiting for the VMs qrexec agent, which in turn doesn't show up in time possibly because it frequently waits for a fsck.ext4 performed on the private storage:

running sudo journalctl -b | grep dirs produces, for a small-sized private storage VM:

Jan 15 17:58:49 localhost mount-dirs.sh[323]: Private device size management: enlarging /dev/xvdb
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Private device size management: resize2fs /dev/xvdb failed:
Jan 15 17:58:49 localhost mount-dirs.sh[323]: resize2fs 1.42.13 (17-May-2015)
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Please run 'e2fsck -f /dev/xvdb' first.
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Private device size management: attempting to mark /dev/xvdb clean
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Private device size management: /dev/xvdb marked clean, enlarging it again
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Private device size management: resize2fs of /dev/xvdb succeeded
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Checking /rw
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Finished checking /rw
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Mounting /rw/home onto /home
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Bind mounting /rw/bind-dirs/var/spool/cron onto /var/spool/cron
Jan 15 17:58:50 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-mount-dirs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

In case of a VM with large private storage, note the timestamps:

Jan 15 18:36:01 localhost mount-dirs.sh[305]: Private device size management: attempting to mark /dev/xvdb clean
Jan 15 18:40:54 localhost mount-dirs.sh[305]: Private device size management: /dev/xvdb marked clean, enlarging it again

The attempt at marking the private storage clean (which runs fsck.ext4) takes almost 5 minutes in this specific case: more than the 60 seconds timeout in qvm-start, which leaves the VM State symbol dangling in a yellow state. If however one waits for the fsck to complete, the VM is usable. The irritating thing is that at the next startup, it will often run fsck again. It would seem therefore that the shutdown procedure frequently leaves the device in a state that resize2fs considers dirty.

Steps to reproduce the behavior:

  • stop and restart a fedora-24 based AppVM with enough files in private storage to make fsck.ext4 take more than a minute to complete.
  • the phenomenon only shows up if resize2fs fails during startup, which is frequently but not always the case.

General notes:

  • The resize2fs failure is frequently, but not always, visible in fedora-24-based VMs, also when private storage has a small nr of files:
[user@untrusted24 ~]$ sudo journalctl | head -1
-- Logs begin at Wed 2016-12-21 15:11:16 CET, end at Sun 2017-01-15 19:15:44 CET. --
[user@untrusted24 ~]$ sudo journalctl | grep 'resize2fs'
Dec 25 22:49:37 localhost mount-dirs.sh[304]: Private device size management: resize2fs /dev/xvdb failed:
Dec 25 22:49:37 localhost mount-dirs.sh[304]: resize2fs 1.42.13 (17-May-2015)
Dec 25 22:49:37 localhost mount-dirs.sh[304]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 25 22:50:13 desktempl24 sudo[966]:     user : TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=/sbin/resize2fs /dev/mapper/dmroot
Dec 25 22:51:26 localhost mount-dirs.sh[302]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 26 00:14:28 localhost mount-dirs.sh[300]: Private device size management: resize2fs /dev/xvdb failed:
Dec 26 00:14:28 localhost mount-dirs.sh[300]: resize2fs 1.42.13 (17-May-2015)
Dec 26 00:14:28 localhost mount-dirs.sh[300]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 26 00:38:29 localhost mount-dirs.sh[304]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 26 22:36:17 desktempl24 mount-dirs.sh[293]: Private device size management: resize2fs /dev/xvdb failed:
Dec 26 22:36:17 desktempl24 mount-dirs.sh[293]: resize2fs 1.42.13 (17-May-2015)
Dec 26 22:36:17 desktempl24 mount-dirs.sh[293]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 26 23:04:26 localhost mount-dirs.sh[303]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 27 02:32:48 localhost mount-dirs.sh[304]: Private device size management: resize2fs /dev/xvdb failed:
Dec 27 02:32:48 localhost mount-dirs.sh[304]: resize2fs 1.42.13 (17-May-2015)
Dec 27 02:32:48 localhost mount-dirs.sh[304]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 27 10:51:44 desktempl24 mount-dirs.sh[297]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 27 22:32:26 localhost mount-dirs.sh[304]: Private device size management: resize2fs /dev/xvdb failed:
Dec 27 22:32:26 localhost mount-dirs.sh[304]: resize2fs 1.42.13 (17-May-2015)
Dec 27 22:32:26 localhost mount-dirs.sh[304]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 28 09:31:29 localhost mount-dirs.sh[302]: Private device size management: resize2fs /dev/xvdb failed:
Dec 28 09:31:29 localhost mount-dirs.sh[302]: resize2fs 1.42.13 (17-May-2015)
Dec 28 09:31:29 localhost mount-dirs.sh[302]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 28 18:39:14 localhost mount-dirs.sh[300]: Private device size management: resize2fs /dev/xvdb failed:
Dec 28 18:39:14 localhost mount-dirs.sh[300]: resize2fs 1.42.13 (17-May-2015)
Dec 28 18:39:14 localhost mount-dirs.sh[300]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 28 19:13:00 localhost mount-dirs.sh[301]: Private device size management: resize2fs of /dev/xvdb succeeded
Dec 30 21:48:41 localhost mount-dirs.sh[263]: Private device size management: resize2fs /dev/xvdb failed:
Dec 30 21:48:41 localhost mount-dirs.sh[263]: resize2fs 1.42.13 (17-May-2015)
Dec 30 21:48:41 localhost mount-dirs.sh[263]: Private device size management: resize2fs of /dev/xvdb succeeded
Jan 08 10:17:32 localhost mount-dirs.sh[303]: Private device size management: resize2fs of /dev/xvdb succeeded
Jan 14 18:29:18 desktempl24 mount-dirs.sh[301]: Private device size management: resize2fs /dev/xvdb failed:
Jan 14 18:29:18 desktempl24 mount-dirs.sh[301]: resize2fs 1.42.13 (17-May-2015)
Jan 14 18:29:18 desktempl24 mount-dirs.sh[301]: Private device size management: resize2fs of /dev/xvdb succeeded
Jan 15 17:05:16 localhost mount-dirs.sh[298]: Private device size management: resize2fs /dev/xvdb failed:
Jan 15 17:05:16 localhost mount-dirs.sh[298]: resize2fs 1.42.13 (17-May-2015)
Jan 15 17:05:16 localhost mount-dirs.sh[298]: Private device size management: resize2fs of /dev/xvdb succeeded
Jan 15 17:11:51 localhost mount-dirs.sh[301]: Private device size management: resize2fs of /dev/xvdb succeeded
Jan 15 17:28:43 desktempl24 mount-dirs.sh[297]: Private device size management: resize2fs /dev/xvdb failed:
Jan 15 17:28:43 desktempl24 mount-dirs.sh[297]: resize2fs 1.42.13 (17-May-2015)
Jan 15 17:28:43 desktempl24 mount-dirs.sh[297]: Private device size management: resize2fs of /dev/xvdb succeeded
Jan 15 17:38:39 desktempl24 mount-dirs.sh[301]: Private device size management: resize2fs of /dev/xvdb succeeded
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Private device size management: resize2fs /dev/xvdb failed:
Jan 15 17:58:49 localhost mount-dirs.sh[323]: resize2fs 1.42.13 (17-May-2015)
Jan 15 17:58:49 localhost mount-dirs.sh[323]: Private device size management: resize2fs of /dev/xvdb succeeded

  • sys-firewall and sys-net (running fedora-23) also frequently run fsck on /rw at startup.

  • The yellow-State-for-large-storage phenomenon only showed up on my computer today, although:

  • The e2fsck phenomenon seems to have been present for some time already. It may or may not be a feature.

  • I recently added a large number of files (200000+) which presumably made the fsck slower.

  • I did a dnf update recently, which may or may not have made fsck slower.

Related issues:

@andrewdavidwong andrewdavidwong added this to the Release 3.2 updates milestone Jan 15, 2017

@marmarek marmarek self-assigned this Feb 27, 2017

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Feb 27, 2017

Member

The bug is that fsck -f is called at each startup. So even if the filesystem is clean, check is forced...

Member

marmarek commented Feb 27, 2017

The bug is that fsck -f is called at each startup. So even if the filesystem is clean, check is forced...

@mycase

This comment has been minimized.

Show comment
Hide comment
@mycase

mycase Feb 27, 2017

The non-deterministic aspect that I mentioned is exemplified by the following output of sudo journalctl | grep 'resize2fs' on an AppVM (presumably localhost below), with underlying template desktempl24:

Feb 10 01:15:22 desktempl24 mount-dirs.sh[308]: Private device size management: resize2fs /dev/xvdb failed:
Feb 10 01:15:22 desktempl24 mount-dirs.sh[308]: resize2fs 1.42.13 (17-May-2015)
Feb 10 01:15:22 desktempl24 mount-dirs.sh[308]: Private device size management: resize2fs of /dev/xvdb succeeded
Feb 19 22:25:11 localhost mount-dirs.sh[311]: Private device size management: resize2fs of /dev/xvdb succeeded
Feb 23 16:38:51 desktempl24 mount-dirs.sh[288]: Private device size management: resize2fs /dev/xvdb failed:
Feb 23 16:38:51 desktempl24 mount-dirs.sh[288]: resize2fs 1.42.13 (17-May-2015)
Feb 23 16:38:51 desktempl24 mount-dirs.sh[288]: Private device size management: resize2fs of /dev/xvdb succeeded
Feb 23 18:06:20 localhost mount-dirs.sh[329]: Private device size management: resize2fs /dev/xvdb failed:
Feb 23 18:06:20 localhost mount-dirs.sh[329]: resize2fs 1.42.13 (17-May-2015)
Feb 23 18:06:22 localhost mount-dirs.sh[329]: Private device size management: resize2fs of /dev/xvdb succeeded

Sometimes resize2fs immediately succeeds, like at Feb 19 22:25:11, sometimes it fails first, like at Feb 10 01:15:22 . I don't know if these entries were all generated at VM startup. (either AppVM or TemplVM).

mycase commented Feb 27, 2017

The non-deterministic aspect that I mentioned is exemplified by the following output of sudo journalctl | grep 'resize2fs' on an AppVM (presumably localhost below), with underlying template desktempl24:

Feb 10 01:15:22 desktempl24 mount-dirs.sh[308]: Private device size management: resize2fs /dev/xvdb failed:
Feb 10 01:15:22 desktempl24 mount-dirs.sh[308]: resize2fs 1.42.13 (17-May-2015)
Feb 10 01:15:22 desktempl24 mount-dirs.sh[308]: Private device size management: resize2fs of /dev/xvdb succeeded
Feb 19 22:25:11 localhost mount-dirs.sh[311]: Private device size management: resize2fs of /dev/xvdb succeeded
Feb 23 16:38:51 desktempl24 mount-dirs.sh[288]: Private device size management: resize2fs /dev/xvdb failed:
Feb 23 16:38:51 desktempl24 mount-dirs.sh[288]: resize2fs 1.42.13 (17-May-2015)
Feb 23 16:38:51 desktempl24 mount-dirs.sh[288]: Private device size management: resize2fs of /dev/xvdb succeeded
Feb 23 18:06:20 localhost mount-dirs.sh[329]: Private device size management: resize2fs /dev/xvdb failed:
Feb 23 18:06:20 localhost mount-dirs.sh[329]: resize2fs 1.42.13 (17-May-2015)
Feb 23 18:06:22 localhost mount-dirs.sh[329]: Private device size management: resize2fs of /dev/xvdb succeeded

Sometimes resize2fs immediately succeeds, like at Feb 19 22:25:11, sometimes it fails first, like at Feb 10 01:15:22 . I don't know if these entries were all generated at VM startup. (either AppVM or TemplVM).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment