updating fedora-25 causes template root filesystem corruption #3370

Closed
jpouellet opened this Issue Dec 7, 2017 · 3 comments

@jpouellet
Contributor

jpouellet commented Dec 7, 2017

Qubes OS version:

R4-rc3

Affected TemplateVMs:

at least fedora-25

Steps to reproduce the behavior & Actual behavior

[user@dom0 ~]$ rpm -q qubes-template-fedora-25
qubes-template-fedora-25-4.0.0-201711210613.noarch
[user@dom0 ~]$ qvm-clone fedora-25 f25-test
[user@dom0 ~]$ qvm-run f25-test gnome-terminal
[user@f25-test2 ~]$ sudo dnf update
Qubes OS Repository for VM (updates)             20 kB/s |  73 kB     00:03    
Fedora 25 - x86_64 - Updates                    7.7 MB/s |  24 MB     00:03    
Dependencies resolved.
================================================================================
 Package                  Arch   Version            Repository             Size
================================================================================
Installing:
 kernel-core              x86_64 4.13.16-100.fc25   updates                21 M
 kernel-devel             x86_64 4.13.16-100.fc25   updates                12 M
Upgrading:
 hplip                    x86_64 3.17.10-3.fc25     updates                14 M
 hplip-common             x86_64 3.17.10-3.fc25     updates               104 k
 hplip-libs               x86_64 3.17.10-3.fc25     updates               192 k
 julietaula-montserrat-fonts
                          noarch 1:7.200-1.fc25     updates               4.1 M
 kernel-headers           x86_64 4.13.16-100.fc25   updates               1.2 M
 libglvnd                 x86_64 1:1.0.0-1.fc25     updates                91 k
 libglvnd-egl             x86_64 1:1.0.0-1.fc25     updates                45 k
 libglvnd-gles            x86_64 1:1.0.0-1.fc25     updates                36 k
 libglvnd-glx             x86_64 1:1.0.0-1.fc25     updates               126 k
 libsane-hpaio            x86_64 3.17.10-3.fc25     updates               120 k
 libtalloc                x86_64 2.1.10-2.fc25      updates                45 k
 libtevent                x86_64 0.9.34-1.fc25      updates                39 k
 nss                      x86_64 3.34.0-1.0.fc25    updates               851 k
 nss-softokn              x86_64 3.34.0-1.0.fc25    updates               389 k
 nss-softokn-freebl       x86_64 3.34.0-1.0.fc25    updates               231 k
 nss-sysinit              x86_64 3.34.0-1.0.fc25    updates                62 k
 nss-tools                x86_64 3.34.0-1.0.fc25    updates               516 k
 nss-util                 x86_64 3.34.0-1.0.fc25    updates                87 k
 openssl                  x86_64 1:1.0.2m-1.fc25    updates               500 k
 openssl-libs             x86_64 1:1.0.2m-1.fc25    updates               1.2 M
 pcre                     x86_64 8.41-3.fc25        updates               199 k
 pcre-utf16               x86_64 8.41-3.fc25        updates               189 k
 pcre2                    x86_64 10.23-10.fc25      updates               212 k
 perl-libnet              noarch 3.11-1.fc25        updates               119 k
 python                   x86_64 2.7.13-3.fc25      updates                96 k
 python-libs              x86_64 2.7.13-3.fc25      updates               6.2 M
 python2-rpm              x86_64 4.13.0.2-1.fc25    updates               106 k
 python3-rpm              x86_64 4.13.0.2-1.fc25    updates               106 k
 python3-sssdconfig       noarch 1.16.0-3.fc25      updates               107 k
 rpm                      x86_64 4.13.0.2-1.fc25    updates               518 k
 rpm-build-libs           x86_64 4.13.0.2-1.fc25    updates               121 k
 rpm-libs                 x86_64 4.13.0.2-1.fc25    updates               305 k
 rpm-plugin-selinux       x86_64 4.13.0.2-1.fc25    updates                57 k
 rpm-plugin-systemd-inhibit
                          x86_64 4.13.0.2-1.fc25    updates                57 k
 thunderbird              x86_64 52.4.0-2.fc25      updates                75 M
 webkitgtk4               x86_64 2.18.3-1.fc25      updates                13 M
 webkitgtk4-jsc           x86_64 2.18.3-1.fc25      updates               4.4 M
 webkitgtk4-plugin-process-gtk2
                          x86_64 2.18.3-1.fc25      updates               9.7 M
 xen-libs                 x86_64 2001:4.8.2-11.fc25 qubes-vm-r4.0-current 616 k
 xen-licenses             x86_64 2001:4.8.2-11.fc25 qubes-vm-r4.0-current 105 k
 xen-qubes-vm             x86_64 2001:4.8.2-11.fc25 qubes-vm-r4.0-current 197 k

Transaction Summary
================================================================================
Install   2 Packages
Upgrade  41 Packages

Total download size: 168 M
Is this ok [y/N]: y	
Downloading Packages:
...
Sending application list and icons to dom0
Complete!
[user@f25-test2 ~]$ poweroff
[user@dom0 ~]$ qvm-run f25-test gnome-terminal
Running 'gnome-terminal' on f25-test
f25-test: Cannot execute qrexec-daemon!

Here you should see a "Domain f25-test is starting" notification and, a few seconds later, get a terminal running in the VM. Instead, the VM goes to the Transient state in qvm-ls and then dies.

To see what it's doing:

[user@dom0 ~]$ qvm-prefs f25-test virt_mode pv
[user@dom0 ~]$ qvm-run f25-test gnome-terminal &
[1] 16093
[user@dom0 ~]$ Running 'gnome-terminal' on f25-test

[user@dom0 ~]$ sudo xl console f25-test
[    0.000000] Linux version 4.9.56-21.pvops.qubes.x86_64 (user@build-fedora4) (gcc version 6.4.1 20170727 (Red Hat 6.4.1-1) (GCC) ) #1 SMP Wed Oct 18 00:22:42 UTC 2017
[    0.000000] Command line: root=/dev/mapper/dmroot ro nomodeset console=hvc0 rd_NO_PLYMOUTH rd.plymouth.enable=0 plymouth.enable=0 nopat
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
...
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Timers.
[  OK  ] Reached target Network.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Paths.
[FAILED] Failed to start File System Check on Root Device.
See 'systemctl status systemd-fsck-root.service' for details.
[    2.802804] audit: type=1130 audit(1512633528.188:8): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
         Starting Remount Root and Kernel File Systems...
[    2.822486] EXT4-fs (xvda3): warning: mounting fs with errors, running e2fsck is recommended
[    2.824455] EXT4-fs (xvda3): re-mounted. Opts: (null)
[  OK  ] Started Remount Root and Kernel File Systems.
[    2.825599] audit: type=1130 audit(1512633528.211:9): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-remount-fs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
         Starting Configure read-only root support...
         Starting udev Coldplug all Devices...
         Starting Flush Journal to Persistent Storage...
         Starting Create Static Device Nodes in /dev...
[    2.853110] systemd-journald[196]: Received request to flush runtime journal from PID 1
[  OK  ] Started Configure read-only root support.
[    2.856234] audit: type=1130 audit(1512633528.242:10): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=fedora-readonly comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
         Starting Load/Save Random Seed...
[    2.866985] systemd-journald[196]: File /var/log/journal/a8fa5e9e47ac4088966dff0baca9603b/system.journal corrupted or uncleanly shut down, renaming and replacing.
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Create Static Device Nodes in /dev.
...
[  OK  ] Started Emergency Shell.
[  OK  ] Reached target Emergency Mode.
[  OK  ] Started Restore /run/initramfs on shutdown.
[  OK  ] Started Create Volatile Files and Directories.
         Starting Security Auditing Service...
[  OK  ] Started Early Qubes VM settings.
[  OK  ] Started Security Auditing Service.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to
try again to boot into default mode.
Press Enter for maintenance
(or press Control-D to continue): 
[root@f25-test ~]# systemctl status systemd-fsck-root.service
● systemd-fsck-root.service - File System Check on Root Device
   Loaded: loaded (/usr/lib/systemd/system/systemd-fsck-root.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2017-12-07 02:58:48 EST; 34s ago
     Docs: man:systemd-fsck-root.service(8)
  Process: 190 ExecStart=/usr/lib/systemd/systemd-fsck (code=exited, status=1/FAILURE)
 Main PID: 190 (code=exited, status=1/FAILURE)

Dec 07 02:58:48 localhost systemd-fsck[190]: /dev/xvda3:         ... (inode #159284, mod time Thu Oct 19 13:10:23 2017)
Dec 07 02:58:48 localhost systemd-fsck[190]: /dev/xvda3:
Dec 07 02:58:48 localhost systemd-fsck[190]: /dev/xvda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
Dec 07 02:58:48 localhost systemd-fsck[190]:         (i.e., without -a or -p options)
Dec 07 02:58:48 localhost systemd-fsck[190]: fsck failed with error code 4.
Dec 07 02:58:48 localhost systemd-fsck[190]: Running request emergency.target/start/replace
Dec 07 02:58:48 localhost systemd[1]: systemd-fsck-root.service: Main process exited, code=exited, status=1/FAILURE
Dec 07 02:58:48 localhost systemd[1]: Failed to start File System Check on Root Device.
Dec 07 02:58:48 localhost systemd[1]: systemd-fsck-root.service: Unit entered failed state.
Dec 07 02:58:48 localhost systemd[1]: systemd-fsck-root.service: Failed with result 'exit-code'.

Pressing ^D at the emergency shell lets the VM continue booting, and subsequent boots work. But unless you know to do this, none of your VMs will boot, no matter how many times you try.
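
The "fsck failed with error code 4" line in the log above can be read against the exit-status table in fsck(8), where the status is a bitmask. A minimal sketch of a decoder follows; the function name `decode_fsck_status` is hypothetical and not part of any Qubes or e2fsprogs tool.

```shell
# Decode an fsck exit status using the bit values documented in fsck(8).
# The status is a bitmask, so more than one condition may be printed.
# decode_fsck_status is a hypothetical helper name used for illustration.
decode_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Each line of the here-document is "<bit> <meaning>".
    while read -r bit meaning; do
        if [ $((status & bit)) -ne 0 ]; then
            echo "$meaning"
        fi
    done <<'EOF'
1 filesystem errors corrected
2 system should be rebooted
4 filesystem errors left uncorrected
8 operational error
16 usage or syntax error
32 checking canceled by user request
128 shared-library error
EOF
}

# The code reported by systemd-fsck in this issue:
decode_fsck_status 4   # prints "filesystem errors left uncorrected"
```

So code 4 here means the automatic check found errors it did not repair, which matches the "RUN fsck MANUALLY" message.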

Expected behavior:

The ability to update the default template without causing all VMs based on it to stop booting (essentially bricking a default install).

General notes:

Switching to PV just for a console is undesirable, but I don't know of a convenient way to get a console for HVM domains. Setting qvm-prefs "debug" to True gives me the qemu SDL (or whatever) console, but that doesn't show anything after SeaBIOS. I guess I need to dig around in the stubdom...
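
As a possible alternative to attaching a console, the boot output can be filtered after the fact. This is a sketch under assumptions: the function name `boot_failures` is hypothetical, and the log path in the comment assumes dom0 keeps per-VM console logs under /var/log/xen/console/, which may not hold for every setup.

```shell
# Print only the lines of a boot console log that point at failures.
# Reads the log on stdin. boot_failures is a hypothetical helper name.
boot_failures() {
    grep -E '^\[FAILED\]|EXT4-fs.*(error|warning)|fsck failed'
}

# Demonstration on lines taken from the console output in this report:
printf '%s\n' \
    '[  OK  ] Reached target Network.' \
    '[FAILED] Failed to start File System Check on Root Device.' \
    '[    2.822486] EXT4-fs (xvda3): warning: mounting fs with errors, running e2fsck is recommended' \
    | boot_failures

# Against a saved log (path is an assumption, adjust as needed):
#   sudo cat /var/log/xen/console/guest-f25-test.log | boot_failures
```

The patterns only cover the messages seen in this report; other failure modes would need their own expressions.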

@marmarek

marmarek Dec 7, 2017

Member

Are you sure you haven't run out of disk space?

> Setting qvm-prefs "debug" to True gives me the qemu SDL (or whatever) console, but that doesn't show anything after SeaBIOS. I guess I need to dig around in the stubdom...

Probably you need to adjust the console= kernel parameter. But that shouldn't be needed anyway, because xl console -t pv f25-test works.

@jpouellet

jpouellet Dec 7, 2017

Contributor

> Are you sure you haven't run out of disk space?

I don't think so.

[user@dom0 ~]$ sudo lvs | head -3
  LV                                VG         Attr       LSize   Pool   Origin                       Data%  Meta%  Move Log Cpy%Sync Convert
  pool00                            qubes_dom0 twi-aotz-- 453.11g                                     46.80  24.20                           
  root                              qubes_dom0 Vwi-aotz-- 453.11g pool00                              9.36                                   
...

pool00 Data% < 50%, so if that means what I think it means then nope.
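
The same check can be scripted rather than read off the table by eye. A minimal sketch, assuming the pool is `qubes_dom0/pool00` as in the output above; the function name `pool_usage_below` and the 90% limit are hypothetical choices. It only looks at the dom0 thin pool, not at free space inside the template's own root filesystem.

```shell
# Succeed (exit 0) if every percentage read from stdin is below the limit,
# fail (exit 1) otherwise. Expected input is the output of:
#   sudo lvs --noheadings -o data_percent,metadata_percent qubes_dom0/pool00
# pool_usage_below is a hypothetical helper name used for illustration.
pool_usage_below() {
    limit=$1
    awk -v limit="$limit" '
        NF == 0 { next }                       # skip blank lines
        { for (i = 1; i <= NF; i++) if ($i + 0 >= limit) bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Demonstration with the Data% and Meta% values reported above:
printf '  46.80  24.20\n' | pool_usage_below 90 && echo "thin pool usage is below the limit"
```

Checking Meta% alongside Data% matters because a thin pool can run out of metadata space before it runs out of data space.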

> xl console -t pv f25-test

thanks :)

@jpouellet

jpouellet Dec 8, 2017

Contributor

I'm now more suspicious of failing hardware than of Qubes. Closing to avoid littering qubes-issues; I will re-open if I can eliminate hardware failure as a possible root cause.

@jpouellet jpouellet closed this Dec 8, 2017
