Secure Remounting #128

Closed
monsieuremre opened this issue Oct 20, 2023 · 59 comments

Comments

@monsieuremre
Contributor

Is there any particular reason to have a systemd service remount partitions rather than editing /etc/fstab accordingly? This seems to cause extra complexity without any benefit that I can see.

As a reference, the recommended options from the Lynis documentation can be used. I have tested this, and it does not seem to break any application that a Debian user would normally install, let alone any Whonix default packages.

#            ---------------------------------------------------------
#            Mount point              nodev  noexec  nosuid
#            /boot                      v      v       v
#            /dev                              v       v
#            /dev/shm                   v      v       v
#            /home                      v              v
#            /run                       v              v
#            /tmp                       v      v       v
#            /var                       v              v
#            /var/log                   v      v       v
#            /var/log/audit             v      v       v
#            /var/tmp                   v      v       v
#            ---------------------------------------------------------

This can be achieved with a simple bind mount for directories that are not separate partitions on disk. For the actual partitions, a basic sed operation can be used on fstab. The old fstab can be backed up and restored upon package removal.

So what would be the reason to use a service here? If this is not necessary, I can create a pull request for modifying the fstab. If there is a reason that I do not know of, the service could adopt the same recommended options, I assume.
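
A minimal sketch of how the table above could be turned into a check. The helper name missing_opts is hypothetical and is not part of security-misc or Lynis. It takes a comma-separated option string, such as the OPTIONS column printed by findmnt, and lists which of the required options are absent:

```sh
#!/bin/sh
# Hypothetical helper: print the options from REQUIRED that are absent in CURRENT.
# CURRENT is a comma-separated option string as printed by findmnt or mount.
# REQUIRED is a space-separated list, e.g. "nodev noexec nosuid".
missing_opts() {
    current=$1
    required=$2
    missing=""
    for opt in $required; do
        case ",$current," in
            *",$opt,"*) ;;                                  # already present
            *) missing="${missing:+$missing }$opt" ;;       # collect missing option
        esac
    done
    printf '%s\n' "$missing"
}

# Example: a /home mounted with nosuid,nodev but without noexec.
missing_opts "rw,nosuid,nodev,relatime" "nodev noexec nosuid"   # prints: noexec
```

On a live system the first argument could come from `findmnt -n -o OPTIONS /home`.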

@adrelanos
Member

Editing /etc/fstab manually as system administrator is fine.

Editing /etc/fstab with a package, as a Linux distribution, is considerably harder. If we shipped /etc/fstab in a package, that would make user modifications hard. [1] And that's also not an option because...

If the /etc/fstab is already owned by another package (such as qubes-core-agent), then any modifications to it would result in an interactive dpkg conflict resolution dialog (see [1] for example) as soon as that package is updated.

There is no /etc/fstab.d yet:

See here for more details and discussion:
https://forums.whonix.org/t/kernel-hardening-security-misc/7296/24


[1]

Configuration file '/etc/fstab'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** fstab (Y/I/N/O/D/Z) [default=N] ?

@monsieuremre
Contributor Author

I have read the documentation and I think we might have a better workaround. Upon installing Kicksecure we already make several assumptions, such as a user named user being a sudoer. If it is acceptable to make the assumption that there are only real disk partitions for /boot /root and /home and nothing else, especially not for /tmp and /var, I can write a fstab file for remounting and binding everything with hardened options. This 'fstab' does not have to be the real fstab. We can use any arbitrary path and specify it to mount with --fstab. So we write a new fstab in any location we want. Then we run upon booting to the system "mount --fstab /our/path/fstab -a". This will remount and bind everything. I can create a pull request with the said implementation. Would this be a viable option?

@adrelanos
Member

security-misc is part of Kicksecure but ideally it will stay a clean design and won't turn into requiring Kicksecure.

If it is acceptable to make the assumption that there are only real disk partitions for /boot /root and /home and nothing else, especially not for /tmp and /var,

Please describe some other scenarios.

security-misc and Kicksecure are intended to be as functional as Debian. They are designed to be generic and should be installable on arbitrary end-user systems or servers.

I can write a fstab file for remounting and binding everything with hardened options.

Please go ahead and suggest it. This could be interesting, but I cannot promise this will be the chosen implementation.

This 'fstab' does not have to be the real fstab. We can use any arbitrary path and specify it to mount with --fstab. So we write a new fstab in any location we want.

Interesting.

Then we run upon booting to the system "mount --fstab /our/path/fstab -a". This will remount and bind everything.

This is the critical point. "Upon booting". How? The systemd unit file approach wasn't viable. It led to race conditions and weird bugs. If "upon booting" had been sorted out, this feature would probably have been in production years ago.

Note that it won't help much to wait until boot has finished and only then switch to more secure mount options. For effective protection, the remount needs to happen while only more trusted code is running. Once the system has booted and less trusted code is running, such as when user user already has a terminal emulator window open, in many cases it will already be too late to actually improve the security of the system.

This is because, to prevent malicious code from being executed from /run, /home, etc., the remount needs to be done before any malicious code has a chance to execute something in, say, /home.

Therefore it is crucial to switch to more secure mount options at the right time. The earliest time would be from initramfs after the "normal" mount has been done.

A later but still secure time might be what the current systemd unit is attempting.

Before=sysinit.target
Requires=local-fs.target
After=local-fs.target
After=qubes-sysinit.service

@monsieuremre
Contributor Author

I have successfully tested, for example, a solution like this:

Create a hardened fstab:
touch /etc/fstab_hardened

Echo the following content

/                             /               ext4    defaults,remount                      0 1
/home                         /home           ext4    defaults,nosuid,nodev,remount         0 2
/boot                         /boot           ext2    defaults,nosuid,noexec,nodev,remount  0 2
/var                          /var            ext4    defaults,nosuid,nodev,bind            0 2
/dev                          /dev            ext4    defaults,bind,nosuid,noexec           0 2
tmpfs                         /dev/shm        tmpfs   defaults,nodev,nosuid,noexec          0 0
tmpfs                         /tmp            tmpfs   defaults,nodev,nosuid,noexec          0 0
/tmp                          /var/tmp        none    defaults,nodev,nosuid,noexec,bind     0 0
/var/log                      /var/log        ext4    defaults,nosuid,noexec,nodev,bind     0 2
/var/log/audit                /var/log/audit  ext4    defaults,nosuid,noexec,nodev,bind     0 2

When we run mount --fstab /etc/fstab_hardened -a, the magic happens. This is much simpler than doing the remounts manually in the service. The only assumptions here are those needed to decide whether to use the remount option or the bind option. We can also avoid making any assumptions by simply checking the partitions on the user's system and adjusting fstab_hardened accordingly. By adjusting, I mean simply this:

If there is a partition for /var -> use remount for var
If no partition for var -> use bind
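
The remount-or-bind rule above could be sketched roughly as follows. The function name mount_mode_for is hypothetical, and the sketch assumes that "has its own partition" can be approximated by "appears as a mount point in /proc/self/mounts":

```sh
#!/bin/sh
# Hypothetical helper: decide how to harden DIR.
# Prints "remount" when DIR is itself a mount point listed in MOUNTS_FILE
# (default: /proc/self/mounts), otherwise prints "bind".
mount_mode_for() {
    dir=$1
    mounts_file=${2:-/proc/self/mounts}
    # Field 2 of each line in /proc/self/mounts is the mount point.
    if awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts_file"; then
        echo remount
    else
        echo bind
    fi
}

# Usage: mount_mode_for /var
```

The optional second argument exists only so the function can be exercised against a saved mount table. Mount points containing spaces are escaped in /proc/self/mounts and are not handled here.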

And you are right, I imagined we would do this after having booted into the system. This would obviously not be the optimal solution. Optimally, we would append these lines to the regular fstab with a script and remove them upon uninstall. I don't know if that would be possible, though. So if it is still more desirable to use a system service, it might as well be a one-liner that uses fstab_hardened as its pseudo config file.

As it seems /run is hardened by default on bookworm, we need no entry for it. I also chose to bind /var/tmp to /tmp and mount them with tmpfs. This puts /tmp in RAM, which ensures no persistence in /tmp and also wipes the data on reboot. So a /tmp that is not a partition on disk is definitely more desirable, but we can just use remount with ext4 on /tmp if it is already a partition on disk.

Would this be of interest?

@adrelanos
Member

I don't think this would be compatible if using file systems other than ext4?
What would happen when using less common mount points?

On Qubes in a Debian based App Qube:

mount | grep home

/dev/xvdb on /home type ext4 (rw,nosuid,nodev,relatime,discard)

This is Qubes fstab:
https://github.com/QubesOS/qubes-core-agent-linux/blob/main/filesystem/fstab

Without resolving "how" to start this, it wouldn't be very secure anyhow.

As we speak I am working on a dracut module which would resolve the "how" part.

Could you please review, contribute to https://github.com/Kicksecure/security-misc/blob/master/usr/bin/remount-secure so it handles all the mount points and corner cases?

@adrelanos
Member

@monsieuremre
Contributor Author

I forked the repo for modifications to remount-secure. I could not manage to fix the broken pipe bug in the old version, which prevented me from applying the Lynis-recommended mount options to other directories. I did something of a reimplementation. It works and is almost completely tested. It can definitely be cleaned up and shortened, and more comments can be added where fitting.

This checks whether the system has separate partitions on disk for these directories. If that is the case, we remount the partition. Otherwise we do a simple bind. If a directory is already hardened, we don't touch it.

@monsieuremre
Contributor Author

I think I might have deleted some of the important options, which may be necessary. To me they seemed not necessary, at least not anymore, with the new implementation, but I might be wrong if they were meant to serve some other purpose that I do not know.

@adrelanos
Member

The new dracut module based implementation is very promising. It works except for /home because apparently by the time the hook runs, there is no /home folder yet. Investigating...

It's in git master (and in the developers repository).

I could not manage to fix the broken pipe bug in the old version,

I cannot reproduce this anymore in the current version.

I think I might have deleted some of the important options, which may be necessary. To me they seemed not necessary, at least not anymore, with the new implementation, but I might be wrong if they were meant to serve some other purpose that I do not know.

I don't understand.

@adrelanos
Member

Even when using the dracut cleanup hook, I think the /home folder does not exist yet because it lives under $NEWROOT (/sysroot).

@monsieuremre
Contributor Author

I don't understand.

I didn't create a pull for my fork. I was talking about how I did the implementation in my fork.

Could you please review, contribute to https://github.com/Kicksecure/security-misc/blob/master/usr/bin/remount-secure so it handles all the mount points and corner cases?

I have a slightly different implementation. It certainly works when run after boot, and it checks and mounts everything that CISOfy recommends. They also recommend binding /var/tmp to /tmp and mounting these on tmpfs. This was not possible with the old implementation, where all functions called a remount_secure function with parameters. So the implementation differs there. It only does this if the system's partitioning allows it: if there is a /tmp or /var/tmp partition, we do not do this but just remount them with hardened options. We also do not touch anything if a directory is already hardened. So all checks are done and the mounting is applied accordingly.

But unfortunately I did not try this with the hook. So I do not know if it would work.

I can also port the recommended options for all directories to the old implementation, but then we would miss out on tmpfs for /tmp and on binding /var/tmp to /tmp. I also think the old implementation runs regardless of whether the directory/partition was already hardened, so that might also be a problem.

@monsieuremre
Contributor Author

Also /run is mounted with the hardening options by default on bookworm. So unless the user explicitly remounted it with weaker options, a check for this might be unnecessary.

Setting noexec on /tmp causes no issues whatsoever in my observation. It is also recommended. /tmp has been a source of numerous vulnerabilities. So making this opt-in is not optimal.

Noexec on /home breaks applications like steam, pycharm and, if installed with the tarball, the tor browser. It also makes software development impossible. And it is also not on the recommended list of CISOfy. So this might be better left alone.

The other CISOfy recommended mounts that are not yet present in the current version also do not break anything. So porting them with either of the implementations would be a net positive, which I can do.

@adrelanos
Member

Pretty sure the new implementation is the way to go because it cleanly, robustly applies mounting options hardening as early as possible? It seems more robust and secure.

The remount-secure script for now is designed in a way that it should work in any situation. Either during dracut (initramfs) or after the system is fully booted.

The latter is very useful for development, debugging.

Due to the dracut NEWROOT issue for /home some special code will be needed. But it would be nice to keep remount-secure functional even outside of dracut.

Configuration is currently possible using kernel parameters:
https://www.kicksecure.com/wiki/Security-misc#Remount_Secure

Which for now are rather simple:

  • remountsecure=1
  • remountnoexec=1

Something more sophisticated would be in theory:

remount.secure_opts=/mount/point1:nosuid,nodev,noexec;/mount/point2:nosuid,nodev

But not sure that's worth it. I think a reasonable development goal is to:

  • handle standard cases, that is Kicksecure in a VM, USB or installed on internal hard drive,
  • not break user custom configurations, avoid broken boot.

If a system administrator is setting up more complex mounts using /etc/fstab then secure mount options would be best handled there and not in security-misc.
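
The more sophisticated remount.secure_opts format floated above is only an idea in this thread and is not implemented. Purely as an illustration of how small the parsing side would be, here is a sketch with a hypothetical parse_secure_opts function that splits such a value into one "mount point, options" pair per line:

```sh
#!/bin/sh
# Hypothetical parser for the proposed (not implemented) parameter value
#   /mount/point1:nosuid,nodev,noexec;/mount/point2:nosuid,nodev
# Prints one "mountpoint options" pair per line.
# The body runs in a subshell so the IFS and globbing changes do not leak.
parse_secure_opts() (
    IFS=';'
    set -f                      # do not glob-expand entries
    for entry in $1; do         # unquoted on purpose: split on ';'
        mount_point=${entry%%:*}
        opts=${entry#*:}
        # Skip entries without a mount point or without a ':' separator.
        [ -n "$mount_point" ] && [ "$opts" != "$entry" ] || continue
        printf '%s %s\n' "$mount_point" "$opts"
    done
)

# Usage: parse_secure_opts '/tmp:nosuid,nodev,noexec;/home:nosuid,nodev'
```

Validation of the option names and of the mount points themselves is left out, which is part of the complexity concern raised above.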

I am not sure yet about remounting to tmpfs. Might think about this later or patches welcome. That would be a separate feature request. The current implementation in principle is probably not blocking remounting to tmpfs?

@adrelanos
Member

Also /run is mounted with the hardening options by default on bookworm.

Does not have noexec for me by default.

@adrelanos
Member

Setting noexec on /tmp causes no issues whatsoever in my observation. It is also recommended. /tmp has been a source of numerous vulnerabilities. So making this opt-in is not optimal.

It might break a few things such as some installers. Not a reason to not set noexec. In other words, noexec can be set. But something to document.

Noexec on /home breaks applications like steam, pycharm and, if installed with the tarball, the tor browser. It also makes software development impossible.

Please add these applications to the wiki. Useful for users to decide who can enable this (in what VMs) and who cannot.

Noexec on /home breaks applications like steam, pycharm and, if installed with the tarball, the tor browser. It also makes software development impossible. And it is also not on the recommended list of CISOfy. So this might be better left alone.

Right. I am wondering about the proper defaults here. Perhaps even /home with noexec by default. But only after this is stable (will request some opt-in testing period) and of course only if it is documented how to disable any noexec.

This would be opt-out by default for Whonix because of Tor Browser, unless another solution could be found. But this can be considered later. Potentially in other tickets.

Highest priority in this ticket is fixing remounting /home.

@monsieuremre
Contributor Author

What is the reason for using noexec_maybe in every function? Except for /home, we know no breakage will occur. So for the others, noexec should be used directly. Should I create a pull request?

@monsieuremre
Contributor Author

Also why is /dev function commented out? Does it cause breakage or error of some kind?

@adrelanos
Member

Also why is /dev function commented out? Does it cause breakage or error of some kind?

In development. I currently broke the boot. Dunno which one is hardened too much that it breaks the boot.

But also the option to disable this on kernel command line is broken too... So one step at a time.

@adrelanos
Member

What is the reason for using noexec_maybe in every function? Except for /home, we know no breakage will occur. So for the others, noexec should be used directly.

Code complexity. Too many options. Getting confusing. Also all of this needs to be documented.

What's the point if, let's say, /tmp is noexec but /home is not? noexec seems to make sense only in a context where it is used everywhere. If the major open door /home doesn't use noexec then it doesn't seem to make sense.

So maybe 3 different modes make sense...?

  • off (no changes)
  • light (nosuid, nodev as much as possible)
  • full (nosuid, nodev, noexec as much as possible)

?

@adrelanos
Member

New issue. As soon as I enable the _run function (effectively remounting /sysroot/run) the boot is broken.

[ 10.610177] systemd[1]: Failed to allocate manager object: Read-only file system

I hope I haven't reached another dead-end here implementation wise.

@monsieuremre
Contributor Author

A lot of security vulnerabilities stem from /tmp and can mostly be prevented by setting noexec on /tmp. This applies to log files as well, because almost anyone can create log and tmp files or make programs do it. Creating files in /home is probably more difficult.

So noexec should be used on /tmp, /dev/shm, /var/log, /var/log/audit and /var/tmp for better vulnerability protection. I also created a pull request with these commented out completely, for testing. There is also one for /usr which would ensure that apt is the only one who can modify binaries and libraries.

@monsieuremre
Contributor Author

New issue. As soon as I enable the _run function (effectively remounting /sysroot/run) the boot is broken.

[ 10.610177] systemd[1]: Failed to allocate manager object: Read-only file system

I hope I haven't reached another dead-end here implementation wise.

Could it be related to this issue? dracutdevs/dracut#481
There are similar issues as well.

@adrelanos
Member

Some need to use NEWROOT and some don't.

@adrelanos
Member

Doesn't /var/log imply /var/log/audit?

@adrelanos
Member

A lot of security vulnerabilities stem from /tmp and can mostly be prevented by setting noexec on /tmp. This applies to log files as well, because almost anyone can create log and tmp files or make programs do it. Creating files in /home is probably more difficult.

Makes sense.

So we need at least 4 settings...?

  • 0 - off, no changes
  • 1 - nosuid, nodev as much as possible
  • 2 - nosuid, nodev, noexec as much as possible except for /home
  • 3 - nosuid, nodev, noexec as much as possible

Just using numbers for the kernel parameter or other name suggestions?
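
To make the four proposed levels concrete, here is a rough sketch of the mapping. The function name opts_for_level is hypothetical, and the sketch deliberately ignores per-mount-point exceptions such as /dev, which cannot take nodev:

```sh
#!/bin/sh
# Hypothetical mapping of the proposed numeric levels to mount options.
# "As much as possible" exceptions (for example /dev without nodev) are
# NOT modelled here. Only the /home special case of level 2 is.
opts_for_level() {
    level=$1
    mount_point=$2
    case "$level" in
        0) echo "" ;;                                   # off, no changes
        1) echo "nosuid,nodev" ;;
        2) case "$mount_point" in
               /home|/home/*) echo "nosuid,nodev" ;;    # no noexec for /home
               *) echo "nosuid,nodev,noexec" ;;
           esac ;;
        3) echo "nosuid,nodev,noexec" ;;
        *) echo "unknown level: $level" >&2; return 1 ;;
    esac
}

# Usage: opts_for_level 2 /home
```

Named values (for example off, light, full) could be mapped onto the same case statement if numbers turn out to be too opaque.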

So noexec should be used on /tmp, /dev/shm, /var/log, /var/log/audit and /var/tmp for better vulnerability protection.

Ok.

There is also one for /usr which would ensure that apt is the only one who can modify binaries and libraries.

For this one please open a separate forum discussion (preferably) or ticket.

@monsieuremre
Contributor Author

Doesn't /var/log imply /var/log/audit?

I think so. This is for systems where /var/log and /var/log/audit are located on separate, dedicated disk partitions. This seems to be very common practice on servers. I doubt it is as common on desktops, since auditing is not of great importance there. So we may choose to skip this one, assuming this package targets desktops.

@monsieuremre
Contributor Author

Some need to use NEWROOT and some don't.

Very interesting. They don't seem to follow any pattern that I can see. Surely there is an explanation but not anything I can see.

@monsieuremre
Contributor Author

A lot of security vulnerabilities stem from /tmp and can mostly be prevented by setting noexec on /tmp. This applies to log files as well, because almost anyone can create log and tmp files or make programs do it. Creating files in /home is probably more difficult.

Makes sense.

So we need at least 4 settings...?

* `0` - off, no changes

* `1` - nosuid, nodev as much as possible

* `2` - nosuid, nodev, noexec as much as possible except for `/home`

* `3` - nosuid, nodev, noexec as much as possible

Just using numbers for the kernel parameter or other name suggestions?

So noexec should be used on /tmp, /dev/shm, /var/log, /var/log/audit and /var/tmp for better vulnerability protection.

Ok.

There is also one for /usr which would ensure that apt is the only one who can modify binaries and libraries.

For this one please open a separate forum discussion (preferably) or ticket.

This might be desirable. Pretty much everything other than /dev should be mounted with nodev, since no device should be expected anywhere else. This has been used for exploits (in my understanding, mostly on /tmp and logging directories).

noexec is also very effective at mitigating not-yet-discovered exploits. Setting noexec on /home is also desirable, but breaking users' applications is not something we want. A lot of non-deb third-party software installs into /home.

@adrelanos
Member

This works really well for now...

It certainly needs testing and probably unit testing with systemcheck. All the folders needing protection need to be tested with a script to verify that the intended options are really set.

lynis isn't 100% reliable either. I just had /var/tmp missing noexec but lynis did not complain. But I was able to create a script in /var/tmp and execute it. That bug I fixed just now.
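
The manual test described above (create a script in the directory and try to execute it) could be scripted along these lines. This is only a sketch, with exec_check being a hypothetical name and not something systemcheck or security-misc provides:

```sh
#!/bin/sh
# Hypothetical check: try to run a freshly created script inside DIR.
# Prints "exec allowed" or "exec blocked". Needs write access to DIR.
exec_check() {
    dir=$1
    probe=$(mktemp "$dir/exec-probe.XXXXXX") || return 2
    printf '#!/bin/sh\nexit 0\n' > "$probe"
    chmod +x "$probe"
    # Execute the file directly. Running it as "sh file" would bypass noexec.
    if "$probe" 2>/dev/null; then
        echo "exec allowed"
    else
        echo "exec blocked"
    fi
    rm -f "$probe"
}

# Usage: exec_check /var/tmp
```

Unlike reading mount options, this tests the effective behaviour, which is what matters when layered mounts or tools like lynis disagree.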

lynis audit system
[+] File systems
------------------------------------
  - Checking mount points
    - Checking /home mount point                              [ OK ]
    - Checking /tmp mount point                               [ OK ]
    - Checking /var mount point                               [ OK ]
  - Query swap partitions (fstab)                             [ NONE ]
  - Testing swap partitions                                   [ OK ]
  - Checking for old files in /tmp                            [ OK ]
  - Checking /tmp sticky bit                                  [ OK ]
  - Checking /var/tmp sticky bit                              [ OK ]
  - Mount options of /                                        [ NON DEFAULT ]
  - Mount options of /boot                                    [ HARDENED ]
  - Mount options of /dev                                     [ HARDENED ]
  - Mount options of /dev/shm                                 [ HARDENED ]
  - Mount options of /home                                    [ HARDENED ]
  - Mount options of /run                                     [ HARDENED ]
  - Mount options of /tmp                                     [ HARDENED ]
  - Mount options of /var                                     [ HARDENED ]
  - Mount options of /var/log                                 [ HARDENED ]
  - Mount options of /var/tmp                                 [ HARDENED ]
  - Total without nodev:6 noexec:8 nosuid:4 ro or noexec (W^X): 8 of total 33
  - Disable kernel support of some filesystems

I don't know yet what

  • Mount options of / [ NON DEFAULT ]

is about and if that matters.

@monsieuremre
Contributor Author

monsieuremre commented Oct 22, 2023

Mount options of / [ NON DEFAULT ]

This is a Debian thing. Lynis expects defaults as the option on root. Debian has errors=remount-ro as well, which remounts the file system read-only if there is an error. I do not know what this is for either. Editing /etc/fstab and replacing errors=remount-ro with defaults makes lynis say OK in green. I don't think it is a big deal. This is very good news by the way.

@adrelanos
Member

Looking at the lynis source code, it seems to search /etc/fstab for / (which isn't there) and wants to see defaults.

findmnt /

TARGET SOURCE    FSTYPE OPTIONS
/      /dev/sda1 ext4   rw,relatime,errors=remount-ro

Unless lynis has a real argument I think this has to be ignored.
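
To compare what /etc/fstab declares with what findmnt reports, a small helper along these lines may be handy. It is a sketch only; fstab_opts_for is a hypothetical name, and it does not replicate how lynis itself parses the file:

```sh
#!/bin/sh
# Hypothetical helper: print the options field (4th column) that FSTAB
# declares for MOUNT_POINT. Prints nothing when there is no such entry.
fstab_opts_for() {
    mount_point=$1
    fstab=${2:-/etc/fstab}
    # Skip comment lines, match on the mount point column.
    awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { print $4 }' "$fstab"
}

# Usage: fstab_opts_for /        (declared options, may be empty)
#        findmnt -n -o OPTIONS / (options actually in effect)
```

An empty result for / would match the situation described above, where the root entry is not present in /etc/fstab at all.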

@adrelanos
Member

These layered mounts aren't pretty but I don't see a real issue or have a solution.

findmnt /var/log
TARGET   SOURCE              FSTYPE OPTIONS
/var/log /dev/sda1[/var/log] ext4   rw,nosuid,nodev,noexec,relatime,errors=remount-ro
/var/log /dev/sda1[/var/log] ext4   rw,nosuid,nodev,relatime,errors=remount-ro
findmnt /var/tmp
TARGET   SOURCE              FSTYPE OPTIONS
/var/tmp /dev/sda1[/var/tmp] ext4   rw,nosuid,nodev,noexec,relatime,errors=remount-ro
/var/tmp /dev/sda1[/var/tmp] ext4   rw,nosuid,nodev,relatime,errors=remount-ro

As long as the mount options are effectively hardened (the first line, mounted with noexec, takes precedence) then it's probably fine.

@monsieuremre
Contributor Author

I agree. There is no hardening option for / anyway. It is just default or non default. And errors=remount-ro doesn't appear to do any harm.

The layered mounts are inevitable if there are no separate partitions on disk. I do not see any downside to this.

@adrelanos
Member

Some unmounts failed during shutdown.

[FAILED] Failed unmounting var-log.mount.
[FAILED] Failed unmounting var-tmp.mount.
[FAILED] Failed unmounting var.mount.

[ 1923.023754] (sd-remount)[43306]: Failed to remount '/var/log' read-only: Invalid argument
[ 1923.026285] (sd-umount)[43307]: Failed to unmount /var/log: Invalid argument
[ 1923.029123] (sd-remount)[43308]: Failed to remount '/var/tmp' read-only: Invalid argument
[ 1923.031693] (sd-umount)[43309]: Failed to unmount /var/tmp: Invalid argument

[  OK  ] Stopped target local-fs.target - Local File Systems.
         Unmounting boot.mount - /boot...
         Unmounting home.mount...
         Unmounting run-credentials…dentials/systemd-sysctl.service...
         Unmounting run-credentials…ntials/systemd-sysusers.service...
         Unmounting run-credentials…temd-tmpfiles-setup-dev.service...
         Unmounting run-msgcollector.mount - /run/msgcollector...
         Unmounting tmp.mount...
         Unmounting var-log.mount...
         Unmounting var-tmp.mount...
[  OK  ] Stopped systemd-modules-lo…service - Load Kernel Modules.
[ 1922.556991] audit: type=1131 audit(1698009222.706:410): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-modules-load comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  OK  ] Unmounted boot.mount - /boot.
[  OK  ] Unmounted home.mount.
[  OK  ] Unmounted run-credentials-…redentials/systemd-sysctl.service.
[  OK  ] Unmounted run-credentials-…dentials/systemd-sysusers.service.
[  OK  ] Unmounted run-credentials-…ystemd-tmpfiles-setup-dev.service.
[  OK  ] Unmounted run-msgcollector.mount - /run/msgcollector.
[  OK  ] Unmounted tmp.mount.
[FAILED] Failed unmounting var-log.mount.
[  OK  ] Stopped target swap.target - Swaps.
[FAILED] Failed unmounting var-tmp.mount.
         Unmounting var.mount...
[FAILED] Failed unmounting var.mount.
[  OK  ] Stopped target local-fs-pr…reparation for Local File Systems.
[  OK  ] Reached target umount.target - Unmount All Filesystems.
[  OK  ] Stopped systemd-tmpfiles-s…reate Static Device Nodes in /dev.
[ 1922.670633] audit: type=1131 audit(1698009222.818:411): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  OK  ] Stopped systemd-sysusers.service - Create System Users.
[ 1922.675135] audit: type=1131 audit(1698009222.822:412): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-sysusers comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  OK  ] Stopped systemd-remount-fs…ount Root and Kernel File Systems.
[ 1922.678313] audit: type=1131 audit(1698009222.826:413): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-remount-fs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  OK  ] Stopped systemd-fsck-root.… File System Check on Root Device.
[  OK  ] Reached target shutdown.target - System Shutdown.
[  OK  ] Reached target final.target - Late Shutdown Services.
[  OK  ] Finished systemd-poweroff.service - System Power Off.
[  OK  ] Reached target poweroff.target - System Power Off.
[ 1922.945104] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 1922.971729] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[ 1922.975104] systemd-journald[518]: Received SIGTERM from PID 1 (systemd-shutdow).
[ 1922.987141] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 1922.989948] systemd-shutdown[1]: Unmounting file systems.
[ 1922.991355] (sd-remount)[43304]: Remounting '/var/log' read-only with options 'errors=remount-ro'.
[ 1923.005198] EXT4-fs (sda1): re-mounted. Quota mode: none.
[ 1923.021128] (sd-umount)[43305]: Unmounting '/var/log'.
[ 1923.022672] (sd-remount)[43306]: Remounting '/var/log' read-only with options 'errors=remount-ro'.
[ 1923.023754] (sd-remount)[43306]: Failed to remount '/var/log' read-only: Invalid argument
[ 1923.025414] (sd-umount)[43307]: Unmounting '/var/log'.
[ 1923.026285] (sd-umount)[43307]: Failed to unmount /var/log: Invalid argument
[ 1923.027942] (sd-remount)[43308]: Remounting '/var/tmp' read-only with options 'errors=remount-ro'.
[ 1923.029123] (sd-remount)[43308]: Failed to remount '/var/tmp' read-only: Invalid argument
[ 1923.030880] (sd-umount)[43309]: Unmounting '/var/tmp'.
[ 1923.031693] (sd-umount)[43309]: Failed to unmount /var/tmp: Invalid argument
[ 1923.033427] (sd-remount)[43310]: Remounting '/var' read-only with options 'errors=remount-ro'.
[ 1923.034464] EXT4-fs (sda1): re-mounted. Quota mode: none.
[ 1923.035832] (sd-umount)[43311]: Unmounting '/var'.
[ 1923.037161] (sd-remount)[43312]: Remounting '/' read-only with options 'errors=remount-ro'.
[ 1923.038104] EXT4-fs (sda1): re-mounted. Quota mode: none.
[ 1923.039631] (sd-umount)[43313]: Unmounting '/var/log'.
[ 1923.041043] (sd-umount)[43314]: Unmounting '/var/tmp'.
[ 1923.042124] systemd-shutdown[1]: All filesystems unmounted.
[ 1923.042963] systemd-shutdown[1]: Deactivating swaps.
[ 1923.043801] systemd-shutdown[1]: All swaps deactivated.
[ 1923.044617] systemd-shutdown[1]: Detaching loop devices.
[ 1923.045952] systemd-shutdown[1]: All loop devices detached.
[ 1923.046654] systemd-shutdown[1]: Stopping MD devices.
[ 1923.047414] systemd-shutdown[1]: All MD devices stopped.
[ 1923.048119] systemd-shutdown[1]: Detaching DM devices.
[ 1923.049798] systemd-shutdown[1]: All DM devices detached.
[ 1923.050531] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[ 1923.053849] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 1923.054574] systemd-shutdown[1]: Powering off.
[ 1923.055475] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1923.056420] sd 0:0:0:0: [sda] Stopping disk
[ 1923.068889] ACPI: PM: Preparing to enter system sleep state S5
[ 1923.069942] reboot: Power down

@adrelanos
Copy link
Member

@monsieuremre
Copy link
Contributor Author

Debian packages being one thousand years behind the upstream does not seem to help our cause. Anyway.

From what you posted, it seems like systemd-journald is still up when we are trying to unmount /var. It seems to be trying to log to /var whilst we are trying to unmount it. I found a similar thread here: https://unix.stackexchange.com/questions/378678/why-do-i-get-the-error-failed-unmounting-var-during-shutdown

And an issue: systemd/systemd#867

If this really is the problem, then something like the following in journald would solve the issue:

[Unit]
After=umount.target
Conflicts=umount.target

If this is not the issue to begin with, then I would have to compare the order of things with how they are when we use a regular fstab, to understand the cause.
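To make that comparison concrete, here is a minimal sketch that lists every mount at or below a given path from mountinfo-formatted input, so the output of an fstab boot and a remount-secure boot can be diffed. The function name `mounts_under` is hypothetical; the field positions (1 = mount ID, 2 = parent ID, 5 = mount point, 6 = mount options) follow the `/proc/[pid]/mountinfo` layout described in proc(5).

```shell
#!/bin/sh
# Hypothetical helper, not part of security-misc.
# Reads mountinfo-formatted lines on stdin and prints
# "mount_id parent_id mount_point options" for every mount
# located at PREFIX or below it, in the order they appear.
mounts_under() {
    prefix="$1"
    awk -v p="$prefix" '
        # Field 5 is the mount point. Match the prefix itself
        # or any path that starts with "prefix/".
        $5 == p || index($5, p "/") == 1 { print $1, $2, $5, $6 }
    '
}
```

On a live system this would be called as `mounts_under /var < /proc/self/mountinfo`, once per boot configuration, and the two outputs compared.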

@adrelanos
Copy link
Member

This issue is introduced only by the new remount code. It does not happen without it.

From what you posted, it seems like systemd-journald is still up when we are trying to unmount /var.

Is that how it should be, or not?

It seems to be trying to log to /var whilst we are trying to unmount it. I found a similar thread here: https://unix.stackexchange.com/questions/378678/why-do-i-get-the-error-failed-unmounting-var-during-shutdown

And an issue: systemd/systemd#867

If this really is the problem, then something like the following in journald would solve the issue:

[Unit]
After=umount.target
Conflicts=umount.target

The shutdown sequence is entirely up to systemd. systemd-journald writing output to stdout / the serial console is important and good, as long as it does not write to devices that are mounted read-only. That is probably already the case with upstream defaults?

Messing with that could cause more problems than it solves.

If this is not the issue to begin with, then I would have to compare the order of things with how they are when we use a regular fstab, to understand the cause.

Yes. Or compare with a newer Debian and systemd version. Check if reproducible there. Because then systemd upstream won't close the bug report for being an older systemd version.

@adrelanos
Copy link
Member

@monsieuremre
Copy link
Contributor Author

I think I found a solution. I am going to test and report the results.

@monsieuremre
Copy link
Contributor Author

These layered mounts aren't pretty but I don't see a real issue or have a solution.

I managed to solve this and created a pull request.

Some failed unmount during shutdown.

This I solved too, and created a pull request. Both are fully tested.

@adrelanos
Copy link
Member

I am not convinced yet that #133 is a clean way to solve this.

umount with --lazy risks that the filesystem isn't unmounted at all.

It's simply ignored. Also, we don't understand why this is happening. We cannot point at the project or code that is failing.

We'd simply layer a hack on top of this mess to conceal it. There's no need to conceal it. It's not possible to even spot this issue without a (virtual) serial-console which very, very few people are using.

Next proper steps to get closer and hopefully be able to report/fix this at the actual level where it is happening:

If this is not the issue to begin with, then I would have to compare the order of things with how they are when we use a regular fstab, to understand the cause.

Yes. Or compare with a newer Debian and systemd version. Check if reproducible there. Because then systemd upstream won't close the bug report for being an older systemd version.

@monsieuremre
Copy link
Contributor Author

You are absolutely right. This is more of a band-aid solution. For the record, I have done numerous tests and can say the following:
* It does not happen when mounting is done with /etc/fstab.
* It does not happen even if I execute the exact same commands after having booted. That is, if I do the remounting myself, either by executing remount-secure or by doing it manually, the error does not happen on shutdown.
It seems to be a hook problem. My guess is that maybe systemd-journald has to start before the remounting. But then, I don't know why the fstab solution does not yield any errors.
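One thing worth checking when comparing the fstab boot with the hook-based boot is whether any mount point ends up stacked, that is, listed more than once in the mount table. The shutdown log above attempts /var/log and /var/tmp more than once, which would be consistent with that, though this is only an assumption. A minimal sketch, with the hypothetical name `layered_mounts`, reading mountinfo-formatted input where field 5 is the mount point:

```shell
#!/bin/sh
# Hypothetical helper, not part of security-misc.
# Reads mountinfo-formatted lines on stdin and prints
# "count mount_point" for every mount point that occurs
# more than once, sorted by mount point.
layered_mounts() {
    awk '
        { count[$5]++ }                      # tally each mount point
        END {
            for (m in count)
                if (count[m] > 1) print count[m], m
        }
    ' | LC_ALL=C sort -k2
}
```

Running `layered_mounts < /proc/self/mountinfo` after each kind of boot would show whether the hook creates stacked mounts that the fstab setup does not.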

@monsieuremre
Copy link
Contributor Author

I am not quite sure, but restarting systemd-journald.service after we mount everything might solve the issue. My guess seems to be the explanation: journald starting before the mounting is the problem. I will test this.

@monsieuremre
Copy link
Contributor Author

New Issue: Not having the package dracut installed on the system breaks several systemd services during boot, even when secure remounting is not enabled. Dracut has to be a dependency of the package. On a new Debian VM I didn't get any warnings about dracut being needed or being a dependency. The package installs cleanly and then breaks things. When I install dracut, the breakage disappears. So the solution is already identified: dracut needs to be a dependency.

@monsieuremre
Copy link
Contributor Author

And also addressing the original issue: I managed to identify the exact problem. /var needs to be unmounted before /var/tmp and /var/log, because then no error occurs. You can test this: boot into the system and, once you are in, unmount /var manually. During shutdown, no error occurs, and both /var/log and /var/tmp get unmounted successfully. If you inspect the failing logs, you will see that the system tries to unmount /var/log and /var/tmp first, and then /var. How can we make sure that /var gets unmounted first? The fstab file guarantees that the unmounting order mirrors the mounting order. How can we guarantee this manually?
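To read the unmount order out of a captured shutdown log without scanning it by eye, something like the following sketch could be used. It assumes the log lines have the same shape as the serial-console output earlier in this thread (`(sd-umount)[PID]: Unmounting '...'.` and `(sd-umount)[PID]: Failed to unmount ...: ...`); the function name `umount_attempts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of security-misc.
# Reads a shutdown log on stdin and prints, in log order,
# one line per sd-umount attempt and one per reported failure.
umount_attempts() {
    sed -n \
        -e "s/.*(sd-umount)\[[0-9]*\]: Unmounting '\(.*\)'\.\$/attempt \1/p" \
        -e "s/.*(sd-umount)\[[0-9]*\]: Failed to unmount \(.*\): \(.*\)\$/failed \1 (\2)/p"
}
```

With the log saved to a file, `umount_attempts < shutdown.log` gives a compact sequence that can be compared between a failing boot and one where /var was unmounted manually beforehand.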

@monsieuremre
Copy link
Contributor Author

Is it possible for you to write a shutdown hook? I think this can be achieved with dracut. Is it possible?

@monsieuremre
Copy link
Contributor Author

Is it possible for you to write a shutdown hook? I think this can be achieved with dracut. Is it possible?

To unmount the units in the appropriate order, mirroring the mount order.

@adrelanos
Copy link
Member

adrelanos commented Oct 29, 2023 via email

@adrelanos
Copy link
Member

adrelanos commented Oct 29, 2023 via email

@adrelanos
Copy link
Member

adrelanos commented Oct 29, 2023 via email

@monsieuremre
Copy link
Contributor Author

monsieuremre:
New Issue: Not having the package dracut installed on the system breaks several systemd services during boot, even when secure remounting is not enabled.
How's that possible? Please report separately, perhaps report upstream if applicable.
Dracut has to be a dependency for the package.
I would want to avoid a hard dependency on dracut. Because simply installing dracut can break the boot process. dracutdevs/dracut#2437 Does it suffice to install dracut-core but not dracut? That would be a nice middle ground. Ideally that would satisfy the dependency but not cause other breakage.

By this I mean, dracut does not come preinstalled. And we use a dracut hook. I don't know how it would be possible to not depend on dracut.

@adrelanos
Copy link
Member

adrelanos commented Oct 30, 2023 via email

@LaszloGombos
Copy link

There is interest for initramfs-tools to use dracut - https://salsa.debian.org/debian/dracut/-/merge_requests/27

@monsieuremre
Copy link
Contributor Author

It was the unmounting order, as I suspected. I managed to find a fix that works like a charm with no drawbacks. I have already created a pull request. Review please.

@monsieuremre
Copy link
Contributor Author

I found an even better solution with none of the problems. Pull request pending.

@monsieuremre
Copy link
Contributor Author

Hoping to use the pull request solely for further discussions.
