Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto-luks: require new `mountPoint` option #1156

Closed
wants to merge 1 commit into from
Closed

Conversation

@andir
Copy link
Member

@andir andir commented May 28, 2019

With the upcoming systemd v242 change (NixOS/nixpkgs#61321) in nixpkgs there will be a breaking change to how auto-luks used to work. We are going to remove a commit from the systemd fork (NixOS/systemd@ce79214) that was added for auto-luks to work. The
auto-luks feature will continue to work but another mount option is required to mark those fileSystems as not local. All mount points that carry the _netdev mount option are not part of the local-fs.target and thus not blocking the start of things like the sshd daemon.

Adding this option is (so far) the least intrusive way to ensure that systems will not break. Every user of the feature will be required to add only one more value to it's configuration.

Trying to automatically detect which mount points are mounted from the unlocked block devices is not really feasible in all/most cases. There might be another layer of indirection in between the actual (luks) device and the mount point. Good examples that will make it hard for us to guess are things like RAID, LVM, ZFS, ….

Dropping the systemd patch removes a bunch of issues and race conditions in regards to creating /run/<service> and /var/lib/<service> directories (see NixOS/nixpkgs#47550, NixOS/nixpkgs#53852, NixOS/nixpkgs#53852). Those paths are usually set up using the systemd-tempfiles.service which is (again) part of sysinit.target.

This is also part of the ongoing effort to remove large a portion of our PreStart scripts that just create directories. In the long run there should be less custom scripting required to get a service with (temporary and persistent) state up and running.

For all the current version of NixOS this should be mostly a No-Op. In a local test against libvirtd there was no observable difference in behavior between 19.03 and the systemd PR.

Ideally we would have a way to warn users that deploy a more recent NixOS that they must use a newer NixOps version. It is still not obvious how to do that in an elegant way. Opinions are very welcome. Not updating systemd or keeping the broken sysinit behaviour isn't really an option in the long run.


This change is Reviewable

@@ -67,6 +67,24 @@ with utils;
luksFormat</command>.
'';
};

mountPoint = mkOption {
default = false;

This comment has been minimized.

@ip1981

ip1981 May 28, 2019
Contributor

The type is null or string.

nix/auto-luks.nix Outdated Show resolved Hide resolved
default = false;
type = types.nullOr types.string;
description = ''
This option is required so the autoLuks module knows where the

This comment has been minimized.

@ip1981

ip1981 May 28, 2019
Contributor

If required, why allow "null"?

This comment has been minimized.

@andir

andir May 28, 2019
Author Member

That was meant as an escape hatch for people that know what they are doing. There might be cases where it is being used without using it for a filesystem.

This comment has been minimized.

@andir

andir May 28, 2019
Author Member

I changed the type signature make it explicit that I want either a str or the string "none". While technically not required I think making the "none" case explicit in the type serves as a good way to document the existence of it.

@edolstra
Copy link
Member

@edolstra edolstra commented May 28, 2019

Maybe we should just remove the auto-LUKS feature. I doubt anybody is using it.

With the upcoming systemd v242 change in nixpkgs there will be a
breaking change to how auto-luks used to work. We are going to remove a
commit from the systemd fork that was added for auto-luks to work. The
auto-luks feature will continue to work but another mount option is
required to mark those `fileSystems` as not local. All mount points that
carry the `_netdev` mount option are not part of the `local-fs.target`
and thus not blocking the start of things like the sshd daemon.

Adding this option is (so far) the least intrusive way to ensure that
systems will not break. Every user of the feature will be required to
add only one more value to it's configuration.

Trying to automatically detect which mount points are mounted from the
unlocked block devices is not really feasible in all/most cases. There
might be another layer of indirection in between the actual (luks)
device and the mount point. Good examples that will make it hard for us
to guess are things like RAID, LVM, ZFS, ….

Dropping the systemd patch removes a bunch of issues and race conditions
in regards to creating `/run/<service>` and `/var/lib/<service>`
directories. Those paths are usually set up using the
`systemd-tempfiles.service` which is (again) part of `sysinit.target`.

This is also part of the ongoing effort to remove large a portion of our
`PreStart` scripts that just create directories. In the long run there
should be less custom scripting required to get a service with
(temporary and persistent) state up and running.
@andir andir force-pushed the andir:autoluks branch from ede9a1e to c7eba12 May 28, 2019
@andir
Copy link
Member Author

@andir andir commented May 28, 2019

I addressed the review comments but In general I agree with @edolstra that we might just want to remove the feature.

It will still have to be somewhat synchronized with the NixOS releases so people are not creating unbootable systems by using a newer NixOS version from a very old NixOps.

@grahamc
Copy link
Member

@grahamc grahamc commented May 28, 2019

On a call with Andi I suggested we break autoluks as soon as possible in 19.03 stable, to try and get users to come out of the woodwork. If nobody comes out, we delete it.

@andir andir mentioned this pull request May 29, 2019
4 of 10 tasks complete
@AmineChikhaoui
Copy link
Member

@AmineChikhaoui AmineChikhaoui commented Jun 3, 2019

@edolstra @grahamc we use auto-luks at work for all our encrypted deployments.

@andir
Copy link
Member Author

@andir andir commented Jun 3, 2019

@AmineChikhaoui Lets say we keep auto-luks but merge this PR. Would the additional attributes be okay / feasible for your deployments?

@AmineChikhaoui
Copy link
Member

@AmineChikhaoui AmineChikhaoui commented Jun 3, 2019

@andir I'll double check and get back. I just wanted to reply that there are systems using this first, will read the details later.

@AmineChikhaoui
Copy link
Member

@AmineChikhaoui AmineChikhaoui commented Jun 4, 2019

Just as an update: we discussed this in irc's #nixos-systemd channel (thanks to @flokli and @andir for the explanations ), the PR makes sense to me.
There is currently an infinite recursion issue when testing this PR, @andir mentioned that he will be looking at it.

@flokli
Copy link
Contributor

@flokli flokli commented Feb 22, 2020

What's the status of this? Does the inifinite recursion still occur?

@andir
Copy link
Member Author

@andir andir commented Feb 22, 2020

As far as I remember I tried to reproduce this and then ran into issues reproducing it. I do not have an AWS account or similar to actually test this properly. On the libvirt backend I wasn't able to reproduce.

@flokli
Copy link
Contributor

@flokli flokli commented Feb 25, 2020

Well, this PR was meant to improve UX for people migrating from 19.03 to 19.09, alerting them for the missing _netdev in the mount options.

With nixpkgs now having branched off to 20.03 (and most people migrated to 19.09 at least), people did run into this, and this PR has probably become less useful.

If nobody can test this, we should probably just drop the idea, instead of just keeping this PR open.

@AmineChikhaoui
Copy link
Member

@AmineChikhaoui AmineChikhaoui commented Feb 26, 2020

I'll try this again this week. Though I don't think it needs an AWS account to reproduce as the infinite recursion is an eval error.

@grahamc
Copy link
Member

@grahamc grahamc commented Mar 26, 2020

Hello!

Thank you for this PR.

In the past several months, some major changes have taken place in
NixOps:

  1. Backends have been removed, preferring a plugin-based architecture.
    Here are some of them:

  2. NixOps Core has been updated to be Python 3 only, and at the
    same time, MyPy type hints have been added and are now strictly
    required during CI.

This is all accumulating in to what I hope will be a NixOps 2.0
release
. There is a tracking issue for that:
#1242 . It is possible that
more core changes will be made to NixOps for this release, with a
focus on simplifying NixOps core and making it easier to use and work
on.

My hope is that by adding types and more thorough automated testing,
it will be easier for contributors to make improvements, and for
contributions like this one to merge in the future.

However, because of the major changes, it has become likely that this
PR cannot merge right now as it is. The backlog of now-unmergable PRs
makes it hard to see which ones are being kept up to date.

If you would like to see this merge, please bring it up to date with
master and reopen it
. If the or mypy type checking fails, please
correct any issues and then reopen it. I will be looking primarily at
open PRs whose tests are all green.

Thank you again for the work you've done here, I am sorry to be
closing it now.

Graham

@grahamc grahamc closed this Mar 26, 2020
@andir
Copy link
Member Author

@andir andir commented Mar 27, 2020

@flokli flokli mentioned this pull request Aug 7, 2020
4 of 10 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.