launchd: Lower security permissions for daemon, startup on reboot #5698

Open
wants to merge 4 commits into
base: master



@jsoo1 jsoo1 commented Nov 30, 2021

This allows the spawning program to be the nix-daemon instead of
/bin/sh. That means that the Full Disk Access permission can be granted
to the nix-daemon only.

This should significantly decrease the security risk of granting Full Disk Access, as mentioned in
#4640, though it is not a complete solution.

It also solves the problem that the /nix volume may not be mounted upon reboot, and keeps the darwin-store service from restarting using the launchd analog of oneshot.
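
For readers who want to see the shape of such a job, here is a minimal sketch that renders a nix-daemon launchd job with Python's `plistlib`. It is not the PR's diff: the label and daemon path are taken from the launchd logs in this thread, while the watched path `/nix/store` and the helper names `nix_daemon_job` and `render` are assumptions made for illustration.

```python
import plistlib

# Daemon path as it appears in the launchd logs of this thread.
NIX_DAEMON = "/nix/var/nix/profiles/default/bin/nix-daemon"


def nix_daemon_job(store_path: str = "/nix/store") -> dict:
    """Build a hypothetical launchd job for the nix-daemon.

    store_path is an assumption; the installer may watch a different path.
    """
    return {
        "Label": "org.nixos.nix-daemon",
        # launchd spawns the daemon directly, so there is no /bin/sh wrapper
        # that would also need Full Disk Access.
        "ProgramArguments": [NIX_DAEMON],
        # Keep the job alive only while the store path exists.
        "KeepAlive": {"PathState": {store_path: True}},
    }


def render(job: dict) -> str:
    """Serialize the job as an XML property list."""
    return plistlib.dumps(job, fmt=plistlib.FMT_XML).decode("utf-8")


if __name__ == "__main__":
    print(render(nix_daemon_job()))
```

Round-tripping the output through `plistlib.loads` is a cheap way to check the structure before copying anything into /Library/LaunchDaemons.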

This allows the spawning program to be the nix-daemon instead of
/bin/sh.  That means that the Full Disk Access permission can be only
for the nix-daemon.
When a darwin host is rebooted, /nix was not mounted.  Let the daemon
also wait until the store mounting service is finished.
@jsoo1 jsoo1 changed the title launchd: Use KeepAlive.PathState instead of wait4path. launchd: Lower security permissions for daemon, startup on reboot Dec 1, 2021
Member

abathur commented Dec 1, 2021

FWIW, the FDA perm issue was likely resolved in #5172

Member

hlolli commented Dec 1, 2021

I was about to make the exact same PR.

This solved two things:

  1. ulimits were affected when starting the nix-daemon through /bin/sh. I worked around this on macOS 11.x with sudo calls, but didn't realize until last night that these limits were a result of /bin/sh
  2. the nix-daemon is entirely unable to get the permissions on macOS 12.x (Monterey); after the upgrade, errors like this one were logged by the nix-daemon's LaunchDaemon
    '/usr/local/lib/libsodium.23.dylib' (file system sandbox blocked open())

I'll confirm that this change is needed for the new macOS upgrade, but I don't know if removing wait4path is good, or if there's a better alternative (e.g. I use an external SSD for /nix, so I'll need to research what happens when I boot without it connected).

<key>RunAtLoad</key>
<true/>
<dict>
<key>OtherJobEnabled</key>

Member

clever, this effectively prevents the daemon from starting without the nix store existing (replacing the need for wait4path?)

Author

That's the idea. I think there may be a better way, as man launchd.plist mentions that OtherJobEnabled is to be avoided. There is a RunAtLoad replacement called WatchPaths which I would like to explore just a little more before this should be considered done.

Author

Yeah, I don't think PathState or WatchPaths really work here. It looks like launchd wants to eagerly check that the path to the daemon exists when they are used (even without RunAtLoad). Could be the way I am using it, of course.

Author

There is also the LaunchEvents key, which might work for triggering on volume mount, but the docs for it are very sparse, so I didn't try.

Member

Does this still work if you don’t have a /nix/store volume? Say you’re still on macOS X?

Member

This suggests that the mount command run by org.nixos.darwin-store perhaps exited prior to the actual filesystem being available. That seems rather surprising.

Is there anything specific to suggest that is more likely than the PathState trigger having some latency? Do we know if the mechanism is just polling? The manpage does suggest it's both race-prone and lossy. I'm not certain what lossy means here, but my first guess would be that it might miss filesystem conditions that don't persist for longer than some polling interval?

Member

> Is there anything specific to suggest that is more likely than the PathState trigger having some latency?

Perhaps I misunderstood, but it looks like in your second log that it runs diskutil, that exits, it unloads the job, and then spawns nix-daemon. And in the first log the nix-daemon spawn fails because /nix/var/nix/profiles/default/bin/nix-daemon isn't accessible yet. Latency wouldn't cause the path to become inaccessible. So my impression was that diskutil must have exited prior to the filesystem actually being accessible and therefore launchd tried to launch nix-daemon too soon. Though that doesn't answer the question of why the nix-daemon job would have launched at all given that PathState means it shouldn't launch until the path is accessible.

Though looking at the second log now, there's a delay of several seconds in between diskutil exiting and nix-daemon being launched. So it's clearly waiting for something. PathState having latency would explain that delay (which could also just be launchd prioritizing other work prior to responding to PathState), but doesn't explain why the first log failed.

Do you have a log of a failure that includes the org.nixos.darwin-store lines?

Author

@lilyball sure. I need a little bit of time to set up a VM for myself.

Author

@jsoo1 jsoo1 Dec 3, 2021

On commit 5d959b33c5a75f3053280a8a34c711f964437766 without filevault enabled.

sh <(curl -L https://jsoo1-nix-install-tests.cachix.org/serve/v89k06b82dkhalcdkdhnfbmrfr6fp1w9/install) --tarball-url-prefix https://jsoo1-nix-install-tests.cachix.org/serve
The first install goes well:
% grep nixos /var/log/com.apple.xpc.launchd/launchd.log
2021-12-03 14:43:25.520241 (system/org.nixos.darwin-store) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 14:43:25.520245 (system/org.nixos.darwin-store) <Notice>: service state: spawn scheduled
2021-12-03 14:43:25.520246 (system/org.nixos.darwin-store) <Notice>: service state: spawning
2021-12-03 14:43:25.520307 (system/org.nixos.darwin-store) <Notice>: launching: speculative
2021-12-03 14:43:25.521383 (system/org.nixos.darwin-store [740]) <Notice>: xpcproxy spawned with pid 740
2021-12-03 14:43:25.521399 (system/org.nixos.darwin-store [740]) <Notice>: internal event: SPAWNED, code = 0
2021-12-03 14:43:25.521401 (system/org.nixos.darwin-store [740]) <Notice>: service state: xpcproxy
2021-12-03 14:43:25.521468 (system) <Notice>: Bootstrap by launchctl[739] for /Library/LaunchDaemons/org.nixos.darwin-store.plist succeeded (0: )
2021-12-03 14:43:25.521506 (system/org.nixos.darwin-store [740]) <Notice>: internal event: SOURCE_ATTACH, code = 0
2021-12-03 14:43:25.526997 (system/org.nixos.darwin-store [740]) <Notice>: service state: running
2021-12-03 14:43:25.527000 (system/org.nixos.darwin-store [740]) <Notice>: internal event: INIT, code = 0
2021-12-03 14:43:25.527005 (system/org.nixos.darwin-store [740]) <Notice>: Successfully spawned diskutil[740] because speculative
2021-12-03 14:43:25.591782 (system/org.nixos.darwin-store [740]) <Notice>: signaled service: Terminated: 15
2021-12-03 14:43:25.591792 (system/org.nixos.darwin-store [740]) <Notice>: service state: SIGTERMed
2021-12-03 14:43:25.591794 (system/org.nixos.darwin-store [740]) <Notice>: scheduling cleanup in 5 sec after sending Terminated: 15
2021-12-03 14:43:25.592549 (system/org.nixos.darwin-store [740]) <Notice>: service exited: dirty = 0, supported pressured-exit = 0
2021-12-03 14:43:25.592551 (system/org.nixos.darwin-store [740]) <Notice>: exited due to SIGTERM | sent by launchd[1]
2021-12-03 14:43:25.592553 (system/org.nixos.darwin-store [740]) <Notice>: service state: exited
2021-12-03 14:43:25.592556 (system/org.nixos.darwin-store [740]) <Notice>: internal event: EXITED, code = 0
2021-12-03 14:43:25.592558 (system) <Notice>: service inactive: org.nixos.darwin-store
2021-12-03 14:43:25.592560 (system/org.nixos.darwin-store [740]) <Notice>: service state: not running
2021-12-03 14:43:25.592582 (system/org.nixos.darwin-store) <Notice>: Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
2021-12-03 14:43:25.592586 (system/org.nixos.darwin-store) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 14:43:25.592587 (system/org.nixos.darwin-store) <Notice>: service state: spawn scheduled
2021-12-03 14:43:25.592589 (system/org.nixos.darwin-store) <Notice>: service throttled by 10 seconds
2021-12-03 14:43:35.598100 (system/org.nixos.darwin-store) <Notice>: service state: spawning
2021-12-03 14:43:35.598154 (system/org.nixos.darwin-store) <Notice>: launching: non-ipc demand
2021-12-03 14:43:35.598730 (system/org.nixos.darwin-store [746]) <Notice>: xpcproxy spawned with pid 746
2021-12-03 14:43:35.598746 (system/org.nixos.darwin-store [746]) <Notice>: internal event: SPAWNED, code = 0
2021-12-03 14:43:35.598748 (system/org.nixos.darwin-store [746]) <Notice>: service state: xpcproxy
2021-12-03 14:43:35.598750 (system/org.nixos.darwin-store [746]) <Notice>: deferred event: domain spawn response: 0
2021-12-03 14:43:35.598754 (system/org.nixos.darwin-store [746]) <Notice>: internal event: SOURCE_ATTACH, code = 0
2021-12-03 14:43:35.601500 (system/org.nixos.darwin-store [746]) <Notice>: service state: running
2021-12-03 14:43:35.601508 (system/org.nixos.darwin-store [746]) <Notice>: internal event: INIT, code = 0
2021-12-03 14:43:35.601513 (system/org.nixos.darwin-store [746]) <Notice>: Successfully spawned diskutil[746] because non-ipc demand
2021-12-03 14:43:35.647161 (system/org.nixos.darwin-store [746]) <Notice>: job state = running
2021-12-03 14:43:35.873600 (system/org.nixos.darwin-store [746]) <Notice>: service exited: dirty = 0, supported pressured-exit = 0
2021-12-03 14:43:35.873609 (system/org.nixos.darwin-store [746]) <Notice>: exited due to exit(0)
2021-12-03 14:43:35.873612 (system/org.nixos.darwin-store [746]) <Notice>: service state: exited
2021-12-03 14:43:35.873614 (system/org.nixos.darwin-store [746]) <Notice>: internal event: EXITED, code = 0
2021-12-03 14:43:35.873616 (system/org.nixos.darwin-store [746]) <Notice>: job state = exited
2021-12-03 14:43:35.873630 (system) <Notice>: service inactive: org.nixos.darwin-store
2021-12-03 14:43:35.873632 (system/org.nixos.darwin-store [746]) <Notice>: service state: not running
2021-12-03 14:43:35.874726 (system/org.nixos.darwin-store) <Notice>: job is not monitored, can't poll
2021-12-03 14:44:53.708894 (system/org.nixos.nix-daemon) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 14:44:53.708898 (system/org.nixos.nix-daemon) <Notice>: service state: spawn scheduled
2021-12-03 14:44:53.708900 (system/org.nixos.nix-daemon) <Notice>: service state: spawning
2021-12-03 14:44:53.708965 (system/org.nixos.nix-daemon) <Notice>: launching: speculative
2021-12-03 14:44:53.710107 (system/org.nixos.nix-daemon [2512]) <Notice>: xpcproxy spawned with pid 2512
2021-12-03 14:44:53.710124 (system/org.nixos.nix-daemon [2512]) <Notice>: internal event: SPAWNED, code = 0
2021-12-03 14:44:53.710127 (system/org.nixos.nix-daemon [2512]) <Notice>: service state: xpcproxy
2021-12-03 14:44:53.710189 (system) <Notice>: Bootstrap by launchctl[2511] for /Library/LaunchDaemons/org.nixos.nix-daemon.plist succeeded (0: )
2021-12-03 14:44:53.710249 (system/org.nixos.nix-daemon [2512]) <Notice>: internal event: SOURCE_ATTACH, code = 0
2021-12-03 14:44:53.719456 (system/org.nixos.nix-daemon [2512]) <Notice>: service state: running
2021-12-03 14:44:53.719460 (system/org.nixos.nix-daemon [2512]) <Notice>: internal event: INIT, code = 0
2021-12-03 14:44:53.719465 (system/org.nixos.nix-daemon [2512]) <Notice>: Successfully spawned nix-daemon[2512] because speculative
2021-12-03 14:44:53.821718 (system/org.nixos.nix-daemon [2512]) <Notice>: signaled service: Terminated: 15
2021-12-03 14:44:53.821730 (system/org.nixos.nix-daemon [2512]) <Notice>: service state: SIGTERMed
2021-12-03 14:44:53.821732 (system/org.nixos.nix-daemon [2512]) <Notice>: scheduling cleanup in 5 sec after sending Terminated: 15
2021-12-03 14:44:53.821902 (system/org.nixos.nix-daemon [2512]) <Notice>: service exited: dirty = 0, supported pressured-exit = 0
2021-12-03 14:44:53.821904 (system/org.nixos.nix-daemon [2512]) <Notice>: exited due to SIGTERM | sent by launchd[1]
2021-12-03 14:44:53.821906 (system/org.nixos.nix-daemon [2512]) <Notice>: service state: exited
2021-12-03 14:44:53.821909 (system/org.nixos.nix-daemon [2512]) <Notice>: internal event: EXITED, code = 0
2021-12-03 14:44:53.821911 (system) <Notice>: service inactive: org.nixos.nix-daemon
2021-12-03 14:44:53.821921 (system/org.nixos.nix-daemon [2512]) <Notice>: service state: not running
2021-12-03 14:44:53.821940 (system/org.nixos.nix-daemon) <Notice>: Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
2021-12-03 14:44:53.821944 (system/org.nixos.nix-daemon) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 14:44:53.821946 (system/org.nixos.nix-daemon) <Notice>: service state: spawn scheduled
2021-12-03 14:44:53.821948 (system/org.nixos.nix-daemon) <Notice>: service throttled by 10 seconds
2021-12-03 14:44:53.821964 (system/org.nixos.nix-daemon) <Notice>: launch already in progress
2021-12-03 14:45:03.827328 (system/org.nixos.nix-daemon) <Notice>: service state: spawning
2021-12-03 14:45:03.827362 (system/org.nixos.nix-daemon) <Notice>: launching: xpc event
2021-12-03 14:45:03.827998 (system/org.nixos.nix-daemon [2518]) <Notice>: xpcproxy spawned with pid 2518
2021-12-03 14:45:03.828014 (system/org.nixos.nix-daemon [2518]) <Notice>: internal event: SPAWNED, code = 0
2021-12-03 14:45:03.828016 (system/org.nixos.nix-daemon [2518]) <Notice>: service state: xpcproxy
2021-12-03 14:45:03.828018 (system/org.nixos.nix-daemon [2518]) <Notice>: deferred event: domain spawn response: 0
2021-12-03 14:45:03.828022 (system/org.nixos.nix-daemon [2518]) <Notice>: internal event: SOURCE_ATTACH, code = 0
2021-12-03 14:45:03.832118 (system/org.nixos.nix-daemon [2518]) <Notice>: service state: running
2021-12-03 14:45:03.832126 (system/org.nixos.nix-daemon [2518]) <Notice>: internal event: INIT, code = 0
2021-12-03 14:45:03.832130 (system/org.nixos.nix-daemon [2518]) <Notice>: Successfully spawned nix-daemon[2518] because xpc event
After a reboot, though:
% grep nixos /var/log/com.apple.xpc.launchd/launchd.log
2021-12-03 14:56:43.145942 (system) <Notice>: pending spawn, domain in on-demand-only mode: org.nixos.darwin-store
2021-12-03 14:56:43.147175 (system) <Notice>: pending spawn, domain in on-demand-only mode: org.nixos.nix-daemon
2021-12-03 14:56:43.167185 (system/org.nixos.darwin-store) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 14:56:43.167188 (system/org.nixos.darwin-store) <Notice>: service state: spawn scheduled
2021-12-03 14:56:43.167190 (system/org.nixos.darwin-store) <Notice>: service state: spawning
2021-12-03 14:56:43.167230 (system/org.nixos.darwin-store) <Notice>: launching: speculative
2021-12-03 14:56:43.168707 (system/org.nixos.darwin-store [70]) <Notice>: xpcproxy spawned with pid 70
2021-12-03 14:56:43.168719 (system/org.nixos.darwin-store [70]) <Notice>: internal event: SPAWNED, code = 0
2021-12-03 14:56:43.168721 (system/org.nixos.darwin-store [70]) <Notice>: service state: xpcproxy
2021-12-03 14:56:43.200394 (system/org.nixos.nix-daemon) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 14:56:43.200397 (system/org.nixos.nix-daemon) <Notice>: service state: spawn scheduled
2021-12-03 14:56:43.200399 (system/org.nixos.nix-daemon) <Notice>: service state: spawning
2021-12-03 14:56:43.200421 (system/org.nixos.nix-daemon) <Notice>: launching: speculative
2021-12-03 14:56:43.201162 (system/org.nixos.nix-daemon [99]) <Notice>: xpcproxy spawned with pid 99
2021-12-03 14:56:43.201173 (system/org.nixos.nix-daemon [99]) <Notice>: internal event: SPAWNED, code = 0
2021-12-03 14:56:43.201175 (system/org.nixos.nix-daemon [99]) <Notice>: service state: xpcproxy
2021-12-03 14:56:43.233166 (system/org.nixos.darwin-store [70]) <Notice>: internal event: SOURCE_ATTACH, code = 0
2021-12-03 14:56:43.233228 (system/org.nixos.nix-daemon [99]) <Notice>: internal event: SOURCE_ATTACH, code = 0
2021-12-03 14:56:43.495702 (system/org.nixos.nix-daemon [99]) <Warning>: Could not find and/or execute program specified by service: 2: No such file or directory: /nix/var/nix/profiles/default/bin/nix-daemon
2021-12-03 14:56:43.495706 (system/org.nixos.nix-daemon [99]) <Error>: Service could not initialize: posix_spawn(/nix/var/nix/profiles/default/bin/nix-daemon) not accessible error: 0x6f: Invalid or missing Program/ProgramArguments
2021-12-03 14:56:43.495709 (system/org.nixos.nix-daemon [99]) <Error>: initialization failure: 21A559: xpcproxy + 23780 [840][F29643C9-8E6C-3632-93A1-5214FFD1DC57]: 0x6f
2021-12-03 14:56:43.495711 (system/org.nixos.nix-daemon [99]) <Notice>: Service setup event to handle failure and will not launch until it fires.
2021-12-03 14:56:43.495713 (system/org.nixos.nix-daemon [99]) <Error>: Missing executable detected. Job: 'org.nixos.nix-daemon' Executable: '/nix/var/nix/profiles/default/bin/nix-daemon'
2021-12-03 14:56:43.495715 (system/org.nixos.nix-daemon [99]) <Notice>: internal event: INIT, code = 111
2021-12-03 14:56:43.502612 (system/org.nixos.nix-daemon [99]) <Notice>: trampoline exited with code: 78
2021-12-03 14:56:43.502617 (system/org.nixos.nix-daemon [99]) <Notice>: service exited: dirty = 0, supported pressured-exit = 0
2021-12-03 14:56:43.502619 (system/org.nixos.nix-daemon [99]) <Notice>: exited due to exit(78)
2021-12-03 14:56:43.502621 (system/org.nixos.nix-daemon [99]) <Notice>: already handled failed init, ignoring
2021-12-03 14:56:43.502623 (system/org.nixos.nix-daemon [99]) <Notice>: service state: exited
2021-12-03 14:56:43.502625 (system/org.nixos.nix-daemon [99]) <Notice>: internal event: EXITED, code = 0
2021-12-03 14:56:43.502627 (system) <Notice>: service inactive: org.nixos.nix-daemon
2021-12-03 14:56:43.502641 (system/org.nixos.nix-daemon [99]) <Notice>: service state: not running
2021-12-03 14:56:43.502643 (system/org.nixos.nix-daemon) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 14:56:43.502647 (system/org.nixos.nix-daemon) <Notice>: service state: spawn scheduled
2021-12-03 14:56:43.972199 (system/org.nixos.darwin-store [70]) <Notice>: service state: running
2021-12-03 14:56:43.972203 (system/org.nixos.darwin-store [70]) <Notice>: internal event: INIT, code = 0
2021-12-03 14:56:43.972206 (system/org.nixos.darwin-store [70]) <Notice>: Successfully spawned diskutil[70] because speculative
2021-12-03 14:56:52.950907 (system/org.nixos.darwin-store [70]) <Notice>: job state = running
2021-12-03 14:56:53.302459 (system/org.nixos.darwin-store [70]) <Notice>: service exited: dirty = 0, supported pressured-exit = 0
2021-12-03 14:56:53.302461 (system/org.nixos.darwin-store [70]) <Notice>: exited due to exit(0)
2021-12-03 14:56:53.302463 (system/org.nixos.darwin-store [70]) <Notice>: service state: exited
2021-12-03 14:56:53.302465 (system/org.nixos.darwin-store [70]) <Notice>: internal event: EXITED, code = 0
2021-12-03 14:56:53.302467 (system/org.nixos.darwin-store [70]) <Notice>: job state = exited
2021-12-03 14:56:53.302481 (system) <Notice>: service inactive: org.nixos.darwin-store
2021-12-03 14:56:53.302483 (system/org.nixos.darwin-store [70]) <Notice>: service state: not running
Note that that commit has `RunAtLoad = true`. If I `sudo launchctl bootout system /Library/LaunchDaemons/org.nixos.nix-daemon.plist`, then set `RunAtLoad = false`, `sudo launchctl bootstrap system /Library/LaunchDaemons/org.nixos.nix-daemon.plist`, and reboot:
No RunAtLoad
2021-12-03 15:07:52.156447 (system) <Notice>: pending spawn, domain in on-demand-only mode: org.nixos.darwin-store
2021-12-03 15:07:52.177604 (system/org.nixos.darwin-store) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 15:07:52.177606 (system/org.nixos.darwin-store) <Notice>: service state: spawn scheduled
2021-12-03 15:07:52.177608 (system/org.nixos.darwin-store) <Notice>: service state: spawning
2021-12-03 15:07:52.177633 (system/org.nixos.darwin-store) <Notice>: launching: speculative
2021-12-03 15:07:52.178811 (system/org.nixos.darwin-store [70]) <Notice>: xpcproxy spawned with pid 70
2021-12-03 15:07:52.178823 (system/org.nixos.darwin-store [70]) <Notice>: internal event: SPAWNED, code = 0
2021-12-03 15:07:52.178825 (system/org.nixos.darwin-store [70]) <Notice>: service state: xpcproxy
2021-12-03 15:07:52.241388 (system/org.nixos.darwin-store [70]) <Notice>: internal event: SOURCE_ATTACH, code = 0
2021-12-03 15:07:52.605618 (system/org.nixos.darwin-store [70]) <Notice>: service state: running
2021-12-03 15:07:52.605639 (system/org.nixos.darwin-store [70]) <Notice>: internal event: INIT, code = 0
2021-12-03 15:07:52.605643 (system/org.nixos.darwin-store [70]) <Notice>: Successfully spawned diskutil[70] because speculative
2021-12-03 15:08:01.843502 (system/org.nixos.darwin-store [70]) <Notice>: job state = running
2021-12-03 15:08:01.988518 (system/org.nixos.nix-daemon) <Notice>: internal event: WILL_SPAWN, code = 0
2021-12-03 15:08:01.988533 (system/org.nixos.nix-daemon) <Notice>: service state: spawn scheduled
2021-12-03 15:08:01.988535 (system/org.nixos.nix-daemon) <Notice>: service state: spawning
2021-12-03 15:08:01.988778 (system/org.nixos.nix-daemon) <Notice>: launching: xpc event
2021-12-03 15:08:01.990454 (system/org.nixos.nix-daemon [233]) <Notice>: xpcproxy spawned with pid 233
2021-12-03 15:08:01.990467 (system/org.nixos.nix-daemon [233]) <Notice>: internal event: SPAWNED, code = 0
2021-12-03 15:08:01.990470 (system/org.nixos.nix-daemon [233]) <Notice>: service state: xpcproxy
2021-12-03 15:08:01.990514 (system/org.nixos.nix-daemon [233]) <Notice>: internal event: SOURCE_ATTACH, code = 0
2021-12-03 15:08:02.010866 (system/org.nixos.nix-daemon [233]) <Notice>: service state: running
2021-12-03 15:08:02.010892 (system/org.nixos.nix-daemon [233]) <Notice>: internal event: INIT, code = 0
2021-12-03 15:08:02.010896 (system/org.nixos.nix-daemon [233]) <Notice>: Successfully spawned nix-daemon[233] because xpc event
2021-12-03 15:08:02.057506 (system/org.nixos.darwin-store [70]) <Notice>: service exited: dirty = 0, supported pressured-exit = 0
2021-12-03 15:08:02.057508 (system/org.nixos.darwin-store [70]) <Notice>: exited due to exit(0)
2021-12-03 15:08:02.057510 (system/org.nixos.darwin-store [70]) <Notice>: service state: exited
2021-12-03 15:08:02.057512 (system/org.nixos.darwin-store [70]) <Notice>: internal event: EXITED, code = 0
2021-12-03 15:08:02.057514 (system/org.nixos.darwin-store [70]) <Notice>: job state = exited
2021-12-03 15:08:02.057526 (system) <Notice>: service inactive: org.nixos.darwin-store
2021-12-03 15:08:02.057528 (system/org.nixos.darwin-store [70]) <Notice>: service state: not running

I thought I had tested that configuration and experienced a failure, but I may not have. Seems like PathState for the nix-daemon may be all that is required.
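
To compare runs like these without eyeballing timestamps, a small parser can help. The following is a sketch, assuming launchd.log lines keep the shape shown above; the helper names `parse` and `first_event` are made up for illustration, and the sample lines are copied from the "No RunAtLoad" log.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2021-12-03 15:08:02.010896 (system/org.nixos.nix-daemon [233]) <Notice>: ...
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\(system(?:/(?P<service>[\w.-]+))?(?: \[(?P<pid>\d+)\])?\) "
    r"<(?P<level>\w+)>: (?P<msg>.*)$"
)


def parse(line):
    """Return (timestamp, service, level, message), or None if the line does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return ts, m["service"], m["level"], m["msg"]


def first_event(lines, service, needle):
    """Timestamp of the first message for `service` that contains `needle`."""
    for line in lines:
        parsed = parse(line)
        if parsed and parsed[1] == service and needle in parsed[3]:
            return parsed[0]
    return None


SAMPLE = [
    "2021-12-03 15:07:52.605643 (system/org.nixos.darwin-store [70]) <Notice>: Successfully spawned diskutil[70] because speculative",
    "2021-12-03 15:08:02.010896 (system/org.nixos.nix-daemon [233]) <Notice>: Successfully spawned nix-daemon[233] because xpc event",
    "2021-12-03 15:08:02.057508 (system/org.nixos.darwin-store [70]) <Notice>: exited due to exit(0)",
]

if __name__ == "__main__":
    mount_started = first_event(SAMPLE, "org.nixos.darwin-store", "Successfully spawned")
    daemon_started = first_event(SAMPLE, "org.nixos.nix-daemon", "Successfully spawned")
    # Seconds between diskutil being spawned and nix-daemon being spawned.
    print((daemon_started - mount_started).total_seconds())
```

Feeding it the output of `grep nixos /var/log/com.apple.xpc.launchd/launchd.log` from a failing boot and from a working one should make the ordering of the two jobs easier to compare.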

Author

I pushed the change reverting OtherJobEnabled.

Author

jsoo1 commented Dec 1, 2021

> FWIW, the FDA perm issue was likely resolved in #5172

Oh nice! Maybe the installer I was using didn't have this step, though GlobalPermissionsEnable was true for the volume it had created.

Member

abathur commented Dec 1, 2021

I imagine this will be one of those that needs a bit of testing. If you're willing/able to set up a cachix cache with the right name (see #4577 (comment)) successful CI runs on your branch will create an installer others can try out.

Member

abathur commented Dec 1, 2021

Mentioning a few people who might have thoughts: @andersk @callahad @matthewbauer @LnL7 @lilyball

Author

jsoo1 commented Dec 1, 2021

> I imagine this will be one of those that needs a bit of testing. If you're willing/able to set up a cachix cache with the right name (see #4577 (comment)) successful CI runs on your branch will create an installer others can try out.

I agree. I will try. I am not a cachix user currently, so it might take a little while for me to set up...

Member

abathur commented Dec 1, 2021

> > FWIW, the FDA perm issue was likely resolved in #5172
>
> Oh nice! Maybe the installer I was using didn't have this step, though GlobalPermissionsEnable was true for the volume it had created.

It's possible it didn't, or only did in some circumstances. I have not actually hit that issue myself, so my perspective is all secondhand through reading reports and trying to help a few people troubleshoot it. :)

Author

jsoo1 commented Dec 1, 2021

> I imagine this will be one of those that needs a bit of testing. If you're willing/able to set up a cachix cache with the right name (see #4577 (comment)) successful CI runs on your branch will create an installer others can try out.

@abathur I set up a cachix cache jsoo1-nix-install-tests and put a CACHIX_AUTH_TOKEN secret in my fork's secrets. Is that all that I needed to do?

Member

abathur commented Dec 1, 2021

I think you may also need to enable workflows in your fork. I don't really recall, but maybe there's a link or button to do so at https://github.com/jsoo1/nix/actions?

Author

jsoo1 commented Dec 1, 2021

> I think you may also need to enable workflows in your fork. I don't really recall, but maybe there's a link or button to do so at https://github.com/jsoo1/nix/actions?

Ok, done!

Author

jsoo1 commented Dec 1, 2021

I am running the action on my fork so a cache can be available. Can anyone say what is going on here? https://github.com/jsoo1/nix/runs/4384590357

(cachix push failed?)

Member

abathur commented Dec 1, 2021

Try re-running it. It may take a few rolls...

I should've mentioned earlier that it might fail. It looks like the failure is because the previous step timed out. There's some sort of flaky execution bug on darwin that can cause some hangs and/or aborts on an EOF error.

It is now LaunchOnlyOnce, so it is automatically unloaded after running.
It should also be forcefully loaded by launchctl bootstrap...
Comment on lines 308 to 315
<key>KeepAlive</key>
<dict>
<key>PathState</key>
<dict>
<key>$NIX_ROOT/store</key>
<false/>
</dict>
</dict>

Member

This part concerns me. What does it mean to try and "keep alive" the mount command? It runs, and then exits. And since the job is now marked LaunchOnlyOnce launchd won't relaunch it. My best guess as to what this does is it says "don't run the mounter if the store is already mounted", but I'm not sure what the goal there is. If the problem is that trying to mount it when it's already mounted fails, we should update the mount command instead to be idempotent.

If we didn't mark this as LaunchOnlyOnce then I could see this as attempting to remount automatically if the nix store unmounts, though I suspect that would get rather annoying. I am a bit concerned that this new job configuration means I can't use launchctl kickstart to request the volume be remounted. I assume the use of LaunchOnlyOnce is meant to avoid having it immediately remount the volume if the user unmounts it (or to try and mount in a loop if the mount command fails).

Author

@jsoo1 jsoo1 Dec 2, 2021

I am pretty confused about that, also. It seems like PathState should be more of a run condition than a KeepAlive condition, don't you think? Using LaunchOnlyOnce actually unloads the service when it is done, which is the hacky way that the nix-daemon configuration works with OtherJobEnabled. The docs are quite clear that OtherJobEnabled is not actually about enabled jobs, but about loaded ones. The idea is that the nix-daemon starts after the darwin-store is unloaded.

On the plus side, that should mean that you can launchctl bootstrap system /Library/LaunchDaemons/org.nixos.darwin-store.plist to your heart's content to retry mounting the store volume.

(Edit) As to a run condition vs a keep alive condition, WatchPaths exists but for some reason did not do what I hoped. Perhaps that avenue could be explored a bit more.
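
For anyone following along, the two-job arrangement described above might look roughly like this. It is a sketch, not the diff under review: the labels, the `LaunchOnlyOnce`, `PathState`, and `OtherJobEnabled` settings follow this thread, while the `diskutil` arguments are a placeholder and the `/nix/store` path stands in for `$NIX_ROOT/store`.

```python
import plistlib

STORE_LABEL = "org.nixos.darwin-store"
DAEMON_LABEL = "org.nixos.nix-daemon"


def store_job(store_path: str = "/nix/store") -> dict:
    """Hypothetical store-mounting job, modeled on the fragment under review."""
    return {
        "Label": STORE_LABEL,
        # Placeholder arguments; the real mount command is not shown in this thread.
        "ProgramArguments": ["/usr/sbin/diskutil", "mount", "VOLUME-PLACEHOLDER"],
        # Run the mounter a single time instead of respawning it.
        "LaunchOnlyOnce": True,
        # Only wanted while the store path does NOT exist.
        "KeepAlive": {"PathState": {store_path: False}},
    }


def daemon_job() -> dict:
    """Hypothetical daemon job that waits until the store job is no longer loaded."""
    return {
        "Label": DAEMON_LABEL,
        "ProgramArguments": ["/nix/var/nix/profiles/default/bin/nix-daemon"],
        "RunAtLoad": True,
        "KeepAlive": {"OtherJobEnabled": {STORE_LABEL: False}},
    }


if __name__ == "__main__":
    for job in (store_job(), daemon_job()):
        print(plistlib.dumps(job).decode("utf-8"))
```

Writing both jobs side by side makes the coupling explicit: the daemon's condition names the store job's label, so renaming one without the other would silently break the ordering.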

Member

I missed the fact that OtherJobEnabled was set to false. I am curious if launchd guarantees that it loads all of the launchdaemons before evaluating any of the OtherJobEnabled conditions, because if not then it could plausibly load the daemon first, and decide it should be launched before loading the store mounter.

Also the lack of documentation on LaunchOnlyOnce about it actually unloading the job makes me nervous about this approach as well.

Author

I agree. I would really prefer just PathState. If anyone else out there can help test, it would help a lot.

Member

@abathur abathur Dec 2, 2021

Another concern here will be that, regardless of the other ergonomics of the method, the volume doesn't just have to be mounted before the daemon starts--it also needs to be mounted before macOS tries to load or run anything that might be stored on it (files open in a restoring editor app, the shell for restoring terminal tabs/windows).

This isn't a problem for unencrypted systems, but there is a race-condition when FileVault is enabled. Any mechanism change will need to get tested against this case. We had a fairly straightforward test before, which was to enable FileVault, install Nix, open a file from the store volume in TextEdit, restart, and see if the document reopens.

Author

Ah, yes. Makes sense. I will go try out the encrypted store branch and see what happens.

Member

From what I recall, we never got that "it must mount before GUI" to actually be foolproof. The problem being that fstab wouldn't mount the volume before the GUI, and the LaunchDaemon that did this couldn't actually block the GUI, it would just cause the mount attempt to happen sooner and hopefully win. My recollection is that it mostly worked but sometimes failed, and that's why I actually still use an unencrypted Nix volume on my work laptop (my newer M1 laptop is using FileVault, but I've only restarted a handful of times and without a Nix-installed app running so there hasn't been much chance for the race to fail there).

Or is there something I'm missing about the current setup that actually makes it work?

Personally, it seems to me that we should be able to use autofs to have it mount automatically when /nix is accessed. There's an old technical paper on autofs that indicates that /etc/fstab is actually used by autofs so I'm not sure why the mountpoints defined there aren't mounted automatically upon access. If it did mount automatically upon access then we wouldn't expect to have an issue, because we could just attempt to launch nix-daemon and it would mount the volume for us. So the question is, why doesn't this work? And is there anything we haven't considered that would actually make it have the "mount automatically on access, blocking the operation until the mount completes" behavior? Because that would solve everything.

Author

> This isn't a problem for unencrypted systems, but there is a race-condition when FileVault is enabled. Any mechanism change will need to get tested against this case. We had a fairly straightforward test before, which was to enable FileVault, install Nix, open a file from the store volume in TextEdit, restart, and see if the document reopens.

OK, I tested this in a fresh VM: I enabled FileVault, opened a file from the store volume in TextEdit, and restarted (with "reopen windows when logging back in" checked). TextEdit came back with the same contents as before the reboot.

Member

@abathur abathur Dec 3, 2021

> From what I recall, we never got that "it must mount before GUI" to actually be foolproof. The problem being that fstab wouldn't mount the volume before the GUI, and the LaunchDaemon that did this couldn't actually block the GUI, it would just cause the mount attempt to happen sooner and hopefully win.

It's broadly accurate that we're depending on winning the race.

> My recollection is that it mostly worked but sometimes failed,

If anyone has receipts it'd be great to have them posted publicly, especially if it's reproducible. I had a reliable protocol for causing race-condition failures with the fstab-only mount, and when testing the daemon-mounted volume I was not once able to induce the same failure. I do not recall anyone telling me that they observed such a race-condition failure after using the test (and now release) installers. (But I'll leave room for eating crow since I'm also somewhat forgetful...)

> Personally, it seems to me that we should be able to use autofs ... why doesn't this work? And is there anything we haven't considered that would actually make it have the "mount automatically on access, blocking the operation until the mount completes" behavior? Because that would solve everything.

I generally agree that this path sounded promising, but we flogged this pretty extensively back around #4181 (comment) through the end of that first PR. I've already expended days of scarce hobby time I can't get back banging my head against automount/autofs with nothing to show for it. As before, I'd be happy to have someone prove me wrong there--a simpler system-sanctioned solution would of course be better.

Author

I should have a VM with recovery mode available soon, and I want to use it to look into automount. It definitely seems like the best way, but autofs appears to have had some breaking changes over the last few macOS releases. I haven't been using darwin or NixOS in that time, though. I'll keep you posted.

@lilyball
Member

lilyball commented Dec 2, 2021

If the goal is just "stop using /bin/sh to run the daemon" then I have two alternative solutions:

  1. Set KeepAlive to true but don't have any launch conditions for the daemon. Then use a separate job that runs wait4path prior to running launchctl kickstart system/org.nixos.nix-daemon. I am mildly concerned that launchd might interpret KeepAlive == true with no other launch conditions as launching the job at load anyway. I hope not, but if so, a bogus launch condition could always be specified as launchctl kickstart will ignore it. I don't know offhand how we handle nix-daemon updates either, hopefully it restarts the job rather than e.g. running nix-daemon outside of the launch daemon?
  2. Or we could just have a custom executable that we copy to the system somewhere (e.g. a /Library/Application Support/nix-daemon folder perhaps) and that executable can then run wait4path before exec'ing nix-daemon. This way FDA would have to be applied to that executable but it's better than putting it on /bin/sh. It also does mean one more location to potentially clean up when uninstalling Nix, but Application Support folders are often left behind when uninstalling stuff anyway.

All that said, the docs on PathState sure make it sound like it does what we want (if applied to the nix-daemon job, not the nix-store job) so I'd love to know why that's not usable.
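For readers unfamiliar with the key, here is a rough sketch of what a PathState-gated job definition for the nix-daemon job might look like, generated with Python's `plistlib` so the structure is easy to inspect. The daemon path and the exact set of keys are assumptions for illustration and are not taken from this PR's diff.

```python
import plistlib

# Assumed install location of the daemon binary (illustrative only).
NIX_DAEMON = "/nix/var/nix/profiles/default/bin/nix-daemon"


def daemon_plist(daemon_path: str = NIX_DAEMON) -> bytes:
    """Build a launchd job that is kept alive only while daemon_path exists."""
    job = {
        "Label": "org.nixos.nix-daemon",
        # launchd runs the daemon directly, so no /bin/sh is involved.
        "ProgramArguments": [daemon_path],
        # Do not start at load time; the store volume may not be mounted yet.
        "RunAtLoad": False,
        # PathState maps a path to a boolean: True means "keep the job
        # alive as long as this path exists".
        "KeepAlive": {"PathState": {daemon_path: True}},
    }
    return plistlib.dumps(job)


print(daemon_plist().decode())
```

Whether launchd notices the path appearing promptly after the volume is mounted is exactly the open question in this thread, so this shows the shape of the configuration, not a verified fix.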

@jsoo1
Author

jsoo1 commented Dec 2, 2021

> If the goal is just "stop using /bin/sh to run the daemon" then I have two alternative solutions:

I think the other mystery I was hoping to solve here was that the nix volume wasn't mounted on reboot for me, either. The other goal is definitely to avoid giving /bin/sh full disk access.

> 1. Set `KeepAlive` to `true` but don't have any launch conditions for the daemon. Then use a separate job that runs `wait4path` prior to running `launchctl kickstart system/org.nixos.nix-daemon`. I am mildly concerned that launchd might interpret `KeepAlive == true` with no other launch conditions as launching the job at load anyway. I hope not, but if so, a bogus launch condition could always be specified as `launchctl kickstart` will ignore it. I don't know offhand how we handle nix-daemon updates either, hopefully it restarts the job rather than e.g. running `nix-daemon` outside of the launch daemon?
>
> 2. Or we could just have a custom executable that we copy to the system somewhere (e.g. a `/Library/Application Support/nix-daemon` folder perhaps) and that executable can then run `wait4path` before exec'ing `nix-daemon`. This way FDA would have to be applied to that executable but it's better than putting it on `/bin/sh`. It also does mean one more location to potentially clean up when uninstalling Nix, but Application Support folders are often left behind when uninstalling stuff anyway.

Both of those are sounding pretty reasonable. I was even thinking that if a full-fledged .app could be provided, code signing and all that might let us actually get capabilities we want but it seems maybe more than I want to bite off.
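To make option 2 concrete, the wait-then-exec logic could look roughly like the sketch below. It is written in Python only to show the control flow; a real wrapper would presumably need to be a small compiled executable so that Full Disk Access applies to it and not to an interpreter. The function names and the polling interval are hypothetical.

```python
import os
import sys
import time


def wait_for_path(path, timeout=None, interval=0.1):
    """Poll until path exists. Returns False if timeout (seconds) expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def main(argv):
    """Wait for argv[1] to appear, then replace this process with it."""
    program = argv[1]
    wait_for_path(program)
    # exec keeps the same PID, so launchd keeps tracking the same process.
    os.execv(program, argv[1:])


# Only act when a target program is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Invoked as `wrapper /path/to/nix-daemon`, this blocks until the binary is visible (that is, until the volume is mounted) and then execs it, which is the same effect the current `/bin/sh` plus `wait4path` invocation aims for.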

> All that said, the docs on PathState sure make it sound like it does what we want (if applied to the nix-daemon job, not the nix-store job) so I'd love to know why that's not usable.

I totally agree here. My actions finally completed, so cachix cache at jsoo1-nix-install-tests should be available to test? I think?

@abathur
Member

abathur commented Dec 2, 2021

> My actions finally completed, so cachix cache at jsoo1-nix-install-tests should be available to test? I think?

I'm not certain it's the "best" way to get it, but here's where I get the installer URL from https://github.com/jsoo1/nix/runs/4388555488?check_suite_focus=true#step:4:3

You'll end up with something like `sh <(curl -L https://jsoo1-nix-install-tests.cachix.org/serve/p0y7njv5mklyrvrry6pybds49d2rbsmw/install) --tarball-url-prefix https://jsoo1-nix-install-tests.cachix.org/serve`

@lilyball
Member

lilyball commented Dec 2, 2021

I don't suppose there's any LaunchEvent that corresponds to "the Nix volume was mounted" that we could specify in the nix-daemon plist, is there? If we did use that, nix-daemon would have to use xpc_set_event_stream_handler() to consume the event (I don't know what launchd does if a daemon doesn't do that, maybe it's okay if we don't care about viewing the event? It might at least log snarky messages though). And we'd have to figure out a launch event that reliably means "this particular volume has been mounted". I don't know if there's an appropriate IOKit event for that (does IOKit even care about mounted volumes? Or just devices).

We could also have the launch event be a notifyd event that we post from the volume mounter (using notifyutil), though nix-daemon would still be expected to register the event handler. I don't know if there's any practical benefit to this over using something like launchctl kickstart org.nixos.nix-daemon though.


Another thought I had was if we could make the LaunchDaemon for nix-daemon use Sockets to have launchd set up and bind its unix socket on its behalf. This would allow launchd to launch nix-daemon on demand (i.e. when someone connects to the socket). nix-daemon would have to be modified to use launch_activate_socket() to get the file descriptors of course (perhaps signaled by passing a --launch flag to nix-daemon), but it would mean we don't have to worry about trying to launch too soon. My big question here is what launchd will do regarding the volume; will it treat it as an error for the socket directory to be missing? Will it just wait for the directory to exist (i.e. the volume to be mounted) and then create it? What if the volume is mounted but the /nix/var/nix/daemon-socket directory doesn't exist? I don't know. If it does properly wait for the volume to mount and then create the socket, that could be a great solution to launching nix-daemon. I am assuming of course that nix-daemon doesn't do anything until someone connects to its socket, which might be a false assumption (though nothing obvious comes to mind for stuff it should be doing prior to receiving commands on the socket).
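As a rough illustration of the socket idea, the job definition might gain a `Sockets` entry along these lines (again built with `plistlib` so it can be inspected). The socket file name, the `Listeners` key name, the mode, and the `--launch` flag are all assumptions or hypotheticals from this discussion, not existing nix-daemon behavior.

```python
import plistlib

# Assumed socket location inside the directory mentioned above.
SOCKET_PATH = "/nix/var/nix/daemon-socket/socket"
# Assumed install location of the daemon binary.
NIX_DAEMON = "/nix/var/nix/profiles/default/bin/nix-daemon"


def socket_activated_plist(daemon_path=NIX_DAEMON, socket_path=SOCKET_PATH):
    """Build a launchd job whose listening socket is owned by launchd."""
    job = {
        "Label": "org.nixos.nix-daemon",
        # "--launch" is the hypothetical flag floated in this comment.
        "ProgramArguments": [daemon_path, "--launch"],
        "Sockets": {
            # The key name is arbitrary; the daemon would pass the same
            # name to launch_activate_socket() to retrieve the descriptors.
            "Listeners": {
                "SockPathName": socket_path,
                # Assumed permissions (rw for everyone), stored as an integer.
                "SockPathMode": 0o666,
            }
        },
    }
    return plistlib.dumps(job)


print(socket_activated_plist().decode())
```

This does not answer what launchd does when the socket directory is missing at load time; it only shows where the socket path would be declared.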

@jsoo1
Author

jsoo1 commented Dec 3, 2021

> I don't suppose there's any LaunchEvent that corresponds to "the Nix volume was mounted" that we could specify in the nix-daemon plist, is there? If we did use that, nix-daemon would have to use xpc_set_event_stream_handler() to consume the event (I don't know what launchd does if a daemon doesn't do that, maybe it's okay if we don't care about viewing the event? It might at least log snarky messages though). And we'd have to figure out a launch event that reliably means "this particular volume has been mounted". I don't know if there's an appropriate IOKit event for that (does IOKit even care about mounted volumes? Or just devices).

That is a great question. I started looking around and came across technologeeks.com/docs/launchd.pdf, which mentions an incomplete list of keys in notify_keys.h. I think that header lives in Libsystem or the macOS SDK. Some versions exist in the store and mention keys like "com.apple.system.kernel.mount". No idea how to use that information, though...

> We could also have the launch event be a notifyd event that we post from the volume mounter (using notifyutil), though nix-daemon would still be expected to register the event handler. I don't know if there's any practical benefit to this over using something like launchctl kickstart org.nixos.nix-daemon though.
>
> Another thought I had was if we could make the LaunchDaemon for nix-daemon use Sockets to have launchd set up and bind its unix socket on its behalf. This would allow launchd to launch nix-daemon on demand (i.e. when someone connects to the socket). nix-daemon would have to be modified to use launch_activate_socket() to get the file descriptors of course (perhaps signaled by passing a --launch flag to nix-daemon), but it would mean we don't have to worry about trying to launch too soon. My big question here is what launchd will do regarding the volume; will it treat it as an error for the socket directory to be missing? Will it just wait for the directory to exist (i.e. the volume to be mounted) and then create it? What if the volume is mounted but the /nix/var/nix/daemon-socket directory doesn't exist? I don't know. If it does properly wait for the volume to mount and then create the socket, that could be a great solution to launching nix-daemon. I am assuming of course that nix-daemon doesn't do anything until someone connects to its socket, which might be a false assumption (though nothing obvious comes to mind for stuff it should be doing prior to receiving commands on the socket).

Also an interesting thought. I think the nix-darwin daemon configuration has a socket listener configuration already (which seems to work for me?).

@lilyball
Member

lilyball commented Dec 3, 2021

> Also an interesting thought. I think the nix-darwin daemon configuration has a socket listener configuration already (which seems to work for me?).

There is no Socket configuration today (see misc/launchd/org.nixos.nix-daemon.plist.in). It looks like the systemd configuration specifies the socket though (which nix-daemon handles via an env var), so adapting this for launchd wouldn't be too hard. Either use an env var or a flag to tell nix-daemon that it's coming from launchd and have it call launch_activate_socket() at the same place where it does the systemd stuff.

@jsoo1
Author

jsoo1 commented Dec 3, 2021

> > Also an interesting thought. I think the nix-darwin daemon configuration has a socket listener configuration already (which seems to work for me?).
>
> There is no Socket configuration today (see misc/launchd/org.nixos.nix-daemon.plist.in). It looks like the systemd configuration specifies the socket though (which nix-daemon handles via an env var), so adapting this for launchd wouldn't be too hard. Either use an env var or a flag to tell nix-daemon that it's coming from launchd and have it call launch_activate_socket() at the same place where it does the systemd stuff.

Oh I see! What do you think the purpose of this is, then? https://github.com/LnL7/nix-darwin/blob/44da835ac40dab5fd231298b59d83487382d2fab/modules/services/nix-daemon.nix#L60

This seems to be all that is necessary to run the daemon when /nix is
mounted.
* Keep RunAtLoad=false so that the daemon executable is not found when
launchd loads the service.
* Keep RunAtLoad=true in darwin-store so that it always runs.
@lilyball
Member

lilyball commented Dec 6, 2021

@jsoo1 My best guess? It's probably broken. nix-daemon, in the absence of the LISTEN_FDS env var, will recreate the socket on launch. This means it will unlink any socket that may already be at that path and bind a new one. I would assume this means the first command that tries to use the socket will have its connection be broken immediately. So unless Nix automatically retries the nix-daemon connection, the first command will likely fail.

I also don't know what launchd's behavior is in the event that its socket gets replaced. I don't know if launchd ever terminates socket-launched jobs automatically (can launchd tell whether anyone is still talking to the process? If it can, then it could plausibly ask the job to exit after idling long enough), though I suspect it defers to the job to determine when to exit due to being idle. nix-daemon isn't going to idle-exit so that's probably fine, but if it does ever terminate, I have no idea if launchd will reestablish the original socket in order to relaunch it.

I really do think nix-daemon should support this setup on macOS (and it should default to this configuration in the installed LaunchDaemon, and then nix-darwin should be updated accordingly), but it just doesn't look like it will work correctly today.
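The socket-replacement behavior described in this comment can be illustrated with a small Python model. This is a simplified sketch of the pattern (use an inherited descriptor when `LISTEN_FDS` is set, otherwise unlink and rebind), not the actual nix-daemon source; `open_daemon_socket` is a hypothetical name.

```python
import os
import socket

# First inherited descriptor in the systemd socket-activation convention.
SD_LISTEN_FDS_START = 3


def open_daemon_socket(path, environ=None):
    """Return a listening Unix socket for the daemon."""
    environ = os.environ if environ is None else environ
    if environ.get("LISTEN_FDS") == "1":
        # Socket-activated: adopt the listener the service manager created.
        return socket.socket(fileno=SD_LISTEN_FDS_START)
    # Not socket-activated: remove whatever is at the path and bind anew.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(5)
    return listener
```

If a service manager had already bound a listener at `path`, the fallback branch leaves that original listener orphaned: later clients that connect by path reach only the new socket. That matches the concern above about a launchd-owned socket being replaced by a daemon that does not know it was socket-activated.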

@jsoo1
Author

jsoo1 commented Dec 6, 2021

I briefly looked into an automountd solution but that is where the trail runs cold for me. I cannot for the life of me figure out how to debug autofs not mounting the store. Recovery mode is (I think) too early to tell, and the logs just seem to not exist in the syslogs. I think it is clearly the best solution but I just cannot figure it out.

@lilyball
Member

lilyball commented Dec 6, 2021

I just filed #5739 about launchd socket activation.

@stale

stale bot commented Jun 12, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Jun 12, 2022
@jsoo1
Author

jsoo1 commented Nov 1, 2022

Is this PR still relevant/useful?

@jsoo1
Author

jsoo1 commented Nov 1, 2022

It's been a long time, but wait4path is still what gets executed in the daemon service definition. I am a bit fuzzy on the reasons we didn't want to use launchd's PathState...

@Ericson2314 Ericson2314 added the macos Nix on macOS, aka OS X, aka darwin label Jun 14, 2023
@stale stale bot removed the stale label Jun 14, 2023
@Ericson2314
Member

Also curious on the status of this during triaging.

@abathur
Member

abathur commented Jun 23, 2023

Re-skimming this and seeing that @lilyball bumped #5739 4 minutes after @jsoo1 asked if this was still useful makes me think @lilyball saw that as a good way forward.

Re: PathState, #5698 (comment) makes it sound like @jsoo1 ran into trouble getting that to work as expected.

That makes me think that there hasn't been (yet) a workable option here for ensuring the daemon waits for the volume, thus the interest in exploring the socket option?

@jsoo1
Author

jsoo1 commented Jun 23, 2023

> Re: PathState, #5698 (comment) makes it sound like @jsoo1 ran into trouble getting that to work as expected.

To be fair to PathState, I couldn't reliably find a way to make it break.

Also, I'm not sure how socket activation works in launchd, but wouldn't it be prone to the same kind of race condition we already have?

@edolstra edolstra self-requested a review as a code owner November 12, 2024 19:47
Labels
macos Nix on macOS, aka OS X, aka darwin
6 participants