Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

darwin.stdenv: Darwin stdenv rework #240433

Merged
merged 7 commits into from Jul 5, 2023

Conversation

reckenrode
Copy link
Contributor

Description of changes

This PR contains the reworked and refactored (or perhaps rewritten) Darwin stdenv bootstrap. The reasoning behind the change is explained in 15f7418. This PR targets clang 11, but the stdenv is able to build using clang 16. A separate PR will be coming for the LLVM bump itself, which still has a few things it requires to be merge (other than the stdenv changes) before it can be done.

I want to touch on the inclusion of a couple of other commits. While I’d have liked to make this just the PR for the stdenv bump, there were few other things that needed to be incorporated:

cc-wrapper/ld-wrapper

The new stdenv disables the response file if your Darwin version is new enough to support a 1 MiB ARG_MAX. Without it, clang 16 intermittently fails with a “bad file descriptor” error. I explored other options (including using a coproc in Bash that waited on the clang to exit), but the only other one that worked required creating temporary files and not cleaning them up. Note: I did not explore removing the exec. The change in the stdenv is localized and does not require changes to cc-wrapper or ld-wrapper.

cctools

The cctools-port to cctools-llvm switch is required by clang 15+. It’s not strictly needed by clang 11, but the stdenv is written with the assumption that it will be used. The reason for the switch is compiler-rt 15+ is not properly stripped by the strip in cctools. It leaves bootstrap paths in liborc_rt_osx.a. cctools-llvm only uses LLVM tools when they are a drop-in replacement for Apple’s cctools.

CF

There are a bunch of CF fixes. Unfortunately, while #234861 did fix the rpath issue for cross-building, it broke rpath for Darwin builds that require it (like e2fsprogs). The previous version of the stdenv did not have this issue because it effectively disabled the rpath hook during the stdenv bootstrap. The reworked one is written to build with it in effect. The changes to the previous version would be too great just to replace them, so it is included here.

@eliasnaur I made sure that this didn’t regress your use case of cross-building. NIX_COREFOUNDATION_RPATH should still be empty in a cross build.

While working on CF, I also took the opportunity to switch it to use the nixpkgs ICU. CF was one of two packages using darwin.ICU. Once Darwin has a deprecation mechanism, it will be deprecated and removed along with darwin.libiconv. On my clang 16 branch, I have been testing with both of those dependencies removed.

@emilazy The CF changes should also address the issue you reported in #238791. As part of the CF changes, I switched the build system to use cmake instead of using build.py. I’m not sure why it makes a difference, but I get the expected outputs when linking against the new CF. Please let me know if this works for you.

Testing

I’ve been building my configs plus other things and stdenvs on both Darwin platforms for with clang 11 and clang 16. I have been using this flake to automate mass builds. Note that the darwin-llvm16-update branch is not yet ready. I need to take this one and cherry-pick just the commits that are required to build with clang 16. I will be opening separate PRs for those fixes and adding them to the tracking issue.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot added the 6.topic: darwin Running or building packages on Darwin label Jun 28, 2023
@github-actions github-actions bot added the 6.topic: stdenv Standard environment label Jun 28, 2023
@reckenrode reckenrode changed the title Darwin stdenv rework darwin.stdenv: Darwin stdenv rework Jun 28, 2023
Copy link
Member

@veprbl veprbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've counted 9 stages. By any chance, do you know: Is this slower or faster to bootstrap?

pkgs/stdenv/darwin/default.nix Outdated Show resolved Hide resolved
pkgs/stdenv/darwin/default.nix Outdated Show resolved Hide resolved
pkgs/stdenv/darwin/default.nix Outdated Show resolved Hide resolved
@reckenrode reckenrode force-pushed the darwin-stdenv-rework branch 3 times, most recently from 3eb551f to 93f5e40 Compare June 29, 2023 01:41
@reckenrode
Copy link
Contributor Author

I fixed the previous stage labels and pushed an updated branch. I think I also fixed the ofborg failure.

I've counted 9 stages.

The big difference is the addition of a “build libc” stage as stage 2, which I had to split into separate stages because I need to use the newly built Libsystem to build CF. (Neither exactly take a while to build though.) The other addition is the sysctl stage. It was needed to break an infinite recursion if I tried to build in stage 1 proper.

By any chance, do you know: Is this slower or faster to bootstrap?

I haven’t done benchmarks comparing complete rebuilds, but I doubt it’s faster. As part of the decoupling from the bootstrap tools, I end up rebuilding everything in them. Everything in them. This was done to work around having older versions of some required tools, but it also makes the build cleaner because it can rely on not having bootstrap tools binaries leak in via the path. It also uncovered #237348.

It also builds more things. cctools is included in the final overlay. To make sure that nix build nixpkgs#darwin.cctools^man provides correct man pages, they need to be built during the bootstrap. That means building LLVM man pages, which pulls in a bunch of Python stuff. Most of it is pretty quick to build, but the cost is obviously non-zero. There’s stuff to build where there wasn’t any before.

And that’s not even getting to LLVM 16. LLVM 16 has tests, and it runs them. That helps ensure everything is actually functioning properly (which did uncover a problem with llvm-lipo), but they take a while to run. I looked into only running them once, but trying to override llvmPackages.(lib)llvm doesn’t seem to work. The overriden version gets built, but so does the original. I sunk way too much time into that before accepting that it was going to run the tests twice.

Oh, and I have to build system_cmds for the final output, though it’s quick. The stdenv detects your OS version to determine whether or not to use a response file. This is due to the clang 16 issue mentioned above, which I suspect is similar to #119779 but not reliably reproducible. The easiest way I can reproduce is trying to build GHC.

@emilazy
Copy link
Member

emilazy commented Jun 29, 2023

@emilazy The CF changes should also address the issue you reported in #238791. As part of the CF changes, I switched the build system to use cmake instead of using build.py. I’m not sure why it makes a difference, but I get the expected outputs when linking against the new CF. Please let me know if this works for you.

Yep, works fine :)

I'm unqualified to review this but I want to thank you for the immense amount of work you've put into this to ease Darwin maintenance and bring it in line with the Linux stdenv structure!

@eliasnaur
Copy link
Contributor

@emilazy The CF changes should also address the issue you reported in #238791. As part of the CF changes, I switched the build system to use cmake instead of using build.py. I’m not sure why it makes a difference, but I get the expected outputs when linking against the new CF. Please let me know if this works for you.

Yep, works fine :)

I'm unqualified to review this but I want to thank you for the immense amount of work you've put into this to ease Darwin maintenance and bring it in line with the Linux stdenv structure!

Seconded! Not only a great amount of work, but also very thorough documentation of said work. It's a pleasure to follow down here from the peanut gallery.

Copy link
Contributor

@tjni tjni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a cheerleader and looking forward to learning a lot from this.

As such, I started reading through this and have a few comments, mostly questions around how things work that I think can be answered as comments in the code or commit message. When I have more time, I plan to read through more (and probably ask more questions).

@reckenrode
Copy link
Contributor Author

reckenrode commented Jun 29, 2023

I’ll clean up the patches for CF and push an update. I also have another commit that removes the curl dependency from CF. This drops a bunch of curl-related stuff from the final stdenv and should allow curl to drop the patch removing NAT64 support on Darwin (which needs SystemConfiguration, causing a cycle due to the CF dependency).

@reckenrode reckenrode force-pushed the darwin-stdenv-rework branch 2 times, most recently from a6f3712 to 365592b Compare June 30, 2023 00:55
@reckenrode
Copy link
Contributor Author

@tjni I split the CF changes up into logical commits along with the related patches. I also included comments in the patch explaining the change CoreFoundation.h header change and the addition of those two files.

@reckenrode
Copy link
Contributor Author

I pushed a fix for the fix to the NIX_LD_USE_RESPONSE_FILE typo. It turned out the version of ld64 in the bootstrap tools is too old to support response files.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/darwin-again/29331/65

@tjni
Copy link
Contributor

tjni commented Jun 30, 2023

The extra comments and extra patches are exactly what I was looking for. Thank you for doing it. It's late where I live right now, but I intend to read through the main stdenv stages tomorrow morning :)

@Atemu Atemu requested a review from a team June 30, 2023 10:51
Copy link
Member

@Atemu Atemu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed it and can't claim to understand but it left me with the impression that I could understand without too much trouble; the code looks well crafted.

I especially like the eval-time sanity checks.

A few nits:

pkgs/stdenv/darwin/default.nix Outdated Show resolved Hide resolved
pkgs/stdenv/darwin/default.nix Outdated Show resolved Hide resolved
@reckenrode reckenrode force-pushed the darwin-stdenv-rework branch 3 times, most recently from ff447a9 to 5b422ef Compare June 30, 2023 12:06
@reckenrode
Copy link
Contributor Author

@ofborg eval

@reckenrode
Copy link
Contributor Author

The output in the ofborg-eval failure is strange. It looks like it just stopped. I tried retriggering an eval (see above), but that doesn’t seem to have worked. This is what the details shows:

nix-env did not evaluate cleanly:
 ["copying path '/nix/store/814ql2bfkwdfb73rmd9fi3d9vfdjhcap-bootstrap-tools' from 'https://cache.nixos.org'..."]

@reckenrode
Copy link
Contributor Author

Pushing seems to have triggered it. The only change was to restore the quoting of the install_name_tool invocation to what it is currently in nixpkgs to make a cleaner diff (and see if it triggers a new ofborg-eval). I also rebased on current staging for my local branch. There should be no other changes in the PR.

I think this should be the PR’s final form except for changes requested by feedback.

@reckenrode
Copy link
Contributor Author

Pushing seems to have triggered it.

Or not? 🤨

@tjni
Copy link
Contributor

tjni commented Jul 2, 2023

I tried running ofborg-eval locally on my aarch64-darwin system:

nix-env --query --available --no-name --attr-path --out-path --option extra-experimental-features no-url-literals . -f .gc-of-borg-outpaths.nix --arg checkMeta true

where .gc-of-borg-outpaths.nix is copied from https://github.com/NixOS/ofborg/blob/released/ofborg/src/outpaths.nix.

I notice that it fails on this line:

raw_version = lib.pipe "${bootstrapTools}/lib/clang/" (with builtins; [ readDir attrNames ]);

when system is x86_64-darwin (I guess since I am on the other one). When it runs readDir, it realizes bootstrapTools, but for some reason an error occurs:

error: path '/nix/store/wn8fif83s2lwxl8n4s7916c3pj8lli6l-bootstrap-tools.drv' is not valid

@reckenrode
Copy link
Contributor Author

Thanks. I hadn’t run it locally yet, but I had a hunch the problem might be related to that.

@emilazy
Copy link
Member

emilazy commented Jul 2, 2023

Unless I'm reading it wrong, that line uses (the misnamed) "import-from-derivation": you're making evaluation results depend on the result of a derivation output. That's not allowed in nixpkgs (and therefore by ofborg and Hydra).

@reckenrode
Copy link
Contributor Author

I pushed a fix for the IFD. I split the version detection:

  • During the bootstrap it probes the path.
  • For the final stdenv build, it gets the version from release_version.

It’s still building, but I feel pretty confident it’s okay if it’s gotten this far so far.

In preparation for bumping the LLVM used by Darwin, this change
refactors and reworks the stdenv build process. When it made sense,
existing behaviors were kept to avoid causing any unwanted breakage.
However, there are some differences. The reasoning and differences are
discussed below.

- Improved cycle times - Working on the Darwin stdenv was a tedious
  process because `allowedRequisites` determined what was allowed
  between stages. If you made a mistake, you might have to wait a
  considerable amount of time for the build to fail. Using assertions
  makes many errors fail at evaluation time and makes moving things
  around safer and easier to do.
- Decoupling from bootstrap tools - The stdenv build process builds as
  much as it can in the early stages to remove the requirement that the
  bootstrap tools need bumped in order to bump the stdenv itself. This
  should lower the barrier to updates and make it easier to bump in the
  future. It also allows changes to be made without requiring additional
  tools be added to the bootstrap tools.
- Patterned after the Linux stdenv - I tried to follow the patterns
  established in the Linux stdenv with adaptations made to Darwin’s
  needs. My hope is this makes the Darwin stdenv more approable for
  non-Darwin developers who made need to interact with it. It also
  allowed some of the hacks to be removed.
- Documentation - Comments were added explaining what was happening and
  why things were being done. This is particular important for some
  stages that might not be obvious (such as the sysctl stage).
- Cleanup - Converting the intermediate `allowedRequisites` to
  assertions revealed that many packages were being referenced that no
  longer exist or have been renamed. Removing them reduces clutter and
  should help make the stdenv bootstrap process be more understandable.
swift-corelibs fails to build due to a missing header and an invalid
pointer conversion. Patches are provided to fix both of these issues.
Switching the build system to cmake makes it easier to make changes to
the build (particularly which dependencies to link). It also removes a
lot of manual build steps and fixes the issue identified by @emilazy in
NixOS#238791.

Fixes NixOS#238791.
Upstream swift-corelibs links against icu on Linux, so it is not
necessarily tied to the version of ICU provided by Apple for Darwin.

swift-corelibs and qtwebkit are the only two packages that link against
darwin.ICU. Switching to the nixpkgs icu will allow the Darwin-specific
package to be deprecated and removed eventually.
swift-corelibs uses libcurl to implement `NSURLSession` in Foundation
via the symbols exported by CF. Foundation is not build on Darwin, and
these symbols are not exported by the system CoreFoundation.

By not linking against libcurl, this breaks a cycle between CF and
libcurl. That should allow libcurl to drop the patch disabling
linking against the SystemConfiguration and restore NAT64 support.

Unfortunately, the Darwin stdenv bootstrap still needs to build
dependencies that use `fetchFromGitHub`. While it can drop curl from the
final stdenv, it still needs to use it during the stdenv bootstrap.
@emilazy found a bug in NixOS#234861. Specifically, the hook is not actually
applied. e2fsprogs links against darwin.CF, but since it cannot find the
framework by rpath, it crashes.

Since the hook should always be used, it is copied directly to
`nix-support/setup-hook` instead of providing it as an attribute. This
preserves dropping the hook in the cross-compilation case while
providing it for everything else that needs it.

To avoid further churn and due to the complexity of building the stdenv
with the hook active, this change required the stdenv rework.
Copy link
Contributor

@tjni tjni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for the hard work.

@wegank wegank merged commit f0a11a5 into NixOS:staging Jul 5, 2023
17 checks passed
@tjni tjni mentioned this pull request Jul 8, 2023
12 tasks
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-macos-monthly/12330/40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet