Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/qemu-vm: simplify building nix store image #241373

Merged
merged 1 commit into from
Jul 20, 2024

Conversation

nikstur
Copy link
Contributor

@nikstur nikstur commented Jul 4, 2023

Description of changes

Summary of this change:

  • Simplify code.
  • Stop a disk image from being cached in the binary cache.
  • Make erofs Nix Store image build in an acceptable time outside of
    testing environments (like darwin.builder).
  • Do not regress on performance for tests that use many store paths in
    their Nix store image.
  • Slightly longer startup time for tests where not many store paths are
    included in the image (these probably shouldn't use useNixStoreImage
    anyways).
  • Slightly longer startup time when inputs of VM do not change because
    the Nix store image is not cached anymore.

Remove the storeImage built with make-disk-image.nix. This produced a
separate derivation which is then cached in the binary cache. These
types of images should be avoided because they gunk up the cache as they
change frequently. Now all Nix store images, whether read-only or
writable are based on the erofs image previously only used for read-only
images.

Additionally, simplify the way the erofs image is built by copying the
paths to include to a separate directory and build the erofs image from
there.

Before this change, the list of Nix store paths to include in the Nix
store image was converted to a complex regex that excludes all other
paths from a potentially large Nix store.

This previous approach suffers from two issues:

  1. The regex is complex and, as admitted in the source code of the
    includes-to-excludes.py script, most likely contains at least one
    error. This means that it's unlikely that anyone will touch this
    piece of software again.

  2. When the Nix store image is built from a large Nix store (like when
    you build the VM script to run outside of any testing context) this
    regex becomes painfully slow. There is at least one prominent
    use-case where this matters: darwin.builder.

Benchmarking impressions:

  • Building Nix store via make-disk-image.nix takes ~25s
  • Building Nix store as an erofs image takes ~4s
  • Running nixosTests.qemu-vm-writable-store-image takes ~10s when
    building the erofs image with the regex vs ~14s when building by
    copying to a temporary directory.
  • nixosTests.gitlab which had the biggest gains from the initial erofs
    change takes the same time as before.
  • On a host with ~140k paths in /nix/store, building the erofs image
    with the regex takes 410s as opposed to 6s when copying to a temporary
    directory.
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Jul 4, 2023
@nikstur nikstur requested a review from emilazy July 4, 2023 00:33
@RaitoBezarius
Copy link
Member

Will push your stuff on the classical Hydra jobset.

@RaitoBezarius
Copy link
Member

Pushed 990f197 for baseline.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the regex doesn't scale too well to large stores.
You could probably make it faster still by adding support for multiple paths in mkfs.erofs, but I guess this is already an improvement.

nixos/tests/qemu-vm-writable-store-image.nix Outdated Show resolved Hide resolved
nixos/modules/virtualisation/qemu-vm.nix Outdated Show resolved Hide resolved
nixos/modules/virtualisation/qemu-vm.nix Outdated Show resolved Hide resolved
@nikstur nikstur force-pushed the qemu-vm-simplify-nix-store-image branch from 8101507 to 74b8ca1 Compare July 4, 2023 19:31
@nikstur
Copy link
Contributor Author

nikstur commented Jul 4, 2023

You could probably make it faster still by adding support for multiple paths in mkfs.erofs

I thought about that but then I couldn't find the patience for the Kernel mailing list and C.

It would be a lovely feature of mkfs.erofs though and I if it is ever implemented, I'll glady switch to it.

@RaitoBezarius
Copy link
Member

I thought about that but then I couldn't find the patience for the Kernel mailing list and C.

I can do it for you :P.

@nikstur
Copy link
Contributor Author

nikstur commented Jul 5, 2023

I think the baseline eval is done now: https://hydra.nixos.org/eval/1797268 Can you please push my changes?

@emilazy
Copy link
Member

emilazy commented Jul 5, 2023

Will test darwin.builder after Hydra gets to this :)

@RaitoBezarius
Copy link
Member

RaitoBezarius commented Jul 5, 2023 via email

@nikstur
Copy link
Contributor Author

nikstur commented Jul 5, 2023

I just realized, the commit you pushed as a baseline is super old. It's from July 2022. That's why I can't really rebase sensibly.

@RaitoBezarius
Copy link
Member

RaitoBezarius commented Jul 5, 2023 via email

@RaitoBezarius
Copy link
Member

Pushed 652411a

@RaitoBezarius
Copy link
Member

Pushed 74b8ca1.

Copy link
Member

@emilazy emilazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news: darwin.builder seems to work perfectly on x86_64-darwin with these change (modulo patching e2fsprogs to work around a bug fixed by the upcoming Darwin stdenv rework).

Less good news: Previously it instantly started the VM boot and went right into systemd; now there's ~20-25 additional seconds of building the Nix store image on every run. Presumably some regression here is unavoidable, but it's a little unfortunate. Not sure if making the mkfs.erofs change being discussed here would make a significant difference to that. It would be really nice if we could use overlay stores or something here to avoid building the image entirely, but I don't know if that's viable.

Wouldn't want to block this PR on that account as it seems like a good change in general, but the slowdown from not having a cached store is a little unfortunate and hopefully we can do something in the future to address that.

@mweinelt mweinelt added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Jul 10, 2023
@nikstur
Copy link
Contributor Author

nikstur commented Jul 10, 2023

Some thoughts/questions:

  • We could make the store persistent by writing it to some persistent path and if it exists we just use it (and do not recreate it). This would work the same as it currently works for the root disk (from virtualisation.diskImage). However, I'm not sure this is a super elegant design. I myself have already wasted hours (hopefully only a few) by not realising that I need to clean up old state (e.g. disk images) myself. This seems to be a common concern: nixos/qemu-vm: use cfg.host.pkgs #238596 (comment)

  • Ideally, we'd have an input-addressed caching mechanism that does not rely on derivations. Then we could cache the store image nicely in between invocations (since it is read-only).

  • How do people usually use darwin.builder? Is it normally started once when the system boots up or is it repeatedly called on-demand? In the first case, the 20s delay is probably fine, in the second case that gets annoying quickly.

  • For darwin.builder we could also use these options:

    virtualisation.useBootLoader = true;
    virtualisation.directBoot.enable = true;
    

    Then you have a full disk image that is persistent and you can reuse without any rebuilding. You also don't incur the penalty of using a boot loader because directBoot.enable should forgo the boot loader. Maybe this is a more sensible option for macos-builder/darwin.builder?

@roberth
Copy link
Member

roberth commented Jul 11, 2023

  • Ideally, we'd have an input-addressed caching mechanism that does not rely on derivations. Then we could cache the store image nicely in between invocations (since it is read-only).

That sounds like a massive opportunity for impurity. I wouldn't trust Nix to be reproducible enough while building something once after such a feature would be implemented.

  • How do people usually use darwin.builder?

I've reviewed it for the purpose of more easily bootstrapping a linux builder, so users don't have to mess with a custom VM for that. How people use it, or how people should use it is not something I can really answer.

darwin.builder

It's darwin.linux-builder now :)

virtualisation.directBoot.enable = true;

I currently use a custom direct boot on a mutable image (for an aarch64-linux hercules-ci-agent that I run on a mac). It has a problem in nixos-rebuild switch, probably related to changes to the kernel, although I'm a bit puzzled because NixOS does support that.

firewall-start[25615]: iptables: Failed to initialize nft: Protocol not supported

Could be a me problem though.

@nikstur
Copy link
Contributor Author

nikstur commented Jul 13, 2023

Hydra doesn't show any regressions: https://hydra.nixos.org/eval/1797393 Should we merge this or are there any reservations?

@peterwaller-arm

This comment was marked as resolved.

@roberth
Copy link
Member

roberth commented Jul 14, 2023

These do build for me:

Do you mean that you can substitute those packages' dependencies correctly, or that you've tested the builder?

If you're going to use an expression whose linux dependencies haven't been built by hydra yet, you're going to first need a working linux builder to build the dependencies of nikstur's new expression that's not in the cache yet.

@peterwaller-arm
Copy link

👍 Good point, thanks.

Copy link
Member

@RaitoBezarius RaitoBezarius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @roberth anything left on that PR or are you okay with me merging?

@drupol drupol changed the title nixos/qemu-vm: simplify building nix store image nixos/qemu-vm: simplify building nix store image Jul 26, 2023
@wegank wegank added 12.approvals: 2 This PR was reviewed and approved by two reputable people and removed 12.approvals: 1 This PR was reviewed and approved by one reputable person labels Sep 7, 2023
@wegank wegank added 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 2.status: merge conflict This PR has merge conflicts with the target branch labels Mar 19, 2024
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
@drupol
Copy link
Contributor

drupol commented Mar 27, 2024

LGTM, @roberth anything left on that PR or are you okay with me merging?

@roberth ^^

@roberth
Copy link
Member

roberth commented Mar 28, 2024

Other than perhaps giving my suggestion a try, no objections.

@nikstur nikstur force-pushed the qemu-vm-simplify-nix-store-image branch from 74b8ca1 to 4c3c6b8 Compare April 1, 2024 23:18
@nikstur
Copy link
Contributor Author

nikstur commented Apr 1, 2024

@ofborg test qemu-vm-writable-store-image

again pls

@ofborg ofborg bot removed 2.status: merge conflict This PR has merge conflicts with the target branch 10.rebuild-darwin: 1 labels Apr 1, 2024
@wegank wegank removed the 12.approvals: 2 This PR was reviewed and approved by two reputable people label Apr 2, 2024
@nikstur nikstur force-pushed the qemu-vm-simplify-nix-store-image branch from 4c3c6b8 to 2a2edce Compare April 2, 2024 23:19
@nikstur nikstur force-pushed the qemu-vm-simplify-nix-store-image branch from 2a2edce to 77fdc11 Compare April 17, 2024 20:01
@RaitoBezarius
Copy link
Member

As we are in the release cycle, I just want to give a heads up to staging and infrastructure people. Let's merge this in 48 - 72 hours if we have no comment by then.

@nikstur
Copy link
Contributor Author

nikstur commented May 1, 2024

@RaitoBezarius should we merge this then?

Summary of this change:

- Simplify code.
- Stop a disk image from being cached in the binary cache.
- Make erofs Nix Store image build in an acceptable time outside of
  testing environments (like `darwin.builder`).
- Do not regress on performance for tests that use many store paths in
  their Nix store image.
- Slightly longer startup time for tests where not many store paths are
  included in the image (these probably shouldn't use `useNixStoreImage`
  anyways).
- Slightly longer startup time when inputs of VM do not change because
  the Nix store image is not cached anymore.

Remove the `storeImage` built with make-disk-image.nix. This produced a
separate derivation which is then cached in the binary cache. These
types of images should be avoided because they gunk up the cache as they
change frequently. Now all Nix store images, whether read-only or
writable are based on the erofs image previously only used for read-only
images.

Additionally, simplify the way the erofs image is built by copying the
paths to include to a separate directory and build the erofs image from
there.

Before this change, the list of Nix store paths to include in the Nix
store image was converted to a complex regex that *excludes* all other
paths from a potentially large Nix store.

This previous approach suffers from two issues:

1. The regex is complex and, as admitted in the source code of the
   includes-to-excludes.py script, most likely contains at least one
   error. This means that it's unlikely that anyone will touch this
   piece of software again.

2. When the Nix store image is built from a large Nix store (like when
   you build the VM script to run outside of any testing context) this
   regex becomes painfully slow. There is at least one prominent
   use-case where this matters: `darwin.builder`.

Benchmarking impressions:

- Building Nix store via make-disk-image.nix takes ~25s
- Building Nix store as an erofs image takes ~4s
- Running nixosTests.qemu-vm-writable-store-image takes ~10s when
  building the erofs image with the regex vs ~14s when building by
  copying to a temporary directory.
- nixosTests.gitlab which had the biggest gains from the initial erofs
  change takes the same time as before.
- On a host with ~140k paths in /nix/store, building the erofs image
  with the regex takes 410s as opposed to 6s when copying to a temporary
  directory.
@nikstur nikstur force-pushed the qemu-vm-simplify-nix-store-image branch from 77fdc11 to 289dd22 Compare July 18, 2024 18:39
@nikstur
Copy link
Contributor Author

nikstur commented Jul 19, 2024

Pushed ff5889a as baseline

@nikstur
Copy link
Contributor Author

nikstur commented Jul 19, 2024

Pushed 289dd22

@nikstur
Copy link
Contributor Author

nikstur commented Jul 20, 2024

https://hydra.nixos.org/eval/1807756#tabs-now-fail

5 new tests succeeded.
3 new tests failed. All of them are intermittent errors that are not related to the changes. I re-tested the ones besides the Gitlab test locally. The Gitlab test succeeded on x86 now.

This is ready to merge.

@Lassulus Lassulus merged commit e209fc2 into NixOS:master Jul 20, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10
Projects
Development

Successfully merging this pull request may close these issues.

9 participants