Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removes zfs from only the "new_kernel" installer variants #59863

Closed
wants to merge 2 commits into from

Conversation

samueldr
Copy link
Member

Fixes #58959.

This was tested using this patch to neuter the zfs kernel module.

nix-build nixos/release-combined.nix -A nixos.iso_minimal_new_kernel.x86_64-linux

and on aarch64

nix-build nixos/release-combined.nix -A nixos.iso_minimal_new_kernel.aarch64-linux

I wasn't able to think of an elegant way to reduce the duplication of the list of supported filesystems. Referencing to a configured value within the module system is an infinite recursion. The only way I thought of is a bit ugly, is through having the list in a .nix file which would be imported (and filtered as needed). This would have made one bizarro file which would live within the other module files, but couldn't be added to imports.

Any inputs on ways to remove the mostly duplicated list?

This is because of the phases where zfs support is not yet
available for the new kernel, forcing "new_kernel" variants not to be
built. This might be affecting aarch64-linux more often than
x86_64-linux.

The x86_64-linux image is seldom used, most likely uses are for
extremely new hardware exhibiting issues on the current LTS. The
aarch64-linux image is more likely to be used considering how board
support is always improving. It was the previous default on
aarch64-linux.
@@ -49,7 +49,7 @@
];

# Include support for various filesystems.
boot.supportedFilesystems = [ "btrfs" "reiserfs" "vfat" "f2fs" "xfs" "zfs" "ntfs" "cifs" ];
boot.supportedFilesystems = lib.mkDefault [ "btrfs" "reiserfs" "vfat" "f2fs" "xfs" "zfs" "ntfs" "cifs" ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could condition including zfs based on the kernel version from config.boot.kernelPackages

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ seems interesting for sure. I'm a bit concerned about using mkDefault. Would users who have already added an additional FS find their system unbootable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Looks like it can't be done through comparing config.boot.kernelPackages.kernel.version; infinite recursion, which I think is coming from the zfs task, which also looks at supportedFilesystems, and config.boot.kernelPackages. Though I'm unsure about the diagnosis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the trace with --show-trace?

Copy link
Member Author

@samueldr samueldr Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is:

Trace
error: while evaluating anonymous function at /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/release-combined.nix:20:34, called from undefined position:
while evaluating 'removeMaintainers' at /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/release-combined.nix:17:23, called from /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/release-combined.nix:20:37:
while evaluating anonymous function at /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/release.nix:146:32, called from /Users/samuel/tmp/nixpkgs/nixpkgs/lib/attrsets.nix:292:43:
while evaluating 'makeIso' at /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/release.nix:42:5, called from /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/release.nix:146:40:
while evaluating 'hydraJob' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/customisation.nix:157:14, called from /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/release.nix:46:5:
while evaluating the attribute 'drvPath' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/customisation.nix:174:13:
while evaluating the attribute 'drvPath' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/customisation.nix:141:13:
while evaluating the attribute 'sources' of the derivation 'nixos-minimal-19.09pre56789.gfedcba-x86_64-linux.iso' at /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/lib/make-iso9660-image.nix:49:3:
while evaluating anonymous function at /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/lib/make-iso9660-image.nix:56:18, called from undefined position:
while evaluating the attribute 'source' at /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/modules/installer/cd-dvd/iso-image.nix:585:11:
while evaluating the attribute 'boot.kernelPackages.kernel' at undefined position:
while evaluating anonymous function at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:75:45, called from undefined position:
while evaluating the attribute 'value' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:338:9:
while evaluating the option `boot.kernelPackages':
while evaluating the attribute 'isDefined' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:375:5:
while evaluating the attribute 'values' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:364:9:
while evaluating the attribute 'values' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:458:7:
while evaluating anonymous function at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:350:28, called from /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:350:17:
while evaluating 'dischargeProperties' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:417:25, called from /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:351:62:
while evaluating the attribute 'condition' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:502:14:
while evaluating the attribute 'condition' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:502:14:
while evaluating the attribute 'condition' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:502:14:
while evaluating the attribute 'boot.supportedFilesystems' at undefined position:
while evaluating anonymous function at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:75:45, called from undefined position:
while evaluating the attribute 'value' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:338:9:
while evaluating the option `boot.supportedFilesystems':
while evaluating the attribute 'isDefined' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:375:5:
while evaluating the attribute 'values' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:364:9:
while evaluating the attribute 'values' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:458:7:
while evaluating anonymous function at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:350:28, called from /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:350:17:
while evaluating 'dischargeProperties' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:417:25, called from /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:351:62:
while evaluating the attribute 'value' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:234:44:
while evaluating 'optional' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/lists.nix:241:20, called from /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/modules/profiles/base.nix:64:6:
while evaluating 'versionOlder' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/strings.nix:461:22, called from /Users/samuel/tmp/nixpkgs/nixpkgs/nixos/modules/profiles/base.nix:66:5:
while evaluating the attribute 'boot.kernelPackages' at undefined position:
while evaluating anonymous function at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:75:45, called from undefined position:
while evaluating the attribute 'value' at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:338:9:
infinite recursion encountered, at /Users/samuel/tmp/nixpkgs/nixpkgs/lib/modules.nix:338:9

(feel free to ping me on IRC if you have some time for me to pick your brain :))

Copy link
Member

@fpletz fpletz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove zfs on these images we should either remove it from all install mediums or not at all since this is an invisible difference only waiting for users to trip on it and start getting confused.

Another option would be to always use the latest Linux kernel compatible to zfs.

@samueldr
Copy link
Member Author

samueldr commented Sep 28, 2019

If we remove zfs on these images we should either remove it from all install mediums or not at all since this is an invisible difference only waiting for users to trip on it and start getting confused.

I agree, this is a reason why this hasn't progressed further yet.

Another option would be to always use the latest Linux kernel compatible to zfs.

Sometimes they get EOL'd before the next is compatible. This means that we may have "new kernel" pointing to the LTS (e.g. 4.19 last not EOL'd version) sporadically.

This not something that is exactly under our control.

We really are between a rock and a hard place here, the kernel advancing, and zfs not being ready for stable yet for weeks at a time.


We may want to review the policy on including zfs on all installer media. We may want to make it only on the media we can support adding zfs to, meaning LTS.

While this may be confusing at first, we could be doing this by having those media:

  • installer_lts-kernel[_zfs]
  • installer_latest-kernel[_zfs]

Always specifying the kernel type in the media name, and always appending _zfs as a "feature" flag.

The only published variants would be:

  • installer_lts-kernel_zfs
  • installer_latest-kernel

This would at least reduce confusion about the features of the available media, by making it something that is described in their names. Thus, removing the "invisible" difference in features.

This example skipped over the _graphical feature flag that should also be considered the same.

  • installer_lts-kernel_zfs_graphical

@stale
Copy link

stale bot commented Jun 1, 2020

Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the
    related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse. 3. Ask on the #nixos channel on
    irc.freenode.net.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 1, 2020
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 3, 2020
@aristidb
Copy link
Contributor

How frequent is it actually that a kernel is EOL'd before ZFSonLinux adds support for it? And is it really completely unacceptable to use an EOL'd kernel?

@stale
Copy link

stale bot commented Jun 4, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 4, 2021
@domenkozar
Copy link
Member

domenkozar commented Aug 2, 2021

As this is open for 2+ years, it seems like we won't make this perfect but it's better to have a live image with the latest kernel without ZFS support rather than none at all?

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Aug 2, 2021
@samueldr
Copy link
Member Author

samueldr commented Aug 4, 2021

I guess we could really go with the plan I previously proposed:

And deprecate the "unqualified" iso names. We could go with that scheme without producing all variants. E.g. produce only "lts+zfs", but (try to) produce "latest+zfs", and produce "latest+none".

@domenkozar
Copy link
Member

I guess we could really go with the plan I previously proposed:

And deprecate the "unqualified" iso names. We could go with that scheme without producing all variants. E.g. produce only "lts+zfs", but (try to) produce "latest+zfs", and produce "latest+none".

Isn't that identical to having only latest that includes zfs if zfs.meta.isBroken == false?

If we wanted to improve UX it would print that ZFS is disabled, but that seems like a separate step.

@samueldr
Copy link
Member Author

samueldr commented Aug 4, 2021

Do we know that zfs is broken at eval time?

I haven't checked, maybe we have a meta.broken set to the last known good major kernel version to build zfs.

Though there is a fundamental UX difference in always producing the unstable image without zfs, and when possible producing the one with zfs: consistency. If the "same" iso image is produced sometimes with, sometimes without, it will be harder for end-users to really know whether the current unstable iso has zfs support. If instead there is always the "unstable without zfs support" variant, and sometimes the additional "unstable with zfs" variant isn't built, the messaging stays more consistent.

So even if, at eval time, we can know zfs support is present or not, we shouldn't arbitrarily change the supported features of the published unstable iso.

Does that make sense?

@domenkozar
Copy link
Member

Do we know that zfs is broken at eval time?

I haven't checked, maybe we have a meta.broken set to the last known good major kernel version to build zfs.

No way to ensure that, but ZHF does end up marking it as broken.

Though there is a fundamental UX difference in always producing the unstable image without zfs, and when possible producing the one with zfs: consistency. If the "same" iso image is produced sometimes with, sometimes without, it will be harder for end-users to really know whether the current unstable iso has zfs support. If instead there is always the "unstable without zfs support" variant, and sometimes the additional "unstable with zfs" variant isn't built, the messaging stays more consistent.

So even if, at eval time, we can know zfs support is present or not, we shouldn't arbitrarily change the supported features of the published unstable iso.

Does that make sense?

Makes sense, although the expectation that you need to use another image for zfs is an UX issue itself (and the fact that currently users are used to zfs being part of latest kernel) so there's no ideal solution.

Either way, whatever gets us this fixed sounds good to me.

@domenkozar
Copy link
Member

We're losing users due to this bug: https://twitter.com/KirinDave/status/1439053837914968068

@vcunat
Copy link
Member

vcunat commented Sep 20, 2021

Can you point out to me what's the breakage in his case? I see lots of green for new_kernel ISOs: https://hydra.nixos.org/job/nixos/trunk-combined/nixos.iso_minimal_new_kernel.x86_64-linux

but they also wrote

I'm not confident in setting up NixOS without a gui

which would mean graphical ISO (right?) but we don't even have those with new_kernel and they're channel-blocking 🤷🏽

@vcunat
Copy link
Member

vcunat commented Sep 20, 2021

Ah, perhaps you meant the fact that we don't have new_kernel ISOs on 21.05 anymore due to ZFS? (combination mentioned in Graham's blog post; it's meta.broken)

@vcunat
Copy link
Member

vcunat commented Sep 20, 2021

The current code knows compatible version combinations and sets meta.broken based on that. I believe that should work automatically on kernel updates, etc... so perhaps include it in default based on that? (as suggested by SamuelDR)

@vcunat
Copy link
Member

vcunat commented Sep 20, 2021

The only other way I can see (that haven't been mentioned) is to use kernel that's not latest upstream but a bit older which hopefully has such issues resolved (also nvidia is often reported). But that seems too complicated to me; arguably I don't personally care for those cases prone to breaking.

@domenkozar
Copy link
Member

I don't see why ZFS should get special treatment, if it doesn't work on the latest kernel it should just be disabled. There are other uses of an OS than using only ZFS and latest kernel is important for newer hardware - laptops.

We could have another image with zfs and latest compatible kernel, but that's really another issue beyond fixing new_kernel.

@vcunat
Copy link
Member

vcunat commented Sep 20, 2021

Well, I personally don't mind removing ZFS from new_kernel ISOs (unconditionally). Using meta.broken just seemed like a cheap compromise.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/package-zfs-kernel-2-0-5-5-14-5-is-marked-as-broken-refusing-to-evaluate/15139/11

@vcunat vcunat mentioned this pull request Oct 2, 2021
@vcunat
Copy link
Member

vcunat commented Oct 2, 2021

Another option not mentioned above would be to use zfs.latestCompatibleLinuxPackages (which exists already), i.e. for new_kernel images use possibly a slightly older kernel.

Yes, it would be a special treatment for ZFS. I don't use ZFS myself, but I can understand it's hard to install with ZFS on newer HW without some image like this – such argument does not hold in other cases like nvidia drivers AFAIK.

@samueldr
Copy link
Member Author

samueldr commented Oct 2, 2021

I will leave my current thoughts about a solution:

Re-think image generation with "features"; we currently have two features:

  • new_kernel
  • zfs

We do not need to build all features combinations. Adding features should be done only when absolutely needed.

A generation needs the following two "feature sets" to pass to advance:

  • [ zfs ]
  • [ new_kernel ]

And these installer image feature sets will be tried to be built (exhaustively):

  • [ zfs ]
  • [ new_kernel ]
  • [ new_kernel zfs ]

The website can only directly link to the iso images with feature sets required for advancing. (But we don't necessarily need to link to the new kernel image directly.)

The website could link to a generated page (generated by the channels script, which already copies the iso) with a list all of the built images. Images not built could (should?) provide a short blurb explaining why they aren't built (do we have that info from the Hydra build attempt, or eval?). (This generated list of images could also include other more "exotic" images like aarch64 iso and sd cards in the future.)

As for naming, nixos-XX.YY-${concatStringsSep "-" features}.iso (and similar for SD card images).

Anything missing? Anything unclear?


Oops, should have read the previous comment. newest_zfs_kernel could be a feature too, if deemed desirable. And rather than [ new_kernel zfs ], we would build [ newest_zfs_kernel zfs ] images.

@stale
Copy link

stale bot commented Apr 16, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Apr 16, 2022
@samueldr
Copy link
Member Author

I give up, no one else is getting sniped into fixing this. I took the time to fix this.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Dec 23, 2022
@samueldr samueldr closed this Dec 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
21.11 Blockers
In progress
Development

Successfully merging this pull request may close these issues.

NixOS image with latest kernel & ZFS Support
10 participants