Conversation

@lf-
Member

@lf- lf- commented Sep 18, 2023

Description of changes

Fixes #253418

I have personally tested kernel 6.1 and confirmed that my machine now boots
again (yay!). I will set off the rest of them building overnight but may not
get around to actually validating on hardware very soon.

This is the regression this patches around:
https://bugzilla.kernel.org/show_bug.cgi?id=217802

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added the 6.topic: kernel The Linux kernel label Sep 18, 2023
@ofborg ofborg bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches. 10.rebuild-linux: 1001-2500 This PR causes many rebuilds on Linux and should target the staging branches. labels Sep 18, 2023
@alyssais
Member

Inclined to accept this since OpenSUSE has also reverted, but I'm a bit nervous that this was reported a month ago and still nothing has happened upstream AFAICT?

@lf-
Member Author

lf- commented Sep 18, 2023

> Inclined to accept this since OpenSUSE has also reverted, but I'm a bit nervous that this was reported a month ago and still nothing has happened upstream AFAICT?

I think that one of their hands isn't talking to the other, probably because lkml is a pain so the users and packagers didn't write in the thread. They could probably use a message reiterating that opensuse reverted downstream, including the bugzilla link, and that it is not just one weird hardware combo and happens to all copies of that laptop model.

@alyssais
Member

> > Inclined to accept this since OpenSUSE has also reverted, but I'm a bit nervous that this was reported a month ago and still nothing has happened upstream AFAICT?
>
> I think that one of their hands isn't talking to the other, probably because lkml is a pain so the users and packagers didn't write in the thread. They could probably use a message reiterating that opensuse reverted downstream, including the bugzilla link, and that it is not just one weird hardware combo and happens to all copies of that laptop model.

How about submitting a revert patch upstream, pointing this out, and linking to the previous LKML and Bugzilla discussion?

@lf-
Member Author

lf- commented Sep 18, 2023

> > > Inclined to accept this since OpenSUSE has also reverted, but I'm a bit nervous that this was reported a month ago and still nothing has happened upstream AFAICT?
> >
> > I think that one of their hands isn't talking to the other, probably because lkml is a pain so the users and packagers didn't write in the thread. They could probably use a message reiterating that opensuse reverted downstream, including the bugzilla link, and that it is not just one weird hardware combo and happens to all copies of that laptop model.
>
> How about submitting a revert patch upstream, pointing this out, and linking to the previous LKML and Bugzilla discussion?

I think I am hilariously more able to use git-send-email than I am able to practically send replies to the list so I may well do that today.

@alyssais
Member

> I think I am hilariously more able to use git-send-email than I am able to practically send replies to the list so I may well do that today.

(You can also use git-send-email to send non-patch replies)

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Sep 18, 2023
This reverts commit 101bd90.

This commit causes the NVMe controller to not work on the Dell XPS 15
9560, and similar laptop models. It appears to happen with any SSD
model.

This commit is broken on 6.1, 6.4, 6.5, and 6.6-rc1.

OpenSUSE has already reverted, and I have submitted a revert to NixOS.
As far as I can tell, this regression has fallen through the cracks.

Symptom:

kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
kernel: nvme nvme0: Disabling device after reset failure: -19
systemd-cryptsetup[169]: Device /dev/disk/by-uuid/b80aedf8-ddd4-46fa-8d09-5215d5f286b9 READ lock released.
systemd-cryptsetup[169]: IO error while decrypting keyslot.
systemd-cryptsetup[169]: Keyslot 0 (luks2) open failed with -5.
systemd-cryptsetup[169]: Keyslot open failed.
systemd-cryptsetup[169]: Failed to activate with specified passphrase: Input/output error

There are several downstream bugs, these are the ones I know of:
- https://bugzilla.suse.com/show_bug.cgi?id=1214428
- NixOS/nixpkgs#253418
- https://bugs.archlinux.org/task/79439#comment221866

Upstream revert links:
- openSUSE/kernel-source@1b02b15
- NixOS/nixpkgs#255824

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217802
Reported-and-bisected-by: Gene <geneslists@sapience.com>
Link: https://lore.kernel.org/lkml/30b69186-5a6e-4f53-b24c-2221926fc3b4@sapience.com/
Signed-off-by: Jade Lovelace <lists@jade.fyi>
@lf-
Member Author

lf- commented Sep 18, 2023

Alright, I have sent off a patch into this very specific ether. You have been cc'd :)

Member

@alyssais alyssais left a comment

Let's hope upstream reverts soon. I'll give my fellow kernel maintainers a short final chance to object.

  # Reverts the buggy commit causing https://bugzilla.kernel.org/show_bug.cgi?id=217802
  dell_xps_regression = {
    name = "dell_xps_regression";
    patch = fetchpatch {
Member

For next time, don't worry about this now unless you want to:

Suggested change
-    patch = fetchpatch {
+    patch = fetchurl {

You're fetching a static file, so there's no need for normalisation and stuff. (And it's best to avoid that where possible as what "normalised" means depends on a specific version of patchutils.)

Alternatively, if there aren't any conflicts, you could fetchpatch the original commit and pass revert = true;, which is nice and declarative.
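
For illustration, the two alternatives suggested in this review comment might look roughly like the following. This is only a sketch and not what was merged: the placeholder commit id, both URLs, the `dell_xps_regression_static` attribute name, and the use of `lib.fakeHash` are assumptions that would need to be replaced with real values.

```nix
# Sketch only: the commit id, URLs, and hashes below are placeholders,
# not values taken from this PR.
{ lib, fetchpatch, fetchurl }:

let
  # Hypothetical placeholder; substitute the full id of the commit being reverted.
  offendingCommit = "<full-commit-id>";
in
{
  # Variant 1: fetch the original commit and let fetchpatch reverse it.
  dell_xps_regression = {
    name = "dell_xps_regression";
    patch = fetchpatch {
      name = "dell_xps_regression.patch";
      # Assumed URL shape (cgit "patch" view of the mainline tree).
      url = "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=${offendingCommit}";
      revert = true;
      hash = lib.fakeHash; # replace with the hash Nix reports on the first build
    };
  };

  # Variant 2: fetch a static, already-reverted patch file verbatim,
  # skipping fetchpatch's normalisation step.
  dell_xps_regression_static = {
    name = "dell_xps_regression";
    patch = fetchurl {
      # Hypothetical URL; point this at the static revert patch.
      url = "https://example.org/revert-nvme-regression.patch";
      hash = lib.fakeHash; # replace with the real hash
    };
  };
}
```

Variant 1 only works if the original commit reverses cleanly on every kernel branch it is applied to, which is the "if there aren't any conflicts" caveat above.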

@delroth delroth added 12.approvals: 1 This PR was reviewed and approved by one person. 12.approved-by: package-maintainer This PR was reviewed and approved by a maintainer listed in any of the changed packages. labels Sep 19, 2023
@alyssais alyssais merged commit 12650cd into NixOS:master Sep 22, 2023
@Ma27
Member

Ma27 commented Sep 22, 2023

ftr I meant to leave a comment before merging, but given that OpenSUSE did the same and this fixes an actual case of a machine not booting up, I'm perfectly fine with this.

Development

Successfully merging this pull request may close these issues.

Storage issues on XPS 15 9560 due to kernel regression; failing cryptsetup