Storage issues on XPS 15 9560 due to kernel regression; failing cryptsetup #253418

@lf-

Several people have reported storage issues on this one laptop model, so I am filing a bug. Discourse thread: https://discourse.nixos.org/t/nvme-drive-not-detecting-after-calameres-initiates/32108/

Live log of my debugging session: https://matrix.to/#/!DBFhtjpqmJNENpLDOv:nixos.org/$9zWPOBSxYIou5aEGILjwnDFgEJLuIi7JTuUUo7iQkyw?via=nixos.org&via=matrix.org&via=tchncs.de

Results of testing on my machine yesterday:
Kernels I believe are ok: 6.4.9, 6.1.44
Kernels I believe are bad: 6.5.0, 6.5.1, 6.1.51, 6.1.49(?)
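
A minimal sketch for checking a kernel version against the tentative lists above. The function name `kernel_status` and the labels it prints are hypothetical, and the lists only reflect my own testing so far, so treat the result as a hint rather than a verdict.

```shell
# Hypothetical helper: classify a kernel version using the tentative lists above.
kernel_status() {
  case "$1" in
    6.4.9|6.1.44)       echo ok ;;        # believed good
    6.5.0|6.5.1|6.1.51) echo bad ;;       # believed bad
    6.1.49)             echo suspect ;;   # marked with "(?)" above
    *)                  echo untested ;;  # no data for this version
  esac
}

# Usage on a running system: kernel_status "$(uname -r)"
```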

Debug log of a failing boot: https://gist.github.com/lf-/58fb6bfd13e4f3d09d8e2c39b279b46a

Describe the bug

Various storage-not-detected and I/O error symptoms on the XPS 15 9560's internal SSD. The SSD model does not seem to matter much: as far as I can tell, some people hit this with the original factory SSD.

Reproduced on this aftermarket drive (the `fr` field below is the firmware revision):

$ nvme id-ctrl /dev/nvme0
<snip>
mn        : WD Blue SN570 1TB
fr        : 234110WD
rab       : 4
ieee      : 001b44
cmic      : 0
mdts      : 7
cntlid    : 0
ver       : 0x10400
rtd3r     : 0x7a120
rtd3e     : 0xf4240
oaes      : 0x200
ctratt    : 0x2
rrls      : 0
cntrltype : 1
<snip>
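
For anyone comparing drives across reports, here is a minimal sketch that pulls a single field such as `mn` (model) or `fr` (firmware revision) out of this output. The helper name `nvme_field` is hypothetical, and it assumes the `key : value` layout shown above.

```shell
# Hypothetical helper: print the value of one field from `nvme id-ctrl` text on stdin.
nvme_field() {
  # $1 is the field name, e.g. "mn" or "fr".
  awk -F':' -v key="$1" '
    {
      name = $1
      gsub(/[ \t]+/, "", name)               # strip padding around the field name
      if (name == key) {
        value = substr($0, index($0, ":") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim whitespace around the value
        print value
        exit
      }
    }'
}

# Usage (needs nvme-cli and usually root): nvme id-ctrl /dev/nvme0 | nvme_field fr
```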

Most relevant part of failed boot log:

Sep 04 11:56:38 localhost kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Sep 04 11:56:38 localhost kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Sep 04 11:56:38 localhost kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Sep 04 11:56:38 localhost kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Sep 04 11:56:38 localhost kernel: nvme nvme0: Disabling device after reset failure: -19
Sep 04 11:56:38 localhost systemd-cryptsetup[169]: Device /dev/disk/by-uuid/b80aedf8-ddd4-46fa-8d09-5215d5f286b9 READ lock released.
Sep 04 11:56:38 localhost systemd-cryptsetup[169]: IO error while decrypting keyslot.
Sep 04 11:56:38 localhost systemd-cryptsetup[169]: Keyslot 0 (luks2) open failed with -5.
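
To check whether another boot log shows the same failure, a rough sketch like the following could be used. The function name `has_nvme_reset_failure` and the file name in the usage line are hypothetical, and the pattern only matches the two kernel messages quoted above, so other symptoms of the same bug would be missed.

```shell
# Hypothetical helper: succeed if the log text on stdin contains the
# NVMe reset-failure messages quoted in this report.
has_nvme_reset_failure() {
  grep -Eq 'nvme nvme[0-9]+: (controller is down; will reset|Disabling device after reset failure)'
}

# Usage with a saved log file (file name is an example):
#   has_nvme_reset_failure < failed-boot.log && echo "matches this bug"
```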

I have not yet tried the workaround suggested in the kernel message above (`nvme_core.default_ps_max_latency_us=0 pcie_aspm=off`). I might try it on NixCon hacking day when I don't need my computer to work.
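
For anyone who does try it: on NixOS, kernel parameters like these are normally set through the `boot.kernelParams` option. The sketch below only verifies that both parameters actually reached the kernel command line after a rebuild and reboot. The function name `missing_workaround_params` is hypothetical, and whether the parameters fix this bug is untested.

```shell
# Hypothetical helper: print each workaround parameter (taken from the kernel
# message) that is absent from the given kernel command line string.
missing_workaround_params() {
  cmdline="$1"
  for param in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off; do
    case " $cmdline " in
      *" $param "*) ;;                  # present as a whole word
      *) printf '%s\n' "$param" ;;      # not found: report it
    esac
  done
}

# Usage on a running system: missing_workaround_params "$(cat /proc/cmdline)"
```

Empty output means both parameters are active for the current boot.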

Steps To Reproduce

  1. Use bad kernel
  2. ???
  3. Suffering

Expected behavior

Disk works, system boots.

Additional context

This is a regression between nixos-unstable revisions ce5e4a6 and 3efb0f6, which I have traced to the kernel version change. The systemd and cryptsetup versions are identical across both revisions.

Notify maintainers

@TredwellGit @Ma27 @NeQuissimus @alyssais @thoughtpolice

Metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 6.4.9, NixOS, 23.11 (Tapir), 23.11.20230902.e569908`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.17.0`
 - nixpkgs: `/etc/nix/inputs/nixpkgs`
