TPM not found on some hardware with kernel 6.4.11 #1555

Closed
hervst opened this issue Aug 28, 2023 · 20 comments

@hervst

hervst commented Aug 28, 2023

Describe the bug

We have several bare metal installations of Fedora CoreOS running on Intel NUCs. Some of them use the "testing" stream, so they recently upgraded from 38.20230806.2.0 to 38.20230819.2.0. After the upgrade they fail to decrypt the LUKS-encrypted volume on boot. The error message is:

clevis-luks-askpass[XXX]: A TPM2 device with the in-kernel resource manager is needed!

When rolling back from 38.20230819.2.0 to 38.20230806.2.0 the system boots again without these issues.

Reinstalling Fedora CoreOS 38.20230819.2.0 on one of our bare metal devices with LUKS enabled also fails. The same Ignition file works fine with older Fedora CoreOS versions.

Reproduction steps

  1. Install Fedora CoreOS bare metal testing stream using 38.20230806.2.0 or earlier with LUKS (TPM2) encryption enabled
  2. Upgrade system to 38.20230819.2.0

Expected behavior

System should decrypt boot volume on start-up without any issues.

Actual behavior

System fails to decrypt boot volume on start-up.

System details

  • Fedora CoreOS Bare Metal Testing Stream
  • Upgrade from 38.20230806.2.0 to 38.20230819.2.0
  • Device is an Intel NUC BXNUC10i5FNKN2

Butane or Ignition config

variant: fcos
version: 1.4.0

passwd:
  groups:
    - name: cockpit
      system: false
  users:
    - name: $USERNAME
      uid: 1001
      system: false
      ssh_authorized_keys:
        - ssh-ed25519 $KEY $KEY_NAME
      password_hash: $PASSWORD_HASH
      groups:
        - sudo
        - wheel

boot_device:
  luks:
    tpm2: true

Additional information

Log while booting with version 38.20230806.2.0: fcosBootBeforeUpdate.txt
Log while booting with version 38.20230819.2.0: fcosBootAfterUpdate.txt

Please be aware that I manually typed the display outputs in both log files so there might be small typos.

Log while trying to install Fedora CoreOS 38.20230819.2.0 testing stream with LUKS enabled: rdsosreport.txt

@dustymabe
Member

First off I would like to say thank you for running testing. This is a great example of how you can help yourself and the community at the same time.

We have tests that cover LUKS encryption, but we rely heavily on virtual machines for those tests. I'm wondering if this is somehow specific to particular hardware (or even just specific to real hardware versus VMs).

Since I can't reproduce this with our existing test suite, could you help us figure out exactly which transition caused the regression? We have a testing-devel stream that is built daily (or sometimes multiple times a day). If you could walk that tree, we could find exactly which version introduced the regression and narrow in on the problematic software sooner. You can find those builds here. If you don't want to do a full install you can rebase to the individual versions. For example, to deploy an older version:

sudo rpm-ostree rebase fedora-compose:fedora/x86_64/coreos/testing-devel
sudo rpm-ostree deploy 38.20230806.20.0
sudo reboot 
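
Walking the tree does not have to be linear. The following is a minimal sketch of choosing the next build to try. It assumes you have written down the ordered list of testing-devel versions between the last known-good and the first known-bad build. `pick_midpoint` is a hypothetical helper, not part of rpm-ostree, and the `BUILD_*` names are placeholders for real version strings.

```shell
# pick_midpoint VERSION...
# Print the middle entry of an ordered list of builds that still need testing.
# With an even number of entries the earlier of the two middle builds is used.
pick_midpoint() {
    [ "$#" -gt 0 ] || return 1
    shift $(( ($# - 1) / 2 ))
    echo "$1"
}

# Placeholder list (hypothetical names); replace with real version strings.
pick_midpoint BUILD_1 BUILD_2 BUILD_3 BUILD_4 BUILD_5
```

Deploy the printed version with `sudo rpm-ostree deploy <version>` as shown above and reboot. If it boots, drop it and everything before it from the list; if it fails, drop everything after it. Repeat until the first failing build is identified.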

@miabbott
Member

miabbott commented Aug 28, 2023

Device is an Intel NUC BXNUC10i5FNKN2

The datasheet for that system appears to say that there is no TPM chip included... 🤔

https://www.intel.com/content/www/us/en/products/sku/214595/intel-nuc-10-performance-kit-nuc10i5fnkn/specifications.html

If there is no actual TPM2 device, then I don't think LUKS via TPM2 should ever work.

Might be worth confirming the presence (or absence) of a TPM chip on that system with:

$ dmesg | grep tpm
[    1.042976] tpm_tis IFX0785:00: 2.0 TPM (device-id 0x1B, rev-id 22)

@dustymabe
Member

The datasheet for that system appears to say that there is no TPM chip included... 🤔

If that's true what's weird is that @hervst reports it is currently working and then stops (i.e. existing systems have it working and then it ceases to work on upgrade).

@hervst
Author

hervst commented Aug 28, 2023

Thank you for the quick responses and help.

Some time ago I also read this specification and saw that, according to it, a TPM is not included. But when I looked into the BIOS I could disable the "Intel® Platform Trust Technology" (which I thought implements TPM2). Additionally, we have been using Fedora CoreOS with LUKS -> TPM2 encryption enabled for more than a year on these devices. Therefore I thought that this was just an error in the specification.

Executing

$ dmesg | grep tpm

unfortunately returned no output, so it really seems that a (physical) TPM2 chip is not included.

Additionally I tested every build of the testing-devel stream since "38.20230806.20.0". The last working version is "38.20230814.20.0". After deploying version "38.20230814.20.1" it stopped working.

@hervst
Author

hervst commented Aug 28, 2023

Maybe the problem is related to this: https://bugzilla.redhat.com/show_bug.cgi?id=2232888?

@dustymabe
Member

Additionally, we have been using Fedora CoreOS with LUKS -> TPM2 encryption enabled for more than a year on these devices. Therefore I thought that this was just an error in the specification.

Yeah, that is interesting. I honestly don't know enough about LUKS+TPM bound encryption to say whether this should or should not have been working.

Executing

$ dmesg | grep tpm

unfortunately returned no output, so it really seems that a TPM2 chip is not included.

Good to know.

Additionally I tested every build of the testing-devel stream since "38.20230806.20.0". The last working version is "38.20230814.20.0". After deploying version "38.20230814.20.1" it stopped working.

Thanks for doing that. This points pretty clearly at:

Upgraded:

    kernel 6.4.10-200.fc38.x86_64 → 6.4.11-200.fc38.x86_64
    kernel-core 6.4.10-200.fc38.x86_64 → 6.4.11-200.fc38.x86_64
    kernel-modules 6.4.10-200.fc38.x86_64 → 6.4.11-200.fc38.x86_64
    kernel-modules-core 6.4.10-200.fc38.x86_64 → 6.4.11-200.fc38.x86_64

Must be some sort of kernel regression. Could you try latest rawhide to see if the kernel-6.5.0-57.fc40 kernel still shows the problem?

Of course all of these recommendations here are for debugging. We're asking you to run development versions, which could cause instability on the systems. Try to run it on systems where backups of critical data have been made and it would be OK if they needed to be reprovisioned.

@dustymabe
Member

Maybe the problem is related to this: bugzilla.redhat.com/show_bug.cgi?id=2232888?

Looks promising!

@dustymabe
Member

According to this page, "Intel® Platform Trust Technology" is like a poor man's version of TPM. The page is for Dell servers, but I imagine the same applies here:

Some Dell systems do not ship with a TPM(Trusted Platform Module) module, and instead, use PTT (Platform Trust Technology). PTT is a lower-cost solution that supports the same functions of the TPM. From an OS perspective, there is very little difference between how TPM and PTT interact with bit locker.

@dustymabe
Member

@hervst - can you add your information to bz#2232888?

@hervst
Author

hervst commented Aug 28, 2023

Must be some sort of kernel regression. Could you try latest rawhide to see if the kernel-6.5.0-57.fc40 kernel still shows the problem?

I tried the latest "rawhide" version (40.20230828.91.0) but unfortunately the problem remains. For all my tests I used a separate and newly installed system, so data loss is not an issue.

I will create an account and add some information including a reference to bz#2232888 tomorrow in the morning (in around 12 hours).

Thank you again for your help.

dustymabe added the "meeting" label (topics for meetings) on Aug 29, 2023
@dustymabe
Member

We discussed this in the community meeting today.

13:27:16* dustymabe | #agreed We won't revert the kernel in our testing/next streams but
                    | we will be careful and not introduce this regression to our stable
                    | stream for the next round of releases. We will monitor the upstream
                    | discussion surrounding the bug and try to deliver a fixed kernel in
                    | the next set of testing/next releases.

If you are already affected by this and need your testing or next systems to keep running, you can rebase to the stable stream so that your systems continue to boot for now.
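
A minimal sketch of that rebase, assuming the stable ref follows the same naming pattern as the testing-devel ref used earlier in this thread and is available from the system's default remote. `stream_ref` is a hypothetical helper that only builds the ref name, and the architecture is hard-coded to x86_64 to match the hardware discussed here.

```shell
# stream_ref STREAM
# Print the ostree ref for one of the production streams.
stream_ref() {
    case "$1" in
        stable|testing|next)
            echo "fedora/x86_64/coreos/$1"
            ;;
        *)
            echo "unknown stream: $1" >&2
            return 1
            ;;
    esac
}

# Commands to run on the affected machine (left commented out on purpose):
#   sudo systemctl stop zincati.service          # pause automatic updates
#   sudo rpm-ostree rebase "$(stream_ref stable)"
#   sudo systemctl reboot
```

Stopping Zincati first is meant to keep the automatic updater from staging another deployment while the rebase is in progress. Once a fixed kernel reaches the original stream, the same commands with `testing` or `next` switch the machine back.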

@dustymabe
Member

Another BZ about the same issue: BZ#2235100

@dustymabe
Member

Proposed fix in kernel-6.4.15-200.fc38

dustymabe changed the title from "LUKS Encryption fails after upgrade to 38.20230819.2.0 testing stream on bare metal installation" to "TPM not found on some hardware with kernel 6.4.11" on Sep 7, 2023
dustymabe added the labels "status/pending-testing-release" (Fixed upstream. Waiting on a testing release.) and "status/pending-next-release" (Fixed upstream. Waiting on a next release.) on Sep 7, 2023
@dustymabe
Member

@hervst can you test with the latest testing-devel to see if this issue is resolved?

@hervst
Author

hervst commented Sep 8, 2023

The latest version of the testing-devel stream with kernel version Linux 6.4.15-200.fc38.x86_64 works fine. Should I close the issue or will you close it as soon as the new kernel version is available on the testing stream?

Thank you for your work and the kernel freeze on stable stream.
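
For a fleet of machines it can help to classify the running kernel against the versions named in this thread. The sketch below is only that: it assumes every Fedora 38 6.4.x build from 6.4.11 up to, but not including, 6.4.15 behaves like 6.4.11, which the thread does not confirm for the builds in between. It deliberately reports `unknown` for anything outside the 6.4 series, since rawhide's 6.5.0-57.fc40 was also reported as failing. `version_ge` and `kernel_status` are hypothetical helper names, and `sort -V` requires GNU coreutils.

```shell
# version_ge A B: succeed when dotted version A sorts at or after B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# kernel_status KERNEL_RELEASE
# Classify a release string such as 6.4.11-200.fc38.x86_64.
kernel_status() {
    v=${1%%-*}    # keep only the upstream version, e.g. 6.4.11
    case "$v" in
        6.4.*) ;;
        *) echo unknown; return 0 ;;
    esac
    if ! version_ge "$v" 6.4.11; then
        echo before-regression
    elif ! version_ge "$v" 6.4.15; then
        echo affected
    else
        echo fixed
    fi
}

kernel_status "$(uname -r)"
```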

@dustymabe
Member

The latest version of the testing-devel stream with kernel version Linux 6.4.15-200.fc38.x86_64 works fine.

Awesome! Any chance you could add positive feedback to the Bodhi update in Fedora to help promote the package to stable in the main Fedora repositories?

Should I close the issue or will you close it as soon as the new kernel version is available on the testing stream?

We'll close out the issue when it reaches the testing and next streams. Thanks for the offer, though.

Thank you for your work and the kernel freeze on stable stream.

No problem. Having more engagement from people like you will help us stay stable for everyone. Running the testing and next streams is one good way (thank you). Another is that we have a test day/week twice a year to help with some more dedicated testing. The next one is coming up in a few weeks, if you'd like to join.

@hervst
Author

hervst commented Sep 11, 2023

Any chance you could add positive feedback to the Bodhi update in Fedora to help promote the package to stable in the main Fedora repositories?

Positive feedback is added.

Another is that we have a test day/week twice a year to help some more dedicated testing. The next one is #1565, if you'd like to join.

Unfortunately joining the upcoming test week is not possible for me. But I will check if we can run and monitor an additional device using the next stream with daily updates and some containers running so we can provide feedback in case we detect an issue.

@dustymabe
Member

The fix for this went into next stream release 38.20230902.1.1. Please try out the new release and report issues.

@dustymabe
Member

The fix for this went into testing stream release 38.20230902.2.1. Please try out the new release and report issues.

dustymabe removed the labels "status/pending-testing-release" (Fixed upstream. Waiting on a testing release.) and "status/pending-next-release" (Fixed upstream. Waiting on a next release.) on Sep 11, 2023
@dustymabe
Member

This issue never affected our stable stream.
