cc_grub_dpkg updates grub-pc or grub-efi debconf keys, but both can become incorrect on BIOS-booted Azure Ubuntu #4091

Open
ubuntu-server-builder opened this issue May 12, 2023 · 6 comments
Labels
incomplete Action required by submitter launchpad Migrated from Launchpad

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #2013419

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2023-03-31T01:52:18.234876+00:00
date_fix_committed = 2023-04-05T21:35:36.231101+00:00
date_fix_released = 2023-04-05T21:35:36.231101+00:00
id = 2013419
importance = undecided
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/2013419
milestone = None
owner = r-asiebert
owner_name = Adrien Siebert
private = False
status = incomplete
submitter = r-asiebert
submitter_name = Adrien Siebert
tags = []
duplicates = []

Launchpad user Adrien Siebert(r-asiebert) wrote on 2023-03-31T01:52:18.234876+00:00

Platform: Azure (generation 1 VMs)
Image used: Ubuntu Server, SKU 20_04-lts (gen 1)
(cloud-init 22.4.2-0ubuntu0~20.04.2, no cloud-init customization)

Azure generation 1 VMs boot in BIOS mode. Ubuntu comes with both BIOS and UEFI support installed, and cloud-init updates some debconf keys, presumably to avoid mismatches when boot packages are updated on new machines:
https://github.com/canonical/cloud-init/blob/ubuntu/22.4.2-0ubuntu0_20.04.2/cloudinit/config/cc_grub_dpkg.py#L148-L149

Even when the VM is booted in BIOS mode, updating EFI packages (e.g. grub-efi-amd64-signed or shim-signed) causes the grub-efi/install_devices debconf key to be updated.
If the ID of the disk where GRUB is installed changes (see the scenario below), cloud-init only updates the grub-pc debconf keys (link above). The mismatched grub-efi key can then make later EFI package upgrades fail. A user with a shell then has to answer a dpkg configuration prompt.

[scenario]
Sample scenario where we encountered this issue, using Packer to build a custom VM image:

  • Packer creates a BIOS VM from the base Ubuntu 20.04 image (gen 1).
  • cloud-init updates the grub-pc key:
2023-02-26 08:40:19,507 - cc_grub_dpkg.py[DEBUG]: Setting grub debconf-set-selections with '/dev/disk/by-id/scsi-14d534654202020204f19e7ec574d624f9e27ff405f501bc0','false'
  • Packages get upgraded. Upgrades to EFI packages result in "Installing grub to /boot/efi" (dpkg logs), and the grub-efi/install_devices debconf key is set to point at the Packer VM disk.
  • Customized VM gets saved by Packer as an image.

...

  • Later, we spin up gen 1 (BIOS) VMs from that image. Its root disk has its own serial ID.
    (GRUB partition = scsi-14d53465420202020da118904a05ed740b387a530ae506ac2-part15)
  • cloud-init updates the grub-pc key:
2023-03-07 00:25:44,780 - cc_grub_dpkg.py[DEBUG]: Setting grub debconf-set-selections with '/dev/disk/by-id/scsi-14d534654202020200e6290db5a56ef43ba1f16eef596d653','false'
  • Later, a headless apt upgrade breaks:
Setting up shim-signed (1.40.9+15.7-0ubuntu1) ...
mount: /var/lib/grub/esp: special device /dev/disk/by-id/scsi-14d534654202020204f19e7ec574d624f9e27ff405f501bc0-part15 does not exist.
# debconf-show grub-pc | egrep "grub-efi/install_devices:|grub-pc/install_devices:"
* grub-efi/install_devices: /dev/disk/by-id/scsi-14d534654202020204f19e7ec574d624f9e27ff405f501bc0-part15
* grub-pc/install_devices: /dev/disk/by-id/scsi-14d534654202020200e6290db5a56ef43ba1f16eef596d653

[/scenario]
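The mismatch in the scenario can be detected mechanically. Below is a minimal Python sketch that parses `debconf-show grub-pc` output and reports configured install devices whose paths no longer exist. The function names are hypothetical and are not part of cloud-init. Treating multiple devices as a comma-separated list is an assumption.

```python
import os
from typing import Callable, Dict, List


def parse_install_devices(debconf_show_output: str) -> Dict[str, List[str]]:
    """Map each */install_devices question in `debconf-show grub-pc`
    output to the device paths it currently holds (hypothetical helper)."""
    result: Dict[str, List[str]] = {}
    for raw in debconf_show_output.splitlines():
        # debconf-show marks some questions with a leading "*"; strip it
        line = raw.lstrip("* ").rstrip()
        key, sep, value = line.partition(":")
        # Skip unrelated questions such as grub-pc/install_devices_empty
        if not sep or not key.endswith("/install_devices"):
            continue
        # Assumption: multiple devices are stored comma-separated
        result[key] = [dev.strip() for dev in value.split(",") if dev.strip()]
    return result


def stale_install_devices(
    debconf_show_output: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> Dict[str, List[str]]:
    """Return, per question, the configured devices that do not exist."""
    stale: Dict[str, List[str]] = {}
    for key, devices in parse_install_devices(debconf_show_output).items():
        missing = [dev for dev in devices if not exists(dev)]
        if missing:
            stale[key] = missing
    return stale
```

Run it as root against `subprocess.run(["debconf-show", "grub-pc"], capture_output=True, text=True).stdout` on a VM built from the customized image. It should flag only the grub-efi entry, because cloud-init has already refreshed the grub-pc entry.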

In this situation, a user with a shell who runs an apt upgrade that updates an EFI package (or runs dpkg --configure -a once dpkg is broken) can manually confirm this prompt:

┌───────────────────────────────────┤ Configuring shim-signed ├────────────────────────────────────┐
│ The GRUB boot loader was previously installed to a disk that is no longer present, or whose      │
│ unique identifier has changed for some reason. It is important to make sure that the installed   │
│ GRUB core image stays in sync with GRUB modules and grub.cfg. Please check again to make sure    │
│ that GRUB is written to the appropriate boot devices.                                            │
│                                                                                                  │
│ GRUB install devices:                                                                            │
│                                                                                                  │
│    [*] /dev/sda15 (111 MB; /boot/efi) on 32213 MB Virtual_Disk                                   │
│                                                                                                  │
│                                                                                                  │
│                                              <Ok>                                                │
│                                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘

Accepting this prompt appears to update/fix the grub-efi debconf key.
In my testing, DEBIAN_FRONTEND=noninteractive suppresses the prompt, but configuration then fails with the aforementioned mount: /var/lib/grub/esp... error. This may be related to https://bugs.launchpad.net/ubuntu/+source/shim-signed/+bug/1940723, a bug that, suspiciously, also involves Azure VMs.

Reminder: this all happens on BIOS-booted VMs; as far as I know, UEFI boot is never involved here.

This bug is a follow-up to a brief discussion on 2fd24cc.
That recent change added support for updating the grub debconf keys on EFI-booted machines. However, it picks the key based on the current boot mode: an EFI-booted machine gets the grub-efi key updated, and any other machine gets the grub-pc key.
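The following simplified Python sketch illustrates a boot-mode based choice keyed on /sys/firmware/efi. It is not the actual code from 2fd24cc, and the function names are hypothetical. The point it shows is that a BIOS-booted VM always gets the grub-pc key, even when an ESP is mounted at /boot/efi.

```python
import os


def booted_via_efi(root: str = "/") -> bool:
    # The Linux kernel exposes /sys/firmware/efi only when booted via UEFI
    return os.path.isdir(os.path.join(root, "sys/firmware/efi"))


def key_for_boot_mode(root: str = "/") -> str:
    """Hypothetical, simplified boot-mode based choice: exactly one key is
    selected, regardless of which boot packages are installed."""
    if booted_via_efi(root):
        return "grub-efi/install_devices"
    return "grub-pc/install_devices"
```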

This unfortunately doesn't solve the case above, where BIOS-booted machines also have EFI support configured and an intermediate, customized image is used.

My uneducated guess is that cloud-init may need to update one or both debconf keys, depending on whether BIOS and/or EFI support is installed, instead of checking the current boot mode (i.e. the presence of /sys/firmware/efi).
I do not know how to detect this. Presence of a grub-efi* package? Presence of /boot/efi?
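One way to frame the detection question above is a rough Python sketch that updates every key whose boot support appears to be installed. It uses the two signals suggested here: a grub-efi* package or a /boot/efi directory. This is a hypothetical heuristic, not cloud-init behavior. Treating the grub-pc package as the BIOS signal is an added assumption. Note also that an empty /boot/efi mount point would count as EFI support.

```python
import os
from typing import Iterable, List


def keys_to_update(installed_packages: Iterable[str], root: str = "/") -> List[str]:
    """Hypothetical heuristic: return every GRUB debconf key whose boot
    support looks installed, instead of looking at the boot mode."""
    pkgs = set(installed_packages)
    keys: List[str] = []
    # BIOS support: assumed to be signalled by the grub-pc package
    if "grub-pc" in pkgs:
        keys.append("grub-pc/install_devices")
    # EFI support: any grub-efi* package, or a /boot/efi directory
    has_efi_pkg = any(name.startswith("grub-efi") for name in pkgs)
    if has_efi_pkg or os.path.isdir(os.path.join(root, "boot/efi")):
        keys.append("grub-efi/install_devices")
    return keys
```

How the installed-package list is obtained is left open here. Whatever source is used would need to exclude packages that are removed but still have configuration files.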

@ubuntu-server-builder ubuntu-server-builder added incomplete Action required by submitter launchpad Migrated from Launchpad labels May 12, 2023
@ubuntu-server-builder
Collaborator Author

Launchpad user Adrien Siebert(r-asiebert) wrote on 2023-03-31T01:54:10.104995+00:00

(Additional notes)

Disclaimer: I've discovered these cloud-init details while troubleshooting this issue involving Azure, Ubuntu, GRUB, EFI, and Packer. My knowledge of cloud-init and grub EFI support is fresh and very limited.

This seems like a cloud-init issue given that cc_grub_dpkg.py exists in the first place to patch the grub-pc debconf key in cloud environments.
On the flip side, the GRUB prompt and shim-signed bug linked above could mean a fix should be elsewhere, maybe even in the Azure "gen 1" Ubuntu image. grub-efi* and shim-signed are marked as 'essential' packages in APT.

Apologies, I don't have the full cloud-init logs; I may need to set up a test environment to collect them.

This issue may have existed for months. We only detected it after OS disks on new Azure VMs started to receive new serial IDs, a change I was unable to trace. Other factors triggered this for us recently.

@ubuntu-server-builder
Collaborator Author

Launchpad user Brett Holman(holmanb) wrote on 2023-03-31T21:33:31.033023+00:00

Thanks for reporting this bug Adrien. I'm still working to understand what is happening here - in the meantime, I'm curious if, as a workaround, disabling this module might resolve your issue.

Providing the following user data to your instance on first boot would disable this module:

#cloud-config
grub_dpkg:
  enabled: false

@ubuntu-server-builder
Collaborator Author

Launchpad user Chad Smith(chad.smith) wrote on 2023-04-05T21:46:43.139595+00:00

Thanks for the links to both related bug discussions and to prior cloud-init commits in this space. I've set the bug status to Incomplete in hopes of feedback from Adrien on either:

Please set this bug back to 'New' status once you are able to provide feedback on either of those options. That will bump it back onto our radar for triage review and further debugging.

@ubuntu-server-builder
Collaborator Author

Launchpad user Adrien Siebert(r-asiebert) wrote on 2023-04-10T18:05:12.737746+00:00

@brett, grub_dpkg is behaving properly so I wouldn't disable it.
Arguably, it doesn't do 'enough' to fix such situations. It updates only one key, either grub-pc (BIOS) or grub-efi, and before #2029 it only ever updated grub-pc. In our case, however, both the BIOS and EFI keys need an update.

I've got some time today and may try to reproduce the issue on a clean VM to collect logs.
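Since both the BIOS and EFI keys need updating in this case, here is a hypothetical Python sketch of debconf-set-selections input that would set both at once. Each line has the form owner, question, type, value. The function name is illustrative and is not cloud-init's API. The grub-pc lines follow the (device, 'false') pairs seen in the cloud-init logs, read here as install_devices plus install_devices_empty. Using the grub-pc owner for grub-efi/install_devices is an assumption, based on `debconf-show grub-pc` listing that question.

```python
from typing import List, Optional


def build_grub_selections(pc_disk: Optional[str], efi_partition: Optional[str]) -> str:
    """Build debconf-set-selections input covering both GRUB keys.

    Each line has the form: <owner> <question> <type> <value>.
    """
    lines: List[str] = []
    if pc_disk:
        # BIOS key: the whole disk, mirroring the cloud-init log lines
        lines.append(f"grub-pc grub-pc/install_devices string {pc_disk}")
        lines.append("grub-pc grub-pc/install_devices_empty boolean false")
    if efi_partition:
        # EFI key: the ESP partition; owner "grub-pc" is an assumption
        # based on debconf-show grub-pc listing this question
        lines.append(f"grub-pc grub-efi/install_devices string {efi_partition}")
    return ("\n".join(lines) + "\n") if lines else ""
```

The result could be passed to `debconf-set-selections` on standard input as root, for example with `subprocess.run(["debconf-set-selections"], input=text, text=True, check=True)`.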

@ubuntu-server-builder
Collaborator Author

Launchpad user Adrien Siebert(r-asiebert) wrote on 2023-04-10T19:32:23.138478+00:00

I've reproduced this on Azure with an updated image, Ubuntu Jammy / 22.04 with cloud-init 23.1.1-0ubuntu0~22.04.1

This should be a much easier example to follow; the cloud-init log tarballs for the two VMs involved are included.
Summary of steps done in the Azure Portal:

  • First VM created from Jammy image at 2023-04-10 18:19
  • Simulated a shim-signed package upgrade; the grub-efi debconf key gets set.
  • Captured this VM as an image
  • Second VM created from custom image at 2023-04-10 18:34
  • Simulated a shim-signed upgrade; it breaks on the grub-efi disk mismatch.

Notes about the TAR archives:

  • The logs from the second VM contain the logs from the first VM that were packaged in the image.
  • Some Azure metadata present in the cloud-init data (e.g. the Azure subscription) has been redacted.
  • Logs may contain additional dpkg/apt activity from me poking around.

Accompanying notes and logs:

The first VM was created from image "Ubuntu Server 22.04 LTS - Gen1" (publisher/offer/sku: canonical / 0001-com-ubuntu-server-jammy / 22_04-lts) with default settings ("Premium SSD" OS disk) except for restricted networking.

(2023-04-10 18:19, adrien-test-cloud-init-imaging)

root@adrien-test-cloud-init-imaging:/home/azureuser# ls -la /dev/disk/by-id/ | grep part15; debconf-show grub-pc | egrep "grub-efi/install_devices:|grub-pc/install_devices:"; grep "Setting grub debconf-set-selections" /var/log/cloud-init.log
lrwxrwxrwx 1 root root  11 Apr 10 18:19 scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f-part15 -> ../../sdb15
lrwxrwxrwx 1 root root  11 Apr 10 18:19 scsi-360022480e0488e713b6c9c7255dd721f-part15 -> ../../sdb15
lrwxrwxrwx 1 root root  11 Apr 10 18:19 wwn-0x60022480e0488e713b6c9c7255dd721f-part15 -> ../../sdb15
* grub-pc/install_devices: /dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f
  grub-efi/install_devices:
2023-04-10 18:20:03,717 - cc_grub_dpkg.py[DEBUG]: Setting grub debconf-set-selections with '/dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f','false'

Nominal: the grub-efi debconf key is not set yet, and the grub-pc key was set by cloud-init.

Simulated our Packer build installing a shim-signed upgrade (headless apt upgrade), which sets the grub-efi debconf key:

root@adrien-test-cloud-init-imaging:/home/azureuser# DEBIAN_FRONTEND=noninteractive dpkg-reconfigure shim-signed
Trying to migrate /boot/efi into esp config
Installing grub to /boot/efi.
Installing for x86_64-efi platform.
grub-install: warning: EFI variables cannot be set on this system.
grub-install: warning: You will have to complete the GRUB setup manually.
Installation finished. No error reported.

root@adrien-test-cloud-init-imaging:/home/azureuser# debconf-show grub-pc | egrep "grub-efi/install_devices:|grub-pc/install_devices:"
* grub-pc/install_devices: /dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f
* grub-efi/install_devices: /dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f-part15

Captured the VM as image adrien-test-cloud-init-imaging-image-20230410112812, then created a second VM from it:

(2023-04-10 18:34, adrien-test-cloud-init-from-custom-image)

root@adrien-test-cloud-init-from-custom-image:/home/azureuser# ls -la /dev/disk/by-id/ | grep part15; debconf-show grub-pc | egrep "grub-efi/install_devices:|grub-pc/install_devices:"; grep "Setting grub debconf-set-selections" /var/log/cloud-init.log
lrwxrwxrwx 1 root root  11 Apr 10 18:34 scsi-14d53465420202020a6911a0eca07bb42a5ed90f05ba17f86-part15 -> ../../sdb15
lrwxrwxrwx 1 root root  11 Apr 10 18:34 scsi-360022480a6911a0eca0790f05ba17f86-part15 -> ../../sdb15
lrwxrwxrwx 1 root root  11 Apr 10 18:34 wwn-0x60022480a6911a0eca0790f05ba17f86-part15 -> ../../sdb15
* grub-efi/install_devices: /dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f-part15
* grub-pc/install_devices: /dev/disk/by-id/scsi-14d53465420202020a6911a0eca07bb42a5ed90f05ba17f86
2023-04-10 18:20:03,717 - cc_grub_dpkg.py[DEBUG]: Setting grub debconf-set-selections with '/dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f','false'
2023-04-10 18:35:01,155 - cc_grub_dpkg.py[DEBUG]: Setting grub debconf-set-selections with '/dev/disk/by-id/scsi-14d53465420202020a6911a0eca07bb42a5ed90f05ba17f86','false'

The OS disk has a different serial. cloud-init updated the grub-pc debconf key, but the grub-efi key is unchanged from the initial VM image and still points at the first VM's disk.

A headless GRUB EFI upgrade fails:

root@adrien-test-cloud-init-from-custom-image:/home/azureuser# DEBIAN_FRONTEND=noninteractive dpkg-reconfigure shim-signed
mount: /var/lib/grub/esp: special device /dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f-part15 does not exist.

At this point, apt/dpkg is broken by the misconfigured package until it is resolved manually.

Launchpad attachments: cloud-init-logs-2023-04-10-two-vms.tar.gz
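The reproduction above suggests a narrower repair: keep the ESP's partition number, but move it onto the disk that the grub-pc key already names. The following is a hypothetical Python sketch. It assumes that by-id partition links are named `<disk link>-part<N>` and that the ESP keeps the same partition number on the new disk. Partition 15 on both VMs in the `ls` output above fits that assumption. The function names are illustrative only.

```python
def efi_partition_on_disk(pc_disk: str, efi_partition: str) -> bool:
    """True if a by-id partition path belongs to the given by-id disk path."""
    # Assumption: by-id partition links are named <disk link>-part<N>
    prefix = pc_disk + "-part"
    return efi_partition.startswith(prefix) and efi_partition[len(prefix):].isdigit()


def rebase_efi_partition(pc_disk: str, efi_partition: str) -> str:
    """Return the same partition number, but on `pc_disk`."""
    _, sep, number = efi_partition.rpartition("-part")
    if not sep or not number.isdigit():
        raise ValueError(f"not a by-id partition path: {efi_partition}")
    return f"{pc_disk}-part{number}"
```

With the values from the second VM, the stale grub-efi path fails `efi_partition_on_disk`. `rebase_efi_partition` then yields the new disk's `-part15` link, which is the /dev/sdb15 entry that the `ls` output shows.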

@r-asiebert

If tracking has moved to GitHub, here's my user as bug reporter.

