Xenial images won't reboot if disk size is > 2TB when using GPT #3431

Closed
ubuntu-server-builder opened this issue May 11, 2023 · 18 comments
Labels
launchpad Migrated from Launchpad

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1840686

Launchpad details
affected_projects = ['grub2 (Ubuntu)', 'grub2-signed (Ubuntu)', 'grub2 (Ubuntu Xenial)', 'grub2-signed (Ubuntu Xenial)']
assignee = None
assignee_name = None
date_closed = 2019-08-20T13:56:11.919481+00:00
date_created = 2019-08-19T15:58:31.863854+00:00
date_fix_committed = None
date_fix_released = None
id = 1840686
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1840686
milestone = None
owner = rcj
owner_name = Robert C Jennings
private = False
status = wont_fix
submitter = patviafore
submitter_name = Pat Viafore
tags = ['id-5d484a6466c79944a30e4644', 'sts', 'verification-done-xenial']
duplicates = []

Launchpad user Pat Viafore(patviafore) wrote on 2019-08-19T15:58:31.863854+00:00

[Impact]

On Xenial images that use GPT instead of MBR to enable EFI-based booting, an instance with a disk size of 2049 GB or larger boots once and then hangs on the next boot (logs show it stopping at "Booting Hard Disk 0").

This is a problem in grub2 where the system would become unbootable after ext* online resize if no resize_inode was created at ext* format time.
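
Whether a given filesystem is in this state can be checked from its feature flags. The sketch below is a hypothetical helper, not part of the original report: the function name `ext_resize_features` and the device path in the usage comment are assumptions. It reads `tune2fs -l` output on stdin and reports whether `resize_inode` and `meta_bg` are listed. Going by the subject line of the upstream fix ("ext2: Support META_BG"), a root filesystem that lists `meta_bg` is the kind an unpatched Xenial grub2 may fail to read.

```shell
# Hypothetical helper: parse the "Filesystem features:" line from
# `tune2fs -l` (or `dumpe2fs -h`) output supplied on stdin.
ext_resize_features() {
    # Keep everything after the first colon of the features line.
    features=$(grep '^Filesystem features:' | cut -d: -f2-)
    for f in resize_inode meta_bg; do
        # Pad with spaces so only whole feature names match.
        case " $features " in
            *" $f "*) echo "$f: yes" ;;
            *)        echo "$f: no" ;;
        esac
    done
}

# Usage (the device path is an assumption; use your root partition):
#   sudo tune2fs -l /dev/sda1 | ext_resize_features
```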

[Test Case]

To reproduce:

  1. Create an instance with a disk size of 3072 GB from an image serial that uses GPT:

gcloud compute instances create test-3072-xenial --image daily-ubuntu-1604-xenial-v20190731 --image-project ubuntu-os-cloud-devel --boot-disk-size 3072

  2. Reboot the instance

The instance will hang on reboot and you cannot connect. If you go to GCP console and select Logs > Serial port 1 (console), you will see the boot process has stopped at "Booting Hard Disk 0".
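
The same check can be done from a terminal instead of the GCP console. This is a minimal sketch under stated assumptions: `hung_at_disk_boot` is a hypothetical helper name, and the marker string is copied from this report, so it may need adjusting to match the exact firmware output. It reads serial console text on stdin and succeeds only if the last non-empty line contains the marker.

```shell
# Hypothetical helper: exit 0 if the last non-empty line of the serial
# console text on stdin contains the hang marker, exit 1 otherwise.
hung_at_disk_boot() {
    marker=${1:-Booting Hard Disk 0}
    # Serial output often uses CRLF line endings; drop the CRs first.
    tr -d '\r' | awk -v m="$marker" '
        NF  { last = $0 }                        # remember last non-empty line
        END { exit (index(last, m) ? 0 : 1) }    # substring match on that line
    '
}

# Usage (the instance name is taken from the test case; a --zone flag may
# be needed depending on your gcloud configuration):
#   gcloud compute instances get-serial-port-output test-3072-xenial \
#       | hung_at_disk_boot && echo "instance appears stuck at disk boot"
```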

I have built a test package, which is available here:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1840686-test

If you perform step 1 but, instead of rebooting, add the PPA and install the new grub like so:

  1. gcloud compute instances create test-3072-xenial --image daily-ubuntu-1604-xenial-v20190731 --image-project ubuntu-os-cloud-devel --boot-disk-size 3072
  2. sudo add-apt-repository ppa:mruffell/lp1840686-test
  3. sudo apt-get update
  4. sudo apt remove grub-common grub-efi-amd64 grub-efi-amd64-bin grub-efi-amd64-signed grub-pc-bin grub2-common
  5. sudo apt install grub-common grub-efi-amd64 grub-efi-amd64-bin grub-pc-bin grub2-common
  6. sudo grub-install /dev/sda
  7. sudo reboot

The instance will boot successfully and you will be able to connect.

Note that we must use "daily-ubuntu-1604-xenial-v20190731" as the image, as it is enabled for GPT and EFI. The GCP images were reverted to MBR and BIOS booting because of this bug, so the latest images will not reproduce the problem.

[Regression Potential]

Grub is a core package and every care must be taken in order to not introduce any regressions.

The commit is present in B, D, E and F, and is considered well tested and widely adopted by the community.

The commit comes with its own testcase, to test the ext4_metabg fix.

The changes are localised to ext* based filesystems, although since they are the most popular family of filesystems used by the community, this does not reduce risk of breakage by much.

If a regression were to happen, it would have a large impact and, in the worst case, could lead to unbootable systems and data loss for users who are not technical enough to reinstall grub from a working package inside the broken system chroot.

[Other Info]

In comment #4, Sultan identifies the fix as:

commit e20aa39ea4298011ba716087713cff26c6c52006
Author: Vladimir Serbinenko <phcoder@gmail.com>
Date: Mon Feb 16 20:53:26 2015 +0100
Subject: ext2: Support META_BG.

This commit is from upstream grub2, and can be found here:

https://git.savannah.gnu.org/cgit/grub.git/commit/?id=e20aa39ea4298011ba716087713cff26c6c52006

Looking at when this was merged:

$ git describe --contains e20aa39ea4298011ba716087713cff26c6c52006
2.02-beta3~429

This commit is present in B, D, E and F, leaving X as the only version needing an SRU.

The commit cleanly cherry picks to X, because the delta from 2.02~beta2-36ubuntu3.22 to 2.02-beta3~429 is small.

@ubuntu-server-builder ubuntu-server-builder added the launchpad Migrated from Launchpad label May 11, 2023
@ubuntu-server-builder

Launchpad user Scott Moser(smoser) wrote on 2019-08-19T20:57:50.529808+00:00

Seems related or at least "close to" bug 1762748.
If nothing else, that bug has nice local recreate information.

@ubuntu-server-builder

Launchpad user Pat Viafore(patviafore) wrote on 2019-08-19T21:22:47.977788+00:00

At first I thought it was related to https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=1e19b01be31fc5264a84d246023ecf29e44949df&context=25&ignorews=0&dt=0, and later I thought it was related to https://bugs.launchpad.net/bugs/1762748. However, I added the bionic archives to my Xenial instance and updated e2fsprogs and cloud-utils, then tried to grow the disk past 2048, and ran into the reboot issue again.

I agree it is very close, and the recreate information in that bug helped me narrow down what I was seeing.

@ubuntu-server-builder

Launchpad user Dan Watkins(oddbloke) wrote on 2019-08-20T13:56:42.567369+00:00

Based on IRC conversation, I don't believe this is a cloud-init bug. Please set the task back to New if I'm mistaken!

@ubuntu-server-builder

Launchpad user Sultan Alsawaf(kerneltoast) wrote on 2019-10-10T19:32:30.969561+00:00

This is an old bug in GRUB. This commit fixes it: http://git.savannah.gnu.org/cgit/grub.git/commit/?id=e20aa39ea4298011ba716087713cff26c6c52006

To test it, apply it to a GRUB source tree, compile it, install it, and then reinstall the bootloader with "sudo grub-install /dev/sda".

@ubuntu-server-builder

Launchpad user Brian Murray(brian-murray) wrote on 2019-10-16T11:01:30.070741+00:00

This is fixed in the version of grub2 in Eoan which will become Ubuntu 19.10.

@ubuntu-server-builder

Launchpad user Matthew Ruffell(mruffell) wrote on 2019-10-19T06:34:55.349337+00:00

I have built a test package for grub2, which is available here:
https://launchpad.net/~mruffell/+archive/ubuntu/lp1840686-test

It contains http://git.savannah.gnu.org/cgit/grub.git/commit/?id=e20aa39ea4298011ba716087713cff26c6c52006 mentioned in comment #4.

I did the following to test that the test package fixes the problem:

  1. gcloud compute instances create test-3072-xenial --image daily-ubuntu-1604-xenial-v20190731 --image-project ubuntu-os-cloud-devel --boot-disk-size 3072
  2. sudo add-apt-repository ppa:mruffell/lp1840686-test
  3. sudo apt-get update
  4. sudo apt remove grub-common grub-efi-amd64 grub-efi-amd64-bin grub-efi-amd64-signed grub-pc-bin grub2-common
  5. sudo apt install grub-common grub-efi-amd64 grub-efi-amd64-bin grub-pc-bin grub2-common
  6. sudo grub-install /dev/sda
  7. sudo reboot

The instance started normally and I could connect to it.

@ubuntu-server-builder

Launchpad user Pat Viafore(patviafore) wrote on 2019-10-29T18:45:42.416069+00:00

I have re-run my test cases and the package you provided fixes the original issue that we saw.

Thank you

@ubuntu-server-builder

Launchpad user Matthew Ruffell(mruffell) wrote on 2019-10-29T21:03:22.319521+00:00

Thanks for testing, Pat.

Attached is the debdiff for xenial to fix this bug.
Launchpad attachments: grub2 debdiff for xenial

@ubuntu-server-builder

Launchpad user Eric Desrochers(slashd) wrote on 2019-10-30T12:04:27.818673+00:00

Sponsored for Xenial.

The package is now waiting for SRU approval in order to start building in xenial-proposed for the testing phase of the SRU.

Thanks for your contribution Matthew !

@ubuntu-server-builder

Launchpad user Łukasz Zemczak(sil2100) wrote on 2019-11-04T14:45:14.815057+00:00

Hello Pat, or anyone else affected,

Accepted grub2 into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.02~beta2-36ubuntu3.23 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

@ubuntu-server-builder

Launchpad user Łukasz Zemczak(sil2100) wrote on 2019-11-04T14:47:44.258871+00:00

Since the change seems to be quite invasive, I would like to ask for some additional testing on non-affected, regular ext* systems before we promote this to -updates.

@ubuntu-server-builder

Launchpad user Eric Desrochers(slashd) wrote on 2019-11-05T00:58:10.355843+00:00

Uploaded grub2-signed "Rebuild against grub2 2.02~beta2-36ubuntu3.23. (LP: #1840686)"

@ubuntu-server-builder

Launchpad user Adam Conrad(adconrad) wrote on 2019-11-05T06:49:28.395231+00:00

Hello Pat, or anyone else affected,

Accepted grub2-signed into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-signed/1.66.23 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

@ubuntu-server-builder

Launchpad user Matthew Ruffell(mruffell) wrote on 2019-11-06T23:06:17.850174+00:00

As per Łukasz's request, I have performed verification testing across a wide range of affected and unaffected systems, with the grub2-2.02~beta2-36ubuntu3.23 package in -proposed.

All tests were performed on GCP.

Test case one:

Summary: efi enabled image with 3072gb disk.
Disk: 3072gb
Bios/efi: efi
Image: daily-ubuntu-1604-xenial-v20190731, ubuntu-os-cloud-devel
Affected: Yes

Behaviour with grub2-2.02~beta2-36ubuntu3.22 from -updates:
Fails to reboot - as expected because of this bug.
Log: https://paste.ubuntu.com/p/KKH7r3vdrC/

Behaviour with grub2-2.02~beta2-36ubuntu3.23 from -proposed:
If you follow the instructions in the test section, the instance reboots successfully. Note, grub-install must be explicitly called. If you do not manually run grub-install, rebooting will fail.
Log: https://paste.ubuntu.com/p/7KxmDCjFSG/
Log (apt upgrade, no manual grub-install): https://paste.ubuntu.com/p/j3VK5PV7GR/

Test case two:

Summary: efi enabled image with 10gb disk.
Disk: 10gb
Bios/efi: efi
Image: daily-ubuntu-1604-xenial-v20190731, ubuntu-os-cloud-devel
Affected: No

Behaviour with grub2-2.02~beta2-36ubuntu3.22 from -updates:
Reboots successfully.

Behaviour with grub2-2.02~beta2-36ubuntu3.23 from -proposed:
Reboots successfully.

Logs: https://paste.ubuntu.com/p/c65jjYjj6S/
Note, grub-install was not invoked manually, and represents a typical user apt upgrade with no interaction.

Test case three:

Summary: bios enabled image with 10gb disk.
Disk: 10gb
Bios/efi: bios
Image: ubuntu-1604-xenial-v20191024
Affected: No

Behaviour with grub2-2.02~beta2-36ubuntu3.22 from -updates:
Reboots successfully.

Behaviour with grub2-2.02~beta2-36ubuntu3.23 from -proposed:
Reboots successfully.

Logs: https://paste.ubuntu.com/p/N3QFX63WCS/
Note, grub-install was not invoked manually, and represents a typical user apt upgrade with no interaction.

Test case four:

Summary: bios enabled image with 3072gb disk.
Disk: 3072gb
Bios/efi: bios
Image: ubuntu-1604-xenial-v20191024
Affected: No

Behaviour with grub2-2.02~beta2-36ubuntu3.22 from -updates:
Reboots successfully.

Behaviour with grub2-2.02~beta2-36ubuntu3.23 from -proposed:
Reboots successfully.

Logs: https://paste.ubuntu.com/p/KZw4kcD6pS/
Note, grub-install was not invoked manually, and represents a typical user apt upgrade with no interaction.
Log (apt upgrade, WITH manual grub-install): https://paste.ubuntu.com/p/RFBR5BbbtH/

Conclusion:

grub2-2.02~beta2-36ubuntu3.23 from -proposed fixes this bug. It does not introduce any regressions for non-affected use cases.

The only thing to note is that for EFI-based images with a disk > 2 TB, if the user does not manually run grub-install after installing the package from -proposed, rebooting will fail. This is the same as the current situation, where such instances fail to reboot regardless. Because of that, there are unlikely to be any users running an image with a disk > 2 TB who have never rebooted, so this is unlikely to be a problem.

This will, however, fix all images and new instances that use the fixed version of grub going forward.
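
For an existing EFI instance, the manual grub-install step described above can be guarded by a version check. This is a minimal sketch, assuming a Debian-based system with `dpkg` available; the function name `grub_has_fix` and the choice of `grub-common` as the package to query are illustrative assumptions. It compares a version string against the fixed version named in this report.

```shell
# Version of grub2 that carries the cherry-picked fix, per this report.
FIXED_VERSION='2.02~beta2-36ubuntu3.23'

# Hypothetical helper: succeed if the version passed as $1 is at least
# the fixed version, using dpkg's own version ordering (handles "~").
grub_has_fix() {
    dpkg --compare-versions "$1" ge "$FIXED_VERSION"
}

# Usage (the package name and /dev/sda follow the test case above):
#   installed=$(dpkg-query -W -f='${Version}' grub-common)
#   if grub_has_fix "$installed"; then
#       sudo grub-install /dev/sda
#   else
#       echo "grub-common $installed predates $FIXED_VERSION; upgrade first" >&2
#   fi
```

Using `dpkg --compare-versions` rather than a plain string comparison matters here because `~` sorts before everything else in Debian version ordering.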

Taking this into consideration, I am happy to mark this as verified.

Pat, feel free to also test. I will also see if the customer is interested in testing as well.

@ubuntu-server-builder

Launchpad user Pat Viafore(patviafore) wrote on 2019-11-08T14:44:19.683797+00:00

I have run our tests and am satisfied with the results. Our tests very closely match what mruffell posted.

@ubuntu-server-builder

Launchpad user Launchpad Janitor(janitor) wrote on 2019-11-11T16:00:13.385157+00:00

This bug was fixed in the package grub2 - 2.02~beta2-36ubuntu3.23


grub2 (2.02~beta2-36ubuntu3.23) xenial; urgency=medium

  * d/p/fix_booting_for_large_root_volumes.patch: Cherry pick upstream
    fix for booting on systems with large root volumes, either by default
    or from resizing. (LP: #1840686)

 -- Matthew Ruffell <matthew.ruffell@canonical.com>  Sat, 19 Oct 2019 17:47:16 +1300

@ubuntu-server-builder

Launchpad user Łukasz Zemczak(sil2100) wrote on 2019-11-11T16:00:38.104213+00:00

The verification of the Stable Release Update for grub2 has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

@ubuntu-server-builder

Launchpad user Launchpad Janitor(janitor) wrote on 2019-11-11T16:10:28.018217+00:00

This bug was fixed in the package grub2-signed - 1.66.23


grub2-signed (1.66.23) xenial; urgency=medium

  * Rebuild against grub2 2.02~beta2-36ubuntu3.23. (LP: #1840686)

 -- Eric Desrochers <eric.desrochers@canonical.com>  Tue, 05 Nov 2019 00:43:00 +0000

@ubuntu-server-builder ubuntu-server-builder closed this as not planned May 11, 2023