fstab entries written by cloud-config may not be mounted #2892
Comments
Launchpad user Scott Moser(smoser) wrote on 2017-05-17T14:35:56.773714+00:00 These tarballs are collected with 'save-old-data' at They represent: Launchpad attachments: orig-boot.tar.xz |
Launchpad user Scott Moser(smoser) wrote on 2017-05-17T14:36:31.886469+00:00 |
Launchpad user Scott Moser(smoser) wrote on 2017-05-17T14:37:18.877151+00:00 |
Launchpad user Scott Moser(smoser) wrote on 2017-05-17T14:42:54.358658+00:00 Launchpad attachments: upgrade-first-reboot.tar.xz |
Launchpad user Scott Moser(smoser) wrote on 2017-05-17T14:43:07.246888+00:00 Launchpad attachments: after-restart.tar.xz |
Launchpad user Scott Moser(smoser) wrote on 2017-05-17T14:43:18.861899+00:00 Launchpad attachments: after-restart-with-fsck.tar.xz |
Launchpad user Scott Moser(smoser) wrote on 2017-05-19T18:36:23.749601+00:00 It seems that in addition to blocking fsck, we should also block swap usage. This means that we're kind of limited to either
|
Launchpad user Scott Moser(smoser) wrote on 2017-05-19T18:38:11.262026+00:00 Dimitri, Do you know how I can limit swap usage until after cloud-init.service is done? I'm open to other ideas too. |
Launchpad user Balint Reczey(rbalint) wrote on 2017-06-19T16:32:28.975829+00:00 I tried finding other options, but to work around /etc/fstab containing a potentially invalid swap partition, the only option seems to be calling "swapoff -a" and then later "swapon -a" from cloud-init when it detects that a partition re-initialization needs to take place. The same stands for systemd-fsckd.service. IMO it should be stopped while the reformatting takes place, instead of adding the drop-in, which would potentially slow down boot even when this workaround is not needed. |
Launchpad user Scott Moser(smoser) wrote on 2017-06-21T18:22:59.008357+00:00 Balint, Thanks for the reply. With regard to slowing down boot, I'm not too concerned about that, because in almost all properly functioning scenarios, cloud-init's generator will enable or disable cloud-init. So the slowdown would be limited to scenarios where cloud-init was supposed to run, primarily on non-first boots of an instance. I agree though, it does put a bottleneck in boot. With regard to 'swapoff -a' or 'swapon -a' or the systemd-fsck.service equivalent, I'm not opposed to that, but I don't know how it could be made non-racy. Do you have a solution in mind that doesn't have a race in it? I.e., for swap:
while systemd in parallel
This can be mitigated some by being more granular (swapoff /dev/XXX), but still racy unless cloud-init can coordinate that with systemd. Is that possible? Thanks again for the input. |
Launchpad user Balint Reczey(rbalint) wrote on 2017-06-23T22:48:26.408976+00:00 I filed a merge request to limit the fsck delay to Azure, please take a look at it. Regarding the swap, I think the least hack-ish safe solution would be relying on systemd-fstab-generator to create the .swap units as usual, and instead of running swapoff/swapon, cloud-init could find all .swap units and stop them while it does its work. That would avoid the race because the generator runs early, before the units, and stopping .swap units is done by systemd. |
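The suggestion above (let systemd stop the .swap units instead of calling swapoff directly) could be sketched roughly as follows. This is only an illustration, not the code that landed in cloud-init: the function names `parse_swap_units` and `stop_swap_units` are hypothetical, and the sketch assumes `systemctl` is on the PATH and that the caller has permission to stop units.

```python
import subprocess
from typing import List


def parse_swap_units(list_units_output: str) -> List[str]:
    """Extract .swap unit names from plain `systemctl list-units` output.

    Hypothetical helper: the unit name is taken to be the first
    whitespace-separated field on each line.
    """
    units = []
    for line in list_units_output.splitlines():
        fields = line.split()
        if fields and fields[0].endswith(".swap"):
            units.append(fields[0])
    return units


def stop_swap_units() -> List[str]:
    """Ask systemd to stop every loaded .swap unit; return the unit names.

    Assumption: run as root on a systemd host. Stopping goes through
    systemd rather than swapoff, so systemd knows about the state change.
    """
    out = subprocess.run(
        ["systemctl", "list-units", "--type=swap",
         "--no-legend", "--plain", "--no-pager"],
        check=True, capture_output=True, text=True,
    ).stdout
    units = parse_swap_units(out)
    for unit in units:
        subprocess.run(["systemctl", "stop", unit], check=True)
    return units
```

The returned list would let the caller start the same units again (`systemctl start <unit>`) once reformatting is done.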
Launchpad user Launchpad Janitor(janitor) wrote on 2017-07-31T14:37:05.899465+00:00 This bug was fixed in the package cloud-init - 0.7.9-231-g80bf98b9-0ubuntu1 cloud-init (0.7.9-231-g80bf98b9-0ubuntu1) artful; urgency=medium
-- Scott Moser smoser@ubuntu.com Mon, 31 Jul 2017 09:47:34 -0400 |
Launchpad user Chris J Arges(arges) wrote on 2017-08-23T12:27:59.297447+00:00 Hello Scott, or anyone else affected, Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-233-ge586fe35-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance! |
Launchpad user Chris J Arges(arges) wrote on 2017-08-23T12:31:23.456563+00:00 Hello Scott, or anyone else affected, Accepted cloud-init into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-233-ge586fe35-0ubuntu1~17.04.1 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users. If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance! |
Launchpad user Chad Smith(chad.smith) wrote on 2017-09-12T22:02:42.384457+00:00 Validated across multiple (5) 'clean' reboots that Azure vms don't hit the race condition with mounts and don't result in cloud-init errors. ubuntu@xen1: |
Launchpad user Chad Smith(chad.smith) wrote on 2017-09-12T22:20:55.710927+00:00 Zesty verification: Saw initial failure before upgrade. ubuntu@zesty1: Saw 5 successes across reprovisions after upgrade. ubuntu@zesty1: |
Launchpad user Launchpad Janitor(janitor) wrote on 2017-09-13T01:26:05.837714+00:00 This bug was fixed in the package cloud-init - 0.7.9-233-ge586fe35-0ubuntu1~16.04.1 cloud-init (0.7.9-233-ge586fe35-0ubuntu1~16.04.1) xenial-proposed; urgency=medium
-- Scott Moser smoser@ubuntu.com Mon, 31 Jul 2017 16:36:16 -0400 |
Launchpad user Chris Halse Rogers(raof) wrote on 2017-09-13T01:26:37.254138+00:00 The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions. |
Launchpad user Launchpad Janitor(janitor) wrote on 2017-09-13T01:27:27.937540+00:00 This bug was fixed in the package cloud-init - 0.7.9-233-ge586fe35-0ubuntu1~17.04.1 cloud-init (0.7.9-233-ge586fe35-0ubuntu1~17.04.1) zesty; urgency=medium
-- Scott Moser smoser@ubuntu.com Mon, 31 Jul 2017 16:33:24 -0400 |
Launchpad user thermoman(thermoman) wrote on 2017-09-15T09:57:01.003524+00:00 This release broke a lot of my machines, generating ordering cycles on every machine. Please see #1717477 |
Launchpad user Scott Moser(smoser) wrote on 2017-09-15T18:36:54.000612+00:00 Not sure what to do here. |
Launchpad user Ryan Harper(raharper) wrote on 2017-09-15T21:34:26.400732+00:00 As far as I can tell, I don't think we can "delay" the fsck service due to how the systemd-fstab-generator works on /etc/fstab entries. For entries with a non-zero value for fsck (6th column), the generator will write out a .mount file that looks like this: ubuntu@ubuntu:/run/systemd/generator$ cat btrfs.mount Automatically generated by systemd-fstab-generator [Unit] [Mount] This will want to run fsck on the device and then mount it, all before local-fs.target. cloud-init cannot run until after local-fs.target is reached. Asking the fsck service to run later is always going to be in conflict with fsck+mount from the generator. I'm not sure we can reliably interrupt these services; the .mount unit is going to require a fsck; if we stop the fsck, then the mount won't happen. This is going to require some more thought and discussion. |
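To see which /etc/fstab entries fall into the case described above (a non-zero 6th column, so the generated .mount unit pulls in an fsck), a small check like the following may help. It is a minimal sketch: the function name `fsck_entries` is hypothetical, and it only reads fstab text; it does not inspect what systemd-fstab-generator actually wrote.

```python
def fsck_entries(fstab_text):
    """Return (device, mountpoint) pairs whose fs_passno (6th field) is non-zero.

    Per fstab(5), a missing 6th field is treated as 0, meaning no fsck.
    """
    result = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        passno = fields[5] if len(fields) > 5 else "0"
        if passno.isdigit() and int(passno) != 0:
            result.append((fields[0], fields[1]))
    return result


if __name__ == "__main__":
    with open("/etc/fstab") as f:
        for device, mountpoint in fsck_entries(f.read()):
            print(device, mountpoint)
```

Entries listed by this check are the ones for which delaying or stopping fsck would also hold back the mount.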
Launchpad user Scott Moser(smoser) wrote on 2017-09-23T02:32:44.341069+00:00 This bug is believed to be fixed in cloud-init in 17.1. If this is still a problem for you, please make a comment and set the state back to New. Thank you. |
This bug was originally filed in Launchpad as LP: #1691489
Launchpad details
Launchpad user Scott Moser(smoser) wrote on 2017-05-17T14:21:07.391494+00:00
=== Begin SRU Template ===
[Impact]
There is a race condition on a re-deployment of cloud-init on Azure
where /mnt will not get properly formatted or mounted. This is due to
"dirty" entries in /etc/fstab that cause a device to be busy when
cloud-init goes to format it. This shows itself usually as 'mkfs'
complaining that the device is busy. The cause is that systemd
starts an fsck and collides with cloud-init re-formatting the disk.
The problem can be seen other places but seemed to be most reproducible
and originally found on Azure.
[Test Case]
1.) Launch an Azure VM, ideally size L32S.
2.) Log in and verify the system properly mounted /mnt.
3.) Re-deploy the vm through the web ui and try again.
[Regression Potential]
Worst case scenario, these changes unnecessarily slow down boot and
do not fix the problem.
[Regression]
This SRU change caused bug 1717477.
[Other Info]
Upstream commit at
https://git.launchpad.net/cloud-init/commit/?id=1f5489c258
=== End SRU Template ===
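Step 2 of the test case above (verifying that /mnt was mounted) can be scripted so it is easy to repeat after each re-deploy. This is a minimal sketch, assuming a Linux guest where /proc/mounts is readable; the helper names `is_mounted` and `check_mnt` are hypothetical.

```python
def is_mounted(mountpoint, mounts_text):
    """Return True if `mountpoint` appears as a mount target in mounts_text.

    mounts_text is expected in /proc/mounts format, where the second
    whitespace-separated field of each line is the mount point.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mountpoint:
            return True
    return False


def check_mnt(mounts_path="/proc/mounts"):
    """Read the kernel's mount table and report whether /mnt is mounted."""
    with open(mounts_path) as f:
        return is_mounted("/mnt", f.read())


if __name__ == "__main__":
    print("/mnt mounted:", check_mnt())
```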
As reported in bug 1686514, sometimes /mnt will not get mounted when re-deploying or stopping-then-starting an Azure VM of size L32S. This is probably a more generic issue; I suspect it shows up here due to the speed of the disks on these systems.
Related bugs:
* bug 1686514: Azure: cloud-init does not handle reformatting GPT partition ephemeral disks