cloud-init may hang OS boot process due to grep for the entire ISO file when it is attached #3293
Comments
Launchpad user Scott Moser(smoser) wrote on 2018-12-04T15:47:39.142506+00:00 You're correct that ds-identify both ... that ds-identify does grep through ... Note that commit 530850f improves the situation on VMware to ... If we can rely on that label on VMware OVF systems, then I think I would be ... Note that the grep is protected from executing in many ways. It will only ... If we cannot drop the grep altogether because the VMware/OVF spec does not provide ... Thoughts? |
Launchpad user Pengzhen(Peter) Cao(pengzhencao) wrote on 2018-12-04T23:44:12.522023+00:00 In the case of OVF customization the protection could work. However, if a user chooses to install an OS from the installation ISO image, then it could not. In our example we were trying to install a SLES15 OS from its installer DVD attached by virtual CDROM (/dev/sr0). It is the OS vendor's choice what the ISO's label is, usually "XXX installer DVD", "XXX package DVD"... If we cannot just check whether a certain config file exists on the ISO image, then I think your suggestion of limiting the ISO file size before the grep seems like a solution, as the config ISO file will definitely be a small one. |
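The size-limit idea in the comment above can be illustrated with a rough Python sketch. This is not how ds-identify is implemented (it is a shell script); the function names, the 10 MiB threshold and the chunk size are all assumptions chosen for illustration. The marker string is the one visible in the grep command lines in the debug log.

```python
import os

# Marker string taken from the grep command line in the bug report.
OVF_MARKER = b"http://schemas.dmtf.org/ovf/environment/1"


def device_size(path):
    """Return the size in bytes of a regular file or block device."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def contains_marker(path, marker=OVF_MARKER, max_bytes=10 * 1024 * 1024,
                    chunk_size=64 * 1024):
    """Case-insensitive search for marker, skipped for large images.

    max_bytes is a hypothetical cutoff: anything bigger is assumed to be
    an installer DVD rather than a small config ISO and is not scanned.
    """
    if device_size(path) > max_bytes:
        return False
    needle = marker.lower()
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk.lower()
            if needle in window:
                return True
            # Keep an overlap so a match split across two chunks is found.
            tail = window[-(len(needle) - 1):]
```

With a cutoff like this, a multi-gigabyte installer image is rejected after a single seek instead of being read end to end.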
Launchpad user Scott Moser(smoser) wrote on 2018-12-05T14:53:24.399928+00:00 I think I'm happy to drop the 'grep' if you tell me that VMware installations will always label the OVF transport 'OVF ENV', which is my current experience with a vSphere Client 6.0.0 and vCenter Server 6.0.0. If we cannot rely on that, then other heuristic checks are all we can do. |
Launchpad user Scott Moser(smoser) wrote on 2018-12-05T14:54:28.417211+00:00 Well, not all we can do, but all we can do without adding either a 'mount' or a dependency on something like 'isoinfo'. |
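A label-based check like the one discussed above might look roughly like this in Python. It is a sketch only: it assumes input shaped like KEY=value records separated by blank lines (similar to what `blkid -o export` prints), it does not handle any escaping that particular util-linux versions may apply to values, and the function names are hypothetical.

```python
def parse_blkid_export(text):
    """Parse KEY=value records separated by blank lines.

    Returns {devname: {key: value}}. Records without DEVNAME are ignored.
    """
    devices = {}
    for record in text.strip().split("\n\n"):
        fields = {}
        for line in record.splitlines():
            key, sep, value = line.partition("=")
            if sep:
                fields[key.strip()] = value.strip()
        name = fields.get("DEVNAME")
        if name:
            devices[name] = fields
    return devices


def ovf_candidates(devices, label="OVF ENV"):
    """Return device names that are iso9660 and carry the expected label."""
    return [name for name, fields in devices.items()
            if fields.get("TYPE") == "iso9660"
            and fields.get("LABEL") == label]
```

Only devices that pass this cheap metadata check would then need any further inspection, so an installer DVD with a vendor-chosen label is never read.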
Launchpad user Pengzhen(Peter) Cao(pengzhencao) wrote on 2018-12-06T08:47:30.784306+00:00 Let me check with the OVF team internally and update you on whether we use a fixed name for the ISO label. |
Launchpad user Pengpeng Sun(pengpengs) wrote on 2019-02-22T08:18:32.885755+00:00 Hi Scott, If we only want to speed up the VMware guest customization process, could we move the ovf_vmware_guest_customization check in front of "is_cdrom_ovf" in dsCheckOVF()? I am not sure whether this change has side effects. Thanks, |
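The reordering proposed above amounts to running the cheap check before the expensive one and stopping at the first success. A minimal Python sketch of that pattern follows; the helper name and the check names passed to it are hypothetical stand-ins, not the real ds-identify functions.

```python
def first_matching_check(checks):
    """Run (name, check) pairs in order and return the first matching name.

    Later checks are never called once an earlier one succeeds, so cheap
    checks should be listed before expensive ones.
    """
    for name, check in checks:
        if check():
            return name
    return None
```

If the guest customization check succeeds, the CD-ROM scan is never started, which avoids the cost described in this bug for that case.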
Launchpad user Scott Moser(smoser) wrote on 2019-03-05T14:38:42+00:00 It does seem like that would improve things, pengpeng. Much better |
Launchpad user Pengpeng Sun(pengpengs) wrote on 2019-03-06T06:47:48.971508+00:00 @scott, agree with you. When I checked the current versions, the label contained no fixed string. If you attach a VMTools ISO, then LABEL=VMware Tools, TYPE=iso9660. Thanks, |
Launchpad user Pengpeng Sun(pengpengs) wrote on 2019-04-17T02:16:21.165227+00:00 As Scott mentioned, I got the label 'OVF ENV' when I attached an OVF CD-ROM, which requires enabling vApp options on the VM. Copying Ryan and Dan on this. An article on how to enable vApp options on a VM: https://www.virtuallyghetto.com/2012/06/ovf-runtime-environment.html |
Launchpad user Pengpeng Sun(pengpengs) wrote on 2019-06-18T09:56:48.367078+00:00 Post a merge request: |
Launchpad user Server Team CI bot(server-team-bot) wrote on 2019-07-16T14:31:20.836121+00:00 This bug is fixed with commit d9769c4 to cloud-init on branch master. |
Launchpad user Ryan Harper(raharper) wrote on 2019-07-17T17:12:08.643739+00:00 This bug is believed to be fixed in cloud-init in version 19.2. If this is still a problem for you, please make a comment and set the state back to New. Thank you. |
Launchpad user Nicklas(nicklaswallgren) wrote on 2019-12-16T11:27:33.283669+00:00 We have recurring problems with the cloud-init/ds-identify process. It spawns sub-processes "blkid -c /dev/null -o export" which get stuck in the "D" (uninterruptible sleep) state. The processes cannot be killed, so the only solution is to reboot the affected server. root 3839 0.0 0.0 4760 1840 ? S Dec05 0:00 /bin/sh /usr/lib/cloud-init/ds-identify Distributor ID: Ubuntu 5.0.0-36-generic #39~18.04.1-Ubuntu SMP Tue Nov 12 11:09:50 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
Launchpad user Pengpeng Sun(pengpengs) wrote on 2019-12-17T03:21:51.729502+00:00 @Nicklas, I think the /var/run/cloud-init/ds-identify.log is helpful to investigate the problem. |
This bug was originally filed in Launchpad as LP: #1806701
Launchpad details
Launchpad user Pengzhen(Peter) Cao(pengzhencao) wrote on 2018-12-04T14:20:45.721645+00:00
In our tests of SLES15 with cloud-init installed, we have found that if we attach an ISO file to the VM before the VM boots, it often takes more than 10 minutes to start the SLES OS. Sometimes it fails to start the SLES OS at all.
We've root-caused it to the "is_cdrom_ovf()" function in "tools/ds-identify".
In this function, there is the following logic to detect whether an ISO contains a certain string:
It tries to grep the whole ISO file for a certain string, which puts intense IO pressure on the system.
What is worse is that sometimes the ISO file is large (e.g. >5GB for an installer DVD) and is mounted over NFS. The "grep" process often consumes 99% CPU and appears to hang. systemd then starts more and more "grep" processes, which saturate the CPU and consume all the IO bandwidth for the ISO file. The system may then hang for a long time and sometimes fails to start.
To fix this issue, I suggest that we should not grep the entire ISO file. Instead, we should just check whether the file/dir exists with os.path.exists().
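As a rough illustration of the existence check suggested above, here is a minimal Python sketch. It assumes the ISO is already mounted at some path, which ds-identify itself does not do, and it assumes the OVF environment document is a file named ovf-env.xml at the root of the image. Both the function name and the default file name are assumptions for illustration.

```python
import os


def has_ovf_env(mount_point, filename="ovf-env.xml"):
    """Return True if the assumed OVF environment file exists at the root
    of an already-mounted ISO. No file contents are read."""
    return os.path.isfile(os.path.join(mount_point, filename))
```

A check of this kind costs one directory lookup regardless of image size, whereas the grep has to read the entire device when the string is absent.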
-------------------------debug log snip------------------------
pek2-gosv-16-dhcp180:~ # ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 13:32 ? 00:00:04 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
…
root 474 1 0 13:34 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 482 474 2 13:34 ? 00:00:15 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1020 1 0 13:35 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1039 1020 1 13:35 ? 00:00:07 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
polkitd 1049 1 0 13:37 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
root 1051 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd --systemd --foreground
root 1052 1 0 13:37 ? 00:00:00 /usr/lib/systemd/systemd-logind
root 1054 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd-nanny --systemd --foreground
root 1073 1 0 13:37 ? 00:00:00 /usr/bin/vmtoolsd
root 1097 1 0 13:37 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1110 1097 1 13:37 ? 00:00:04 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1304 1 0 13:38 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1312 1304 1 13:38 ? 00:00:03 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1537 1 0 13:40 ? 00:00:00 /usr/bin/plymouth --wait
root 1613 1 0 13:40 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1645 1613 0 13:40 ? 00:00:02 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
…
grep uses nearly 100% CPU and the system is very slow.
top - 13:46:37 up 26 min, 2 users, load average: 14.14, 15.03, 10.57
Tasks: 225 total, 6 running, 219 sleeping, 0 stopped, 0 zombie
%Cpu(s): 40.1 us, 49.3 sy, 0.0 ni, 0.0 id, 1.4 wa, 0.0 hi, 9.1 si, 0.0 st
KiB Mem : 1000916 total, 64600 free, 355880 used, 580436 buff/cache
KiB Swap: 1288168 total, 1285600 free, 2568 used. 492688 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4427 root 20 0 40100 3940 3084 R 99.90 0.394 0:27.41 top
1016 root 20 0 197796 4852 3400 R 99.90 0.485 1:26.44 vmtoolsd
1723 root 20 0 7256 1860 1556 D 99.90 0.186 0:28.44 grep
484 root 20 0 7256 1684 1396 D 99.90 0.168 1:51.22 grep
1278 root 20 0 7256 1856 1556 D 99.90 0.185 0:38.44 grep
1398 root 20 0 7256 1860 1556 R 99.90 0.186 0:28.53 grep
1061 root 20 0 7256 1856 1556 D 99.90 0.185 0:56.62 grep
-------------------------debug log snip------------------------