ds-identify - stuck in uninterruptible sleep state #3520

Closed
ubuntu-server-builder opened this issue May 12, 2023 · 13 comments
Labels
launchpad Migrated from Launchpad

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1856560

Launchpad details
affected_projects = ['util-linux (Ubuntu)']
assignee = None
assignee_name = None
date_closed = 2019-12-17T15:24:38.351077+00:00
date_created = 2019-12-16T13:13:02.107707+00:00
date_fix_committed = None
date_fix_released = None
id = 1856560
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1856560
milestone = None
owner = nicklaswallgren
owner_name = Nicklas
private = False
status = invalid
submitter = nicklaswallgren
submitter_name = Nicklas
tags = ['rls-bb-incoming']
duplicates = []

Launchpad user Nicklas(nicklaswallgren) wrote on 2019-12-16T13:13:02.107707+00:00

We have recurring issues with the cloud-init/ds-identify process. It spawns "blkid -c /dev/null -o export" sub-processes, which get stuck in the "D" (uninterruptible sleep) state.

The processes cannot be killed, so the only solution is to reboot the affected server.

root 3839 0.0 0.0 4760 1840 ? S Dec05 0:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 3844 0.0 0.0 11212 2836 ? D Dec05 0:00 \_ blkid -c /dev/null -o export
root 6943 0.0 0.0 4760 1880 ? S Dec05 0:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 6948 0.0 0.0 11212 2844 ? D Dec05 0:00 \_ blkid -c /dev/null -o export
root 6111 0.0 0.0 4760 1916 ? S Dec12 0:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 6149 0.0 0.0 11212 2940 ? D Dec12 0:00 \_ blkid -c /dev/null -o export
root 8765 0.0 0.3 926528 24968 ? Ssl Dec12 0:12 /usr/lib/snapd/snapd
root 9179 0.0 0.0 4760 1892 ? S Dec12 0:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 9185 0.0 0.0 11980 3552 ? D Dec12 0:00 \_ blkid -c /dev/null -o export
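
For anyone who wants to check a machine for the same symptom without reading ps output by eye, here is a minimal Python sketch. It assumes a Linux /proc filesystem; the function names and the `proc_root` parameter are hypothetical and only there for illustration and testing. It reads the state field of /proc/<pid>/stat and reports processes in state "D".

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, _, right = stat_line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    return int(pid_text), comm, right.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except OSError:
            continue  # the process exited or is not readable
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)
```

A short-lived "D" state is normal during disk I/O, so a single snapshot only means something if the same PIDs show up again hours or days later, as in the listing above.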

Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic

5.0.0-36-generic #39~18.04.1-Ubuntu SMP Tue Nov 12 11:09:50 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@ubuntu-server-builder added the launchpad (Migrated from Launchpad) label May 12, 2023
@ubuntu-server-builder
Collaborator Author

Launchpad user Dan Watkins(oddbloke) wrote on 2019-12-16T15:19:06.869930+00:00

Hi Nicklas,

Thanks for the bug report, this is a bit of a strange one! blkid hanging suggests a deeper issue with your VM/hypervisor than just a cloud-init issue, but let's gather the info we need to work that out. Could you run cloud-init collect-logs on an affected machine and attach the tarball it creates to this bug, please?

What happens if you run sudo blkid -c /dev/null -o export manually on the instance?

(Once you've attached this requested info, please move the bug status back to New.)

Thanks!

Dan

@ubuntu-server-builder
Collaborator Author

Launchpad user Nicklas(nicklaswallgren) wrote on 2019-12-16T16:27:37.495674+00:00

Hey,

Thanks for the response.

If we run blkid -c /dev/null -o export manually, it gets stuck as well.

The strange thing is that this only happens once in a while. If we restart the instance and run the command again, it runs successfully.

Launchpad attachments: cloud-init.tar.gz

@ubuntu-server-builder
Collaborator Author

Launchpad user Nicklas(nicklaswallgren) wrote on 2019-12-17T06:34:16.931260+00:00

Here is the output of /proc/<pid>/stack:

cat /proc/3844/stack
[<0>] __blkdev_get+0x7a/0x580
[<0>] blkdev_get+0x131/0x340
[<0>] blkdev_open+0x92/0x100
[<0>] do_dentry_open+0x1f8/0x3a0
[<0>] vfs_open+0x2f/0x40
[<0>] path_openat+0x2e8/0x1700
[<0>] do_filp_open+0x9b/0x110
[<0>] do_sys_open+0x1bb/0x2d0
[<0>] __x64_sys_openat+0x20/0x30
[<0>] do_syscall_64+0x5a/0x120
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff
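
For readers less used to /proc/<pid>/stack: each line is one kernel stack frame, innermost first. The trace above therefore shows blkid blocked inside the open of a block device (the openat syscall at the bottom, __blkdev_get at the top). The Python sketch below pulls the symbol names out of such a trace. The function name and the regular expression are illustrative assumptions, not part of any existing tool.

```python
import re

# Matches lines such as "[<0>] __blkdev_get+0x7a/0x580".
FRAME_RE = re.compile(
    r"^\[<[0-9a-fA-Fx]+>\]\s+([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_kernel_stack(text):
    """Return a list of (symbol, offset, size) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # e.g. the trailing "[<0>] 0xffffffffffffffff" marker
        symbol, offset, size = match.groups()
        frames.append((symbol, int(offset, 16), int(size, 16)))
    return frames
```

Comparing the innermost symbol across several stuck PIDs is a quick way to see whether they are all waiting in the same place.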

@ubuntu-server-builder
Collaborator Author

Launchpad user Dan Watkins(oddbloke) wrote on 2019-12-17T15:24:32.930412+00:00

Hi Nicklas,

Thanks for the extra info! This sounds like an issue with blkid (and/or your system), rather than a cloud-init bug. I've added a util-linux bug task (which is the package that ships blkid), and I'm going to mark the cloud-init bug task Invalid as I don't believe that there's anything we can do to help you from our perspective.

If you think this is incorrect, please do let us know!

Thanks!

Dan

@ubuntu-server-builder
Collaborator Author

Launchpad user Dimitri John Ledkov(xnox) wrote on 2019-12-17T16:25:16.504155+00:00

ls -latr /proc/<pid>/fd

would be nice too, to see what it has open/holding/reading.
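
If it is easier to collect this for many PIDs at once, the same information can be read programmatically. This is a minimal Python sketch assuming a Linux /proc filesystem and sufficient privileges (root for other users' processes); the function name and the `proc_root` parameter are hypothetical.

```python
import os


def open_file_descriptors(pid, proc_root="/proc"):
    """Map each open fd number of `pid` to the target of its /proc symlink."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the descriptor was closed while we were reading
    return targets
```

Calling `open_file_descriptors(3844)` on an affected machine would return the same fd-to-target mapping that `ls -latr /proc/3844/fd` prints. A device that the process is still blocked in opening would not be expected to appear here, since no descriptor exists for it yet.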

@ubuntu-server-builder
Collaborator Author

Launchpad user Dimitri John Ledkov(xnox) wrote on 2019-12-19T16:24:08.645123+00:00

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.0.0-32-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro maybe-ubiquity

[ 0.000000] DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[ 0.000000] Hypervisor detected: VMware

Could you please explain what you are doing?

Because everything about this is odd. You are booting off an LVM volume? Running the ubiquity desktop installer? And using cloud-init?

It would help if you could describe your setup and how to reproduce your instance type.

Also, you appear to be running without a cloud, hence the NoCloud seed.

If you are modifying cloud.config anyway, can you specify that ds-identify should just use NoCloud?

journal.txt seems to be incomplete; it is not a full journal of a boot with this issue.

@ubuntu-server-builder
Collaborator Author

Launchpad user Nicklas(nicklaswallgren) wrote on 2019-12-19T16:58:49.077459+00:00

Hey,

We have restarted the affected nodes/servers. I will retrieve the output of ls -latr /proc/<pid>/fd once it starts happening again. It often takes a few days.

I didn't set up our Ubuntu environments. Our IT/devops team (with no experience of Linux) installed the Ubuntu nodes in our VMware environment.

I don't believe we have made any changes to "cloud.config". It's just a clean install of 18.04.3, with 5.0.0-36-generic.

Can I provide any other information?

@ubuntu-server-builder
Collaborator Author

Launchpad user Dimitri John Ledkov(xnox) wrote on 2020-01-09T16:25:49.160918+00:00

cloud-init is used to perform first-time configuration of cloud images using a cloud metadata source, i.e. a generic image is launched by OpenStack and provisions per-user keys / packages / etc.

If someone manually installs Ubuntu Desktop / Server using an installer, one should not be installing or using cloud-init as clearly, it will never complete without a correct metadata source present. And when it fails to complete, it re-attempts to reprovision the machine on every boot.

If you don't know whether there is a cloud metadata source, whether cloud-init should be provisioning the instances on first boot, or whether a desktop/server installer was used, you need to sort that out first. Please escalate to your IT/devops team to understand whether it is intentional that cloud-init is installed and attempted on every boot, and where the metadata for it is supposed to come from. Then fix cloud-init/metadata such that it completes, or remove/disable/uninstall cloud-init.
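
On the "disable" option: the cloud-init documentation describes two ways to turn it off without uninstalling, either creating an empty /etc/cloud/cloud-init.disabled file or adding cloud-init=disabled to the kernel command line. The Python sketch below covers the first one. The function name and the `root` parameter are hypothetical (the latter only so it can be tried against a scratch directory), and whether this avoids the blkid hang on the affected machines is an untested assumption.

```python
from pathlib import Path


def disable_cloud_init(root="/"):
    """Create the marker file that tells cloud-init not to run at boot."""
    marker = Path(root) / "etc" / "cloud" / "cloud-init.disabled"
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()  # the file's existence is what matters, not its content
    return marker
```

Run as root with the default `root="/"`, this is equivalent to `touch /etc/cloud/cloud-init.disabled`. Deleting the file re-enables cloud-init on the next boot.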

This is not a bug, but effectively a support issue at this point. The logs are inconsistent with normal cloud-init usage, and there is no reproducer.

@ubuntu-server-builder
Collaborator Author

Launchpad user Scott Moser(smoser) wrote on 2020-01-09T19:11:26.585158+00:00

> If someone manually installs Ubuntu Desktop / Server using an installer,
> one should not be installing or using cloud-init as clearly, it will never
> complete without a correct metadata source present. And when it fails to
> complete, it re-attempts to reprovision the machine on every boot.

I'd just like to clarify something. The above statement is neither true
nor helpful.

cloud-init (since 18.04) does not run unless it is on a cloud or has cloud
platform metadata. It is perfectly fine to have cloud-init installed on a
desktop or a server that is not running "on a cloud".

In order to accomplish this, cloud-init is only enabled via a generator.
That generator ran 'blkid', and 'blkid' blocked indefinitely.
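
To illustrate why a blocking child is awkward for any caller, here is a minimal Python sketch of running a probe command with a deadline. It is not how ds-identify (a shell script) works and not a proposed fix; `run_with_deadline` is a hypothetical helper. The point is that a caller can stop waiting, but a child in "D" state only acts on SIGKILL once it leaves that state, so the sketch deliberately does not wait for the child after the timeout.

```python
import subprocess


def run_with_deadline(argv, timeout_s):
    """Return the command's stdout, or None if it did not finish in time."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return out
    except subprocess.TimeoutExpired:
        # Request termination but do not wait(): for a child stuck in
        # uninterruptible sleep, waiting here would block the caller too.
        proc.kill()
        proc.stdout.close()
        return None
```

For example, `run_with_deadline(["blkid", "-c", "/dev/null", "-o", "export"], 10)` would give up after ten seconds. `subprocess.run(..., timeout=...)` is avoided on purpose, because it waits for the child after killing it.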

@ubuntu-server-builder
Collaborator Author

Launchpad user Dan Watkins(oddbloke) wrote on 2020-01-09T21:02:15.782129+00:00

Moved back to New, because this is happening in normal cloud-init operation.

@ubuntu-server-builder
Collaborator Author

Launchpad user Nicklas(nicklaswallgren) wrote on 2020-01-14T11:34:38.679840+00:00

Affected process

root      1166  0.0  0.0  72292  5812 ?        Ss    2019   0:00 /usr/sbin/sshd -D

Output of ls -latr /proc/1166/fd

total 0
dr-xr-xr-x 9 root root 0 Jan 14 12:00 ..
dr-x------ 2 root root 0 Jan 14 12:31 .
lrwx------ 1 root root 64 Jan 14 12:31 4 -> 'socket:[24485]'
lrwx------ 1 root root 64 Jan 14 12:31 3 -> 'socket:[24483]'
lrwx------ 1 root root 64 Jan 14 12:31 2 -> 'socket:[24455]'
lrwx------ 1 root root 64 Jan 14 12:31 1 -> 'socket:[24455]'
lr-x------ 1 root root 64 Jan 14 12:31 0 -> /dev/null

Should we disable cloud-init altogether?

@ubuntu-server-builder
Collaborator Author

Launchpad user Nicklas(nicklaswallgren) wrote on 2020-01-14T11:39:28.398416+00:00

Never mind my last message...

@ubuntu-server-builder
Collaborator Author

Launchpad user Nicklas(nicklaswallgren) wrote on 2020-01-30T14:23:48.393767+00:00

Here is the output of ls -latr /proc/<pid>/fd

root 3377 0.0 0.0 11980 3352 ? D Jan24 0:00 blkid -c /dev/null -o export
root 6293 0.0 0.0 11980 3316 ? D Jan24 0:00 blkid -c /dev/null -o export
root 10422 0.0 0.0 11980 3556 ? D Jan26 0:00 blkid -c /dev/null -o export
root 13020 0.0 0.0 11980 3492 ? D Jan24 0:00 blkid -c /dev/null -o export
root 13192 0.0 0.0 11980 3752 ? D Jan26 0:00 blkid -c /dev/null -o export
root 15494 0.0 0.0 11980 3704 ? D Jan27 0:00 blkid -c /dev/null -o export
root 15506 0.0 0.0 11212 2532 ? D Jan23 0:00 blkid -c /dev/null -o export
root 15630 0.0 0.0 11980 3456 ? D Jan24 0:00 blkid -c /dev/null -o export
root 15768 0.0 0.0 11980 3700 ? D Jan26 0:00 blkid -c /dev/null -o export
root 16282 0.0 0.0 11980 3704 ? D Jan27 0:00 blkid -c /dev/null -o export
root 18328 0.0 0.0 11980 3616 ? D Jan27 0:00 blkid -c /dev/null -o export
root 18364 0.0 0.0 11980 3432 ? D Jan23 0:00 blkid -c /dev/null -o export
root 18549 0.0 0.0 11980 3324 ? D Jan24 0:00 blkid -c /dev/null -o export
root 18755 0.0 0.0 11980 3620 ? D Jan26 0:00 blkid -c /dev/null -o export
root 19266 0.0 0.0 11980 3556 ? D Jan27 0:00 blkid -c /dev/null -o export
root 20480 0.0 0.0 11980 3616 ? D Jan26 0:00 blkid -c /dev/null -o export
root 21131 0.0 0.0 11980 3504 ? D Jan23 0:00 blkid -c /dev/null -o export
root 21149 0.0 0.0 11980 3636 ? D Jan27 0:00 blkid -c /dev/null -o export
root 21378 0.0 0.0 11980 3180 ? D Jan24 0:00 blkid -c /dev/null -o export
root 21927 0.0 0.0 11980 3624 ? D Jan27 0:00 blkid -c /dev/null -o export
root 23165 0.0 0.0 11980 3704 ? D Jan26 0:00 blkid -c /dev/null -o export
root 23795 0.0 0.0 11980 3616 ? D Jan27 0:00 blkid -c /dev/null -o export
root 23911 0.0 0.0 11980 3456 ? D Jan23 0:00 blkid -c /dev/null -o export
root 24554 0.0 0.0 11980 3592 ? D Jan27 0:00 blkid -c /dev/null -o export
root 25672 0.0 0.0 11980 3752 ? D Jan26 0:00 blkid -c /dev/null -o export
root 28344 0.0 0.0 11980 3704 ? D Jan26 0:00 blkid -c /dev/null -o export
root 30382 0.0 0.0 11980 3216 ? D Jan24 0:00 blkid -c /dev/null -o export

dr-xr-xr-x 9 root root 0 Jan 30 06:29 ..
lr-x------ 1 root root 64 Jan 30 15:22 4 -> /proc/partitions
l-wx------ 1 root root 64 Jan 30 15:22 3 -> /run/cloud-init/ds-identify.log
lrwx------ 1 root root 64 Jan 30 15:22 2 -> /dev/null
l-wx------ 1 root root 64 Jan 30 15:22 1 -> 'pipe:[399588299]'
lrwx------ 1 root root 64 Jan 30 15:22 0 -> /dev/null
dr-x------ 2 root root 0 Jan 30 15:22 .

root@ubuntuapp02:/# ls -latr /proc/23795/fd
total 0
dr-xr-xr-x 9 root root 0 Jan 30 06:29 ..
lr-x------ 1 root root 64 Jan 30 15:23 4 -> /proc/partitions
l-wx------ 1 root root 64 Jan 30 15:23 3 -> /run/cloud-init/ds-identify.log
lrwx------ 1 root root 64 Jan 30 15:23 2 -> /dev/null
l-wx------ 1 root root 64 Jan 30 15:23 1 -> 'pipe:[415355377]'
lrwx------ 1 root root 64 Jan 30 15:23 0 -> /dev/null
dr-x------ 2 root root 0 Jan 30 15:23 .

@ubuntu-server-builder closed this as not planned May 12, 2023