Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud-final.service does not run due to dependency cycle #2728

Closed
ubuntu-server-builder opened this issue May 10, 2023 · 14 comments
Closed

cloud-final.service does not run due to dependency cycle #2728

ubuntu-server-builder opened this issue May 10, 2023 · 14 comments
Labels
launchpad Migrated from Launchpad

Comments

@ubuntu-server-builder
Copy link
Collaborator

This bug was originally filed in Launchpad as LP: #1623868

Launchpad details
affected_projects = ['cloud-init (Ubuntu)', 'cloud-init (Ubuntu Xenial)']
assignee = None
assignee_name = None
date_closed = 2016-12-23T17:42:35.549992+00:00
date_created = 2016-09-15T09:42:31.238691+00:00
date_fix_committed = 2016-12-23T17:42:35.549992+00:00
date_fix_released = 2016-12-23T17:42:35.549992+00:00
id = 1623868
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1623868
milestone = None
owner = smoser
owner_name = Scott Moser
private = False
status = fix_released
submitter = pitti
submitter_name = Martin Pitt
tags = ['regression-release', 'verification-done', 'yakkety']
duplicates = []

Launchpad user Martin Pitt(pitti) wrote on 2016-09-15T09:42:31.238691+00:00

==== Begin SRU Template ====
[Impact]
As part of the change in bug 1576692, we made cloud-final.target
run After multi-user.target. That created a dependency loop between
cloud-init.target and multi-user.target and cloud-final.target.

Most of the time systemd would break that loop by dropping
cloud-init.target. But sometimes, it would break the loop by dropping
cloud-final.target, which would mean that user scripts do not run
and generally cloud-init doesn't finish.

[Test Case]

Failure in a xenial image can only be reproduced by

patching an image up to the previous xenial-proposed upload

(0.7.7-31-g65ace7b-0ubuntu1~16.04.1), then cleaning it

and then restarting. We will focus on verifying there is not

a problem.

Launch an instance and patch it up to xenial-proposed

$ release=xenial
$ name=x1
$ lxc launch ubuntu-daily:$release $name

wait for it to boot

$ while ! lxc exec $name -- [ -e /run/cloud-init/result.json ]; do sleep 1; done

Now update container, clean and reboot to show first boot

$ lxc exec $name -- sh -c '
    p=/etc/apt/sources.list.d/proposed.list
    echo deb http://archive.ubuntu.com/ubuntu xenial-proposed main > "$p" &&
    apt-get update -q && apt-get -qy install cloud-init'
$ lxc exec $name -- sh -c '
    cd /var/lib/cloud && for d in ; do [ "$d" = "seed" ] || rm -Rf "$d"; done
    rm -Rf /var/log/cloud-init
'

This is like first boot now.

$ lxc exec $name reboot
$ while ! lxc exec $name -- [ -e /run/cloud-init/result.json ]; do sleep 1; done

The services should show active

$ lxc exec $name -- journalctl | grep Break || echo Good, no breaks
$ lxc exec $name -- systemctl --no-pager status cloud-final.service
$ lxc exec $name -- systemctl --no-pager status cloud-init.target

[Regression Potential]
Playing with boot order can cause problems. Regression would be around
some targets not running. On a booted system this would show itself inx
  journalctl | grep -i Break
or
  journalctl | grep -i ordering

==== End SRU Template ====

With current yakkety cloud images (at least in Scalingstack), I often run into this dependency cycle at boot:

Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found ordering cycle on cloud-init.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-final.service/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on multi-user.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-init.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-final.service: Job cloud-final.service/start deleted to break ordering cycle starting with cloud-init.target/start

● cloud-final.service - Execute cloud user/final scripts
   Loaded: loaded (/lib/systemd/system/cloud-final.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Thus /var/lib/cloud/instance/boot-finished never gets written and thus waiting for an instance to init just times out.

This is with the most recent https://launchpad.net/ubuntu/+source/cloud-init/0.7.7-31-g65ace7b-0ubuntu1

Related bugs:

  • bug 1576692: fully support package installation in systemd
@ubuntu-server-builder ubuntu-server-builder added the launchpad Migrated from Launchpad label May 10, 2023
@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Martin Pitt(pitti) wrote on 2016-09-15T09:50:50.105493+00:00

Most of the time (at least in LXD, timing issue) cloud-init.target is inactive instead, which is also unintended. This is because it is installed into multi-user.target and thus gets an implied Before=multi-user.target as well. With explicitly adding After=multi-user.target (since one part of it, cloud-final.service, runs after it) all units start and I cannot reproduce the cycle any more.

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-15T09:54:00.975860+00:00

Status changed to 'Confirmed' because the bug affects multiple users.

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Christian Ehrhardt (paelzer) wrote on 2016-09-15T12:46:00.292803+00:00

Running into the same since today.
Seems alto reproducible with uvtool.

uvt-kvm create --memory 2048 --cpu 4 --password=ubuntu yakkety-test-not-finalizing-cloud-init release=yakkety arch=amd64 label=daily

I see the final stage completing in the log:
journalctl | grep "final: SUCCESS"
Sep 15 12:37:26 yakkety-test-not-finalizing-cloud-init cloud-init[1450]: [CLOUDINIT] handlers.py[DEBUG]: finish: modules-final: SUCCESS: running modules for final

Yet I don't get /var/lib/cloud/instance/boot-finished created.

My case seems to match Martins second finding that cloud-init.target seems to be inactive.
systemctl status cloud-init.target
● cloud-init.target - Cloud-init target
Loaded: loaded (/lib/systemd/system/cloud-init.target; static; vendor preset: enabled)
Active: inactive (dead)

OTOH - The cloud-final.service looks good:
● cloud-final.service - Execute cloud user/final scripts
Loaded: loaded (/lib/systemd/system/cloud-final.service; enabled; vendor preset: enabled)
Active: active (exited) since Thu 2016-09-15 12:37:26 UTC; 2min 20s ago
Process: 1450 ExecStart=/usr/bin/cloud-init modules --mode=final (code=exited, status=0/SUCCESS)
Main PID: 1450 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 4915)
Memory: 276.0K
CPU: 1.654s
CGroup: /system.slice/cloud-final.service

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Scott Moser(smoser) wrote on 2016-09-15T13:14:46.006540+00:00

Launchpad attachments: journalctl of vm where cloud-final was deleted

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Scott Moser(smoser) wrote on 2016-09-15T13:15:10.249130+00:00

Launchpad attachments: journalctl of vm where cloud-init.target was deleted

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Scott Moser(smoser) wrote on 2016-09-15T13:20:07.099761+00:00

For completeness, I've attached journalctl of 2 vms.
Both are the same yakkety image, launched with no user-data within seconds of each other.
In one, systemd decided to:
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start

In the other:
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Breaking ordering cycle by deleting job cloud-init.target/start

Experience right now is that it is much more common for systemd to delete cloud-init.target than cloud-final.service, but both do exist.

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Scott Moser(smoser) wrote on 2016-09-15T13:28:44.643489+00:00

Just for ease of reading:

== deleting job cloud-init.target ==
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found ordering cycle on multi-user.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-init.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Breaking ordering cycle by deleting job cloud-init.target/start
Sep 15 12:59:32 ubuntu systemd[1]: cloud-init.target: Job cloud-init.target/start deleted to break ordering cycle starting with multi-user.target/start

== deleting job cloud-final.target ==
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found ordering cycle on cloud-init.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-final.service/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on multi-user.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-init.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-final.service: Job cloud-final.service/start deleted to break ordering cycle starting with cloud-init.target/start

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Martin Pitt(pitti) wrote on 2016-09-15T14:32:15.216302+00:00

Hello Martin, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.8-1-g3705bb5-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-15T15:14:33.166032+00:00

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1


cloud-init (0.7.8-1-g3705bb5-0ubuntu1) yakkety; urgency=medium

  • New upstream release 0.7.8.
  • New upstream snapshot.
    • systemd: put cloud-init.target After multi-user.target (LP: #1623868)

-- Scott Moser smoser@ubuntu.com Thu, 15 Sep 2016 09:47:11 -0400

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Scott Moser(smoser) wrote on 2016-09-17T02:38:52.386535+00:00

verified with xenial-proposed and 0.7.8-1-g3705bb5-0ubuntu1~16.04.1 as described in bug.

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-22T17:34:08.065412+00:00

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1~16.04.1


cloud-init (0.7.8-1-g3705bb5-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  • New upstream release 0.7.8.
  • New upstream snapshot.
    • systemd: put cloud-init.target After multi-user.target (LP: #1623868)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.2) xenial-proposed; urgency=medium

  • debian/control: add Breaks of older versions of walinuxagent (LP: #1623570)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  • debian/control: fix missing dependency on python3-serial,
    and make SmartOS datasource work.
  • debian/cloud-init.templates fix capitalisation in template so
    dpkg-reconfigure works to select OpenStack. (LP: #1575727)
  • d/README.source, d/control, d/new-upstream-snapshot, d/rules: sync
    with yakkety for changes due to move to git.
  • d/rules: change PYVER=python3 to PYVER=3 to adjust to upstream change.
  • debian/rules, debian/cloud-init.install: remove install file
    to ensure expected files are collected into cloud-init deb.
    (LP: #1615745)
  • debian/dirs: remove obsolete / unused file.
  • upstream move from bzr to git.
  • New upstream snapshot.
    • Allow link type of null in network_data.json [Jon Grimm] (LP: #1621968)
    • DataSourceOVF: fix user-data as base64 with python3 (LP: #1619394)
    • remove obsolete .bzrignore
    • systemd: Better support package and upgrade. (LP: #1576692, #1621336)
    • tests: cleanup tempdirs in apt_source tests
    • apt config conversion: treat empty string as not provided. (LP: #1621180)
    • Fix typo in default keys for phone_home [Roland Sommer] (LP: #1607810)
    • salt minion: update default pki directory for newer salt minion.
      (LP: #1609899)
    • bddeb: add --release flag to specify the release in changelog.
    • apt-config: allow both old and new format to be present.
      [Christian Ehrhardt] (LP: #1616831)
    • python2.6: fix dict comprehension usage in _lsb_release. [Joshua Harlow]
    • Add a module that can configure spacewalk. [Joshua Harlow]
    • add install option for openrc [Matthew Thode]
    • Generate a dummy bond name for OpenStack (LP: #1605749)
    • network: fix get_interface_mac for bond slave, read_sys_net for ENOTDIR
    • azure dhclient-hook cleanups
    • Minor cleanups to atomic_helper and add unit tests.
    • Fix Gentoo net config generation [Matthew Thode]
    • distros: fix get_primary_arch method use of os.uname [Andrew Jorgensen]
    • Apt: add new apt configuration format [Christian Ehrhardt]
    • Get Azure endpoint server from DHCP client [Brent Baude]
    • DigitalOcean: use the v1.json endpoint [Ben Howard]
    • MAAS: add vendor-data support (LP: #1612313)
    • Upgrade to a configobj package new enough to work [Joshua Harlow]
    • ConfigDrive: recognize 'tap' as a link type. (LP: #1610784)
    • NoCloud: fix bug providing network-interfaces via meta-data.
      (LP: 1577982)
    • Add distro tags on config modules that should have it [Joshua Harlow]
    • ChangeLog: update changelog for previous commit.
    • add ntp config module [Ryan Harper]
    • SmartOS: more improvements for network configuration
    • tools/read-version: update to address change in version
    • make-tarball: older versions of git with --format=tar.
    • read-version: do not attempt git-describe if no git.
    • Newer requests have strong type validation [Joshua Harlow]
    • For upstream snapshot versions do not modify git-describe output.
    • adjust signal_handler for version changes.
    • revert unintended change to ubuntu sources list
    • drop modification of version during make-tarball, tools changes.
    • adjust tools and version information.
    • Update build tools to work with git [Lars Kellogg-Stedman]
    • fix pep8 errors in mcollective unit tests
    • mcollective: add tests, cleanups and bug fix when no config in /etc.

-- Scott Moser smoser@ubuntu.com Thu, 15 Sep 2016 09:57:27 -0400

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Chris J Arges(arges) wrote on 2016-09-22T17:36:08.235762+00:00

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user John Keenleyside(jkeenleyside) wrote on 2016-09-23T16:52:13.665847+00:00

What is the recommended way to have an application systemd service run after cloud-final.service ?

@ubuntu-server-builder
Copy link
Collaborator Author

Launchpad user Scott Moser(smoser) wrote on 2016-12-23T17:42:33.837109+00:00

This is fixed in cloud-init 0.7.9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
launchpad Migrated from Launchpad
Projects
None yet
Development

No branches or pull requests

1 participant