boot.rst: add First Boot Determination section #568

Merged
merged 2 commits into from Sep 16, 2020
86 changes: 86 additions & 0 deletions doc/rtd/topics/boot.rst
Expand Up @@ -157,4 +157,90 @@ finished, the ``cloud-init status`` subcommand can help block external
scripts until cloud-init is done without having to write your own systemd
units dependency chains. See :ref:`cli_status` for more info.

First Boot Determination
************************

cloud-init has to determine whether the current boot is the first boot of a
new instance, so that it applies the appropriate configuration. On an
instance's first boot, it should run all "per-instance" configuration, whereas
on a subsequent boot it should run only "per-boot" configuration. This section
describes how cloud-init performs this determination, as well as why it is
necessary.

When it runs, cloud-init stores a cache of its internal state for use across
stages and boots.

If this cache is present, then cloud-init has run on this system before.
[#not-present]_ There are two cases where this could occur. Most commonly,
the instance has been rebooted, and this is a second/subsequent boot.
Alternatively, the filesystem has been attached to a *new* instance, and this
is an instance's first boot. The most obvious case where this happens is when
an instance is launched from an image captured from a launched instance.

By default, cloud-init attempts to determine which case it is running in by
checking the instance ID in the cache against the instance ID it determines at
runtime. If they do not match, then this is an instance's first boot;
otherwise, it's a subsequent boot. Internally, cloud-init refers to this
behavior as ``check``.
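
As a rough illustration of the ``check`` comparison described above, the
following sketch compares a cached instance ID with the one determined at
runtime. The function name ``is_first_boot`` and its parameters are
hypothetical; this is not cloud-init's actual internal code.

.. code-block:: python

    from typing import Optional


    def is_first_boot(cached_id: Optional[str], current_id: str) -> bool:
        """Return True if this boot should be treated as a first boot."""
        if cached_id is None:
            # No cache present: cloud-init has not run on this system before.
            return True
        # Cache present: a differing instance ID indicates a new instance,
        # for example one launched from a captured image.
        return cached_id != current_id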

This behavior is required for images captured from launched instances to
behave correctly, and so is the default which generic cloud images ship with.
However, there are cases where it can cause problems. [#problems]_ For these
cases, cloud-init has support for modifying its behavior to trust the instance
ID that is present in the system unconditionally. This means that cloud-init
will never detect a new instance when the cache is present, and it follows that
the only way to cause cloud-init to detect a new instance (and therefore its
first boot) is to manually remove cloud-init's cache. Internally, this
behavior is referred to as ``trust``.

To configure which of these behaviors to use, cloud-init exposes the
``manual_cache_clean`` configuration option. When ``false`` (the default, for
the reasons discussed above), cloud-init will ``check`` and clean the cache if
the instance IDs do not match. When ``true``, cloud-init will ``trust`` the
existing cache (and therefore not clean it).
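
The effect of ``manual_cache_clean`` on first boot detection can be sketched
as below. Only the option name and the ``check``/``trust`` labels come from
this document; the function names and parameters are hypothetical and do not
reflect cloud-init's real implementation.

.. code-block:: python

    from typing import Optional


    def instance_id_policy(manual_cache_clean: bool) -> str:
        """Map the configuration option to the behavior name."""
        return "trust" if manual_cache_clean else "check"


    def treat_as_first_boot(
        cached_id: Optional[str],
        current_id: Optional[str],
        manual_cache_clean: bool = False,
    ) -> bool:
        """Decide whether per-instance configuration should run."""
        if cached_id is None:
            # No cache: unambiguously a first boot under either policy.
            return True
        if instance_id_policy(manual_cache_clean) == "trust":
            # The cached instance ID is trusted unconditionally, so a new
            # instance is never detected while the cache is present.
            return False
        # "check": a mismatch between cached and runtime IDs is a new instance.
        return cached_id != current_id

Under this sketch, ``trust`` only reports a first boot once the cache has been
removed, which matches the requirement to clean the cache manually.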

Manual Cache Cleaning
=====================

cloud-init ships a command for manually cleaning the cache: ``cloud-init
clean``. See :ref:`cli_clean`'s documentation for further details.

Reverting ``manual_cache_clean`` Setting
========================================

Currently there is no support for switching an instance that is launched with
``manual_cache_clean: true`` from ``trust`` behavior to ``check`` behavior,
other than manually cleaning the cache.

.. warning:: If you want to capture an instance that is currently in ``trust``
   mode as an image for launching other instances, you **must** manually clean
   the cache. If you do not do so, then instances launched from the captured
   image will all detect their first boot as a subsequent boot of the captured
   instance, and will not apply any per-instance configuration.

   This is a functional issue, but also a potential security one: cloud-init is
   responsible for rotating SSH host keys on first boot, and this will not
   happen on these instances.

.. [#not-present] It follows that if this cache is not present, cloud-init has
   not run on this system before, so this is unambiguously this instance's
   first boot.

.. [#problems] A couple of ways in which this strict reliance on the presence
   of a datasource has been observed to cause problems:

   * If a cloud's metadata service is flaky and cloud-init cannot obtain the
     instance ID locally on that platform, cloud-init's instance ID
     determination will sometimes fail to determine the current instance ID,
     which makes it impossible to determine if this is an instance's first or
     subsequent boot (`#1885527`_).
   * If cloud-init is used to provision a physical appliance or device and an
     attacker can present a datasource to the device with a different instance
     ID, then cloud-init's default behavior will detect this as an instance's
     first boot and reset the device using the attacker's configuration
     (this has been observed with the NoCloud datasource in `#1879530`_).

.. _#1885527: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1885527
.. _#1879530: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1879530

.. vi: textwidth=79