
Allow user control over update events #834

Merged (22 commits) on May 13, 2021

Conversation

TheRealFalcon
Member

@TheRealFalcon TheRealFalcon commented Mar 5, 2021

Proposed Commit Message

Allow user control over update events

Control is currently limited to boot events, though this should
allow us to more easily incorporate HOTPLUG support. Disabling
'instance-first-boot' is not supported as we apply networking config
too early in boot to have processed userdata (along with the fact
that this would be a pretty big foot-gun).

The concept of update events on a datasource has been split into
supported update events and default update events. Defaults will be
used if there are no user-defined update events, but user-defined
events won't be applied if they aren't supported.
When applying the networking config, we now check whether the event
is supported by the datasource as well as whether it is enabled.

Configuration looks like:
updates:
  network:
    when: ['boot']
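
The supported/default/user-defined split described above could be sketched roughly like this (`enabled_events` and the event names are illustrative, not the actual cloud-init API):

```python
from enum import Enum

class EventType(Enum):
    BOOT = "boot"
    BOOT_NEW_INSTANCE = "boot-new-instance"
    HOTPLUG = "hotplug"

def enabled_events(supported, default, user_defined):
    """Sketch: user-defined events replace the defaults, but events the
    datasource does not support are dropped."""
    requested = user_defined if user_defined is not None else default
    return {scope: events & supported.get(scope, set())
            for scope, events in requested.items()}

supported = {"network": {EventType.BOOT, EventType.BOOT_NEW_INSTANCE}}
default = {"network": {EventType.BOOT_NEW_INSTANCE}}

# No user config -> defaults apply.
assert enabled_events(supported, default, None) == default

# User enables BOOT; it is supported, so it is honored.
assert enabled_events(supported, default, {"network": {EventType.BOOT}}) \
    == {"network": {EventType.BOOT}}

# User requests an unsupported event; it is filtered out.
assert enabled_events(
    supported, default,
    {"network": {EventType.BOOT, EventType.HOTPLUG}},
) == {"network": {EventType.BOOT}}
```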

Additional Context

Most of the functionality here has been taken from the initial hotplug PR, though this PR is intentionally more limited in scope, and changes have been made as I saw fit.

Test Steps

WIP

@TheRealFalcon TheRealFalcon added the wip Work in progress, do not land label Mar 5, 2021
@TheRealFalcon TheRealFalcon force-pushed the events-user-data branch 2 times, most recently from 852ba9c to ec59faa on March 5, 2021 20:52
Comment on lines 764 to 769
if not any([
    self.datasource is NULL_DATA_SOURCE,
    self.is_new_instance(),  # or should it be self._is_first_boot(),
    (self.update_event_enabled(current_boot_type, scope='network')
     and self.datasource.update_metadata([current_boot_type]))
]):
Member Author

This section is where I've been banging my head against the wall the most. (With a few exceptions), apply_networking_config gets called twice during boot: first in LOCAL mode, and then again in NETWORK mode.

Prior to this PR, on first boot, if the datasource got initialized in LOCAL mode, we'd return at this point and not apply networking config in NETWORK mode (is_new_instance() returns False in network mode on first boot if there's a FILESYSTEM dep). On every subsequent boot, if the BOOT update event is enabled, we apply networking config twice (once for LOCAL, once for NETWORK).

That same general behavior should still be true, but applying networking config twice on every boot seems wrong to me. Since self.is_new_instance() will always return False on every subsequent boot, that can't be our determining factor. I had considered checking datasource dependencies, applying during LOCAL if there's a filesystem dependency and at NETWORK if there's a network dependency, but that could cause problems on first boot: even if we find no local datasource, we still generate the fallback config and use that.

Collaborator

@TheRealFalcon just caught up on mail so sorry for the delay.

I think we've had a bug on apply_networking_config running twice for the longest
time, as you've noticed. Generally this isn't an issue (other than being
wasteful), in that for datasources that are local and have network-config (say
OpenStack) we already get the network config we're going to use. In the case
that the DS is network-only (GCE), local never wrote a network config.

That said, I have wanted to apply it only once, and the rub has been that we've
not decided how to keep state between local and init recording that we've
already rendered network config. If we want to do so, we could use explicit
state on the cloud object, such that when we re-use the object at net time (we
load the object with existing=trust in stages.py:_restore_checked_cache())
we'll have something that can be used to determine whether we've already
applied it.

If we want to avoid attributes on the instance object, we can use ephemeral
files in /run/cloud-init; this won't have any state implications from boot to
boot but we should carefully think about the case of something re-running
cloud-init from the command line.

@TheRealFalcon
Member Author

@raharper , do you have any insight to my comment here?

Collaborator

@OddBloke OddBloke left a comment

First pass, familiarising myself with the overall approach here. Thanks for the work, both Ryan and James!

======

`Cloud-init`_ will fetch and apply cloud and user data configuration
upon serveral event types. The two most common events for cloud-init
Collaborator

Suggested change
upon serveral event types. The two most common events for cloud-init
upon several event types. The two most common events for cloud-init

Each Datasource will provide a set of events that it is capable of handling.
Datasources may not support all event types. In some cases a system
may be configured to allow a particular event but may be running on
a platform who's datasource cannot support the event.
Collaborator

Suggested change
a platform who's datasource cannot support the event.
a platform whose datasource cannot support the event.

Comment on lines 55 to 56
event occur. Currently there are two known scopes: ``network`` and
``storage``. Scopes are defined by convention but arbitrary values
Collaborator

Do we have a storage scope yet?

Comment on lines 39 to 42
Cloud-init has a default updates policy to handle new instance
events always. Vendors may want an instance to handle additional
events. Users have the final say and may provide update configuration
which can be used to enable or disable handling of specific events. The
exception is first boot. Configuration will always be applied at first
boot, and the user cannot disable this.
Collaborator

This paragraph is quite dense. Most readers won't care about vendors (or, honestly, even know what they mean in a cloud-init context), so that could be mentioned separately; it also feels like the first and last sentences are stating the same thing slightly differently.

========================

All :ref:`datasources` by default support the ``BOOT_NEW_INSTANCE`` event.
Each Datasource will provide a set of events that it is capable of handling.
Collaborator

Suggested change
Each Datasource will provide a set of events that it is capable of handling.
Each Datasource will declare a set of these events that it is capable of handling.

"provide a set of event" implies more agency for the datasource than I think we intend

cloudinit/stages.py (outdated; resolved)
if self.is_new_instance():
    current_boot_type = EventType.BOOT_NEW_INSTANCE

if not any([
Collaborator

Using any([...]) here means that we'll evaluate all of these conditions, ironically, unconditionally:

In [2]: def foo():
   ...:     print("foo")
   ...:     return True

In [3]: any([foo(), foo()])
foo
foo
Out[3]: True

rather than lazily:

In [6]: foo() or foo()
foo
Out[6]: True

This is particularly problematic here because it means there's a decent chance we will call self.datasource.update_metadata in cases where we previously wouldn't: that could cause us to re-fetch metadata in cases where it isn't necessary.

Comment on lines 760 to 762
current_boot_type = EventType.BOOT
if self.is_new_instance():
    current_boot_type = EventType.BOOT_NEW_INSTANCE
Collaborator

This feels like it should be a first-class attribute (presumably on self).

Member Author

I don't see any other method where this would currently be used. Can we take a YAGNI approach here and make it first class later if it's needed?

Collaborator

My fear is that people will just reimplement it (but probably slightly differently) instead of performing a refactor. I think it would also aid readability here, as it would be clear(er) that the current_boot_type is an invariant property of the Init stage we're running (in the same way that is_new_instance is an invariant property).

(But I certainly won't block landing on it.)

Comment on lines 777 to 779
else:
# refresh netcfg after update
netcfg, src = self._find_networking_config()
Collaborator

A nit, for sure, but as we unconditionally return in the if branch, we don't need the else:

Suggested change
else:
# refresh netcfg after update
netcfg, src = self._find_networking_config()
# refresh netcfg after update
netcfg, src = self._find_networking_config()

.. _events:

******************
Events and Updates
Collaborator

This doc could use some work to make it a little more accessible for cloud-init users.

@TheRealFalcon TheRealFalcon changed the title WIP: Allow user control over update events Allow user control over update events Apr 12, 2021
@TheRealFalcon TheRealFalcon removed the wip Work in progress, do not land label Apr 12, 2021
Collaborator

@OddBloke OddBloke left a comment

Thanks James, this is looking good! I have some suggestions to improve clarity in cloudinit.event as well as some thoughts on testing, but the core of this functionality LGTM: now I'm mostly concerned with making sure we're setting ourselves up well for the future, in terms of API choices.

tests/integration_tests/modules/test_user_events.py (3 outdated, resolved comments)
bpath = os.path.join(
    self.paths.get_cpath(),
    'sem',
    'config_scripts_per_once.once'
Collaborator

There will be a brief window of cloud-init execution where this file exists on a first boot: after scripts-per-once runs, but before the remaining modules complete. I don't think this is a problem that we need to address, but a comment or docstring here indicating that corner case would be good.

    'sem',
    'config_scripts_per_once.once'
)
first_boot = not os.path.exists(bpath)
Collaborator

This looks similar to

def has_run(self, name, freq):
    if not freq or freq == PER_ALWAYS:
        return False
    cname = canon_sem_name(name)
    sem_file = self._get_path(cname, freq)
    # This isn't really a good atomic check
    # but it suffices for where and when cloudinit runs
    if os.path.exists(sem_file):
        return True
    # this case could happen if the migrator module hadn't run yet
    # but the item had run before we did canon_sem_name.
    if cname != name and os.path.exists(self._get_path(name, freq)):
        LOG.warning("%s has run without canonicalized name [%s].\n"
                    "likely the migrator has not yet run. "
                    "It will run next boot.\n"
                    "run manually with: cloud-init single --name=migrator",
                    name, cname)
        return True
    return False
but missing some specific handling; should we be using that somehow instead?


# Default: generate network config on new instance id (first boot).
update_events = {'network': set([EventType.BOOT_NEW_INSTANCE])}
supported_update_events = {'network': set([
Collaborator

frozenset, maybe? (A nit, regardless.)

EventType.BOOT_NEW_INSTANCE,
EventType.BOOT,
])}
default_update_events = {'network': set([EventType.BOOT_NEW_INSTANCE])}
Collaborator

Should we use a constant for 'network' so that we don't have a "magic" string affecting behaviour? (We'll notice EventScope.NETWROK a lot sooner than 'netwrok' via linting, etc.)
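
The two suggestions above (a frozenset for the supported events, and an enum constant instead of the magic `'network'` string) could look something like this sketch; `EventScope`, `EventType`, and `SUPPORTED_UPDATE_EVENTS` here are illustrative stand-ins, not necessarily the exact definitions that landed:

```python
from enum import Enum

class EventScope(Enum):
    NETWORK = 'network'
    # A future 'storage' scope could be added here.

class EventType(Enum):
    BOOT = 'boot'
    BOOT_NEW_INSTANCE = 'boot-new-instance'

# frozenset: the declaration is immutable, so nothing can mutate the
# shared class attribute in place. Enum key: a typo is a loud
# AttributeError instead of a silently-never-matching string.
SUPPORTED_UPDATE_EVENTS = {
    EventScope.NETWORK: frozenset(
        [EventType.BOOT_NEW_INSTANCE, EventType.BOOT]
    )
}

assert EventType.BOOT in SUPPORTED_UPDATE_EVENTS[EventScope.NETWORK]
assert not hasattr(SUPPORTED_UPDATE_EVENTS[EventScope.NETWORK], 'add')

# A typo'd enum attribute is detectable (AttributeError if accessed);
# a typo'd string key just silently never matches:
assert getattr(EventScope, 'NETWROK', None) is None
assert 'netwrok' not in {'network': set()}  # no error, just wrong behavior
```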

Comment on lines 714 to 756
allowed = get_enabled_events(
    config_events, default_events
)

if not scope:
    scopes = [allowed.keys()]
else:
    scopes = [scope]
LOG.debug('Possible scopes for this event: %s', scopes)

for evt_scope in scopes:
    if event_source_type in allowed.get(evt_scope, []):
        LOG.debug('Event Allowed: scope=%s EventType=%s',
                  evt_scope, event_source_type)
        return True

LOG.debug('Event Denied: scopes=%s EventType=%s',
          scopes, event_source_type)
return False
Collaborator

None of this references self: perhaps a candidate to move into cloudinit.event as a function?
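
A sketch of what extracting that logic into a module-level function in cloudinit.event might look like (the name and signature here are hypothetical):

```python
import logging

LOG = logging.getLogger(__name__)

def update_event_enabled(allowed, event_source_type, scope=None):
    """Return True if event_source_type is enabled for any relevant scope.

    Pure function of its arguments, so it needs no `self`."""
    scopes = [scope] if scope else list(allowed.keys())
    for evt_scope in scopes:
        if event_source_type in allowed.get(evt_scope, set()):
            LOG.debug('Event Allowed: scope=%s EventType=%s',
                      evt_scope, event_source_type)
            return True
    LOG.debug('Event Denied: scopes=%s EventType=%s',
              scopes, event_source_type)
    return False

allowed = {'network': {'boot', 'boot-new-instance'}}
assert update_event_enabled(allowed, 'boot', scope='network')
assert not update_event_enabled(allowed, 'hotplug')
```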

self.init.apply_network_config(True)
self.init.distro.apply_network_config_names.assert_called_with(net_cfg)
self.init.distro.apply_network_config_names.assert_called_with(
Collaborator

Is this mock autospec'd?

self._apply_network_setup(m_macs)

self.init.datasource.supported_update_events = {
'network': [EventType.BOOT_NEW_INSTANCE]
Collaborator

Should this be a set instead of a list?

@TheRealFalcon
Member Author

@OddBloke Pushed a big-ish update. I (hopefully) simplified event.py a lot, and a lot of the other changes flow out from there.

Collaborator

@OddBloke OddBloke left a comment

Thanks James, this refactor has made it much easier to follow along! I think structurally we're pretty much there: I have some more specific comments/questions inline.

"""Classes and functions related to event handling."""

from enum import Enum
from typing import Dict, Set
Collaborator

This presents a compatibility break with Python 3.4, unfortunately. It does make the code much easier to follow, so perhaps switch to type comments without imports (which will cause mypy to complain, but we don't run it today and fixing these imports once we do will likely be one of the easier fixes we have to make)?
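
For illustration, a type comment carries the same intent without importing `typing` (the function and names here are hypothetical):

```python
# Instead of an annotation that needs `from typing import Dict, Set`
# (the typing module is stdlib only from Python 3.5 onward), a type
# comment documents the types without the import:

def get_update_events(cfg):
    # type: (dict) -> dict
    """Return the 'updates' section of a config dict (sketch)."""
    return cfg.get('updates', {})

update_events = {'network': set(['boot'])}  # type: dict

assert get_update_events({'updates': update_events}) == update_events
assert get_update_events({}) == {}
```

mypy would still complain about the bare `dict` comments lacking parameters, which matches the trade-off described above.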

Comment on lines 188 to 194
supported_update_events = {EventScope.NETWORK: set([
EventType.BOOT_NEW_INSTANCE,
EventType.BOOT,
])}
default_update_events = {EventScope.NETWORK: set(
[EventType.BOOT_NEW_INSTANCE
])}
Collaborator

The formatting here is inconsistent; we won't be adding to these too often, so I think it would be fine to do without a trailing comma:

    supported_update_events = {
        EventScope.NETWORK: set([EventType.BOOT_NEW_INSTANCE, EventType.BOOT])
    }
    default_update_events = {
        EventScope.NETWORK: set([EventType.BOOT_NEW_INSTANCE])
    }

(but that's a nit: the inconsistency is not 😄 )

@@ -206,10 +206,10 @@ def read_user_data_callback(mount_dir):

class DataSourceRbxCloud(sources.DataSource):
dsname = "RbxCloud"
update_events = {'network': [
default_update_events = {EventScope.NETWORK: set([
Collaborator

Did you consider using set literals for these definitions?

Member Author

It's a personal preference because I often think I'm seeing a dictionary when I see a set literal. I don't have a problem changing it though.

Comment on lines 67 to 68
self.original_supported_events = copy.deepcopy(
self.datasource.supported_update_events)
Collaborator

Do we need these lines in addition to the line in setUpClass?

Comment on lines 73 to 74
cls.original_supported_events = copy.deepcopy(
DataSource.supported_update_events)
Collaborator

Could we use mock.patch for this? Borderline nit, but I expect it will handle corner cases we aren't considering, for example.
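
A minimal sketch of the `mock.patch.object` approach for a class attribute (the `DataSource` stub here is illustrative):

```python
from unittest import mock

class DataSource:
    supported_update_events = {'network': {'boot-new-instance'}}

original = DataSource.supported_update_events

# mock.patch.object saves and restores the class attribute for us,
# even if the body raises, replacing the manual deepcopy/restore in
# setUpClass/tearDownClass:
with mock.patch.object(
        DataSource, 'supported_update_events',
        {'network': {'boot-new-instance', 'boot'}}):
    assert 'boot' in DataSource.supported_update_events['network']

# Restored automatically (same object, not a copy) on exit:
assert DataSource.supported_update_events is original
```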

LOG.debug('Allowed events: %s', allowed)

if not scope:
scopes = list(allowed.keys())
Collaborator

No need to change it, it's harmless, but is there a specific reason I'm missing for needing the list call here?

Member Author

Because of the "almost list" structures that dicts give us back.

>>> x = {1:2}
>>> [x.keys()]
[dict_keys([1])]
>>> list(x.keys())
[1]

Since I'm using it in a comprehension later, I need it to be a "real" list.

Collaborator

dict_keys is iterable:

In [22]: from collections.abc import Iterable
In [21]: isinstance({}.keys(), Iterable)
Out[21]: True

So is usable in comprehensions:

In [3]: from enum import Enum

In [4]: class Test(Enum):
   ...:     thing = "string"
In [8]: d = {Test("string"): "foo"}

In [9]: [x for x in d.keys()]
Out[9]: [<Test.thing: 'string'>]

In [10]: [x.value for x in d.keys()]
Out[10]: ['string']

It isn't a sequence:

In [22]: from collections.abc import Sequence
In [23]: isinstance({}.keys(), Sequence)
Out[23]: False

because it doesn't implement __getitem__ (so you can't subscript it, which is perhaps what you were thinking of).

Member Author

Actually...no 😄

The code I was copying from had a scopes = [allowed.keys()] and I misinterpreted why that wasn't working. I thought it was leaving it as a dict_keys, which is why I converted it to a proper list, when in actuality it was wrapping the dict_keys into another list.

All this to say...I'll drop the list(...)

return True

LOG.debug('Event Denied: scopes=%s EventType=%s',
[s.value for s in scopes], event_source_type)
Collaborator

These log messages look like they'll be a little inconsistent: the "Possible scopes" message doesn't call s.value on each scope before logging it; the "Event Allowed" one doesn't call it on evt.scope. Can we get these aligned?

Comment on lines 764 to 800
if not (
    self.datasource is NULL_DATA_SOURCE or
    self.boot_type == EventType.BOOT_NEW_INSTANCE or
    (
        self.update_event_enabled(
            self.boot_type, scope=EventScope.NETWORK
        ) and self.datasource.update_metadata([self.boot_type])
    )
):
Collaborator

I'm having a hard time following the exact logic here; could we decompose it into one or more variables, that'll make it a little easier to understand what we're checking?

Comment on lines 93 to 104
log = client.read_from_file('/var/log/cloud-init.log')
assert 'Applying network configuration' in log
assert 'dummy0' not in client.execute('ls /sys/class/net')

_add_dummy_bridge_to_netplan(client)
client.execute('rm /var/log/cloud-init.log')
client.restart()
log = client.read_from_file('/var/log/cloud-init.log')

assert 'Event Allowed: scope=network EventType=boot' in log
assert 'Applying network configuration' in log
assert 'dummy0' not in client.execute('ls /sys/class/net')
Collaborator

Worth refactoring this into a _test_network_config_applied_on_reboot which both this and the Azure test can call?

Comment on lines 3 to 10
This is currently limited to applying network config on BOOT events.
Since we reapply the same config on reboot, verification is limited to
checking log messages.

Because the supported/default events are datasource specific, the datasource
is being limited to LXD containers/vms (NoCloud).
For NoCloud, BOOT_NEW_INSTANCE and BOOT are supported, but only
BOOT_NEW_INSTANCE is default.
Collaborator

These tests look much better now, thank you! (But this docstring is now out-of-date. 🏃💨)

evt_scope.value, event_source_type)
return True

LOG.debug('Event Denied: scopes=%s EventType=%s',
Member Author

I'm still not happy with this. We attempt to apply networking config twice on every boot. Pre-this-PR behavior, if only BOOT_NEW_INSTANCE was enabled, we would apply exactly once on the first part of first boot, then deny on every subsequent attempt. If BOOT was enabled, we'd apply every single time (twice per boot, including first boot).

With this PR, on first boot we don't have access to userdata, so we fall back to the default update events of the datasource. If only BOOT_NEW_INSTANCE is enabled, we'll get an event denied message in the logs, even if BOOT is specified in the userdata. I'm concerned that an event denied message in the logs will be confusing when the event is enabled in the userdata.

I could add a check for first boot here and slightly alter the log message. It could be something like "Event Denied from data source...", but that also seems like overkill checking every boot for a not-much-more-helpful log message.

Any thoughts for improvement?

Collaborator

Are there cases when we would only successfully apply networking configuration on the second attempt during first boot? If so, then does it follow that BOOT_NEW_INSTANCE should be the toggle for both of those runs? And, therefore, should we actually be using the check for first boot for more than just a log message?

Collaborator

It also occurs to me that we already have language for these occasions, from our script execution modules: once, instance, and boot. I'm not suggesting we change the language we're using here ("instance" is obviously ~contained within "BOOT_NEW_INSTANCE"), but it does make me wonder if we should be following those modules and tracking whether we have applied network configuration for this instance, rather than trying to infer what we should be doing based on the type of a boot we think we have.

If we haven't yet configured networking, BOOT_NEW_INSTANCE or BOOT will cause it to be applied; if we have, then only BOOT will.

What do you think?
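
A rough sketch of that "track what we've done" idea (paths and function names hypothetical): a per-instance marker records that network config was applied, so BOOT_NEW_INSTANCE or BOOT applies it when the marker is absent, and only BOOT applies it afterwards.

```python
import os
import tempfile

def network_config_applied(instance_id, run_dir):
    """Has network config already been applied for this instance? (sketch)"""
    marker = os.path.join(run_dir, 'network-applied-%s' % instance_id)
    return os.path.exists(marker)

def mark_network_config_applied(instance_id, run_dir):
    """Record that network config was applied for this instance."""
    marker = os.path.join(run_dir, 'network-applied-%s' % instance_id)
    with open(marker, 'w'):
        pass

run_dir = tempfile.mkdtemp()  # stand-in for /run/cloud-init
assert not network_config_applied('i-1234', run_dir)
mark_network_config_applied('i-1234', run_dir)
assert network_config_applied('i-1234', run_dir)
# A new instance-id has no marker, so first-boot application still happens:
assert not network_config_applied('i-5678', run_dir)
```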

Member Author

Are there cases when we would only successfully apply networking configuration on the second attempt during first boot?

Yes, I've been assuming that in my explanations. We will be in the BOOT_NEW_INSTANCE state until the datasource has been initialized. It's entirely possible that the first attempt results in "wait till networking is up", so the second attempt is still BOOT_NEW_INSTANCE. We always attempt to apply networking on BOOT_NEW_INSTANCE.

If so, then does it follow that BOOT_NEW_INSTANCE should be the toggle for both of those runs? And, therefore, should we actually be using the check for first boot for more than just a log message?

I'm not sure I follow what "should be the toggle" means. But to the second question, I think the answer is no. First boot is really only relevant because we don't have userdata available yet.

but it does make me wonder if we should be following those modules and tracking whether we have applied network configuration for this instance, rather than trying to infer what we should be doing based on the type of a boot we think we have.
If we haven't yet configured networking, BOOT_NEW_INSTANCE or BOOT will cause it to be applied; if we have, then only BOOT will.

This is essentially what I'm already doing. Or am I misunderstanding what you're saying?

Sorry if I didn't make it clear, but the case I'm concerned about is when networking is applied on the first attempt and userdata specifies BOOT (which we can't access on first boot): what happens on the second attempt?

CURRENT BEHAVIOR:

master when datasource specifies BOOT_NEW_INSTANCE:
first boot, one: apply
first boot, two: skip
subsequent boot, one: skip
subsequent boot, two: skip

master when datasource specifies BOOT:
first boot, one: apply
first boot, two: apply
subsequent boot, one: apply
subsequent boot, two: apply

The above should be exactly the same on this branch when userdata specifies BOOT_NEW_INSTANCE. However, when userdata specifies BOOT:

BRANCH BEHAVIOR:

branch with userdata BOOT:
first boot, one: apply
first boot, two: apply or skip based on datasource default because we don't have userdata
subsequent boot, one: apply
subsequent boot, two: apply

(These scenarios are assuming networking gets applied on the first attempt)

If on "first boot, two" we skip because of the datasource default, we get a big ugly "Event Denied" log message that will likely confuse somebody who doesn't know their userdata won't take effect until the next boot.

Changing this to instead apply on both attempts isn't difficult, but it would also apply in the BOOT_NEW_INSTANCE case, which is a change in current behavior. This could be important for backwards compatibility, in case we specifically DON'T want network config applied more than once when BOOT_NEW_INSTANCE is specified.

Does my reasoning here make sense? If so, do you think it makes sense to modify the log message on first boot for the case I mentioned?

@TheRealFalcon
Member Author

@OddBloke should be ready for another review

else:
# refresh netcfg after update
netcfg, src = self._find_networking_config()
def _boot_event_enabled_and_supported():
Collaborator

Thanks for the update here James, it's much easier to follow what the intent is here!

def _boot_event_enabled_and_supported():
    return self.update_event_enabled(
        EventType.BOOT, scope=EventScope.NETWORK
    ) and self.datasource.update_metadata([EventType.BOOT])
Collaborator

This call still throws me off: this isn't just checking for support, but will actually cause the datasource to discard cached data and fetch it anew. That feels to me like it should be two separate calls: "does this datasource support $event_type" and "update the datasource's metadata for $event_type".

It also feels a little odd that update_metadata doesn't take a scope: we only want to update network metadata here, if (e.g.) storage metadata has changed then we don't particularly want to update it.

Relatedly, it may look like update_metadata should just update network metadata (or, generally, metadata for the explicitly configured scopes), but our _get_data implementations generally fetch and store All The Metadata, so I think this will completely refresh all metadata regardless: perhaps not a problem, but worth noting.
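
A sketch of how the two concerns might be separated (method names hypothetical; not the actual DataSource API):

```python
class DataSource:
    """Hypothetical split of the two concerns mixed into update_metadata:
    a side-effect-free capability check, and an explicit refresh."""

    supported_update_events = {'network': {'boot', 'boot-new-instance'}}

    def __init__(self):
        self._metadata = None

    def supports_update_event(self, scope, event_type):
        # Pure query: nothing is discarded, nothing is re-fetched.
        return event_type in self.supported_update_events.get(scope, set())

    def update_metadata(self, event_types):
        # The actual refresh: discards cached data and fetches anew
        # (stubbed here to show the side effect).
        self._metadata = {'fetched_for': list(event_types)}
        return True

ds = DataSource()
assert ds.supports_update_event('network', 'boot')
assert not ds.supports_update_event('network', 'hotplug')
assert ds._metadata is None          # the capability check fetched nothing
ds.update_metadata(['boot'])
assert ds._metadata == {'fetched_for': ['boot']}
```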

@TheRealFalcon
Member Author

@OddBloke most recent commit uses the semaphore as we discussed. Still needs a ton of test changes though.

Comment on lines 17 to 18
- **BOOT_NEW_INSTANCE**: ``New instance first boot``
- **BOOT**: ``Any system boot other than 'BOOT_NEW_INSTANCE'``
Collaborator

We need to update this with the legacy event.

Collaborator

@OddBloke OddBloke left a comment

This LGTM now, thanks for all the work, James! I've confirmed that a full LXD container run and a full Azure run both succeed locally.

@TheRealFalcon TheRealFalcon merged commit 8643469 into canonical:master May 13, 2021
@TheRealFalcon TheRealFalcon deleted the events-user-data branch May 13, 2021 17:55
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this pull request Sep 2, 2021
In canonical#834, we refactored the handling of events for fetching new metadata.
Previously, in Azure's __init__, the BOOT event was added to the
update_events, so it was assumed that Azure required the standard BOOT
behavior, which is to apply metadata twice every boot: once during
local-init, then again during standard init phase.
https://github.com/canonical/cloud-init/blob/21.2/cloudinit/sources/DataSourceAzure.py#L356

However, this line was effectively meaningless. After the metadata was
fetched in local-init, it was then pickled out to disk. Because
"update_events" was a class variable, the EventType.BOOT was not
persisted into the pickle. When the pickle was then unpickled in the
init phase, metadata did not get re-fetched because EventType.BOOT was
not present, so Azure effectively only supports BOOT_NEW_INSTANCE.

Fetching metadata twice during boot causes some issues for
pre-provisioning on Azure, because updating metadata during
re-provisioning will cause cloud-init to poll for reprovisiondata again
in DataSourceAzure, which will infinitely return 404 (reprovisiondata
is deleted from IMDS after the health signal is sent by cloud-init during
init-local). This leaves cloud-init stuck in 'init'.
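The class-attribute pitfall described in this commit message can be reproduced with a small sketch (hypothetical class names): pickle only serializes an instance's `__dict__`, so state that lives on the class never makes it to disk, and unpickling does not re-run `__init__`:

```python
import pickle

class DataSource:
    # class-level attribute, shared by all instances (like update_events)
    update_events = frozenset({"BOOT_NEW_INSTANCE"})

class AzureLike(DataSource):
    def __init__(self):
        # modifies the *class*, not the instance's __dict__
        AzureLike.update_events = AzureLike.update_events | {"BOOT"}

ds = AzureLike()
assert "BOOT" in ds.update_events
blob = pickle.dumps(ds)  # pickles only the (empty) instance __dict__

# Simulate the next boot stage: a fresh interpreter re-imports the class
# definition, so the class attribute is back to its default, and
# unpickling never calls __init__.
AzureLike.update_events = frozenset({"BOOT_NEW_INSTANCE"})
restored = pickle.loads(blob)
print("BOOT" in restored.update_events)  # False: BOOT was never persisted
```

The reassignment of the class attribute before `pickle.loads` stands in for the fresh process that runs the init stage; in a single long-lived process the mutation would still be visible, which is why the bug only shows up across boot stages.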
TheRealFalcon added a commit that referenced this pull request Sep 3, 2021