Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only invoke hotplug socket when functionality is enabled (SC-251) #952

Merged
merged 8 commits into from Aug 13, 2021

Conversation

TheRealFalcon
Copy link
Member

Proposed Commit Message

Only invoke hotplug socket when functionality is enabled

Alter's hotplug hook to have a query mechanism checking if the
functionality is enabled. This allows us to avoid using the hotplug
socket and service when hotplug is disabled.

`cloud-init devel hotplug-hook -s net query` will return 'enabled' or
'disabled' depending on hotplug availability.

Additional Context

n/a

Test Steps

see integration test

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@github-actions
Copy link

github-actions bot commented Aug 5, 2021

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Aug 5, 2021
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Aug 5, 2021
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. minor changes for CLI usability.

udevaction=args.udevaction,
)
if args.hotplug_action == 'query':
hotplug_init.fetch(existing="trust")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we run this before the datasource is detected, we trace here:

cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: (DataSourceNone)

We might want to catch that case and at least report "unknown" or "no-datasource" etc.

Suggested change
hotplug_init.fetch(existing="trust")
try:
hotplug_init.fetch(existing="trust")
except cloudinit.sources.DataSourceNotFoundException:
print("Unable to determine hotplug state. No datasource detected")
sys.exit(1)

Also is it worth just baking this hotplug_init.fetch(existing="trust") into is_enabled() function because I see this same potential problem in handle_hotplug too?

I realize generally this cmdline shouldn't normally get run until after the datasource detection has occurred, just a race there until that discovery happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our udev hook should disable any hotplug commands until cloud-init has finished its initial run. How did you get this exception to happen? I don't necessarily disagree with your suggestion, just trying to understand.

Edit: Nevermind...hadn't fully swapped this code back into my head 😄 . I committed something very similar to your suggestion.

cloudinit/cmd/devel/hotplug_hook.py Show resolved Hide resolved
@@ -133,6 +149,14 @@ def device_detected(self) -> bool:
}


def is_enabled(hotplug_init, subsystem):
scope = SUBSYSTEM_PROPERTES_MAP[subsystem][1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run try/except hotplug_init.fetch(existing="trust") here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want it here. I applied the try/except to the other place you suggested, but I don't think we always want to print and exit 1 when we get that exception. E.g., on line 174, if we have a problem there, I want it written the logs. I'm ok with the exception just falling through at that point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on avoiding the this suggestion iside is_enabled. I think there is another gap that we are missing below though with cloud-init devel handle calls.

TheRealFalcon and others added 4 commits August 10, 2021 15:09
Alter's hotplug hook to have a query mechanism checking if the
functionality is enabled. This allows us to avoid using the hotplug
socket and service when hotplug is disabled.
Co-authored-by: Chad Smith <chad.smith@canonical.com>
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the minor nit about the subparser title I think there are two other things:

UeventHandler.update_metadata raises a RuntimeError when a datasource is not configured to with a matching self.supported_update_event type for the subsystem and or hotplug. This configuration is static, so retrying on this RuntimeError doesn't make sense.

I'd prefer we just try handler.update_metadata and if fail we break the retry look in handle_hotplug.

Example, trying to provide the following in a lxc container:/etc/cloud/cloud.cfg

updates:
    network:
       when: ['boot', 'hotplug']

The result is blocking repeated RuntimeErrors logged with

RuntimeError: Datasource DataSourceNoCloud [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net] not updated for event hotplug

I think instead we probably want to break the retry loop with a more specific failure mode if someone tries to enable hotplug on an unsupported datasource.

Here's a diff to ponder that fixes up a couple things I've suggested

diff --git a/cloudinit/cmd/devel/hotplug_hook.py b/cloudinit/cmd/devel/hotplug_hook.py
index 5c85c623a..3e1ffeecc 100644
--- a/cloudinit/cmd/devel/hotplug_hook.py
+++ b/cloudinit/cmd/devel/hotplug_hook.py
@@ -38,7 +38,9 @@ def get_parser(parser=None):
         choices=['net']
     )
 
-    subparsers = parser.add_subparsers(dest='hotplug_action')
+    subparsers = parser.add_subparsers(
+        title="Hotplug Action", dest='hotplug_action'
+    )
     subparsers.required = True
 
     subparsers.add_parser(
@@ -100,14 +102,8 @@ class UeventHandler(abc.ABC):
         return self.success_fn()
 
     def update_metadata(self):
-        result = self.datasource.update_metadata_if_supported([
         return self.success_fn()
 
     def update_metadata(self):
-        result = self.datasource.update_metadata_if_supported([
+        return self.datasource.update_metadata_if_supported([
             EventType.HOTPLUG])
-        if not result:
-            raise RuntimeError(
-                'Datasource %s not updated for '
-                'event %s' % (self.datasource, EventType.HOTPLUG)
-            )
-        return result
 
 
 class NetHandler(UeventHandler):
@@ -194,7 +190,12 @@ def handle_hotplug(
         )
         try:
             LOG.debug('Refreshing metadata')
-            event_handler.update_metadata()
+            if not event_handler.update_metadata():
+                LOG.debug(
+                    "Datasource%s does not support hotplug events for"
+                    " subsystem %s", event_handler.datasource.dsname, subsystem
+                )
+                break  # Don't retry as supported_update_events are hard-coded
             LOG.debug('Detecting device in updated metadata')
             event_handler.detect_hotplugged_device()
             LOG.debug('Applying config change')

@@ -133,6 +149,14 @@ def device_detected(self) -> bool:
}


def is_enabled(hotplug_init, subsystem):
scope = SUBSYSTEM_PROPERTES_MAP[subsystem][1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on avoiding the this suggestion iside is_enabled. I think there is another gap that we are missing below though with cloud-init devel handle calls.

choices=['net']
)

subparsers = parser.add_subparsers(dest='hotplug_action')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve title so it matches error message if absent param on cmdline

Suggested change
subparsers = parser.add_subparsers(dest='hotplug_action')
subparsers = parser.add_subparsers(title="Hotplug Action", dest='hotplug_action')

This changes --help from positional arguments: to Hotplug Action:

We could just make title="Command", dest=command too as lxc & ua-tools do

@TheRealFalcon
Copy link
Member Author

TheRealFalcon commented Aug 13, 2021

@blackboxsw , thanks, I definitely missed a condition there.

I think instead we probably want to break the retry loop with a more specific failure mode if someone tries to enable hotplug on an unsupported datasource.

I updated the code so that before we attempt to do any metadata update, we first fetch the datasource and ensure it is supported and enabled. I didn't do that for the query previously.

I still haven't tested yet (so sorry if there's major dumb errors), but let me know if this idea works for you.

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent @TheRealFalcon . tested on lxc and openstack, cli docs are good and behavior is expected

2021-08-13 20:11:27,422 - handlers.py[DEBUG]: finish: hotplug-hook: SUCCESS: Handle reconfiguration on hotplug events

:)

@TheRealFalcon TheRealFalcon merged commit 6560740 into canonical:main Aug 13, 2021
@TheRealFalcon TheRealFalcon deleted the hotplug-query branch August 13, 2021 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants