feat: Warn user of unexpected run mode #5209
On systemd, services are started by PID 1. When this doesn't happen, cloud-init is in an unknown run state and should warn the user. Reorder pid log to be able to reuse Distro information. Add docstring deprecating util.is_Linux().
Just checking some boxes:
- ✔ Alpine (openrc-based) will invoke cloud-init boot stages from non-1 PPID, but we are ignoring non-systemd cases
- ✔ non-Linux (and therefore non-systemd) is ignored
Here's a suggested diff to ensure the logs are actually written and that the log message has a bit more data about the launched boot stage. It doesn't resolve either the lack of PID logging on non-Linux/non-systemd or the dropping of util.is_Linux.
diff --git a/cloudinit/cmd/main.py b/cloudinit/cmd/main.py
index ad9910e04..aff4d380f 100644
--- a/cloudinit/cmd/main.py
+++ b/cloudinit/cmd/main.py
@@ -72,15 +72,20 @@ def print_exc(msg=""):
sys.stderr.write("\n")
-def log_ppid(distro):
+def log_ppid(distro, bootstage_name):
if distro.is_linux:
ppid = os.getppid()
log = LOG.info
extra_message = ""
if 1 != ppid and distro.uses_systemd():
log = LOG.warning
- extra_message = ("Not a supported configuration.",)
- log("PID [%s] started cloud-init. %s", ppid, extra_message)
+ extra_message = " Unsupported configuration: boot stage called outside of systemd"
+ log(
+ "PID [%s] started cloud-init '%s'.%s",
+ ppid,
+ bootstage_name,
+ extra_message,
+ )
def welcome(action, msg=None):
@@ -321,14 +326,11 @@ def main_init(name, args):
# objects config as it may be different from init object
# 10. Run the modules for the 'init' stage
# 11. Done!
- if not args.local:
- w_msg = welcome_format(name)
- else:
- w_msg = welcome_format("%s-local" % (name))
+ bootstage_name = "%s-local" % (name) if args.local else name
+ w_msg = welcome_format(bootstage_name)
init = stages.Init(ds_deps=deps, reporter=args.reporter)
# Stage 1
init.read_cfg(extract_fns(args))
- log_ppid(init.distro)
# Stage 2
outfmt = None
errfmt = None
@@ -354,6 +356,7 @@ def main_init(name, args):
# config applied. We send the welcome message now, as stderr/out have
# been redirected and log now configured.
welcome(name, msg=w_msg)
+ log_ppid(init.distro, bootstage_name)
# re-play early log messages before logging was setup
for lvl, msg in early_logs:
@@ -581,11 +584,11 @@ def main_modules(action_name, args):
# the modules objects configuration
# 5. Run the modules for the given stage name
# 6. Done!
- w_msg = welcome_format("%s:%s" % (action_name, name))
+ bootstage_name = "%s:%s" % (action_name, name)
+ w_msg = welcome_format(bootstage_name)
init = stages.Init(ds_deps=[], reporter=args.reporter)
# Stage 1
init.read_cfg(extract_fns(args))
- log_ppid(init.distro)
# Stage 2
try:
init.fetch(existing="trust")
@@ -620,6 +623,7 @@ def main_modules(action_name, args):
# now that logging is setup and stdout redirected, send welcome
welcome(name, msg=w_msg)
+ log_ppid(init.distro, bootstage_name)
if name == "init":
util.deprecate(
cloudinit/cmd/main.py
Outdated
@@ -324,6 +328,7 @@ def main_init(name, args):
init = stages.Init(ds_deps=deps, reporter=args.reporter)
# Stage 1
init.read_cfg(extract_fns(args))
log_ppid(init.distro)
This cannot happen until after log.setup_logging has been called.
cloudinit/cmd/main.py
Outdated
@@ -580,6 +585,7 @@ def main_modules(action_name, args):
init = stages.Init(ds_deps=[], reporter=args.reporter)
# Stage 1
init.read_cfg(extract_fns(args))
log_ppid(init.distro)
Need to move this down a bit too, after log.setup_logging, otherwise no logs are written to /var/log/cloud-init.log. To retain original behavior, we may want to log this just after the welcome(... w_msg) to log it during the boot stage block.
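The ordering problem can be reproduced with the standard library alone. The following is a minimal sketch, not cloud-init code: `setup_logging` here is a hypothetical stand-in for log.setup_logging that just attaches a file handler, and the logger name and temporary file path are made up for the demo.

```python
import logging
import os
import tempfile

LOG = logging.getLogger("ppid_demo")
LOG.setLevel(logging.INFO)
LOG.propagate = False  # keep the demo independent of the root logger


def setup_logging(path):
    """Hypothetical stand-in for log.setup_logging: attach a file handler."""
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    LOG.addHandler(handler)
    return handler


def run_demo():
    """Log once before and once after setup, then return the file content."""
    log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    LOG.info("emitted before setup")  # no handler attached yet: never reaches the file
    handler = setup_logging(log_path)
    LOG.info("emitted after setup")  # handler attached: written to the file
    handler.close()
    LOG.removeHandler(handler)
    with open(log_path) as stream:
        return stream.read()
```

Only the second message ends up in the file, which is why the suggested diff calls log_ppid after welcome(), once logging is configured.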
cloudinit/cmd/main.py
Outdated
extra_message = ""
if 1 != ppid and distro.uses_systemd():
log = LOG.warning
extra_message = ("Not a supported configuration.",)
Let's add details in the extra message to tell us what's unsupported about it.
Unsupported configuration: boot stage called outside of systemd
@@ -490,6 +490,12 @@ def multi_log(
@lru_cache()
def is_Linux():
"""deprecated: prefer Distro object's `is_linux` property
We could just drop this function altogether if we can figure out a better approach to sorting in cloud-init analyze dump, as that's the only other use case, right?
yep!
cloudinit/cmd/main.py
Outdated
if 1 != ppid and distro.uses_systemd():
log = LOG.warning
extra_message = ("Not a supported configuration.",)
log("PID [%s] started cloud-init. %s", ppid, extra_message)
Are we intentionally not logging the PID when not distro.is_linux?
That's what the code did before, but I suppose we should be able to log PPID on non-linux too. I'm curious whether the PPID=1 should hold true on any of the BSDs, and it wouldn't hurt to log it.
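One way this could look is sketched below: the parent PID is logged on every platform, and the level is raised to warning only for the Linux plus systemd case. This is an illustration under assumptions, not the merged change. `FakeDistro` is a hypothetical stand-in for the Distro object, and `ppid_log_call` is a made-up helper that returns the level and message so the decision can be checked without a live logger.

```python
import logging
import os
from dataclasses import dataclass

LOG = logging.getLogger(__name__)


@dataclass
class FakeDistro:
    """Hypothetical stand-in for cloud-init's Distro object."""

    is_linux: bool
    systemd: bool

    def uses_systemd(self):
        return self.systemd


def ppid_log_call(distro, bootstage_name, ppid=None):
    """Return (level, message) describing which process started the stage."""
    if ppid is None:
        ppid = os.getppid()
    level = logging.INFO
    extra_message = ""
    # Only systemd-based Linux is expected to launch boot stages from PID 1.
    if distro.is_linux and distro.uses_systemd() and ppid != 1:
        level = logging.WARNING
        extra_message = (
            " Unsupported configuration: boot stage called outside of systemd"
        )
    message = "PID [%s] started cloud-init '%s'.%s" % (
        ppid,
        bootstage_name,
        extra_message,
    )
    return level, message


def log_ppid(distro, bootstage_name):
    """Log the parent PID on every platform, warning only where it matters."""
    level, message = ppid_log_call(distro, bootstage_name)
    LOG.log(level, message)
```

With this shape, a BSD or openrc system still records which PID launched the stage at info level, which would help answer the PPID=1 question without changing behavior there.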
Code looks good and behaves well. One thing I think we need to sort here is integration tests which call cloud-init --local over SSH as they'll now exit non-zero due to the warning.
Tests affected are at least the following, which both check response.ok from a cloud-init init --local call.
- tests/integration_tests/datasources/test_ec2_ipv6.py
- tests/integration_tests/test_upgrade.py
We might be able to adapt those tests to generally disregard exit 2 or maybe specifically call verify_clean_boot with a non-empty ignore_warnings item to ignore this expected warning log.
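The first option (disregarding exit 2) could be sketched roughly as follows. The names are assumptions for illustration: `FakeResult` is a hypothetical stand-in for whatever object the integration test client returns from a command run over SSH, and `succeeded_ignoring_warnings` is not an existing helper in the test suite.

```python
from dataclasses import dataclass

# Per the discussion above: the exit status seen when warnings were logged.
WARNINGS_EXIT_CODE = 2


@dataclass
class FakeResult:
    """Hypothetical stand-in for the result of a command run over SSH."""

    return_code: int
    stdout: str = ""

    @property
    def ok(self):
        # Mirrors a strict check: only exit status 0 counts as success.
        return self.return_code == 0


def succeeded_ignoring_warnings(result):
    """Treat exit 0 and the warnings-only exit status as success."""
    return result.return_code in (0, WARNINGS_EXIT_CODE)
```

A test that currently asserts on response.ok would assert on this helper instead, at the cost of also hiding any other warning that produces the same exit status.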
Because we patched out the We would see an
yep and yep. I think it's worth a followup quilt patch in ubuntu/noble:debian/patches/retain-ppid-unsupported-config-as-info-level.patch to ensure we don't set warning level on the logs in noble.
We could always just make this a deprecation log and deal with it in the same way that we choose to deal with other new deprecations on older series. A patch per log like this seems really expensive to maintain in the long run.
That is a much better idea. Good suggestion, let's go with deprecation.
So on stable releases, we'll patch the warning to a deprecation. The only thing left to land this PR is the integration test fixes per this comment.
I think this is just waiting on two integration test fixes. But I'd like this in 24.3, as we are introducing a "breaking change" in that warnings are now emitted where there were none beforehand, and I'd like us to avoid trying to quickly stuff quilt patches into /noble for this across the SRU boundary for 24.2.
Resolving this involves either:
I don't think that we want 1) (it involves muddying
In order to make I'll share what I've been working on shortly.
Implement verify_clean_boot() to ignore certain expected logs in a platform-specific way.
@blackboxsw Last commit contains what I had in mind. With this commit I can successfully run
And it passes. Thoughts?
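The idea of ignoring expected logs per platform could be sketched along these lines. This is not the commit's implementation of verify_clean_boot(): the mapping, the platform name "example_platform", the function name, and the sample log lines in the usage note are all hypothetical.

```python
import re

# Hypothetical mapping: platform name -> regexes for warnings expected there.
EXPECTED_WARNINGS = {
    "example_platform": [
        r"Unsupported configuration: boot stage called outside of systemd",
    ],
}


def unexpected_warnings(log_text, platform, expected=EXPECTED_WARNINGS):
    """Return WARNING lines that are not expected on the given platform."""
    patterns = [re.compile(p) for p in expected.get(platform, [])]
    return [
        line
        for line in log_text.splitlines()
        if "WARNING" in line
        and not any(pattern.search(line) for pattern in patterns)
    ]
```

A test would then assert that the returned list is empty. For a made-up log containing the PPID warning plus one other WARNING line, only the other line is returned on "example_platform", and both are returned on a platform with no entry in the mapping.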
Proposed Commit Message
Merge type