-
Notifications
You must be signed in to change notification settings - Fork 818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug(status): retry systemctl show to avoid dbus disconnect #4679
bug(status): retry systemctl show to avoid dbus disconnect #4679
Conversation
When cloud-init encounters user-data and vendor-data that does not contain a valid header, cloud-init logs a warning and creates an empty processed user-data or vendor-data file at /var/lib/cloud/instane/cloud-config.txt or vendor-cloud-config.txt. In cases where invalid user-data or vendor-data was provided, prefer to process the raw data files to alert the end-user to the specific failure in parsing the raw data provided. This approach improves error reporting from the CLI when user-data, vendor-data or vendor2-data contain undeclared or invalid headers. Add a utility function get_processed_or_fallback_path which will return the preferred data path processed or raw fallback file depending on whether the processed file or the raw data file has content. Two other minor changes: - Align exit 1 behavior for cloud-init schema --annotate command with and without --annotate parameter - refactor parts of handle_schema_args into simpler helper functions, _assert_exclusive_args and get_config_paths_from_args
Move get_processed_or_fallback_path out of helpers and into schema.py. Add type hinting and fixup docstring.
e091fcc
to
b4f683b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leaves users that want a quick response to cloud-init status
waiting for dbus. What if we did something different:
In this case, obviously cloud-init isn't in "done"
or "degraded done"
state yet, and it can't be in "not run"
or "disabled"
either, given the check before the call to _get_systemd_status()
so that leaves either "error"
or "running"
or "degraded running"
. Since this function only returns "running"
or "error"
("degraded"
gets assigned later), and the absence of dbus means that we cannot introspect to discover an "error"
, I think that it would be reasonable for this function to return "running"
when this happens, because to the best of our knowledge, it is running. If cloud-init is called with --wait
, it will just loop anyways. In the case that cloud-init status
was called without --wait
, the user requested an immediate status, not to wait for something to get a status. We should report the status, to the best of our knowledge.
Also, looking at the call site of _get_systemd_status()
, I see no reason that this function should be called if we have already seen that status is set to UXAppStatus.ERROR
. The presence of an error cannot be proven to be "not an error" by systemd introspection. We already saw that there are errors, so in that case this should just fall through.
error becomes "not an error" when cloud-init is still running. The reason for that code was that |
Closing this approach. As Brett brought up, retrying in all cases is not ideal if we are just looking for a quick status on cloud-init without blocking on more |
Proposed Commit Message
Additional Context
Test Steps
Checklist
Merge type