Fix last reported event possibly not being sent #1796
The run of every cloud-init mode is wrapped in a reporting context manager. The final flush of events before the process exits was happening inside the context manager, but one more event is sent when the context manager exits. Because that event was not covered by the flush, cloud-init could exit before it was sent. This commit fixes the issue and also adds logging of POST data when POSTing to a URL.
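The ordering problem described above can be illustrated with a minimal sketch. This is not cloud-init's actual reporting code: `AsyncHandler`, `report_stage`, and `run_stage` are hypothetical names invented for illustration, and the queue-backed worker thread is only an assumption about how an asynchronous handler might behave. The point it shows is that the flush has to happen after the `with` block ends, so that the "finish" event emitted on exit is also waited for.

```python
import contextlib
import queue
import threading


class AsyncHandler:
    """Hypothetical handler that sends events from a background thread."""

    def __init__(self):
        self.sent = []
        self._queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            event = self._queue.get()
            self.sent.append(event)  # stands in for an HTTP POST
            self._queue.task_done()

    def publish(self, event):
        self._queue.put(event)

    def flush(self):
        # Block until every queued event has been processed.
        self._queue.join()


@contextlib.contextmanager
def report_stage(handler, name):
    """Hypothetical reporting context: emits start on entry, finish on exit."""
    handler.publish((name, "start"))
    try:
        yield
    finally:
        handler.publish((name, "finish"))


def run_stage(handler, name):
    with report_stage(handler, name):
        pass  # the stage's real work would run here
    # Flushing here, after the context manager has exited, also covers the
    # "finish" event. Flushing inside the with block would miss it.
    handler.flush()


handler = AsyncHandler()
run_stage(handler, "modules-final")
```

If the flush were moved inside the `with` block, the "finish" event would be queued after the flush returned, and a process that exits immediately afterwards could drop it.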
cloudinit/url_helper.py
i,
"infinite" if infinite else manual_tries,
url,
filtered_req_args,
f" and POST data: {data}" if data else "",
We may want to scrub this POST data to redact secrets. This could be a security problem, as we'd log all POST data passed to this common function to /var/log/cloud-init.log, which is world-readable.
Two known areas that may cause problems, or lead to sensitive data leaks:
Azure and GCP both send data in their requests.
cloudinit/sources/helpers/azure.py: http_with_retries
cloudinit/sources/DataSourceGCE.py:_write_host_key_to_guest_attributes maybe?
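One way the scrubbing suggested above could look is sketched below. It is an illustration only, not the approach the PR ended up taking: `SENSITIVE_KEYS`, `redact`, and `loggable` are hypothetical names, and the list of sensitive keys is an assumption. Structured payloads have matching keys masked, while opaque string or bytes payloads are not logged at all, since their contents cannot be inspected safely.

```python
import logging

LOG = logging.getLogger(__name__)

REDACTED = "REDACTED"
# Assumption: an illustrative set of key names to mask, not a real list.
SENSITIVE_KEYS = frozenset({"password", "token", "secret", "hostkey"})


def redact(value):
    """Return a copy of value with sensitive dict entries masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED
            if str(key).lower() in SENSITIVE_KEYS
            else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [redact(item) for item in value]
    return value


def loggable(data):
    """Return a representation of POST data that is safe to log."""
    if isinstance(data, (bytes, str)):
        # Opaque payloads cannot be inspected, so log only their size.
        return f"<opaque payload of length {len(data)} not logged>"
    return redact(data)


def log_post(url, data):
    LOG.debug(
        "POST to %s%s", url, f" with data: {loggable(data)}" if data else ""
    )
```

A key-based filter like this only catches secrets stored under known names, which is one reason logging less data at the call site can be the safer choice.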
I pushed an alternative for logging. Log the event data rather than the
58a0579 to 74decde
Also, something seems to be falling over with network bring-up when I run this integration test on an lxd_vm platform. It is likely something unrelated that landed in tip of main, but I haven't been able to put my finger on it yet. So a bit more testing is needed to tease out whether the network not being brought up is related to this branch or to something else in main.
I've uploaded the failing bits (tip of main plus your branch for kinetic) to ppa:chad.smith/maas-testing
So, we should be able to peek at VM integration test failures:
CLOUD_INIT_KEEP_INSTANCE=1 CLOUD_INIT_PLATFORM=lxd_container CLOUD_INIT_OS_IMAGE=kinetic CLOUD_INIT_CLOUD_INIT_SOURCE=ppa:chad.smith/maas-testing tox -e integration-tests tests/integration_tests/reporting/test_webhook_reporting.py
ds_events = [
    e for e in events if e["name"] == "init-network/activate-datasource"
]
assert len(ds_events) == 2  # 1 for start, 1 for stop
Can we assert there are paired start and finish events for each scope? I suspect we occasionally see start or finish events getting swallowed between the bring-up and teardown of a stage.
Minimally I want to assert each high level boot stage scope has bookend start/finish events:
As it is currently, it seems like we may be missing a matching start event for
"name": "modules-final",
"event_type": "finish",
We can't assert this, because we still don't get every start event. Once that's fixed, we can make the change to this test.
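For when start events do arrive reliably, the pairing check requested in this thread could be sketched roughly as follows. `unpaired_scopes` is a hypothetical helper, not part of the existing test, and the sample events are made up for illustration. Only the `name` and `event_type` keys are taken from the event data quoted above.

```python
from collections import Counter


def unpaired_scopes(events, scopes):
    """Return the scopes lacking exactly one start and one finish event."""
    counts = Counter((e["name"], e["event_type"]) for e in events)
    return [
        scope
        for scope in scopes
        if counts[(scope, "start")] != 1 or counts[(scope, "finish")] != 1
    ]


# Made-up sample data: "modules-final" is missing its start event.
sample_events = [
    {"name": "init-network", "event_type": "start"},
    {"name": "init-network", "event_type": "finish"},
    {"name": "modules-final", "event_type": "finish"},
]
missing = unpaired_scopes(sample_events, ["init-network", "modules-final"])
```

In a test this would become `assert not unpaired_scopes(events, stage_names)`, where `stage_names` is whatever list of top-level boot stage scopes the test expects.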
Co-authored-by: Chad Smith <chad.smith@canonical.com>
LGTM. I think we can iterate on the start event issues separately.
This commit message should probably also capture that it fixes:
LP: #1993836
in the footer of the message.
We can address https://bugs.launchpad.net/cloud-init/+bug/1992711 separately for squelched "start" events.
Proposed Commit Message
Additional Context
New log format actually includes useful information now:
Test Steps
Checklist: