Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix last reported event possibly not being sent #1796

Merged
merged 5 commits into from Nov 10, 2022

Conversation

TheRealFalcon
Copy link
Member

Proposed Commit Message

Fix last reported event possibly not being sent

The run of every cloud-init mode is wrapped in a reporting
context manager. The final flush of events before the process exits was
happening within the context manager, however, one final event is sent
when the context manager exits. Since this event isn't subject to
waiting for event flush, cloud-init can exit before this event gets
sent.

This commit fixes this issue and also adds logging of POST data when
POSTING to a URL.

Additional Context

New log format actually includes useful information now:

2022-10-25 16:37:56,759 - url_helper.py[DEBUG]: [0/4] open 'http://10.20.30.1:55555/new_stuff' with {'url': 'http://10.20.30.1:55555/new_stuff', 'stream': False, 'allow_redirects': True, 'method': 'POST', 'timeout': 5.0, 'headers': {'User-Agent': 'Cloud-Init/22.3'}} configuration and POST data: {"name": "init-network/config-set_hostname", "description": "config-set_hostname ran successfully", "event_type": "finish", "origin": "cloudinit", "timestamp": 1666715876.7045417, "result": "SUCCESS"}

Test Steps

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

The run of every cloud-init mode is wrapped in a reporting
context manager. The final flush of events before the process exits was
happening within the context manager, however, one final event is sent
when the context manager exits. Since this event isn't subject to
waiting for event flush, cloud-init can exit before this event gets
sent.

This commit fixes this issue and also adds logging of POST data when
POSTING to a URL.
i,
"infinite" if infinite else manual_tries,
url,
filtered_req_args,
f" and POST data: {data}" if data else "",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to scrub this post data to redact secrets right this could be a security problem as we'd log all potential post data provided to this common function to /var/log/cloud-init.log which is world-readable.

Two known areas that may cause problems, or lead to sensitive data leaks:
Azure and GCP both send data in their requests.
cloudinit/sources/helpers/azure.py: http_with_retries
cloudinit/sources/DataSourceGCE.py:_write_host_key_to_guest_attributes maybe?

@TheRealFalcon
Copy link
Member Author

I pushed an alternative for logging. Log the event data rather than the readurl data directly. Looks like this:

2022-10-27 21:17:24,062 - handlers.py[DEBUG]: Queuing POST to http://127.0.0.1:55555/, data: {'name': 'init-network/activate-datasource', 'description': 'activating datasource', 'event_type': 'start', 'origin': 'cloudinit', 'timestamp': 1666905444.0623877}

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also something seems to be falling over as far as network being brought up as I run through this integration test on a lxd_vm platform. It is likely something unrelated that landed in tip of main, but I haven't been able to put my finger on it yet. So, a bit more testing is needed to tease out whether network not being brought up is related to this branch or something else in main.

I'd uploaded the failing bits (tip of main plus your branch for kinetic) to ppa:chad.smith/maas-testing
So, we should be able to peek at VM integration test failures:

CLOUD_INIT_KEEP_INSTANCE=1 CLOUD_INIT_PLATFORM=lxd_container CLOUD_INIT_OS_IMAGE=kinetic CLOUD_INIT_CLOUD_INIT_SOURCE=ppa:chad.smith/maas-testing tox -e integration-tests tests/integration_tests/reporting/test_webhook_reporting.py

Comment on lines +60 to +63
ds_events = [
e for e in events if e["name"] == "init-network/activate-datasource"
]
assert len(ds_events) == 2 # 1 for start, 1 for stop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert there are paired start and finish events for each scope. I'm thinking we are occasionally seeing starts or finishes getting swallowed across bring up of a stage and teardown.
Minimally I want to assert each high level boot stage scope has bookend start/finish events:
As it is currently, it seems like we may be missing a matching start event for

  "name": "modules-final",
  "event_type": "finish",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't assert this, because we still don't get every start event. Once that's fixed, we can make the change to this test.

Co-authored-by: Chad Smith <chad.smith@canonical.com>
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think we can iterate on the start event issues separately.
This commit message should probably also capture that it fixes:

LP: #1993836
in the footer of the message.

@blackboxsw
Copy link
Collaborator

We can address https://bugs.launchpad.net/cloud-init/+bug/1992711 separately for squelched "start" events.

@TheRealFalcon TheRealFalcon merged commit 892ad9e into canonical:main Nov 10, 2022
@TheRealFalcon TheRealFalcon deleted the fix-reporting branch November 10, 2022 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants