Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements in waagent.log #2558

Merged
merged 4 commits into from Apr 19, 2022
Merged

Conversation

narrieta
Copy link
Member

This PR includes several improvements in the agent's local log that help readability and debugging of goal state issues.

  • Added a new message at the beginning of the ExtensionHandler similar to the Daemon's. This helps point out when the former is actually starting
  • Added blank lines around fetching the goal state and processing it
  • Improved the messages around certificates to make clearer when they are being downloaded/installed
  • Saved the certificates thumbprints to the history folder
  • Added a check for FastTrack goal states for which certificates have not been installed yet

Lastly, I added waagent_status to the list of legacy files that need to be cleaned up from the agent's root directory.

larohra
larohra previously approved these changes Apr 19, 2022
Copy link
Contributor

@larohra larohra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments else LGTM

continue
certificates = self.certs.summary
if not any(settings.certificateThumbprint == c['thumbprint'] for c in certificates):
message = "Certificate {0} needed by {1} is missing from the goal state".format(settings.certificateThumbprint, extension.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, do we skip GS or just log error and keep going?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point it just writes to the local log to, should this ever happen, make the error obvious, and to collect telemetry.

As far as handling the error, at this point the code already handles all the scenarios we anticipated. If we are missing other scenarios the telemetry would point this out and then we can decide how to handle those.

@@ -324,20 +324,11 @@ def run(self, debug=False):
"""

try:
logger.info("{0} Version: {1}", AGENT_LONG_NAME, CURRENT_AGENT)
logger.info("OS: {0} {1}", DISTRO_NAME, DISTRO_VERSION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Repeated logs (Line 328-329 are already being logged in line 345)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this repeats some of the info on L345, but removing the duplicated info is not worth it. Also, the intention is to make it as close as possible to the same message logged by the Daemon. And L345 is also send to telemetry.

@codecov
Copy link

codecov bot commented Apr 19, 2022

Codecov Report

Merging #2558 (306c906) into release-2.8.0.0 (50eaac1) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@                 Coverage Diff                 @@
##           release-2.8.0.0    #2558      +/-   ##
===================================================
+ Coverage            72.08%   72.11%   +0.02%     
===================================================
  Files                  102      102              
  Lines                15314    15336      +22     
  Branches              2426     2431       +5     
===================================================
+ Hits                 11039    11059      +20     
+ Misses                3787     3786       -1     
- Partials               488      491       +3     
Impacted Files Coverage Δ
...protocol/extensions_goal_state_from_vm_settings.py 83.33% <ø> (ø)
azurelinuxagent/common/protocol/goal_state.py 95.44% <100.00%> (+0.18%) ⬆️
azurelinuxagent/common/utils/archive.py 88.65% <100.00%> (+0.24%) ⬆️
azurelinuxagent/common/version.py 70.23% <100.00%> (ø)
azurelinuxagent/daemon/main.py 71.13% <100.00%> (ø)
azurelinuxagent/ga/exthandlers.py 85.72% <100.00%> (+0.01%) ⬆️
azurelinuxagent/ga/update.py 88.24% <100.00%> (+0.05%) ⬆️
azurelinuxagent/ga/collect_telemetry_events.py 89.72% <0.00%> (-1.72%) ⬇️
azurelinuxagent/common/event.py 86.03% <0.00%> (+0.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e0bb7b...306c906. Read the comment docs.

@@ -303,6 +317,7 @@ def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc):
# Note that we do not save the certificates to the goal state history
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we save the certificate to history folder, should we remove this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not save the certificate to the history folder, we never should do that since it is a secret. we only save the thrumbprint, which can be helpful when debugging

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i see the confusion now... i'll update the comment to say we save the summary

# Initialize the common parameters for telemetry events
initialize_event_logger_vminfo_common_parameters(protocol)

# Send telemetry for the OS-specific info.
add_event(AGENT_NAME, op=WALAEventOperation.OSInfo, message=os_info_msg)
Copy link
Contributor

@nagworld9 nagworld9 Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intentional? why don't we keep this line next to the logger L345.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

telemetry depends on the goal state, so we need to initialize the gs first (line 353)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's too big of a deal, but is there any reason not to move the logger.info line down till after grabbing the goal state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the log messages wrt the goal state tend to obscure the ID info. The intention is to have all the information that identifies the agent at the top of the log


self.assertTrue(
len(events) == 1,
"Missing certificate 59A10F50FFE2A0408D3F03FE336C8FD5716CF25C was note reported. Telemetry: {0}".format([kwargs['message'] for _, kwargs in add_event.call_args_list]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: note reported vs not reported

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, will update on my next PR

@narrieta narrieta merged commit fa8e370 into Azure:release-2.8.0.0 Apr 19, 2022
@narrieta narrieta deleted the waagent-log branch April 20, 2022 03:41
narrieta added a commit to narrieta/WALinuxAgent that referenced this pull request Apr 25, 2022
* Improvements in waagent.log

* fix function call

* update comment

* typo

Co-authored-by: narrieta <narrieta>
(cherry picked from commit fa8e370)
narrieta added a commit that referenced this pull request Apr 26, 2022
* Improvements in waagent.log

* fix function call

* update comment

* typo

Co-authored-by: narrieta <narrieta>
(cherry picked from commit fa8e370)
nagworld9 added a commit that referenced this pull request Mar 7, 2023
* release prepare-2.4.0.0 (#2280)

* Fix bug with dependent extensions with no settings (#2285)

* update test-requirements to pin pylint. (#2288) (#2299)

Co-authored-by: Kevin Clark <kevin.clark@microsoft.com>

* Do not create placeholder status file for AKS extensions (#2298)

* Exception for Linux Patch Extension for creating placeholder status file (#2307)

* update release version (#2308)

* Dont create default status file for Single-Config extensions (#2318)

* version update (#2319)

* Update Version (#2348)

* Fix bug with dependent extensions with no settings (#2285) (#2349)

Co-authored-by: Laveesh Rohra <larohra@microsoft.com>

* Use handler status if extension status is None when computing the Ext… (#2358)

* Use handler status if extension status is None when computing the ExtensionsSummary

* Add NotReady to convergence statuses

Co-authored-by: narrieta <narrieta>

* Release preparation 2.5.0.1 (#2360)

* Define ExtensionsSummary.__eq__ (#2371)

* define ExtensionsSummary.__eq__

* Fix test name

* Fix test name

* Fix test name

Co-authored-by: narrieta <narrieta>

* Release preparation 2.5.0.2 (#2372)

* update cgroups monitoring expiry date (#2392)

* Updated Agent Version to 2.6.0.0 (#2393)

* Do not parse status blob (#2394)

Co-authored-by: narrieta <narrieta>

* Exclude dcr from setup (#2396)

* Improve error message for vmSettings errors (#2397)

* Improve error message for vmSettings errors

* Add etag and correlation id

* pylint warning

Co-authored-by: narrieta <narrieta>

* Updated agent version to 2.6.0.1 (#2400)

* Verify that extensions goal state from vmsettings has been initialized (#2404)

Co-authored-by: narrieta <narrieta>

* Updated agent version to 2.6.0.2 (#2405)

* Set Agent version to 2.7.0.0 (#2457)

* Set Agent version to 2.7.0.0

* Remove duplicate import

Co-authored-by: narrieta <narrieta>

* Simplify the logic to update the extensions goal state (#2466)

* Simplify the logic to update the extensions goal state

* Added telemetry for NotSupported

* Added comments

* Do not support old hostgaplugin

Co-authored-by: narrieta <narrieta>

* Retry update_goal_state on GoalStateMismatchError (#2470)

* Retry update_goal_state on GoalStateMismatchError

* Add sleep before retry

* Enable test

* Enable test

* Update test

* Add unit test

* Add data file

* pylint warning

* Add comment; fix typos

* fix typo

Co-authored-by: narrieta <narrieta>

* Set agent version to 2.7.0.1 (#2473)

* Redact settings from mismatch message (#2477)

Co-authored-by: narrieta <narrieta>

* Set Agent Version to 2.7.0.2 (#2478)

Co-authored-by: narrieta <narrieta>

* Improvements in telemetry for vmSettings (#2482)

Co-authored-by: narrieta <narrieta>

* Set Agent version to 2.7.0.3 (#2483)

* Do not raise on missing status blob; reduce amount of logging for vms… (#2492)

* Do not raise on missing status blob; reduce amount of logging for vmsettings

* remove extra file; fix typo

Co-authored-by: narrieta <narrieta>

* Set agent version to 2.7.0.4 (#2494)

Co-authored-by: narrieta <narrieta>

* Save vmSettings on parse errors; improve messages in parse errors (#2503)

* Save vmSettings on parse errors; improve messages in parse errors

* pylint warnings

* pylint warnings

Co-authored-by: narrieta <narrieta>

* Set agent version to 2.7.0.5 (#2504)

Co-authored-by: narrieta <narrieta>

* Revert "Set agent version to 2.7.0.5 (#2504)" (#2505)

This reverts commit ae5a222.

Co-authored-by: narrieta <narrieta>

* ignore firewall packets reset error, check enable firewall config flag and extend cgroup extension monitoring expiry time (#2502)

* Set agent version to 2.7.0.5 (#2506)

Co-authored-by: narrieta <narrieta>

* Handle OOM errors by stopping the periodic log collector (#2510)

* Set agent version to 2.7.0.6 (#2511)

* Add keep_alive property to collect_logs (#2533)

* Set agent version to 2.7.1.0. (#2534)

* Set agent version to 2.8.0.0 (#2545)

Co-authored-by: narrieta <narrieta>

* Update HGAP minimum required version for FastTrack (#2549)

Co-authored-by: narrieta <narrieta>

* Update agent version to 2.8.0.1 (#2550)

Co-authored-by: narrieta <narrieta>

* Improvements in waagent.log (#2558)

* Improvements in waagent.log

* fix function call

* update comment

* typo

Co-authored-by: narrieta <narrieta>

* Change format of history items (#2560)

* Change format of history directory

* Update message; fix typo

* py2 compat

* py2 compat

Co-authored-by: narrieta <narrieta>

* Update agent version to 2.8.0.2 (#2561)

Co-authored-by: narrieta <narrieta>

* Refresh goal state when certificates are missing (#2562)

* Refresh goal state when certificates are missing

* Improve error reporting

* Fix assert message

Co-authored-by: narrieta <narrieta>

* Update agent version to 2.8.0.3 (#2563)

Co-authored-by: narrieta <narrieta>

* Do not mark goal state as processed when goal state fails to update (#2569)

Co-authored-by: narrieta <narrieta>

* Update agent version to 2.8.0.4 (#2570)

Co-authored-by: narrieta <narrieta>

* Bug fix for fetching a goal state with empty certificates property (#2575)

* Move error counter reset down to end of block. (#2576)

* Bug Fix: Change fast track timestamp default from None to datetime.min (#2577)

* Update agent version to 2.8.0.5. (#2580)

* Create placeholder GoalState.*.xml file (#2594)

Co-authored-by: narrieta <narrieta>

* Update version to 2.8.0.6 (#2595)

Co-authored-by: narrieta <narrieta>

* fix network interface restart in RHEL9 (#2592) (#2612)

(cherry picked from commit b8ca432)

* set agent version to 2.7.2.0 (#2613)

* Parse missing agent manifests as empty

* Set agent version to 2.8.0.7

* Retry HGAP's extensionsArtifact requests on BAD_REQUEST status (#2621)

* Retry HGAP's extensionsArtifact requests on BAD_REQUEST status

* python 2.6 compat

Co-authored-by: narrieta <narrieta>

* Retry HGAP's extensionsArtifact requests on BAD_REQUEST status (#2621) (#2622)

* Retry HGAP's extensionsArtifact requests on BAD_REQUEST status

* python 2.6 compat

Co-authored-by: narrieta <narrieta>
(cherry picked from commit dbc82d3)

* fix if command in rhel v8.6+ (#2624)

* Set agent version to 2.7.3.0 (#2625)

Co-authored-by: narrieta <narrieta>

* Set Agent version to 2.8.0.8 (#2627)

Co-authored-by: narrieta <narrieta>

* fix network interface restart in RHEL9 (#2592) (#2629)

(cherry picked from commit b8ca432)

Co-authored-by: Nageswara Nandigam <84482346+nagworld9@users.noreply.github.com>

* fix if command in rhel v8.6+ (#2624) (#2630)

(cherry picked from commit e540728)

Co-authored-by: Nageswara Nandigam <84482346+nagworld9@users.noreply.github.com>

* Set Agent version to 2.8.0.9 (#2631)

Co-authored-by: narrieta <narrieta>

* Cleanup history directory when creating new subdirectories (#2633)

* Cleanup history directory when creating new subdirectories

* Review feedback

Co-authored-by: narrieta <narrieta>

* Set agent version to 2.8.0.10

* Save sharedconfig to disk (#2649)

* Save sharedconfig to disk

* Update tests

* pylint warnings

Co-authored-by: narrieta <narrieta>

* Set Agent version to 2.8.0.11 (#2650)

Co-authored-by: narrieta <narrieta>

* test fixes

* Microsoft mandatory file (#2737)

Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com>

---------

Co-authored-by: Laveesh Rohra <larohra@microsoft.com>
Co-authored-by: Kevin Clark <kevin.clark@microsoft.com>
Co-authored-by: Norberto Arrieta <narrieta@users.noreply.github.com>
Co-authored-by: Dhivya Ganesan <dhivyaganesan@users.noreply.github.com>
Co-authored-by: narrieta <narrieta>
Co-authored-by: Kevin Clark <keclar@microsoft.com>
Co-authored-by: Norberto Arrieta <narrieta@microsoft.com>
Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants