Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast Track: Ignore Fabric goal states from HostGAPlugin; process most recent goal state from WireServer/HostGAPlugin #2541

Merged
merged 14 commits into from Apr 5, 2022

Conversation

narrieta
Copy link
Member

The goal states returned by the HostGAPlugin and the WireServer can be out of sync. This presents a problem when the most recent goal state includes an updated tenant certificate (this is always a Fabric goal state). In this case, we need to ensure that we first download the certificate and the process the extensions.

To achieve this, we now ignore Fabric goal states from the HostGAPlugin and we process the most recent goal state from the 2 goal states we receive from WireServer/HostGAPlugin.

This PR also includes some minor cleanup of portions of the Fast Track code. I'll add comments in the PR to point those out.

@@ -121,16 +121,6 @@ class ExtensionsConfigError(ExtensionsGoalStateError):
"""


class VmSettingsError(ExtensionsGoalStateError):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this exception to VmSettingsParseError (since it is only used to report parse errors) and moved it to extensions_goal_state.py (since that is the module that raises it)

@@ -56,7 +55,8 @@ def __init__(self, etag, json_text, correlation_id):
self._parse_vm_settings(json_text)
self._do_common_validations()
except Exception as e:
raise VmSettingsError("Error parsing vmSettings [HGAP: {0}]: {1}".format(self._host_ga_plugin_version, ustr(e)), etag, self.get_redacted_text())
message = "Error parsing vmSettings [HGAP: {0} Etag:{1}]: {2}".format(self._host_ga_plugin_version, etag, ustr(e))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the etag to the error message


@staticmethod
def redact(text):
return re.sub(r'("protectedSettings"\s*:\s*)"[^"]+"', r'\1"*** REDACTED ***"', text)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redact() method was used in a previous iteration of the code, but now there is no need for it, so I folded it into get_redacted_text()

@@ -165,7 +161,7 @@ def _parse_simple_attributes(self, vm_settings):
# "correlationId": "9a47a2a2-e740-4bfc-b11b-4f2f7cfe7d2e",
# "inSvdSeqNo": 1,
# "extensionsLastModifiedTickCount": 637726657706205217,
# "extensionGoalStatesSource": "Fabric",
# "extensionGoalStatesSource": "FastTrack",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the vmSettings test data and sample code to FastTrack (since now we ignore Fabric)

self._history = GoalStateHistory(timestamp, incarnation)

self._initialize_basic_properties(xml_doc)
self._fetch_extended_goal_state(xml_text, xml_doc)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are part of the new fix. I merged _initialize_basic_properties and _fetch_extended_goal_state into a single method (_fetch_full_wire_server_goal_state) and invoked that method from update().

I also removed code from init() that is already done by update()

@@ -139,12 +139,11 @@ def http_get_handler(url, *_, **__):
self._test_supported_features_includes_fast_track(protocol, False)

def _test_supported_features_includes_fast_track(self, protocol, expected):
with patch("azurelinuxagent.common.conf.get_enable_fast_track", return_value=True):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed all the patches that were setting EnableFastTrack to True. I made FastTrack the default several weeks ago, but was keeping these patches around in case we needed a release with FastTrack disable. No need to keep them around anymore.

'''
Sets the ETag for the mock response
'''
def set_etag(self, etag, timestamp=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need new test helpers to change the source (fast track vs fabric) and timestamp of the goal states in the mock data

@@ -19,7 +18,7 @@ def assert_property(name, value):

assert_property("activity_id", "a33f6f53-43d6-4625-b322-1a39651a00c9")
assert_property("correlation_id", "9a47a2a2-e740-4bfc-b11b-4f2f7cfe7d2e")
assert_property("created_on_timestamp", "2021-11-16T13:22:50.620522Z")
assert_property("created_on_timestamp", "2021-11-16T13:22:50.620529Z")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the timestamp on a few data files to make them the most recent goal state


_ = GoalState(protocol.client)

self._assert_directory_contents(
self._find_history_subdirectory("999"),
self._find_history_subdirectory("999-888"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new suffix for history directories (incarnation-etag)

self.assertEqual(expected, actual, "The vmSettings were not saved correctly")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases for the certificate fix

@codecov
Copy link

codecov bot commented Mar 31, 2022

Codecov Report

Merging #2541 (416765a) into develop (75fcab8) will increase coverage by 0.07%.
The diff coverage is 96.33%.

❗ Current head 416765a differs from pull request most recent head 06d977c. Consider uploading reports for the commit 06d977c to get more accurate results

@@             Coverage Diff             @@
##           develop    #2541      +/-   ##
===========================================
+ Coverage    71.99%   72.07%   +0.07%     
===========================================
  Files          102      102              
  Lines        15202    15228      +26     
  Branches      2406     2416      +10     
===========================================
+ Hits         10945    10975      +30     
+ Misses        3776     3773       -3     
+ Partials       481      480       -1     
Impacted Files Coverage Δ
azurelinuxagent/common/exception.py 95.95% <ø> (-0.20%) ⬇️
azurelinuxagent/common/protocol/hostplugin.py 88.85% <83.33%> (-0.45%) ⬇️
azurelinuxagent/common/utils/timeutil.py 88.88% <87.50%> (-11.12%) ⬇️
azurelinuxagent/common/protocol/goal_state.py 95.30% <98.61%> (-1.52%) ⬇️
...inuxagent/common/protocol/extensions_goal_state.py 72.93% <100.00%> (+1.27%) ⬆️
...protocol/extensions_goal_state_from_vm_settings.py 83.33% <100.00%> (-0.19%) ⬇️
azurelinuxagent/common/utils/archive.py 88.32% <100.00%> (+0.35%) ⬆️
azurelinuxagent/ga/update.py 88.56% <100.00%> (+0.16%) ⬆️
azurelinuxagent/common/event.py 86.03% <0.00%> (+0.67%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75fcab8...06d977c. Read the comment docs.

@@ -659,17 +661,9 @@ def __get_vmagent_update_status(self, protocol, goal_state_changed):
def _report_status(self, exthandlers_handler):
vm_agent_update_status = self.__get_vmagent_update_status(exthandlers_handler.protocol, self._processing_new_extensions_goal_state())
# report_ext_handlers_status does its own error handling and returns None if an error occurred
#
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first goal state is always Fabric, and now we ignore Fabric goal states from the HostGAPlugin, so

self._goal_state.extensions_goal_state.channel == GoalStateChannel.HostGAPlugin
would never be True.

Now we do this in _try_update_goal_state(), using check_vm_settings_support()

if not goal_state_updated and not vm_settings_updated:
return

# Start a new history subdirectory and capture the updated goal state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wont this get confusing?

except VmSettingsParseError as exception:
# ensure we save the vmSettings if there were parsing errors, but save them only once per ETag
if not GoalStateHistory.tag_exists(exception.etag):
GoalStateHistory(timeutil.create_timestamp(), exception.etag).save_vm_settings(exception.vm_settings_text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by some magic the parsing error goes away in the next iteration for the same etag, will we overwrite the same history directory? Would we want to keep the bad file for debugging purposes?

vm_settings, _ = protocol.client.get_host_plugin().fetch_vm_settings()
if vm_settings.etag != etag:
raise Exception("The HostGAPlugin is no in sync. Expected ETag {0}. Got {1}".format(etag, vm_settings.etag))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small typo, no in sync to not in sync.

goal_state = GoalState(protocol.client)
if goal_state.incarnation != incarnation:
raise Exception("The WireServer is no in sync. Expected incarnation {0}. Got {1}".format(incarnation, goal_state.incarnation))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here as above

@narrieta narrieta merged commit f91bf78 into Azure:develop Apr 5, 2022
@narrieta narrieta deleted the certificate branch April 5, 2022 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants