New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast Track: Ignore Fabric goal states from HostGAPlugin; process most recent goal state from WireServer/HostGAPlugin #2541
Conversation
@@ -121,16 +121,6 @@ class ExtensionsConfigError(ExtensionsGoalStateError): | |||
""" | |||
|
|||
|
|||
class VmSettingsError(ExtensionsGoalStateError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed this exception to VmSettingsParseError (since it is only used to report parse errors) and moved it to extensions_goal_state.py (since that is the module that raises it)
@@ -56,7 +55,8 @@ def __init__(self, etag, json_text, correlation_id): | |||
self._parse_vm_settings(json_text) | |||
self._do_common_validations() | |||
except Exception as e: | |||
raise VmSettingsError("Error parsing vmSettings [HGAP: {0}]: {1}".format(self._host_ga_plugin_version, ustr(e)), etag, self.get_redacted_text()) | |||
message = "Error parsing vmSettings [HGAP: {0} Etag:{1}]: {2}".format(self._host_ga_plugin_version, etag, ustr(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the etag to the error message
|
||
@staticmethod | ||
def redact(text): | ||
return re.sub(r'("protectedSettings"\s*:\s*)"[^"]+"', r'\1"*** REDACTED ***"', text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The redact() method was used in a previous iteration of the code, but now there is no need for it, so I folded it into get_redacted_text()
@@ -165,7 +161,7 @@ def _parse_simple_attributes(self, vm_settings): | |||
# "correlationId": "9a47a2a2-e740-4bfc-b11b-4f2f7cfe7d2e", | |||
# "inSvdSeqNo": 1, | |||
# "extensionsLastModifiedTickCount": 637726657706205217, | |||
# "extensionGoalStatesSource": "Fabric", | |||
# "extensionGoalStatesSource": "FastTrack", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the vmSettings test data and sample code to FastTrack (since now we ignore Fabric)
self._history = GoalStateHistory(timestamp, incarnation) | ||
|
||
self._initialize_basic_properties(xml_doc) | ||
self._fetch_extended_goal_state(xml_text, xml_doc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are part of the new fix. I merged _initialize_basic_properties and _fetch_extended_goal_state into a single method (_fetch_full_wire_server_goal_state) and invoked that method from update().
I also removed code from init() that is already done by update()
@@ -139,12 +139,11 @@ def http_get_handler(url, *_, **__): | |||
self._test_supported_features_includes_fast_track(protocol, False) | |||
|
|||
def _test_supported_features_includes_fast_track(self, protocol, expected): | |||
with patch("azurelinuxagent.common.conf.get_enable_fast_track", return_value=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all the patches that were setting EnableFastTrack to True. I made FastTrack the default several weeks ago, but was keeping these patches around in case we needed a release with FastTrack disable. No need to keep them around anymore.
''' | ||
Sets the ETag for the mock response | ||
''' | ||
def set_etag(self, etag, timestamp=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need new test helpers to change the source (fast track vs fabric) and timestamp of the goal states in the mock data
@@ -19,7 +18,7 @@ def assert_property(name, value): | |||
|
|||
assert_property("activity_id", "a33f6f53-43d6-4625-b322-1a39651a00c9") | |||
assert_property("correlation_id", "9a47a2a2-e740-4bfc-b11b-4f2f7cfe7d2e") | |||
assert_property("created_on_timestamp", "2021-11-16T13:22:50.620522Z") | |||
assert_property("created_on_timestamp", "2021-11-16T13:22:50.620529Z") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the timestamp on a few data files to make them the most recent goal state
|
||
_ = GoalState(protocol.client) | ||
|
||
self._assert_directory_contents( | ||
self._find_history_subdirectory("999"), | ||
self._find_history_subdirectory("999-888"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new suffix for history directories (incarnation-etag)
self.assertEqual(expected, actual, "The vmSettings were not saved correctly") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test cases for the certificate fix
Codecov Report
@@ Coverage Diff @@
## develop #2541 +/- ##
===========================================
+ Coverage 71.99% 72.07% +0.07%
===========================================
Files 102 102
Lines 15202 15228 +26
Branches 2406 2416 +10
===========================================
+ Hits 10945 10975 +30
+ Misses 3776 3773 -3
+ Partials 481 480 -1
Continue to review full report at Codecov.
|
@@ -659,17 +661,9 @@ def __get_vmagent_update_status(self, protocol, goal_state_changed): | |||
def _report_status(self, exthandlers_handler): | |||
vm_agent_update_status = self.__get_vmagent_update_status(exthandlers_handler.protocol, self._processing_new_extensions_goal_state()) | |||
# report_ext_handlers_status does its own error handling and returns None if an error occurred | |||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first goal state is always Fabric, and now we ignore Fabric goal states from the HostGAPlugin, so
self._goal_state.extensions_goal_state.channel == GoalStateChannel.HostGAPlugin
would never be True.
Now we do this in _try_update_goal_state(), using check_vm_settings_support()
if not goal_state_updated and not vm_settings_updated: | ||
return | ||
|
||
# Start a new history subdirectory and capture the updated goal state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wont this get confusing?
except VmSettingsParseError as exception: | ||
# ensure we save the vmSettings if there were parsing errors, but save them only once per ETag | ||
if not GoalStateHistory.tag_exists(exception.etag): | ||
GoalStateHistory(timeutil.create_timestamp(), exception.etag).save_vm_settings(exception.vm_settings_text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If by some magic the parsing error goes away in the next iteration for the same etag, will we overwrite the same history directory? Would we want to keep the bad file for debugging purposes?
vm_settings, _ = protocol.client.get_host_plugin().fetch_vm_settings() | ||
if vm_settings.etag != etag: | ||
raise Exception("The HostGAPlugin is no in sync. Expected ETag {0}. Got {1}".format(etag, vm_settings.etag)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small typo, no in sync
to not in sync
.
goal_state = GoalState(protocol.client) | ||
if goal_state.incarnation != incarnation: | ||
raise Exception("The WireServer is no in sync. Expected incarnation {0}. Got {1}".format(incarnation, goal_state.incarnation)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here as above
The goal states returned by the HostGAPlugin and the WireServer can be out of sync. This presents a problem when the most recent goal state includes an updated tenant certificate (this is always a Fabric goal state). In this case, we need to ensure that we first download the certificate and the process the extensions.
To achieve this, we now ignore Fabric goal states from the HostGAPlugin and we process the most recent goal state from the 2 goal states we receive from WireServer/HostGAPlugin.
This PR also includes some minor cleanup of portions of the Fast Track code. I'll add comments in the PR to point those out.