Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement FastTrack handshake with CRP #2514

Merged
merged 6 commits into from Feb 18, 2022
Merged

Conversation

narrieta
Copy link
Member

If the HostGAPlugin supports the VM settings API, the agent needs to add FastTrack to the supportedFeatures in the status report. When CRP detects this, it enables the feature and starts producing FastTrack goal states for that VM.

The PR includes some test fixes; I added comments in the PR for those.

@@ -219,6 +219,7 @@ def save_to_history(etag, text):
pass
except VmSettingsError as exception:
save_to_history(exception.etag, exception.vm_settings_text)
raise
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this error needs to be propagated to the caller, I missed that in a previous commit

@@ -480,8 +480,9 @@ def format_message(msg):
logger.info(message)
add_event(op=WALAEventOperation.HostPlugin, message=message, is_success=True)

# Don't support HostGAPlugin versions older than 115
if vm_settings.host_ga_plugin_version < FlexibleVersion("1.0.8.115"):
# Don't support HostGAPlugin versions older than 123
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

123 contains all the fixes we need on the HostGAPlugin before releasing to Prod. Many VMs in the canary region are still on 117 so using that for now

@codecov
Copy link

codecov bot commented Feb 17, 2022

Codecov Report

Merging #2514 (2a367ed) into develop (231b22f) will increase coverage by 0.11%.
The diff coverage is 91.89%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2514      +/-   ##
===========================================
+ Coverage    71.77%   71.88%   +0.11%     
===========================================
  Files          101      101              
  Lines        15116    15147      +31     
  Branches      2398     2401       +3     
===========================================
+ Hits         10850    10889      +39     
- Misses        3780     3781       +1     
+ Partials       486      477       -9     
Impacted Files Coverage Δ
azurelinuxagent/common/protocol/hostplugin.py 88.71% <0.00%> (ø)
...inuxagent/common/protocol/extensions_goal_state.py 73.07% <80.00%> (+0.73%) ⬆️
azurelinuxagent/common/agent_supported_feature.py 100.00% <100.00%> (ø)
...ol/extensions_goal_state_from_extensions_config.py 90.97% <100.00%> (-1.00%) ⬇️
...protocol/extensions_goal_state_from_vm_settings.py 80.69% <100.00%> (+0.61%) ⬆️
azurelinuxagent/common/protocol/goal_state.py 95.35% <100.00%> (+0.01%) ⬆️
azurelinuxagent/common/protocol/restapi.py 97.50% <100.00%> (+0.07%) ⬆️
azurelinuxagent/common/protocol/wire.py 78.82% <100.00%> (+0.05%) ⬆️
azurelinuxagent/ga/exthandlers.py 85.91% <100.00%> (+0.08%) ⬆️
azurelinuxagent/ga/update.py 87.69% <100.00%> (-0.05%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 231b22f...2a367ed. Read the comment docs.

@@ -1,26 +0,0 @@
<Extensions version="1.0.0.0" goalStateIncarnation="9"><GuestAgentExtension xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments in tests/protocol/mockwiredata.py

@@ -111,9 +113,11 @@ def generate_put_handler(*emulators):
def mock_put_handler(url, *args, **_):

if HttpRequestPredicates.is_host_plugin_status_request(url):
return
status_blob = WireProtocolData.get_status_blob_from_hostgaplugin_put_status_request(args[0])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments in tests/protocol/mockwiredata.py

from tests.protocol import mockwiredata
from tests.protocol.HttpRequestPredicates import HttpRequestPredicates


Copy link
Member Author

@narrieta narrieta Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved ReportStatusTestCase to its own file.

test_report_status_should_log_errors_only_once_per_goal_state is basically the same as in the original file. The other tests are new or full rewrites of the older tests.


yield update_handler

def test_update_handler_should_report_status_when_fetch_goal_state_fails(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

full rewrite of older tests

raise HttpError()

@patch("azurelinuxagent.common.conf.get_enable_fast_track", return_value=True)
def test_update_handler_should_report_status_even_on_failed_goal_state_fetch(self, _):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test and the one below have been problematic for several of the code iterations I've been working on.

I just went ahead and rewrote them as test_update_handler_should_report_status_when_fetch_goal_state_fails

@@ -2868,171 +2868,6 @@ def test_it_should_process_goal_state_only_on_new_goal_state(self):
self.assertEqual(2, remote_access_handler.run.call_count, "remote_access_handler.run() should have been called on a new goal state")


class ReportStatusTestCase(AgentTestCase):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class was moved to its own file

@@ -305,17 +302,18 @@ def mock_http_put(self, url, data, **_):

if url.endswith('/vmAgentLog'):
self.call_counts['/vmAgentLog'] += 1
elif url.endswith('/StatusBlob'):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"/StatusBlob" is a fake url that was used in the test file that I removed. It's usage was making the put status on the hostgaplugin fail, and forcing the request to be retried as a direct call to storage.

I removed this and added support in the mock protocol for direct and hostgaplugin PUT status requests

@@ -327,6 +325,12 @@ def mock_gen_trans_cert(self, trans_prv_file, trans_cert_file):
with open(trans_cert_file, 'w+') as cert_file:
cert_file.write(self.trans_cert)

@staticmethod
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the status is PUT via the hostgaplugin, it is wrapped in an enclosing object as property "content" and base64-encoded.

this function unwraps it and decodes it

@@ -1,27 +0,0 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
Copy link
Member Author

@narrieta narrieta Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 tests were moved to test_extensions_goal_state_from_extensions_config/from_vm_settings.py, along with the other parse tests. They are now named test_it_should_default_to_block_blob_when_the_status_blob_type_is_not_valid


@patch("azurelinuxagent.common.conf.get_enable_fast_track", return_value=True)
def test_extension_goal_state_should_parse_requested_version_properly(self, _):
def test_it_should_parse_vm_settings(self, _):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified these tests to go thru update_goal_state instead of create_from_* to make them more consistend with other goal state tests

remove_timestamps(second_status)

self.assertEqual(first_status, second_status)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existing test moved from the previous file; almost identical to the original

update_handler._try_update_goal_state(exthandlers_handler.protocol)
update_handler._report_status(exthandlers_handler)
self.assertEqual(2, logger_warn.call_count, "UpdateHandler._report_status() should continue reporting errors after a new goal state")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest are new tests

data_file["ext_conf"] = "hostgaplugin/ext_conf-invalid_blob_type.xml"
with mock_wire_protocol(data_file) as protocol:
extensions_goal_state = protocol.get_goal_state().extensions_goal_state
self.assertEqual("BlockBlob", extensions_goal_state.status_upload_blob_type, 'Expected BlockBob for an invalid statusBlobType')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit typo: BlockBob -> BlockBlob in the error message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed

@@ -45,3 +53,8 @@ def test_it_should_parse_empty_depends_on_as_dependency_level_0(self):

self.assertEqual(0, extensions[0].settings[0].dependencyLevel, "Incorrect dependencyLevel}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another typo, if you want to fix it in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the typo? thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it. fixed

self.assertEqual(2, len(extensions_goal_state.agent_manifests), "Incorrect number of agent manifests. Got: {0}".format(extensions_goal_state.agent_manifests))
self.assertEqual("Prod", extensions_goal_state.agent_manifests[0].family, "Incorrect agent family.")
self.assertEqual(2, len(extensions_goal_state.agent_manifests[0].uris), "Incorrect number of uris. Got: {0}".format(extensions_goal_state.agent_manifests[0].uris))
self.assertEqual("https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml", extensions_goal_state.agent_manifests[0].uris[0], "Incorrect number of uris.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error message can be better. Instead of, Incorrect number of uris, maybe first URI does not match. Expected: {0}, got {1} or something to that effect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# Don't support HostGAPlugin versions older than 115
if vm_settings.host_ga_plugin_version < FlexibleVersion("1.0.8.115"):
# Don't support HostGAPlugin versions older than 123
# TODO: update the minimum version to 1.0.8.117 before release
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, you mean minimum version to 123 before release?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i'll fix the comment

if vm_settings.host_ga_plugin_version < FlexibleVersion("1.0.8.115"):
# Don't support HostGAPlugin versions older than 123
# TODO: update the minimum version to 1.0.8.117 before release
if vm_settings.host_ga_plugin_version < FlexibleVersion("1.0.8.117"):
raise_not_supported(reset_state=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not supported, how does this update supports_fast_track flag as false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the flag is maintained by the UpdateHandler and computed like this

supports_fast_track = self._goal_state.extensions_goal_state.source_channel == GoalStateChannel.HostGAPlugin

If the HostGAPlugin does not support vmsettings (or is the wrong version) we fallback to the WireServer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note also that the HostGAPlugin may support fast track, but the feature may be disabled in waagent.conf. The above condition takes care of that too.

The naming is a little awkward: CRP needs FastTrack to be added to the supportedFeatures, but what the flag really means is that the agent is requesting CRP to start using FastTrack. Of course, if the agent sets this flag, it must support FT. But the agent could support FT and still not request CRP to use it (e.g. the user disabled the feature, or the hostgaplugin is not the correct version).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got you. Thanks for explaining.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am doing another pass of the code (separate PRs) and will add comments and tests to explain the behavior of the feature, as well as to address the TODOs (the behavior is already described in the spec for fast track on the linux agent, but i should capture that in tests)

@narrieta narrieta merged commit acd6301 into Azure:develop Feb 18, 2022
@narrieta narrieta deleted the fast-track branch February 18, 2022 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants