Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Azure] DataSourceAzure: send failure signal on Azure datasource failure #594

Merged
merged 53 commits into from
Nov 18, 2020

Conversation

johnsonshi
Copy link
Contributor

@johnsonshi johnsonshi commented Oct 2, 2020

Proposed Commit Message

DataSourceAzure: send failure signal on Azure datasource failure

On systems where the Azure datasource
is a viable platform for crawling metadata,
cloud-init occasionally encounters fatal
irrecoverable errors during the crawling
of the Azure datasource.

When this happens, cloud-init crashes,
and Azure VM provisioning would fail.
However, instead of failing immediately,
the user will continue seeing provisioning
for a long time until it times out with
"OS Provisioning Timed Out" message.

In these situations, cloud-init should
report failure to the Azure datasource
endpoint indicating provisioning failure.
The user will immediately see provisioning
terminate, giving them a much better
failure experience instead of pointlessly
waiting for OS provisioning timeout.

Additional Context

Additional context in the proposed commit message.

Test Steps

Test: No regression for pre-existing report ready workflow

  • Create an Azure VM from the latest Ubuntu 18.04 LTS image.
  • Apply this PR's patch.
  • Capture the Azure VM into a custom image.
  • Successful deployment from this captured custom image.

Test: Report failure to Azure upon Azure crawl metadata fatal exception

  • Create an Azure VM from the latest Ubuntu 18.04 LTS image.
  • Apply this PR's patch.
  • Modify DataSourceAzure's crawl_metadata and add a raise Exception before the return statement. Tthis ensures a fatal crawl_metadata implementation, which mimics a fatal irrecoverable error during cloud-init's crawling of Azure metadata.
  • Capture the Azure VM into a custom image.
  • Deploy this custom image.
  • The custom image should terminate early within 2-3 minutes with the error code OSProvisioningInternalError. It should look like the following:
    image

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@johnsonshi johnsonshi changed the title [DRAFT] DataSourceAzure: send failure signal on Azure datasource failure [Azure] DataSourceAzure: send failure signal on Azure datasource failure Oct 19, 2020
@johnsonshi johnsonshi marked this pull request as ready for review October 19, 2020 07:56
Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! One minor comment on the code itself, and I've also reviewed the tests: no major comments, just some minor requests (and a few nits; feel free to ignore those).

cloudinit/sources/helpers/azure.py Outdated Show resolved Hide resolved
cloudinit/sources/helpers/azure.py Show resolved Hide resolved
tests/unittests/test_datasource/test_azure.py Outdated Show resolved Hide resolved
tests/unittests/test_datasource/test_azure.py Outdated Show resolved Hide resolved
tests/unittests/test_datasource/test_azure.py Show resolved Hide resolved
tests/unittests/test_datasource/test_azure_helper.py Outdated Show resolved Hide resolved
tests/unittests/test_datasource/test_azure_helper.py Outdated Show resolved Hide resolved
tests/unittests/test_datasource/test_azure_helper.py Outdated Show resolved Hide resolved
tests/unittests/test_datasource/test_azure_helper.py Outdated Show resolved Hide resolved
@johnsonshi
Copy link
Contributor Author

Thanks @OddBloke for the comments and NITs. I've addressed all of them in the series of commits above. To make it easier for you to review, for each of your comments above, I've linked the relevant commit that addresses the issue.

I've also built a deb with the latest changes and performed Azure deployment tests, which conform to the expected behavior (no regressions on successful deployments + report failure on intentionally buggy deployments).

Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your work here, Johnson, this looks good to me now. 👍

@OddBloke OddBloke merged commit d807df2 into canonical:master Nov 18, 2020
@johnsonshi johnsonshi deleted the azure-send-failure-signal branch November 19, 2020 03:49
This was referenced May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants