This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

feat: run accelerated unattended-upgrade at node creation time #4217

Merged
merged 4 commits into from Feb 3, 2021

Conversation

jackfrancis
Member

Reason for Change:

This PR adds a runUnattendedUpgradesOnBootstrap option to the linuxProfile api model configuration, to allow folks to explicitly accelerate the acceptance of new downstream packages on node VMs when bringing them online.

In practice this will slow down node creation time, and will require extra post-installation validation as any installed packages that were not already present on the AKS Engine-curated VHD will not have been tested (this assumes you're using one of those VHDs).

Fixes #4156

Issue Fixed:

Credit Where Due:

Does this change contain code from or inspired by another project?

  • No
  • Yes

If "Yes," did you notify that project's maintainers and provide attribution?

  • No
  • Yes

Requirements:

Notes:

@@ -276,6 +276,10 @@ if [[ $OS == $UBUNTU_OS_NAME ]]; then
fi
{{end}}

{{- if RunUnattendedUpgrades}}
apt_get_update && apt_get_dist_upgrade && unattended_upgrade
Member Author

In practice (I think) the unattended_upgrade invocation here is superfluous: update and dist-upgrade will effectively do the deed. I'm including it here to be extra explicit.

Perhaps @Michael-Sinz can confirm whether this is sane.

Mainly, I trust our apt_get_update and apt_get_dist_upgrade functions to definitively accomplish those tasks more than I trust silently calling /usr/bin/unattended-upgrade. The latter (by design) silently fails single invocations if, for example, various apt locks are held (there are probably other reasons). It knows it'll be invoked again, so it's not in a rush.
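
The lock contention mentioned above is often handled by waiting for the apt/dpkg locks to clear before invoking the upgrade. A minimal sketch, assuming `fuser` is installed; `retry_until` and `apt_locks_free` are hypothetical names for illustration and are not the repository's own helpers, which may work differently:

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: retry_until <attempts> <sleep_seconds> <command> [args...]
retry_until() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$_i" -lt "$_attempts" ]; then
      sleep "$_delay"
    fi
    _i=$((_i + 1))
  done
  return 1
}

# Assumed lock check: succeeds only when no process holds the dpkg lock files.
# fuser exits non-zero when no process is using the named files.
apt_locks_free() {
  ! fuser /var/lib/dpkg/lock /var/lib/dpkg/lock-frontend >/dev/null 2>&1
}

# Example (not executed here): wait up to roughly 5 minutes, then upgrade.
# retry_until 60 5 apt_locks_free && unattended-upgrade
```

Failing loudly when the locks never clear, rather than skipping the upgrade, keeps a missed update visible in the provisioning logs.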

Collaborator

The big difference between unattended-upgrades and apt-get dist-upgrade is the list of things each will install.

Unattended-upgrades is constrained to the list of updates that are deemed safe and vital for security/reliability. These are not minor feature updates, unless a feature update was required for security. (This is the default and recommended configuration for unattended-upgrade.)

For example, on a test VM, I just logged in and noticed this right now:

58 packages can be updated.
4 updates are security updates.

After running unattended-upgrades on that machine (which cron normally does for me on a regular basis), the login looks like this:

54 packages can be updated.
0 updates are security updates.    

This is very different from a full apt-get update/apt-get upgrade (which itself is less than apt-get dist-upgrade)

The actual ubuntu unattended-upgrade command will return an error if it fails to complete an update. But it is constrained to the security updates.

Another good thing about unattended-upgrades is that it sets the unattended settings for apt/apt-get/dpkg such that it should not hang (albeit, packages can still cause this problem, but that is rare in the security patches).

Which to use is really a question of balancing all of the risks.

We run unattended-upgrade on a regular basis because we can trust it at scale.
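
The before/after login banners above can be compared in a script by pulling out the security-update count. A minimal sketch, assuming the banner text has the same shape as the sample lines quoted in this comment; `security_update_count` is a hypothetical name, not part of this PR:

```shell
# Reads login-banner text on stdin and prints the number of pending
# security updates, or "unknown" if no matching line is found.
security_update_count() {
  awk '
    /security update/ { print $1; found = 1; exit }
    END { if (!found) print "unknown" }
  '
}

# Example using the banner lines quoted above:
printf '58 packages can be updated.\n4 updates are security updates.\n' |
  security_update_count
```

To preview what unattended-upgrade itself would install without changing the system, `unattended-upgrade --dry-run` can be used instead of parsing banner text.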

Collaborator

PS - It is redundant to run unattended-upgrade after having done the full upgrade or dist-upgrade.

It may be useful to do unattended-upgrade first, just to be sure the security updates complete before getting into the larger set (both from a security standpoint and for the ability to complete them).

So I would not run unattended afterwards.
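
The ordering suggested here (security updates first, the larger set afterwards, and no trailing unattended-upgrade) could be sketched as follows. This is an illustration only: `upgrade_security_first` and the `RUNNER` variable are hypothetical, and the sketch calls the stock commands directly rather than the repository's apt_get_update and apt_get_dist_upgrade wrappers:

```shell
# RUNNER is a hypothetical hook: set RUNNER=echo to print the commands
# instead of running them; leave it unset to execute them for real.
upgrade_security_first() {
  ${RUNNER:-} apt-get update &&
    ${RUNNER:-} unattended-upgrade &&
    ${RUNNER:-} apt-get -y dist-upgrade
}

# Dry run: prints the three commands in the order they would execute.
RUNNER=echo upgrade_security_first
```

Chaining with `&&` stops at the first failure, so a failed security pass is not masked by the larger upgrade that follows it.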

Member Author

This all makes sense. What's perplexing is that, in practice, simply adding a "wait for apt locks and then run unattended-upgrade" step during CSE does not, in my tests, produce the expected outcome: a /var/run/reboot-required file (a symptom of critical security updates arriving).

I'm going to try apt-get update && unattended-upgrade next.
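
The check described above can be scripted by testing for the marker file after the upgrade runs. A minimal sketch; `reboot_required` is a hypothetical helper name, and the optional path argument exists only so the function can be exercised without touching /var/run:

```shell
# Succeeds if the reboot marker file exists. The default path is the one
# named in this thread; pass another path to test the function safely.
reboot_required() {
  [ -f "${1:-/var/run/reboot-required}" ]
}

# Example flow (not executed here):
# apt-get update && unattended-upgrade
# if reboot_required; then
#   echo "security updates installed; reboot pending"
# fi
```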

@codecov

codecov bot commented Feb 2, 2021

Codecov Report

Merging #4217 (94b7c6a) into master (805416e) will increase coverage by 0.00%.
The diff coverage is 83.33%.

@@           Coverage Diff           @@
##           master    #4217   +/-   ##
=======================================
  Coverage   73.36%   73.36%           
=======================================
  Files         135      135           
  Lines       20849    20855    +6     
=======================================
+ Hits        15296    15301    +5     
- Misses       4576     4577    +1     
  Partials      977      977           
| Impacted Files | Coverage Δ |
|---|---|
| pkg/api/types.go | 92.72% <ø> (ø) |
| pkg/api/vlabs/types.go | 73.04% <ø> (ø) |
| pkg/engine/templates_generated.go | 44.19% <ø> (ø) |
| pkg/engine/template_generator.go | 68.34% <75.00%> (+0.04%) ⬆️ |
| pkg/api/converterfromapi.go | 95.68% <100.00%> (+<0.01%) ⬆️ |
| pkg/api/convertertoapi.go | 94.04% <100.00%> (+0.01%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -276,6 +276,10 @@ if [[ $OS == $UBUNTU_OS_NAME ]]; then
fi
{{end}}

{{- if RunUnattendedUpgrades}}
apt_get_update && unattended_upgrade
Member Author

My tests so far show that the above works: when security updates are available, running apt-get update and then unattended-upgrade, serially, picks them up. So we can trust that the "runUnattendedUpgradesOnBootstrap" feature does the right thing and actually applies the OS updates (i.e., reboots) during cluster creation.

Collaborator

In the past I saw this not always work, but that could have been a timing issue related to when other things are set up relative to cloud-init. This is likely a better place to do it.

Is there a reason that this would not be the default behavior?

Member Author

The primary reason is the judgment that having a node reboot before first coming online introduces (1) undesirable delay and (2) a demonstrable loss in node bootstrap reliability.

I don't think we can avoid (1). Most of the time it is going to take longer for nodes to come online if they start with a stale OS security package configuration and have to bring themselves up to date, possibly with a reboot. That is always going to drag up the average node bootstrap time.

I wonder about (2), though. Can we summarize the additional risk of scooping up untested packages, plus any additional risk that a VM OS won't successfully come back online?

Collaborator

The risk is relatively low but is not zero. We have not had an outage due to the security updates as they are vetted relatively well. The question is how bad is it to run a node without the security updates?

I am not saying someone could not opt out, but it is a question of which way we should be "safe by default" and what "safe" means.

Collaborator

I claim we should start here and perhaps change the default after some more testing.

/lgtm

@acs-bot

acs-bot commented Feb 3, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, Michael-Sinz

@jackfrancis jackfrancis merged commit 8fe60fb into Azure:master Feb 3, 2021
@jackfrancis jackfrancis deleted the cse-unattended-upgrade branch February 3, 2021 23:29
Development

Successfully merging this pull request may close these issues.

Support configuration to inject unattended-upgrades into CSE.
3 participants