This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: mark walinux for hold in cloud-init #778

Merged on Mar 16, 2019 (5 commits)

Conversation

jackfrancis
Member

@jackfrancis jackfrancis commented Mar 15, 2019

Reason for Change:

Follow-up of #771

Testing against the changes in #771 revealed that cloud-init itself is invoking apt to install some packages:

2019-03-15 19:26:12,725 - helpers.py[DEBUG]: Running update-sources using lock (<FileLock using file '/var/lib/cloud/instances/3557FA54-5097-654B-9116-21C59991226E/sem/update_sources'>)
2019-03-15 19:26:12,726 - util.py[DEBUG]: Running command ['eatmydata', 'apt-get', '--option=Dpkg::Options::=--force-confold', '--option=Dpkg::options::=--force-unsafe-io', '--assume-yes', '--quiet', 'update'] with allowed return codes [0] (shell=False, capture=False)
2019-03-15 19:26:25,480 - util.py[DEBUG]: apt-update [eatmydata apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet update] took 12.754 seconds
2019-03-15 19:26:25,480 - helpers.py[DEBUG]: update-sources already ran (freq=once-per-instance)
2019-03-15 19:26:25,480 - util.py[DEBUG]: Running command ['eatmydata', 'apt-get', '--option=Dpkg::Options::=--force-confold', '--option=Dpkg::options::=--force-unsafe-io', '--assume-yes', '--quiet', 'install', 'jq', 'traceroute'] with allowed return codes [0] (shell=False, capture=False)
2019-03-15 19:26:34,734 - util.py[DEBUG]: apt-install [eatmydata apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet install jq traceroute] took 9.254 seconds
2019-03-15 19:26:34,735 - handlers.py[DEBUG]: finish: modules-final/config-package-update-upgrade-install: SUCCESS: config-package-update-upgrade-install ran successfully

In practice, this means that when we attempt to apt-mark hold walinuxagent in CSE, the command may fail because cloud-init is already holding the apt lock.
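One way to avoid this contention is to wait for the apt and dpkg locks to clear before calling apt-mark. The following is a minimal sketch, not the CSE implementation: it assumes the lock file paths of a typical Ubuntu image and that fuser (from psmisc) is installed. The names wait_for_locks, lock_is_held, and APT_LOCK_FILES are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed lock paths on a typical Ubuntu image; adjust for your distribution.
APT_LOCK_FILES=(/var/lib/dpkg/lock /var/lib/apt/lists/lock /var/cache/apt/archives/lock)

# lock_is_held FILE: succeeds when some process currently has FILE open.
# Relies on fuser; if fuser is missing, the file is reported as not held.
lock_is_held() {
  fuser "$1" >/dev/null 2>&1
}

# wait_for_locks TIMEOUT INTERVAL FILE...
# Polls every INTERVAL seconds until no FILE is held.
# Returns 0 when all locks are free, 1 once TIMEOUT seconds have been waited.
wait_for_locks() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0 busy f
  while true; do
    busy=false
    for f in "$@"; do
      if lock_is_held "$f"; then
        busy=true
        break
      fi
    done
    if [ "$busy" = false ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example (not run here, needs root):
#   wait_for_locks 300 3 "${APT_LOCK_FILES[@]}" && apt-mark hold walinuxagent
```

The timeout matters here: a bounded wait lets the caller fail with a clear exit code instead of blocking provisioning for as long as cloud-init keeps the lock.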

Additionally, we are explicitly pinning the version of walinuxagent to further protect against runtime changes to the agent. This last change is a defensive maneuver: based on production data following the introduction of 2.2.37, we want to introduce new versions deliberately.
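For illustration, version pinning of this kind is usually expressed as an apt preferences file. The sketch below writes such a file; the helper name write_apt_pin, the target directory, and the Pin-Priority value are assumptions made for the example and are not taken from this PR. Per apt_preferences(5), a priority of 1000 or higher allows apt to install the pinned version even when that means a downgrade.

```shell
#!/usr/bin/env bash
# write_apt_pin PACKAGE VERSION PRIORITY DIR
# Writes DIR/PACKAGE.pref in apt_preferences(5) format.
# Hypothetical helper; PRIORITY is chosen by the caller.
write_apt_pin() {
  local pkg=$1 version=$2 priority=$3 dir=$4
  mkdir -p "$dir"
  printf 'Package: %s\nPin: version %s\nPin-Priority: %s\n' \
    "$pkg" "$version" "$priority" > "$dir/${pkg}.pref"
}

# Example (not run here, needs root; the priority 1001 is an assumed value):
#   write_apt_pin walinuxagent 2.2.32.2 1001 /etc/apt/preferences.d
#   apt-cache policy walinuxagent   # shows which version apt now prefers
```

Unlike apt-mark hold, writing this file does not take the dpkg lock, so it can be laid down early in provisioning without competing with a running apt-get.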

Once cluster creation is stable again, we can consider reverting to the prior behavior if static version pinning causes problems for long-lived clusters whose operators want to update walinuxagent "out of band".

Issue Fixed:

Requirements:

Notes:

@acs-bot acs-bot added the size/M label Mar 15, 2019
@@ -57,10 +57,6 @@ else
FULL_INSTALL_REQUIRED=true
fi

if [[ $OS == $UBUNTU_OS_NAME ]]; then
Member Author

I think we need to get rid of the "apt-mark hold" invocation in CSE, as the hold command will wait for the release of apt locks, which may take a long time if cloud-init has already reserved them.

@jackfrancis
Member Author

/azp run pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov

codecov bot commented Mar 15, 2019

Codecov Report

Merging #778 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #778   +/-   ##
=======================================
  Coverage   67.58%   67.58%           
=======================================
  Files         115      115           
  Lines       16856    16856           
=======================================
  Hits        11392    11392           
  Misses       4677     4677           
  Partials      787      787

@@ -179,7 +175,7 @@ if $REBOOTREQUIRED; then
echo 'reboot required, rebooting node in 1 minute'
/bin/bash -c "shutdown -r 1 &"
if [[ $OS == $UBUNTU_OS_NAME ]]; then
-holdWALinuxAgent "unhold"
+aptmarkWALinuxAgent unhold
Contributor

Should this be a background process, so that control returns to the user without waiting for it to finish?
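A rough sketch of what backgrounding such a call could look like, offered only to make the question concrete: run_in_background and BACKGROUND_PID are hypothetical names, and the log path in the usage comment is an assumption.

```shell
#!/usr/bin/env bash
# run_in_background LOGFILE COMMAND [ARGS...]
# Starts COMMAND in the background, sends its output to LOGFILE,
# and stores the child's PID in BACKGROUND_PID (hypothetical variable name).
run_in_background() {
  local log=$1
  shift
  "$@" >"$log" 2>&1 &
  BACKGROUND_PID=$!
}

# Example (not run here, needs root):
#   run_in_background /var/log/walinuxagent-unhold.log apt-mark unhold walinuxagent
#   wait "$BACKGROUND_PID"   # only if the caller later needs the exit status
```

The trade-off is that the script no longer sees a failure unless it waits on the PID, and in this hunk a reboot is already scheduled one minute out, so a backgrounded unhold that is still blocked on apt locks could be interrupted.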

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

lgtm pending lots of passing e2e tests :)

owner: root
content: |
Package: walinuxagent
Pin: version 2.2.32.2
Contributor

Is that the version currently in the VHD, or the version we are currently being upgraded to?

Member Author

The version in the VHD

@jackfrancis
Member Author

/azp run pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mboersma
Member

/lgtm

@acs-bot

acs-bot commented Mar 15, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis jackfrancis merged commit 99f9405 into Azure:master Mar 16, 2019
@jackfrancis jackfrancis deleted the fix-apt-mark-hold-cloud-init branch March 16, 2019 00:22
sylr pushed a commit to sylr/aks-engine that referenced this pull request Mar 22, 2019
sylr pushed a commit to sylr/aks-engine that referenced this pull request Mar 22, 2019
sylr pushed a commit to sylr/aks-engine that referenced this pull request Jun 18, 2019
4 participants