Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: do not restart addon-manager if a reboot is required #3721

Merged
merged 3 commits into from Aug 19, 2020

Conversation

CecileRobertMichon
Copy link
Contributor

Reason for Change:

Related to #3668 (comment)
Seeing some clusters fail with exit 36 recently where CSE failed to get the addon-manager pod as part of ensureAddons().

+ for i in $(seq 1 $retries)
+ timeout 30 /usr/local/bin/kubectl --kubeconfig=/home/azureuser/.kube/config get pods -l app=kube-addon-manager -n kube-system
+ '[' 120 -eq 120 ']'
+ echo Executed '"/usr/local/bin/kubectl' --kubeconfig=/home/azureuser/.kube/config get pods -l app=kube-addon-manager -n 'kube-system"' 120 times
Executed "/usr/local/bin/kubectl --kubeconfig=/home/azureuser/.kube/config get pods -l app=kube-addon-manager -n kube-system" 120 times

This PR might not fix the underlying root cause but we should not be getting/restarting pods when the vm is marked for reboot so this PR addresses that.

/cc @mboersma @Michael-Sinz

Issue Fixed:

Requirements:

Notes:

@acs-bot acs-bot requested a review from mboersma August 17, 2020 19:10
@acs-bot
Copy link

acs-bot commented Aug 17, 2020

@CecileRobertMichon: GitHub didn't allow me to request PR reviews from the following users: Michael-Sinz.

Note that only Azure members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Reason for Change:

Related to #3668 (comment)
Seeing some clusters fail with exit 36 recently where CSE failed to get the addon-manager pod as part of ensureAddons().

+ for i in $(seq 1 $retries)
+ timeout 30 /usr/local/bin/kubectl --kubeconfig=/home/azureuser/.kube/config get pods -l app=kube-addon-manager -n kube-system
+ '[' 120 -eq 120 ']'
+ echo Executed '"/usr/local/bin/kubectl' --kubeconfig=/home/azureuser/.kube/config get pods -l app=kube-addon-manager -n 'kube-system"' 120 times
Executed "/usr/local/bin/kubectl --kubeconfig=/home/azureuser/.kube/config get pods -l app=kube-addon-manager -n kube-system" 120 times

This PR might not fix the underlying root cause but we should not be getting/restarting pods when the vm is marked for reboot so this PR addresses that.

/cc @mboersma @Michael-Sinz

Issue Fixed:

Requirements:

  • includes documentation
  • adds unit tests
  • tested upgrade from previous version

Notes:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@acs-bot acs-bot added the size/M label Aug 17, 2020
@codecov
Copy link

codecov bot commented Aug 17, 2020

Codecov Report

Merging #3721 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3721   +/-   ##
=======================================
  Coverage   73.16%   73.16%           
=======================================
  Files         147      147           
  Lines       25318    25318           
=======================================
  Hits        18525    18525           
  Misses       5655     5655           
  Partials     1138     1138           
Impacted Files Coverage Δ
pkg/engine/templates_generated.go 53.42% <ø> (ø)
pkg/api/common/versions.go 96.84% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 88add17...be8f221. Read the comment docs.

mboersma
mboersma previously approved these changes Aug 18, 2020
Copy link
Member

@mboersma mboersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@acs-bot
Copy link

acs-bot commented Aug 19, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, jackfrancis, mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [CecileRobertMichon,jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis jackfrancis merged commit e2b7293 into Azure:master Aug 19, 2020
penggu pushed a commit to penggu/aks-engine that referenced this pull request Oct 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants