This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

test: add REBOOT_CONTROL_PLANE_NODES E2E config #3745

Merged
jackfrancis merged 8 commits into Azure:master from control-plane-reboot on Aug 28, 2020

jackfrancis
Member

Reason for Change:

This PR adds a REBOOT_CONTROL_PLANE_NODES option and a related daemonset that randomly reboots control plane nodes during the cluster lifecycle, to verify that the cluster keeps operating without side effects in such scenarios.

Issue Fixed:

Requirements:

Notes:

codecov bot commented Aug 25, 2020

Codecov Report

Merging #3745 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3745   +/-   ##
=======================================
  Coverage   73.20%   73.20%           
=======================================
  Files         148      148           
  Lines       25372    25372           
=======================================
  Hits        18573    18573           
  Misses       5663     5663           
  Partials     1136     1136           

Powered by Codecov. Last update cb5f0dd...d43bd9a. Read the comment docs.

Collaborator

@Michael-Sinz Michael-Sinz left a comment

Thanks for doing this. The minor change adding --request-timeout is something I had forgotten to include originally. It helps keep kubectl from hanging when something goes wrong, such as another master rebooting when it happens to be the one that kubectl is trying to talk to.

- bash
- -c
- >-
while [[ $(kubectl annotate namespace ${LOCK_NS} ${LOCK_NAME}=${NODE_ID} 2>&1) != *\(${NODE_ID}\)* ]];

Minor addition: all of the kubectl commands should also have --request-timeout 30s (or pick your number), since a reboot of another node may otherwise hang the kubectl command.

For example:
kubectl --request-timeout 30s annotate namespace ${LOCK_NS} ${LOCK_NAME}=${NODE_ID}
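
Applied to the lock loop quoted above, the suggestion might look roughly like the sketch below. This is only an illustration, not the code that was merged: LOCK_NS, LOCK_NAME and NODE_ID come from the snippet under review, while KUBECTL, REQUEST_TIMEOUT, RETRY_DELAY and the acquire_lock function are hypothetical names introduced here, and the 30s and 5 second defaults are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: retry the namespace-annotation lock, bounding every kubectl call.
# KUBECTL, REQUEST_TIMEOUT, RETRY_DELAY and acquire_lock are hypothetical
# names for this example; LOCK_NS, LOCK_NAME and NODE_ID must be set by the caller.
KUBECTL="${KUBECTL:-kubectl}"
REQUEST_TIMEOUT="${REQUEST_TIMEOUT:-30s}"
RETRY_DELAY="${RETRY_DELAY:-5}"

acquire_lock() {
  local out
  while true; do
    # --request-timeout makes the call return instead of hanging when the
    # API server it is talking to is on a node that is rebooting.
    # "|| true" keeps a failed attempt from aborting the script under set -e.
    out=$("${KUBECTL}" --request-timeout "${REQUEST_TIMEOUT}" \
      annotate namespace "${LOCK_NS}" "${LOCK_NAME}=${NODE_ID}" 2>&1) || true
    # Same exit test as the loop in the diff: stop once the output
    # contains "(NODE_ID)".
    if [[ "${out}" == *"(${NODE_ID})"* ]]; then
      return 0
    fi
    sleep "${RETRY_DELAY}"
  done
}
```

A caller would set LOCK_NS, LOCK_NAME and NODE_ID and then run acquire_lock before rebooting. Keeping the timeout in one variable means the same value can be reused for the other kubectl calls in the script.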

@Michael-Sinz
Collaborator

/lgtm

acs-bot commented Aug 25, 2020

@Michael-Sinz: changing LGTM is restricted to collaborators

In response to this:

/lgtm

acs-bot commented Aug 25, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, Michael-Sinz

@jackfrancis jackfrancis merged commit 8261389 into Azure:master Aug 28, 2020
@jackfrancis jackfrancis deleted the control-plane-reboot branch August 28, 2020 18:56
penggu pushed a commit to penggu/aks-engine that referenced this pull request Oct 28, 2020