Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: promote system addons to system-cluster-critical #2533

Merged
merged 1 commit into from Jan 8, 2020

Conversation

jackfrancis
Copy link
Member

@jackfrancis jackfrancis commented Jan 7, 2020

Reason for Change:

This PR updates blobfuse-flexvolume and keyvault-flexvolume daemonsets to the system-cluster-critical PriorityClass to ensure that they are properly prioritized during pod scheduling.

For the nmi daemonset in the aad-pod-identity addon, we migrated the namespace to kube-system, as system-cluster-critical isn't allowed in the default namespace.

Specifically, during scale up events in response to more pods than available scheduleable nodes (e.g., cluster-autoscaler), we want to ensure that system daemonsets are scheduled first of all on new nodes, so that nodes don't reach pod capacity w/ non-system-critical pods before these daemonsets have a chance to be scheduled.

Issue Fixed:

Azure/aad-pod-identity#471
Fixes #2532
Fixes #2531

Requirements:

Notes:

@codecov
Copy link

codecov bot commented Jan 8, 2020

Codecov Report

Merging #2533 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2533   +/-   ##
=======================================
  Coverage   72.11%   72.11%           
=======================================
  Files         130      130           
  Lines       24460    24460           
=======================================
  Hits        17639    17639           
  Misses       5799     5799           
  Partials     1022     1022

Copy link
Member

@mboersma mboersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@acs-bot acs-bot added the lgtm label Jan 8, 2020
@acs-bot acs-bot merged commit 02bc287 into Azure:master Jan 8, 2020
@acs-bot
Copy link

acs-bot commented Jan 8, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis
Copy link
Member Author

@palma21 @yangl900 FYI, we observed that ensuring all system addons/components have a DaemonSet PriorityClass of system-cluster-critical or system-node-critical is important during cluster-autoscaler scale up scenarios. E.g., if cluster-autoscaler adds more nodes to ensure that Pending pods have schedulable nodes, we want to ensure that system daemonsets get scheduling priority when those new nodes online.

@palma21
Copy link
Member

palma21 commented Jan 8, 2020

Thanks for the heads up we do have that on our backlog to add to a few that are missing.

FYI, a PDB to non DS system-cluster-critical or system-node-critical pods is required or things like core DNS can block an otherwise empty node to scale down.

acs-bot pushed a commit that referenced this pull request Jan 9, 2020
* Rashmi/ciprod12042019 (#3)

improvement: Update Containermonitoring addon for december release (#1)

* Fixing yaml indentation issues for omsagent (#4)

Fixing yaml indentation issues for omsagent

* Fixing more indentation issues in omsagent yaml (#5)

Fixing more indentation issues in omsagent yaml

* updating templates_generated file

* Fix: Updating generated file to fix unit tests (#6)

fix: Updating generated file to fix unit tests

* Adding test coverage for 1.16 and 1.17 (#7)

Adding test coverage for 1.16 and 1.17

* Merging changes for omsagent latest release - ciprod01072020 after syncing from remote master(#8)

* chore: use go template comments for generate proxy certs script (#2336)

* fix: fix ARM dependency issues with vm user-specified extensions on node pools (#2398)

* fix: fix ARM dependency issues if many extensions are specified for a node profile

* fix scale up case for windows vhd case. (#2483)

* refactor: make cilium addon user-configurable (#2480)

* refactor: make cilium addon user-configurable

* chore: clarify that cilium doesn't work w/ 1.16 and above, add validation

* test: addons UT

* test: go template UT

* ci: use Standard_D8_v3 for cilium test, only run NetworkPolicy tests

* fix: error message language

* chore: remove debug fmt.Println

* ci: revert back to Standard_D2_v3

* chore: upgrade cni-plugins to v0.7.6 (#2484)

* fix: hard-coding hyper-v generation when using VHD URls as a quick unblock (#2487)

* feat: Configuring docker log rotation for Windows nodes (#2478)

* feat: Antrea plugin support in AKS Engine (#2407)

* Antrea plugin support in AKS Engine

* chore: clean up

* chore: use ContainerImage

* chore: generated code

* refactor: Updating antrea yaml to 0.2.0

Co-authored-by: Jack Francis <jackfrancis@gmail.com>

* chore: lint (#2493)

* test: revert change to default kubernetes.json api model example (#2494)

* chore: update cloud-provider-azure components to v0.4.0 (#2473)

* chore: update cloud-provider-azure components to v0.4.0

See https://github.com/kubernetes-sigs/cloud-provider-azure/releases/tag/v0.4.0

* refactor: strip MCR constant to base hostname of URL

* fix: fetch Azure cloud-manager images from /oss/kubernetes/

* refactor: make audit-policy and azure-cloud-provider addons user-configurable (#2496)

* chore: pre-pull k8s v1.15.7-azs (#2490)

* fix: Fix some path handling in collect-windows-logs script (#2488)

* docs: remove mentions of old orchestrators (#2501)

* chore: Targeting dec patches for windows VHD (#2505)

* refactor: move StorageClass into azure-cloud-provider addon (#2497)

* add "Standard_DS3_v2" to "AcceleratedNetworking" supported list (#2509)

* ci: collect logs during E2E runs (#2520)

* refactor: user-configurable flannel and scheduled maintenance addons (#2517)

* chore: update Azure NPM to v1.0.31 (#2521)

* feat: add support for Kubernetes 1.18.0-alpha.1 (#2503)

* feat: add support for Kubernetes 1.18.0-alpha.1

See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.18.md#changelog-since-v1170

* test: add 1.18 to Jenkinsfile

* ci: disable kms for 1.18

* chore: move flannel 1.18 spec to containeraddons

* chore: generated code

* fix: use new cloudprovider implementation for 1.18

Co-authored-by: Jack Francis <jackfrancis@gmail.com>

* test: don't test non-working >= 1.16 flannel + docker (#2524)

* fix: apply new master node labels for k8s v1.18+ compatibility (#2467)

* fix: apply new master node labels for k8s v1.18+ compatibility

* test: check master labels in the future for back-compat

* feat: cleaning up old kubelet/kubeproxy logs for Windows nodes (#2504)

* feat: cleaning up old kubelet/kubeproxy logs for Windows nodes

* Fixing path to look for logs

* generated files

* refactor: standardize to "addons", deprecate "containeraddons" (#2525)

* fix: configure addons before setting kubelet config (#2513)

* chore: update addon-resizer (#2527)

See https://github.com/kubernetes/autoscaler/releases/tag/addon-resizer-1.8.7

* fix: aci-connector region is ignored (#2535)

* test: use LOCATION env var for api model in E2E tests (#2542)

* fix: promote system addons to system-cluster-critical (#2533)

* test: use northeurope for byok testing (#2536)

* Changes for omsagent-version-ciprod01072020

* Committing generated file

Co-authored-by: Jack Francis <jackfrancis@gmail.com>
Co-authored-by: Mark Rossetti <marosset@microsoft.com>
Co-authored-by: Rohit <rjaini@microsoft.com>
Co-authored-by: Rahul Jain <58573065+reachjainrahul@users.noreply.github.com>
Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com>
Co-authored-by: Javier Darsie <44655727+jadarsie@users.noreply.github.com>
Co-authored-by: Patrick Lang <PatrickLang@users.noreply.github.com>
Co-authored-by: Wenjun Wu <wenjun.wu@live.com>
Co-authored-by: Jaeryn <13284103+jaer-tsun@users.noreply.github.com>
Co-authored-by: Anish Ramasekar <anish.ramasekar@gmail.com>

* deleting github merge auto-generated files

* Adding back 1.17 omsagent yaml changes

* Updating generated file to address build failures

Co-authored-by: Jack Francis <jackfrancis@gmail.com>
Co-authored-by: Mark Rossetti <marosset@microsoft.com>
Co-authored-by: Rohit <rjaini@microsoft.com>
Co-authored-by: Rahul Jain <58573065+reachjainrahul@users.noreply.github.com>
Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com>
Co-authored-by: Javier Darsie <44655727+jadarsie@users.noreply.github.com>
Co-authored-by: Patrick Lang <PatrickLang@users.noreply.github.com>
Co-authored-by: Wenjun Wu <wenjun.wu@live.com>
Co-authored-by: Jaeryn <13284103+jaer-tsun@users.noreply.github.com>
Co-authored-by: Anish Ramasekar <anish.ramasekar@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set priority class for addons Set priority class for pod-identity
4 participants