Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: configure addons before setting kubelet config #2513

Merged
merged 4 commits into from
Jan 8, 2020

Conversation

aramase
Copy link
Member

@aramase aramase commented Jan 2, 2020

Reason for Change:

As part of PR #2393, the following change was made to move addon configuration after setting kubelet config - 691e3d3#diff-fd43f080184ed15731e5d7e0d917d6e1L486-L490. This change breaks setting kubelet config as kubelet configuration checks for addon configs while setting defaults. For instance --non-masquerade-cidr in kubelet is set to cluster subnet if ip-masq-agent is not enabled. But since the addon was configured after, kubelet assumes masq-agent is not enabled and sets the --non-masquerade-cidr to cluster subnet. This results in ip-masq-agent and kubelet both configuring iptables which could possibly break things.

Issue Fixed:

Requirements:

Notes:

cc @lachie83

@acs-bot acs-bot added the size/XS label Jan 2, 2020
@aramase
Copy link
Member Author

aramase commented Jan 2, 2020

/cc @mboersma @jackfrancis

cs.setAddonsConfig(isUpgrade)

// Configure kubelet
cs.setKubeletConfig(isUpgrade)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on https://github.com/Azure/aks-engine/pull/2393/files#r355038175, was there a reason for changing kubelet config defaults to before addons defaults? ie. is there a dependency introduced in #2393 between addons and kubelet config defaults?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@aramase aramase Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--cluster-domain is always set to cluster.local. So not sure if that addon dependency on kubelet config is required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ip-masq-agent addon is enabled by default. It's only explicitly disabled by setting kubernetes config/ hosted master. So instead of checking if the addon is enabled, I've changed the check to see if it's explicitly disabled and then perform the logic.

@CecileRobertMichon lmk what you think?

@CecileRobertMichon
Copy link
Contributor

Let's add a unit test that fails before this change and passes after the change to prove this fixes the issue and protect from future breaking changes.

@acs-bot acs-bot added size/M and removed size/XS labels Jan 2, 2020
@codecov
Copy link

codecov bot commented Jan 2, 2020

Codecov Report

Merging #2513 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2513      +/-   ##
==========================================
+ Coverage   72.73%   72.74%   +<.01%     
==========================================
  Files         130      130              
  Lines       24043    24051       +8     
==========================================
+ Hits        17488    17496       +8     
  Misses       5532     5532              
  Partials     1023     1023

@CecileRobertMichon
Copy link
Contributor

/assign @jackfrancis

@acs-bot acs-bot added size/L and removed size/M labels Jan 6, 2020
@@ -499,6 +519,15 @@ func TestKubeletIPMasqAgentEnabledOrDisabled(t *testing.T) {
t.Fatalf("got unexpected '--non-masquerade-cidr' kubelet config value %s, the expected value is %s",
k["--non-masquerade-cidr"], DefaultNonMasqueradeCIDR)
}

// No ip-masq-agent addon configuration specified, --non-masquerade-cidr should be 0.0.0.0/0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aramase @CecileRobertMichon Actually, this additional UT is not what we want. If I add this coverage to the commit prior to #2393 ($ git checkout 6e76fd9e69a7ee8aa61d2d3c559919a460bad5ed) this test fails:

got unexpected '--non-masquerade-cidr' kubelet config value 10.240.0.0/12, the expected value is 0.0.0.0/0

We want to identify exactly what cluster configuration + defaults enforcement was working prior to #2393, and prove that this fix restores that functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackfrancis That is because in the unit tests ip-masq-agent was never enabled. So these tests will fail in all versions prior to this. But going forward it will help us spot such issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is truly a fix, then something must have broken in #2393 that can be expressed as a UT, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was never unit tested before. There are individual unit tests for setKubeletConfig and setAddonConfig, but none which check the desired result after running both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change the ip-masq-agent enabled/disabled isn't really dependent on setAddonConfig anymore as the check to see if ip-masq is enabled/disabled is done outside of that while setting non-masq-cidr. So I feel this unit test is appropriate. This unit test addition is essentially increasing the coverage for the path we didn't check before.

Copy link
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@acs-bot acs-bot added the lgtm label Jan 8, 2020
@jackfrancis jackfrancis merged commit 76f1a69 into Azure:master Jan 8, 2020
@aramase aramase deleted the fix-addon-config-bug branch January 8, 2020 00:10
@acs-bot
Copy link

acs-bot commented Jan 8, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aramase, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

acs-bot pushed a commit that referenced this pull request Jan 9, 2020
* Rashmi/ciprod12042019 (#3)

improvement: Update Containermonitoring addon for december release (#1)

* Fixing yaml indentation issues for omsagent (#4)

Fixing yaml indentation issues for omsagent

* Fixing more indentation issues in omsagent yaml (#5)

Fixing more indentation issues in omsagent yaml

* updating templates_generated file

* Fix: Updating generated file to fix unit tests (#6)

fix: Updating generated file to fix unit tests

* Adding test coverage for 1.16 and 1.17 (#7)

Adding test coverage for 1.16 and 1.17

* Merging changes for omsagent latest release - ciprod01072020 after syncing from remote master(#8)

* chore: use go template comments for generate proxy certs script (#2336)

* fix: fix ARM dependency issues with vm user-specified extensions on node pools (#2398)

* fix: fix ARM dependency issues if many extensions are specified for a node profile

* fix scale up case for windows vhd case. (#2483)

* refactor: make cilium addon user-configurable (#2480)

* refactor: make cilium addon user-configurable

* chore: clarify that cilium doesn't work w/ 1.16 and above, add validation

* test: addons UT

* test: go template UT

* ci: use Standard_D8_v3 for cilium test, only run NetworkPolicy tests

* fix: error message language

* chore: remove debug fmt.Println

* ci: revert back to Standard_D2_v3

* chore: upgrade cni-plugins to v0.7.6 (#2484)

* fix: hard-coding hyper-v generation when using VHD URls as a quick unblock (#2487)

* feat: Configuring docker log rotation for Windows nodes (#2478)

* feat: Antrea plugin support in AKS Engine (#2407)

* Antrea plugin support in AKS Engine

* chore: clean up

* chore: use ContainerImage

* chore: generated code

* refactor: Updating antrea yaml to 0.2.0

Co-authored-by: Jack Francis <jackfrancis@gmail.com>

* chore: lint (#2493)

* test: revert change to default kubernetes.json api model example (#2494)

* chore: update cloud-provider-azure components to v0.4.0 (#2473)

* chore: update cloud-provider-azure components to v0.4.0

See https://github.com/kubernetes-sigs/cloud-provider-azure/releases/tag/v0.4.0

* refactor: strip MCR constant to base hostname of URL

* fix: fetch Azure cloud-manager images from /oss/kubernetes/

* refactor: make audit-policy and azure-cloud-provider addons user-configurable (#2496)

* chore: pre-pull k8s v1.15.7-azs (#2490)

* fix: Fix some path handling in collect-windows-logs script (#2488)

* docs: remove mentions of old orchestrators (#2501)

* chore: Targeting dec patches for windows VHD (#2505)

* refactor: move StorageClass into azure-cloud-provider addon (#2497)

* add "Standard_DS3_v2" to "AcceleratedNetworking" supported list (#2509)

* ci: collect logs during E2E runs (#2520)

* refactor: user-configurable flannel and scheduled maintenance addons (#2517)

* chore: update Azure NPM to v1.0.31 (#2521)

* feat: add support for Kubernetes 1.18.0-alpha.1 (#2503)

* feat: add support for Kubernetes 1.18.0-alpha.1

See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.18.md#changelog-since-v1170

* test: add 1.18 to Jenkinsfile

* ci: disable kms for 1.18

* chore: move flannel 1.18 spec to containeraddons

* chore: generated code

* fix: use new cloudprovider implementation for 1.18

Co-authored-by: Jack Francis <jackfrancis@gmail.com>

* test: don't test non-working >= 1.16 flannel + docker (#2524)

* fix: apply new master node labels for k8s v1.18+ compatibility (#2467)

* fix: apply new master node labels for k8s v1.18+ compatibility

* test: check master labels in the future for back-compat

* feat: cleaning up old kubelet/kubeproxy logs for Windows nodes (#2504)

* feat: cleaning up old kubelet/kubeproxy logs for Windows nodes

* Fixing path to look for logs

* generated files

* refactor: standardize to "addons", deprecate "containeraddons" (#2525)

* fix: configure addons before setting kubelet config (#2513)

* chore: update addon-resizer (#2527)

See https://github.com/kubernetes/autoscaler/releases/tag/addon-resizer-1.8.7

* fix: aci-connector region is ignored (#2535)

* test: use LOCATION env var for api model in E2E tests (#2542)

* fix: promote system addons to system-cluster-critical (#2533)

* test: use northeurope for byok testing (#2536)

* Changes for omsagent-version-ciprod01072020

* Committing generated file

Co-authored-by: Jack Francis <jackfrancis@gmail.com>
Co-authored-by: Mark Rossetti <marosset@microsoft.com>
Co-authored-by: Rohit <rjaini@microsoft.com>
Co-authored-by: Rahul Jain <58573065+reachjainrahul@users.noreply.github.com>
Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com>
Co-authored-by: Javier Darsie <44655727+jadarsie@users.noreply.github.com>
Co-authored-by: Patrick Lang <PatrickLang@users.noreply.github.com>
Co-authored-by: Wenjun Wu <wenjun.wu@live.com>
Co-authored-by: Jaeryn <13284103+jaer-tsun@users.noreply.github.com>
Co-authored-by: Anish Ramasekar <anish.ramasekar@gmail.com>

* deleting github merge auto-generated files

* Adding back 1.17 omsagent yaml changes

* Updating generated file to address build failures

Co-authored-by: Jack Francis <jackfrancis@gmail.com>
Co-authored-by: Mark Rossetti <marosset@microsoft.com>
Co-authored-by: Rohit <rjaini@microsoft.com>
Co-authored-by: Rahul Jain <58573065+reachjainrahul@users.noreply.github.com>
Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com>
Co-authored-by: Javier Darsie <44655727+jadarsie@users.noreply.github.com>
Co-authored-by: Patrick Lang <PatrickLang@users.noreply.github.com>
Co-authored-by: Wenjun Wu <wenjun.wu@live.com>
Co-authored-by: Jaeryn <13284103+jaer-tsun@users.noreply.github.com>
Co-authored-by: Anish Ramasekar <anish.ramasekar@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants