Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aks: fix AKS cluster creation following new taint limitations #17529

Merged
merged 2 commits into from Oct 13, 2021

Conversation

nbusseneau
Copy link
Member

@nbusseneau nbusseneau commented Oct 4, 2021

Context: we recommend users taint all nodepools with node.cilium.io/agent-not-ready=true:NoSchedule to prevent application pods from being managed by the default AKS CNI plugin.

To this end, the proposed workflow users should follow when installing Cilium into AKS was to replace the initial AKS node pool with a new tainted system node pool, as it is not possible to taint the initial AKS node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools with node.cilium.io/agent-not-ready=true:NoSchedule to prevent application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow instead:

  • Replace the initial node pool with a system node pool tainted with CriticalAddonsOnly=true:NoSchedule, preventing application pods from being scheduled on it.
  • Create a secondary user node pool tainted with node.cilium.io/agent-not-ready=true:NoSchedule to prevent application pods from being scheduled on the user node pool until Cilium is ready to manage them.

@nbusseneau nbusseneau added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. labels Oct 4, 2021
@nbusseneau nbusseneau requested a review from aanm October 4, 2021 16:57
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.5 Oct 4, 2021
@nbusseneau nbusseneau force-pushed the pr/workflows-fix-aks-pools branch 3 times, most recently from e31024d to 4500382 Compare October 11, 2021 16:10
@nbusseneau
Copy link
Member Author

nbusseneau commented Oct 11, 2021

Link to test run of AKS workflow: https://github.com/cilium/cilium/actions/runs/1333557658

@nbusseneau nbusseneau marked this pull request as ready for review October 11, 2021 16:29
@nbusseneau nbusseneau requested review from a team as code owners October 11, 2021 16:29
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit in the docs but otherwise makes sense.

I debated a little bit about the use of .. code-block:: bash rather than .. code-block:: shell-session but in this case unless we dress up the # lines with some extra #s at the beginning, I think that shell-session would interpret the comments as commands to run as root. So bash seems fine to me (also consistent with other commands in the same guide).

Documentation/gettingstarted/k8s-install-default.rst Outdated Show resolved Hide resolved
.github/workflows/conformance-aks-v1.10.yaml Outdated Show resolved Hide resolved
.github/workflows/conformance-aks.yaml Outdated Show resolved Hide resolved
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looking simpler which is always a win. I have some minor wording nits / suggestions to try to integrate the explanations more directly into the text. Feel free to take/leave/modify whichever you think improve on the text.

Documentation/gettingstarted/k8s-install-default.rst Outdated Show resolved Hide resolved
Documentation/gettingstarted/k8s-install-default.rst Outdated Show resolved Hide resolved
Documentation/gettingstarted/k8s-install-default.rst Outdated Show resolved Hide resolved
Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
See 36be0aa for full context.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
@nbusseneau
Copy link
Member Author

nbusseneau commented Oct 13, 2021

All reviews are in, no need to run any additional CI as this only modifies documentation and the AKS workflow, removing temp test commit and marking as ready to merge.

The following AKS workflows can be re-enabled after merging:

I will also apply similar changes to cilium-cli tomorrow.

@nbusseneau nbusseneau added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 13, 2021
@aanm aanm merged commit c5f5de8 into master Oct 13, 2021
@aanm aanm deleted the pr/workflows-fix-aks-pools branch October 13, 2021 22:03
@joestringer joestringer added this to Needs backport from master in 1.10.6 Oct 13, 2021
@joestringer joestringer removed this from Needs backport from master in 1.10.5 Oct 13, 2021
nbusseneau added a commit to cilium/cilium-cli that referenced this pull request Oct 15, 2021
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
@joamaki
Copy link
Contributor

joamaki commented Oct 18, 2021

@nbusseneau Could you take a look at how this should be backported, if at all? These files do not exist in 1.10 due to 65eb99a.

@nbusseneau
Copy link
Member Author

@joamaki I added the backport label for the documentation changes. The workflows indeed are already "backported" in a sense, since they live in master even though they are used by the 1.10 PRs.

nbusseneau added a commit to cilium/cilium-cli that referenced this pull request Oct 18, 2021
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
nbusseneau added a commit to cilium/cilium-cli that referenced this pull request Oct 19, 2021
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
aanm pushed a commit to cilium/cilium-cli that referenced this pull request Oct 20, 2021
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
@joestringer joestringer moved this from Needs backport from master to Backport done to v1.10 in 1.10.6 Dec 3, 2021
aditighag pushed a commit to aditighag/cilium-cli that referenced this pull request Apr 21, 2023
Re-impacted from: cilium/cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
michi-covalent pushed a commit to michi-covalent/cilium that referenced this pull request May 30, 2023
Re-impacted from: cilium#17529

Context: we recommend users taint all nodepools with `node.cilium.io/agent-not-ready=true:NoSchedule`
to prevent application pods from being managed by the default AKS CNI
plugin.

To this end, the proposed workflow users should follow when installing
Cilium into AKS was to replace the initial AKS node pool with a new
tainted system node pool, as it is not possible to taint the initial AKS
node pool, cf. Azure/AKS#1402.

AKS recently pushed a change on the API side that forbids setting up
custom taints on system node pools, cf. Azure/AKS#2578.

It is not possible anymore for us to recommend users taint all nodepools
with `node.cilium.io/agent-not-ready=true:NoSchedule` to prevent
application pods from being managed by the default AKS CNI plugin.

To work around this new limitation, we propose the following workflow
instead:

- Replace the initial node pool with a system node pool tainted with
  `CriticalAddonsOnly=true:NoSchedule`, preventing application pods from
  being scheduled on it.
- Create a secondary user node pool tainted with `node.cilium.io/agent-not-ready=true:NoSchedule`
  to prevent application pods from being scheduled on the user node pool
  until Cilium is ready to manage them.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
No open projects
1.10.6
Backport done to v1.10
Development

Successfully merging this pull request may close these issues.

None yet

5 participants