Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AKS: avoid overlapping pod and service CIDRs #31504

Merged
merged 1 commit into from Mar 19, 2024

Conversation

bimmlerd
Copy link
Member

The default service CIDR of AKS clusters is 10.0.0.0/16 1. Unfortunately, we don't set a pod cidr for clusterpool IPAM, and hence use cilium's default of 10.0.0.0/8, which overlaps. This can lead to "fun" situations in which e.g. the kube-dns service ClusterIP is the same as the hubble-relay pod IP, or similar shenanigans. This usually breaks the cluster utterly.

The fix is relatively straight-forward: set a pod CIDR for cilium which does not overlap with defaults of AKS. We chose 192.168.0.0/16 as this is what is recommended in 2.

Fixes: fbf3d38 (ci: add AKS workflow)

Fixes: #30905

The default service CIDR of AKS clusters is 10.0.0.0/16 [1].
Unfortunately, we don't set a pod cidr for clusterpool IPAM, and hence
use cilium's default of 10.0.0.0/8, which overlaps. This can
lead to "fun" situations in which e.g. the kube-dns service ClusterIP is
the same as the hubble-relay pod IP, or similar shenanigans. This
usually breaks the cluster utterly.

The fix is relatively straight-forward: set a pod CIDR for cilium which
does not overlap with defaults of AKS. We chose 192.168.0.0/16 as this
is what is recommended in [2].

[1]: https://learn.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-with-system-assigned-managed-identities
[2]: https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#option-1-assign-ip-addresses-from-an-overlay-network

Fixes: fbf3d38 (ci: add AKS workflow)

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd bimmlerd added kind/bug/CI This is a bug in the testing code. area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. labels Mar 19, 2024
@bimmlerd
Copy link
Member Author

/ci-aks

@bimmlerd bimmlerd marked this pull request as ready for review March 19, 2024 15:38
@bimmlerd bimmlerd requested review from a team as code owners March 19, 2024 15:38
@bimmlerd
Copy link
Member Author

/test

Copy link
Contributor

@viktor-kurchenko viktor-kurchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 19, 2024
@tklauser tklauser added this pull request to the merge queue Mar 19, 2024
Merged via the queue into main with commit fbe78c4 Mar 19, 2024
73 checks passed
@tklauser tklauser deleted the pr/bimmlerd/fix-aks-pod-cidr branch March 19, 2024 16:25
@bimmlerd
Copy link
Member Author

@viktor-kurchenko I don't know enough about our CI to know whether this should be backported or not - are these workflows run from main or stable branches?

Also, do we have this bug on other cloud providers? Seems like an easy oversight to make.

Finally, this probably also wants to be documented somewhere?

@giorio94 giorio94 added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 19, 2024
@giorio94
Copy link
Member

I don't know enough about our CI to know whether this should be backported or not - are these workflows run from main or stable branches?

Marked for backport to all stable branches, as run from the respective workflow files.

@viktor-kurchenko
Copy link
Contributor

Finally, this probably also wants to be documented somewhere?

@bimmlerd sorry for the delay. To be honest I don't where we can document this (

tklauser added a commit to cilium/cilium-cli that referenced this pull request Mar 22, 2024
Follow cilium/cilium#31504, quoting its commit
description:

> The default service CIDR of AKS clusters is 10.0.0.0/16 [1].
> Unfortunately, we don't set a pod cidr for clusterpool IPAM, and hence
> use cilium's default of 10.0.0.0/8, which overlaps. This can
> lead to "fun" situations in which e.g. the kube-dns service ClusterIP is
> the same as the hubble-relay pod IP, or similar shenanigans. This
> usually breaks the cluster utterly.
>
> The fix is relatively straight-forward: set a pod CIDR for cilium which
> does not overlap with defaults of AKS. We chose 192.168.0.0/16 as this
> is what is recommended in [2].
>
> [1]: https://learn.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-with-system-assigned-managed-identities
> [2]: https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#option-1-assign-ip-addresses-from-an-overlay-network

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Co-authored-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
@sayboras sayboras mentioned this pull request Mar 24, 2024
6 tasks
@sayboras sayboras added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 24, 2024
@sayboras sayboras mentioned this pull request Mar 24, 2024
4 tasks
@sayboras sayboras added the backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. label Mar 24, 2024
@sayboras sayboras removed the needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch label Mar 24, 2024
@sayboras sayboras mentioned this pull request Mar 24, 2024
2 tasks
@sayboras sayboras added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Mar 24, 2024
@github-actions github-actions bot added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Mar 25, 2024
tklauser added a commit to cilium/cilium-cli that referenced this pull request Mar 26, 2024
Follow cilium/cilium#31504, quoting its commit
description:

> The default service CIDR of AKS clusters is 10.0.0.0/16 [1].
> Unfortunately, we don't set a pod cidr for clusterpool IPAM, and hence
> use cilium's default of 10.0.0.0/8, which overlaps. This can
> lead to "fun" situations in which e.g. the kube-dns service ClusterIP is
> the same as the hubble-relay pod IP, or similar shenanigans. This
> usually breaks the cluster utterly.
>
> The fix is relatively straight-forward: set a pod CIDR for cilium which
> does not overlap with defaults of AKS. We chose 192.168.0.0/16 as this
> is what is recommended in [2].
>
> [1]: https://learn.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-with-system-assigned-managed-identities
> [2]: https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#option-1-assign-ip-addresses-from-an-overlay-network

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Co-authored-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
tklauser added a commit to cilium/cilium-cli that referenced this pull request Mar 26, 2024
Follow cilium/cilium#31504, quoting its commit
description:

> The default service CIDR of AKS clusters is 10.0.0.0/16 [1].
> Unfortunately, we don't set a pod cidr for clusterpool IPAM, and hence
> use cilium's default of 10.0.0.0/8, which overlaps. This can
> lead to "fun" situations in which e.g. the kube-dns service ClusterIP is
> the same as the hubble-relay pod IP, or similar shenanigans. This
> usually breaks the cluster utterly.
>
> The fix is relatively straight-forward: set a pod CIDR for cilium which
> does not overlap with defaults of AKS. We chose 192.168.0.0/16 as this
> is what is recommended in [2].
>
> [1]: https://learn.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-with-system-assigned-managed-identities
> [2]: https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#option-1-assign-ip-addresses-from-an-overlay-network

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Co-authored-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. kind/bug/CI This is a bug in the testing code. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: Conformance AKS - Hubble Relay CrashLoopBackOff
5 participants