Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi-pool: Determine IP pool based on ipam.cilium.io/ip-pool annotation #25511

Merged
merged 8 commits into from May 30, 2023

Conversation

gandro
Copy link
Member

@gandro gandro commented May 17, 2023

This adds a mechanism which allows workloads to specify what IP pool their IP should be allocated from in multi-pool IPAM mode (see #22762 for details).

It will do that by first checking if the pod has a ipam.cilium.io/ip-pool annotation, and if present, will return that pool to be used for IP allocation. If the pod does not have such an annotation, it will then check for the same annotation on the pod's namespace. If that is also not present, we fall back on the default pool.

💡 Review per commit

I have manually tested this on kind as follows:

$ helm upgrade --install cilium ./install/kubernetes/cilium --namespace kube-system \
    --set debug.enabled=true \
    --set image.override=localhost:5000/cilium/cilium-dev:local \
    --set image.pullPolicy=IfNotPresent \
    --set operator.image.override=localhost:5000/cilium/operator-generic:local \
    --set operator.image.pullPolicy=IfNotPresent \
    --set operator.replicas=1 \
    --set hubble.relay.enabled=true \
    --set ipam.mode=multi-pool \
    --set "ipam.operator.multiPoolMap.default.ipv4CIDRs[0]=10.10.0.0/16" \
    --set "ipam.operator.multiPoolMap.default.ipv4MaskSize=27" \
    --set "ipam.operator.multiPoolMap.mars.ipv4CIDRs[0]=10.20.0.0/16" \
    --set "ipam.operator.multiPoolMap.mars.ipv4MaskSize=24" \
    --set "extraConfig.multi-pool-node-pre-alloc=default=64\,mars=64" \
    --set tunnel=disabled \
    --set autoDirectNodeRoutes=true \
    --set ipv4NativeRoutingCIDR=10.0.0.0/8 \
    --set endpointRoutes.enabled=true \
    --set-string extraConfig.enable-local-node-route=false \
    --set kubeProxyReplacement=strict \
    --set bpf.masquerade=true
$ k create ns cilium-test
$ k annotate ns cilium-test ipam.cilium.io/ip-pool=mars
$ cilium connectivity test
...
✅ All 41 tests (291 actions) successful, 8 tests skipped, 0 scenarios skipped.
$ k -n default get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
deathstar-54bb8475cc-89hwl   1/1     Running   0          9m    10.10.1.77    kind-worker2   <none>           <none>
deathstar-54bb8475cc-8gfdd   1/1     Running   0          9m    10.10.0.83    kind-worker    <none>           <none>
tiefighter                   1/1     Running   0          9m    10.10.1.103   kind-worker    <none>           <none>
xwing                        1/1     Running   0          9m    10.10.1.101   kind-worker    <none>           <none>
$ k -n cilium-test get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE   READINESS GATES
client-7b78db77d5-dkzh6            1/1     Running   0          7m59s   10.20.0.94    kind-worker    <none>           <none>
client2-78f748dd67-j2qhg           1/1     Running   0          7m59s   10.20.0.103   kind-worker    <none>           <none>
echo-other-node-8689b5dd4f-6pk5l   2/2     Running   0          7m59s   10.20.2.182   kind-worker2   <none>           <none>
echo-same-node-5b55d8bdd7-jdb66    2/2     Running   0          7m59s   10.20.0.225   kind-worker    <none>           <none>
$ # note the different IP range for the cilium-test pods                ⬆️

CI and documentation will be added in a separate PR. See meta issue #25470

@gandro gandro added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/ipam Impacts IP address management functionality. labels May 17, 2023
pkg/annotation/k8s.go Outdated Show resolved Hide resolved
@gandro gandro force-pushed the pr/gandro/multi-pool-annotations branch 5 times, most recently from 5070857 to 3751245 Compare May 23, 2023 14:03
@gandro gandro mentioned this pull request May 23, 2023
29 tasks
@gandro gandro force-pushed the pr/gandro/multi-pool-annotations branch from 3751245 to 5416626 Compare May 23, 2023 14:10
@gandro gandro self-assigned this May 23, 2023
@gandro gandro force-pushed the pr/gandro/multi-pool-annotations branch 2 times, most recently from e4566b8 to dfa9c7e Compare May 23, 2023 14:53
@gandro gandro marked this pull request as ready for review May 23, 2023 14:54
@gandro gandro requested review from a team as code owners May 23, 2023 14:54
@gandro
Copy link
Member Author

gandro commented May 23, 2023

/test

@gandro gandro added the release-blocker/1.14 This issue will prevent the release of the next version of Cilium. label May 23, 2023
@gandro gandro requested a review from tklauser May 23, 2023 16:03
@christarazi christarazi added sig/ipam IP address management, including cloud IPAM kind/feature This introduces new functionality. labels May 23, 2023
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-done commits, very clean history! Makes it a breeze to review. Nothing major from my side.

pkg/ipam/metadata/manager.go Outdated Show resolved Hide resolved
Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for API

@gandro
Copy link
Member Author

gandro commented May 25, 2023

/test

@yuriydzobak
Copy link
Contributor

Hi @yuriydzobak! Those are great questions, thanks!

  1. Would it be possible to allocate CIDR dynamically when the target Pod is scheduled on a node?

Yes, I'm currently working on that (will be added in a follow-up PR). Thanks for pointing out the /30 use-case, I didn't have that in mind for the dynamic per-allocation mechanism. I do agree that there are probably use-cases were we do not want to over-allocate.

  1. Will BGP Control plane announce the pod CIDR for the multi pool IPAM in the future?

As you correctly observed, it's not supported at the moment. Unfortunately, the v1.14 feature freeze is coming up quick and I'm not sure if I'll have the cycles to add support for it till then (having said that, PRs welcome!). That doesn't mean that we won't be able to add it in v1.15 though. multi-pool will be a "beta" feature for v1.14 anyway, since it did not yet have production-grade testing.

@ysksuzuki ☝🏻
tagged wrong person =)

@gandro
Copy link
Member Author

gandro commented May 25, 2023

Thanks, I should not rely on GitHub's auto-completion too much 🙈

@gandro gandro force-pushed the pr/gandro/multi-pool-annotations branch from e4885bf to 0d4ba6a Compare May 25, 2023 12:16
@gandro
Copy link
Member Author

gandro commented May 25, 2023

Last push fixes the unit test needed due to the new error introduced here: #25511 (comment)

Restarting tests.

@gandro
Copy link
Member Author

gandro commented May 25, 2023

/test

@@ -554,6 +561,9 @@ func (d *Daemon) startIPAM() {
log.Info("Initializing node addressing")
// Set up ipam conf after init() because we might be running d.conf.KVStoreIPv4Registration
d.ipam = ipam.NewIPAM(d.datapath.LocalNodeAddressing(), option.Config, d.nodeDiscovery, d.k8sWatcher, &d.mtuConfig, d.clientset)
if d.ipamMetadata != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read the code correctly, d.ipamMetadata is already created via hive? Why add a fallback here, and not where the ipamMetaddata is initialized?

Copy link
Member Author

@gandro gandro May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hive ipamMetaddata constructor will return nil if it's running in an unsupported IPAM mode:
https://github.com/cilium/cilium/blob/0d4ba6a6d4edb21c92f6e469566d4d408c2ac31b/pkg/ipam/metadata/cell.go#L34-L36

And I do not want to pass in a potentially nil value as an interface type into WithMetadata because it will break the nil checks here (i.e. they will not detect that metadata is nil due to nil interface types):
https://github.com/cilium/cilium/blob/0d4ba6a6d4edb21c92f6e469566d4d408c2ac31b/pkg/ipam/allocator.go#L34-L36

Example: https://go.dev/play/p/z8C_cU0eMdU

@gandro gandro force-pushed the pr/gandro/multi-pool-annotations branch from 0d4ba6a to 368278d Compare May 25, 2023 16:49
@gandro
Copy link
Member Author

gandro commented May 25, 2023

Last push fixes a ineffassign lint error in the unit tests. Rerunning tests (everything was green otherwise)

@gandro
Copy link
Member Author

gandro commented May 25, 2023

/test

@ysksuzuki
Copy link
Member

@gandro Thank you for the reply!

Yes, I'm currently working on that (will be added in a follow-up PR). Thanks for pointing out the /30 use-case, I didn't have that in mind for the dynamic per-allocation mechanism. I do agree that there are probably use-cases were we do not want to over-allocate.

Thanks! I'm glad to hear that.

As you correctly observed, it's not supported at the moment. Unfortunately, the v1.14 feature freeze is coming up quick and I'm not sure if I'll have the cycles to add support for it till then (having said that, PRs welcome!). That doesn't mean that we won't be able to add it in v1.15 though. multi-pool will be a "beta" feature for v1.14 anyway, since it did not yet have production-grade testing.

We're not in such a hurry, so supporting it in v1.15 or later is OK. (I would of course be happy to contribute)

A bit out of the topic, but it would be useful to export the Pod CIDR to the kernel routing table or a dummy interface so that other BGP software, such as BIRD or FRR, can import and announce the PodCIDR.
Actually, I discussed it offline with @YutaroHayakawa yesterday😁

bird.conf

ipv4 table ciliumtab;
protocol kernel 'cilium' {
    kernel table 119;  # cilium exports
    learn;
    scan time 1;
    ipv4 {
        table ciliumtab;
        import all;
        export none;
    };
}
protocol pipe {
    table master4;
    peer table ciliumtab;
    import all;
    export none;
}

@gandro gandro requested a review from lmb May 26, 2023 15:34
@nathanjsweet nathanjsweet added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label May 26, 2023
This adds a new shared resource for local pods, i.e. pods scheduled on
the local node. This is useful for cells which need to access the pod
resource for a Cilium-managed endpoint.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This adds the IPAM pool name to the AddressPair type which is both the
result of an IPAM allocation and an argument of the EndpointCreate
request. This allows the API client to know the pool effectively used to
allocate an IP.  The caller needs to keep track of this, as it will have
to pass the pool back to the IPAM subsystem when the IP is released.

In Cilium, the IP is typically not released not by the CNI plugin, but
directly by the endpoint package when the endpoint is deleted. This is
why the field is added to the AddressPair type (which shared between the
IPAM and endpoint types), rather than just to the IPAMAddressResponse.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This adds a field to the endpoint structure to track where the endpoint's IP
has been allocated from. This is needed both for IP release, as well as
endpoint restoration.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This adds an annotation which may be added to pods or namespaces to determine
the IP pool should be used for those workloads.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This adds a new cell which is responsible for determining the IP pool of a
particular pod. It will do that by first checking if the pod has the `ip-pool`
annotation, and if present, will return that pool to be used for IP allocation.
If the pod does not have such an annotation, it will then check for the same
annotation on the namespace. If that is also not present, we fall back on the
`default` pool.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This removes a functionality which was added for a bug fix in
03df4a5, but then made obsolete in
82a8c71 when the bug fix turned out to
be insufficient.

There are no other users of this funcionality. This commit also moves
the parsing of the IP address back into the API handler. This ensures
that the API actually returns the correct (i.e. documented) HTTP error
code when the IP is invalid.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit contains no functional changes, but moves the variable
definition of `err` from a local variable to a named return value. This
makes a difference when reading `err` in the `defer` block, because if
we ever return with an error without also setting `err` to a non-nil
value, the `defer` statement which releases the IPv6 address would not
release it.

At the moment, the implementation never returns an error without also
setting `err` explicitly, but moving the variable definition into the
function signature ensures that future code changes may not introduce
subtle bugs.

See the following example for details: https://go.dev/play/p/3l7FDg5k6yP

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit introduces the IPAM metadata manager to the IPAM subsystem.
When allocating a new random IP via `AllocateNext` with
an empty pool name, the IPAM subsystem will now call into the IPAM
metadata manager to obtain the pool name for the IP owner. If a pool is
provided by the caller, then no lookup is performed and the caller's
pool takes precedence.

For `ReleaseIP` and `AllocateIP`, we now also expect the IP pool to be
explicitly provided. `AllocateIP` is only used for IP restoration, in
which case we expect the caller to know what pool to restore the IP from
(because restoring the IP from a different pool than it was originally
allocated from might lead to bugs). For `ReleaseIP`, we don't actually
know the owner name, so we also expect it to be provided, because again,
releasing an IP into a different pool than then one it was allocated
from might lead to bugs.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/multi-pool-annotations branch from 368278d to 580382f Compare May 30, 2023 09:54
@gandro
Copy link
Member Author

gandro commented May 30, 2023

Rebased to resolve trivial conflict in cells.go. @lmb Could you re-review please?

@gandro gandro removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label May 30, 2023
Copy link
Contributor

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! I was out for a long weekend, sorry.

@gandro
Copy link
Member Author

gandro commented May 30, 2023

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 30, 2023
@gandro
Copy link
Member Author

gandro commented May 30, 2023

This is ready to merge.

@gandro gandro merged commit 2588462 into cilium:main May 30, 2023
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ipam Impacts IP address management functionality. kind/feature This introduces new functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/ipam IP address management, including cloud IPAM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants