multi-pool: Support allocating from new IPAM pools on demand #25765
Conversation
/test
Code looks good to me, one tiny nit :)
I'm not the best person to ask about the functionality though, so it would be good if a better expert has time to also review 😄
option.BindEnv(Vp, option.IPAMMultiPoolNodePreAlloc)
flags.Var(option.NewNamedMapOptions(option.IPAMMultiPoolPreAllocation, &option.Config.IPAMMultiPoolPreAllocation, nil),
	option.IPAMMultiPoolPreAllocation,
	fmt.Sprintf("Defines how the minimum number of IPs a node should pre-allocate from each pool (default %s)", defaults.IPAMMultiPoolPreAllocation))
Given this is a string, might `%q` not be clearer to the user? `(default default=8)` vs `(default "default=8")`
Thanks! Yes, I noticed this too, and I am torn on this. It's not really a string, it's a map. But `option.NewNamedMapOptions` does not allow me to define a default map, which is why I'm setting a default "string" value (which is later parsed as a map).
I've gone back and forth between `%s` and `%q`. I ended up with `%s`, because other non-string types (e.g. integers, durations, and booleans) don't use quotes either:
--allocator-list-timeout duration Timeout for listing allocator state before exiting (default 3m0s)
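For illustration, a minimal standalone Go sketch of the difference the two verbs make in the rendered usage text (hypothetical snippet, not the actual flag registration):

package main

import "fmt"

func main() {
	def := "default=8" // default value rendered into the usage text

	// %s prints the value bare, matching how non-string defaults
	// (durations, integers, booleans) are rendered.
	fmt.Printf("pre-allocate from each pool (default %s)\n", def)
	// -> pre-allocate from each pool (default default=8)

	// %q wraps the value in Go quotes, signalling it is a literal string.
	fmt.Printf("pre-allocate from each pool (default %q)\n", def)
	// -> pre-allocate from each pool (default "default=8")
}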
Another note here is that you don't need to explicitly mention the default here because `flags` does it for you, AFAIR.
Unfortunately, for some reason it doesn't here. See the generated docs, there is no duplicate `default`.
One thought occurs to me: is the owner a pod or the CNI container ID? If it's a container ID, then that is unique for every CNI ADD (even retries), so we will have a churn issue. If it's a pod, then we're OK.
Good point. It's the pod: cilium/plugins/cilium-cni/main.go, line 121 (commit 2588462).
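For illustration only, a minimal sketch (hypothetical owner strings, not Cilium's actual code) of why the choice of owner key matters for churn, assuming pending allocations are deduplicated by owner:

package main

import "fmt"

func main() {
	pending := map[string]struct{}{}

	// Owner keyed by pod: retries of the same CNI ADD upsert the same key,
	// so the pending-allocation count (and thus the requested IP demand)
	// stays stable.
	for i := 0; i < 3; i++ {
		pending["default/nginx-7d9c"] = struct{}{}
	}
	fmt.Println(len(pending)) // 1

	// If the owner were the CNI container ID, every retry would introduce a
	// new key and inflate the demand reported to the operator.
	for i := 0; i < 3; i++ {
		pending[fmt.Sprintf("container-%d", i)] = struct{}{}
	}
	fmt.Println(len(pending)) // 4
}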
This commit contains no functional changes. It changes the function signature of a few internal functions to use the `ipam.Pool` type instead of a string. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Before this commit, the agent would sort the list of requested and allocated pools in reverse alphabetical order. This was not intended, as the operator sorts the list of allocated pools in alphabetical order. This commit fixes that by actually using the "less" operator for the "less" comparison function in `sort.Slice`. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
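A minimal sketch (illustrative pool names, not the actual Cilium code) of the bug class this fixes: passing a "greater than" comparison to `sort.Slice` yields reverse alphabetical order, while the intended "less than" comparison yields the ascending order the operator uses:

package main

import (
	"fmt"
	"sort"
)

func main() {
	pools := []string{"mars", "default", "jupiter"}

	// Buggy: using ">" in the "less" function sorts in reverse order.
	sort.Slice(pools, func(i, j int) bool { return pools[i] > pools[j] })
	fmt.Println(pools) // [mars jupiter default]

	// Fixed: the "less" function actually uses "<", matching the
	// alphabetical order the operator uses for allocated pools.
	sort.Slice(pools, func(i, j int) bool { return pools[i] < pools[j] })
	fmt.Println(pools) // [default jupiter mars]
}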
This commit fixes the sort order of IPv4 vs IPv6 CIDRs in the multipool IPAM unit tests. This change is purely cosmetic and has no impact on test correctness, but the new order reflects the sort order also used by the operator when it populates that field. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit renames and exposes the flag to configure how many IPs a node should pre-allocate from each pool. The flag was previously hidden due to uncertainty about how IPAM pools should be configured. But since then, it has become clear that it does make sense for certain workloads to pre-allocate, while for others, dynamic allocation makes more sense. Therefore, this commit exposes a way to configure Cilium to always pre-allocate IPs for pools listed in this option. Pools not listed in this option will only request as many IPs as are actually used, which will be added in a subsequent commit. The default value of `default=8` means that (unless specified otherwise), we will always require at least 8 spare IPs in the default pool. This matches the default behavior of cloud-based IPAM modes, cf. `defaults.IPAMPreAllocation`, which is also set to 8. This commit also slightly cleans up the usage of the flag in `multipool.go` for upcoming commits. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit adds a new `pendingAllocationsPerPool` object, which allows the caller to track which IP owners have pending (i.e. requested but not yet fulfilled) allocations. This will be used in the next commit to compute the number of IPs the agent should request from the operator. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
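A rough, simplified sketch of what such a structure could look like, using hypothetical type and method names modelled on this description (the actual implementation lives in `pkg/ipam/multipool.go` and differs in its details):

package main

import "fmt"

// Pool and Family stand in for the ipam.Pool and ipam.Family types.
type Pool string
type Family string

// pendingAllocationsPerPool tracks owners with requested but not yet
// fulfilled allocations, per pool.
type pendingAllocationsPerPool struct {
	pending map[Pool]map[string]Family
}

func newPendingAllocationsPerPool() *pendingAllocationsPerPool {
	return &pendingAllocationsPerPool{pending: map[Pool]map[string]Family{}}
}

// upsertPendingAllocation records that owner is still waiting for an IP from
// pool. Upserting the same owner twice does not increase demand.
func (p *pendingAllocationsPerPool) upsertPendingAllocation(pool Pool, owner string, family Family) {
	if p.pending[pool] == nil {
		p.pending[pool] = map[string]Family{}
	}
	p.pending[pool][owner] = family
}

// markAsAllocated drops the pending entry once the allocation succeeded.
func (p *pendingAllocationsPerPool) markAsAllocated(pool Pool, owner string, family Family) {
	delete(p.pending[pool], owner)
}

// pendingForPool reports how many allocations are still outstanding for a
// pool; this count feeds into the IP demand requested from the operator.
func (p *pendingAllocationsPerPool) pendingForPool(pool Pool) int {
	return len(p.pending[pool])
}

func main() {
	p := newPendingAllocationsPerPool()
	p.upsertPendingAllocation("mars", "default/nginx", "ipv4")
	fmt.Println(p.pendingForPool("mars")) // 1
	p.markAsAllocated("mars", "default/nginx", "ipv4")
	fmt.Println(p.pendingForPool("mars")) // 0
}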
This commit rewrites the way we compute the number of requested IPs for each pool. Previously, we required the user to specify in advance what pool the agent would allocate from. This, however, is very limiting, since it's often not known in advance what workload a particular node will run. Therefore, this commit adds a mechanism where, upon observing an allocation attempt for a new pool, we update the local CiliumNode CRD to request additional IPs for that pool. This then allows the allocation request to be successful upon retry. This retry mechanism is similar to how Cilium IPAM also works e.g. in ENI mode (where if the local node is out of IPs, we fail the CNI ADD request and expect kubelet to retry later). This PR also implicitly adds support for allocating from pools without a "pre-allocate" value. This allows users to configure very small pools for specific workloads and assign those few IPs only if they are actually used. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
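As a worked illustration of the sizing formula quoted in the diff further down (a standalone sketch; `neededIPs` here is a simplified stand-in, assuming "round up" means rounding to the next multiple of the pre-alloc value):

package main

import "fmt"

// neededIPs rounds (inUse + pending + preAlloc) up to the next multiple of
// preAlloc, so the node always keeps a buffer of spare IPs.
func neededIPs(inUse, pending, preAlloc int) int {
	if preAlloc == 0 {
		// Pools configured without a pre-alloc value only request what is
		// actually in use or currently pending.
		return inUse + pending
	}
	need := inUse + pending + preAlloc
	return ((need + preAlloc - 1) / preAlloc) * preAlloc
}

func main() {
	// Example: 5 IPs in use, 1 pending CNI ADD, pre-alloc of 8 for "default":
	// 5 + 1 + 8 = 14, rounded up to the next multiple of 8 -> 16.
	fmt.Println(neededIPs(5, 1, 8)) // 16

	// A small pool without pre-allocation only asks for what it uses.
	fmt.Println(neededIPs(2, 0, 0)) // 2
}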
Force-pushed from 9533c25 to 6bff6cc
Addressed the feedback and rebased. No functional changes:
diff --git a/pkg/ipam/multipool.go b/pkg/ipam/multipool.go
index 18439b361e86..9b443dd9a2be 100644
--- a/pkg/ipam/multipool.go
+++ b/pkg/ipam/multipool.go
@@ -352,11 +352,11 @@ func neededIPCeil(numIP int, preAlloc int) int {
// computeNeededIPsPerPoolLocked computes how many IPs we want to request from
// the operator for each pool. The formula we use for each pool is basically
//
-// neededIPs = roundUp(inUseIPs + pendingIPs + preAllocIPs, preAlloc)
+// neededIPs = roundUp(inUseIPs + pendingIPs + preAllocIPs, preAllocIPs)
//
-// inUseIPs Number of IPs that are currently actively in use
-// pendingIPs Number of IPs that have been requested, but not yet assigned
-// preAllocIPs Minimum number of IPs that we want to pre-allocate as a buffer
+// inUseIPs Number of IPs that are currently actively in use
+// pendingIPs Number of IPs that have been requested, but not yet assigned
+// preAllocIPs Minimum number of IPs that we want to pre-allocate as a buffer
//
// Rounded up to the next multiple of preAllocIPs.
func (m *multiPoolManager) computeNeededIPsPerPoolLocked() map[Pool]types.IPAMPoolDemand {
@@ -596,17 +596,11 @@ func (m *multiPoolManager) poolByFamilyLocked(poolName Pool, family Family) *pod
return nil
}
-func (m *multiPoolManager) allocateNext(owner string, poolName Pool, family Family, syncUpstream bool) (result *AllocationResult, err error) {
+func (m *multiPoolManager) allocateNext(owner string, poolName Pool, family Family, syncUpstream bool) (*AllocationResult, error) {
m.mutex.Lock()
defer m.mutex.Unlock()
defer func() {
- if err != nil {
- m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
- } else {
- m.pendingIPsPerPool.markAsAllocated(poolName, owner, family)
- }
-
if syncUpstream {
m.k8sUpdater.TriggerWithReason("allocation of next IP")
}
@@ -614,27 +608,25 @@ func (m *multiPoolManager) allocateNext(owner string, poolName Pool, family Fami
pool := m.poolByFamilyLocked(poolName, family)
if pool == nil {
+ m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
return nil, fmt.Errorf("unable to allocate from pool %q (family %s): pool not (yet) available", poolName, family)
}
ip, err := pool.allocateNext()
if err != nil {
+ m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
return nil, err
}
+
+ m.pendingIPsPerPool.markAsAllocated(poolName, owner, family)
return &AllocationResult{IP: ip, IPPoolName: poolName}, nil
}
-func (m *multiPoolManager) allocateIP(ip net.IP, owner string, poolName Pool, family Family, syncUpstream bool) (result *AllocationResult, err error) {
+func (m *multiPoolManager) allocateIP(ip net.IP, owner string, poolName Pool, family Family, syncUpstream bool) (*AllocationResult, error) {
m.mutex.Lock()
defer m.mutex.Unlock()
defer func() {
- if err != nil {
- m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
- } else {
- m.pendingIPsPerPool.markAsAllocated(poolName, owner, family)
- }
-
if syncUpstream {
m.k8sUpdater.TriggerWithReason("allocation of specific IP")
}
@@ -642,13 +634,17 @@ func (m *multiPoolManager) allocateIP(ip net.IP, owner string, poolName Pool, fa
pool := m.poolByFamilyLocked(poolName, family)
if pool == nil {
+ m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
return nil, fmt.Errorf("unable to reserve IP %s from pool %q (family %s): pool not (yet) available", ip, poolName, family)
}
- err = pool.allocate(ip)
+ err := pool.allocate(ip)
if err != nil {
+ m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
return nil, err
}
+
+ m.pendingIPsPerPool.markAsAllocated(poolName, owner, family)
return &AllocationResult{IP: ip, IPPoolName: poolName}, nil
}
/test
This PR adds support for dynamically allocating IPs from a new pool. Previously, we required the user to specify in advance what pool the agent would allocate from. This, however, is very limiting, since it's often not known in advance what workload a particular node will run.
Therefore, this PR adds a mechanism where, upon observing an allocation attempt for a new pool, we update the local CiliumNode CRD to request additional IPs for that pool. This then allows the allocation request to be successful upon retry. This retry mechanism is similar to how Cilium IPAM also works e.g. in ENI mode (where if the local node is out of IPs, we fail the CNI ADD request and expect kubelet to retry later).
This PR also implicitly adds support for allocating from pools without a "pre-allocate" value. This allows users to configure very small pools for specific workloads and assign those few IPs only if they are actually used.
ℹ️ Review per commit. The first few commits are mainly cleanup; the main logic is contained in the last two commits.
For further context, see:
ipam.cilium.io/ip-pool annotation #25511
CI and documentation will be added in a separate PR. See meta issue #25470