
multi-pool: Support allocating from new IPAM pools on demand #25765

Merged

gandro merged 6 commits into cilium:main from pr/gandro/multipool-dynamic-prealloc on Jun 1, 2023

Conversation

@gandro (Member) commented on May 30, 2023

This PR adds support for dynamically allocating IPs from a new pool. Previously, we required the user to specify in advance which pool the agent would allocate from. This, however, is very limiting, since it is often not known in advance what workload a particular node will run.

Therefore, this PR adds a mechanism where, upon observing an allocation attempt for a new pool, we update the local CiliumNode CRD to request additional IPs for that pool. This then allows the allocation request to succeed upon retry. This retry mechanism is similar to how Cilium IPAM already works e.g. in ENI mode (where, if the local node is out of IPs, we fail the CNI ADD request and expect kubelet to retry later).

This PR also implicitly adds support for allocating from pools without a "pre-allocate" value. This allows users to configure very small pools for specific workloads and to assign those few IPs only if they are actually used.
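For intuition, here is a self-contained toy model of that retry contract (my illustration, not the Cilium code; all names in it are made up, and the actual implementation is in the diff posted later in this conversation):

```go
package main

import (
	"errors"
	"fmt"
)

// Toy model of the on-demand allocation flow: pools maps pool name to the
// number of free IPs currently assigned to this node, and pending records
// the demand the agent would publish in its CiliumNode CRD.
type node struct {
	pools   map[string]int
	pending map[string]int
}

// allocate models CNI ADD: on a miss it records demand and fails, expecting
// kubelet to retry once the operator has assigned IPs from the new pool.
func (n *node) allocate(pool string) error {
	if n.pools[pool] == 0 {
		n.pending[pool]++ // ask the operator for IPs from this pool
		return errors.New("pool not (yet) available, retry later")
	}
	n.pools[pool]--
	return nil
}

func main() {
	n := &node{pools: map[string]int{}, pending: map[string]int{}}
	if err := n.allocate("mars"); err != nil {
		fmt.Println("CNI ADD failed:", err) // first attempt fails...
	}
	n.pools["mars"] = 8                    // ...operator fulfills the request...
	fmt.Println(n.allocate("mars") == nil) // ...and the retry succeeds: true
}
```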

ℹ️ Review per commit. The first few commits are mainly cleanup; the main logic is contained in the last two commits.

For further context, see meta issue #25470. CI and documentation will be added in a separate PR.

@gandro added labels on May 30, 2023: release-note/misc (This PR makes changes that have no direct user impact.), sig/ipam (IP address management, including cloud IPAM), release-blocker/1.14 (This issue will prevent the release of the next version of Cilium.), area/ipam (Impacts IP address management functionality.)
@gandro requested review from a team as code owners on May 30, 2023
@gandro mentioned this pull request on May 30, 2023
@gandro (Member, Author) commented on May 30, 2023

/test

@meyskens (Member) left a comment:

Code looks good to me, one tiny nit :)
I'm not the best person to ask about the functionality, so it would be good if a domain expert also has time to review 😄

```go
option.BindEnv(Vp, option.IPAMMultiPoolNodePreAlloc)
flags.Var(option.NewNamedMapOptions(option.IPAMMultiPoolPreAllocation, &option.Config.IPAMMultiPoolPreAllocation, nil),
	option.IPAMMultiPoolPreAllocation,
	fmt.Sprintf("Defines the minimum number of IPs a node should pre-allocate from each pool (default %s)", defaults.IPAMMultiPoolPreAllocation))
```
@meyskens (Member):

Given this is a string, might %q be clearer to the user? (default default=8) vs (default "default=8")

@gandro (Member, Author) replied on May 31, 2023:

Thanks! Yes, I noticed this too, and I am torn on this. It's not really a string, it's a map. But option.NewNamedMapOptions does not allow me to define a default map, which is why I'm setting a default "string" value (which is later parsed as a map).

I went back and forth between %s and %q. I ended up with %s, because other non-string types (e.g. integers, durations, and booleans) don't use quotes either:

--allocator-list-timeout duration   Timeout for listing allocator state before exiting (default 3m0s)
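For illustration, a tiny standalone example of the two verbs under discussion (mine, not from the PR):

```go
package main

import "fmt"

func main() {
	def := "default=8"
	// %s prints the string bare, matching how non-string defaults are shown.
	fmt.Printf("(default %s)\n", def) // (default default=8)
	// %q adds Go-syntax double quotes around the value.
	fmt.Printf("(default %q)\n", def) // (default "default=8")
}
```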

@meyskens (Member):

Another note here: you don't need to explicitly mention the default, because flags does it for you, AFAIR.

@gandro (Member, Author):

Unfortunately, for some reason it doesn't here. See the generated docs: there is no duplicate default.

@squeed (Contributor) commented on May 31, 2023

One thought occurs to me: is the owner a pod or the CNI container ID?

If it's a container ID, then that is unique for every CNI ADD (even retries), so we will have a churn issue. If it's a pod, then we're OK.

@gandro (Member, Author) commented on May 31, 2023

> One thought occurs to me: is the owner a pod or the CNI container ID?

Good point. It's the pod:

```go
podName := string(cniArgs.K8S_POD_NAMESPACE) + "/" + string(cniArgs.K8S_POD_NAME)
```

An additional review thread on pkg/ipam/multipool.go was marked resolved.
@maintainer-s-little-helper added the ready-to-merge label (This PR has passed all tests and received consensus from code owners to merge.) on May 31, 2023
@gandro removed the ready-to-merge label on Jun 1, 2023
Commits

This commit contains no functional changes. It changes the function signature of a few internal functions to use the `ipam.Pool` type instead of a string.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Before this commit, the agent would sort the list of requested and allocated pools in reverse alphabetical order. This was not intended, as the operator sorts the list of allocated pools in alphabetical order. This commit fixes that by actually using the less-than comparison for the "less" function in `sort.Slice`.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit fixes the sort order of IPv4 vs IPv6 CIDRs in the multi-pool IPAM unit tests. This change is purely cosmetic and has no impact on test correctness, but the new order reflects the sort order also used by the operator when it populates that field.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit renames and exposes the flag that configures how many IPs a node should pre-allocate from each pool. The flag was previously hidden due to uncertainty about how IPAM pools should be configured. Since then, it has become clear that pre-allocation makes sense for certain workloads, while dynamic allocation makes more sense for others.

Therefore, this commit exposes a way to configure Cilium to always pre-allocate IPs for pools listed in this option. Pools not listed in this option will only request as many IPs as are actually used; that behavior is added in a subsequent commit.

The default value of `default=8` means that, unless specified otherwise, we will always require at least 8 spare IPs in the default pool. This matches the default behavior of cloud-based IPAM modes, c.f. `defaults.IPAMPreAllocation`, which is also set to 8.

This commit also slightly cleans up the usage of the flag in `multipool.go` for upcoming commits.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit adds a new `pendingAllocationsPerPool` object, which allows
the caller to track which IP owners have pending (i.e. requested but not
yet fulfilled) allocations.

This will be used in the next commit to compute the number of IPs the
agent should request from the operator.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
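A minimal sketch of what such a tracker could look like (hypothetical: only the upsertPendingAllocation/markAsAllocated method names are taken from the diff later in this conversation; the types and everything else here are assumptions):

```go
package main

import (
	"fmt"
	"time"
)

type Pool string   // assumed: pool name, e.g. "default" or "mars"
type Family string // assumed: "ipv4" or "ipv6"

type pendingKey struct {
	owner  string
	family Family
}

// pendingAllocationsPerPool tracks owners with requested-but-unfulfilled
// allocations per pool; its per-pool size feeds the "pendingIPs" term of
// the demand formula shown in the diff further down.
type pendingAllocationsPerPool struct {
	pending map[Pool]map[pendingKey]time.Time
}

func newPendingAllocationsPerPool() *pendingAllocationsPerPool {
	return &pendingAllocationsPerPool{pending: map[Pool]map[pendingKey]time.Time{}}
}

// upsertPendingAllocation records (or refreshes) a pending request. Because
// the owner is the pod (namespace/name), CNI ADD retries for the same pod
// hit the same key instead of inflating demand.
func (p *pendingAllocationsPerPool) upsertPendingAllocation(pool Pool, owner string, family Family) {
	if p.pending[pool] == nil {
		p.pending[pool] = map[pendingKey]time.Time{}
	}
	p.pending[pool][pendingKey{owner, family}] = time.Now()
}

// markAsAllocated drops the pending entry once the allocation succeeds.
func (p *pendingAllocationsPerPool) markAsAllocated(pool Pool, owner string, family Family) {
	delete(p.pending[pool], pendingKey{owner, family})
}

// pendingForPool counts pending requests of one family in a pool.
func (p *pendingAllocationsPerPool) pendingForPool(pool Pool, family Family) int {
	n := 0
	for k := range p.pending[pool] {
		if k.family == family {
			n++
		}
	}
	return n
}

func main() {
	p := newPendingAllocationsPerPool()
	p.upsertPendingAllocation("mars", "default/nginx-1", "ipv4")
	p.upsertPendingAllocation("mars", "default/nginx-1", "ipv4") // retry: no extra demand
	fmt.Println(p.pendingForPool("mars", "ipv4"))                // 1
	p.markAsAllocated("mars", "default/nginx-1", "ipv4")
	fmt.Println(p.pendingForPool("mars", "ipv4")) // 0
}
```

Keying on the pod rather than the CNI container ID is what avoids the churn concern raised in the review discussion above.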
This commit rewrites the way we compute the number of requested IPs for each pool. Previously, we required the user to specify in advance which pool the agent would allocate from. This, however, is very limiting, since it is often not known in advance what workload a particular node will run.

Therefore, this commit adds a mechanism where, upon observing an allocation attempt for a new pool, we update the local CiliumNode CRD to request additional IPs for that pool. This then allows the allocation request to succeed upon retry. This retry mechanism is similar to how Cilium IPAM already works e.g. in ENI mode (where, if the local node is out of IPs, we fail the CNI ADD request and expect kubelet to retry later).

This commit also implicitly adds support for allocating from pools without a "pre-allocate" value. This allows users to configure very small pools for specific workloads and to assign those few IPs only if they are actually used.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@maintainer-s-little-helper added the ready-to-merge label (This PR has passed all tests and received consensus from code owners to merge.) on Jun 1, 2023
@gandro force-pushed the pr/gandro/multipool-dynamic-prealloc branch from 9533c25 to 6bff6cc on Jun 1, 2023 at 08:26
@gandro (Member, Author) commented on Jun 1, 2023

Addressed the feedback and rebased. No functional changes:

```diff
diff --git a/pkg/ipam/multipool.go b/pkg/ipam/multipool.go
index 18439b361e86..9b443dd9a2be 100644
--- a/pkg/ipam/multipool.go
+++ b/pkg/ipam/multipool.go
@@ -352,11 +352,11 @@ func neededIPCeil(numIP int, preAlloc int) int {
 // computeNeededIPsPerPoolLocked computes how many IPs we want to request from
 // the operator for each pool. The formula we use for each pool is basically
 //
-//       neededIPs = roundUp(inUseIPs + pendingIPs + preAllocIPs, preAlloc)
+//     neededIPs = roundUp(inUseIPs + pendingIPs + preAllocIPs, preAllocIPs)
 //
-//               inUseIPs              Number of IPs that are currently actively in use
-//               pendingIPs    Number of IPs that have been requested, but not yet assigned
-//               preAllocIPs   Minimum number of IPs that we want to pre-allocate as a buffer
+//           inUseIPs      Number of IPs that are currently actively in use
+//           pendingIPs    Number of IPs that have been requested, but not yet assigned
+//           preAllocIPs   Minimum number of IPs that we want to pre-allocate as a buffer
 //
 // Rounded up to the next multiple of preAllocIPs.
 func (m *multiPoolManager) computeNeededIPsPerPoolLocked() map[Pool]types.IPAMPoolDemand {
@@ -596,17 +596,11 @@ func (m *multiPoolManager) poolByFamilyLocked(poolName Pool, family Family) *pod
        return nil
 }
 
-func (m *multiPoolManager) allocateNext(owner string, poolName Pool, family Family, syncUpstream bool) (result *AllocationResult, err error) {
+func (m *multiPoolManager) allocateNext(owner string, poolName Pool, family Family, syncUpstream bool) (*AllocationResult, error) {
        m.mutex.Lock()
        defer m.mutex.Unlock()
 
        defer func() {
-               if err != nil {
-                       m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
-               } else {
-                       m.pendingIPsPerPool.markAsAllocated(poolName, owner, family)
-               }
-
                if syncUpstream {
                        m.k8sUpdater.TriggerWithReason("allocation of next IP")
                }
@@ -614,27 +608,25 @@ func (m *multiPoolManager) allocateNext(owner string, poolName Pool, family Fami
 
        pool := m.poolByFamilyLocked(poolName, family)
        if pool == nil {
+               m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
                return nil, fmt.Errorf("unable to allocate from pool %q (family %s): pool not (yet) available", poolName, family)
        }
 
        ip, err := pool.allocateNext()
        if err != nil {
+               m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
                return nil, err
        }
+
+       m.pendingIPsPerPool.markAsAllocated(poolName, owner, family)
        return &AllocationResult{IP: ip, IPPoolName: poolName}, nil
 }
 
-func (m *multiPoolManager) allocateIP(ip net.IP, owner string, poolName Pool, family Family, syncUpstream bool) (result *AllocationResult, err error) {
+func (m *multiPoolManager) allocateIP(ip net.IP, owner string, poolName Pool, family Family, syncUpstream bool) (*AllocationResult, error) {
        m.mutex.Lock()
        defer m.mutex.Unlock()
 
        defer func() {
-               if err != nil {
-                       m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
-               } else {
-                       m.pendingIPsPerPool.markAsAllocated(poolName, owner, family)
-               }
-
                if syncUpstream {
                        m.k8sUpdater.TriggerWithReason("allocation of specific IP")
                }
@@ -642,13 +634,17 @@ func (m *multiPoolManager) allocateIP(ip net.IP, owner string, poolName Pool, fa
 
        pool := m.poolByFamilyLocked(poolName, family)
        if pool == nil {
+               m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
                return nil, fmt.Errorf("unable to reserve IP %s from pool %q (family %s): pool not (yet) available", ip, poolName, family)
        }
 
-       err = pool.allocate(ip)
+       err := pool.allocate(ip)
        if err != nil {
+               m.pendingIPsPerPool.upsertPendingAllocation(poolName, owner, family)
                return nil, err
        }
+
+       m.pendingIPsPerPool.markAsAllocated(poolName, owner, family)
        return &AllocationResult{IP: ip, IPPoolName: poolName}, nil
 }
```
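To make the rounding behavior concrete, here is a small self-contained sketch of the demand computation (my reading of the doc comment in the diff above; `neededIPCeil`'s exact body is not shown there, so treat this implementation as an assumption):

```go
package main

import "fmt"

// neededIPCeil sketches roundUp(numIP+preAlloc, preAlloc): the result covers
// all in-use and pending IPs plus a buffer of at least preAlloc spares,
// rounded up to the next multiple of preAlloc. The real function lives in
// pkg/ipam/multipool.go; this body is inferred from the comment in the diff.
func neededIPCeil(numIP, preAlloc int) int {
	if preAlloc == 0 {
		return numIP // no pre-allocation: request exactly what is used
	}
	quotient := numIP / preAlloc
	if numIP%preAlloc > 0 {
		return (quotient + 2) * preAlloc
	}
	return (quotient + 1) * preAlloc
}

func main() {
	// With the default pool's preAlloc of 8: 3 in-use/pending IPs
	// -> 3+8=11 -> rounded up to 16 requested IPs.
	fmt.Println(neededIPCeil(3, 8)) // 16
	// An exact multiple still keeps a full spare buffer: 8 -> 16.
	fmt.Println(neededIPCeil(8, 8)) // 16
	// preAlloc == 0 (no "pre-allocate" value configured for the pool).
	fmt.Println(neededIPCeil(5, 0)) // 5
}
```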

@gandro (Member, Author) commented on Jun 1, 2023

/test

@gandro merged commit ea63399 into cilium:main on Jun 1, 2023
61 checks passed
Labels

area/ipam: Impacts IP address management functionality.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-blocker/1.14: This issue will prevent the release of the next version of Cilium.
release-note/misc: This PR makes changes that have no direct user impact.
sig/ipam: IP address management, including cloud IPAM
Projects: none yet
Linked issues that may be closed by merging: none yet
5 participants