
[Antrea-IPAM] Support pre-allocating continuous IPs for StatefulSet #3281

Merged
merged 2 commits into main from antreaipam-preallocate on Mar 26, 2022

Conversation

annakhm
Contributor

@annakhm annakhm commented Feb 8, 2022

In order to provide a better user experience, Antrea-IPAM will try to
preallocate a continuous IP range for a StatefulSet. If unsuccessful,
IPs for the StatefulSet will be allocated on the fly, as before.

Signed-off-by: Anna Khmelnitsky <akhmelnitsky@vmware.com>

@annakhm annakhm marked this pull request as draft February 8, 2022 01:31
@codecov-commenter

codecov-commenter commented Feb 8, 2022

Codecov Report

Merging #3281 (00dfb7e) into main (2dcc257) will decrease coverage by 11.92%.
The diff coverage is 62.85%.

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #3281       +/-   ##
===========================================
- Coverage   65.51%   53.58%   -11.93%     
===========================================
  Files         277      392      +115     
  Lines       27500    43040    +15540     
===========================================
+ Hits        18016    23063     +5047     
- Misses       7567    17655    +10088     
- Partials     1917     2322      +405     
Flag Coverage Δ
integration-tests 38.23% <ø> (?)
kind-e2e-tests 47.80% <0.00%> (-8.03%) ⬇️
unit-tests 43.36% <62.85%> (+0.32%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/ipam/poolallocator/allocator.go 48.87% <14.28%> (-1.13%) ⬇️
pkg/controller/ipam/antrea_ipam_controller.go 74.13% <65.51%> (-6.15%) ⬇️
pkg/agent/cniserver/ipam/antrea_ipam_controller.go 80.00% <100.00%> (ø)
pkg/apis/controlplane/types.go 0.00% <0.00%> (-100.00%) ⬇️
...g/agent/apiserver/handlers/featuregates/handler.go 0.00% <0.00%> (-82.36%) ⬇️
pkg/controller/networkpolicy/store/group.go 5.00% <0.00%> (-70.00%) ⬇️
pkg/apis/controlplane/helper.go 33.33% <0.00%> (-66.67%) ⬇️
pkg/controller/networkpolicy/convert.go 0.00% <0.00%> (-64.52%) ⬇️
pkg/controller/egress/store/egressgroup.go 1.38% <0.00%> (-58.34%) ⬇️
...kg/apiserver/registry/system/supportbundle/rest.go 20.45% <0.00%> (-54.55%) ⬇️
... and 170 more

@annakhm annakhm force-pushed the antreaipam-preallocate branch 5 times, most recently from bdf284c to bfe3f41 Compare February 14, 2022 21:07
@annakhm annakhm marked this pull request as ready for review February 15, 2022 01:44
build/yamls/base/controller-rbac.yml (outdated review thread, resolved)
pkg/ipam/poolallocator/allocator.go (outdated review thread, resolved)
@annakhm
Contributor Author

annakhm commented Feb 16, 2022

/test-all

@gran-vmv gran-vmv self-requested a review February 17, 2022 09:13
gran-vmv
gran-vmv previously approved these changes Feb 17, 2022
Contributor

@gran-vmv gran-vmv left a comment


LGTM

@gran-vmv
Contributor

/test-integration
/test-flexible-ipam-e2e

@annakhm
Contributor Author

annakhm commented Feb 18, 2022

Would appreciate feedback - do we need to distinguish between IPAddressPhaseReserved and v1alpha2.IPAddressPhasePreallocated, or is it better to combine them?

@gran-vmv
Contributor

Would appreciate feedback - do we need to distinguish between IPAddressPhaseReserved and v1alpha2.IPAddressPhasePreallocated, or is it better to combine them?

@annakhm Please use IPAddressPhaseReserved; IPAddressPhasePreallocated is deprecated.

@gran-vmv gran-vmv self-requested a review February 18, 2022 01:00
@gran-vmv gran-vmv self-requested a review February 18, 2022 01:14
@gran-vmv gran-vmv dismissed their stale review February 18, 2022 01:19

Some changes needed.

@annakhm
Contributor Author

annakhm commented Feb 18, 2022

/test-all

gran-vmv
gran-vmv previously approved these changes Feb 21, 2022
Contributor

@gran-vmv gran-vmv left a comment


LGTM

Contributor

@jianjuns jianjuns left a comment


I did not do a thorough review of this PR, but I trust the existing reviews, so you don't need my approval to merge it.

@annakhm
Contributor Author

annakhm commented Mar 23, 2022

/test-all

@gran-vmv
Contributor

All e2e tests failed.

@gran-vmv
Contributor

/test-e2e

@gran-vmv
Contributor

/test-flexible-ipam-e2e

gran-vmv
gran-vmv previously approved these changes Mar 24, 2022
Contributor

@gran-vmv gran-vmv left a comment


LGTM

@gran-vmv
Contributor

/test-windows-all
/test-ipv6-all
/test-ipv6-only-all
/test-multicluster-e2e
/test-integration

@gran-vmv
Contributor

/test-flexible-ipam-e2e

@gran-vmv
Contributor

/test-windows-proxyall-e2e

key := k8s.NamespacedName(ss.Namespace, ss.Name)

c.statefulSetMap.Store(key, ss)
c.statefulSetQueue.Add(addEventIndication + key)
Member

@tnqn tnqn Mar 24, 2022


Agree with @jianjuns. And there are some other potential issues:

  1. A workqueue is meaningless here if all that is needed is a FIFO, in which case a channel is more appropriate.
  2. As the key in the workqueue has an "a" or "d" prefix, one object's add and delete events become two items in the workqueue, so it's not possible to increase the number of workers that handle StatefulSets, which may not scale well.
  3. It cannot support changes to the replicas number, as processNextStatefulSetWorkItem processes the StatefulSet using a "snapshot" of its spec.
  4. As there is only a single worker today, processing the StatefulSet based on a "snapshot" is more error-prone. For example, if multiple StatefulSets are recreated, the worker has to process deletion of the StatefulSets serially, then their creation, during which Pods of the new StatefulSets may have been created; the worker may release the IP allocations by mistake when it handles the deletion event of a StatefulSet, leading to duplicate allocation.
  5. With a single worker, creation of multiple StatefulSets may be handled more slowly than kubelet creates Pods, leading to many discontinuous IP allocations in practice.
  6. As your comment points out, transient errors when processing a delete event will lead to IP leaks that cannot be fixed until the controller restarts, which is not very robust.

All controllers need to consider the recreate case, and the typical controller pattern still works for them. I think it's applicable to IPAM as well.

What the controller does is ensure that the StatefulSet's replicas == the number of continuous IPs in the IPPool, and we don't really care about the UID of the StatefulSet. Then even after the StatefulSet is deleted and recreated, we don't really need to delete all IP allocations from the IPPool first and then re-allocate.

It should work if the controller is implemented in this way:

  1. The event handlers only need to enqueue the key (namespace + name) of the StatefulSet to the queue, regardless of whether it's an add/update/delete event (some update events can be ignored if the fields we care about don't change).
  2. The worker gets the key from the queue and checks the latest spec in the lister.
  • If the StatefulSet doesn't exist, it cleans up all IP allocations of this StatefulSet.
  • If the StatefulSet exists, it allocates or releases IPs to make sure the number of IP allocations equals the number of replicas. This is quite similar to the ReplicaSetController, which ensures the number of Pods equals the ReplicaSet's replicas.

In this way, it can run multiple workers to process StatefulSets concurrently, it can handle replica number changes, and it prevents introducing another storage and the potential race conditions that come with it.
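The key-based reconcile pattern described in this comment can be sketched dependency-free; in this hypothetical sketch a plain map stands in for the informer lister, and all names (ipamState, sync) are illustrative rather than Antrea's actual code. The real controller would enqueue keys into a client-go workqueue and run several workers calling the sync function concurrently:

```go
package main

import "fmt"

// Dependency-free sketch of the key-based reconcile pattern. A map stands
// in for the informer lister; the real controller would use a client-go
// workqueue and informers.
type ipamState struct {
	desired   map[string]int // StatefulSet key (namespace/name) -> replicas; absent == deleted
	allocated map[string]int // StatefulSet key -> number of IPs held in the pool
}

// sync reconciles a single key. Add, update, and delete events all funnel
// through here: the worker always acts on the latest desired state, not a
// snapshot captured at event time.
func (s *ipamState) sync(key string) {
	replicas, exists := s.desired[key]
	if !exists {
		// StatefulSet no longer exists: clean up all of its allocations.
		delete(s.allocated, key)
		return
	}
	// Allocate or release until the allocation count matches replicas.
	s.allocated[key] = replicas
}

func main() {
	s := &ipamState{
		desired:   map[string]int{"ns1/web": 3},
		allocated: map[string]int{"ns1/web": 5, "ns1/old": 2},
	}
	s.sync("ns1/web") // scale down: 5 -> 3
	s.sync("ns1/old") // deleted: release everything
	fmt.Println(s.allocated["ns1/web"], len(s.allocated)) // 3 1
}
```

Because sync only compares desired state with pool state, recreating a StatefulSet with the same key needs no special handling, which is exactly the point made above.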

pkg/controller/ipam/antrea_ipam_controller.go (outdated review thread, resolved)
@annakhm
Contributor Author

annakhm commented Mar 24, 2022

Thanks @tnqn and @jianjuns for your comments, I'm refactoring the change based on those recommendations.

In order to provide a better user experience, Antrea-IPAM will try to
reserve a continuous IP range for a StatefulSet. If unsuccessful,
IPs for the StatefulSet will be allocated on the fly, as before.

Signed-off-by: Anna Khmelnitsky <akhmelnitsky@vmware.com>
@annakhm
Contributor Author

annakhm commented Mar 24, 2022

/test-all

@annakhm
Contributor Author

annakhm commented Mar 24, 2022

/test-flexible-ipam-e2e

pkg/controller/ipam/antrea_ipam_controller.go (review thread, resolved)
}

size := int(*ss.Spec.Replicas)
err = allocator.AllocateStatefulSet(ss.Namespace, ss.Name, size)
Contributor


Have we checked that no IP is already allocated for this StatefulSet?

Contributor


And add comments somewhere to indicate we do not support IPPool changes.

Contributor Author

@annakhm annakhm Mar 24, 2022


AllocateStatefulSet has this check inside: IPs will not be preallocated if this StatefulSet is already present in the pool: https://github.com/antrea-io/antrea/pull/3281/files#diff-5e4e20e1e087d03ded8c46066bd42db715fc14fa73a9e4ac6fcfeab83e8d7f4cR412
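The idempotency guard described here (skip preallocation when the StatefulSet is already present in the pool) might look roughly like this; a minimal sketch with hypothetical names (ipPool, allocateStatefulSet), not Antrea's actual poolallocator API:

```go
package main

import "fmt"

// Minimal sketch of the idempotency guard: preallocation is skipped when
// the StatefulSet already owns IPs in the pool. The pool is modeled as a
// map from owner key to allocated IP count.
type ipPool struct {
	owners map[string]int
}

// allocateStatefulSet reserves size IPs for the StatefulSet unless it is
// already present in the pool; it reports whether anything was reserved.
func (p *ipPool) allocateStatefulSet(key string, size int) bool {
	if _, ok := p.owners[key]; ok {
		// Pods of this StatefulSet already got IPs (or a previous
		// preallocation happened): do nothing rather than double-allocate.
		return false
	}
	p.owners[key] = size
	return true
}

func main() {
	p := &ipPool{owners: map[string]int{}}
	fmt.Println(p.allocateStatefulSet("ns1/web", 3)) // true: preallocated
	fmt.Println(p.allocateStatefulSet("ns1/web", 3)) // false: already present
}
```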

Contributor


And could we also add a comment about this then? It is just easier to understand.

Contributor Author


Sure, will do

Contributor Author


We have a note in the docs explaining that the IPPool annotation cannot be changed without recreating the resource: "Note that the IP pool annotation cannot be updated or deleted without recreating the resource." We should add this to validation.

Contributor


That is great. But I still feel it's better to add a comment in the code, so reviewers/readers like me can easily understand why we do not check whether the pool changes.

Contributor Author


Added a comment in the code.

@annakhm annakhm marked this pull request as draft March 24, 2022 23:42
@annakhm
Contributor Author

annakhm commented Mar 25, 2022

Thank you again @tnqn and @jianjuns; indeed, when refactored according to your suggestions, the code is more straightforward and less prone to race conditions. @tnqn please note that since the goal is allocation of a continuous IP range, today we don't resize preallocations based on the replica number. Scaling up is unlikely to succeed (since the next IP is likely to be taken); scaling down is possible, though.
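The contiguity constraint mentioned here explains why scaling up rarely succeeds: the offset just past the reserved range has usually been handed out to someone else already. A minimal sketch of finding a continuous free range, with a plain bool slice standing in for the pool's allocation state (findContinuousRange is an illustrative name, not Antrea's actual code):

```go
package main

import "fmt"

// findContinuousRange scans for `size` contiguous free offsets in the
// pool. If no such run exists the caller reserves nothing, and IPs for
// the StatefulSet are allocated on the fly as before.
func findContinuousRange(used []bool, size int) (start int, ok bool) {
	run := 0
	for i := range used {
		if used[i] {
			run = 0 // run of free offsets broken by an allocation
			continue
		}
		run++
		if run == size {
			return i - size + 1, true
		}
	}
	return 0, false
}

func main() {
	// Offsets 0 and 3 are taken.
	used := []bool{true, false, false, true, false, false, false}
	fmt.Println(findContinuousRange(used, 2)) // 1 true
	fmt.Println(findContinuousRange(used, 3)) // 4 true
	fmt.Println(findContinuousRange(used, 5)) // 0 false -> fall back to on-the-fly
}
```

Note that growing an existing reservation in place would require the offsets immediately after it to be free, which this scan makes no attempt to guarantee; that matches the comment above about not resizing preallocations.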

@annakhm annakhm marked this pull request as ready for review March 25, 2022 00:03
@annakhm
Contributor Author

annakhm commented Mar 25, 2022

/test-flexible-ipam-e2e

Member

@tnqn tnqn left a comment


LGTM overall, some minor comments

pkg/controller/ipam/antrea_ipam_controller_test.go (outdated review thread, resolved)
pkg/controller/ipam/antrea_ipam_controller_test.go (outdated review thread, resolved)
pkg/controller/ipam/antrea_ipam_controller.go (outdated review thread, resolved)
pkg/controller/ipam/antrea_ipam_controller.go (outdated review thread, resolved)
pkg/controller/ipam/antrea_ipam_controller.go (outdated review thread, resolved)
Signed-off-by: Anna Khmelnitsky <akhmelnitsky@vmware.com>
@annakhm
Contributor Author

annakhm commented Mar 25, 2022

/test-all

Member

@tnqn tnqn left a comment


LGTM

@tnqn
Member

tnqn commented Mar 26, 2022

/test-flexible-ipam-e2e

@tnqn tnqn merged commit 739b8a1 into antrea-io:main Mar 26, 2022
@tnqn tnqn added the action/release-note Indicates a PR that should be included in release notes. label Mar 28, 2022

6 participants