
ipam: Remove cluster-pool-v2beta code #27753

Merged 5 commits into cilium:main on Sep 7, 2023

Conversation

@gandro (Member) commented Aug 28, 2023

This IPAM mode has been deprecated in Cilium v1.14 and has been superseded by multi-pool IPAM. This commit removes all code related to cluster-pool-v2beta from Cilium.

@gandro gandro added the kind/cleanup, release-note/minor, and area/ipam labels Aug 28, 2023
@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Aug 28, 2023
@gandro gandro force-pushed the pr/gandro/remove-clusterpool-v2beta branch from 842f1fe to 4a0f499 Compare August 28, 2023 16:15
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Aug 28, 2023
@gandro gandro force-pushed the pr/gandro/remove-clusterpool-v2beta branch 6 times, most recently from 3740037 to 98de88d Compare August 29, 2023 08:54
@gandro (Member Author) commented Aug 29, 2023

/test

@gandro gandro requested a review from tklauser August 29, 2023 10:51
@gandro gandro marked this pull request as ready for review August 29, 2023 10:51
@gandro gandro requested review from a team as code owners August 29, 2023 10:51
@tklauser (Member) left a comment

Thanks for nicely splitting these up into commits, which made it a lot easier to review and verify!

One non-blocking minor nit inline.

pkg/ipam/pool.go (review comment, resolved)
@gandro gandro force-pushed the pr/gandro/remove-clusterpool-v2beta branch 2 times, most recently from bb7e36c to 7a5cb7c Compare August 29, 2023 13:06
@gandro (Member Author) commented Aug 29, 2023

/test

@gandro gandro added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Aug 30, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 30, 2023
@kaworu (Member) left a comment

Helm changes LGTM

@gandro (Member Author) commented Aug 30, 2023

/ci-ingress

@gandro (Member Author) commented Aug 30, 2023

/ci-runtime

@gandro (Member Author) commented Aug 30, 2023

/ci-integration

@gandro (Member Author) commented Aug 30, 2023

Had to re-trigger some tests manually due to changed workflow names.

@zacharysarah (Contributor) left a comment

LGTM for docs

@gandro (Member Author) commented Sep 4, 2023

@tommyp1ckles Ping for review

@tommyp1ckles (Contributor) left a comment

k8s changes look good

@gandro (Member Author) commented Sep 7, 2023

CI is stuck because the PR has been open for a while and new tests have been introduced. I'll rebase to be able to re-run CI.

This commit moves various functions used by the multipool.go
implementation out of clusterpool.go. This will allow us to remove the
clusterpool.go file in a subsequent commit.

This commit contains no functional changes.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This IPAM mode has been deprecated in Cilium v1.14 and has been
superseded by multi-pool IPAM instead. This commit removes all code
related to cluster-pool-v2beta from the agent.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This mostly reverts commit 3d6b9a8.

The changes in podcidr.go have been restored to the behavior before the
cluster-pool-v2beta code was added.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
They are no longer used.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/remove-clusterpool-v2beta branch from 26d7a29 to 10599ea Compare September 7, 2023 08:04
@gandro (Member Author) commented Sep 7, 2023

@gandro gandro merged commit d89e204 into cilium:main Sep 7, 2023
61 checks passed
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge label (This PR has passed all tests and received consensus from code owners to merge.) Sep 7, 2023
@log1cb0mb (Contributor) commented

Can we expect a proper upgrade/migration path from cluster-pool-v2beta to multi-pool with v1.15?
It would have been a lot better if #27100 had been included in v1.15 first, so that it would be easier to migrate before removing the whole thing. Or should BGP CP multi-pool support be backported to v1.14?

@gandro (Member Author) commented Oct 2, 2023

Unfortunately, BGP for multi-pool cannot be backported to v1.14 due to some other larger refactoring PRs that it depends on. clusterpool-v2beta will only be removed in v1.15, and in v1.15 multi-pool has feature parity with the removed clusterpool-v2beta, including BGP support.

Generally speaking, there is no migration guide planned, since clusterpool-v2beta was never recommended for production. Having said that, I think we would accept a contribution that adds some migration documentation and potentially also some helper logic, for example in the agent, which reads the old clusterpool-v2beta podCIDRs and writes them into the multi-pool spec.

@log1cb0mb (Contributor) commented

In that case, can we consider not dropping clusterpool-v2beta from v1.15, so that users have just one thing to deal with? Assuming there is a way for both IPAM modes to co-exist?

@gandro (Member Author) commented Oct 2, 2023

> In that case, can we consider not dropping clusterpool-v2beta from v1.15, so that users have just one thing to deal with? Assuming there is a way for both IPAM modes to co-exist?

I don't think simply reverting this PR is a good way forward. The clusterpool-v2beta code had a few severe bugs (e.g. #24588 or #26621; the latter mentions multi-pool, but clusterpool-v2beta is equally affected, and in v1.15 we are fixing that for multi-pool) and has been blocking some other tech-debt removal work. So even reverting this PR would not be an easy task, since we have already started follow-up cleanups.

I also don't think it's possible to run two IPAM modes at the same time: there is only one operator, and it only runs one IPAM allocator at a time.

To find the best solution, I think it would be useful to get a bit more context on what deployments you have and what problems you want to solve. I'm not opposed to adding code or logic that helps the migration in either v1.14 or v1.15. I can totally envision a migration mode where we make the multi-pool logic backwards-compatible with the clusterpool-v2beta fields, and I think such a solution would be less work and less overall risk than backporting BGP to v1.14 or reverting this PR.

@log1cb0mb (Contributor) commented

That makes sense, and originally I was looking for the "best way" to migrate from v2beta to multi-pool. I understand the former was in a beta state and is now being deprecated, but our use case required it.

We rely heavily on BGP CP, so the problem is that in order to migrate to multi-pool (which every user must do), we also need to consider migrating to multi-pool support in BGP CP. I was hoping that we could first take care of migrating to multi-pool within v1.14, provided that BGP CP supports it, before we move to v1.15 where v2beta is removed completely.

Thinking about it now, BGP CP support for multi-pool in v1.14 wouldn't really have helped us much, as we would naturally have to migrate to using both at the same time regardless.

So I suppose what I am looking for is how to perform this migration in the best way possible without affecting any workloads. I am thinking we will have to run different versions at the same time, where each node (running, say, v1.15) is migrated to the new pool one at a time. Let's say we manage to roll out in this fashion; my concern is that if the Cilium global config is changed for v1.15, will it affect existing v1.14 agents, and not just IPAM functionality but also BGP CP?

@gandro (Member Author) commented Oct 2, 2023

Thanks a lot for the clarification!

> Let's say we manage to roll out in this fashion; my concern is that if the Cilium global config is changed for v1.15, will it affect existing v1.14 agents, and not just IPAM functionality but also BGP CP?

So yeah, unless Cilium agents are restarted, they will not start to use the new configuration. Only once they are restarted will they read the newer ConfigMap. If you want to keep certain nodes on a different configuration regardless of restarts, you can use CiliumNodeConfig, which might be of use here.
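As an illustration only (not from this thread), a per-node override via CiliumNodeConfig could be created roughly as sketched below. The apiVersion, the `ipam-migration` node label, and the keys under `spec.defaults` are assumptions; verify them against the CRDs and config keys shipped with your Cilium version.

```python
# Hypothetical sketch: keep a labelled set of nodes on the old IPAM mode via a
# CiliumNodeConfig object, so those nodes retain it across agent restarts while
# the rest of the cluster follows the updated ConfigMap.
# The apiVersion (cilium.io/v2alpha1) and the "defaults" keys are assumptions.
from kubernetes import client, config


def pin_old_ipam_mode() -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    api = client.CustomObjectsApi()

    body = {
        "apiVersion": "cilium.io/v2alpha1",
        "kind": "CiliumNodeConfig",
        "metadata": {"namespace": "kube-system", "name": "keep-clusterpool-v2beta"},
        "spec": {
            # Only nodes carrying this (hypothetical) label keep the old setting.
            "nodeSelector": {"matchLabels": {"ipam-migration": "pending"}},
            # cilium-config key/value overrides applied to the selected nodes.
            "defaults": {"ipam": "cluster-pool-v2beta"},
        },
    }
    api.create_namespaced_custom_object(
        group="cilium.io",
        version="v2alpha1",
        namespace="kube-system",
        plural="ciliumnodeconfigs",
        body=body,
    )


if __name__ == "__main__":
    pin_old_ipam_mode()
```

The idea is that only nodes matching the selector keep the pinned setting after an agent restart, while all other nodes pick up the new cluster-wide configuration.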

I think the bigger challenge, however, is the fact that clusterpool-v2beta and multi-pool use different fields for storing the per-node podCIDRs. Basically, as it is at the moment, once a node switches over to the new IPAM mode, it will no longer observe the pod CIDRs that it previously had and will instead attempt to allocate fresh ones. This will cause disruption to the workloads, because existing workloads have not released their IPs yet.

The good news, however, is that I think this should be fixable with some migration logic when going from v1.14 with clusterpool-v2beta to v1.15 with multi-pool:

Step 1: Prepare a v1.15 configuration that uses multi-pool IPAM with a default IPAM pool using the same CIDRs as the --cluster-pool-ipv4/6-{cidr,mask} configuration (Helm: cilium.ipam.operator.clusterPoolIPv4/6{PodCIDRList/MaskSize}) used by clusterpool-v2beta.

Step 2: Mirror all per-node pod CIDRs from all CiliumNode spec.ipam.podCIDRs into the spec.ipam.pools["default"].allocated field. This will make sure that the multi-pool IPAM operator code is aware of the pod CIDRs previously allocated to a node.

Step 3: Update the Operator to Cilium v1.15 with multi-pool IPAM mode. Thanks to the mirrored pod CIDRs, it will make sure that any already allocated pod CIDRs are retained and not re-allocated to any new nodes. At this point, any v1.14 agent running in clusterpool-v2beta mode will continue running, but it will not be able to allocate any new CIDRs or release old ones. While this might hinder new pods from being scheduled if the node is running low on IPs, this will not interrupt any running pods.

Step 4: Update Cilium agents one by one from v1.14 with clusterpool-v2beta to v1.15 with multi-pool. Thanks to the mirrored pod CIDRs, they will remember their old pod CIDRs and restore all endpoints with the old pod CIDR. All this logic is already present in the code.

(Optional) Step 5: If migration is successful and no downgrade is needed, go ahead and clear any spec.ipam.podCIDRs from the CiliumNode. This step is optional, as the field will simply be ignored by all agents and the operator once running in multi-pool IPAM mode.

The only code that needs to be written for this migration plan to work is the code in step 2. This could either be done with a custom shell or Python script, or we could add this logic to the Cilium operator. I would accept such a PR.
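To make step 2 a bit more concrete, here is a rough, hypothetical sketch of such a script in Python using the kubernetes client. The field layout it assumes for the multi-pool allocation (spec.ipam.pools.allocated as a list of {pool, cidrs} entries) should be double-checked against the CiliumNode CRD of the target Cilium version before using anything like this.

```python
# Rough sketch of "step 2": mirror the legacy clusterpool-v2beta pod CIDRs from
# CiliumNode spec.ipam.podCIDRs into the multi-pool allocation of the "default"
# pool. The assumed schema (spec.ipam.pools.allocated as a list of
# {"pool": ..., "cidrs": [...]} entries) must be verified against your CRD.
from kubernetes import client, config

GROUP, VERSION, PLURAL = "cilium.io", "v2", "ciliumnodes"


def mirror_pod_cidrs() -> None:
    config.load_kube_config()  # or config.load_incluster_config()
    api = client.CustomObjectsApi()

    nodes = api.list_cluster_custom_object(GROUP, VERSION, PLURAL)
    for node in nodes.get("items", []):
        name = node["metadata"]["name"]
        pod_cidrs = node.get("spec", {}).get("ipam", {}).get("podCIDRs", [])
        if not pod_cidrs:
            continue  # nothing allocated by clusterpool-v2beta on this node

        # Merge-patch the multi-pool allocation so the operator treats the old
        # per-node CIDRs as already owned by this node's "default" pool.
        patch = {
            "spec": {
                "ipam": {
                    "pools": {
                        "allocated": [{"pool": "default", "cidrs": pod_cidrs}]
                    }
                }
            }
        }
        api.patch_cluster_custom_object(GROUP, VERSION, PLURAL, name, patch)
        print(f"CiliumNode/{name}: mirrored {pod_cidrs} into the default pool")


if __name__ == "__main__":
    mirror_pod_cidrs()
```

Running something like this once before step 3 would mirror the old per-node CIDRs, so that the upgraded operator treats them as already allocated.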

@log1cb0mb (Contributor) commented

Good stuff! Thank you for all the details.

Isn't there a shortcut to all of this: drain/delete the node and rejoin, or bring the node up with no previous CNI config and with v1.15?

This obviously assumes that we can somehow control that v1.15 is deployed onto the cluster while the agent is not rolled out on every node except newly joining ones. Use this approach along with the CiliumNodeConfig CRD?

@gandro (Member Author) commented Oct 2, 2023

> Isn't there a shortcut to all of this: drain/delete the node and rejoin, or bring the node up with no previous CNI config and with v1.15?

So yeah, if you don't mind workload pods being restarted during the Cilium upgrade, then draining nodes is also an option. They would basically forget all about their old clusterpool-v2beta IPs, come up, and start requesting multi-pool IPs instead.

The main concern there, however, is that you want to avoid the operator handing out a podCIDR to a new (upgraded) node that is still owned by an old (non-upgraded) node. This could happen because, from the view of the upgraded multi-pool operator, all old nodes are basically empty, i.e. do not have any pod CIDRs allocated or requested.

This can be circumvented in two ways:

  1. Either by making sure the operator simply allocates "new node" pod CIDRs from a new CIDR range. This can be achieved by making sure that your new default CiliumPodIPPool does not overlap with the old cluster-pool-ipv4/6-{cidr,mask} configuration (see the overlap check sketched after this list).
  2. Or by telling the multi-pool operator about the in-use pod CIDRs; this can be done by copying all old pod CIDRs to the multi-pool node schema (i.e. "step 2" above).
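For option 1, a minimal sketch of the overlap check (with placeholder CIDRs, not values from this thread) could look like this:

```python
# Sanity check for option 1: the new default CiliumPodIPPool CIDR must not
# overlap the legacy cluster-pool-ipv4-cidr range. Placeholder values only.
import ipaddress

old_cluster_pool_cidr = ipaddress.ip_network("10.0.0.0/8")      # old cluster-pool-ipv4-cidr
new_default_pool_cidr = ipaddress.ip_network("192.168.0.0/16")  # new default pool CIDR

if old_cluster_pool_cidr.overlaps(new_default_pool_cidr):
    raise SystemExit("CIDRs overlap: pick a disjoint range for the new default pool")
print("no overlap: upgraded nodes cannot receive CIDRs still owned by old nodes")
```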

> This obviously assumes that we can somehow control that v1.15 is deployed onto the cluster while the agent is not rolled out on every node except newly joining ones. Use this approach along with the CiliumNodeConfig CRD?

Yeah. I'm not well versed enough in Kubernetes to tell you what the best option would be to have certain nodes of the DaemonSet use the old image, whereas other nodes use the new image. Maybe you will need two separate cilium-agent DaemonSets with disjoint node selectors.

(Note that my proposal above says "Update Cilium agents one by one" but you can also do it all at once, i.e. it does not necessarily require that you control which nodes have been migrated and which ones haven't. But of course you might still want to do this for safety reasons)

@log1cb0mb (Contributor) commented

Yes, restarting workloads on one node at a time is not a big issue, as there is redundancy.

Overlapping/copying the per-node CIDRs over is a good point; I had not considered that. IIRC, the cilium-operator does that: it allocates the per-node IPAM blocks from the start of the podCIDR subnet.

Let me give this a try; I will test this out in a dev environment and see how it goes.

Labels
area/ipam: Impacts IP address management functionality.
kind/cleanup: This includes no functional changes.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/minor: This PR changes functionality that users may find relevant to operating Cilium.

6 participants