
feat: unpeering force #3054


Open · wants to merge 12 commits into master

Conversation

giuliafalaschi

Description

Two flags have been added to perform a forced unpeering:
--force, a Boolean flag that forces the unpeering even if the provider cluster is unreachable, and --remote-cluster-id, a flag that must be provided together with --force and must contain the cluster ID of the provider cluster.
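
For reference, a minimal sketch of how the two flags could be wired with cobra; the flag names match the PR, but the command and option struct below are illustrative placeholders, not the actual liqoctl code:

```go
// Illustrative sketch only: the flag names match the PR, but the command
// and option struct are placeholders, not the actual liqoctl code.
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

type unpeerOptions struct {
	Force           bool   // proceed even if the provider cluster is unreachable
	RemoteClusterID string // provider cluster ID, required together with --force
}

func newUnpeerCommand() *cobra.Command {
	opts := &unpeerOptions{}
	cmd := &cobra.Command{
		Use:   "unpeer",
		Short: "Disable a peering with a provider cluster",
		RunE: func(_ *cobra.Command, _ []string) error {
			if opts.Force && opts.RemoteClusterID == "" {
				return fmt.Errorf("--remote-cluster-id is required when --force is set")
			}
			// ... perform the (possibly forced) unpeering ...
			return nil
		},
	}
	cmd.Flags().BoolVar(&opts.Force, "force", false,
		"Force the unpeering even if the provider cluster is unreachable")
	cmd.Flags().StringVar(&opts.RemoteClusterID, "remote-cluster-id", "",
		"Cluster ID of the provider cluster, used together with --force")
	return cmd
}

func main() {
	if err := newUnpeerCommand().Execute(); err != nil {
		os.Exit(1)
	}
}
```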

Fixes #3051

How Has This Been Tested?

I created two local clusters using kind, installed Liqo on them, and established a peering relationship between the two. To simulate a catastrophic scenario, I modified the IP address of the provider cluster to make it unreachable. Finally, I executed the unpeer command with the options I implemented, specifically:
liqoctl unpeer --force --remote-cluster-id <cluster-id>
In this way, the unpeering was successfully performed on the consumer cluster, even though the provider was no longer reachable.

@adamjensenbot
Collaborator

Hi @giuliafalaschi. Thanks for your PR!

I am @adamjensenbot.
You can interact with me by issuing a slash command in the first line of a comment.
Currently, I understand the following commands:

  • /rebase: Rebase this PR onto the master branch (You can add the option test=true to launch the tests
    when the rebase operation is completed)
  • /merge: Merge this PR into the master branch
  • /build Build Liqo components
  • /test Launch the E2E and Unit tests
  • /hold, /unhold Add/remove the hold label to prevent merging with /merge

Make sure this PR appears in the liqo changelog by adding one of the following labels:

  • feat: 🚀 New Feature
  • fix: 🐛 Bug Fix
  • refactor: 🧹 Code Refactoring
  • docs: 📝 Documentation
  • style: 💄 Code Style
  • perf: 🐎 Performance Improvement
  • test: ✅ Tests
  • chore: 🚚 Dependencies Management
  • build: 📦 Builds Management
  • ci: 👷 CI/CD
  • revert: ⏪ Reverts Previous Changes

@github-actions github-actions bot added the feat (Adds a new feature to the codebase) label on May 27, 2025
@claudiolor
Contributor

Hi @giuliafalaschi, thanks for this PR. Is this ready for review?

@giuliafalaschi
Author

> Hi @giuliafalaschi, thanks for this PR. Is this ready for review?

Yes, I tested everything on a local cluster.

@claudiolor
Contributor

Hi Giulia 😊

Thank you so much for your contribution and the effort you put into this PR, it's really appreciated! 🙏
Apologies for the delay in getting back to you; I got caught up with other tasks, but I’ve now had the chance to review and test your changes.

That said, I noticed some issues with how the forced unpeering is handled. Specifically, there are challenges that arise when the remote cluster becomes unreachable which are not addressed in your implementation. The two main components that cause problems in such scenarios are:

  • CRDReplicator: its role is to reflect Liqo CRs (e.g., ResourceSlice) on the remote cluster, and it attempts to clean them up when they are deleted. If the remote cluster is no longer reachable, the finalizers are never removed, preventing the resources from being deleted (see the sketch after this list).

  • VirtualNode: when a ResourceSlice is deleted, the associated virtual node is also removed, triggering the VirtualNode controller, which starts evicting the pods scheduled on the virtual node. The Virtual Kubelet's reflector takes charge of evicting those pods, but if the remote cluster is unreachable, the eviction gets stuck.
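
To make the finalizer issue concrete, here is a deliberately simplified sketch of the pattern that gets stuck (this is not the real CRDReplicator code; names and types are placeholders): the finalizer is removed only after the remote cleanup succeeds, so an unreachable provider blocks the local deletion indefinitely.

```go
// Simplified illustration of the stuck deletion, not the actual
// CRDReplicator reconciler: names and types are placeholders.
package replication

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const replicationFinalizer = "example.liqo.io/replication" // placeholder name

type replicatorReconciler struct {
	localClient  client.Client // consumer cluster
	remoteClient client.Client // provider cluster
}

func (r *replicatorReconciler) reconcileDeletion(ctx context.Context, obj client.Object) error {
	if !controllerutil.ContainsFinalizer(obj, replicationFinalizer) {
		return nil
	}
	// Delete the replicated copy on the provider cluster first
	// (same name/namespace, simplified here).
	if err := r.remoteClient.Delete(ctx, obj); err != nil && !apierrors.IsNotFound(err) {
		// If the provider API server is unreachable this fails on every
		// retry, the finalizer below is never removed, and the local
		// resource (e.g. a ResourceSlice) stays in Terminating forever.
		return err
	}
	controllerutil.RemoveFinalizer(obj, replicationFinalizer)
	return r.localClient.Update(ctx, obj)
}
```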

Because of this, if you peer two kind clusters, delete one of them, and then run the forced unpeer, with your version of the code you'll notice that it hangs on the ResourceSlice deletion.

To properly handle these situations, we need to introduce a way to signal that the remote cluster is no longer expected to respond. That way, components like the CRDReplicator or the Virtual Kubelet can stop trying to reach it endlessly. Here are a couple of ideas I could come up with:

  1. Use the controller manager’s API server checker to flag a ForeignCluster as "dead" after a number of failed health checks. However, this might introduce issues in case of temporary downtime: if the cluster is considered "dead", the Virtual Kubelet will mark the pods scheduled on the virtual node as terminated, but if the connection comes back up, the pods running remotely will never be stopped.

  2. Explicitly mark a cluster as unreachable in the ForeignCluster spec (for example, when forcing the unpeer and the cluster doesn’t respond), as sketched below. The components listed above can then skip trying to delete remote resources, cleanly removing the peering on the consumer cluster.
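
Just to sketch option 2 (the marker used here is purely hypothetical; the real mechanism — a spec field, a condition, an annotation — is still to be decided, and the ForeignCluster GVK below is written from memory), liqoctl could mark the ForeignCluster when it falls back to the forced unpeer, and controllers like the CRDReplicator would check that marker before attempting any remote cleanup:

```go
// Hypothetical sketch of option 2: the annotation key and the ForeignCluster
// GVK are illustrative only; check the installed CRDs before relying on them.
package forcedunpeer

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const unreachableAnnotation = "liqo.io/force-unpeer-unreachable" // hypothetical marker

// markClusterUnreachable flags the ForeignCluster so that controllers can
// skip any operation that requires contacting the provider cluster.
func markClusterUnreachable(ctx context.Context, cl client.Client, fcName string) error {
	fc := &unstructured.Unstructured{}
	fc.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "core.liqo.io", Version: "v1beta1", Kind: "ForeignCluster",
	})
	if err := cl.Get(ctx, types.NamespacedName{Name: fcName}, fc); err != nil {
		return err
	}
	ann := fc.GetAnnotations()
	if ann == nil {
		ann = map[string]string{}
	}
	ann[unreachableAnnotation] = "true"
	fc.SetAnnotations(ann)
	return cl.Update(ctx, fc)
}
```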

Once we handle this properly in the Liqo core, we can refine the behavior of the liqoctl unpeer command. I saw you modified the flow a bit, but I would suggest keeping the original flow and falling back to the forced unpeer only if the standard one fails. For example, when retrieving the cluster ID of the provider during the unpeering, we can attempt it with a short timeout; if it fails and force is enabled, we fall back to the user-supplied cluster ID and perform the operations required to force the unpeer (for example, following the second option above, marking the ForeignCluster as "dead").
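
In pseudo-code the flow could look roughly like this (everything below is a placeholder, not an existing liqoctl helper; it is only meant to convey the order of the two attempts):

```go
// Placeholder sketch of the suggested fallback flow, not actual liqoctl code.
package forcedunpeer

import (
	"context"
	"fmt"
	"time"
)

// resolveProviderClusterID first tries to read the cluster ID from the
// provider with a short timeout; only if that fails and --force was given
// does it fall back to the ID supplied by the user.
func resolveProviderClusterID(ctx context.Context, force bool, fallbackID string,
	fetchFromProvider func(context.Context) (string, error)) (string, error) {
	shortCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()

	id, err := fetchFromProvider(shortCtx)
	if err == nil {
		return id, nil
	}
	if force && fallbackID != "" {
		// Provider unreachable: use the user-supplied cluster ID and mark
		// the ForeignCluster as "dead" (see the option above) before
		// cleaning up the consumer side.
		return fallbackID, nil
	}
	return "", fmt.Errorf("unable to retrieve the provider cluster ID: %w", err)
}
```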
Take care to keep the utility functions as "agnostic" as possible, without introducing the concept of "force" into them. For example, you could add a flag to skip remote operations, or give callers the possibility to manually provide the data the functions would normally fetch from the remote cluster via the API server.
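
As a sketch of what that could look like (names are hypothetical, not existing liqoctl types):

```go
// Hypothetical options for the shared helpers: they stay agnostic of the
// "force" concept and only know whether to skip remote operations and
// which data the caller has already provided.
type unpeerHelperOptions struct {
	SkipRemoteOperations bool   // do not contact the provider cluster at all
	ProvidedClusterID    string // value normally fetched from the provider API server
}
```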

Lastly, regarding the CLI flags: instead of having both --force and --remote-cluster-id (where the latter only makes sense together with --force), it might be cleaner to have a single flag like --force-with-cluster-id that directly provides the fallback cluster ID.

Thanks again for your work! Let me know your thoughts on this, and feel free to ask if you have any questions 😊

Labels: feat (Adds a new feature to the codebase), size/L
Development
Successfully merging this pull request may close the following issue: Force unpeering
3 participants