Fix possible disruption of long running, cross-cluster, pod to node traffic on agent restart #29613

Merged: 1 commit, Dec 5, 2023


giorio94
Member

@giorio94 giorio94 commented Dec 4, 2023

8de7707 ("endpoint: wait for clustermesh IPs/identities sync before regeneration") modified the endpoint regeneration logic to explicitly wait for ipcache and identities synchronization from all remote clusters (in addition to the local one) before starting the regeneration process, to avoid disrupting long-running connections.

However, that fix is not sufficient for pod-to-node connectivity, because the ipcache entries for the addresses of remote nodes (as well as their health and ingress IPs) are only configured when the corresponding node entry is received. Hence, let's extend the wait function to also wait for node synchronization, in addition to IPs and identities, so that the ipcache is fully synchronized before the endpoint regeneration process is triggered.
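
For readers unfamiliar with the mechanism, the idea can be sketched roughly as follows. This is a minimal illustration only, not the code changed by this PR: `remoteCluster`, `waitClosed`, `ipcacheFullySynced` and `syncedWithin` are hypothetical names, and the initial synchronization of each resource type is modeled as a channel that is closed once the first full listing has been received.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// remoteCluster is a hypothetical stand-in for the per-cluster clustermesh
// state. Each channel is closed once the initial listing of that resource
// type has been fully received from the remote cluster.
type remoteCluster struct {
	name       string
	nodes      chan struct{}
	ipcache    chan struct{}
	identities chan struct{}
}

// waitClosed blocks until ch is closed or ctx is done, whichever comes first.
func waitClosed(ctx context.Context, what string, ch <-chan struct{}) error {
	select {
	case <-ch:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("waiting for %s: %w", what, ctx.Err())
	}
}

// ipcacheFullySynced returns nil only after nodes, IPs and identities have
// been synchronized from every remote cluster. Nodes are part of the wait
// because the ipcache entries for remote node addresses are derived from
// node entries rather than from the IP entries themselves.
func ipcacheFullySynced(ctx context.Context, clusters []*remoteCluster) error {
	for _, rc := range clusters {
		steps := []struct {
			what string
			ch   <-chan struct{}
		}{
			{rc.name + "/nodes", rc.nodes},
			{rc.name + "/ipcache", rc.ipcache},
			{rc.name + "/identities", rc.identities},
		}
		for _, s := range steps {
			if err := waitClosed(ctx, s.what, s.ch); err != nil {
				return err
			}
		}
	}
	return nil
}

// syncedWithin bounds the wait with a timeout, so that a single unreachable
// remote cluster cannot block the caller forever.
func syncedWithin(timeout time.Duration, clusters []*remoteCluster) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return ipcacheFullySynced(ctx, clusters)
}
```

In this sketch, a caller would invoke `syncedWithin` before starting endpoint regeneration and treat a returned error as "proceed without full synchronization" or "retry", depending on policy. The timeout is an assumption of the sketch, not something stated in this PR.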

Related: #27575

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 added the kind/bug (This is a bug in the Cilium logic.), release-note/bug (This PR fixes an issue in a previous release of Cilium.), area/clustermesh (Relates to multi-cluster routing functionality in Cilium.) and needs-backport/1.14 (This PR / issue needs backporting to the v1.14 branch) labels on Dec 4, 2023
@giorio94 requested a review from a team as a code owner on December 4, 2023 16:57
@giorio94 requested a review from marseel on December 4, 2023 16:57
@giorio94
Member Author

giorio94 commented Dec 4, 2023

/test

@marseel
Contributor

marseel commented Dec 5, 2023

Looks good, thanks!

@giorio94
Member Author

giorio94 commented Dec 5, 2023

The mergeability check seems stuck. Let's try adding and removing a dont-merge label.

@giorio94 added the dont-merge/blocked (Another PR must be merged before this one.) label on Dec 5, 2023
@maintainer-s-little-helper bot added this to "Needs backport from main" in 1.14.5 on Dec 5, 2023
@giorio94 removed the dont-merge/blocked (Another PR must be merged before this one.) label on Dec 5, 2023
@mhofstetter added and removed the dont-merge/blocked (Another PR must be merged before this one.) label on Dec 5, 2023
@maintainer-s-little-helper bot added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label on Dec 5, 2023
@ti-mo added this pull request to the merge queue on Dec 5, 2023
Merged via the queue into cilium:main with commit 24257ff on Dec 5, 2023
62 checks passed
@nbusseneau mentioned this pull request on Dec 5, 2023
10 tasks
@nbusseneau added the backport-pending/1.14 (The backport for Cilium 1.14.x for this PR is in progress.) label and removed the needs-backport/1.14 (This PR / issue needs backporting to the v1.14 branch) label on Dec 5, 2023
@github-actions bot removed the backport-pending/1.14 (The backport for Cilium 1.14.x for this PR is in progress.) label on Dec 6, 2023
@maintainer-s-little-helper bot removed this from "Needs backport from main" in 1.14.5 on Dec 6, 2023
@github-actions bot added the backport-done/1.14 (The backport for Cilium 1.14.x for this PR is done.) label on Dec 6, 2023
@maintainer-s-little-helper bot added this to "Backport done to v1.14" in 1.14.5 on Dec 6, 2023
Labels
area/clustermesh: Relates to multi-cluster routing functionality in Cilium.
backport-done/1.14: The backport for Cilium 1.14.x for this PR is done.
kind/bug: This is a bug in the Cilium logic.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/bug: This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.14.5
Backport done to v1.14
Status: Released

5 participants