v1.15 Backports 2024-05-30 #32802

Merged
aanm merged 2 commits into v1.15 from pr/v1.15-backport-2024-05-30-04-36
Jun 3, 2024

Conversation

giorio94
Member

@giorio94 giorio94 commented May 30, 2024

Once this PR is merged, a GitHub action will update the labels of these PRs:

 32671

[ upstream commit 2750123 ]

The only reason for that function to return an error is that the parent
context expired, which happens if the agent is being shut down while the
synchronization has not yet completed. Hence, let's just return, rather
than triggering a fatal error.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 added the kind/backports and backport/1.15 labels May 30, 2024
@giorio94
Member Author

/test-backport-1.15

[ upstream commit cc7c27d ]

[ backporter's notes: hit multiple conflicts in clustermesh.go due to
  the different surrounding context, addressed with minor adaptations;
  additionally dropped the endpointslicesync and release notes related
  hunks, as not relevant in this context. ]

Upon agent and operator restart, we need to wait for full clustermesh
synchronization in multiple subsystems, to prevent breaking existing
cross-cluster connections due to e.g., garbage collection of valid
but not yet retrieved entries for a given remote cluster. However,
the absence of a timeout controlling this process is problematic as
well, as the inability to connect to a remote cluster (e.g., due to
a misconfiguration) can cause issues for local communication due to
the blocked GC operations.

Let's standardize the different wait-for-synchronization functions
to automatically return after a user-configurable timeout (tunable
via the clustermesh-sync-timeout flag, and set to 1 minute by default)
elapses. This mimics and replaces the already existing timeout used
to unblock endpoint regeneration, generalizing it to all the other
resources as well. The existing flag is deprecated, but it still
takes precedence for consistency with the existing behavior.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the pr/v1.15-backport-2024-05-30-04-36 branch from bd67735 to 264f859 Compare May 31, 2024 07:15
@giorio94
Member Author

/test-backport-1.15

@giorio94 giorio94 marked this pull request as ready for review May 31, 2024 07:15
@giorio94 giorio94 requested a review from a team as a code owner May 31, 2024 07:15
@aanm aanm merged commit d491d97 into v1.15 Jun 3, 2024
240 checks passed
@aanm aanm deleted the pr/v1.15-backport-2024-05-30-04-36 branch June 3, 2024 18:59
Labels
backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
Projects
No open projects
Status: Released

2 participants