Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s: fix incorrect EndpointSlice API version [1.12] #27378

Merged

Conversation

nebril
Copy link
Member

@nebril nebril commented Aug 9, 2023

This is a backport of #27277 to 1.12 branch.

This commit fixes a bug that k8s watcher incorrectly chooses the discovery/v1beta1 EndpointSlice in environments where discovery/v1 EndpointSlice is available.
The agent doesn't wait for the discovery/v1 EndpointSlice to be received and ends up removing live services because of this bug.

The issue happens in the following scenario:

  1. The agent initializes and runs the informers for Service and EndpointSlice.
  2. The reflectors list the resources from kube-apiserver and replace DeltaFIFO.
  3. The informers pop items from the DeltaFIFO and call the onAdd handler one by one.
  4. If the Service handler is called first, ServiceCache.UpdateService doesn't emit the UpdateService event, because the service is not ready. (No backend)
  5. The EndpointSlice handler is called next, ServiceCache.updateEndpoints emits UpdateService event.
  6. The agent doesn't wait until the event triggered at step 5 is processed, and calls SyncWithK8sFinished.
  7. The agent removes live services because it hasn't heard about it. (It is supposed to hear about those services in the event process triggered at step 5)

fixes: #27215

This commit fixes a bug that k8s watcher incorrectly chooses the
discovery/v1beta1 EndpointSlice in environments where
discovery/v1 EndpointSlice is available.
The agent doesn't wait for the discovery/v1 EndpointSlice to be received
and ends up removing live services because of this bug.

The issue happens in the following scenario:

1. The agent initializes and runs the informers for Service and EndpointSlice.
2. The reflectors list the resources from kube-apiserver and replace DeltaFIFO.
3. The informers pop items from the DeltaFIFO and call the onAdd handler one by one.
4. If the Service handler is called first, ServiceCache.UpdateService doesn't
emit the UpdateService event, because the service is not ready. (No backend)
5. The EndpointSlice handler is called next, ServiceCache.updateEndpoints emits
UpdateService event.
6. The agent doesn't wait until the event triggered at step 5 is processed, and
calls SyncWithK8sFinished.
7. The agent removes live services because it hasn't heard about it.
(It is supposed to hear about those services in the event process triggered at step 5)

fixes: cilium#27215

Signed-off-by: Yusuke Suzuki <yusuke-suzuki@cybozu.co.jp>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
@nebril nebril added kind/backports This PR provides functionality previously merged into master. backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. labels Aug 9, 2023
@nebril nebril requested a review from a team as a code owner August 9, 2023 11:00
@nebril nebril added the release-blocker/1.12 This issue will prevent the release of the next version of Cilium. label Aug 9, 2023
@nebril
Copy link
Member Author

nebril commented Aug 9, 2023

/test-backport-1.12

@nebril nebril requested a review from ysksuzuki August 9, 2023 15:02
Copy link
Member

@ysksuzuki ysksuzuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@nebril nebril merged commit 58f393f into cilium:v1.12 Aug 9, 2023
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. release-blocker/1.12 This issue will prevent the release of the next version of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants