Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix endpoint slice filtering to ensure we get all the necessary objects #25351

Merged
merged 1 commit into from Jul 12, 2023

Conversation

odinuge
Copy link
Member

@odinuge odinuge commented May 10, 2023

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #issue-number

Fix endpoint slices filtering to ensure we filter out headless services and continue to support older k8s versions where service labels are not propagated to endpoint slices

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 10, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label May 10, 2023
@odinuge odinuge marked this pull request as ready for review May 23, 2023 08:09
@odinuge odinuge requested review from a team as code owners May 23, 2023 08:09
@odinuge odinuge requested a review from joamaki May 23, 2023 08:09
@@ -567,7 +572,7 @@ func (k *K8sWatcher) enableK8sWatchers(ctx context.Context, resourceNames []stri
case resources.K8sAPIGroupEndpointSliceV1Discovery:
// no-op; handled in resources.K8sAPIGroupEndpointV1Core.
case resources.K8sAPIGroupEndpointV1Core:
k.initEndpointsOrSlices(k.clientset.Slim(), serviceOptModifier)
k.initEndpointsOrSlices(k.clientset.Slim(), serviceAndEndpointOptModifier, endpointSliceOptModifier)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should plumb it in here or just just run GetEndpointSliceListOptionsModifier where we need it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine here. The PR #23977 removes this from here anyway. I'll rebase and incorporate these changes once this PR lands.

@@ -567,7 +572,7 @@ func (k *K8sWatcher) enableK8sWatchers(ctx context.Context, resourceNames []stri
case resources.K8sAPIGroupEndpointSliceV1Discovery:
// no-op; handled in resources.K8sAPIGroupEndpointV1Core.
case resources.K8sAPIGroupEndpointV1Core:
k.initEndpointsOrSlices(k.clientset.Slim(), serviceOptModifier)
k.initEndpointsOrSlices(k.clientset.Slim(), serviceAndEndpointOptModifier, endpointSliceOptModifier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine here. The PR #23977 removes this from here anyway. I'll rebase and incorporate these changes once this PR lands.

@joamaki
Copy link
Contributor

joamaki commented May 23, 2023

/test

@joamaki joamaki added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label May 23, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 23, 2023
@odinuge
Copy link
Member Author

odinuge commented May 31, 2023

Hi @joamaki! Did you want this in before #23977, or after? I guess if we do that one first, back-porting this would be non-trivial.

@odinuge
Copy link
Member Author

odinuge commented Jun 2, 2023

Hmm. Seems like #23977 is going along, so I'll let this sit for now. Do you mind pinging when things are stable so I can rebase this PR @joamaki? 😄

@odinuge odinuge force-pushed the ou-endpointslice-headless branch from 8a6396e to 124a014 Compare June 8, 2023 09:22
@odinuge odinuge requested a review from a team as a code owner June 8, 2023 09:22
@ti-mo
Copy link
Contributor

ti-mo commented Jun 20, 2023

@odinuge thank you for your contribution! I think this would be useful to land in 1.14, but we can hold off until after the release has been cut and treat this as a backport. The tree is currently considered frozen, so now would be a good time to rebase. 😄

@ti-mo ti-mo added kind/enhancement This would improve or streamline existing functionality. dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs labels Jun 20, 2023
@ldelossa
Copy link
Contributor

/test

@ldelossa
Copy link
Contributor

@odinuge, this PR is suffering from two tests that are stuck. This can be fixed by rebasing this PR again main and pushing back up. Can you do this please?

@ldelossa ldelossa removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jun 29, 2023
@squeed
Copy link
Contributor

squeed commented Jul 4, 2023

@odinuge do you have time to rebase this?

I realize all the box-checking is annoying, and we really appreciate the contribution. I'm sorry it takes so much clicking to get stuff in 😞.

@odinuge
Copy link
Member Author

odinuge commented Jul 4, 2023

Yes, I'll rebase when I'm back at my computer at work later today. I've been OOTO for a few weeks (it was also in my gh status until yesterday), so I have not been able to rebase during that time.

@odinuge odinuge force-pushed the ou-endpointslice-headless branch from 124a014 to e6c96a3 Compare July 4, 2023 17:07
@odinuge
Copy link
Member Author

odinuge commented Jul 4, 2023

Rebased now, tho. some earlier changes changed fully how this works, so this won't be possible to backport now 😞

It does however looks like #23977 incorrectly changed how this works, adding this filtering with the wrong arguments... I'll debug some more, but that feels like a big release blocker if its the case..

@odinuge odinuge force-pushed the ou-endpointslice-headless branch from e6c96a3 to 385f75c Compare July 4, 2023 17:50
@odinuge
Copy link
Member Author

odinuge commented Jul 4, 2023

It does however looks like #23977 incorrectly changed how this works, adding this filtering with the wrong arguments... I'll debug some more, but that feels like a big release blocker if its the case..

Yes, so for endpointslices, labels from service to endpointSlice will only be done for k8s v1.20+ (kubernetes/kubernetes#94443). However, for endpoints, it has been done for significantly longer (https://github.com/kubernetes/kubernetes/blame/v1.27.0/pkg/controller/endpoint/endpoints_controller.go#L461-L479).

And as I understand it, cilium still wants to support k8s versions older than v1.20, right? (given I read the current code right)

And to be clear; this PR should mitigate this, and k8s pre v1.20 should start working again. Then, we can concider adding the proxy-name filtering to endpointslices when its safe. Should probably add some more docs and a TODO about that.

Any thought on path forwards @joamaki @squeed?

@odinuge
Copy link
Member Author

odinuge commented Jul 4, 2023

On that note, i'd be curious about the expected behavior of cilium with --k8s-service-proxy-name set, and .toService egress rules are in used (https://github.com/cilium/cilium/pull/13036/files), given it will change what services and endpoints{,lices} we get from the apiserver... But thats up for another day.

@odinuge
Copy link
Member Author

odinuge commented Jul 4, 2023

/test

@squeed squeed added the needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch label Jul 6, 2023
@squeed
Copy link
Contributor

squeed commented Jul 6, 2023

Likewise, it seems like the PR description is no longer accurate; by my reading of the code, we also dropped headless endpointslices before. The bug fix now is to make service name work on <v1.20. Right?

@odinuge
Copy link
Member Author

odinuge commented Jul 6, 2023

@odinuge can you please add the Signed-off-by footer in your commit message? Then we can get this in.

I pushed a new commit yesterday with Signed-off-by, so that comment is pointing to an old commit 😄

Likewise, it seems like the PR description is no longer accurate; by my reading of the code, we also dropped headless endpointslices before. The bug fix now is to make service name work on <v1.20. Right?

Yee. I could also update the commit messages (I did add an edit to this one) again as well, but defer to you.

For the changelog I think its fine/useful having stating the new filtering, given the previous PR that changed the filtering was only a refactor that I assume wasn't intentionally changing the filtering.

@odinuge
Copy link
Member Author

odinuge commented Jul 6, 2023

Oooh, or did v1.14 get its own branch already, and main is now whats getting into v1.15?

@odinuge
Copy link
Member Author

odinuge commented Jul 6, 2023

/test

@squeed
Copy link
Contributor

squeed commented Jul 6, 2023

Oooh, or did v1.14 get its own branch already, and main is now whats getting into v1.15?

bingo. but this should be an easy "backport".

@squeed squeed removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 6, 2023
@squeed
Copy link
Contributor

squeed commented Jul 6, 2023

Yee. I could also update the commit messages (I did add an edit to this one) again as well, but defer to you.

Please do update the commit messages (and maybe squash it to one?). I realize this PR has been a bit of a saga, but the git way is to forget the journey :-).

We'll get this in soon. I appreciate the fix.

@odinuge odinuge force-pushed the ou-endpointslice-headless branch from 3c85df3 to 6726db0 Compare July 6, 2023 12:57
@odinuge odinuge changed the title Stop watching endpoint slices for headless services Fix endpoint slices filtering to ensure we filter out headless services and continue to support older k8s versions where service labels are not propagated to endpoint slices Jul 6, 2023
@odinuge odinuge changed the title Fix endpoint slices filtering to ensure we filter out headless services and continue to support older k8s versions where service labels are not propagated to endpoint slices Fix endpoint slice filtering to ensure we get all the necessary objects Jul 6, 2023
@odinuge
Copy link
Member Author

odinuge commented Jul 6, 2023

Please do update the commit messages (and maybe squash it to one?). I realize this PR has been a bit of a saga, but the git way is to forget the journey :-).

Thanks!

Squashed it and updated the commit msg now 😄

Need some more time to wrap my head around things, but I think I got it right now. Mind sanity checking the last statement to make sure I/we all see this the same way?

For completeness; the end user behavior will now be equal for both
endpoints and endpoint slices; since we will always filter the services
in the same way, and when we get an endpoint slice without a
corresponding service in state, we effectively ignore that endpoint slice.

That is true, right? Since if the service is missing, this will ensure we only add it to that internal cache, and nothing else will use it? Or? https://github.com/cilium/cilium/blob/v1.14.0-snapshot.0/pkg/k8s/service_cache.go#L284-L286

The code base has change sooo much since I went OOTO a few weeks ago... 😅

This fixes the filtering of endpoint slices to ensure that we support
all the k8s versions we intend to. This ensures that we always filter
out endpoint slices with the well-known "headless" label, and
_do not_ filter out any endpoint slices based on the service proxy
label.

In pre Kubernetes v1.20, the labels on a service were not mirrored into
the labels of the endpoint slice. The headless label was not applied.
See PR 94443 in kubernetes/kubernetes for more info.

When no longer supporting Kubernetes v1.20, we can remove this custom
logic - and use the same label filter for endpoints, services and endpoint
slices.

Historically, we had no filters on the endpoint slice objects, but with
the two referred commits, the same filter we had for endpoints and
services was introduced to endpoint slices as part of the refactor. The
reason we don't revert the behavior directly, is that we _do want_ to filter
out endpoint slices for headless services, like we do with normal endpoints.

For completeness; the end user behavior will now be equal for both
endpoints and endpoint slices; since we will always filter the services
in the same way, and when we get an endpoint slice without a
corresponding service in state, we effectively ignore that endpoint slice.

Fixes: ca3a4df ("k8s: Add Resource[*Endpoints] to shared resources")
Fixes: 82a728a ("agent, operator, clustermesh-apiserver: use Resource[*Endpoints]")
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
@odinuge odinuge force-pushed the ou-endpointslice-headless branch from 6726db0 to 2c5d742 Compare July 6, 2023 13:05
@squeed
Copy link
Contributor

squeed commented Jul 6, 2023

I would mention that this fixes the specific use case of using services with service.kubernetes.io/service-proxy-name set on clusters pre v1.20 -- otherwise it's hard to piece together the context.

@christarazi
Copy link
Member

/test

@christarazi
Copy link
Member

Sounds like when Cilium v1.16 is out then we can drop the changes in this PR? That's when K8s 1.20 support would be dropped.

@christarazi christarazi added the sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. label Jul 6, 2023
@odinuge
Copy link
Member Author

odinuge commented Jul 11, 2023

I would mention that this fixes the specific use case of using services with service.kubernetes.io/service-proxy-name set on clusters pre v1.20 -- otherwise it's hard to piece together the context.

Missed this before the approvals. Still want me to add this to changelog and/or commit messages? Feels like things are pretty overloaded with info already tbh., but happy to add if you still want

@julianwiedmann julianwiedmann removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jul 11, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 11, 2023
@julianwiedmann julianwiedmann merged commit 0790d0f into cilium:main Jul 12, 2023
65 checks passed
@jibi jibi mentioned this pull request Jul 13, 2023
13 tasks
@jibi jibi added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Jul 13, 2023
@aanm aanm added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/community-contribution This was a contribution made by a community member. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants