Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgp,bugfix: parse ips when converting from slim_core to k8s service #18358

Merged
merged 1 commit into from
Jan 12, 2022

Conversation

ldelossa
Copy link
Contributor

@ldelossa ldelossa commented Jan 3, 2022

this fixes #16967.

previous to this commit, when converting from a slim_core Service to our
k8s Service abstraction the parsed out loadBalancersIP were not
converted to the appropriate map of net.IP objects.

they were not converted because of a guard which only does this for when
"NodePort" configuration is set to true.

with the introduction of the BGP load balancer announcement feature, we
also need to parse IPs coming from the ServiceStatus field in a
slim_core Service.

If this change is not made, both the old and new services look exactly
the same to the k8sWatcher infrastructure and when the Service gets its
load balancer IP the entire event is thrown away.

Signed-off-by: Louis DeLosSantos louis.delos@isovalent.com

@ldelossa ldelossa requested review from a team and sayboras January 3, 2022 21:20
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 3, 2022
@sayboras sayboras added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Jan 5, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 5, 2022
Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the change looks simple, there is a lot of wisdom behind, I am going through the discussion in github issue to understand more. Right now, I don't even understand why nodeport check is there in the first place 🤔

Will be great if other team member can help to review this instead.

@sayboras sayboras requested a review from a team January 5, 2022 11:41
@sayboras sayboras removed their assignment Jan 5, 2022
@ldelossa
Copy link
Contributor Author

ldelossa commented Jan 5, 2022

@sayboras That is where I'm currently at as well. Doing a deep dive.

It appears the linked issue can be "worked-around" by ensuring "NodePort" is enabled off the bat. I didn't suggest this yet because I need to confirm.

Also as I mention in this thread: https://cilium.slack.com/archives/C2B917YHE/p1641328782392200?thread_ts=1641245029.380700&cid=C2B917YHE

I believe a BPF backed NodePort should be completely orthogonal to the BGP Announce LB feature, as that feature just cares about getting external traffic to a load balancer, not whether that load balancer is backed by a BPF node port impl, or the vanilla K8s impl. So part of me feels like this PR will actually change into decoupling the NodePort requirement from the BGP Announce LB feature all together.

CC @aanm may have some insight.

@ldelossa ldelossa marked this pull request as draft January 5, 2022 15:07
@ldelossa ldelossa requested a review from aanm January 5, 2022 16:27
@@ -1517,14 +1517,6 @@ func initEnv(cmd *cobra.Command) {
}
}

// This is necessary because the code inside pkg/k8s.NewService() for
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes away, the BGP LB Announcement features are not coupled to the NodePort configuration.

@ldelossa ldelossa marked this pull request as ready for review January 6, 2022 17:56
@ldelossa ldelossa requested review from a team and kkourt January 6, 2022 17:56
@ldelossa
Copy link
Contributor Author

ldelossa commented Jan 6, 2022

Discussed this with @brb @aanm (separately)

The reason we do not parse IPs when "EnableNodePort" is false is to save on CPU cycles. When the umbrella feature flag "EnableNodePort" is disabled, Cilium does not care about Service events triggered by LoadBalancer Status updates.

However, when the BGP Announce LB feature is enabled, Cilium must care about these updates in order for the k8s Watcher infra to generate a new Service event.

The decision was made to keep the original "EnableNodePort" flag to continue to budget CPU if we don't need to spend it, but if BGP announcement feature is on, parse out the loadBalancerIPs so feature works without being coupled to "EnableNodePort" feature flag.

@@ -128,7 +128,7 @@ func (s *MetalLBSpeaker) OnUpdateService(svc *slim_corev1.Service) error {
}

l.Debug("adding event to queue")
s.queue.Add(epEvent{
s.queue.Add(svcEvent{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good citizen change, this is mislabeled but causes no functional difference in the code, other then logging the wrong event type.

@ldelossa
Copy link
Contributor Author

ldelossa commented Jan 6, 2022

/test

@joestringer joestringer added this to Needs backport from master in 1.11.1 Jan 6, 2022
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes sense to me.

pkg/k8s/service.go Outdated Show resolved Hide resolved
Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (just a non-blocking comment).

daemon/cmd/daemon_main.go Show resolved Hide resolved
@ldelossa ldelossa added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 11, 2022
@ldelossa
Copy link
Contributor Author

/test

@ldelossa ldelossa removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 11, 2022
@ldelossa
Copy link
Contributor Author

Latest failures are due to vagrant infra:

15:59:17  Failed in branch Copy code and boot vms

Rerunning tests.

@ldelossa
Copy link
Contributor Author

/test

this fixes cilium#16967.

previous to this commit, when converting from a slim_core Service to our
k8s Service abstraction the parsed out loadBalancersIP were not
converted to the appropriate map of net.IP objects.

they were not converted because of a guard which only does this for when
"NodePort" configuration is set to true.

with the introduction of the BGP load balancer announcement feature, we
also need to parse IPs coming from the ServiceStatus field in a
slim_core Service.

If this change is not made, both the old and new services look exactly
the same to the k8sWatcher infrastructure and when the Service gets its
load balancer IP the entire event is thrown away.

Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
@pchaigno
Copy link
Member

pchaigno commented Jan 12, 2022

/test

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sServicesTest Checks graceful termination of service endpoints Checks client terminates gracefully on service endpoint deletion

Failure Output

FAIL: Timed out after 60.001s.

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.23-kernel-net-next' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: terminating containers are not deleted after timeout

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.23-kernel-net-next so I can create a new GitHub issue to track it.

@ldelossa
Copy link
Contributor Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 12, 2022
@aditighag aditighag merged commit 2275759 into cilium:master Jan 12, 2022
@ldelossa ldelossa deleted the ldelossa/announce-lb-fix branch January 13, 2022 16:41
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.11.1 Jan 14, 2022
@joestringer joestringer added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Jan 18, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.11 in 1.11.1 Jan 18, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.11 in 1.11.1 Jan 18, 2022
@aditighag
Copy link
Member

@ldelossa FYI - This bug fix warranted a release note as it's user-facing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.11.1
Backport done to v1.11
Development

Successfully merging this pull request may close these issues.

BGP announcements not occurring on single-node cluster
9 participants