Explicitly fail when we are asked to add a duplicate ip to the ipset #51081
Conversation
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Can the node level interception be deprecated now?
What do you mean, exactly? We don't do node level interception at all, but we still do need to do node level SNAT for kubelet healthchecks. This doesn't change that; this just works around a bug that could exist there.
OK, NVM, I thought it was interception on the node.
Why do we still need this? If ambient behaves like sidecar interception, we can eliminate the host ipset completely.
Even for sidecars we must special-case host kubelet probes - with sidecars we just do it with webhook rewrites for the pod manifest. Ambient doesn't have mutating webhooks, so we do it this way instead. Sidecar mutates the pod manifest to redirect the health probes; ambient mutates the source IP to redirect the health probes. It's tricky.
Yes, we know the src address of the kubelet probe, so we can do that better than sidecar by bypassing this kind of traffic.
We don't know the source address of the kubelet probe. The current approach in ambient doesn't need to know it. Mechanisms to reliably derive the kubelet IP from the perspective of the pod are very CNI dependent. We don't bother to derive the pod-facing kubelet IP with ambient; we just use iptables to tell us if the packet originated locally on the node, from the host node netns. Also, not all packets coming from the kubelet IP originate from the local kubelet process - the kubelet source IP is just not a reliable check, which is why we don't use it. It's tricky. Asserting policy based on "kubelet src IP" is just an unavoidably unsound approach, no matter how you slice it.
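For readers following along, here is a rough sketch of the host-netns SNAT idea described above, written as a small Go program that shells out to iptables. This is not the actual istio-cni rule set: the ipset name (`istio-latest-pods-ips`) and the link-local marker address (`169.254.7.127`) are assumptions used only for illustration.

```go
// A minimal sketch, assuming iptables and the ipset already exist on the node.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// In the HOST network namespace: any locally-originated TCP packet whose
	// destination is a pod IP tracked in the probe ipset gets SNATed to a
	// fixed link-local address. In-pod rules can then recognise "this came
	// from the node itself" (e.g. a kubelet probe) without ever needing to
	// know the kubelet's source IP, which is CNI-dependent and unreliable.
	args := []string{
		"-t", "nat", "-A", "POSTROUTING",
		"-p", "tcp",
		"-m", "owner", "--socket-exists", // match only locally-originated traffic
		"-m", "set", "--match-set", "istio-latest-pods-ips", "dst", // assumed ipset name
		"-j", "SNAT", "--to-source", "169.254.7.127", // assumed marker address
	}
	if out, err := exec.Command("iptables", args...).CombinedOutput(); err != nil {
		fmt.Printf("iptables failed: %v\n%s", err, out)
	}
}
```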
// Since we purge on restart of CNI, and remove pod IPs from the set on every pod removal/deletion,
// we _shouldn't_ get any overwrite/overlap, unless something is wrong and we are asked to add
// a pod by an IP we already have in the set (which will give an error, which we want).
if err := hostsideProbeSet.AddIP(pip, ipProto, podUID, false); err != nil {
So if we hit the error, we fail up to the CNI layer, which retries, and now the ordering should be correct. Right? It will recover?
Right. Inelegant, but it will work itself out for now (if the hypothesis about the cause is in fact correct)
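To illustrate the recovery path discussed here, a minimal sketch of how such an error surfaces through a CNI plugin built on the reference `skel` package from containernetworking/cni: a non-nil error returned from the ADD handler fails pod sandbox setup, and the kubelet/container runtime retries it. This is not how istio-cni is actually wired up; `addPodIPToIpset` below is a hypothetical stand-in.

```go
// A minimal sketch, assuming a CNI plugin built directly on the reference
// github.com/containernetworking/cni skel package.
package main

import (
	"errors"

	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/version"
)

// addPodIPToIpset stands in for the real ipset bookkeeping; with the change in
// this PR it returns an error if the pod IP is already present in the set.
func addPodIPToIpset(args *skel.CmdArgs) error {
	return errors.New("ipset add failed: IP already present in set")
}

func cmdAdd(args *skel.CmdArgs) error {
	if err := addPodIPToIpset(args); err != nil {
		// Returning an error fails the CNI ADD. The kubelet/runtime retries
		// sandbox setup, by which point the stale entry should be gone.
		return err
	}
	// A real plugin would emit a CNI result on success; omitted in this sketch.
	return nil
}

func cmdCheck(args *skel.CmdArgs) error { return nil }
func cmdDel(args *skel.CmdArgs) error   { return nil }

func main() {
	skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, "duplicate-ip failure sketch")
}
```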
In response to a cherrypick label: #51081 failed to apply on top of branch "release-1.22":
In response to a cherrypick label: new issue created for failed cherrypick: #51131
…stio#51081)
* Explicitly fail when we are asked to add a duplicate ip to the ipset
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
* Add relnotes
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
* Test fixups
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
---------
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
* upstream/master:
  Automator: update ztunnel@master in istio/istio@master (istio#51143)
  Automator: update proxy@master in istio/istio@master (istio#51142)
  Automator: update go-control-plane in istio/istio@master (istio#51141)
  Automator: update ztunnel@master in istio/istio@master (istio#51139)
  Automator: update proxy@master in istio/istio@master (istio#51138)
  Work around k8s 1.23/24 templating bugs (istio#51135)
  Drop legacy references to 1.22 (istio#51127)
  Automator: update proxy@master in istio/istio@master (istio#51133)
  Workaround weird cluster versions (istio#51128)
  Explicitly fail when we are asked to add a duplicate ip to the ipset (istio#51081)
  State what we did, rather than why (which is somewhat moot) (istio#51123)
  `istio-cni` - use templated env from cm, simplify config (istio#51050)
  Automator: update ztunnel@master in istio/istio@master (istio#51122)
  rename confused function (istio#51118)
  Automator: update proxy@master in istio/istio@master (istio#51120)
  Clean up to improve readability (istio#51117)
  Fix data race in discovery filter (istio#51048)
A user is (somehow) able to produce a scenario where, without `istio-cni` restarting, a pod IP goes missing from the host node ipset, causing an active pod to begin failing host node healthchecks.

Since we destructively mutate the ipset contents in exactly 3 spots:

1. `istio-cni` startup
2. `istio-cni` shutdown
3. pod removal/deletion (removing that pod's IP from the set)

and no `istio-cni` restart happens as per logs, that leaves (3) as the only way this could manifest, barring outside interference with the contents of the ipset.

The only scenario where I can imagine this happening is if `pod add` and `pod remove` events for those two pods both overlap, and come in to us out-of-order - we `add` the new pod with the reused IP, and then somehow later get a `remove` event for the old pod with the original use of the IP, leading to the set being out of sync.
This PR changes the `addPodIPToIpset` logic so that if we try to add a pod IP that is already there, we log/return an error - previously, we would silently overwrite the existing entry.

This will at least help us see if the above happens, and cause the pod with the overlapping IP to fail to start - otherwise we should never legitimately have an overwrite.
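For illustration, here is a minimal, in-memory sketch of that behaviour change. The real code drives a kernel ipset (via the `hostsideProbeSet.AddIP` call shown in the review above); the map-backed `probeSet` below is a hypothetical stand-in used only to show failing on a duplicate versus the old silent overwrite.

```go
// A simplified sketch only; the types and names here are assumptions.
package main

import (
	"fmt"
	"net/netip"
)

// probeSet maps a pod IP to the UID of the pod that owns it.
type probeSet struct {
	entries map[netip.Addr]string
}

// AddIP mirrors the shape of the real call: with replace=false a duplicate IP
// is an error; with replace=true (the old behaviour) it is silently overwritten.
func (s *probeSet) AddIP(ip netip.Addr, podUID string, replace bool) error {
	if owner, ok := s.entries[ip]; ok && !replace {
		return fmt.Errorf("ip %s already in set (owned by pod %s), refusing to add for pod %s", ip, owner, podUID)
	}
	s.entries[ip] = podUID
	return nil
}

// RemoveIP drops an entry, as the real code does on pod removal/deletion.
func (s *probeSet) RemoveIP(ip netip.Addr) { delete(s.entries, ip) }

func main() {
	set := &probeSet{entries: map[netip.Addr]string{}}
	ip := netip.MustParseAddr("10.0.0.5")

	// Normal flow: add the IP for pod A; it is removed again when pod A is deleted.
	_ = set.AddIP(ip, "pod-a-uid", false)

	// Hypothesised out-of-order flow: pod B reuses the IP before the remove
	// event for pod A arrives. With replace=false this now fails loudly
	// instead of silently overwriting pod A's entry.
	if err := set.AddIP(ip, "pod-b-uid", false); err != nil {
		fmt.Println("explicit failure:", err)
	}
}
```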
(We might have missed this in testing months back because we were using `ip:port` maps, and the odds of two pods using the same IP AND the same ports is even tinier.)