Support injecting init container to pods backed by ztunnel #49092
base: master
Skipping CI for Draft Pull Request.
@@ -159,6 +162,7 @@ type Config struct {

 const (
 	SidecarTemplateName = "sidecar"
+	ZtunnelTemplateName = "ztunnel"
nit: do we need to call it ztunnel? It strictly speaking doesn't have anything to do with ztunnel, right? ambient or cni-wait or ambient-init?
EDIT: Never mind. I see, it is waiting for the ztunnel sockets rather than direct CNI readiness.
I still think the name is overly inside-baseball though. In general I don't want to proliferate ztunnel in our explicit or implicit APIs. Proxy, node proxy, ambient, or CNI are a big enough vocabulary to expose to people.
The recently merged blog post conveys the following.
Can you please explain the race condition that this PR is addressing?
If the CNI agent is already installed on a node before any pods are scheduled on that node, it works as the blog describes. If your platform schedules pods on new nodes without waiting for the Istio CNI agent to start and the plugin to be installed (for example, when scaling up new nodes in an existing cluster), then the plugin cannot block startup. The plugin has to be there to block startup, and in some existing-cluster node scaling scenarios pods can end up on a node before the CNI agent and the plugin do. That is the race this is fixing.
Thanks for the detailed explanation @bleggett.
While I understand that this approach solves the problem of the business pod waiting for ztunnel to be ready, an init container is not a good way to do it. In a way, it looks like Ambient is going back to sidecar mode.
While it's not really going back to sidecar mode, as sidecars never were able to fully solve this CNI readiness problem, I tend to agree on the optics, which is why, for people that need to manage this, #48818 is likely preferable. This is a fallback for users that can't or won't use the untaint controller.
I agree with you, the untaint controller is preferable. But if we use the initContainer approach, I think it would be better to leave the iptables execution logic to ztunnel (instead of istio-cni), since consistency is easier to achieve.
PR needs rebase.
Hi @yuval-k, let me know if you plan to update the PR to latest. Any objections to this approach, @hzxuzhonghu, @keithmattix, or @howardjohn?
It'd be up to the user/provider to decide which approach to use (untaint or this init container approach). What we are providing is a second choice, as we have heard from the community that some folks don't like untaint.
Alternative idea: what if we try out the 'device plugin' idea for ambient here? #40303 (comment) It's superior to this method if it works, since it has no runtime cost; init containers are slow and add non-trivial latency to pod startup (can be seconds, which can be 2x for fast containers). I don't mind this as well if it's ready to go and we want to follow up on the device plugin long term. It doesn't need to block. Note: the device plugin would work with sidecars too, but it might be a good opportunity to try the slightly riskier approach on new code vs. old code.
I agree it's less invasive than init containers. We could probably reuse the node tainter as the DRA driver, and let you pick either method by configuring the same component.
This PR implements the init container approach to solving the CNI race problem. It requires a mutating webhook to inject an init container. The init container blocks the application pod from starting until the ztunnel sockets are present.
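To make the blocking behavior concrete, here is a minimal sketch of what such an init container could run: poll for a Unix socket and exit successfully only once it exists. This is not the code from this PR. The names `socketPresent` and `waitUntil`, the polling interval, and the timeout are all assumptions for illustration, and the socket path is taken from the command line because the real path is not stated in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// errTimeout is returned when the readiness check never succeeds in time.
var errTimeout = errors.New("timed out waiting for readiness")

// socketPresent reports whether path exists and is a Unix domain socket.
// (Hypothetical helper; the PR may check readiness differently.)
func socketPresent(path string) bool {
	info, err := os.Stat(path)
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeSocket != 0
}

// waitUntil polls ready every interval until it returns true,
// or returns errTimeout once timeout has elapsed.
func waitUntil(ready func() bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if ready() {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: wait-for-socket <socket-path>")
		return
	}
	path := os.Args[1]
	// Interval and timeout are illustrative values, not taken from the PR.
	err := waitUntil(func() bool { return socketPresent(path) },
		500*time.Millisecond, 2*time.Minute)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // a failing init container keeps the app containers from starting
	}
}
```

Because Kubernetes does not start application containers until every init container has exited successfully, a non-zero exit here keeps the pod from starting, which is the blocking effect described above. The polling loop is also where the startup latency raised earlier in the thread comes from.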
Notes: