Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support injecting init container to pods backed by ztunnel #49092

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yuval-k
Copy link
Contributor

@yuval-k yuval-k commented Jan 30, 2024

The init container appraoch to solve the CNI race problem. This requires a mutating web hook to inject an init container. The init container will block the application pod from starting, until the ztunnel sockets are present.

Notes:

  • To make the distinction between sidecar and ambient pod we pass in "/redirection/ambient" as part of the webhook http path.
  • This does mean that the ambient pod selection logic is "duplicated" in the webhook config.
  • I'll wait on Untaint controller #48818 to merge first, as i'll want to re-use some the code from there for integration testing.

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jan 30, 2024
@istio-testing
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-policy-bot istio-policy-bot added the area/ambient Issues related to ambient mesh label Jan 30, 2024
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 30, 2024
@yuval-k yuval-k mentioned this pull request Jan 31, 2024
@@ -159,6 +162,7 @@ type Config struct {

const (
SidecarTemplateName = "sidecar"
ZtunnelTemplateName = "ztunnel"
Copy link
Contributor

@bleggett bleggett Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ZtunnelTemplateName = "ztunnel"
ZtunnelTemplateName = "ztunnel"

nit: do we need to call it ztunnel? it strictly speaking doesn't have anything to do with ztunnel, right?

ambient or cni-wait or ambient-init?

Copy link
Contributor

@bleggett bleggett Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: Nevermind. I see, it is waiting for the ztunnel sockets rather than direct CNI readiness.

I still think the name is overly inside-baseball tho, in general I don't wanna proliferate ztunnel in our explicit or implicit APIs. Proxy, node proxy or ambient or CNI are a big enough vocabulary to expose to people.

@sridhargaddam
Copy link
Contributor

sridhargaddam commented Feb 1, 2024

The recently merged blog post conveys the following.

...  node’s istio-cni agent, and block pod startup until the agent successfully configures redirection. Since CNI plugins are invoked by the CRI as early as possible in the Kubernetes pod creation process, this ensures that we can establish traffic redirection early enough to prevent traffic escaping during startup, without relying on things like init containers.

Can you please explain the race condition that this PR is addressing?

@bleggett
Copy link
Contributor

bleggett commented Feb 1, 2024

The recently merged blog post conveys the following.

...  node’s istio-cni agent, and block pod startup until the agent successfully configures redirection. Since CNI plugins are invoked by the CRI as early as possible in the Kubernetes pod creation process, this ensures that we can establish traffic redirection early enough to prevent traffic escaping during startup, without relying on things like init containers.

Can you please explain the race condition that this PR is addressing?

If the CNI agent is already installed on a node before any pods are scheduled on that node, it works like the blog.

If, because you might be scaling up new nodes in an existing cluster, whatever platform you are using happens to schedule pods on those new nodes without waiting for the Istio CNI agent to be started and the plugin to be installed on the node, then the plugin cannot block startup.

The plugin has to be there to block startup, and there are some existing-cluster node scaling scenarios where pods might end up on a node before the CNI agent and the plugin do - and that's the race this is fixing.

@sridhargaddam
Copy link
Contributor

Can you please explain the race condition that this PR is addressing?

If the CNI agent is already installed on a node before any pods are scheduled on that node, it works like the blog.

If, because you might be scaling up new nodes in an existing cluster, whatever platform you are using happens to schedule pods on those new nodes without waiting for the Istio CNI agent to be started and the plugin to be installed on the node, then the plugin cannot block startup.

The plugin has to be there to block startup, and there are some existing-cluster node scaling scenarios where pods might end up on a node before the CNI agent and the plugin do - and that's the race this is fixing.

Thanks for the detailed explanation @bleggett

@zengyuxing007
Copy link
Contributor

zengyuxing007 commented Feb 6, 2024

While I understand that this approach solves the problem of the business pod waiting for the ztunnel to be ready, the init container is not a good way to do it. In a way it looks like Ambient is going back to sidecar mode.

@bleggett
Copy link
Contributor

bleggett commented Feb 6, 2024

While I understand that this approach solves the problem of the business pod waiting for the ztunnel to be ready, the init container is not a good way to do it. In a way it looks like Ambient is going back to sidecar mode.

While it's not really going back to sidecar mode, as sidecars never were able to fully solve this CNI readiness problem, I tend to agree on the optics, which is why, for people that need to manage this, #48818 is likely preferable.

This is a fallback for users that can't or won't use the untaint controller.

@zengyuxing007
Copy link
Contributor

zengyuxing007 commented Feb 7, 2024

While I understand that this approach solves the problem of the business pod waiting for the ztunnel to be ready, the init container is not a good way to do it. In a way it looks like Ambient is going back to sidecar mode.

While it's not really going back to sidecar mode, as sidecars never were able to fully solve this CNI readiness problem, I tend to agree on the optics, which is why, for people that need to manage this, #48818 is likely preferable.

This is a fallback for users that can't or won't use the untaint controller.

I agree with u , the untaint controller is preferable~ But if we use the initContainer approach, I think it would be better to leave the iptables execution logic to ztunnel (instead of istio-cni), since consistency is easier to achieve.

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 6, 2024
@istio-testing
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Mar 8, 2024
@istio-policy-bot istio-policy-bot added the lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. label Mar 23, 2024
@bleggett bleggett reopened this Mar 25, 2024
@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Mar 25, 2024
@bleggett bleggett added the lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed label Mar 25, 2024
@linsun
Copy link
Member

linsun commented Apr 4, 2024

Hi @yuval-k let me know if you plan to update the PR to latest?

Any objection on this approach @hzxuzhonghu or @keithmattix or @howardjohn?

@linsun
Copy link
Member

linsun commented Apr 4, 2024

While I understand that this approach solves the problem of the business pod waiting for the ztunnel to be ready, the init container is not a good way to do it. In a way it looks like Ambient is going back to sidecar mode.

While it's not really going back to sidecar mode, as sidecars never were able to fully solve this CNI readiness problem, I tend to agree on the optics, which is why, for people that need to manage this, #48818 is likely preferable.
This is a fallback for users that can't or won't use the untaint controller.

I agree with u , the untaint controller is preferable~ But if we use the initContainer approach, I think it would be better to leave the iptables execution logic to ztunnel (instead of istio-cni), since consistency is easier to achieve.

It'd be up to the user/provider to decide which approach (untaint or use this initcontainer approach). What we are providing is the 2nd choice as we have heard from the community some folks don't like untaint.

@howardjohn
Copy link
Member

Alternative idea: what if we try out the 'device plugin' idea for ambient here? #40303 (comment)

Its superior to this method if it works since it has no runtime cost; init containers are slow and add non-trivial latency to pod startup (can be seconds, which can be 2x for fast containers)

I don't mind this as well if its ready to go and we want to followup no device plugin long term. It doesn't need to block.

Note: device plugin would work with sidecars, too, but it might be a good opportunity to try the slightly riskier approach on new code vs old code

@bleggett
Copy link
Contributor

bleggett commented Apr 4, 2024

Alternative idea: what if we try out the 'device plugin' idea for ambient here? #40303 (comment)

Its superior to this method if it works since it has no runtime cost; init containers are slow and add non-trivial latency to pod startup (can be seconds, which can be 2x for fast containers)

I don't mind this as well if its ready to go and we want to followup no device plugin long term. It doesn't need to block.

Note: device plugin would work with sidecars, too, but it might be a good opportunity to try the slightly riskier approach on new code vs old code

I agree it's less invasive than init containers. We could probably reuse the node tainter as the DRA driver, and let you pick either method by configuring the same component.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ambient Issues related to ambient mesh do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed needs-rebase Indicates a PR needs to be rebased before being merged size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants