[Feature] Early podIP notification to better co-operate with Calico in policy-only mode #4385

Open
fasaxc opened this issue Jul 4, 2024 · 5 comments
Labels
feature-request Requested Features

Comments

@fasaxc

fasaxc commented Jul 4, 2024

Is your feature request related to a problem? Please describe.

Azure CNI and Calico already work together, and this configuration brings the "best of both worlds": native Azure IPs for Pods, plus the Calico policy engine for those who want it.

However, there is a long-standing issue in the kubelet where, under load, it is slow to write the podIP back to the Pod resource. It can take 20s in extreme cases (one write fails and then it does a slow retry some time later). This is a problem for Calico in policy-only mode: we don't get the CNI kick, so the only way we can get the podIP to render into policy is to read it from the datastore. This can result in very slow policy updates as we wait for kubelet to write back the podIP.

Describe the solution you'd like

I think my preferred solution would be for AKS to allow Calico to insert a chained CNI ahead of the AKS one so that we can snoop on the CNI ADD/DEL. This would allow us to apply our own annotation and it wouldn't require any configuration on the AKS side. The ask is just to make sure that we can drop a 00-calico-chain.conf into /etc/cni/net.d that runs ahead of the AKS one. We'd then call through to the AKS one.
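To illustrate why the file name matters, here is a minimal sketch of the selection rule, assuming the runtime sorts the config directory lexicographically and loads the first CNI config it finds. The function `firstCNIConfig` and the file name `10-azure.conflist` are hypothetical, used only for illustration; the real loader lives in the container runtime and may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// firstCNIConfig returns the file that would be loaded by a runtime that
// sorts the config directory lexicographically and takes the first CNI
// config file. It returns "" when no candidate is present.
// Assumption: only .conf, .conflist and .json files are considered.
func firstCNIConfig(names []string) string {
	var candidates []string
	for _, n := range names {
		switch filepath.Ext(n) {
		case ".conf", ".conflist", ".json":
			candidates = append(candidates, n)
		}
	}
	if len(candidates) == 0 {
		return ""
	}
	sort.Strings(candidates)
	return candidates[0]
}

func main() {
	// Illustrative directory listing; "10-azure.conflist" is an assumed name.
	files := []string{"10-azure.conflist", "00-calico-chain.conf", "README.txt"}
	fmt.Println(firstCNIConfig(files)) // prints 00-calico-chain.conf
}
```

Under that assumption, a `00-` prefix sorts ahead of any two-digit prefix starting with `1` or higher, which is all the proposal relies on.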

Describe alternatives you've considered

When using Calico CNI, we solve this problem by writing an annotation back to the pod using a PATCH request in the CNI plugin itself. So, AKS CNI could do the same: write back an annotation (either the Calico one or your own, which we can then use in the same way). AWS VPC CNI has the option to apply such an annotation and that does solve the problem. However:

  • AWS VPC CNI only writes the annotation if configured to do so. This adds a hard-to-discover extra (manual) step, which is less than ideal.
  • We recently added a feature that allows Calico CNI to wait for policy to be fully applied and ready to go before continuing. This prevents pods from starting before they have connectivity in case Calico is slow on its side. Adding the annotation wouldn't help with this problem; we also want the chained CNI to solve that...
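For reference, the annotation write-back mentioned above amounts to a small merge patch on the Pod. The sketch below only builds the patch body; the helper name `podIPAnnotationPatch` is hypothetical, the annotation keys `cni.projectcalico.org/podIP` and `cni.projectcalico.org/podIPs` are my assumption of what Calico reads, and actually sending the PATCH would need a Kubernetes client, which is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// podIPAnnotationPatch builds a JSON merge patch that sets pod IP
// annotations on a Pod. The annotation keys are assumptions for this
// sketch; another CNI could use its own keys in the same way.
func podIPAnnotationPatch(ips []string) ([]byte, error) {
	if len(ips) == 0 {
		return nil, fmt.Errorf("no pod IPs to annotate")
	}
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": map[string]string{
				"cni.projectcalico.org/podIP":  ips[0],
				"cni.projectcalico.org/podIPs": strings.Join(ips, ","),
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	// Example address only; a real plugin would use the IP it just assigned.
	body, err := podIPAnnotationPatch([]string{"10.240.0.12/32"})
	if err != nil {
		fmt.Println("building patch:", err)
		return
	}
	fmt.Println(string(body))
}
```

The point of doing this from the CNI plugin is that the patch goes out as soon as the IP is known, rather than waiting for the kubelet status update.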

Additional context

Kubelet bug: kubernetes/kubernetes#39113. There were a few attempts to track down the root cause, but we never quite cracked it, and the annotation workaround did the trick, so we moved on.

@paulgmiller
Member

@wedaly @rbtr will be interested in this.
Your solution sounds okay to me (though maybe 09, as that leaves us a few spaces to work with)? But want to give

We might also want to have a Slack convo about Azure IPAM plugins and whether the managed Calico on our side should use them instead of chaining, but that is a larger change.

@wedaly
Member

wedaly commented Jul 5, 2024

I'm a bit concerned this approach would break if the CNI plugin name changes. I think there's at least one case where the CNI binary name changed as part of a migration on Windows. @rbtr what do you think?

@rbtr

rbtr commented Jul 5, 2024

I am curious if the Calico chained plugin would actually need to run first? This makes me think the first plugin may get special treatment.

I have heard that containerd is deprecating the .conf format, so this would need to be a single chained conflist (I have also heard that systemd-style conf.d/ drop-ins may be coming, which could make this easier). I am opposed to anyone else dropping an unmanaged conflist that takes precedence over ours: we expect to fully manage that config, and when others make assumptions about it, that will inevitably break things, as @wedaly noted.

@eyltl

eyltl commented Jul 15, 2024

@paulgmiller is Azure going to implement something to mitigate this?

@aojea

aojea commented Oct 26, 2024

@fasaxc there is a much simpler solution if you can implement NRI functionality in Calico to interface with the container runtime.
NRI plugins allow seamless integration with the CNI plugins:

  • the PodStart hook happens immediately after the CNI plugin returned (see
  • it does not require modifying existing installations (container runtimes need to enable NRI plugins; this will be enabled by default in containerd 2.0)
  • NRI plugins receive the existing state at startup, so it is possible to reconcile current state and do NRI plugins updates without risk of missing events

See containerd/nri#119 for a cleaner integration if the Pod IPs are passed directly, but in the meantime you have access to the network namespace, so you can do some "namespace eavesdropping" and get the IP directly from it.
