Cilium in EKS without kube-proxy #10462

Closed
errordeveloper opened this issue Mar 4, 2020 · 24 comments
Labels
area/documentation: Impacts the documentation, including textual changes, sphinx, or other doc generation code.
kind/feature: This introduces new functionality.
pinned: These issues are not marked stale by our issue bot.

Comments

@errordeveloper
Contributor

Proposal / RFE

Is your feature request related to a problem?

Cilium can replace kube-proxy, so it should be possible to do this in EKS as well.

Describe the solution you'd like

Verify that it works, and update the EKS documentation to show how to run Cilium without kube-proxy.
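For illustration, a minimal sketch of what such an install could look like with Helm (the values and the API server placeholder below are assumptions for the sketch, not the verified procedure this issue asks to document):

# Sketch: kube-proxy-free Cilium install on an EKS cluster (values are illustrative).
# Without kube-proxy, the agent needs the API server endpoint passed explicitly.
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=<EKS_API_SERVER_ENDPOINT> \
  --set k8sServicePort=443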

@errordeveloper errordeveloper added kind/feature This introduces new functionality. area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. area/kube-proxy-free labels Mar 4, 2020
@errordeveloper
Contributor Author

Notes on testing this from @brb and @nebril:

@errordeveloper
Contributor Author

errordeveloper commented Mar 5, 2020

The current EKS AMI (e.g. ami-0907724389e8705d9 in us-west-2) ships kernel version 4.14 (4.14.165-133.209.amzn2.x86_64 to be exact), while 4.19 will be required. There is a package for it, but using it would require a node reboot or a custom AMI.
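For anyone reproducing this, a rough sketch of moving a node to the newer kernel in place (the amazon-linux-extras topic name is taken from a later comment in this thread; treat the exact steps as an assumption rather than a recommendation):

# Check the running kernel (4.19.57+ is needed for the full kube-proxy replacement)
uname -r
# Install the newer kernel package on Amazon Linux 2 and reboot into it
sudo amazon-linux-extras install -y kernel-ng
sudo reboot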

@brb
Member

brb commented Mar 5, 2020

@errordeveloper Do you know which 4.19 version exactly (asking, as we require 4.19.57 or newer)?

@errordeveloper
Contributor Author

They call it 4.19.84-33.70.amzn2.

@brb
Member

brb commented Mar 5, 2020

Nice, then all features of the kube-proxy replacement should work.

@errordeveloper
Contributor Author

I guess we could create an AMI if we want to show it working in a demo or blog post. I'll check the Ubuntu AMIs as well.

@brb
Member

brb commented Mar 5, 2020

We probably don't want to maintain an AMI if one with the required kernel version already exists.

@errordeveloper
Contributor Author

Yeah, I wouldn't suggest creating one that we'd maintain, more of a one-off; presumably newer kernels will ship by default in the AL2 AMIs soon enough.

@errordeveloper
Contributor Author

So the Ubuntu AMI is 18.04.3 and it has kernel version 4.15.0-1057-aws. 18.04.4, released last month, actually ships with 5.3, so that might come to EKS sometime soon (you never know, but Canonical has been updating the EKS AMIs very proactively in my experience).

@mogren

mogren commented Mar 5, 2020

About the disable-the-aws-node-daemonset-eks-only documentation: why replace the image instead of just deleting the whole DaemonSet? The same applies to kube-proxy and CoreDNS; they are just "add-ons" applied to the cluster when it's created.
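For reference, a sketch of the two approaches being compared (commands are illustrative and not taken from the linked docs):

# Option A: delete the add-on DaemonSets outright
kubectl -n kube-system delete daemonset aws-node
kubectl -n kube-system delete daemonset kube-proxy

# Option B: keep the DaemonSet but neutralize it, e.g. by patching in a node
# selector that matches no nodes (one possible approach, shown here as an assumption)
kubectl -n kube-system patch daemonset aws-node --type=strategic \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"io.cilium/aws-node-enabled":"true"}}}}}'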

@errordeveloper
Contributor Author

This was actually fixed earlier in #10461; the docs from master just haven't been published yet.

@stale

stale bot commented May 5, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label May 5, 2020
@brb brb added pinned These issues are not marked stale by our issue bot. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels May 5, 2020
@borkmann borkmann moved this from TODO (untriaged & unsorted) to WIP (Nate) in 1.9 kube-proxy removal & general dp optimization Jul 10, 2020
@borkmann borkmann moved this from WIP (Nate) to WIP (Aditi) in 1.9 kube-proxy removal & general dp optimization Jul 10, 2020
@tgraf tgraf added the priority/high This is considered vital to an upcoming release. label Sep 2, 2020
@jaygorrell

I'm an EKS user having problems installing Cilium, so I'm dropping notes here after talking with @tgraf in Slack.

We're using the latest official AMI, but it runs through a userdata script that sets node labels, installs the Nessus agent, and, probably most relevant, runs amazon-linux-extras install -y kernel-ng followed by a reboot. This gets us to a 5.x kernel.

Installing Cilium results in intermittent connection failures and latency to different systems. I did confirm drops with Cilium itself:

>> IPCache entry upserted: {"cidr":"10.40.140.84/32","id":51826,"host-ip":"10.40.131.238","encrypt-key":0,"namespace":"default","pod-name":"jenkins-python-pod-46slj-l5lkr"}
>> IPCache entry upserted: {"cidr":"10.40.137.43/32","id":51826,"old-id":3,"host-ip":"10.40.131.238","old-host-ip":"10.40.131.238","encrypt-key":0,"namespace":"default","pod-name":"jenkins-python-pod-46slj-99skr"}
>> IPCache entry upserted: {"cidr":"10.40.137.220/32","id":51826,"old-id":3,"host-ip":"10.40.131.238","old-host-ip":"10.40.131.238","encrypt-key":0,"namespace":"default","pod-name":"jenkins-python-pod-46slj-xb86n"}
>> IPCache entry upserted: {"cidr":"10.40.129.48/32","id":51826,"old-id":3,"host-ip":"10.40.131.238","old-host-ip":"10.40.131.238","encrypt-key":0,"namespace":"default","pod-name":"jenkins-python-pod-46slj-5mz0j"}
level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (FIB lookup failed) flow 0x4a3f4a3f to endpoint 0, identity 0->0: 10.40.145.114:51075 -> 10.40.154.177:8888 tcp SYN
xx drop (FIB lookup failed) flow 0x4a3f4a3f to endpoint 0, identity 0->0: 10.40.145.114:51075 -> 10.40.154.177:8888 tcp SYN
>> IPCache entry deleted: {"cidr":"10.40.145.108/32","id":49250,"old-host-ip":"10.40.146.67","encrypt-key":0,"namespace":"default","pod-name":"jenkins-kubectl-aws-pod-p6z4c-8q1q6"}
>> IPCache entry deleted: {"cidr":"10.40.159.63/32","id":51826,"old-host-ip":"10.40.146.67","encrypt-key":0,"namespace":"default","pod-name":"jenkins-python-pod-3jjgc-3lf1x"}
xx drop (FIB lookup failed) flow 0x4a3f4a3f to endpoint 0, identity 0->0: 10.40.145.114:51075 -> 10.40.154.177:8888 tcp SYN
>> Endpoint regenerated: {"id":1837,"labels":["k8s:usage=ci","k8s:lifecycle=normal","reserved:host","k8s:node.kubernetes.io/instance-type=m5.xlarge","k8s:topology.kubernetes.io/region=us-west-2","k8s:topology.kubernetes.io/zone=us-west-2b"]}
>> IPCache entry upserted: {"cidr":"10.40.169.197/32","id":3,"host-ip":"10.40.171.181","encrypt-key":0,"namespace":"default","pod-name":"integration-tests-dev-test-zmkls"}
xx drop (FIB lookup failed) flow 0x4a3f4a3f to endpoint 0, identity 0->0: 10.40.145.114:51075 -> 10.40.154.177:8888 tcp SYN
xx drop (FIB lookup failed) flow 0x4db54db5 to endpoint 0, identity 0->0: 10.40.145.114:65269 -> 10.40.154.177:80 tcp SYN
xx drop (FIB lookup failed) flow 0x4db54db5 to endpoint 0, identity 0->0: 10.40.145.114:65269 -> 10.40.154.177:80 tcp SYN

And the connectivity test results:

echo-a-58dd59998d-lfg4c                                  1/1     Running   0          4m56s
echo-b-865969889d-mkt9m                                  1/1     Running   0          4m55s
echo-b-host-659c674bb6-f9vh4                             1/1     Running   0          4m55s
host-to-b-multi-node-clusterip-6fb94d9df6-rtnds          1/1     Running   0          4m54s
host-to-b-multi-node-headless-7c4ff79cd-k5rql            1/1     Running   0          4m54s
pod-to-a-5c8dcf69f7-zmgms                                1/1     Running   0          4m52s
pod-to-a-allowed-cnp-75684d58cc-22f54                    0/1     Running   4          4m53s
pod-to-a-external-1111-669ccfb85f-bswlg                  1/1     Running   0          4m51s
pod-to-a-l3-denied-cnp-7b8bfcb66c-75b6q                  1/1     Running   0          4m53s
pod-to-b-intra-node-74997967f8-q5g59                     1/1     Running   0          4m52s
pod-to-b-intra-node-nodeport-775f967f47-l7wnh            1/1     Running   1          4m52s
pod-to-b-multi-node-clusterip-587678cbc4-7729p           1/1     Running   0          4m51s
pod-to-b-multi-node-headless-574d9f5894-59nkt            1/1     Running   1          4m51s
pod-to-b-multi-node-nodeport-7944d9f9fc-2k4qr            0/1     Running   1          4m51s
pod-to-external-fqdn-allow-google-cnp-6dd57bc859-b2vbm   0/1     Running   4          4m50s

After some experimenting in a test cluster, we narrowed this down to NodeLocalDNS, which seems to hit a known issue on recent kernels when kubeProxyReplacement is set to "probe" (the Helm default). Setting it to "disabled" fixes the problem.

Taking this to my real clusters, I did some quick testing. Setting kubeProxyReplacement to "disabled" fixes things as expected, but I also tried uninstalling NodeLocalDNS completely (removed the DaemonSet and the kubelet arg, and replaced each node) and got similar results: failures without kube-proxy, but disabling the replacement works fine.
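For anyone following along, the toggle in question is just a Helm value; a sketch of the two states tested above (assuming the release is named cilium in kube-system):

# State that breaks in this environment: Cilium probes for and takes over kube-proxy duties
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set kubeProxyReplacement=probe

# Workaround that restores connectivity: keep kube-proxy, disable the replacement
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set kubeProxyReplacement=disabled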

I'm a little stumped as to why NodeLocalDNS seemed to be a key factor in the test cluster but removing it seemed irrelevant in the real one. Are there any changes that could persist even through node replacement? In the past, I've had Istio and Calico on this cluster as well, but neither is currently installed.

Environment
EKS 1.17, v1.17.9-eks-4c6976
Cilium 1.8.2 (Installed via Helm)
Kernel 5.4.58-27.104.amzn2.x86_64

@brb
Member

brb commented Sep 2, 2020

@jaygorrell Thanks. Can you paste the following from the node which reported the drops:

  1. ip -4 -o a
  2. kubectl get nodes -o wide

@jaygorrell

Just flipped it back to probe and caught these pretty quick:

level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (FIB lookup failed) flow 0xf3ebf3eb to endpoint 0, identity 0->0: 10.40.145.227:45838 -> 10.40.149.157:8080 tcp SYN
xx drop (FIB lookup failed) flow 0xf3ebf3eb to endpoint 0, identity 0->0: 10.40.145.227:45838 -> 10.40.149.157:8080 tcp SYN
$ ip -4 -o a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 10.40.145.227/20 brd 10.40.159.255 scope global dynamic eth0\       valid_lft 2516sec preferred_lft 2516sec
4: cilium_host    inet 10.0.11.201/32 scope link cilium_host\       valid_lft forever preferred_lft forever
6: eth1    inet 10.40.158.125/20 brd 10.40.159.255 scope global eth1\       valid_lft forever preferred_lft forever
$ kubectl get nodes -o wide
NAME                                          STATUS   ROLES    AGE     VERSION              INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION               CONTAINER-RUNTIME
ip-10-40-136-166.us-west-2.compute.internal   Ready    <none>   3h57m   v1.17.9-eks-4c6976   10.40.136.166   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-140-117.us-west-2.compute.internal   Ready    <none>   3h59m   v1.17.9-eks-4c6976   10.40.140.117   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-140-49.us-west-2.compute.internal    Ready    <none>   3h57m   v1.17.9-eks-4c6976   10.40.140.49    <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-141-28.us-west-2.compute.internal    Ready    <none>   3h58m   v1.17.9-eks-4c6976   10.40.141.28    <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-143-220.us-west-2.compute.internal   Ready    <none>   3h58m   v1.17.9-eks-4c6976   10.40.143.220   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-145-102.us-west-2.compute.internal   Ready    <none>   3h53m   v1.17.9-eks-4c6976   10.40.145.102   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-145-227.us-west-2.compute.internal   Ready    <none>   3h54m   v1.17.9-eks-4c6976   10.40.145.227   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-149-78.us-west-2.compute.internal    Ready    <none>   3h55m   v1.17.9-eks-4c6976   10.40.149.78    <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-151-35.us-west-2.compute.internal    Ready    <none>   3h55m   v1.17.9-eks-4c6976   10.40.151.35    <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-155-184.us-west-2.compute.internal   Ready    <none>   3h55m   v1.17.9-eks-4c6976   10.40.155.184   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-160-18.us-west-2.compute.internal    Ready    <none>   3h48m   v1.17.9-eks-4c6976   10.40.160.18    <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-161-253.us-west-2.compute.internal   Ready    <none>   3h48m   v1.17.9-eks-4c6976   10.40.161.253   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-170-195.us-west-2.compute.internal   Ready    <none>   3h49m   v1.17.9-eks-4c6976   10.40.170.195   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-171-93.us-west-2.compute.internal    Ready    <none>   3h50m   v1.17.9-eks-4c6976   10.40.171.93    <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6
ip-10-40-174-245.us-west-2.compute.internal   Ready    <none>   3h49m   v1.17.9-eks-4c6976   10.40.174.245   <none>        Amazon Linux 2   5.4.58-27.104.amzn2.x86_64   docker://19.3.6

@brb
Member

brb commented Sep 3, 2020

@jaygorrell Can you please provide a sysdump?

@jaygorrell

During business hours I have to keep things in working order, so I have put things back into a working state for now: I reinstalled NodeLocalDNS and set kubeProxyReplacement to "disabled". Here is a dump in case it's useful in this state, but if needed I can try to get one from the broken cluster state after hours.

If you do indeed need a dump from the cluster when it's in the busted state, let me know if you need it without NodeLocalDNS installed. Since things break for me whenever kubeProxyReplacement is set to "probe", regardless of NodeLocalDNS, it doesn't seem relevant, but I know there are known issues around NodeLocalDNS as well.

@brb
Member

brb commented Sep 4, 2020

If you do indeed need a dump from the cluster when it's in the busted state, let me know if you need it without NodeLocalDNS installed.

Yes, please w/o node-local DNS.

@borkmann
Member

borkmann commented Sep 4, 2020

(potentially also related: #10645)

@jaygorrell

Yes, please w/o node-local DNS.

@brb here you go

To get this, I uninstalled NodeLocalDNS, removed the nodelocaldns kubelet arg, and replaced each host. At first I didn't see any problems, but I had forgotten to flip kube-proxy-replacement to probe, so I did that and restarted the Cilium pods. I then started getting the usual sporadic connectivity issues and took the dump at that point.
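For completeness, the "flip and restart" step above looks roughly like this (a sketch; the cilium-config ConfigMap key name is an assumption based on the option name used in this thread):

# Switch the agent option and restart the DaemonSet so the agents pick it up
kubectl -n kube-system patch configmap cilium-config --type merge \
  -p '{"data":{"kube-proxy-replacement":"probe"}}'
kubectl -n kube-system rollout restart daemonset/cilium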

@brb
Member

brb commented Sep 28, 2020

@jaygorrell Thanks for the sysdump. Can you list some of the xx drop (FIB lookup failed) messages from the cluster in the sysdump?

@aditighag
Member

I tested the NodePort load-balancing scenarios on EKS and found an issue where rp_filter's strict mode caused packet drops (details: #13130). Based on the comment, the rp_filter-related fix has been implemented, so I'll remove my assignment from this issue.
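For anyone who hits this before picking up the fix, a quick diagnostic sketch for checking whether strict reverse-path filtering is active on a node (interface names will differ; the manual override is an assumption, not the fix referenced above):

# rp_filter: 0 = off, 1 = strict, 2 = loose
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.eth0.rp_filter
# Temporary manual override on an affected interface
sudo sysctl -w net.ipv4.conf.eth0.rp_filter=2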

@aditighag aditighag assigned brb and unassigned aditighag Nov 10, 2020
@brb
Member

brb commented Nov 18, 2020

@jaygorrell The failures in #10462 (comment) are the same as in #12824. I'm currently working on the fix.

@brb brb removed their assignment Jul 27, 2021
@brb brb removed area/kube-proxy-free priority/high This is considered vital to an upcoming release. labels Jul 27, 2021
@borkmann
Member

Should have been fixed via #14201, closing.
