
kube-proxy in IPVS mode breaks MetalLB IPs #153

Closed
danderson opened this issue Jan 26, 2018 · 40 comments

Comments

@danderson
Contributor

Is this a bug report or a feature request?:

Bug.

What happened:

As reported on slack: kube-proxy in IPVS mode needs to add VIPs to a dummy IPVS interface for the routing to work correctly when packets arrive at a machine. It seems that kube-proxy is adding ClusterIPs to the dummy interface, but not load-balancer IPs.

This is very surprising to me, because it effectively means that IPVS mode breaks load-balancing for most cloud providers, and in general violates the expectations of what kube-proxy does on the node.

I need to set up an IPVS-powered test cluster, and examine the behavior. This might be an upstream bug, it might be a misconfiguration somewhere, or it might be a planned change of direction for kube-proxy that MetalLB needs to keep up with.
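For anyone who wants to check this on a node, a quick diagnostic (a sketch; it assumes kube-proxy's default dummy interface name kube-ipvs0 and that ipvsadm is installed on the node):

# VIPs kube-proxy has bound to its dummy interface; ClusterIPs show up here,
# the question is whether LoadBalancer IPs and externalIPs do too
$ ip addr show dev kube-ipvs0

# IPVS virtual services and the real servers (endpoints) behind them
$ sudo ipvsadm -Ln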

@danderson
Contributor Author

Confirmed in my testbed cluster: kube-proxy in IPVS mode does not program the dataplane for LoadBalancer IPs, and apparently not for externalIPs either. This seems like a pretty major feature gap before IPVS mode can go GA. I piled onto the recently opened bug at kubernetes/kubernetes#59976 with more data and a request for resolution.

@danderson danderson removed their assignment Apr 1, 2018
@danderson
Contributor Author

Allegedly, this is fixed in the latest 1.11 nightly builds of kube-proxy. I need to verify that.

@pgagnon

pgagnon commented Jul 2, 2018

@danderson

Did you ever get around to testing IPVS mode in 1.11? I would love to know if it works or not.

Thanks!

@bjornryden

I've been testing a bit on 1.11 with kube-proxy in IPVS mode and kube-router for networking. This seems to work OK with MetalLB. The kube-routers keep complaining about connections from the upstream firewalls, so there's definitely something I need to get fixed there (probably just making the upstreams passive on the BGP side).

@kvaps
Contributor

kvaps commented Oct 31, 2018

Here are the upstream bugs:
Kube-proxy: kubernetes/kubernetes#59976
Kube-router: cloudnativelabs/kube-router#561

@kvaps
Contributor

kvaps commented Nov 2, 2018

Probably fixed by kubernetes/kubernetes#70530.
After applying these settings, MetalLB works fine with kube-proxy in IPVS mode.
But I've only tested L2 mode.
Kubernetes 1.12.1

@kvaps
Contributor

kvaps commented Nov 22, 2018

Fixed in kube-router too:
cloudnativelabs/kube-router#580

@m1093782566

Can we close this issue now?

@kvaps
Contributor

kvaps commented Nov 28, 2018

@m1093782566, I'm not sure about BGP mode, but for L2 the problem is totally solved.

@m1093782566

Thanks for confirming.

@halfa

halfa commented Dec 27, 2018

I'm running 1.12.3 in IPVS + Cilium with MetalLB 0.7.3 in BGP mode without issue so far (2 weeks, with workload).

@shenshouer

shenshouer commented Jan 14, 2019

I'm running 1.13.1 in IPVS + Calico (IP-in-IP) with MetalLB 0.7.3 in Layer 2 mode without issue.

@selmison

selmison commented Feb 2, 2019

Probably fixed by kubernetes/kubernetes#70530.
After applying these settings, MetalLB works fine with kube-proxy in IPVS mode.
But I've only tested L2 mode.
Kubernetes 1.12.1

@kvaps
Is this fix applied in 1.11 or 1.12? From what I can see, it's only applied from 1.13 onwards.

@kvaps
Contributor

kvaps commented Feb 2, 2019

Hi @selmison, yes, this fix is applied only since 1.13

@misterdorm

Anyone have any updates? We're running MetalLB in BGP mode under v1.13 and I still experience it not working when kube-proxy is in IPVS mode. The external IPs are getting added to kube-ipvs0, which is in down/noarp mode (that was part of the fix suggested for this). And I think the iptables FORWARD rules are in place (I don't see drop counters incrementing on the FORWARD chain).

So there's still an issue somewhere else. kubernetes/kubernetes#71596 / kubernetes/kubernetes#72432 might be the answer, but I'm not clear if that's the problem I'm experiencing. We haven't yet bumped our clusters up to 1.14 in order to verify.
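In case it helps anyone debugging the same thing, the counters can be watched with plain iptables on the node (a sketch):

# Per-rule packet/byte counters on the FORWARD chain; re-run while
# generating traffic to see whether any DROP counters increase
$ sudo iptables -vnL FORWARD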

@sfudeus
Contributor

sfudeus commented May 9, 2019

This week we tried it out again with k8s 1.14.1, Calico 3.5 and MetalLB in BGP mode and found it working after disabling ipv6 (which is fine for us). @johscheuer might be able to give more details if required.

@chrono2002

This week we tried it out again with k8s 1.14.1, Calico 3.5 and MetalLB in BGP mode and found it working after disabling ipv6 (which is fine for us). @johscheuer might be able to give more details if required.

please

@hightoxicity

With Kubernetes 1.14.1 + Calico 3.4.4 and MetalLB in BGP mode, my ClusterIPs randomly fail.
For example, the IPVS load balancer is correctly created for my apiserver on the node (10.233.0.1:443).
Fetching this endpoint from the host always works properly...
Entering a pod running on the same node, fetching 10.233.0.1:443 randomly fails.

@kvaps
Contributor

kvaps commented Aug 6, 2019

Important change

Since Kubernetes v1.14.2/v1.15 there is a new option to control ARP behavior in kube-proxy. By default this option is set to false:

[IPVS] Introduces flag ipvs-strict-arp to configure stricter ARP sysctls, defaulting to false to preserve existing behaviors. This was enabled by default in 1.13.0, which impacted a few CNI plugins. (#75295, @lbernail)

With MetalLB this option must be set to true.

You can achieve this by editing the kube-proxy config in the current cluster:

kubectl edit configmap -n kube-system kube-proxy

and set:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true

You can also add this configuration snippet to your kubeadm config; just append it after the main configuration, separated by ---.
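A non-interactive way to make the same change (a sketch; it assumes the configmap already contains an explicit strictARP: false line, and the kube-proxy pods must be restarted to pick up the new config):

# flip strictARP in the kube-proxy configmap
$ kubectl get configmap kube-proxy -n kube-system -o yaml | \
    sed -e "s/strictARP: false/strictARP: true/" | \
    kubectl apply -f - -n kube-system

# restart kube-proxy so the change takes effect
$ kubectl -n kube-system rollout restart daemonset kube-proxy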


@danderson, shouldn't we add a corresponding warning for IPVS users to the MetalLB docs?

@jonathansloman

This issue just hit us as well, using MetalLB in layer 2 mode with externalTrafficPolicy: Local. We were finding all of our Kubernetes nodes responding to ARPs for MetalLB service IPs, resulting in traffic getting routed to places that couldn't handle it. The ARP replies weren't coming from the MetalLB speakers. Eventually we stumbled across tickets mentioning the strictARP mode. PLEASE can this be added to the documentation to save others the pain we've just been through.

champtar added a commit to champtar/kubespray that referenced this issue Sep 17, 2019
strict ARP flag was added by
kubernetes/kubernetes#75295

It's disabled by default so as not to break some CNIs, including flannel,
so we leave it off by default

We must enable it for MetalLB to work
metallb/metallb#153 (comment)
so fail MetalLB roles if it's not enabled
champtar added a commit to champtar/kubespray that referenced this issue Sep 17, 2019
When using IPVS, kube_proxy_strict_arp = true is required
metallb/metallb#153 (comment)

Add kube_proxy_strict_arp to inventory/sample
@MatthiasLohr

I'm experiencing problems with MetalLB + Layer 2 + IPVS: reaching the LoadBalancer service from outside works perfectly, but I'm not able to access the LoadBalancer service from inside the cluster (the connection times out). It feels like this could be related to kubernetes/kubernetes#79783 with pull request kubernetes/kubernetes#79976.

@salanki

salanki commented Feb 19, 2020

Any update on this? Having to run kube-proxy in iptables mode feels very stupid just to work around traffic to LB IPs being blackholed when using externalTrafficPolicy: Local.

@danderson
Contributor Author

It's on upstream Kubernetes to fix kube-proxy at this point. It has been for the last 2 years. Personally, I'm not optimistic, but there's really nothing we can do in MetalLB.

@salanki

salanki commented Feb 19, 2020

@danderson: Apologies, I thought I commented on one of the kube-proxy tickets. Long day.

Does anyone know if this problem exists if using kube-router IPVS instead of kube-proxy?

@kvaps
Contributor

kvaps commented Feb 19, 2020

@salanki I'm using MetalLB with kube-router instead of kube-proxy. And I was using it with kube-proxy with strict ARP enabled (see #507).

Both cases are working fine for me

@kvaps
Contributor

kvaps commented Feb 19, 2020

@danderson, what actually is the problem with kube-proxy? We fixed it in kubernetes/kubernetes#70530, and since kubernetes/kubernetes#75295 it has become an optional flag.

That's why I think my PR #507 should be merged into the existing MetalLB documentation.

@salanki

salanki commented Feb 19, 2020

The problem that some of us are discussing is unrelated to proxy ARP (I run BGP mode).

When using externalTrafficPolicy: Local and kube-proxy in IPVS mode, traffic from a node to a LoadBalancer IP is blackholed if that node has no pods for that service. This is because kube-proxy adds an entry in IPVS for the LB IP but doesn't add the endpoint IPs unless they are local. The behavior we would want is that it simply doesn't add anything at all related to the LB IP in IPVS.
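The symptom is easy to see on an affected node (a diagnostic sketch; 192.0.2.10 stands in for the LoadBalancer IP and 80 for the service port):

# On a node with no local endpoints for the service, the LB IP gets an IPVS
# virtual service with an empty real-server list, so connections are dropped
$ sudo ipvsadm -Ln -t 192.0.2.10:80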

@kvaps
Contributor

kvaps commented Feb 19, 2020

Ah, alright. Unfortunately I have no opportunity to check BGP, but I can try to check that on L2. Could you provide exact steps to reproduce? I will try it on kube-router.

@anroots

anroots commented May 2, 2020

Two-node K8s 1.17 cluster, nginx-ingress running on one of them, with a MetalLB-assigned external IP (BGP) for the ingress's LoadBalancer Service. From inside a pod, trying to curl http://gitlab.mydomain.ee (which resolves to the MetalLB IP for the LB) would fail depending on whether the Ingress pod was on the same node or not. Running ipvsadm showed that, indeed, there was no "path" to the destination.

Reading the above thread and @salanki's latest comment helped resolve this: I scaled up my ingress controller to two instances, to run a pod on all my nodes (with pod anti-affinity helping with this). I guess it's not dumb if it works - but granted, probably not the ideal solution.

@salanki

salanki commented May 2, 2020

It’s a workaround and scales horribly for other applications or a lot of nodes. This really needs to be fixed.

@Elegant996

Yes, the current solution is to run a DaemonSet for both MetalLB and your ingress LB and ensure that their affinities match. Otherwise, you could lose access from master nodes.

@champtar
Contributor

champtar commented May 2, 2020

I think there is still confusion here for some people. MetalLB only does the job of getting the packets to the nodes using ARP or BGP; everything else (iptables / IPVS config) is kube-proxy and should be reported to the k8s team.
That said, one workaround might be to use the ClusterIP internally and the LB IP externally (as originally intended).
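For example (a sketch; "my-ingress" in namespace "ingress" is a hypothetical Service name):

# Inside the cluster, use the Service DNS name, which resolves to the ClusterIP
$ curl http://my-ingress.ingress.svc.cluster.local/

# Outside the cluster, use the LoadBalancer IP announced by MetalLB
$ kubectl -n ingress get svc my-ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}'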

@arsenhovhannisyan

arsenhovhannisyan commented May 14, 2020

Getting an issue with Kubernetes 1.18.2 and MetalLB (Helm chart version metallb-0.12.0, images metallb/controller:v0.8.1 and metallb/speaker:v0.8.1); the CNI is Flannel. We deployed an ingress with a Service of type LoadBalancer. The IP is reachable from the same subnet, but when we try to access it from another subnet (the user subnet; for example, our k8s nodes are on 10.100.0.0/24 and the user subnet is 10.10.0.0/24) we can't reach the endpoint. tcpdump shows the request arriving from some IP, but no reply goes out. And when we curl from the worker and master nodes, all fail except one node. The interesting thing is that everything works from the k8s node subnet, for example from the bastion VM. When we changed the kube-proxy mode from ipvs to iptables, everything worked fine.

@sathieu
Contributor

sathieu commented Apr 13, 2021

I fixed this by re-enabling ARP on the load balancer nodes:

$ echo 0  | sudo tee /proc/sys/net/ipv4/conf/all/arp_ignore

I still need to stabilize this by unsetting strictARP and bringing this to kubespray.

@champtar
Contributor

I fixed this by re-enabling ARP on the load balancer nodes:

$ echo 0  | sudo tee /proc/sys/net/ipv4/conf/all/arp_ignore

I recommend you do an arping and see how many servers respond (it must be exactly 1).
strictARP: true is required with IPVS; I don't see why that would have changed recently.
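For example (a sketch; 192.0.2.10 stands in for the LoadBalancer IP and eth0 for the interface facing it):

# Send a few ARP requests for the LB IP and check how many distinct MACs reply;
# with a correct L2 setup exactly one node should answer
$ sudo arping -I eth0 -c 3 192.0.2.10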

@sathieu
Contributor

sathieu commented Apr 14, 2021

I fixed this by re-enabling ARP on the load balancer nodes:

$ echo 0  | sudo tee /proc/sys/net/ipv4/conf/all/arp_ignore

I recommend you do an arping and see how many servers respond (it must be exactly 1).
strictARP: true is required with IPVS; I don't see why that would have changed recently.

@champtar All LB nodes respond to ARP requests (non-LB nodes don't), and I've configured switches to allow this (those are VMware dvSwitches).

I needed this because I want to publish the k8s API, which is not on the LB nodes. Also, for other services (ingress), I find it strange to force autoscaleMin to the number of LB nodes (I use Istio ingressgateway, which is a Deployment and not a DaemonSet).

I don't understand why nodes are answering ARP requests when the endpoint is present on the node. I don't see any difference when issuing ipvsadm commands or ip commands.

@russellb
Collaborator

I think we can close this issue, as MetalLB works with IPVS and the known configuration needs are documented at https://metallb.universe.tf/installation/.

Any new issues should be tracked separately.
