Prefer k8s Node IP for SNAT IP (IPV{4,6}_NODEPORT) when multiple IP addrs are set to BPF NodePort device #12988
Comments
CC @brb |
@networkop Thanks for the issue. Two things - could you run |
sure, here's the bpf ipcache output first
|
@brb where can I upload the sysdump? |
oh wow, didn't know about GH's drag-and-drop. cilium-sysdump-20200827-160052.zip |
@networkop Could you try w/o tunneling? You can do that by setting |
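For reference, a minimal sketch of cilium-config ConfigMap entries that switch off tunnelling (assumed values, not taken from this thread; both keys appear in the config dump further down, and native routing between the nodes must be possible for this to work):
# Sketch only: run Cilium without an overlay tunnel.
data:
  tunnel: "disabled"               # stop encapsulating pod traffic in vxlan
  auto-direct-node-routes: "true"  # install per-node pod CIDR routes (assumes nodes share an L2 segment)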
ok, I'll try. do you want me to collect any logs or just verify that it works? |
Thanks. Just to verify that it works. |
I've done the test, however I still don't have e2e connectivity. In addition to changing the above flags, I also had to specify one more option. Additionally, in Azure I enabled "IP forwarding" on the NIC of the worker node I was testing, to stop it from dropping unknown packets. The only new thing I've noticed was that traceroute now looks a bit more like what you expected at the beginning:
So it looks like there's a SNAT'ed packet, however I can't understand why the translated session keeps sending packets while I don't even see a SYN-ACK in the pod. So maybe it's a red herring. |
Hmm. Would it be possible to get access to your Azure cluster (I'm |
Same bug here, marking it. |
@Mengkzhaoyun What is your setup and configuration? |
I deployed cilium-dev:v1.9.0-rc0 and Kubernetes v1.18.8 without kube-proxy (3-master HA with kube-vip). From a bash shell inside the dashboard pod, curl to another host's apiserver does not work, but from a bash shell on that pod's VM host, curl https://kube-apiserver.othervm.local:6443/version succeeds. After I change enable-bpf-masquerade to false, my Kubernetes dashboard pod works correctly. {
"auto-direct-node-routes": "false",
"bpf-lb-map-max": "65536",
"bpf-map-dynamic-size-ratio": "0.0025",
"bpf-policy-map-max": "16384",
"cluster-name": "default",
"cluster-pool-ipv4-cidr": "10.0.0.0/11",
"cluster-pool-ipv4-mask-size": "24",
"debug": "false",
"disable-cnp-status-updates": "true",
"disable-envoy-version-check": "true",
"enable-auto-protect-node-port-range": "true",
"enable-bpf-clock-probe": "true",
"enable-bpf-masquerade": "false",
"enable-endpoint-health-checking": "true",
"enable-external-ips": "true",
"enable-health-check-nodeport": "true",
"enable-host-port": "true",
"enable-host-reachable-services": "true",
"enable-hubble": "true",
"enable-ipv4": "true",
"enable-ipv6": "false",
"enable-node-port": "true",
"enable-remote-node-identity": "true",
"enable-session-affinity": "true",
"enable-well-known-identities": "false",
"enable-xt-socket-fallback": "true",
"hubble-listen-address": ":4244",
"hubble-socket-path": "/var/run/cilium/hubble.sock",
"identity-allocation-mode": "crd",
"install-iptables-rules": "true",
"ipam": "cluster-pool",
"k8s-require-ipv4-pod-cidr": "true",
"k8s-require-ipv6-pod-cidr": "false",
"kube-proxy-replacement": "partial",
"masquerade": "true",
"monitor-aggregation": "medium",
"monitor-aggregation-flags": "all",
"monitor-aggregation-interval": "5s",
"node-port-bind-protection": "true",
"node-port-range": "20,65000",
"operator-api-serve-addr": "127.0.0.1:9234",
"preallocate-bpf-maps": "false",
"sidecar-istio-proxy-image": "cilium/istio_proxy",
"tunnel": "vxlan",
"wait-bpf-mount": "false"
} |
@Mengkzhaoyun Thanks. Do you run cilium on the master nodes? |
There are 3 master nodes running the cilium agent. |
@Mengkzhaoyun Can you please ping me ( |
@brb I'm pretty sure this is related to the issue I was last discussing with you guys as well, where I had to revert to iptables. |
@networkop @Mengkzhaoyun OK, so what happens in your case is that the wrong IPv4 addr is picked for SNAT, as the selected devices for NodePort have more than one IPv4 addr. Currently, cilium-agent can support only one per device. If any of you wants to work on the fix (a good opportunity to familiarize yourself with Cilium internals), I'm happy to guide you. |
I'm happy to pick it up @brb |
The fix might be simple: in the function https://github.com/cilium/cilium/blob/master/pkg/node/address.go#L128 you need to pass the k8s Node IPv{4,6} addr to the invocations of firstGlobalV4Addr() / firstGlobalV6Addr(). |
Adding the CILIUM_IPV4_NODE env var to cilium-agent can temporarily work around the bug.
env:
- name: K8S_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CILIUM_IPV4_NODE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName |
@Mengkzhaoyun that makes my deployment crash, as the CILIUM_IPV4_NODE value above comes from spec.nodeName, which is a hostname rather than an IPv4 address.
|
Here's the better way I suppose:
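A minimal sketch of what such an env entry could look like, assuming the downward API field status.hostIP is used so the variable carries the node's IP rather than its name (the exact snippet is an assumption, not preserved here):
# Sketch (assumed): populate CILIUM_IPV4_NODE with the node's IP via the downward API,
# instead of spec.nodeName, which yields a hostname rather than an IPv4 address.
- name: CILIUM_IPV4_NODE
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: status.hostIP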
Didn't help my scenario so I must be hitting a different bug altogether. |
JFYI: |
As per discussion offline - it's not related. |
Here's a brief summary of the issues I was hitting:
diff --git a/pkg/node/address.go b/pkg/node/address.go
index a5fa3dadd..7c438a9d8 100644
--- a/pkg/node/address.go
+++ b/pkg/node/address.go
@@ -129,10 +129,14 @@ func InitNodePortAddrs(devices []string) error {
 	if option.Config.EnableIPv4 {
 		ipv4NodePortAddrs = make(map[string]net.IP, len(devices))
 		for _, device := range devices {
-			ip, err := firstGlobalV4Addr(device, nil)
+			ip, err := firstGlobalV4Addr(device, GetK8sNodeIP())
 			if err != nil {
 				return fmt.Errorf("Failed to determine IPv4 of %s for NodePort", device)
 			}
+			log.WithFields(logrus.Fields{
+				"device": device,
+				"ip":     ip,
+			}).Info("Pinning node IPs")
 			ipv4NodePortAddrs[device] = ip
 		}
 	}
@@ -140,7 +144,7 @@ func InitNodePortAddrs(devices []string) error {
 	if option.Config.EnableIPv6 {
 		ipv6NodePortAddrs = make(map[string]net.IP, len(devices))
 		for _, device := range devices {
-			ip, err := firstGlobalV6Addr(device, nil)
+			ip, err := firstGlobalV6Addr(device, GetK8sNodeIP())
 			if err != nil {
 				return fmt.Errorf("Failed to determine IPv6 of %s for NodePort", device)
 			}
diff --git a/pkg/node/address_linux.go b/pkg/node/address_linux.go
index a05a3f38d..bfa16fc73 100644
--- a/pkg/node/address_linux.go
+++ b/pkg/node/address_linux.go
@@ -86,7 +86,7 @@ retryScope:
 	}
 	if len(ipsPublic) != 0 {
-		if hasPreferred && ip.IsPublicAddr(preferredIP) {
+		if hasPreferred {
 			return preferredIP, nil
 		}
Thanks @brb for the support. Let me know if you want me to do a PR with the above fix or if it's OK to just close this issue for now. |
@networkop, good job, it works. |
@networkop can you push a PR with that fix? that would be great! |
PR is ready for review 👍 |
Bug report
When Cilium is configured as a kube-proxy replacement, it fails to masquerade the source IP of pods when the target pod is in the hostNetwork of one of the k8s nodes.
General Information
How to reproduce the issue
Additional details
I have a test pod (10.0.6.223) trying to connect to the API server on 10.0.128.1:443. When doing a tcpdump on the underlying node, I cannot see SNAT'ed packets. Since the source IP of the pod is not masqueraded, it gets dropped by Azure's networking stack.
According to @brb's comment it should have been translated on TC egress, however I don't see it happening (see above tcpdump).
As soon as I change enable-bpf-masquerade to "false" and restart the agent on the node, connectivity gets restored. I've got the environment up and running for a while, so I'm happy to collect any additional logs/outputs.
Here's the cilium configmap.