
No connectivity from client to server on the same host via kube-proxy ClusterIP when Cilium E/W LB is disabled #10567

Closed
aanm opened this issue Mar 13, 2020 · 9 comments
Labels: kind/bug (This is a bug in the Cilium logic.), kind/community-report (This was reported by a user in the Cilium community, e.g. via Slack.), sig/datapath (Impacts bpf/ or low-level forwarding details, including map management and monitor messages.)

@aanm (Member) commented Mar 13, 2020

client-pod: 10.0.0.13:XXX
service-ip: 172.20.62.249:9090
backend: 10.0.0.115:9090

tcpdump -i any:

00:41:31.413413 IP 10.0.0.13.40106 > 172.20.62.249.9090: Flags [S], seq 179494632, win 64390, options [mss 1370,sackOK,TS val 2655386555 ecr 0,nop,wscale 7], length 0
00:41:31.413492 IP 10.0.0.13.40106 > 10.0.0.115.9090: Flags [S], seq 179494632, win 64390, options [mss 1370,sackOK,TS val 2655386555 ecr 0,nop,wscale 7], length 0
00:41:31.413513 IP 10.0.0.115.9090 > 10.0.0.13.40106: Flags [S.], seq 2265501166, ack 179494633, win 65184, options [mss 1370,sackOK,TS val 2584967544 ecr 2655386555,nop,wscale 7], length 0
00:41:31.413527 IP 10.0.0.115.9090 > 10.0.0.13.40106: Flags [S.], seq 2265501166, ack 179494633, win 65184, options [mss 1370,sackOK,TS val 2584967544 ecr 2655386555,nop,wscale 7], length 0
00:41:31.413537 IP 10.0.0.13.40106 > 10.0.0.115.9090: Flags [R], seq 179494633, win 0, length 0
00:41:31.413544 IP 10.0.0.13.40106 > 10.0.0.115.9090: Flags [R], seq 179494633, win 0, length 0

monitor output:

Policy verdict log: flow 0xdece3b7d local EP ID 2591, remote ID 2, dst port 9090, proto 6, ingress false, action allow, match all, 10.0.0.13:39772 -> 172.20.62.249:9090 tcp SYN
-> stack flow 0xdece3b7d identity 7644->2 state new ifindex 0 orig-ip 0.0.0.0: 10.0.0.13:39772 -> 172.20.62.249:9090 tcp SYN
Policy verdict log: flow 0x492eb6dd local EP ID 2591, remote ID 56645, dst port 39772, proto 6, ingress true, action allow, match all, 10.0.0.115:9090 -> 10.0.0.13:39772 tcp SYN, ACK
-> endpoint 2591 flow 0x492eb6dd identity 56645->7644 state new ifindex lxc10371e843f08 orig-ip 10.0.0.115: 10.0.0.115:9090 -> 10.0.0.13:39772 tcp SYN, ACK
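
For reference, a minimal client-side reproducer matching the traces above (my own sketch, not from the report): run inside the client pod's network namespace, it dials the ClusterIP and should time out, while tcpdump shows the stray SYN-ACK from the backend IP being answered with an RST.

// repro.go: minimal sketch of the failing connection from the client pod.
// The address is the ClusterIP:port from this report; adjust as needed.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// The SYN goes to the ClusterIP and kube-proxy DNATs it to the backend.
	// The SYN-ACK comes back from the backend pod IP without rev-DNAT, so it
	// never matches this socket: the kernel answers it with an RST (seen in
	// the tcpdump above) and the dial itself times out.
	conn, err := net.DialTimeout("tcp", "172.20.62.249:9090", 3*time.Second)
	if err != nil {
		fmt.Println("dial failed as in the traces:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected (issue not reproduced)")
}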

cilium config map:

  auto-direct-node-routes: "false"
  bpf-ct-global-any-max: "262144"
  bpf-ct-global-tcp-max: "524288"
  bpf-nat-global-max: "841429"
  cluster-name: default
  debug: "false"
  enable-external-ips: "false"
  enable-host-reachable-services: "false"
  enable-ipv4: "true"
  enable-ipv6: "false"
  enable-metrics: "true"
  enable-node-port: "false"
  enable-remote-node-identity: "true"
  enable-well-known-identities: "false"
  enable-xt-socket-fallback: "true"
  identity-allocation-mode: crd
  install-iptables-rules: "true"
  k8s-require-ipv4-pod-cidr: "true"
  kube-proxy-replacement: disabled
  masquerade: "true"
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  monitor-aggregation-interval: 5s
  node-port-mode: hybrid
  operator-api-serve-addr: 127.0.0.1:9234
  operator-prometheus-serve-addr: :6942
  policy-audit-mode: "false"
  preallocate-bpf-maps: "false"
  prometheus-serve-addr: :9090
  sidecar-istio-proxy-image: cilium/istio_proxy
  synchronize-k8s-nodes: "true"
  tofqdns-enable-poller: "false"
  tunnel: vxlan
  wait-bpf-mount: "false"

iptables rules:

$ sudo iptables-save | grep 9090
-A KUBE-SEP-VTRHTEKTWBKBIVAE -p tcp -m tcp -j DNAT --to-destination 10.0.0.115:9090
-A KUBE-SERVICES -d 172.20.62.249/32 -p tcp -m comment --comment "kube-system/prometheus:webui cluster IP" -m tcp --dport 9090 -j KUBE-SVC-SMBNPD2J27EUPM6V
$ sudo iptables-save | grep KUBE-SVC-SMBNPD2J27EUPM6V
:KUBE-SVC-SMBNPD2J27EUPM6V - [0:0]
-A KUBE-SERVICES -d 172.20.62.249/32 -p tcp -m comment --comment "kube-system/prometheus:webui cluster IP" -m tcp --dport 9090 -j KUBE-SVC-SMBNPD2J27EUPM6V
-A KUBE-SVC-SMBNPD2J27EUPM6V -j KUBE-SEP-VTRHTEKTWBKBIVAE
$ sudo iptables-save | grep KUBE-SEP-VTRHTEKTWBKBIVAE
:KUBE-SEP-VTRHTEKTWBKBIVAE - [0:0]
-A KUBE-SEP-VTRHTEKTWBKBIVAE -s 10.0.0.115/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-VTRHTEKTWBKBIVAE -p tcp -m tcp -j DNAT --to-destination 10.0.0.115:9090
-A KUBE-SVC-SMBNPD2J27EUPM6V -j KUBE-SEP-VTRHTEKTWBKBIVAE
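
Note the forward path above is fully wired: KUBE-SERVICES matches the ClusterIP, jumps to the per-service chain, which jumps to the endpoint chain that DNATs to 10.0.0.115:9090. Only the reverse translation of the reply is missing. A hedged sketch to assert that wiring on a node (chain names and addresses are the ones from this report and will differ per cluster):

// checkdnat.go: sketch that verifies the kube-proxy forward-DNAT wiring
// shown above by scanning iptables-save output (needs root).
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func main() {
	out, err := exec.Command("iptables-save").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "iptables-save:", err)
		os.Exit(1)
	}
	rules := string(out)
	for _, c := range []struct{ desc, frag string }{
		{"ClusterIP match -> per-service chain", "-d 172.20.62.249/32"},
		{"per-service chain -> endpoint chain", "-A KUBE-SVC-SMBNPD2J27EUPM6V -j KUBE-SEP-VTRHTEKTWBKBIVAE"},
		{"endpoint chain DNATs to backend", "--to-destination 10.0.0.115:9090"},
	} {
		fmt.Printf("%-40s present=%v\n", c.desc, strings.Contains(rules, c.frag))
	}
}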

commit a734d81

@aanm aanm added area/kube-proxy-free kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. labels Mar 13, 2020
@brb (Member) commented Mar 13, 2020

When --kube-proxy-replacement=disabled and --disable-k8s-services=false, the datapath uses the old service mechanism for ClusterIP services. It seems that the rev-DNAT translation was not applied to the SYN-ACK packet, hence the TCP RST.

@aanm Did this happen in CI? The CI run for the commit you linked failed because of a different test case (the managed etcd one). If so, can you attach the test dump?

I expected that we had tests covering client @ host A -> ClusterIP -> server @ host A (https://github.com/cilium/cilium/blob/master/test/k8sT/Services.go#L192), but those send the request to the ClusterIP from the host netns.
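
To illustrate the gap, a rough sketch of such a test as a plain Go test that shells out to kubectl; the namespace and pod name below are hypothetical placeholders, and a real fixture would create both pods and pin them to the same node (e.g. via spec.nodeName):

// samehost_clusterip_test.go: hedged sketch of the missing coverage,
// client pod -> ClusterIP -> backend pod on the SAME node.
package k8s_test

import (
	"os/exec"
	"testing"
)

func TestSameNodeClusterIP(t *testing.T) {
	const (
		namespace = "default"          // hypothetical
		clientPod = "client-same-node" // hypothetical
		svcAddr   = "172.20.62.249:9090"
	)
	// Send the request from inside the client pod's netns, not from the
	// host netns (which is what the existing Services.go test exercises).
	out, err := exec.Command(
		"kubectl", "-n", namespace, "exec", clientPod, "--",
		"curl", "-sS", "--max-time", "5", "http://"+svcAddr,
	).CombinedOutput()
	if err != nil {
		t.Fatalf("ClusterIP unreachable from same-node pod: %v\n%s", err, out)
	}
}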

@brb (Member) commented Mar 13, 2020

Created an issue to extend the tests: #10568.

@borkmann (Member) commented

The backend is on the same node where the svc request is processed; Cilium is not handling any services at all, since kube-proxy-replacement: disabled.

@aanm aanm changed the title from "Unable to reach service on same host with kube-proxy-replacement: disabled" to "Unable to reach services with kube-proxy-replacement: disabled" Mar 18, 2020
@aanm aanm added the kind/community-report This was reported by a user in the Cilium community, eg via Slack. label Mar 23, 2020
@brb (Member) commented Mar 24, 2020

I've just managed to reproduce the issue on kernel 5.6.0-rc2+ (ubuntu-next v45). The problem is that the rev-DNAT translation is not performed for a reply. This happens because the service endpoint does an skb redirect to the client pod, so the reply bypasses netfilter's conntrack module, which is responsible for the translation when --disable-k8s-services=true.
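
One way to observe this on the node (my own diagnostic sketch, assuming the legacy /proc/net/nf_conntrack interface is available, i.e. the nf_conntrack module is loaded; conntrack -L works too): dump the entries for the ClusterIP and check whether the reply direction was ever seen.

// ctdump.go: node-side diagnostic sketch. Filters /proc/net/nf_conntrack
// for entries involving the ClusterIP from this report (needs root).
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/net/nf_conntrack")
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		os.Exit(1)
	}
	defer f.Close()

	const clusterIP = "172.20.62.249" // service IP from this report
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if line := sc.Text(); strings.Contains(line, clusterIP) {
			// A TCP entry stuck in SYN_SENT with [UNREPLIED] indicates the
			// SYN-ACK never traversed conntrack, so rev-DNAT for the reply
			// cannot happen, matching the analysis above.
			fmt.Println(line)
		}
	}
}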

Interestingly, I cannot reproduce the issue on the kernel 5.5.6-arch1-1 when running cilium-agent as a standalone process (i.e. not in a container).

(OT: please ignore what I said about this issue during the community meeting - I mixed up the issues)

@brb brb removed the needs/triage This issue requires triaging to establish severity and next steps. label Apr 6, 2020
@joestringer (Member) commented

Did we apply any fix to resolve this issue?

Are we seeing it in CI? These days we should be running effectively a Linux v5.7 RC kernel there, so I would expect us to observe it regularly if it is some kind of kernel regression.

@brb (Member) commented May 11, 2020

> Did we apply any fix to resolve this issue?

AFAIK, nope.

> Are we seeing it in CI?

We don't have any test covering podA@host1 -> ClusterIP -> podB@host1 with --disable-k8s-services=true.

@joestringer (Member) commented Jun 22, 2020

Context: Cilium 1.7 hit this because the k8s services implementation was auto-disabled when kube-proxy replacement was disabled. We released a fix so that the existing implementation is no longer auto-disabled.

We have already deprecated the --disable-k8s-services flag due to this issue.

Unless something changes, we do not intend to fix this issue.

@borkmann borkmann moved this from 1.8 fixes/CI needed to TODO (normal prio) in 1.9 kube-proxy removal & general dp optimization Jul 6, 2020
@borkmann borkmann moved this from TODO (untriaged) to TODO (user reported) in 1.9 kube-proxy removal & general dp optimization Jul 6, 2020
@borkmann borkmann moved this from TODO (user reported) to Done in 1.9 kube-proxy removal & general dp optimization Jul 6, 2020
@aanm aanm reopened this Jul 27, 2020
@borkmann borkmann added this to WIP (Martynas + Daniel) in 1.9 kube-proxy removal & general dp optimization Sep 4, 2020
@joestringer (Member) commented

Removing blocker for v1.9. @brb if you disagree, please reach out :-)

@borkmann borkmann added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Jan 12, 2021
@brb brb changed the title from "Unable to reach services with kube-proxy-replacement: disabled" to "kube-proxy ClusterIP does not work when Cilium E/W LB is disabled" May 7, 2021
@brb brb changed the title from "kube-proxy ClusterIP does not work when Cilium E/W LB is disabled" to "No connectivity from client to server on the same host via kube-proxy ClusterIP when Cilium E/W LB is disabled" May 7, 2021
@brb (Member) commented Jul 27, 2021

Closing this issue in favor of #16197.

@brb brb closed this as completed Jul 27, 2021