datapath: Fix wrong rev-NAT xlation due to stale conntrack entry #10984
Conversation
Please set the appropriate release note label.
test-me-please
Force-pushed from d18e85d to 48d9d5e
test-me-please
test-gke
bpf/lib/nodeport.h (outdated)
@@ -541,13 +542,16 @@ static __always_inline int nodeport_lb6(struct __ctx_buff *ctx,
	case CT_ESTABLISHED:
	case CT_REPLY:
		if (ct_state.rev_nat_index != svc->rev_nat_index) {
			goto redo_all;
		}
Small nit: no extra braces here and below (kernel style) and wrap in unlikely().
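Applied to the hunk above, the suggested style would look roughly like this (a sketch only; surrounding code elided):

```c
	case CT_ESTABLISHED:
	case CT_REPLY:
		/* Kernel style: no braces around a single statement, and the
		 * mismatch is the cold path, hence the unlikely() hint. */
		if (unlikely(ct_state.rev_nat_index != svc->rev_nat_index))
			goto redo_all;
```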
Isn't this racy? E.g. what happens if we had an earlier entry in there where packets are still in flight and replies are expected to go out from the Pod to the client? They would then go out with the wrong svc tuple?
Otoh, we have the same logic in https://github.com/cilium/cilium/blob/master/bpf/bpf_lxc.c#L199. Given there can only be one such svc at the same time, it really must have been stale. Could we consolidate such a check with the one from bpf_lxc into a small helper and add a comment on why it's okay to redo the full tuple? Also, is there any advantage in explicitly deleting the existing one first? Perhaps we should remove that from bpf_lxc.c so both cases would share the same logic.
> Isn't this racy?
The stale conntrack entry problem can only happen when a packet is sent from the same IP addr and port. This means that the client, before sending a request to the other svc, waited long enough (TIME_WAIT) for the port to become available again. Therefore, I assume that such races are very unlikely to happen.
> Could we consolidate such a check with the one from bpf_lxc into a small helper and add a comment on why it's okay to redo the full tuple?
The check in bpf_lxc.c compares the rev_nat_index of two ct_states, while the one in nodeport.h compares svc against ct_state. It's probably OK to change the former to check against svc instead of ct_state_new; then the consolidation would make sense, e.g. along the lines of the sketch below.
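For illustration, a consolidated check might have looked like this (hypothetical: the helper name and placement are assumptions, and per the follow-up comment no such helper was ultimately added):

```c
/* Hypothetical helper, never merged: would return true if the CT entry
 * was created for the service with the given rev-NAT index. */
static __always_inline bool
ct_entry_matches_revnat(const struct ct_state *ct_state,
			__u16 svc_rev_nat_index)
{
	return ct_state->rev_nat_index == svc_rev_nat_index;
}
```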
> Also, is there any advantage in explicitly deleting the existing one first?
ct_delete{4,6}() also tries to remove related SNAT entries. Perhaps it's fine to keep the SNAT-related entries in the case of nodeport.h, as they can be reused by snat_process() later on. In the case of bpf_lxc.c, do you see a case in which SNAT entries would be created?
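For context, a paraphrased sketch of the delete path under discussion (the actual ct_delete4() in bpf/lib/conntrack.h may differ; snat_v4_delete_tuples() is named here on the assumption that it is the SNAT-cleanup entry point):

```c
/* Paraphrased sketch: delete the CT entry itself and, as noted above,
 * also drop any related SNAT entries (ctx is used for debug output in
 * the real implementation). */
static __always_inline int ct_delete4(const void *map,
				      struct ipv4_ct_tuple *tuple,
				      struct __ctx_buff *ctx)
{
	int err;

	err = map_delete_elem(map, tuple);	/* the CT entry */
	if (err < 0)
		return err;
	snat_v4_delete_tuples(tuple);		/* related SNAT state */
	return 0;
}
```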
Not that I'm aware of, so it would make sense to make both paths follow the same logic, meaning we could likely just remove the ct_delete*().
PTAL - I removed the ct_delete*()'s, but I didn't add a "ct_belongs_to_svc()" helper, as IMO it might confuse developers by suggesting that passing any CT entry would test whether it belongs to the given svc. That is a wrong assumption, as not all entries have rev_nat_index set (e.g. CT_INGRESS).
GKE hit the Kafka panic flake and
test-gke
Force-pushed from 48d9d5e to 495460b
test-me-please
Force-pushed from 495460b to ffe32a8
test-me-please
test-me-please
Force-pushed from d4ee3ef to 86d58ce
test-me-please
Force-pushed from 86d58ce to 56a2f84
test-me-please
The issue description: #10983

Signed-off-by: Martynas Pumputis <m@lambda.lt>
If a service endpoint is accessed via two different NodePort services from the same source IP and port, a reply from the endpoint will be wrongly rev-DNAT xlated. E.g.:

0x6a60e2e6: Conntrack lookup 1/2: src=172.29.50.224:33632 dst=172.29.2.21:32532
0x6a60e2e6: Conntrack lookup 2/2: nexthdr=6 flags=4
0x6a60e2e6: CT verdict: New, revnat=0
0x6a60e2e6: Slave service lookup: slave=1, dport=32532
0x6a60e2e6: Conntrack create: proxy-port=0 revnat=8 src-identity=0 lb=0.0.0.0
0x6a60e2e6: Conntrack lookup 1/2: src=172.29.50.224:33632 dst=10.42.1.254:3306
0x6a60e2e6: Conntrack lookup 2/2: nexthdr=6 flags=1
0x6a60e2e6: CT entry found lifetime=501525, revnat=22
0x6a60e2e6: CT verdict: Established, revnat=22
0xa140b3a1: Conntrack lookup 1/2: src=10.42.1.254:3306 dst=172.29.50.224:33632
0xa140b3a1: Conntrack lookup 2/2: nexthdr=6 flags=0
0xa140b3a1: CT entry found lifetime=523408, revnat=22
0xa140b3a1: CT verdict: Reply, revnat=22
0xa140b3a1: Reverse NAT lookup, index=22
0xa140b3a1: Performing reverse NAT, address=192.168.58.0 port=3306

TCP OUT 172.29.2.21:32532 -> 172.29.50.224:33632 service expires=501869 RxPackets=0 RxBytes=28 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x02 LastTxReport=501808 Flags=0x0000 [ ] RevNAT=8 SourceSecurityID=0
TCP OUT 192.168.58.0:3306 -> 172.29.50.224:33632 service expires=523409 RxPackets=0 RxBytes=28 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1f LastTxReport=501808 Flags=0x0012 [ TxClosing SeenNonSyn ] RevNAT=22 SourceSecurityID=0
TCP OUT 172.29.50.224:33632 -> 10.42.1.254:3306 expires=523409 RxPackets=8 RxBytes=637 RxFlagsSeen=0x1b LastRxReport=501808 TxPackets=13 TxBytes=861 TxFlagsSeen=0x1f LastTxReport=501808 Flags=0x0032 [ TxClosing SeenNonSyn NodePort ] RevNAT=22 SourceSecurityID=2

First, the endpoint was accessed via 172.29.50.224:33632 -> 192.168.58.0:3306, and afterwards via 172.29.50.224:33632 -> 172.29.2.21:32532. In the second flow, the src IP and port of the reply were wrongly xlated to 192.168.58.0:3306.

Fix this by recreating CT entries when stale (= of a different svc) entries are detected.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
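Condensed, the resulting logic in nodeport_lb6() takes roughly this shape (abridged sketch; the ct_create6() argument list and the error handling are simplified assumptions):

```c
	switch (ret) {
	case CT_NEW:
redo_all:
		ct_state_new.rev_nat_index = svc->rev_nat_index;
		ret = ct_create6(get_ct_map6(&tuple), &tuple, ctx,
				 CT_EGRESS, &ct_state_new /* , ... */);
		if (IS_ERR(ret))
			return ret;
		break;
	case CT_ESTABLISHED:
	case CT_REPLY:
		/* An entry left over from an earlier connection with the same
		 * saddr:sport but a different service carries a stale
		 * rev_nat_index; redo it so replies get rev-NATed with the
		 * correct service tuple. */
		if (ct_state.rev_nat_index != svc->rev_nat_index)
			goto redo_all;
		break;
	}
```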
Previously, when a stale CT entry (of non-CT_SERVICE type) was detected (from a previous connection to a service with the same IP/port but a different rev_nat_index), the entry was removed and only afterwards recreated. The removal included the removal of related SNAT entries. However, in this case no SNAT entries had been created, so the CT + SNAT removal before the recreation was redundant.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
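The removed step, roughly (paraphrased from the old bpf_lxc.c path; identifiers and the recreate label are assumptions):

```c
	/* Old logic (paraphrased): on a rev_nat_index mismatch, explicitly
	 * delete the stale entry, which also purged related SNAT entries,
	 * before recreating it: */
	if (ct_state.rev_nat_index != ct_state_new.rev_nat_index) {
		ct_delete4(get_ct_map4(&tuple), &tuple, ctx);
		goto ct_recreate;
	}

	/* New logic: no explicit delete; ct_create4() on the recreate path
	 * simply overwrites the stale entry, and no SNAT entries exist for
	 * this case anyway: */
	if (ct_state.rev_nat_index != ct_state_new.rev_nat_index)
		goto ct_recreate;
```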
test-me-please
@brb do you consider this PR to be a major bugfix / affecting multiple users, to justify a backport to v1.6? It seems like there are non-trivial conflicts, and without your assistance it will not make its way into a v1.6 release.
@joestringer I do not expect that many users are accessing the same endpoint via two different services from outside. So it's OK not to include it in v1.6. If we hear any reports, then I will backport it manually.
See commit msgs.
In addition to the CI tests, I've tested the fix on a user's cluster which was previously hit by #10983 - the fix resolved the problem.
Fix #10983
Also, this might resolve some of the NodePort BPF flakes.