Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: recreate CT entry if proxy_redirect is stale for non-tcp #33222

Merged
merged 1 commit into from
Jun 21, 2024

Conversation

ysksuzuki
Copy link
Member

@ysksuzuki ysksuzuki commented Jun 18, 2024

This commit fixes the issue that datapath erroneously redirects (or doesn't redirect) the reply packets to the proxy if the packet hits the stale CT entry.

The PR #32653 fixed the issue for TCP by having __ct_lookup return CT_NEW if the packet hits a closing stale entry so that the caller can recreate an entry to update the proxy_redirect flag.

This commit lets datapath recreate an entry for non-TCP in the similar case to update the proxy_redirect flag.

This problem can occur, for example, when using dns-proxy.

kubectl -n cilium-test exec client3-7557dd665c-csxh6 -- curl --silent --fail --show-error --connect-timeout 2 --max-time 10 -4 http://echo-external-node.cilium-test.svc.cluster.local:8080/client-ip
curl: (28) Resolving timed out after 2001 milliseconds
command terminated with exit code 28

// The reply packet from core-dns is dropped
kubectl exec debug-8sx77 -- tcpdump -nl -i any udp and host 10.244.1.159 and port 53
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
03:43:28.350752 lxc7273688205e1 In  IP 10.244.1.159.34716 > 10.244.0.36.53: 1292+ A? echo-external-node.cilium-test.svc.cluster.local.cilium-test.svc.cluster.local. (96)
03:43:28.350883 cilium_vxlan Out IP 10.244.1.159.34716 > 10.244.0.36.53: 1292+ A? echo-external-node.cilium-test.svc.cluster.local.cilium-test.svc.cluster.local. (96)
03:43:28.351919 cilium_vxlan P   IP 10.244.0.36.53 > 10.244.1.159.34716: 1292 NXDomain*- 0/1/0 (189)
03:43:28.352224 lxc7273688205e1 In  IP 10.244.1.159.48844 > 10.244.0.108.53: 51484+ A? echo-external-node.cilium-test.svc.cluster.local.svc.cluster.local. (84)
03:43:28.352295 cilium_vxlan Out IP 10.244.1.159.48844 > 10.244.0.108.53: 51484+ A? echo-external-node.cilium-test.svc.cluster.local.svc.cluster.local. (84)
03:43:28.353106 cilium_vxlan P   IP 10.244.0.108.53 > 10.244.1.159.48844: 51484 NXDomain*- 0/1/0 (177)

// Stale CT entry with ProxyRedirect flag
kubectl -n kube-system exec cilium-f65hc -- cilium bpf ct list global 
UDP OUT 10.244.1.159:34716 -> 10.244.0.36:53 expires=1165307 Packets=0 Bytes=0 RxFlagsSeen=0x00 LastRxReport=1165247 TxFlagsSeen=0x00 LastTxReport=1165247 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=5242 IfIndex=0 
UDP OUT 10.244.1.159:48844 -> 10.244.0.108:53 expires=1165307 Packets=0 Bytes=0 RxFlagsSeen=0x00 LastRxReport=1165247 TxFlagsSeen=0x00 LastTxReport=1165247 Flags=0x0040 [ ProxyRedirect ] RevNAT=0 SourceSecurityID=5242 IfIndex=0 
Recreate CT entries for non-TCP to fix L7 proxy redirect failures.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 18, 2024
@ysksuzuki ysksuzuki added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. release-note/bug This PR fixes an issue in a previous release of Cilium. feature/conntrack labels Jun 18, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 18, 2024
@ysksuzuki
Copy link
Member Author

/test

@julianwiedmann julianwiedmann self-requested a review June 18, 2024 07:56
@ysksuzuki ysksuzuki marked this pull request as ready for review June 18, 2024 14:51
@ysksuzuki ysksuzuki requested a review from a team as a code owner June 18, 2024 14:51
bpf/bpf_lxc.c Outdated Show resolved Hide resolved
This commit fixes the issue that datapath erroneously redirects
(or doesn't redirect) the reply packets to the proxy if the packet
hits the stale CT entry.

The PR cilium#32653 fixed the issue when the TCP connection hits a closing
stale entry by having __ct_lookup return CT_NEW in that case so that
the caller can recreate an entry to update the proxy_redirect flag.

This commit lets datapath recreate an entry in the case where
non-TCP packets hit the stale CT entry with the proxy_redirect flag,
or an active TCP connection suddenly comes into the scope of an L7 policy.

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
@ysksuzuki ysksuzuki force-pushed the fix-state-proxy-redirect-non-tcp branch from 12a95e9 to 7d7d538 Compare June 19, 2024 02:17
@ysksuzuki
Copy link
Member Author

/test

@julianwiedmann julianwiedmann self-requested a review June 19, 2024 05:44
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thank you!

@julianwiedmann julianwiedmann added needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch labels Jun 19, 2024
@julianwiedmann
Copy link
Member

@jrajahalme if you're in agreement, please make sure to also add the ready-to-merge 🙏

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 21, 2024
@jrajahalme jrajahalme added this pull request to the merge queue Jun 21, 2024
Merged via the queue into cilium:main with commit 6552e09 Jun 21, 2024
67 checks passed
@YutaroHayakawa YutaroHayakawa mentioned this pull request Jun 25, 2024
12 tasks
@YutaroHayakawa YutaroHayakawa added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Jun 25, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. feature/conntrack kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants