Revert "datapath: Remove 2005 route table" #23346

Merged: @joestringer merged 1 commit into master from pr/brb/revert-2005-removal on Jan 26, 2023

@brb (Member) commented Jan 25, 2023

This reverts commit 2b58e0f.

After removing the 2005 rtable to fix the L7 issue, the kube-proxy NodePort with L7 netpol started to fail in the CI. After taking a closer look, the removal of the rtable is causing the reply from the envoy proxy to be passed to lo instead of cilium_host:

14:54:33.585708 eth0 In IP6 fc00:f853:ccd:e793::3.52394 > fc00:f853:ccd:e793::4.30239: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585852 cilium_host Out IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585856 cilium_net P IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585916 lo In IP6 fd00:10:244:2::c527.80 > fc00:f853:ccd:e793::4.13607: Flags [S.], seq 2619962850, ack 504540810, win 65464, options [mss 65476,sackOK,TS val 1096880080 ecr 3651151592,nop,wscale 7], length 0
14:54:33.585960 cilium_host Out IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [R], seq 504540810, win 0, length 0

The NodePort request gets SNAT-ed by iptables to the cilium_host IP addr. The trace is taken on the fc00:f853:ccd:e793::4 node which runs the selected NodePort endpoint.
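
For anyone re-checking a similar capture, here is a minimal sketch (not part of this PR) that scans tcpdump output of the shape shown above and flags any SYN-ACK that arrives on an interface other than cilium_host. The regex, the function names `parse_trace` and `misrouted_replies`, and the `expected_iface` parameter are all assumptions made for illustration; the regex only covers the IP6 line format seen in this trace.

```python
import re

# The five packets from the trace above, one per line.
TRACE = """\
14:54:33.585708 eth0 In IP6 fc00:f853:ccd:e793::3.52394 > fc00:f853:ccd:e793::4.30239: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585852 cilium_host Out IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585856 cilium_net P IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585916 lo In IP6 fd00:10:244:2::c527.80 > fc00:f853:ccd:e793::4.13607: Flags [S.], seq 2619962850, ack 504540810, win 65464, options [mss 65476,sackOK,TS val 1096880080 ecr 3651151592,nop,wscale 7], length 0
14:54:33.585960 cilium_host Out IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [R], seq 504540810, win 0, length 0
"""

# Hypothetical pattern: timestamp, interface, direction, "IP6", src > dst, TCP flags.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<iface>\S+)\s+(?P<dir>In|Out|P)\s+IP6\s+"
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]"
)


def parse_trace(text):
    """Return one dict per trace line that matches LINE_RE; skip the rest."""
    packets = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            packets.append(match.groupdict())
    return packets


def misrouted_replies(packets, expected_iface="cilium_host"):
    """Return SYN-ACK packets ("S." flags) seen on any interface but expected_iface."""
    return [p for p in packets if p["flags"] == "S." and p["iface"] != expected_iface]


for pkt in misrouted_replies(parse_trace(TRACE)):
    print(f"{pkt['ts']} SYN-ACK from {pkt['src']} seen on {pkt['iface']}")
```

On the trace above this reports the single SYN-ACK from fd00:10:244:2::c527.80 on lo, which is the packet that triggers the RST on the following line.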

Fix #23258

@brb added the kind/bug, sig/datapath, and release-note/bug labels Jan 25, 2023
@brb requested a review from a team as a code owner January 25, 2023 15:10
@dylandreimerink (Member) left a comment

LGTM

@nbusseneau (Member) commented Jan 25, 2023

Should this be marked as release blocker, since it breaks CI reliably on master K8s 1.20-1.24 kernel 4.9? cc @joestringer @aanm

@joestringer (Member) commented

@nbusseneau it looks like this was not yet backported to v1.13, so it doesn't need to impact the release.

@nbusseneau (Member) commented

Oh yah, didn't think this through.

@joestringer (Member) commented Jan 25, 2023

/test

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Test Name: K8sAgentHubbleTest Hubble Observe Test L3/L4 Flow

Failure Output: FAIL: hubble-relay was not able to get into ready state

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

This reverts commit 2b58e0f.

After removing the 2005 rtable to fix the L7 issue, the kube-proxy NodePort
with L7 netpol started to fail in the CI. After taking a closer look, the removal
of the rtable is causing the reply from the envoy proxy to be passed to lo
instead of cilium_host:

14:54:33.585708 eth0        In  IP6 fc00:f853:ccd:e793::3.52394 > fc00:f853:ccd:e793::4.30239: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585852 cilium_host Out IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585856 cilium_net  P   IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [S], seq 504540809, win 64800, options [mss 1440,sackOK,TS val 3651151592 ecr 0,nop,wscale 7], length 0
14:54:33.585916 lo          In  IP6 fd00:10:244:2::c527.80 > fc00:f853:ccd:e793::4.13607: Flags [S.], seq 2619962850, ack 504540810, win 65464, options [mss 65476,sackOK,TS val 1096880080 ecr 3651151592,nop,wscale 7], length 0
14:54:33.585960 cilium_host Out IP6 fc00:f853:ccd:e793::4.13607 > fd00:10:244:2::c527.80: Flags [R], seq 504540810, win 0, length 0

The NodePort request gets SNAT-ed by iptables to the cilium_host IP addr. The
trace is taken on the fc00:f853:ccd:e793::4 node which runs the selected
NodePort endpoint.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb force-pushed the pr/brb/revert-2005-removal branch from ca7d0f1 to 8a428f1 January 26, 2023 09:24
@brb (Member, Author) commented Jan 26, 2023

/test

@maintainer-s-little-helper (bot) added the ready-to-merge label Jan 26, 2023
@pchaigno (Member) left a comment

😢

@joestringer merged commit 3ed62d5 into master Jan 26, 2023
@joestringer deleted the pr/brb/revert-2005-removal branch January 26, 2023 18:17
@brb (Member, Author) commented Mar 22, 2023

Tried running the reverted revert on top of #24208. Unfortunately, the issue still persists: kube-proxy's SNAT rule (EXT NodePort) masquerades the packet to the cilium_host IP addr (because it has scope global), and then the reply is again sent to lo. Thinking about removing the table only if running with KPR=strict.

Labels
kind/bug: This is a bug in the Cilium logic.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/bug: This PR fixes an issue in a previous release of Cilium.
sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.