Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/node: Skip setting MTU on local node routes #14679

Merged
merged 1 commit into from Jan 21, 2021

Conversation

aditighag
Copy link
Member

@aditighag aditighag commented Jan 21, 2021

Previously, we were setting mtu on local node routes, which takes effect in
routing in recent kernels. This could lead to drops for jumbo packets when the
"Don't fragment" bit is set. Regardless of the kernel routing, we shouldn't
restrict mtu on the host local routes.

This fix avoids setting an mtu on local node routes (i.e., when
enable-local-node flag is enabled).

Testing notes are in the commit description. Integration tests will be added as a follow-up.

Release note

Fix a route MTU issue where pods cannot receive large packets from outside the cluster 
when the sender sets the "don't fragment" (DF) bit. 

@aditighag aditighag requested a review from a team January 21, 2021 00:16
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 21, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Jan 21, 2021
@aditighag aditighag requested a review from jibi January 21, 2021 00:16
@aditighag aditighag added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jan 21, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 21, 2021
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit on one of the godocs, but otherwise LGTM.

pkg/datapath/linux/node.go Outdated Show resolved Hide resolved
@joestringer
Copy link
Member

I would probably describe the release note as something like "Fix MTU issue where pods cannot receive large packets from outside the cluster", since that's the most likely scenario for this to occur (including the environment of the user that reported this issue).

@aditighag
Copy link
Member Author

I would probably describe the release note as something like "Fix MTU issue where pods cannot receive large packets from outside the cluster", since that's the most likely scenario for this to occur (including the environment of the user that reported this issue).

Done. Made the note more explicit by mentioning about the Don't fragment bit.

@aditighag
Copy link
Member Author

test-me-please

@aditighag
Copy link
Member Author

retest-gke

@aditighag
Copy link
Member Author

aditighag commented Jan 21, 2021

I'm unable to reproduce the smoke-test-ipv6 test failure on a local kind cluster. I followed all the steps from the yaml.

ks get pods -A
NAME                                             READY   STATUS    RESTARTS   AGE
echo-a-dc9bcfd8f-wcw7s                           1/1     Running   0          3m18s
echo-b-5884b7dc69-ggp6r                          1/1     Running   0          3m18s
echo-b-host-cfdd57978-gvf85                      1/1     Running   0          3m18s
host-to-b-multi-node-clusterip-c4ff7ff64-mrrc5   1/1     Running   0          3m17s
host-to-b-multi-node-headless-84d8f6f4c4-jgbzx   1/1     Running   1          3m17s
pod-to-a-5cdfd4754d-p6mmb                        1/1     Running   0          3m18s
pod-to-a-allowed-cnp-7d7c8f9f9b-l9c2s            1/1     Running   0          3m17s
pod-to-a-denied-cnp-75cb89dfd-rk7dr              1/1     Running   0          3m17s
pod-to-b-intra-node-nodeport-99b499f7d-d7vd4     1/1     Running   1          3m15s
pod-to-b-multi-node-clusterip-cd4d764b6-wm9sh    1/1     Running   0          3m17s
pod-to-b-multi-node-headless-6696c5f8cd-txrv7    1/1     Running   1          3m17s
pod-to-b-multi-node-nodeport-7ff5595558-w9pg9    1/1     Running   1          3m17s

@aditighag aditighag added release-note/bug This PR fixes an issue in a previous release of Cilium. and removed release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Jan 21, 2021
@qmonnet
Copy link
Member

qmonnet commented Jan 21, 2021

retest-gke

@joestringer
Copy link
Member

I see that there were two re-runs of the GKE tests. In an ideal world, this would be zero. I get the impression that the GKE tests passed on the 3rd try, but what were the first two failures? Were they infrastructure issues? Some known flake (which issue #?) Or code issue that got fixed?

@qmonnet
Copy link
Member

qmonnet commented Jan 21, 2021

what were the first two failures?

Haven't seen the first one, but the second was infrastructure-related. GKE failed to find a cluster with nodepools.

@aditighag
Copy link
Member Author

what were the first two failures?

Haven't seen the first one, but the second was infrastructure-related. GKE failed to find a cluster with nodepools.

I posted the failure in the testing channel on Slack. The first one was identical to this. I saw the same failure on other PRs too - https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/4049/

@joestringer
Copy link
Member

OK, thanks for the info, so otherwise we have basically had all tests pass at least once; 2x GKE infra issues, 1x Travis infra issues. Code is reviewed and I think we're good to ship this. Merging.

@joestringer joestringer merged commit 4ad84d9 into cilium:master Jan 21, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.13 Jan 21, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.13 Jan 21, 2021
@aditighag aditighag deleted the mtu-fix branch January 21, 2021 23:31
@aanm aanm added this to Needs backport from master in 1.9.4 Jan 22, 2021
@aanm aanm removed this from Needs backport from master in 1.9.3 Jan 22, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.7 Jan 26, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.7 Jan 28, 2021
@christarazi christarazi moved this from Needs backport from master to Backport done to v1.9 in 1.9.4 Feb 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.7.13
Backport done to v1.7
1.8.7
Backport done to v1.8
1.9.4
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

None yet

8 participants