Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch MTU from the nodeIP interface #10635

Merged
merged 1 commit into from Jun 3, 2020
Merged

Fetch MTU from the nodeIP interface #10635

merged 1 commit into from Jun 3, 2020

Conversation

manuelbuil
Copy link
Contributor

@manuelbuil manuelbuil commented Mar 19, 2020

The detected MTU is always the one belonging to the interface that acts
as gateway. That interface does not necessarily need to be the one that
holds nodeIP and thus the one that Cilium will use to send traffic
inside the k8s cluster. This patch fixes this and detects the mtu of the
interface used for cluster communications

Fixes: #10309
Signed-off-by: Manuel Buil mbuil@suse.com

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Thanks for contributing!

The detected MTU is always the one belonging to the interface that acts
as gateway. That interface does not necessarily need to be the one that
holds nodeIP and thus the one that Cilium will use to send traffic
inside the k8s cluster. This patch fixes this and detects the mtu of the
interface used for cluster communications

Fixes: #10309

Autodetection of the mtu correctly detects the mtu of the interface used for the kubernetes cluster communication. The mtu was incorrectly detected in cases where multiple interfaces were present and the gateway interface was not the one used for kubernetes cluster communication

@manuelbuil manuelbuil requested review from a team March 19, 2020 15:44
@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

@joestringer
Copy link
Member

Travis failure seems relevant:

# github.com/cilium/cilium/daemon/cmd
vet: daemon/cmd/status_test.go:83:96: too few arguments in call to mtu.NewConfiguration
make[1]: *** [govet] Error 2
make[1]: Leaving directory `/home/travis/gopath/src/github.com/cilium/cilium'
make: *** [unit-tests] Error 2

@vadorovsky
Copy link
Member

test-me-please

@coveralls
Copy link

coveralls commented Mar 19, 2020

Coverage Status

Coverage increased (+0.03%) to 36.931% when pulling 9d985f1 on manuelbuil:mtuSolution into 601852b on cilium:master.

Copy link
Member

@vadorovsky vadorovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit. Otherwise looks good.

pkg/mtu/detect_linux.go Outdated Show resolved Hide resolved
Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrfastab PTAL as well

pkg/mtu/detect_linux.go Show resolved Hide resolved
@aanm aanm requested a review from jrfastab March 20, 2020 09:31
@manuelbuil
Copy link
Contributor Author

test-me-please

2 similar comments
@aanm
Copy link
Member

aanm commented Mar 20, 2020

test-me-please

@vadorovsky
Copy link
Member

test-me-please

@vadorovsky
Copy link
Member

Travis hit the flake #10615. we can ignore it

@vadorovsky
Copy link
Member

test-me-please

daemon/cmd/daemon.go Outdated Show resolved Hide resolved
daemon/cmd/daemon.go Outdated Show resolved Hide resolved
@vadorovsky
Copy link
Member

test-me-please

@vadorovsky
Copy link
Member

Hitting the flake #10882

@vadorovsky
Copy link
Member

test-me-please

Copy link
Member

@vadorovsky vadorovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@vadorovsky
Copy link
Member

@tgraf PTAL

@vadorovsky
Copy link
Member

restart-ginkgo

@maintainer-s-little-helper
Copy link

Commit ac4bb90ab113c26e642764890e7d8522d75c70a1 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. and removed dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. labels May 25, 2020
@vadorovsky
Copy link
Member

test-me-please

@vadorovsky
Copy link
Member

https://jenkins.cilium.io/job/Cilium-PR-K8s-newest-kernel-4.9/358 failed with

12:38:10  • Failure [298.763 seconds]
12:38:10  K8sDatapathConfig
12:38:10  /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:436
12:38:10    Etcd
12:38:10    /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:436
12:38:10      Check connectivity [It]
12:38:10      /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:471
12:38:10  
12:38:10      cilium pre-flight checks failed
[2020-05-25T10:38:10.986Z]     Expected
[2020-05-25T10:38:10.986Z]         <*errors.errorString | 0xc00025e130>: {
[2020-05-25T10:38:10.986Z]             s: "Cilium validation failed: 4m0s timeout expired: Last polled error: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-s7rdp': Exitcode: 255 \nStdout:\n \t \nStderr:\n \t Error: Cannot get status/probe: Put \"http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe\": context deadline exceeded\n\t \n\t command terminated with exit code 255\n\t \n",
[2020-05-25T10:38:10.986Z]         }
[2020-05-25T10:38:10.986Z]     to be nil
12:38:10  
12:38:10      /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/helpers/manifest.go:313

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the idea of taking the externalIP from k8s and using that to determine which device provides 'external' connectivity (and hence governs MTU) seems like a reasonable approach. I wish there were a better way to figure this out than iterating all IPs associated with all devices, but I'm not sure the kernel gives us any better APIs.

I have a bunch of minor nits that should be easy to address below.

pkg/mtu/detect_linux.go Outdated Show resolved Hide resolved
pkg/mtu/detect_linux.go Outdated Show resolved Hide resolved
pkg/mtu/detect_linux.go Outdated Show resolved Hide resolved
pkg/mtu/mtu.go Outdated Show resolved Hide resolved
pkg/mtu/detect_linux.go Outdated Show resolved Hide resolved
daemon/cmd/daemon.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@manuelbuil manuelbuil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @joestringer

pkg/mtu/mtu.go Outdated Show resolved Hide resolved
daemon/cmd/daemon.go Outdated Show resolved Hide resolved
pkg/mtu/detect_linux.go Outdated Show resolved Hide resolved
@manuelbuil manuelbuil requested a review from a team as a code owner May 29, 2020 08:13
@vadorovsky
Copy link
Member

test-me-please

@aanm aanm requested a review from joestringer May 29, 2020 13:59
@vadorovsky
Copy link
Member

test-me-please

The detected MTU is always the one belonging to the interface that acts
as gateway. That interface does not necessarily need to be the one that
holds nodeIP and thus the one that Cilium will use to send traffic
inside the k8s cluster. This patch fixes this and detects the mtu of the
interface used for cluster communications

The extra calculation of the correct MTU must be done before the daemon
is created and thus part of the K8s initialization code must be moved
before the daemon creation. The reason is that the Kubernetes IP for the
node is calculated during the K8s initialization.

Fixes: #10309

Signed-off-by: Manuel Buil <mbuil@suse.com>
@vadorovsky
Copy link
Member

test-me-please

@vadorovsky vadorovsky added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Jun 3, 2020
@vadorovsky
Copy link
Member

I guess that the GKE test failure can be ignored.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 3, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.0 Jun 3, 2020
@joestringer joestringer merged commit 507d9b8 into cilium:master Jun 3, 2020
1.8.0 automation moved this from In progress to Merged Jun 3, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.0 Jun 4, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.0 Jun 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.8.0
  
Merged
1.8.0
Backport done to v1.8
Development

Successfully merging this pull request may close these issues.

MTU autoDetect() fetches always the MTU from the gateway interface
10 participants