daemon: Implement route-based device detection #17219
780dbc0
to
19a42da
Compare
daemon/cmd/devices.go
// updateDevicesFromRoutes processes a batch of routes and updates the set of
// devices. Returns true if devices changed.
func (dm *DeviceManager) updateDevicesFromRoutes(routes []netlink.Route) bool {
A follow-up PR will reuse this logic to detect devices at runtime and reload the datapath.
	return nil
}

func (dm *DeviceManager) GetDevices() []string {
In a follow-up that adds runtime detection of devices, the plan is to stop using option.Config.Devices throughout and instead ask the device manager for the list of devices, since that list may change over time. That is why the device state and this function already exist here.
See #17187 for the work-in-progress runtime detection and the reasoning behind the structure of this file.
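To illustrate why callers would ask a manager for the device list rather than read a static option, here is a minimal sketch of a device set that can change over time. The `deviceSet` type and its `Update` and `Devices` methods are hypothetical names chosen for this example, not the PR's actual code; the sketch only assumes that updates report whether anything changed and that readers get a snapshot.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// deviceSet is a hypothetical stand-in for the device state that a
// device manager could keep. It is not the PR's DeviceManager.
type deviceSet struct {
	mu      sync.RWMutex
	devices map[string]struct{}
}

func newDeviceSet() *deviceSet {
	return &deviceSet{devices: make(map[string]struct{})}
}

// Update adds the given device names and reports whether the set changed.
func (s *deviceSet) Update(names []string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	changed := false
	for _, name := range names {
		if _, ok := s.devices[name]; !ok {
			s.devices[name] = struct{}{}
			changed = true
		}
	}
	return changed
}

// Devices returns a sorted copy of the current device names, so callers
// never share memory with the internal state.
func (s *deviceSet) Devices() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]string, 0, len(s.devices))
	for name := range s.devices {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := newDeviceSet()
	fmt.Println(s.Update([]string{"eth0"}), s.Devices())
	fmt.Println(s.Update([]string{"eth0"}), s.Devices())
}
```

Returning a copy means a caller that holds on to the slice sees a stable snapshot, and has to call `Devices` again to observe devices added later.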
@@ -112,6 +112,8 @@ type Daemon struct {
	monitorAgent *monitoragent.Agent
	ciliumHealth *health.CiliumHealth

	deviceManager *DeviceManager
A more suitable place for DeviceManager might be in pkg/datapath, but I'd like to leave it here for now.
test-me-please
19a42da
to
b6bfcd9
Compare
test-me-please
6bd1d94
to
c75e3f1
Compare
c75e3f1
to
cd0c249
Compare
test-me-please
cd0c249
to
ee8436a
Compare
test-runtime
test-gke
231790e
to
23bc0f4
Compare
test-runtime
e235b4b
to
8dadc1a
Compare
test-me-please
Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:
Test Name
Failure Output
If it is a flake, comment
8dadc1a
to
4bb3944
Compare
test-me-please
Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:
Test Name
Failure Output
If it is a flake, comment
test-1.16-netnext
@pchaigno PTAL
Could you break up commit "daemon: Address review comments on device detection" and merge it into the respective commits it fixes/improves? Having it in a separate commit hinders reviewing and doesn't make sense once merged.
Ah, I was just trying to make reviewing easier. I'll squash.
I didn't mean you should squash everything; that doesn't make reviews any easier ;-)
If you're looking for guidance on how to split commits in pull requests, maybe a good starting point is https://kernelnewbies.org/PatchPhilosophy#How_to_break_up_changes. For this specific pull request, a good example is the code you moved from daemon/cmd/kube_proxy_replacement{,_test}.go to a new file; that move should be in its own commit, before the rest. It would help reviewers because we wouldn't have to re-review existing code that was just moved around.
This reimplements the device detection logic to use the route information rather than detecting devices based on k8s nodeIP and the device with the default route.
The devices are discovered by finding all devices mentioned in global unicast routes and filtering out cilium-managed devices by prefix.
The reason for introducing DeviceManager is to later add support for dynamically reconfiguring the datapath for devices added at runtime. The detected devices are now also used for the host firewall and the bandwidth manager, so it makes sense to move this logic out.
Fixes: cilium#15960
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Since IPsec is not yet using the detected devices, disable detection if only IPsec is enabled. This fixes a failure in IPsec L7 tests. The root cause of why detecting the devices causes issues is still unknown.
Signed-off-by: Jussi Maki <jussi@isovalent.com>
4bb3944
to
78faa05
Compare
test-me-please
Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:
Test Name
Failure Output
If it is a flake, comment
test-1.16-netnext
I think the two tests failing in k8s-1.16-kernel-netnext are new, so likely a case of #15474. You can confirm by checking you don't have the related changes (the changes those tests test) in your branch. Happy to help on Slack. If that's confirmed, then I think we're good to go. The remaining cilium/agent review is for changes under |
Yep, it's a new test.
Yep, but are the changes added with that test included in your branch?
This reimplements the device detection logic to use the route
information rather than detecting devices based on k8s nodeIP and
the device with the default route.
The devices are discovered by finding all devices mentioned in global
unicast routes and filtering out cilium-managed devices by prefix.
The motivation for this change is to handle cases like #15960 more reliably
without requiring the user to specify devices manually. It also prepares for
later work to detect devices at runtime, e.g. to allow use of KPR with ENI,
which adds devices dynamically.
The reason for introducing DeviceManager is to later add support for
dynamically reconfiguring the datapath for devices added at runtime.
The detected devices are now also used for the host firewall and the
bandwidth manager, so it makes sense to move this logic out.
Fixes: #15960
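
The detection described above (collect devices that appear in global unicast routes, then drop cilium-managed devices by name prefix) can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: the `route` type is a hypothetical stand-in for a netlink route, a nil `Dst` is assumed to mean a default route, and the excluded prefixes `cilium_` and `lxc` are example values.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strings"
)

// route is a hypothetical, simplified stand-in for a netlink route entry.
type route struct {
	Device string     // name of the link the route points at
	Dst    *net.IPNet // destination; nil is assumed to mean a default route
}

// mustCIDR parses a CIDR string and panics on malformed input.
func mustCIDR(s string) *net.IPNet {
	_, ipnet, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return ipnet
}

// hasAnyPrefix reports whether name starts with one of the prefixes.
func hasAnyPrefix(name string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

// detectDevices returns the sorted names of devices that appear in global
// unicast routes, skipping devices whose name has an excluded prefix.
func detectDevices(routes []route, excludedPrefixes []string) []string {
	seen := make(map[string]struct{})
	for _, r := range routes {
		// Skip loopback, multicast and link-local destinations.
		if r.Dst != nil && !r.Dst.IP.IsGlobalUnicast() {
			continue
		}
		if hasAnyPrefix(r.Device, excludedPrefixes) {
			continue
		}
		seen[r.Device] = struct{}{}
	}
	devices := make([]string, 0, len(seen))
	for name := range seen {
		devices = append(devices, name)
	}
	sort.Strings(devices)
	return devices
}

// sampleRoutes builds an illustrative routing table.
func sampleRoutes() []route {
	return []route{
		{Device: "eth0", Dst: nil}, // default route
		{Device: "eth0", Dst: mustCIDR("10.0.0.0/24")},
		{Device: "lo", Dst: mustCIDR("127.0.0.0/8")},
		{Device: "eth1", Dst: mustCIDR("192.168.1.0/24")},
		{Device: "cilium_host", Dst: mustCIDR("10.1.0.0/16")},
		{Device: "lxc1234", Dst: mustCIDR("10.1.0.5/32")},
		{Device: "eth2", Dst: mustCIDR("fe80::/64")},
	}
}

func main() {
	fmt.Println(detectDevices(sampleRoutes(), []string{"cilium_", "lxc"}))
}
```

With this sample table, only `eth0` and `eth1` survive: `lo` and `eth2` have no global unicast route, and `cilium_host` and `lxc1234` are filtered by prefix.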