New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wireguard: Add fallback to userspace implementation #17451
Conversation
e88a950
to
b7370f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
I wonder if it might be possible to unit test the functions that setup the server, you could run two servers on one host and simulate traffic between them, what do you think? That would only cover user-space server functionality, and won't cover more desirable setup where one node uses the kernel implementation and another relies on user-space fallback, but at least we could have some tests for this new functionality... |
The reason why I actually started this work is to add WireGuard to the Clustermesh CI 3.0, which runs on VMs which don't have WireGuard support. That CI test will cover the functionality (and more) added by this PR much better than unit tests. Do you think that's a suitable alternative? With regards to compatibility between userspace-mode and kernel-mode: This is something guaranteed by upstream (wireguard-go is developed by same people who are responsible for the kernel-mode). Do you have a particular concern in mind here? After all, there are no parameters for us to configure which could break this compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly related to this PR but something that came to my mind while reviewing:
In case the wireguard kernel implementation is built as a module (which seems to be the default configuration and is the case on the few systems I checked): do we check whether the module is loaded before attempting to set up wireguard or would it make sense to add such a check (similar to the way we check for iptables modules being loaded)? Or is it guaranteed to be autoloaded once we create the wgX
interface?
5807ece
to
5f8c332
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
5f8c332
to
2b9bdf5
Compare
/test Regarding my last push: Rebased on master before running CI. Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
Edit:
Edit 2: The net-next has been updated and some other workflows disabled. Rerunning tests. |
2b9bdf5
to
b6a1fea
Compare
/test
|
Init used to be called after the WireGuard tunnel IP was allocatd, however this is not the case anymore since commit 0ecab37. Init is still called after the watcher initialization, but it does not require an IP address anymore. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
It should be propagated to the caller, as we exit the function when rp_filter cannot be disabled. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
To aid in troubleshooting when Init() fails. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The operator does not have any functionality that conditionally depends on WireGuard anymore. Fixes: 0ecab37 ("wireguard: Remove operator") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Previously, the Init() function did not lock the WireGuard agent under the assumption that there can be no concurrent calls into the agent. That assumption however is false, both Close() (via signal handler cleanup) and UpdatePeer() (via node manager) can be called concurrently while Init() is being invoked. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
b6a1fea
to
290b847
Compare
This commit adds the packages used for the WireGuard userspace implementation added in a subsequent commit. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
WireGuard on Linux usually uses the kernel-mode implementation, as it is faster and does not rely on a userspace process running to process packets. However, some kernels do not ship with WireGuard support, either because they are too old, or because WireGuard support has explicitly compiled out. For such cases, we can fall back to the WireGuard user-space implementation based on a TUN device. While this is not recommended in production, it can be useful for testing or in cases where some machines cannot upgrade to a more recent kernel. The userspace implementation requires a daemon which encrypts and decrypts the packets sent over the TUN device. In Cilium, this task is performed by the cilium-agent process which calls into the wireguard-go library. This has the consequence that if cilium-agent is down, connectivity between WireGuard peers, and therefore connectivity between Cilium-managed pods will also be unavailable. Thus, running WireGuard in userspace mode is not recommeneded for production workloads. The fallback to userspace mode must be explicitly enabled via the `enable-wireguard-userspace-fallback` option. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Mentioning the caveats that apply. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
To enable the fallback to user-space mode implemented in the previous commits. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
290b847
to
718c4f7
Compare
/test Had to rebase due to conflicts. Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment This looks like a failiure with K8s itself, not Cilium. |
test-1.21-4.9 Edit: Given the above flake is unrelated to WireGuard and I would like to get this PR in for the 1.11 feature freeze, I will restart the failed test. |
Marking this ready-to-merge. It has gotten a lot of reviews and is blocking further CI work (#17453). Plus I also would like to get this into Cilium 1.11 before the feature freeze. |
WireGuard on Linux usually uses the kernel-mode implementation, as it is
faster and does not rely on a userspace process running to process
packets.
However, some kernels do not ship with WireGuard support, either because
they are too old, or because WireGuard support has explicitly compiled
out. For such cases, we can fall back to the WireGuard user-space
implementation based on a TUN device. While this is not recommended in
production, it can be useful for testing or in cases where some machines
cannot upgrade to a more recent kernel.
The userspace implementation requires a daemon which encrypts and
decrypts the packets sent over the TUN device. In Cilium, this task is
performed by the cilium-agent process which calls into the wireguard-go
library. This has the consequence that if cilium-agent is down,
connectivity between WireGuard peers, and therefore connectivity between
Cilium-managed pods will also be unavailable. Thus, running WireGuard in
userspace mode is not recommeneded for production workloads.
The fallback to userspace mode must be explicitly enabled via the
enable-wireguard-userspace-fallback
option.Review per commit. The first few commits contain drive-by cleanups.