
conn, device, tun: implement vectorized I/O on Linux #64

Closed
wants to merge 1 commit

Conversation

@jwhited (Contributor) commented Dec 7, 2022

This commit changes the tun.Device and conn.Bind interfaces to accept packet vectors for reading and writing. Internal plumbing between these interfaces now passes a vector of packets. Vectors move untouched between these interfaces, i.e. if 128 packets are received from conn.Bind.Read(), 128 packets are passed to tun.Device.Write(). There is no internal buffering.
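The flow described above can be sketched as a pair of batch-aware interfaces. This is a minimal illustration; the names and signatures are simplified stand-ins, not the exact wireguard-go API:

```go
package main

import "fmt"

// A batch-aware reader fills a caller-provided vector of buffers and
// reports per-packet sizes, so a single call can surface many packets.
type batchReader interface {
	Read(bufs [][]byte, sizes []int) (int, error)
	BatchSize() int
}

// A batch-aware writer accepts a whole vector in one call.
type batchWriter interface {
	Write(bufs [][]byte) (int, error)
}

// forward moves one vector of packets untouched from r to w: if r yields
// 128 packets, all 128 are handed to w in a single call, with no
// internal buffering.
func forward(r batchReader, w batchWriter) (int, error) {
	bufs := make([][]byte, r.BatchSize())
	sizes := make([]int, r.BatchSize())
	for i := range bufs {
		bufs[i] = make([]byte, 1500)
	}
	n, err := r.Read(bufs, sizes)
	if err != nil {
		return 0, err
	}
	for i := 0; i < n; i++ {
		bufs[i] = bufs[i][:sizes[i]]
	}
	return w.Write(bufs[:n])
}

// fakeBind yields a fixed batch; fakeTun records the vector it receives.
type fakeBind struct{ pkts [][]byte }

func (f *fakeBind) BatchSize() int { return 8 }
func (f *fakeBind) Read(bufs [][]byte, sizes []int) (int, error) {
	for i, p := range f.pkts {
		copy(bufs[i], p)
		sizes[i] = len(p)
	}
	return len(f.pkts), nil
}

type fakeTun struct{ got int }

func (f *fakeTun) Write(bufs [][]byte) (int, error) {
	f.got = len(bufs)
	return len(bufs), nil
}

func main() {
	r := &fakeBind{pkts: [][]byte{[]byte("a"), []byte("bb"), []byte("ccc")}}
	w := &fakeTun{}
	n, _ := forward(r, w)
	fmt.Println(n, w.got) // the whole vector moves in one call
}
```

The payoff is that per-packet fixed costs (syscalls, lock acquisitions, channel operations) are amortized across the entire vector.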

Platform-specific implementations of tun.Device have been updated to the new tun.Device interface, but only Linux supports passing more than one packet for now. The Linux tun.Device implementation accomplishes this via TSO and GRO, which is made possible by virtio extensions in the TUN driver. Linux TUN offloading can be disabled by setting the WG_DISABLE_OFFLOAD environment variable to 1.

conn.LinuxSocketEndpoint has been deleted in favor of a collapsed conn.StdNetBind. conn.StdNetBind makes use of recvmmsg() and sendmmsg() on Linux. All platforms fall under conn.StdNetBind, except for Windows, which remains in conn.WinRingBind. Sticky sockets support has been refactored as part of this work to eventually be applicable on platforms other than just Linux; however, Linux remains the sole platform that fully implements it. The conn.Bind UDP socket buffer is now being sized to 7MB, whereas it previously inherited the system default.

device.Keypairs locking has been simplified from a RWMutex + atomic to just a Mutex.
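A minimal sketch of that locking simplification follows; the field and method names are illustrative, not the actual device package code:

```go
package main

import (
	"fmt"
	"sync"
)

// keypairs-style holder: previously this was guarded by an RWMutex plus
// an atomic pointer for the current keypair; the commit collapses it to
// a single Mutex, which is simpler to reason about and fast enough for
// the access pattern involved.
type keypair struct{ id uint32 }

type keypairs struct {
	mu       sync.Mutex
	current  *keypair
	previous *keypair
	next     *keypair
}

func (kp *keypairs) Current() *keypair {
	kp.mu.Lock()
	defer kp.mu.Unlock()
	return kp.current
}

// Rotate installs a new current keypair, demoting the old one.
func (kp *keypairs) Rotate(nk *keypair) {
	kp.mu.Lock()
	defer kp.mu.Unlock()
	kp.previous = kp.current
	kp.current = nk
	kp.next = nil
}

func main() {
	kp := &keypairs{}
	kp.Rotate(&keypair{id: 1})
	kp.Rotate(&keypair{id: 2})
	// Single-threaded demo, so reading previous without the lock is fine here.
	fmt.Println(kp.Current().id, kp.previous.id)
}
```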

Signed-off-by: Jordan Whited jordan@tailscale.com
Signed-off-by: James Tucker james@tailscale.com
Co-authored-by: James Tucker james@tailscale.com

conn/bind_std.go (resolved)
@zx2c4 (Member) commented Dec 7, 2022

Thanks for this very cool approach! Before I jump in, some surfacy questions:

Draft pull requests cannot be merged.

You marked this as a draft. What's missing? What's broken? Should I wait to look at it until it's ready, or is this still worth a read through now?

Linux TUN offloading can be disabled by setting the WG_DISABLE_OFFLOAD environment variable to 1.

Why would somebody want to do this? Can we eliminate whatever that reason is, so that there's one choice that works for everyone, and then not need the env var?

@jwhited jwhited marked this pull request as ready for review December 8, 2022 00:01
@raggi (Contributor) commented Dec 8, 2022

Draft pull requests cannot be merged.

You marked this as a draft. What's missing? What's broken? Should I wait to look at it until it's ready, or is this still worth a read through now?

It's ready and has had extensive testing already, but since we had some rebasing to do there were some expected changes (which Jordan has already pushed). The code is very stable and ready for review; we're not expecting any major changes from further testing. We are planning to continue doing more testing with this code integrated into tailscaled.

Linux TUN offloading can be disabled by setting the WG_DISABLE_OFFLOAD environment variable to 1.

Why would somebody want to do this? Can we eliminate whatever that reason is, so that there's one choice that works for everyone, and then not need the env var?

WG_DISABLE_OFFLOAD was added out of an abundance of caution. The new feature we rely upon was added in 2.6.27 and has had many minor changes since then, some of which were bugfixes. We've tested a variety of kernels and found that this works well; however, we cannot test all possible kernel versions and configurations.

We added this flag so that if a user discovers that they're on a buggy kernel, or discovers an edge case we have not been able to, there is a knob they can use to get functional behavior. Offloads downstream of TUN are extremely lightly used outside of VMMs; we're not actually aware of any public code that makes use of these kernel code paths outside of VMMs.

We could remove the flag; most of the behavior can be disabled by changing the option using ethtool, but using the environment variable also reduces memory allocation from buffers.

jwhited added a commit to tailscale/tailscale that referenced this pull request Dec 9, 2022
…ard-go

This is temporary while we work to upstream performance work in
WireGuard/wireguard-go#64. A replace directive
is less ideal as it breaks dependent code without duplication of the
directive.

Signed-off-by: Jordan Whited <jordan@tailscale.com>
jwhited added a commit to tailscale/tailscale that referenced this pull request Dec 9, 2022
…ard-go (#6692)

This is temporary while we work to upstream performance work in
WireGuard/wireguard-go#64. A replace directive
is less ideal as it breaks dependent code without duplication of the
directive.

Signed-off-by: Jordan Whited <jordan@tailscale.com>
endian/little.go (resolved)
endian/big.go (resolved)
@KyleSanderson commented

It may be worth mentioning that the in-kernel implementation isn't optimized, and the golang version is the preferred method on Linux. I tried flipping a note to the mailing list a couple months back before the article came out, but it was moderated.

@raggi (Contributor) commented Jan 3, 2023

I don't think that's strictly true in all cases. We managed to achieve excellent performance with these patches, and that's a great outcome, but for some CPU shapes the kernel is still faster and uses fewer resources. There are also additional considerations around packet pacing, such as in congestion control, that work better with the kernel implementation in many scenarios.

Both are solid implementations, but I don't think one should be preferred without context or conditions. Despite being C, the kernel version has probably also had more verification work done on it overall, so some may consider it more secure (though there'd be plenty to debate in general).

In short, throughput is important, but it's not the only consideration.

@raggi (Contributor) commented Jan 9, 2023

Hi @zx2c4 friendly ping. @jwhited followed up with some changes. This has been working well in our testing, and we're eager to move it into release in Tailscale. Thanks!

conn/sticky_linux.go (resolved)
@@ -321,6 +321,11 @@ func (bind *WinRingBind) Close() error {
return nil
}

func (bind *WinRingBind) BatchSize() int {
// TODO: implement batching in and out of the ring
Member:

Presumably all those 'fixup!' commits are intended to be squashed down, and this whole PR made into one big commit. That's fine, so long as it's complete. But it looks like the Windows part still isn't done, so I'm wondering what the intention is with regard to that.

Contributor:

Yeah, the fixups are essentially from a git commit --fixup workflow. There are many ways to cut this pie, but one of the slightly more helpful patterns here is that subsequent reviews can easily look at a diff of the changes made in response to comments, rather than seeking them out of a new singular whole. git rebase --autosquash will flatten these down, and this is the default behavior of "rebase and merge" on GitHub. It is our intent to have the fixups squashed.

Regarding Windows, we're not planning to include full support in this change. As you can see, we moved the API over for consistency and in anticipation of doing that. We have identified additional changes that need to be made and tested independently in order to get there, and we've already started looking more deeply into those details, but it's worthwhile getting these changes reviewed and landed in the meantime. We certainly don't want to leave Windows behind, but it has its own ABI and API differences for offloads, and we also likely need to make some changes in upstream repos (e.g. to implement ReadBatch/WriteBatch support in x/* for bindstd), and for full effect also some changes in Wintun whose full shape we don't know yet. There may be interesting discussions to be had as to whether RIO is worthwhile once batching and segment offloading are implemented, but we can have that discussion once we have more concrete change proposals and benchmark results.

@zx2c4 (Member) commented Jan 27, 2023:

Regarding Windows, we're not planning to include full support in this change.

Okay. In that case, please separate Linux into its own commit, since Windows (and other platforms) will be a standalone commit too.

Contributor:

That requires going back on the prior discussion we had of unifying the I/O interface, so we'd have to reintroduce two different I/O APIs (one for a batch of bufs, the other for a single set of bufs). I'm not sure the split would be worthwhile; it makes the interfaces substantially more complex to read and use, and may even reduce performance due to the need for more dynamic interface usage.

Member:

It doesn't require that. Rather, in the initial commit, you do for Linux what you did for Windows: batchSize=1 with the new API. Then in the subsequent one, you wire up sendmmsg. That will make bisecting easier too.

Contributor (Author):

The first commit in the series excludes mmsg and TSO/GRO, batchSize=1 everywhere. d115be4

Contributor (Author):

The second commit in the series implements mmsg and TSO/GRO e6bfbdf

conn/bind_windows.go (resolved)
conn/conn.go (resolved)
conn/conn.go (resolved)
conn/conn_test.go (resolved)
// silently clamp the value to other maximums, such as linux clamping to
// net.core.{r,w}mem_max (see _linux.go for additional implementation that works
// around this limitation)
const socketBufferSize = 7 << 20
Member:

Should this be platform defined? It seems odd to justify the use of 7MB on the grounds of macOS and then apply it everywhere. Or is it the case that every other platform clamps below 7MB and never above, so 7MB represents a useful upper bound?

Contributor:

It could be made platform defined anytime we decide that's a good idea, if we demonstrate a scenario where it is worthwhile.

This value would ideally be tunable; it is essentially the main tuning knob with which you can address BDP. What we observed in testing is that we need a value substantially larger than the rather conservative default of hundreds of kilobytes in scenarios where there is either significant latency between peers or significant bandwidth available.

The 7MB value, while a bit strange courtesy of the macOS limit, is not strongly worse than, say, splitting this out and picking 8MB for Linux. We'd like to see more feedback from users regarding use cases here. In our testing this makes 100ms-latency links perform much better (once whatever is inside also warms up, such as window expansion), and removes a significant bottleneck in reaching speeds in the 1-10Gbps range. It is possible it is not sufficient for some 25Gbps+ use cases, but other bottlenecks in the system need to be resolved first, so adding potentially substantially more memory usage now would be premature.
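The bandwidth-delay-product arithmetic behind this reasoning is easy to sketch. The numbers below are illustrative; 212992 is a common net.core.wmem_max default:

```go
package main

import "fmt"

// A socket buffer caps per-flow throughput at roughly bufferBytes / RTT:
// the sender cannot have more than one buffer's worth of data in flight
// per round trip.
func maxThroughputMbps(bufferBytes int, rttSeconds float64) float64 {
	return float64(bufferBytes) * 8 / rttSeconds / 1e6
}

func main() {
	const sevenMB = 7 << 20
	const defaultBuf = 212992 // a common net.core.wmem_max default
	// At 100ms RTT the default buffer limits a flow to tens of Mbps,
	// while 7MB supports hundreds of Mbps.
	fmt.Printf("%.0f\n", maxThroughputMbps(defaultBuf, 0.100))
	fmt.Printf("%.0f\n", maxThroughputMbps(sevenMB, 0.100))
}
```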

Member:

I just worry that we'll want to change it at some point and then inadvertently break macOS.

Contributor (Author):

I just worry that we'll want to change it at some point and then inadvertently break macOS.

Break because we make the value too large and miss the comment? Would adding a darwin test checking the value be sufficient in this case? Happy to break it out per-platform, but maybe one test file is simpler? Or we could do both.

controlFns = append(controlFns,

// Attempt to set the socket buffer size beyond net.core.{r,w}mem_max by
// using SO_*BUFFORCE. This requires CAP_NET_ADMIN, and is allowed here to
Member:

Where do you envision tun being available but not SO_*BUFFORCE? Android where it's passed over a unix domain socket?

Contributor:

Yeah, for better or worse it's becoming a bit chaotic out there in terms of which feature sets overlap, largely due to security policy constraints. The two big aspects are Android and container runtimes, with much larger variation in configurations in the latter. On Android this call fails due to lack of CAP_NET_ADMIN, though I would have expected it to also be caught by the SELinux configuration (as the offload sadly is) 🤷🏼

We might consider reporting the outcome, but to some extent I'm also waiting to see what happens with the defaults once userspace QUIC deployments become more common. This will come up there once those folks also get through optimizations; perhaps it will be time to change the defaults or some of the constraints then.

I'd be happy to write some notes about this in a doc somewhere, if we have a suitable doc in a suitable place.

Member:

I'd be happy to write some notes about this in a doc somewhere, if we have a suitable doc in a suitable place.

The right place for those types of implementation notes is commit messages. It seems like this snippet can be separated out into a standalone thing. If you think there's interesting context and such, just put it in its own commit and stick that in a commit message. (But in case if it's not that interesting and doesn't matter, don't bother - up to you.)

Contributor (Author):

The socket options for UDP snd/rcv buffer size have been broken out into their own commit, with these details included in the commit message aeed2ba.

switch network {
case "udp4":
c.Control(func(fd uintptr) {
err = syscall.SetsockoptInt(int(fd), syscall.IPPROTO_IP, syscall.IP_PKTINFO, 1)
Member:

This applies everywhere, but I just noticed: sometimes syscall is used and other times unix is used. Should we centralize around the unix package instead? At least for Windows code, syscall is obsolete and x/sys/windows is where it's at.

Contributor:

I'll move a bunch over. While it's true the syscall package is mostly frozen, it's also in that regard explicitly more stable; still, I don't imagine any of these packages are likely to cause us regressions. As we are going to use things that likely won't arrive in the syscall package, it may be more consistent to use the new packages, except where we end up with API/symbol conflicts.

@raggi (Contributor) commented Jan 28, 2023:

Correction: I spoke too soon. There are a bunch of places where we're using syscall because it defines symbols that are portable across unix and windows, for example syscall.EAFNOSUPPORT, which in a Windows build is equivalent to the windows.WSAEAFNOSUPPORT symbol but removes the need to bifurcate large volumes of code, such as about 50% of bind_std.go.

I could bind the value to a local symbol from three files, but that's just repeating what the syscall package is doing. I think it would be a net loss in maintenance cost, and it definitely adds substantial build time compared with not doing so.

Member:

So let's use unix. where possible, and only use syscall. when absolutely necessary? Or, perhaps a better way of thinking of it is to use unix. in files that are _unix-tagged, and windows. in files that are _windows-tagged, and then syscall. is leftover for the shared files.

Does that seem reasonable?

Contributor (Author):

James applied this in f54aa5a.

Member:

Looks like syscall. is still used in the Windows portion of aeed2ba

Contributor (Author):

Updated in a6db822

// Set SO_RCVBUF/SO_SNDBUF - this could be common with the _unix code except
// for the unfortunate type specificity of syscall.Handle.
func(network, address string, c syscall.RawConn) error {
return c.Control(func(fd uintptr) {
Member:

Does this actually do anything on Windows, given that Windows uses WinRIO? It might, I just don't know myself.

Contributor:

Yes, at least for API users who use StdNetBind, such as Tailscale.

Member:

SO_{RCV,SND}BUF seems like a new thing. Can you separate its use out into a different commit with its own justification?

(It can stay in this PR, but just have a separate commit with a message I can read to understand this.)

Contributor (Author):

The socket options for UDP snd/rcv buffer size have been broken out into their own commit, with these details included in the commit message aeed2ba.

device/device.go (resolved)
device/device.go (resolved)
device/device.go (resolved)
device/send.go (resolved)
device/send.go (resolved)
@nshalman nshalman mentioned this pull request Feb 27, 2023
device/receive.go (resolved)
device/receive.go (resolved)
device/receive.go (resolved)
device/receive.go (resolved)
tun/netstack/tun.go (resolved)
tun/tcp_offload_linux.go (resolved)
tun/tcp_offload_linux.go (resolved)
tun/tcp_offload_linux.go (resolved)
tun/tcp_offload_linux.go (resolved)
tun/tun.go (resolved)
tun/tun_darwin.go (resolved)
tun/tun_linux.go:
}

const (
// TODO: support TSO with ECN bits
Member:

What's the status of this TODO? How important is it? Any plans to do it? Or not really?

Contributor (Author):

I think it should be done, but my current thinking is that it's low priority. Operating over a WireGuard tunnel, I think it would be pretty rare for ECN to be marked, echoed, and responded to with the CWR bit set. The middleboxes don't have visibility into the headers for marking to occur, and WireGuard doesn't leak them to the outer header. In the case of forwarding beyond the tunnel, maybe this becomes more commonplace.

The Linux Kernel netdev features doc says:

NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit
set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

I don't currently know what "properly" means. Probably just setting it on the first/last packet in the batch? I would have to confirm. The Windows docs hint:

If the CWR bit in the TCP header of the large TCP packet is set, the miniport driver must set this bit in the TCP header of the first packet that it creates from the large TCP packet. The miniport driver may choose to set this bit in the TCP header of the last packet that it creates from the large TCP packet, although this is less desirable.

Member:

It's worth noting that the kernel implementation does leak the ECN bits in accordance with some RFC, which helps CAKE do its thing. This has long been an overdue feature of wireguard-go.

Contributor (Author):

Noted. In that case I imagine TSO_ECN support should be added before ECN leak. We can certainly add TSO_ECN support in a followup series. I'm not yet familiar with everything required for the ECN leak.

Member:

Take a look at PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb); in send.c and then at INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds, ip_hdr(skb)->tos); and INET_ECN_decapsulate(skb, PACKET_CB(skb)->ds, ipv6_get_dsfield(ipv6_hdr(skb))); in receive.c. The ip_tunnel_ecn_encap and INET_ECN_decapsulate and ipv6_get_dsfield functions are all worth looking at in their own right.
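For readers without the kernel source handy, here is a simplified Go sketch of the RFC 6040-style behavior those helpers implement. This collapses several table entries and is not a faithful port of the kernel code:

```go
package main

import "fmt"

// ECN lives in the low two bits of the IP TOS / IPv6 traffic-class byte:
// 00 Not-ECT, 01 ECT(1), 10 ECT(0), 11 CE.
const (
	notECT = 0x0
	ect1   = 0x1
	ect0   = 0x2
	ce     = 0x3
)

// encap (normal mode): the outer header copies the inner ECN field, so
// middleboxes can apply CE marks to the tunnel packet.
func encap(innerECN byte) byte { return innerECN & 0x3 }

// decap: fold an outer CE mark back into the inner packet on egress.
// A CE mark on a Not-ECT inner packet is invalid and the packet should
// be dropped; otherwise the inner marking is kept.
func decap(outerECN, innerECN byte) (ecn byte, drop bool) {
	if outerECN&0x3 == ce {
		if innerECN&0x3 == notECT {
			return innerECN, true
		}
		return ce, false
	}
	return innerECN, false
}

func main() {
	fmt.Println(encap(ect0))
	e, drop := decap(ce, ect0)
	fmt.Println(e, drop)
	_, drop = decap(ce, notECT)
	fmt.Println(drop)
}
```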

tun/tun_linux.go (resolved)
tun/tun_openbsd.go (resolved)
@zx2c4 (Member) commented Feb 27, 2023

Again, awesome work on this patchset. Can't wait to merge it. That GRO/TSO code is really cool.

I just took a last pass at it. After addressing those comments, please do proceed to the squashing/resplitting of commits.

@jwhited (Contributor, Author) commented Mar 1, 2023

Again, awesome work on this patchset. Can't wait to merge it. That GRO/TSO code is really cool.

I just took a last pass at it. After addressing those comments, please do proceed to the squashing/resplitting of commits.

👍 I'm going to start squashing/resplitting/rebasing. There is one thread outstanding with an open question: #64 (comment). Will include any changes needed there if/when they arise.

@jwhited jwhited force-pushed the tailscale-vector-io branch 2 times, most recently from cdac47d to aeed2ba Compare March 2, 2023 23:46
@jwhited (Contributor, Author) commented Mar 3, 2023

@zx2c4 I've squashed, rebased, and split the commits. Ready for final review.

@zx2c4 (Member) commented Mar 3, 2023

Alright, I've merged those first two commits. So congratulations! That's the bulk of the work!

The last one still has the overlooked syscall->windows thing to fix. But more importantly, I'd like to actually see some justification and reasoning in the commit message for why those sockopts are necessary and what they accomplish and some measurements on the speedup achieved. Right now it just looks like fiddling some knobs.

The conn.Bind UDP sockets' send and receive buffers are now being sized
to 7MB, whereas they were previously inheriting the system defaults.
The system defaults are considerably small and can result in dropped
packets on high speed links. By increasing the size of these buffers we
are able to achieve higher throughput in the aforementioned case.

The iperf3 results below demonstrate the effect of this commit between
two Linux computers with 32-core Xeon Platinum CPUs @ 2.9GHz. There is
roughly ~125us of round trip latency between them.

The first result is from commit 792b49c which uses the system defaults,
e.g. net.core.{r,w}mem_max = 212992. The TCP retransmits are correlated
with buffer full drops on both sides.

Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-10.00  sec  4.74 GBytes  4.08 Gbits/sec  2742   285 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.74 GBytes  4.08 Gbits/sec  2742   sender
[  5]   0.00-10.04  sec  4.74 GBytes  4.06 Gbits/sec         receiver

The second result is after increasing SO_{SND,RCV}BUF to 7MB, i.e.
applying this commit.

Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-10.00  sec  6.14 GBytes  5.27 Gbits/sec    0   3.15 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.14 GBytes  5.27 Gbits/sec    0    sender
[  5]   0.00-10.04  sec  6.14 GBytes  5.25 Gbits/sec         receiver

The specific value of 7MB is chosen as it is the max supported by a
default configuration of macOS. A value greater than 7MB may further
benefit throughput for environments with higher network latency and
lower CPU clocks, but will also increase latency under load
(bufferbloat). Some platforms will silently clamp the value to other
maximums. On Linux, we use SO_{SND,RCV}BUFFORCE in case 7MB is beyond
net.core.{r,w}mem_max.

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
@jwhited (Contributor, Author) commented Mar 3, 2023

Alright, I've merged those first two commits. So congratulations! That's the bulk of the work!

The last one still has the overlooked syscall->windows thing to fix. But more importantly, I'd like to actually see some justification and reasoning in the commit message for why those sockopts are necessary and what they accomplish and some measurements on the speedup achieved. Right now it just looks like fiddling some knobs.

I've added more details to the commit message in a6db822 along with some measurements.

@zx2c4 (Member) commented Mar 4, 2023

That looks really good (and you noticed the ordering change of the S-o-b :-P). Thanks for the measurements. Looks like that makes a huge difference. Hopefully a RRUL doesn't suffer too much from it in terms of latency. Committed! So I'll close this PR now, with one outstanding thing I don't yet know what to do about:

Unfortunately, it looks like this isn't actually used on Windows, because listenConfig() is only used by bind_std.go. So the control functions framework is _std-specific. Should we remove the Windows code because it's not used? Or keep it because it could be used if for some reason you use bind_std on Windows instead of the winrio stuff? Or maybe the Windows code should use the control functions too?

@zx2c4 zx2c4 closed this Mar 4, 2023
@zx2c4 (Member) commented Mar 4, 2023

Also, I made a small cleanup here: https://git.zx2c4.com/wireguard-go/commit/?id=b6a68cf211aa7b585c8e178e401060a4764ce5db

If you guys have a moment to finish off bind_windows.go, that would be super great.
