Mainstream some day? #1
Hi @CodeFetch and thanks for your message. The main reason is definitely performance: with this kernel module we are basically moving the whole data plane (not the control plane!) into kernel space, similarly to other device drivers or tunnel implementations. On the other hand, it allows us to greatly simplify the Linux part of the userspace implementation, as it doesn't need to handle user data anymore. io_uring sounds interesting, but at the moment it doesn't align with the direction we are taking (moving the data plane into the kernel directly), so it wouldn't be meaningful to spend energy on that side right now. Still, if you want to play with it and work on a PoC (that may result in clean patches) for OpenVPN, please do.
This approach is interesting for platforms that will not or cannot support our DCO kernel module. I would expect DCO to still be faster (as network packets will still take the fastest path between the physical network interface and the virtual one, without any context switching at all). But a faster userspace implementation with io_uring might be useful on lower-end routers where getting ovpn-dco running is too difficult, or in setups where the user insists on non-recommended non-GCM ciphers, compression, or other protocol features not available in the ovpn-dco module. Is io_uring supported on platforms other than Linux? I believe I read something about Jens Axboe being involved, who is a Linux kernel developer.
@dsommers io_uring is only supported by Linux as far as I know. It lets you define ring buffers that a kernel thread works on, which allows networking in userspace with almost no context switches. Together with MSG_ZEROCOPY this allows performance equal to running natively in the kernel, minus the skb_copy operation for sending packets. As most resource-limited devices that would profit from this are practically CPEs with higher downstream bandwidth usage, this would not hurt much, I guess. Tests have shown that raw encryption performance in userspace is near what a kernel thread can achieve, e.g. using WireGuard.
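As a rough mental model (a toy simulation only, not the real kernel interface), io_uring replaces per-operation syscalls with two shared ring buffers: the application pushes submission entries (SQEs) and later reaps completion entries (CQEs), while the kernel drains the submission ring in batches. The class and method names below are invented for illustration:

```python
from collections import deque

class ToyRing:
    """Toy model of io_uring's submission/completion queues.
    Real io_uring shares these rings with the kernel via mmap();
    here the 'kernel' step is just an ordinary method call."""
    def __init__(self):
        self.sq = deque()  # submission queue (app -> kernel)
        self.cq = deque()  # completion queue (kernel -> app)

    def submit(self, op, data):
        # App side: enqueue work without a per-operation syscall.
        self.sq.append((op, data))

    def kernel_process(self):
        # Stand-in for the kernel draining the submission ring.
        while self.sq:
            op, data = self.sq.popleft()
            if op == "send":
                self.cq.append(("sent", len(data)))

ring = ToyRing()
for pkt in (b"hello", b"world!"):
    ring.submit("send", pkt)   # two ops queued, zero "syscalls"
ring.kernel_process()          # one batched kernel pass
results = list(ring.cq)        # completions reaped without syscalls
```

The point of the model is that N packets no longer cost N kernel entries; the application and kernel communicate through shared memory and only occasionally synchronize.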
Thanks! So the advantage of ovpn-dco will basically be that it can utilize all the CPU cores on the data plane. OpenVPN 2.x is (still!) single-threaded and will therefore hit some limitations on the server side when more clients are connected, and I expect io_uring in an OpenVPN implementation to be limited in that regard as well. However, for servers with only one client connected, the performance difference might not be that big.
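For context, one common way a userspace VPN server *could* spread work across cores (this is an illustration, not something OpenVPN 2.x implements) is the Linux SO_REUSEPORT socket option, which lets several worker sockets bind the same UDP port so the kernel load-balances incoming flows across them:

```python
import socket

def make_worker_socket(port):
    """Create a UDP socket that shares a port with other workers.
    SO_REUSEPORT (Linux >= 3.9) lets the kernel distribute incoming
    datagrams across all sockets bound to the same address/port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    return s

# Two "workers" sharing one port; each could run on its own core.
a = make_worker_socket(0)      # port 0: let the OS pick a free port
port = a.getsockname()[1]
b = make_worker_socket(port)   # second worker joins the same port
```

Each worker would then run its own receive/encrypt/send loop, which is roughly how the kernel-side ovpn-dco gets its multi-core advantage for free.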
@CodeFetch Could you share some performance numbers demonstrating the advantage of io_uring?
@huangya90 I don't have any statistics on that anymore, but it was a consistent 20-30% increase in throughput on an Intel 4820K, single-threaded. I profiled it with oprofile: the recvmsg/sendmsg syscalls, which previously accounted for 20-30%, are gone, so that matches.
@huangya90 Keep in mind there is more potential for improvement. As far as I know, OpenVPN not only lacks multithreading support, but it also has no buffer pool and isn't optimized for cache hotness.
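The buffer-pool point can be sketched briefly (illustrative code, not OpenVPN's): instead of allocating a fresh buffer per packet, a pool hands out preallocated buffers and reuses the most recently freed one, which also tends to keep it in the CPU cache:

```python
class BufferPool:
    """Minimal fixed-size packet buffer pool (illustration only).
    Reusing the same few buffers avoids per-packet allocation and
    keeps recently used buffers cache-hot."""
    def __init__(self, count, size):
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        # LIFO: the most recently released buffer is the most
        # likely to still be in the CPU cache.
        return self._free.pop() if self._free else None

    def release(self, buf):
        self._free.append(buf)

pool = BufferPool(count=4, size=2048)
buf = pool.acquire()
buf[:5] = b"hello"       # fill with packet data
pool.release(buf)
again = pool.acquire()   # the same object comes back (LIFO reuse)
```

A real implementation would add locking (or per-thread pools) and a fallback when the pool is exhausted.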
If so, speeding things up in kernel space is much better. Please refer to the performance numbers [1] of ovpn-dco tested earlier. [1] https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg21584.html
@huangya90 That's the gain of fastd, which is better optimized than OpenVPN. The numbers don't look comparable to me. I guess he used a multicore processor, and there must have been AES hardware acceleration, as ChaCha should otherwise be faster.
Hi,
On Wed, Nov 17, 2021 at 12:29:28PM -0800, Vincent Wiemann wrote:
> @huangya90 That's the gain of fastd which is better optimized than OpenVPN.
> The numbers don't look comparable to me. I guess he used a multicore
> processor and there must have been AES hardware acceleration as ChaCha
> should be faster.
The goal of DCO is, of course, to use all CPU threads.
On processors where AES-NI is available, it is a waste of energy to not
go for AES :-) - on ARM, Chacha is more efficient. DCO can do both.
gert
--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress
Gert Doering - Munich, Germany
@cron2 Yes, but it's like comparing apples with Microsoft. Using all CPU cores is possible with the userspace implementation, too, but it's not implemented in OpenVPN. ovpn-dco will definitely perform better than a userspace version, but not by that much without hardware encryption on a single-core machine. BTW, does ovpn-dco also support layer 2 tunnels?
@CodeFetch We are using OpenVPN on a layer 2 device (tap0), and the process looks like below (RX side). As you can see, it will go into and come out of the kernel twice. I am not sure what fastd does on a layer 2 device/tunnel; do you think that io_uring could also benefit the OpenVPN userspace application (like 20-30%)? I would like to give it a try and would like to hear some advice on io_uring. Thanks!
@andywangevertz Indeed. You will save the syscalls for the reads/writes of the TAP device. TAP devices can ordinarily only accept one packet at a time per syscall.
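The one-packet-per-syscall behaviour can be demonstrated without a TAP device: a Unix datagram socketpair behaves the same way in this respect, since each recv() call returns exactly one packet, so N packets cost N syscalls unless they are batched (e.g. via recvmmsg() or io_uring):

```python
import socket

# A datagram socketpair behaves like a TAP fd here: each recv()
# syscall returns exactly one packet, so three packets cost three
# receive syscalls (plus three send syscalls on the other side).
tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
packets = [b"pkt1", b"pkt2", b"pkt3"]
for p in packets:
    tx.send(p)                                # one syscall per packet

received = [rx.recv(2048) for _ in packets]   # one syscall per packet
tx.close()
rx.close()
```

This per-packet cost is exactly the overhead that batched submission (or moving the data plane into the kernel, as ovpn-dco does) removes.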
I am closing this, but for further discussions, please reach out to the openvpn-devel mailing list, where a broader audience will be able to join the conversation. |
Hi! I've been fiddling with VPNs for some years now. Did you write this module for performance reasons? Shall this become a mainstream module some day?
We had performance issues with OpenVPN some years back due to the context switching of TUN/TAP devices: every packet read requires a syscall. An optimized userspace clone of OpenVPN called fastd was therefore created. It is faster than OpenVPN, but we still had the context-switch bottleneck, so I wrote a hacky patch for fastd to utilize io_uring. With that, the context-switch bottleneck is gone.
Maybe you could adopt this for the OpenVPN userspace version to reach a performance similar to the kernel module:
CodeFetch/fastd@059cdf7