@dpeckett dpeckett commented May 15, 2025

We still need to implement all the available offloading features, so we're probably leaving quite a bit of performance on the table. If we're careful we might be able to crack an aggregate of around 20 Gbit/s, but each flow will individually be limited to ~2-3 Gbit/s.
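
For reference, a minimal sketch of what enabling the TCP segmentation offloads might look like, assuming golang.org/x/sys/unix and that the queue fds were opened with IFF_VNET_HDR (the offloads require each packet to be framed with a virtio_net_hdr); the helper name is hypothetical, not something already in the tree:

```go
package tun

import "golang.org/x/sys/unix"

// enableOffloads (hypothetical helper) asks the kernel to pass TCP
// "super-packets" across the TUN queue fd instead of MTU-sized segments,
// so segmentation cost is paid once per burst rather than once per packet.
// TUN_F_CSUM (checksum offload) is a prerequisite for the TSO flags.
func enableOffloads(queueFd int) error {
	flags := unix.TUN_F_CSUM | unix.TUN_F_TSO4 | unix.TUN_F_TSO6
	return unix.IoctlSetInt(queueFd, unix.TUNSETOFFLOAD, flags)
}
```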

Perf testing of two TUN interfaces on a c7g.metal EC2 box; it peaks at around 15 Gbit/s:

$ iperf3 -V -C cubic -c fd00::1 -t 10 -P 32
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.03 GBytes   881 Mbits/sec  38609            sender
[  5]   0.00-10.00  sec  1.02 GBytes   880 Mbits/sec                  receiver
[  7]   0.00-10.00  sec  1.06 GBytes   912 Mbits/sec  42119            sender
[  7]   0.00-10.00  sec  1.06 GBytes   910 Mbits/sec                  receiver
[  9]   0.00-10.00  sec  1.06 GBytes   914 Mbits/sec  41085            sender
[  9]   0.00-10.00  sec  1.06 GBytes   914 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec  1.08 GBytes   930 Mbits/sec  41365            sender
[ 11]   0.00-10.00  sec  1.08 GBytes   930 Mbits/sec                  receiver
[ 13]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec  41974            sender
[ 13]   0.00-10.00  sec  1.07 GBytes   922 Mbits/sec                  receiver
[ 15]   0.00-10.00  sec  1.08 GBytes   931 Mbits/sec  41364            sender
[ 15]   0.00-10.00  sec  1.08 GBytes   930 Mbits/sec                  receiver
[ 17]   0.00-10.00  sec  1.09 GBytes   936 Mbits/sec  41629            sender
[ 17]   0.00-10.00  sec  1.09 GBytes   935 Mbits/sec                  receiver
[ 19]   0.00-10.00  sec  1.06 GBytes   908 Mbits/sec  40729            sender
[ 19]   0.00-10.00  sec  1.06 GBytes   907 Mbits/sec                  receiver
[ 21]   0.00-10.00  sec  1016 MBytes   852 Mbits/sec  39958            sender
[ 21]   0.00-10.00  sec  1015 MBytes   851 Mbits/sec                  receiver
[ 23]   0.00-10.00  sec  1.04 GBytes   897 Mbits/sec  40635            sender
[ 23]   0.00-10.00  sec  1.04 GBytes   896 Mbits/sec                  receiver
[ 25]   0.00-10.00  sec  1001 MBytes   840 Mbits/sec  37889            sender
[ 25]   0.00-10.00  sec  1000 MBytes   839 Mbits/sec                  receiver
[ 27]   0.00-10.00  sec  1.08 GBytes   932 Mbits/sec  41773            sender
[ 27]   0.00-10.00  sec  1.08 GBytes   931 Mbits/sec                  receiver
[ 29]   0.00-10.00  sec  1.03 GBytes   887 Mbits/sec  40003            sender
[ 29]   0.00-10.00  sec  1.03 GBytes   887 Mbits/sec                  receiver
[ 31]   0.00-10.00  sec  1.09 GBytes   933 Mbits/sec  42185            sender
[ 31]   0.00-10.00  sec  1.09 GBytes   932 Mbits/sec                  receiver
[ 33]   0.00-10.00  sec  1.06 GBytes   907 Mbits/sec  40702            sender
[ 33]   0.00-10.00  sec  1.06 GBytes   907 Mbits/sec                  receiver
[ 35]   0.00-10.00  sec   984 MBytes   825 Mbits/sec  37595            sender
[ 35]   0.00-10.00  sec   983 MBytes   824 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  16.8 GBytes  14.4 Gbits/sec  649614             sender
[SUM]   0.00-10.00  sec  16.8 GBytes  14.4 Gbits/sec                  receiver
CPU Utilization: local/sender 38.6% (0.6%u/38.0%s), remote/receiver 227.3% (4.2%u/223.1%s)

We're probably also being limited in part by the single-queue connection.Pipe() implementation, which is notably slower on the Graviton than on my personal laptop:

=== RUN   TestPipeThroughput
    pipe_test.go:56: [+1s] Throughput: 23.08 Gbps
    pipe_test.go:57: [+1s] Packets: 2254273
    pipe_test.go:56: [+2s] Throughput: 23.15 Gbps
    pipe_test.go:57: [+2s] Packets: 4515376
    pipe_test.go:56: [+3s] Throughput: 22.67 Gbps
    pipe_test.go:57: [+3s] Packets: 6728834
    pipe_test.go:56: [+4s] Throughput: 22.68 Gbps
    pipe_test.go:57: [+4s] Packets: 8943763
    pipe_test.go:73: Total Throughput: 22.89 Gbps
--- PASS: TestPipeThroughput (4.47s)
PASS
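
As a rough illustration of how per-second numbers like these are produced (this is not the project's actual test; connection.Pipe() is stood in for by a plain buffered channel, and all names are hypothetical), a single-queue throughput microbenchmark can look like this:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		pktSize  = 1280            // bytes per packet
		duration = 4 * time.Second // total measurement window
	)

	pipe := make(chan []byte, 1024) // single-queue stand-in for the real pipe
	done := make(chan struct{})

	// Sender: push packets as fast as the queue will accept them.
	go func() {
		pkt := make([]byte, pktSize)
		for {
			select {
			case pipe <- pkt:
			case <-done:
				return
			}
		}
	}()

	var packets, bytes uint64
	start := time.Now()
	tick := time.NewTicker(time.Second)
	defer tick.Stop()

	// Receiver: drain the queue and report throughput once per second.
	for time.Since(start) < duration {
		select {
		case <-pipe:
			packets++
			bytes += pktSize
		case t := <-tick.C:
			elapsed := t.Sub(start).Seconds()
			fmt.Printf("[+%.0fs] Throughput: %.2f Gbps, Packets: %d\n",
				elapsed, float64(bytes)*8/elapsed/1e9, packets)
		}
	}
	close(done)
	fmt.Printf("Total Throughput: %.2f Gbps\n",
		float64(bytes)*8/time.Since(start).Seconds()/1e9)
}
```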

Also, I'd try to avoid TUNSETSTEERINGEBPF; if we can get the application to use the same sending queue for each flow, we don't need to worry too much about steering. It might get a little tricky, but doing it in userspace seems doable?
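
A rough sketch of that userspace alternative (all names hypothetical, using golang.org/x/sys/unix): open several queue fds on the same interface with IFF_MULTI_QUEUE, then pin each flow to a queue with a simple hash over its addresses and ports, so no eBPF steering program is needed:

```go
package tun

import (
	"hash/fnv"
	"os"

	"golang.org/x/sys/unix"
)

// openMultiqueueTUN opens n queues on the same TUN interface using
// IFF_MULTI_QUEUE; each returned *os.File is an independent queue fd.
func openMultiqueueTUN(name string, n int) ([]*os.File, error) {
	queues := make([]*os.File, 0, n)
	for i := 0; i < n; i++ {
		fd, err := unix.Open("/dev/net/tun", unix.O_RDWR, 0)
		if err != nil {
			return nil, err
		}
		ifr, err := unix.NewIfreq(name)
		if err != nil {
			unix.Close(fd)
			return nil, err
		}
		ifr.SetUint16(uint16(unix.IFF_TUN | unix.IFF_NO_PI | unix.IFF_MULTI_QUEUE))
		if err := unix.IoctlIfreq(fd, unix.TUNSETIFF, ifr); err != nil {
			unix.Close(fd)
			return nil, err
		}
		queues = append(queues, os.NewFile(uintptr(fd), name))
	}
	return queues, nil
}

// queueFor picks a queue for a flow by hashing its 4-tuple, so every packet
// of a flow is written to the same queue without any kernel-side steering.
func queueFor(srcIP, dstIP []byte, srcPort, dstPort uint16, nQueues int) int {
	h := fnv.New32a()
	h.Write(srcIP)
	h.Write(dstIP)
	h.Write([]byte{byte(srcPort >> 8), byte(srcPort), byte(dstPort >> 8), byte(dstPort)})
	return int(h.Sum32() % uint32(nQueues))
}
```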

@dpeckett dpeckett requested a review from dilyevsky May 15, 2025 09:08
@dilyevsky dilyevsky merged commit 9676901 into main May 15, 2025
1 check passed
@dilyevsky dilyevsky deleted the dpeckett/multiqueue-tun branch May 15, 2025 21:38