
Comparing Crusader plots to Flent #14

Open · richb-hanover opened this issue Jan 11, 2024 · 20 comments
richb-hanover (Contributor) commented Jan 11, 2024

[Not really a crusader bug report]

In #6 (comment), @dtaht wrote:

And see how weird the up+down test is? What's the link?

Dave: Could you explain what you're seeing in this plot? What do you see there? What did you expect?

Also: I just created the second plot. Any surprises there? Many thanks.

From #6:
[image]

From richb-hanover's test:
[image: plot 2024.01.11 07-58-58]

dtaht commented Jan 12, 2024

It is so cool to be looking at a new plotting mechanism. I do not quite know what I am looking at yet. The second plot should not be oscillating like that in the second and third phases, and the first has a puzzling latency change in the first phase. Perhaps some of the behavior can be explained by the tool, the rest by the underlying link.

What is the AQM on the link? Is ECN on or off? What is the underlying tech?
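For anyone reproducing this, both questions can be answered from a shell on the client or router; a minimal sketch, assuming Linux and an interface named eth0 (the interface name is a placeholder):

# Show the queueing discipline (AQM) configured on the interface
tc qdisc show dev eth0

# TCP ECN setting: 0 = off, 1 = request and accept ECN, 2 = accept only (Linux default)
sysctl net.ipv4.tcp_ecn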

For simplicity, a single flow would be easier to look at, rather than 16.

A staggered-start test of 2 flows would also help.

And what does a 16-flow rrul test look like? (Possible Flent invocations for all three are sketched below.)
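Hedged sketches of Flent invocations for those three suggestions. Test names and parameters vary by Flent version, so check flent --list-tests; the server address is a placeholder:

H=192.168.1.10   # placeholder server running netperf

# One flow
flent tcp_1up -l 60 -H $H -o single_flow.png

# Two flows with staggered starts
flent tcp_2up_delay -l 60 -H $H -o staggered_2up.png

# An rrul-style best-effort test scaled to 16 flows each way
flent rrul_be_nflows -l 60 -H $H -o rrul_16flows.png \
      --test-parameter upload_streams=16 \
      --test-parameter download_streams=16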

richb-hanover (Contributor Author) commented:
OK. I'll have to replicate, then run a Flent test...

richb-hanover changed the title from "Weird plots" to "Comparing Crusader plots to Flent" on Jan 14, 2024
richb-hanover (Contributor Author) commented:
Using the same test configuration as #9 (comment), (Macbook on Wi-Fi to Ubuntu on Ethernet) I ran both Flent and Crusader. I got these plots:

Flent 2.1.1:

[image: Flent rrul plot]

Crusader 0.0.10:

[image: Crusader plot]

Crusader settings (default, I think):

[image: Crusader settings screenshot]



dtaht commented Jan 15, 2024

Boy, are those two different. Crusader with 4 flows might be directly comparable. But I strongly suspect we have a way to go to make Crusader drive the test(s), and it might have some internal bottlenecks, like using green rather than real threads (in Rust parlance, async vs. threads)...

Anyway, rrul_be (not the rrul test) vs. 4 Crusader flows should look roughly the same as the third segment of the Crusader test. Thanks for calibrating!
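A sketch of that calibration pair, reusing the invocation style that appears later in this thread (the server address is a placeholder):

# rrul_be: the RRUL flow mix but all best-effort, no DSCP-marked flows
flent rrul_be -p all_scaled -l 60 -H 192.168.1.10 -t rrul_be_calibration -o rrul_be.png

# The matching Crusader run against the same server; compare its third
# (bidirectional) segment to the rrul_be result
./crusader test 192.168.1.10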

dtaht commented Mar 6, 2024

I am going to try to ramp up on Crusader-related testing in the coming months. I still do not understand these results: the Crusader test in the third panel observes mere 100 ms peaks while the rrul test observes 200+ ms peaks. Comparing 4 flows against 4 flows might be revealing.

richb-hanover (Contributor Author) commented:
Another data point: I ran both Flent and Crusader between a Mac mini and an Odroid C4. They were connected via Ethernet through the LAN port of a (venerable) WNDR3800 running stock OpenWrt and no VLANs.

The plots look quite similar: both show high speed data with relatively little increase in latency during the test. I also attach the corresponding data files below.

[image: rrul_-_RRUL_to_Odroid-2]

[image: plot 2024.03.06 10-59-21]

rrul-2024-03-06T105425.795316.RRUL_to_Odroid-2.flent.gz

data 2024.03.06 10-59-26.crr.zip

dtaht commented Mar 6, 2024

That is quite promising. I am puzzled by the spikes at t+15.5 seconds. The way Crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability. It does not capture TCP RTT stats natively, and I wish I knew enough about Rust to make that syscall and plot the result.
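Until that lands, the kernel's own TCP RTT estimate can be sampled from outside the tool on Linux while a test runs; a sketch, assuming a server address like the one in the transcripts below, with ss from iproute2:

# Print the smoothed RTT / RTT variance of all connections to the server, twice a second
watch -n 0.5 'ss -ti dst 192.168.50.80 | grep -o "rtt:[^ ]*"'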

mcuee commented Mar 29, 2024

Server:
Debian 12 VM under Proxmox PVE 8.0 (Intel N100 Mini PC), connected to Asus RT-AX86U router LAN port

Client:
Mac Mini M1 2020 running latest macOS 14.4.1 and up-to-date Homebrew, wireless connection to Asus RT-AX86U router

Flent:

(py310venv_universal) mcuee@mcuees-Mac-mini python % flent rrul -p all_scaled -l 60 -H 192.168.50.80 -t flent_macos -o macos_wireless_asus.png
Starting Flent 2.1.1 using Python 3.10.9.
Starting rrul test. Expected run time: 70 seconds.
Data file written to ./rrul-2024-03-29T193653.612449.flent_macos.flent.gz
Initialised matplotlib v3.8.3 on numpy v1.26.4.
WARNING: Unable to build our own tight layout: 'Figure' object has no attribute '_cachedRenderer'

[image: macos_wireless_asus]

Crusader:

(py310venv_universal) mcuee@mcuees-Mac-mini crusader-aarch64-apple-darwin % ./crusader test 192.168.50.80
Connected to server 192.168.50.80:35481
Latency to server 3.52 ms
Testing download...
Testing upload...
Testing both download and upload...
Writing data...
Saved raw data as data 2024.03.29 19-43-07.crr
Saved plot as plot 2024.03.29 19-43-07.png

[image: plot 2024.03.29 19-43-07]

iperf3 result for reference:

(py310venv_universal) mcuee@mcuees-Mac-mini python % iperf3 -c 192.168.50.80 -R
Connecting to host 192.168.50.80, port 5201
Reverse mode, remote host 192.168.50.80 is sending
[  5] local 192.168.50.29 port 49984 connected to 192.168.50.80 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  68.0 MBytes   568 Mbits/sec                  
[  5]   1.00-2.00   sec  87.0 MBytes   733 Mbits/sec                  
[  5]   2.00-3.00   sec  84.5 MBytes   706 Mbits/sec                  
[  5]   3.00-4.00   sec  88.0 MBytes   738 Mbits/sec                  
[  5]   4.00-5.01   sec  86.6 MBytes   726 Mbits/sec                  
[  5]   5.01-6.00   sec  87.9 MBytes   737 Mbits/sec                  
[  5]   6.00-7.00   sec  86.4 MBytes   726 Mbits/sec                  
[  5]   7.00-8.00   sec  84.9 MBytes   712 Mbits/sec                  
[  5]   8.00-9.00   sec  88.8 MBytes   745 Mbits/sec                  
[  5]   9.00-10.00  sec  81.2 MBytes   681 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   846 MBytes   709 Mbits/sec  114             sender
[  5]   0.00-10.00  sec   843 MBytes   707 Mbits/sec                  receiver

iperf Done.
(py310venv_universal) mcuee@mcuees-Mac-mini python % iperf3 -c 192.168.50.80   
Connecting to host 192.168.50.80, port 5201
[  5] local 192.168.50.29 port 49986 connected to 192.168.50.80 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  55.9 MBytes   468 Mbits/sec                  
[  5]   1.00-2.00   sec  51.8 MBytes   433 Mbits/sec                  
[  5]   2.00-3.00   sec  57.1 MBytes   480 Mbits/sec                  
[  5]   3.00-4.00   sec  70.5 MBytes   593 Mbits/sec                  
[  5]   4.00-5.00   sec  72.4 MBytes   606 Mbits/sec                  
[  5]   5.00-6.00   sec  73.8 MBytes   616 Mbits/sec                  
[  5]   6.00-7.00   sec  78.4 MBytes   660 Mbits/sec                  
[  5]   7.00-8.00   sec  75.2 MBytes   629 Mbits/sec                  
[  5]   8.00-9.00   sec  76.8 MBytes   643 Mbits/sec                  
[  5]   9.00-10.00  sec  56.0 MBytes   471 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   668 MBytes   560 Mbits/sec                  sender
[  5]   0.00-10.02  sec   665 MBytes   557 Mbits/sec                  receiver

iperf Done.

mcuee commented Mar 29, 2024

@dtaht

BTW, just wondering if you can help fix singpore.starlink.taht.net. Thanks. I am located in Singapore.
https://blog.cerowrt.org/post/flent_fleet/

mcuee@debian12vmn100new:~/build$ ping -4 -c 4 singpore.starlink.taht.net
ping: singpore.starlink.taht.net: Name or service not known
mcuee@debian12vmn100new:~/build$ ping -6 -c 4 singpore.starlink.taht.net
ping: singpore.starlink.taht.net: Name or service not known

mcuee commented Mar 30, 2024

@richb-hanover

It seems to me that the server netperf.bufferbloat.net (also called netperf-east.bufferbloat.net) has been down for quite a while.
https://flent.org/intro.html#quick-start
https://blog.cerowrt.org/post/flent_fleet/

Just wondering if it is possible to revive the server, or at least update the flent.org website.
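A quick liveness probe is possible without Flent, assuming the standard netperf control port (12865):

# Succeeds only if the netperf control port is reachable
nc -zv -w 5 netperf.bufferbloat.net 12865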

mcuee commented Mar 30, 2024

@dtaht
Just wondering if it is possible to host a test Crusader server alongside the Flent servers as well. Thanks.

mcuee commented Mar 30, 2024

Using my own Crusader server over the internet to cross-check Waveform.com test results.

This may not be a good example, but Crusader comes out looking much better than Waveform.com here: the speed Waveform reports is on the low side, which makes me doubt the validity of its result.

Test server: Ubuntu 22.04 LXC container on an Intel N100 mini PC running Proxmox PVE 8.0 (quad 2.5G ports). The mini PC is connected to an Asus RT-AX86U router 2.5G LAN port. 1 Gbps fibre internet.

Test client: Acer Windows 11 laptop with a Ugreen USB 3 to 2.5G adapter, connected to an OpenWrt virtual router 2.5G LAN port.

BTW, I have not been able to test Flent against my own server over the internet yet. My two home networks share the same upstream GPON ONT, so when testing over the internet I can only load one direction (upload or download), not both.

  1. Without SQM it is already good.
     [image: plot 2024.03.30 20-19-11]
     [image: plot 2024.03.30 20-20-24]

     Waveform.com bufferbloat test result: A
     https://www.waveform.com/tools/bufferbloat?test-id=a1968217-f78a-456c-acae-217bd38ed00e

  2. With SQM (Cake, 1 Gbps download limit, 200 Mbps upload limit):
     [image: plot 2024.03.30 20-17-38]
     [image: plot 2024.03.30 20-18-13]

     Waveform.com bufferbloat test result: A+
     https://www.waveform.com/tools/bufferbloat?test-id=da64d035-23be-4e6a-a54e-39d0d9b28d47

OpenWrt 23.05 SQM settings:
Queueing discipline: cake
Queue setup script: piece_of_cake.qos
[image: Screenshot 2024-03-30 195433]
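For reference, the same settings can be expressed with UCI instead of the LuCI screenshot; a sketch assuming sqm-scripts is installed, with the WAN device name (eth1 here) as a placeholder. SQM rates are in kbit/s:

uci set sqm.@queue[0].enabled='1'
uci set sqm.@queue[0].interface='eth1'
uci set sqm.@queue[0].download='1000000'   # 1 Gbps download limit
uci set sqm.@queue[0].upload='200000'      # 200 Mbps upload limit
uci set sqm.@queue[0].qdisc='cake'
uci set sqm.@queue[0].script='piece_of_cake.qos'
uci commit sqm
/etc/init.d/sqm restart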

mcuee commented Mar 30, 2024

Crusader vs Flent (internal test: server on the OpenWrt WAN side, client on the LAN side). You can see that Crusader seems able to keep up with the virtual network adapter (10 Gbps), whereas Flent cannot cope.

Server: Ubuntu 22.04 LXC container (192.168.50.15) on Intel N100 mini PC running Proxmox PVE 8.0
Client: Ubuntu 22.04 VM (192.168.48.9) on Intel N100 mini PC running Proxmox PVE 8.0
OpenWRT 23.05 virtual router on Intel N100 mini PC running Proxmox PVE 8.0
OpenWRT 23.05 virtual router WAN -- 192.168.50.134
OpenWRT 23.05 virtual router LAN -- 192.168.48.1
No SQM/QoS settings enabled.

mcuee@ubuntu2204vmbr0:~/build/crusader$ ./crusader test 192.168.50.15
Connected to server 192.168.50.15:35481
Latency to server 0.42 ms
Testing download...
Testing upload...
Testing both download and upload...
Writing data...
Saved raw data as data 2024.03.30 21-03-01.crr

mcuee@ubuntu2204vmbr0:~/build/crusader$ flent rrul -p all_scaled -l 60 -H 192.168.50.15 -t openwrt_lan_client_wan_server_flent -o openwrt_flent_wan_lan.png 
Starting Flent 2.0.1 using Python 3.10.12.
Starting rrul test. Expected run time: 70 seconds.
Data file written to ./rrul-2024-03-30T210556.265357.openwrt_lan_client_wan_server_flent.flent.gz
Initialised matplotlib v3.5.1 on numpy v1.21.5.

[image: openwrt_flent_wan_lan]

[image: plot 2024.03.30 21-03-01]

richb-hanover (Contributor Author) commented Mar 30, 2024

It seems to me that the server netperf.bufferbloat.net (also called netperf-east.bufferbloat.net) has been down for quite a while.

Yes. I have been stymied by heavy abuse of the server. In addition to legitimate researchers and occasional users, I see people running a speed test every five minutes, 24x7.

I created a bunch of scripts to review the netperf server logs and use iptables to shut off people who abuse the server. Even with those scripts running, I have been unable to keep the traffic sent/received below the 4 TB/month cap at my VPS.
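The blocking such scripts do amounts to rules along these lines (a hypothetical sketch, not the actual scripts; the address is a documentation IP and 12865 is the default netperf control port):

# Drop a noted abuser outright
iptables -A INPUT -s 203.0.113.45 -j DROP

# Allow at most a few new netperf control connections per hour from everyone else
iptables -A INPUT -p tcp --dport 12865 -m conntrack --ctstate NEW \
         -m limit --limit 6/hour --limit-burst 3 -j ACCEPT
iptables -A INPUT -p tcp --dport 12865 -m conntrack --ctstate NEW -j DROP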

I'm feeling pretty discouraged... I am going to ask about this on the Bloat mailing list to see if anyone has ideas. Thanks.

mcuee commented Apr 1, 2024

I am going to ask about this on the Bloat mailing list to see if anyone has ideas. Thanks.

Reference discussion here:
https://lists.bufferbloat.net/pipermail/bloat/2024-March/017987.html

mcuee commented Apr 1, 2024

@Zoxc

I think I have found a good use for Crusader here. Still, just wondering if you have ideas on how to better test the effectiveness of cake-autorate. Thanks.

mcuee commented Apr 4, 2024

In the end cake-autorate was not suitable for my use case, but Crusader still proved to be a good tool during the testing.

dtaht commented Jul 30, 2024

I note that I am very behind on github...

Zoxc (Owner) commented Aug 7, 2024

The way Crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability.

That could probably be tested by running a separate Crusader client and server instance that only measures latency, and seeing whether it reproduces the variability.
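One way to approximate that today, swapping a plain ping for the second Crusader instance (since the exact latency-only invocation depends on the Crusader version): run an independent probe next to the loaded test and compare the traces. The address reuses the server from the transcripts above:

# Terminal 1: independent latency probe, started before the test (Linux ping; -D adds timestamps)
ping -D -i 0.2 192.168.50.15 | tee idle_latency.log

# Terminal 2: the loaded test as before
./crusader test 192.168.50.15

If the ping trace stays flat while Crusader's idle-connection latency spikes, the variability is more likely in the tool than in the link.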

mcuee commented Oct 1, 2024

The way Crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability.

That could probably be tested by running a separate Crusader client and server instance that only measures latency, and seeing whether it reproduces the variability.

Looks like this has been implemented, at least for the GUI version.
