Slow inter-vm network speed, turning on scatter gather helps to a degree #3510
Comments
andrewdavidwong added the bug and C: other labels on Feb 1, 2018
andrewdavidwong added this to the Release 3.2 updates milestone on Feb 1, 2018
ij1 commented on Feb 2, 2018
Also, the netvm -> proxyvm direction seems fine (20 Gbps+) once TX checksumming is enabled for the vif.
This leaves only the proxyvm -> netvm direction affected by this issue.
Curiously enough, there is a significant difference in RTT depending on which direction the data is transmitted (3x to 10x larger RTT when the netvm is the sink).
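For anyone wanting to reproduce the measurement, a minimal sketch, assuming iperf and ping are installed in both VMs (the addresses below are placeholders for the VMs' actual inter-VM IPs):

```sh
# In the receiving VM, start an iperf server:
iperf -s

# In the sending VM, measure throughput towards it (placeholder IP):
iperf -c 10.137.1.1 -t 30

# Compare RTT in both directions with plain ICMP:
ping -c 20 10.137.1.1   # from the proxyvm towards the netvm
ping -c 20 10.137.2.5   # from the netvm towards the proxyvm
```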
marmarek (Member) commented on Feb 5, 2018
For reference:
SG is disabled here and here. It was introduced by this commit, back in 2011, with the comment "Apparently vif frontend has broken sg implementation". Maybe that is no longer the case. Try enabling SG on eth0 in both the proxyvm and the appvm.
As for TX checksumming, see here - worth checking that it doesn't break HVMs using emulated devices (Windows, or Linux with PV drivers disabled).

> Also, the netvm -> proxyvm direction seems fine (20 Gbps+) once TX checksumming is enabled for the vif.
> This leaves only the proxyvm -> netvm direction affected by this issue.

Have you enabled SG on eth0 in proxyvm?
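A minimal sketch of checking and toggling those offloads with ethtool (the interface names are assumptions; the backend vif name in the netvm depends on which VM is attached):

```sh
# Show current offload settings (look at scatter-gather and tx-checksumming):
sudo ethtool -k eth0

# Enable scatter-gather on eth0 (run in both the proxyvm and the appvm):
sudo ethtool -K eth0 sg on

# Enable TX checksumming on the backend vif (run in the netvm;
# vif2.0 is a placeholder for the vif of the VM under test):
sudo ethtool -K vif2.0 tx on
```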
ij1 commented on Feb 5, 2018
Thanks for taking a look.

> Have you enabled SG on eth0 in proxyvm?

Yes, SG was enabled for that eth0, and it increases throughput somewhat (roughly from 1.5-2 Gbps to 4 Gbps), but nowhere near what the opposite direction towards the proxyvm is capable of.
In addition, I tried to make sure that there is no TCP receive window limitation (which the sender might hit due to the extra delay that, for some reason, occurs in that direction), but that just translated into higher delay, not higher throughput.
Is there perhaps something in the kernel and/or Xen that behaves differently when a PCI device is assigned to the VM (which is only done for the netvm in this setup), even if the device used in the inter-VM test itself is just a virtual one? Given that there seems to be some extra delay, something related to, e.g., IRQ delivery would seem like a good candidate.
I can later take a look at TX checksumming with HVMs to see if it still causes breakage.
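A sketch of how the receive-window check can be done, assuming stock Linux sysctls (the window size and IP below are illustrative only):

```sh
# Inspect the receiver's autotuning limits (min / default / max, in bytes):
sysctl net.ipv4.tcp_rmem

# Raise the maximum on the receiver if it looks too small for the path:
sudo sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"

# Or pin a large window explicitly on both iperf ends:
iperf -s -w 4M                # on the receiver
iperf -c 10.137.1.1 -w 4M     # on the sender (placeholder IP)
```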
marmarek (Member) commented on Feb 5, 2018
> Is there perhaps something in the kernel and/or Xen that behaves differently when a PCI device is assigned to the VM (which is only done for the netvm in this setup), even if the device used in the inter-VM test itself is just a virtual one?

If we're talking about R3.2, there should be no difference. But in 4.0 there is: VMs with PCI devices are HVM, the others are PVH.
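For reference, one way to check which mode a given VM got on 4.0 (a sketch; sys-net is just an example VM name):

```sh
# Prints the VM's virtualization mode on Qubes 4.0 (hvm, pv or pvh):
qvm-prefs sys-net virt_mode
```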
ij1 commented on Feb 5, 2018
R3.2, yes.
ij1 commented on Feb 10, 2018
I'm now quite sure that it has something to do with a PCI device being passed into the netvm.
I created a dummy netvm and set it as the netvm for the proxyvm. When the dummy netvm had no PCI devices, performance was good (at least 14 Gbps). But once I put a PCI device there, I got only roughly 4 Gbps through (and the PCI device was not even a network device).
For traffic in the other direction (netvm -> proxyvm), having a PCI device in the proxyvm does not seem to affect performance.
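A sketch of that experiment with the R3.2 command-line tools (the VM names and the PCI BDF are placeholders; double-check the qvm-* flags against your installed version):

```sh
# Create a dummy netvm for the test:
qvm-create --net --label red testnet

# Point the proxyvm at it and measure the proxyvm -> testnet throughput:
qvm-prefs -s sys-firewall netvm testnet

# Then, with testnet shut down, attach any PCI device (placeholder BDF)
# and repeat the measurement:
qvm-pci -a testnet 03:00.0
```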
jarlethorsen commented on Jan 31, 2018 (edited Feb 2, 2018)
Qubes OS version:
3.2
Affected TemplateVMs:
Fedora 26
Steps to reproduce the behavior:
Have a 10GbE network card in the netvm, run "iperf -c <external_serverip>" there, and watch the result.
Then run the same command in an appvm connected to that netvm and watch the result (a reproduction sketch follows below).
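A minimal reproduction sketch (the server IP is a placeholder for an external iperf server; assumes iperf is installed in both VMs):

```sh
# In the netvm - this should get close to line rate on 10GbE:
iperf -c 192.0.2.10 -t 30

# In an appvm attached to that netvm - this is where the drop shows up:
iperf -c 192.0.2.10 -t 30

# While the appvm test runs, watch the ksoftirqd load in the netvm:
top -o %CPU
```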
Expected behavior:
Running the same iperf command in both the netvm and an appvm connected to it should show similar performance.
Actual behavior:
Performance in the netvm is as expected (8+ Gbit/s), while the appvm is really slow (1.43 Gbit/s).
The ksoftirqd process in the netvm shows very high CPU load while iperf runs in the appvm.
General notes:
Enabling scatter-gather on vif* in the netvm and on eth0 in the appvm ("sudo ethtool -K eth0 sg on") gives a huge performance boost: iperf in the appvm can now send 4+ Gbit/s, but that is still far from 10 Gbit/s (the exact commands are sketched below).
Related thread: https://groups.google.com/forum/#!topic/qubes-users/RZVgpndzmow
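The offload change described in the notes above, as a sketch (the vif* loop is an assumption about interface naming; each attached VM gets its own vifX.Y in the netvm):

```sh
# In the appvm:
sudo ethtool -K eth0 sg on

# In the netvm, enable scatter-gather on every backend vif:
for dev in /sys/class/net/vif*; do
    sudo ethtool -K "$(basename "$dev")" sg on
done
```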
UPDATE:
Using the following setup:
appvm <-> proxyvm <-> netvm
I easily get 19 Gbit/s from the appvm to the proxyvm (if SG is enabled in the appvm), but only 3-4 Gbit/s from the proxyvm to the netvm. I guess this is something for the developers to look into?