Slow inter-vm network speed, turning on scatter gather helps to a degree #3510

Open
jarlethorsen opened this Issue Jan 31, 2018 · 6 comments


jarlethorsen commented Jan 31, 2018

Qubes OS version:

3.2

Affected TemplateVMs:

Fedora 26


Steps to reproduce the behavior:

Have a 10GbE network card in the netvm. Run "iperf -c <external_serverip>" and watch the result.

Then do the same in an appvm connected to that netvm and watch the result.
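
For reference, a minimal sketch of that comparison (the server address is a placeholder, and an iperf server is assumed to be running on the external machine):

# On the external machine, start an iperf server
iperf -s

# In the netvm, then in the appvm, run the same client command and compare results
iperf -c 192.0.2.10 -t 30    # 192.0.2.10 is an example external server address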

Expected behavior:

Running the same iperf command in both the netvm and an appvm connected to it should show similar performance.

Actual behavior:

Performance in the netvm is as expected (8+ Gbit/s), while the appvm is really slow (1.43 Gbit/s).
The ksoftirqd process in the netvm shows a very high CPU load when iperf is run in the appvm.

General notes:

Enabling Scatter Gather (SG) on vif* in the netvm and on eth0 in the appvm ("sudo ethtool -K eth0 sg on") gives a huge performance boost. iperf in the appvm can now send 4+ Gbit/s, but that is still far from 10 Gbit/s.
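
A minimal sketch of what was enabled (the vif name is an example; the actual name depends on the appvm's Xen domain ID):

# In the netvm, enable scatter-gather on the backend vif of the appvm
sudo ethtool -K vif12.0 sg on    # vif12.0 is an example name, check "ip link" for the real one

# In the appvm, enable scatter-gather on the frontend
sudo ethtool -K eth0 sg on

# Verify the offload state
sudo ethtool -k eth0 | grep scatter-gather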

Related thread: https://groups.google.com/forum/#!topic/qubes-users/RZVgpndzmow

UPDATE:

Using the following setup:
appvm<->proxyvm<->netvm

I easily get 19 Gbit/s from appvm to proxyvm (if SG is enabled in the appvm), but only 3-4 Gbit/s from proxyvm to netvm. I guess this is something for the developers to look into?
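
A rough sketch of how such an inter-VM measurement can be run (addresses are placeholders; the firewall in the receiving VM may need to allow the iperf port, 5001 by default):

# In the receiving VM (e.g. the proxyvm), start an iperf server
iperf -s

# In the sending VM (e.g. the appvm), connect to the receiver's internal address
iperf -c 10.137.2.1 -t 30    # 10.137.2.1 is an example address, check "ip addr" in the receiver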


ij1 commented Feb 2, 2018

Also, the netvm -> proxyvm direction seems fine with 20+ Gbps after tx checksumming is enabled for the vif.
This leaves only the proxyvm -> netvm direction affected by this issue.

Curiously enough, there is a significant difference in RTT depending on which direction the data is being transmitted (3x to 10x larger RTT when the netvm is the sink).
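
For reference, a sketch of the kind of commands involved (interface names and addresses are examples):

# In the VM hosting the backend vif (here the netvm), enable TX checksum offload
sudo ethtool -K vif5.0 tx on     # vif5.0 is an example name

# Compare RTT in both directions with ping, e.g. from the proxyvm to the netvm and back
ping -c 20 10.137.1.1            # example internal address of the peer VM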

marmarek commented Feb 5, 2018

For reference:
SG is disabled here and here. It was introduced by this commit, back in 2011, with the comment "Apparently vif frontend has broken sg implementation". Maybe that is no longer the case. Try enabling SG on eth0 in both the proxyvm and the appvm.
As for tx checksumming, see here; worth checking that it doesn't break HVMs using emulated devices (Windows, or Linux with PV drivers disabled).

Also, the netvm -> proxyvm direction seems fine with 20+ Gbps after tx checksumming is enabled for the vif.
This leaves only the proxyvm -> netvm direction affected by this issue.

Have you enabled SG on eth0 in proxyvm?
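
A quick way to apply and verify the suggested settings from inside the proxyvm and the appvm (a sketch; the relevant interface in both VMs is assumed to be eth0):

# In both the proxyvm and the appvm
sudo ethtool -K eth0 sg on
sudo ethtool -k eth0 | grep -E 'scatter-gather|tx-checksumming'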


ij1 commented Feb 5, 2018

Thanks for taking a look.

Have you enabled SG on eth0 in proxyvm?

Yes, SG was enabled for that eth0 and it increases throughput somewhat (roughly from 1.5-2 Gbps to 4 Gbps), but nowhere near what the opposite direction towards the proxyvm is capable of.

In addition, I've also tried to make sure that there is no TCP receiver window limitation (which the sender might hit due to the extra delay that seems to occur, for some reason, in that direction), but that just translated into a higher delay, not an increase in throughput.

Is there perhaps something in the kernel and/or Xen that causes different behavior when any PCI device is assigned to a VM (which is only done for the netvm in this setup), even if the device used in the inter-VM test itself is just a virtual one? Given that there seems to be some extra delay, something related to, e.g., IRQ delivery would seem like a good candidate.

I can take a look at tx checksumming and HVM later to see if I can still find some breakage with it.
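
For reference, a sketch of the kind of knobs involved in ruling out a receiver-window limit (values and addresses are examples only):

# Ask iperf for a larger TCP window explicitly (server and client side)
iperf -s -w 4M
iperf -c 10.137.1.1 -w 4M -t 30      # 10.137.1.1 is an example peer address

# Or raise the kernel's autotuning limits (example values)
sudo sysctl -w net.ipv4.tcp_rmem='4096 131072 16777216'
sudo sysctl -w net.ipv4.tcp_wmem='4096 131072 16777216'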


marmarek commented Feb 5, 2018

Is there perhaps something in the kernel and/or Xen that causes different behavior when any PCI device is assigned to a VM (which is only done for the netvm in this setup), even if the device used in the inter-VM test itself is just a virtual one?

If we're talking about R3.2, there should be no difference. But in 4.0 there is: VMs with PCI devices are HVM, the others are PVH.
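
For anyone checking this on 4.0, the virtualization mode can be read from a VM's preferences (a sketch; VM names are the default 4.0 examples):

# On Qubes 4.0, in dom0
qvm-prefs sys-net virt_mode        # typically "hvm" when a PCI device is attached
qvm-prefs sys-firewall virt_mode   # typically "pvh" when no PCI device is attached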


ij1 commented Feb 5, 2018

If we're talking about R3.2, there should be no difference.

R3.2, yes.


ij1 commented Feb 10, 2018

I'm quite sure now that it has something to do with any PCI device being passed into the netvm.

I created a dummy netvm and set it as the netvm for the proxyvm. When the dummy netvm had no PCI devices, performance was good (at least 14 Gbps). But once I put a PCI device there, I got only roughly 4 Gbps through (the PCI device was not even a network device).

For traffic in the other direction (netvm -> proxyvm), having a PCI device in the proxyvm does not seem to affect performance.
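
A rough sketch of that experiment with the R3.2 command-line tools, run in dom0 (VM names and the PCI address are examples, and the dummy netvm is assumed to exist already):

# Point the proxyvm at the dummy netvm
qvm-prefs -s proxyvm netvm dummy-netvm

# Run the baseline iperf test, then attach an arbitrary PCI device and retest
qvm-pci -l dummy-netvm           # list PCI devices currently attached
qvm-pci -a dummy-netvm 00:19.0   # 00:19.0 is an example BDF, not necessarily a NIC
                                 # (attachment takes effect on the next VM start)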
