Major performance change with vm-memory bump #1258
@rbradford can you give a bit more detail on how you're running these tests? What do you think about creating a new issue in the rust-vmm repository as well?
@andreeaflorescu it's iperf3 with default settings and default virtio-net settings.
@rbradford are you sure this is related to vm-memory? Have you identified the commit that introduced the regression?
Yes: the "before" is the parent of the vm-memory bump and the "after" is the bump commit itself.
@sboeuf this change (rust-vmm/vm-memory#94) in vm-memory moved away from using the SSE2/AVX2 optimised glibc
@andreeaflorescu @bonzini do you think you can gate the commit rust-vmm/vm-memory@d0aaccc based on
Are you doing memcpy into the buffers, instead of doing direct read/write into memory? You could also try adding SSE support to vm-memory.
Should we discuss possible fixes in vm-memory?
@bonzini
@sboeuf There is
Can you try benchmarking something like this?

diff --git a/src/volatile_memory.rs b/src/volatile_memory.rs
index 7c8aa1a..68518fe 100644
--- a/src/volatile_memory.rs
+++ b/src/volatile_memory.rs
@@ -477,6 +477,12 @@ fn alignment(addr: usize) -> usize {
// we're only using integer primitives.
unsafe fn copy_single(align: usize, src_addr: usize, dst_addr: usize) {
match align {
+ 16 => {
+ #[cfg(target_arch = "x86_64")] {
+ type Vec128 = std::arch::x86_64::__m256i;
+ write_volatile(dst_addr as *mut Vec128, read_volatile(src_addr as *const Vec128));
+ }
+ }
8 => write_volatile(dst_addr as *mut u64, read_volatile(src_addr as *const u64)),
4 => write_volatile(dst_addr as *mut u32, read_volatile(src_addr as *const u32)),
2 => write_volatile(dst_addr as *mut u16, read_volatile(src_addr as *const u16)),
@@ -504,6 +510,8 @@ fn copy_slice(dst: &mut [u8], src: &[u8]) -> usize {
}
};
+ #[cfg(target_arch = "x86_64")]
+ copy_aligned_slice(16);
if size_of::<usize>() > 4 {
copy_aligned_slice(8);
}
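A side note on the patch above: the 16-byte arm aliases `Vec128` to `std::arch::x86_64::__m256i`, which is actually 32 bytes wide, so each "single" copy moves twice the intended width and can run past the buffer. A quick check of the type widths (my illustration, assuming x86_64; `__m128i` is presumably the 16-byte type the arm intended):

```rust
use std::mem::{align_of, size_of};

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        use std::arch::x86_64::{__m128i, __m256i};
        // __m256i is 32 bytes, not 16: using it in the 16-byte match arm
        // copies twice the requested width past a 16-byte-aligned address.
        assert_eq!(size_of::<__m256i>(), 32);
        // __m128i is the 16-byte, 16-byte-aligned vector type.
        assert_eq!(size_of::<__m128i>(), 16);
        assert_eq!(align_of::<__m128i>(), 16);
    }
}
```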
This caused a segfault as-is, but I made some changes and added "support" for AVX2 (no evidence that Rust is actually emitting AVX2 or SSE2 instructions here):

diff --git a/src/volatile_memory.rs b/src/volatile_memory.rs
index 9e9901f..8129e91 100644
--- a/src/volatile_memory.rs
+++ b/src/volatile_memory.rs
@@ -477,6 +477,21 @@ fn alignment(addr: usize) -> usize {
// we're only using integer primitives.
unsafe fn copy_single(align: usize, src_addr: usize, dst_addr: usize) {
match align {
+ #[cfg(target_arch = "x86_64")]
+ 32 => {
+ type Vec256 = std::arch::x86_64::__m256i;
+ write_volatile(
+ dst_addr as *mut Vec256,
+ read_volatile(src_addr as *const Vec256),
+ );
+ }
+ 16 => {
+ type Vec128 = std::arch::x86_64::__m128i;
+ write_volatile(
+ dst_addr as *mut Vec128,
+ read_volatile(src_addr as *const Vec128),
+ );
+ }
8 => write_volatile(dst_addr as *mut u64, read_volatile(src_addr as *const u64)),
4 => write_volatile(dst_addr as *mut u32, read_volatile(src_addr as *const u32)),
2 => write_volatile(dst_addr as *mut u16, read_volatile(src_addr as *const u16)),
@@ -504,6 +519,10 @@ fn copy_slice(dst: &mut [u8], src: &[u8]) -> usize {
}
};
+ #[cfg(target_arch = "x86_64")]
+ copy_aligned_slice(32);
+ #[cfg(target_arch = "x86_64")]
+ copy_aligned_slice(16);
if size_of::<usize>() > 4 {
copy_aligned_slice(8);
} The difference was very small:
How does the profile before the changes to vm-memory compare with now?
Also, another useful optimization could be loop unrolling, possibly only for the 16- or 32-byte version.
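The unrolling suggestion could look roughly like this (my sketch, not vm-memory code; `copy_16_unrolled` is a made-up name, with `u128` standing in for a 16-byte lane):

```rust
use std::ptr::{read_volatile, write_volatile};

// Hypothetical unrolled 16-byte copy loop: the main loop moves four
// lanes per iteration, the tail loop handles the remaining 0..=3.
unsafe fn copy_16_unrolled(mut src: *const u128, mut dst: *mut u128, mut n: usize) {
    while n >= 4 {
        write_volatile(dst, read_volatile(src));
        write_volatile(dst.add(1), read_volatile(src.add(1)));
        write_volatile(dst.add(2), read_volatile(src.add(2)));
        write_volatile(dst.add(3), read_volatile(src.add(3)));
        src = src.add(4);
        dst = dst.add(4);
        n -= 4;
    }
    while n > 0 {
        write_volatile(dst, read_volatile(src));
        src = src.add(1);
        dst = dst.add(1);
        n -= 1;
    }
}

fn main() {
    let src: Vec<u128> = (0..11u128).collect();
    let mut dst = vec![0u128; 11];
    unsafe { copy_16_unrolled(src.as_ptr(), dst.as_mut_ptr(), src.len()) };
    assert_eq!(src, dst);
}
```

Whether the compiler turns each lane into a single vector load/store is not guaranteed; as noted above, there is no evidence Rust emits SSE2/AVX2 here without further coaxing.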
Where small objects are those objects that are less than the native data width for the platform. This ensures that volatile and alignment-safe reads/writes are used when updating structures that are sensitive to this, such as virtio devices, where the spec requires writes to be atomic. Fixes: cloud-hypervisor/cloud-hypervisor#1258 Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently released vm-memory uses aligned and volatile copying for all data. The version in the fork only uses the assured (and slower) path for data up to the natural data width. Fixes: cloud-hypervisor#1258 Signed-off-by: Rob Bradford <robert.bradford@intel.com>
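The dispatch described above can be sketched roughly as follows (my illustration, not the fork's actual code; `copy_small_or_bulk` is a made-up name):

```rust
use std::mem::size_of;
use std::ptr::{copy_nonoverlapping, read_volatile, write_volatile};

// Objects up to the natural data width, with matching alignment, take
// the volatile path so the access cannot tear; anything larger falls
// back to a plain memcpy, which glibc can implement with SSE2/AVX2.
unsafe fn copy_small_or_bulk(dst: *mut u8, src: *const u8, len: usize) {
    // For the small power-of-two sizes below, len doubles as the
    // required alignment.
    let aligned = (dst as usize) % len.max(1) == 0 && (src as usize) % len.max(1) == 0;
    match len {
        8 if aligned && size_of::<usize>() >= 8 => {
            write_volatile(dst as *mut u64, read_volatile(src as *const u64))
        }
        4 if aligned => write_volatile(dst as *mut u32, read_volatile(src as *const u32)),
        2 if aligned => write_volatile(dst as *mut u16, read_volatile(src as *const u16)),
        1 => write_volatile(dst, read_volatile(src)),
        _ => copy_nonoverlapping(src, dst, len),
    }
}
```

The guard conditions matter: casting an unaligned pointer to `*mut u64` and writing through it would be undefined behaviour, so unaligned small copies also fall through to the bulk path.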
Where small objects are those objects that are less than the native data width for the platform. This ensures that volatile and alignment-safe reads/writes are used when updating structures that are sensitive to this, such as virtio devices, where the spec requires writes to be atomic. Fixes: cloud-hypervisor/cloud-hypervisor#1258 Fixes: rust-vmm#100 Signed-off-by: Rob Bradford <robert.bradford@intel.com>
A new version of vm-memory was released upstream, which resulted in some components pulling in that new version. Update the version number used to point to the latest version, but continue to use our patched version due to the fix for cloud-hypervisor#1258. Signed-off-by: Rob Bradford <robert.bradford@intel.com>
For debug builds (after/before): 8% of previous throughput
For release builds (after/before): 56% of previous throughput
Before (debug):
After (debug):
Before (release):
After (release):