Improve binary endpoint memory usage #189

@Daniel-Svensson Daniel-Svensson commented Sep 27, 2019

Use the BufferManager to obtain pooled memory when serializing messages.
This should drastically reduce memory usage compared to MemoryStream.
For larger messages, the pressure on the LOH and Gen2 GCs should also see improvements.
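
As a rough illustration of the pooling idea (not the code in this PR), the sketch below takes a buffer from a `BufferManager`, uses it, and hands it back so the next message can reuse the same array. The class and method names are hypothetical; only `BufferManager.TakeBuffer` and `BufferManager.ReturnBuffer` are real WCF APIs, and the snippet assumes a reference to System.ServiceModel.

```csharp
using System;
using System.ServiceModel.Channels; // assumes a reference to System.ServiceModel

// Hypothetical helper, for illustration only.
static class PooledBufferExample
{
    // Copies the payload into a pooled buffer and returns the size of the
    // buffer that the pool handed out.
    public static int CopyWithPooledBuffer(BufferManager bufferManager, byte[] payload)
    {
        // TakeBuffer returns an array that is at least as large as requested;
        // it can be larger, so always track the used length separately.
        byte[] buffer = bufferManager.TakeBuffer(payload.Length);
        try
        {
            Buffer.BlockCopy(payload, 0, buffer, 0, payload.Length);
            return buffer.Length;
        }
        finally
        {
            // Hand the array back so later messages can reuse it
            // instead of allocating a new one.
            bufferManager.ReturnBuffer(buffer);
        }
    }
}
```

A caller would typically create the pool once, for example with `BufferManager.CreateBufferManager(maxBufferPoolSize, maxBufferSize)`, and share it across messages.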

Fixes #177

  • Add MemoryStream replacement
  • Stress test
  • Benchmark
  • Unit tests (Write based on current "draft" and maybe add some more)

Benchmarks

Before

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i5-8250U CPU 1.60GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.3815.0
  MediumRun : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.3815.0

Job=MediumRun  IterationCount=15  LaunchCount=2  
WarmupCount=10  
| Method | NumEntities | DomainClient | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| GetCititesUniqueContext | 10 | WcfBinary | 1.453 ms | 0.4366 ms | 0.6535 ms | 0.9640 ms | 21.4844 | - | - | 71.38 KB |
| GetCititesReuseContext | 10 | WcfBinary | 1.124 ms | 0.3025 ms | 0.4527 ms | 0.9710 ms | 19.5313 | - | - | 62.57 KB |
| GetCititesUniqueContext | 100 | WcfBinary | 1.933 ms | 0.4316 ms | 0.6190 ms | 1.6856 ms | 48.8281 | 15.6250 | - | 191.3 KB |
| GetCititesReuseContext | 100 | WcfBinary | 2.406 ms | 0.5691 ms | 0.8519 ms | 1.9335 ms | 46.8750 | 9.7656 | - | 163.89 KB |
| GetCititesUniqueContext | 1000 | WcfBinary | 11.147 ms | 1.9772 ms | 2.8981 ms | 12.3418 ms | 367.1875 | 195.3125 | 39.0625 | 1318.2 KB |
| GetCititesReuseContext | 1000 | WcfBinary | 7.322 ms | 1.8208 ms | 2.6690 ms | 5.6060 ms | 273.4375 | 117.1875 | 39.0625 | 1049.08 KB |

Final with BinaryWriter

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i5-8250U CPU 1.60GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4018.0
  DefaultJob : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4018.0

| Method | NumEntities | DomainClient | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| GetCititesUniqueContext | 10 | WcfBinary | 876.0 us | 16.72 us | 19.26 us | 20.5078 | 0.9766 | - | 65.01 KB |
| GetCititesReuseContext | 10 | WcfBinary | 848.6 us | 16.74 us | 38.79 us | 15.6250 | - | - | 56.12 KB |
| GetCititesUniqueContext | 100 | WcfBinary | 1,559.3 us | 29.18 us | 31.23 us | 42.9688 | 11.7188 | - | 165.75 KB |
| GetCititesReuseContext | 100 | WcfBinary | 1,444.0 us | 28.37 us | 45.00 us | 37.1094 | 5.8594 | - | 124.98 KB |
| GetCititesUniqueContext | 1000 | WcfBinary | 6,169.1 us | 1,339.30 us | 1,187.25 us | 343.7500 | 171.8750 | - | 1063.03 KB |
| GetCititesReuseContext | 1000 | WcfBinary | 5,357.5 us | 794.58 us | 704.37 us | 218.7500 | 93.7500 | - | 793.62 KB |

Only buffermanager stream

These benchmarks are

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i5-8250U CPU 1.60GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4010.0
  DefaultJob : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4010.0

| Method | NumEntities | DomainClient | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| GetCititesUniqueContext | 10 | WcfBinary | 935.1 us | 18.51 us | 37.802 us | 930.9 us | 21.4844 | 0.9766 | - | 67.74 KB |
| GetCititesReuseContext | 10 | WcfBinary | 866.5 us | 17.19 us | 39.843 us | 859.4 us | 18.5547 | - | - | 58.85 KB |
| GetCititesUniqueContext | 100 | WcfBinary | 1,640.7 us | 10.18 us | 9.518 us | 1,640.3 us | 42.9688 | 13.6719 | - | 168.43 KB |
| GetCititesReuseContext | 100 | WcfBinary | 1,584.1 us | 145.51 us | 199.172 us | 1,545.4 us | 39.0625 | 7.8125 | - | 136.03 KB |
| GetCititesUniqueContext | 1000 | WcfBinary | 8,554.4 us | 966.66 us | 2,850.210 us | 6,184.7 us | 343.7500 | 171.8750 | - | 1066.23 KB |
| GetCititesReuseContext | 1000 | WcfBinary | 7,328.3 us | 839.02 us | 2,473.874 us | 5,249.5 us | 234.3750 | 101.5625 | - | 797.26 KB |

@Daniel-Svensson Daniel-Svensson added this to the 5.0 milestone Sep 27, 2019
@Daniel-Svensson Daniel-Svensson changed the title Improve binary endpoint memory usage [WIP] Improve binary endpoint memory usage Sep 27, 2019
@Daniel-Svensson

Add a note about looking into behaviour similar to the binary encoding, which performs size prediction; that should allow us to skip memory copies and further improve performance.

@Daniel-Svensson Daniel-Svensson changed the title [WIP] Improve binary endpoint memory usage Improve binary endpoint memory usage Oct 3, 2019
* By using BufferManager, memory pressure and LOH allocations should decrease substantially
/// that should allow us to skip memory copies and further improve performance.
///
/// We should be able to pool both the stream and the binary writer together with size data
using (var stream = new BufferManagerStream(bufferManager, messageOffset, minAllocationSize: 2 * 1024, maxAllocationSize: maxMessageSize))

The current min and max sizes are arbitrary.

One could consider setting a fixed max size of 64 KB, since that is below the LOH threshold (85,000 bytes).
But since the memory should be pooled anyway, the arrays should probably end up in Gen2 when everything works as expected.

It might make sense to set a cap at some reasonably large block size, somewhere between 128 KB (the first size in the LOH) and 1 MB, in order to keep the number of buffer sizes in use bounded and reduce possible LOH fragmentation.
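
To make the sizing discussion concrete, here is a minimal sketch of one possible growth policy: start at a minimum size, double on each growth, and never exceed a cap. This is an assumption for illustration and not necessarily what `BufferManagerStream` does; `BufferSizing` and `NextBufferSize` are hypothetical names.

```csharp
using System;

// Hypothetical sizing policy, for illustration only.
static class BufferSizing
{
    // Returns the size of the next buffer to request: at least minSize,
    // doubling from the current size, and never larger than maxSize.
    public static int NextBufferSize(int currentSize, int minSize, int maxSize)
    {
        if (minSize <= 0 || maxSize < minSize)
            throw new ArgumentOutOfRangeException(nameof(minSize));

        if (currentSize < minSize)
            return minSize;

        // Use long arithmetic so doubling a large size cannot overflow int.
        long doubled = (long)currentSize * 2;
        return (int)Math.Min(doubled, (long)maxSize);
    }
}
```

With a 2 KB minimum and a 1 MB cap, doubling yields only ten distinct sizes (2 KB, 4 KB, ..., 1 MB), which keeps the set of pooled buffer sizes bounded.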

if (count == 0)
return;

if (Is64BitProcess)

Buffer.BlockCopy is faster than Buffer.MemoryCopy for sizes above 1024 bytes on .NET Framework for x64, but the difference may not be significant enough to justify changing the code.

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i5-8250U CPU 1.60GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]       : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4010.0
  LegacyJitX86 : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.8.4010.0
  RyuJitX64    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4010.0

Runtime=Clr  
| Method | Job | Jit | Platform | NumBytes | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|---|---|---|
| Buffer_BlockCopy | LegacyJitX86 | LegacyJit | X86 | 1024 | 79.36 ns | 1.664 ns | 4.081 ns | 79.36 ns | 1.00 | 0.00 |
| Buffer_MemoryCopy | LegacyJitX86 | LegacyJit | X86 | 1024 | 417.77 ns | 8.339 ns | 20.613 ns | 422.18 ns | 5.28 | 0.33 |
| Buffer_BlockCopy | RyuJitX64 | RyuJit | X64 | 1024 | 72.26 ns | 1.563 ns | 4.034 ns | 72.05 ns | 1.00 | 0.00 |
| Buffer_MemoryCopy | RyuJitX64 | RyuJit | X64 | 1024 | 72.13 ns | 1.504 ns | 3.004 ns | 71.99 ns | 1.01 | 0.07 |
| Buffer_BlockCopy | LegacyJitX86 | LegacyJit | X86 | 2048 | 117.32 ns | 2.431 ns | 6.145 ns | 118.53 ns | 1.00 | 0.00 |
| Buffer_MemoryCopy | LegacyJitX86 | LegacyJit | X86 | 2048 | 787.86 ns | 15.789 ns | 30.041 ns | 786.57 ns | 6.75 | 0.46 |
| Buffer_BlockCopy | RyuJitX64 | RyuJit | X64 | 2048 | 101.15 ns | 2.056 ns | 4.200 ns | 101.48 ns | 1.00 | 0.00 |
| Buffer_MemoryCopy | RyuJitX64 | RyuJit | X64 | 2048 | 215.62 ns | 22.340 ns | 65.870 ns | 239.91 ns | 1.66 | 0.55 |
| Buffer_BlockCopy | LegacyJitX86 | LegacyJit | X86 | 8192 | 448.76 ns | 34.963 ns | 103.088 ns | 453.13 ns | 1.00 | 0.00 |
| Buffer_MemoryCopy | LegacyJitX86 | LegacyJit | X86 | 8192 | 526.23 ns | 48.595 ns | 143.283 ns | 518.02 ns | 1.26 | 0.55 |
| Buffer_BlockCopy | RyuJitX64 | RyuJit | X64 | 8192 | 626.81 ns | 14.647 ns | 43.187 ns | 627.00 ns | 1.00 | 0.00 |
| Buffer_MemoryCopy | RyuJitX64 | RyuJit | X64 | 8192 | 740.20 ns | 14.804 ns | 20.265 ns | 742.05 ns | 1.17 | 0.06 |

// For x86 it is significantly faster to copy ints and longs
// (or similar) in managed code for smaller counts (below 100-200),
// but we expect most copies to be larger since the XML writer buffers around 500 bytes
Buffer.BlockCopy(src, srcOffset, dest, destOffset, count);

x86 copy speed

| Number of bytes | Buffer_BlockCopy | FastCopy_Long |
|---|---|---|
| 4 | 25.402 | 6.404 |
| 40 | 30.176 | 22.202 |
| 200 | 65.845 | 54.938 |
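
For reference, the trade-off in the numbers above could be expressed as a size-based dispatch like the sketch below. It is an illustration only: `CopyHelper` and the threshold value are hypothetical, and the small-copy path uses a plain byte loop rather than the long-based copy that was benchmarked as `FastCopy_Long`.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class CopyHelper
{
    // Assumed cut-over point; a real value would have to come from measurements.
    private const int SmallCopyThreshold = 64;

    // Copies count bytes between two distinct arrays.
    public static void Copy(byte[] src, int srcOffset, byte[] dest, int destOffset, int count)
    {
        if (count == 0)
            return;

        if (count < SmallCopyThreshold)
        {
            // Small copies: a simple managed loop avoids the fixed call
            // overhead of Buffer.BlockCopy.
            for (int i = 0; i < count; i++)
                dest[destOffset + i] = src[srcOffset + i];
        }
        else
        {
            // Larger copies: let the runtime's optimized block copy do the work.
            Buffer.BlockCopy(src, srcOffset, dest, destOffset, count);
        }
    }
}
```

Since most copies are expected to be around 500 bytes, the small-copy path would rarely be taken, which is consistent with keeping the plain `Buffer.BlockCopy` call.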

@Daniel-Svensson Daniel-Svensson merged commit 0edd1fa into OpenRIAServices:master Oct 30, 2019
@Daniel-Svensson Daniel-Svensson deleted the feature/buffermanagerstream branch October 30, 2019 13:17
Successfully merging this pull request may close these issues.

Idea: Improve memory usage of binary endpoint