Improve binary endpoint memory usage #189
Add a note about looking into behaviour similar to the binary encoding, which performs size prediction; that should allow us to skip memory copies and further improve performance.
* By using BufferManager, memory pressure and LOH allocations should decrease substantially
a5fd4b8 to 3055fc8
/// that should allow us to skip memory copies and further improve performance.
///
/// We should be able to pool both the stream and the binary writer together with size data
using (var stream = new BufferManagerStream(bufferManager, messageOffset, minAllocationSize: 2 * 1024, maxAllocationSize: maxMessageSize))
The current min and max sizes are arbitrary.
One could consider setting a fixed max size of 64 KB, since that is just below the LOH threshold (85,000 bytes).
But since the memory should be pooled anyway, the arrays should probably end up in Gen2 when everything works as expected.
It might make sense to set a cap at some reasonably large block size, somewhere between 128 KB (the first size in the LOH) and 1 MB, in order to keep the number of buffer sizes used bounded, with less potential LOH fragmentation.
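To make the sizing trade-off above concrete, here is a minimal sketch of how a growing stream could pick its next allocation size. It assumes sizes double from the minimum up to a fixed cap; `BufferSizing`, `NextAllocationSize` and `IsLikelyLargeObject` are hypothetical names and are not taken from the PR. The check against 85,000 bytes ignores the small array header, so it is only approximate near the threshold.

```csharp
using System;

// Hypothetical helper, not part of the PR: illustrates bounded, doubling allocation sizes.
public static class BufferSizing
{
    // Objects of 85,000 bytes or more are allocated on the large object heap (LOH).
    public const int LohThreshold = 85000;

    // Returns the smallest size reachable by doubling minAllocationSize that fits
    // 'requested', capped at maxAllocationSize. Doubling keeps the set of distinct
    // buffer sizes small, which is the point made about bounding buffer sizes.
    public static int NextAllocationSize(int requested, int minAllocationSize, int maxAllocationSize)
    {
        if (requested <= 0)
            throw new ArgumentOutOfRangeException(nameof(requested));
        if (minAllocationSize <= 0 || maxAllocationSize < minAllocationSize)
            throw new ArgumentException("Expected 0 < minAllocationSize <= maxAllocationSize.");

        int size = minAllocationSize;
        while (size < requested && size < maxAllocationSize)
        {
            // Guard against overflow and never exceed the cap.
            size = size > maxAllocationSize / 2 ? maxAllocationSize : size * 2;
        }
        return size;
    }

    // Approximate check: ignores the array object header.
    public static bool IsLikelyLargeObject(int arrayLength)
    {
        return arrayLength >= LohThreshold;
    }
}
```

With a 2 KB minimum and a 1 MB cap, a 5,000 byte request would map to an 8 KB buffer, and any request above the cap would be served in 1 MB blocks. A 64 KB buffer stays out of the LOH while a 128 KB buffer does not, which matches the sizes mentioned in the comment.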
src/OpenRiaServices.DomainServices.Hosting/Test/Data/BufferManagerStreamTests.cs
...ervices.Hosting/Framework/Services/MessageEncoders/PoxBinaryMessageEncodingBindingElement.cs
if (count == 0)
    return;

if (Is64BitProcess)
Buffer.BlockCopy is faster than Buffer.MemoryCopy for sizes above 1024 bytes on .NET Framework for x64, but the difference may not be significant enough to change the code.
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i5-8250U CPU 1.60GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
[Host] : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4010.0
LegacyJitX86 : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.8.4010.0
RyuJitX64 : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4010.0
Runtime=Clr
Method | Job | Jit | Platform | NumBytes | Mean | Error | StdDev | Median | Ratio | RatioSD |
---|---|---|---|---|---|---|---|---|---|---|
Buffer_BlockCopy | LegacyJitX86 | LegacyJit | X86 | 1024 | 79.36 ns | 1.664 ns | 4.081 ns | 79.36 ns | 1.00 | 0.00 |
Buffer_MemoryCopy | LegacyJitX86 | LegacyJit | X86 | 1024 | 417.77 ns | 8.339 ns | 20.613 ns | 422.18 ns | 5.28 | 0.33 |
Buffer_BlockCopy | RyuJitX64 | RyuJit | X64 | 1024 | 72.26 ns | 1.563 ns | 4.034 ns | 72.05 ns | 1.00 | 0.00 |
Buffer_MemoryCopy | RyuJitX64 | RyuJit | X64 | 1024 | 72.13 ns | 1.504 ns | 3.004 ns | 71.99 ns | 1.01 | 0.07 |
Buffer_BlockCopy | LegacyJitX86 | LegacyJit | X86 | 2048 | 117.32 ns | 2.431 ns | 6.145 ns | 118.53 ns | 1.00 | 0.00 |
Buffer_MemoryCopy | LegacyJitX86 | LegacyJit | X86 | 2048 | 787.86 ns | 15.789 ns | 30.041 ns | 786.57 ns | 6.75 | 0.46 |
Buffer_BlockCopy | RyuJitX64 | RyuJit | X64 | 2048 | 101.15 ns | 2.056 ns | 4.200 ns | 101.48 ns | 1.00 | 0.00 |
Buffer_MemoryCopy | RyuJitX64 | RyuJit | X64 | 2048 | 215.62 ns | 22.340 ns | 65.870 ns | 239.91 ns | 1.66 | 0.55 |
Buffer_BlockCopy | LegacyJitX86 | LegacyJit | X86 | 8192 | 448.76 ns | 34.963 ns | 103.088 ns | 453.13 ns | 1.00 | 0.00 |
Buffer_MemoryCopy | LegacyJitX86 | LegacyJit | X86 | 8192 | 526.23 ns | 48.595 ns | 143.283 ns | 518.02 ns | 1.26 | 0.55 |
Buffer_BlockCopy | RyuJitX64 | RyuJit | X64 | 8192 | 626.81 ns | 14.647 ns | 43.187 ns | 627.00 ns | 1.00 | 0.00 |
Buffer_MemoryCopy | RyuJitX64 | RyuJit | X64 | 8192 | 740.20 ns | 14.804 ns | 20.265 ns | 742.05 ns | 1.17 | 0.06 |
3055fc8 to c3f6493
// For x86 it is significantly faster to copy ints and longs
// or similar in managed code for smaller counts (below 100-200).
// But we expect most copies to be larger, since the XML writer buffers around 500 bytes.
Buffer.BlockCopy(src, srcOffset, dest, destOffset, count);
x86 copy speed
Number of bytes | Buffer_BlockCopy | FastCopy_Long |
---|---|---|
4 | 25.402 | 6.404 |
40 | 30.176 | 22.202 |
200 | 65.845 | 54.938 |
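The trade-off in the measurements above could be expressed as a size-based dispatch. The sketch below is illustrative only: `CopyHelper` and `SmallCopyThreshold` are hypothetical, the cut-off of 128 is an arbitrary value inside the "below 100-200" range from the code comment, and a plain byte loop stands in for the long-wise copy benchmarked as `FastCopy_Long`, whose implementation is not shown here. It also assumes the source and destination ranges do not overlap.

```csharp
using System;

// Hypothetical helper, not the PR's implementation.
public static class CopyHelper
{
    // Assumed cut-off; the source only says "below 100-200".
    private const int SmallCopyThreshold = 128;

    public static void Copy(byte[] src, int srcOffset, byte[] dest, int destOffset, int count)
    {
        if (count == 0)
            return;

        if (count < SmallCopyThreshold)
        {
            // Small copies: a simple managed loop avoids the fixed call overhead
            // of Buffer.BlockCopy. Assumes non-overlapping ranges.
            for (int i = 0; i < count; i++)
                dest[destOffset + i] = src[srcOffset + i];
        }
        else
        {
            // Larger copies: defer to the runtime's block copy.
            Buffer.BlockCopy(src, srcOffset, dest, destOffset, count);
        }
    }
}
```

Since the code comment expects most copies to be larger than the cut-off, the PR's choice of calling `Buffer.BlockCopy` unconditionally is the simpler option; a dispatch like this would only pay off if small copies turned out to dominate.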
* Also add heuristics for guessing buffer size
Use the BufferManager to use pooled memory when serializing messages.
This should drastically reduce memory usage compared to the MemoryStream.
For larger messages, the pressure on the LOH and Gen2 GCs should also see improvements.
Fixes #177
Benchmarks
Before
Final with BinaryWriter
Only buffermanager stream
These benchmarks are