Reduce channel overhead (GC pressure) #15

Closed
AArnott opened this issue Jul 27, 2018 · 3 comments

AArnott (Collaborator) commented Jul 27, 2018

The MultiplexingStreamPerfTests reveal that communicating the same data over channels is somewhat more expensive in CPU and significantly more expensive in GC pressure. We should analyze the allocations and optimize whatever we can, with a goal of reducing allocations by at least 50%.

To illustrate, running a simple JSON-RPC invocation over named pipes incurs this cost:

Test host launched with: "D:\git\Nerdbank.FullDuplexStream\bin\Nerdbank.Streams.Tests\Debug\net461\IsolatedTestHost.exe" "D:\git\Nerdbank.FullDuplexStream\bin\Nerdbank.Streams.Tests\Debug\net461\Nerdbank.Streams.Tests.dll" "MultiplexingStreamPerfTests" "JsonRpcPerf_Pipe" "False"
Bytes allocated during quiet wait period: 648
373112 bytes allocated (373 per iteration)
Elapsed time: 279ms (0.279ms per iteration)

Running the same test, except using a single channel over that same named pipe, allocates roughly 3.4K more bytes per iteration (about 10X the named pipe alone):

Test host launched with: "D:\git\Nerdbank.FullDuplexStream\bin\Nerdbank.Streams.Tests\Debug\net461\IsolatedTestHost.exe" "D:\git\Nerdbank.FullDuplexStream\bin\Nerdbank.Streams.Tests\Debug\net461\Nerdbank.Streams.Tests.dll" "MultiplexingStreamPerfTests" "JsonRpcPerf_Channel" "False"
Bytes allocated during quiet wait period: 2248
Bytes allocated during quiet wait period: 32
3749344 bytes allocated (3749 per iteration)
Elapsed time: 308ms (0.308ms per iteration)
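
For context, here is a minimal sketch of how "bytes allocated (N per iteration)" numbers like these can be gathered. This is not the actual MultiplexingStreamPerfTests harness; it assumes GC.GetAllocatedBytesForCurrentThread() (.NET Core 2.1+), whereas the net461 test host above would need the AppDomain monitoring counters instead.

```csharp
// Sketch only: NOT the real perf test harness.
// Assumes GC.GetAllocatedBytesForCurrentThread() is available (.NET Core 2.1+).
using System;

static class AllocationBenchmark
{
    public static void Measure(Action iteration, int iterations)
    {
        iteration(); // Warm up so JIT and one-time allocations don't skew the count.

        // This is a per-thread counter, so the measured work must stay on this thread.
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            iteration();
        }

        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        Console.WriteLine($"{allocated} bytes allocated ({allocated / iterations} per iteration)");
    }
}
```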

AArnott (Collaborator, Author) commented Aug 10, 2018

Channel overhead is ~17%. Good enough till someone says otherwise.

AArnott closed this as completed Aug 10, 2018
buybackoff commented Sep 21, 2018

Thanks a lot for FullDuplexStream! Testing WebSockets has never been so easy :)

It looks like DuplexStream's ValueTask-based ReadAsync/WriteAsync just use the base Stream's virtual methods, which allocate a lot. I'm testing/profiling non-allocating code that uses the .NET Core 2.1 zero-allocation path with ValueTask, and there is a lot of noise. The screenshot below shows a profile of my code on a 100% unhappy path (the async state machine always allocates), and even there most of the garbage comes from the base Stream. It would be nice to have a non-allocating implementation for testing high-performance non-allocating protocol stacks that use this corefx work: https://github.com/dotnet/corefx/issues/27445

[screenshot: allocation profile, unhappy path]

On the happy path there is no allocation in my code. I can see that in the memory profiler, which is fine, but the performance profile is harder to digest.

[screenshot: happy-path profile]

AArnott (Collaborator, Author) commented Sep 21, 2018

Hi @buybackoff. Thanks for sharing your results. I would love to correct this. It sounds like, at a minimum, I can override the ValueTask-returning ReadAsync and WriteAsync overloads to fix it. This particular issue was tracking GC pressure in the MultiplexingStream specifically, so can you open a new issue to track this? I'll be happy to send you a candidate fix build to validate as part of it.
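
For reference, here is a hedged sketch of the kind of override in question: a stream that forwards the Memory&lt;byte&gt;-based ValueTask overloads (available when targeting netcoreapp2.1/netstandard2.1) straight to its inner streams instead of letting the base Stream bridge them through the allocating array-based path. The type name ForwardingDuplexStream is hypothetical; this is not the library's actual FullDuplexStream implementation.

```csharp
// Hypothetical sketch of overriding the ValueTask-based Stream overloads.
// Requires a target framework where these overloads exist (netcoreapp2.1/netstandard2.1+).
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public class ForwardingDuplexStream : Stream
{
    private readonly Stream readable;
    private readonly Stream writable;

    public ForwardingDuplexStream(Stream readable, Stream writable)
    {
        this.readable = readable ?? throw new ArgumentNullException(nameof(readable));
        this.writable = writable ?? throw new ArgumentNullException(nameof(writable));
    }

    public override bool CanRead => true;
    public override bool CanWrite => true;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Forward the ValueTask-based overloads directly so the base Stream's
    // Task/byte[] bridging allocations never come into play.
    public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default)
        => this.readable.ReadAsync(buffer, cancellationToken);

    public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default)
        => this.writable.WriteAsync(buffer, cancellationToken);

    // Array-based members forward as well, for callers still on the older APIs.
    public override int Read(byte[] buffer, int offset, int count)
        => this.readable.Read(buffer, offset, count);

    public override void Write(byte[] buffer, int offset, int count)
        => this.writable.Write(buffer, offset, count);

    public override void Flush() => this.writable.Flush();
    public override Task FlushAsync(CancellationToken cancellationToken)
        => this.writable.FlushAsync(cancellationToken);

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```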
