New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two Socket/NetworkStream-related optimizations #12664

Merged
merged 2 commits into from Oct 15, 2016

Conversation

@stephentoub
Member

stephentoub commented Oct 14, 2016

This PR provides two independent but related optimizations, one for NetworkStream and one for Socket. The latter was done as it showed up as a significant source of allocations once the former was done.

  1. Add NetworkStream.CopyToAsync override. This does several custom things. First, it uses ArrayPool for the buffer used in the copy operation. Second, it uses a SocketAsyncEventArgs to avoid per-operation costs related to the socket. And third, it then uses a custom awaitable to avoid async/Task-related overheads that would otherwise occur per-ReadAsync.
  2. Removes a per-operation SafeHandle allocation from any operation on a SocketAsyncEventArgs.

I wrote a little benchmark that connects a socket to a localhost server which serves up 10MB of data. It then creates a NetworkStream and does a CopyToAsync on it to copy all of the data to Stream.Null. And it does all of this 10 times. Prior to the changes, there are lots of allocations (I'm only showing line items with > 10K of impact):
image

After adding CopyToAsync, most of the allocations go away, but we end up with a significant number of SafeNativeOverlapped SafeHandles:
image

After the second fix to address the SafeHandles, our memory usage is much more reasonable:
image

(Most of what remains is unrelated to the actual operation being tested, and is coming from things elsewhere in the test app, e.g. strings created at startup.)

cc: @ericeil, @davidsh, @CIPop, @davidfowl, @benaadams
Fixes #11573
Fixes #12659

@benaadams

This comment has been minimized.

Show comment
Hide comment
@benaadams

benaadams Oct 14, 2016

Collaborator

Very nice 💯

Collaborator

benaadams commented Oct 14, 2016

Very nice 💯

@davidfowl

This comment has been minimized.

Show comment
Hide comment
@davidfowl

davidfowl Oct 14, 2016

Contributor

This is great! Even better would be a way to pool the AwaitableSocketAsyncEventArgs /SocketAsyncEventArgs for reuse but that's an unrelated optimization.

Contributor

davidfowl commented Oct 14, 2016

This is great! Even better would be a way to pool the AwaitableSocketAsyncEventArgs /SocketAsyncEventArgs for reuse but that's an unrelated optimization.

@ericeil

Very nice!

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 14, 2016

Member

@dotnet-bot test this please (Jenkins rebooted)

Member

stephentoub commented Oct 14, 2016

@dotnet-bot test this please (Jenkins rebooted)

@CIPop

This comment has been minimized.

Show comment
Hide comment
@CIPop

CIPop Oct 15, 2016

Member

Thanks @stephentoub. Please validate this in both netcore as well as UWP profiles using the performance tests to ensure no regressions under stress using either of the ThreadPool implementations.

Member

CIPop commented Oct 15, 2016

Thanks @stephentoub. Please validate this in both netcore as well as UWP profiles using the performance tests to ensure no regressions under stress using either of the ThreadPool implementations.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 15, 2016

Member

Please validate this in both netcore as well as UWP profiles using the performance tests to ensure no regressions under stress using either of the ThreadPool implementations.

Where are instructions for doing this?

Member

stephentoub commented Oct 15, 2016

Please validate this in both netcore as well as UWP profiles using the performance tests to ensure no regressions under stress using either of the ThreadPool implementations.

Where are instructions for doing this?

@CIPop

This comment has been minimized.

Show comment
Hide comment
@CIPop

CIPop Oct 15, 2016

Member

Please manually run the outerloop https://github.com/dotnet/corefx/tree/master/src/System.Net.Sockets/tests/PerformanceTests tests, changing the counters to something that takes a couple of minutes.

UWP: I'm asking only because I remember some interesting behavior differences for netcore50 as well as netnative modes. @ericeil can talk more about the differences between the two platforms in terms of IOCP/Overlapped/ThreadPool.
To test UWP I was using the internal ToF tests. If this isn't possible anymore, please open a tracking issue to run extended perf/stress tests for UAP whenever testing for this platform is enabled in the GitHub branch.

Member

CIPop commented Oct 15, 2016

Please manually run the outerloop https://github.com/dotnet/corefx/tree/master/src/System.Net.Sockets/tests/PerformanceTests tests, changing the counters to something that takes a couple of minutes.

UWP: I'm asking only because I remember some interesting behavior differences for netcore50 as well as netnative modes. @ericeil can talk more about the differences between the two platforms in terms of IOCP/Overlapped/ThreadPool.
To test UWP I was using the internal ToF tests. If this isn't possible anymore, please open a tracking issue to run extended perf/stress tests for UAP whenever testing for this platform is enabled in the GitHub branch.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 15, 2016

Member

@dotnet-bot Test OuterLoop Windows_NT Debug please (#11737)

Member

stephentoub commented Oct 15, 2016

@dotnet-bot Test OuterLoop Windows_NT Debug please (#11737)

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 15, 2016

Member

Please manually run the outerloop

Thanks. I ran the perf tests locally and didn't see any regressions.

Member

stephentoub commented Oct 15, 2016

Please manually run the outerloop

Thanks. I ran the perf tests locally and didn't see any regressions.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 15, 2016

Member

@dotnet-bot Test Outerloop Windows_NT Debug please
@dotnet-bot Test Outerloop OSX Debug please

Member

stephentoub commented Oct 15, 2016

@dotnet-bot Test Outerloop Windows_NT Debug please
@dotnet-bot Test Outerloop OSX Debug please

stephentoub added some commits Oct 14, 2016

Optimize NetworkStream.CopyToAsync
This commit overrides CopyToAsync on NetworkStream to provide an optimized implementation.  Several optimizations:
- Use ArrayPool for a pooled copy buffer rather than allocating a new one for each CopyToAsync operation
- Uses a SocketAsyncEventArgs to avoid per-socket operation costs like pinning of the buffer
- Uses a custom awaitable to avoid per-ReadAsync costs like the allocated Tasks and IAsyncResult objects involved
Avoid allocating a SafeNativeOverlapped per SocketAsyncEventArgs oper…
…ation

Each operation ends up allocating a SafeNativeOverlapped, which results in a ton of allocations when trying to optimize code via SocketAsyncEventArgs.  This commit reuses the same SafeHandle object for many / all of the operations on the event args instance.

@stephentoub stephentoub merged commit c9b1b94 into dotnet:master Oct 15, 2016

10 checks passed

Innerloop CentOS7.1 Debug Build and Test Build finished.
Details
Innerloop CentOS7.1 Release Build and Test Build finished.
Details
Innerloop Linux ARM Emulator Debug Cross Build Build finished.
Details
Innerloop Linux ARM Emulator Release Cross Build Build finished.
Details
Innerloop OSX Debug Build and Test Build finished.
Details
Innerloop OSX Release Build and Test Build finished.
Details
Innerloop Ubuntu14.04 Debug Build and Test Build finished.
Details
Innerloop Ubuntu14.04 Release Build and Test Build finished.
Details
Innerloop Windows_NT Debug Build and Test Build finished.
Details
Innerloop Windows_NT Release Build and Test Build finished.
Details
{
cancellationToken.ThrowIfCancellationRequested();
int bytesRead = await ea.ReceiveAsync();

This comment has been minimized.

@jamesqo

jamesqo Oct 15, 2016

Contributor

Should a ConfigureAwait(false) have been added here, or was it intentionally omitted? I noticed the second await has it, but this one does not.

@jamesqo

jamesqo Oct 15, 2016

Contributor

Should a ConfigureAwait(false) have been added here, or was it intentionally omitted? I noticed the second await has it, but this one does not.

This comment has been minimized.

@stephentoub

stephentoub Oct 15, 2016

Member

This isn't a Task. The behavior of this awaitable matches that of ConfigureAwait(false) implicitly, in that it always ignores the current context.

@stephentoub

stephentoub Oct 15, 2016

Member

This isn't a Task. The behavior of this awaitable matches that of ConfigureAwait(false) implicitly, in that it always ignores the current context.

@stephentoub stephentoub deleted the stephentoub:networkstream_copytoasync branch Oct 15, 2016

@scionwest

This comment has been minimized.

Show comment
Hide comment
@scionwest

scionwest Nov 26, 2016

I just ran into this issue over yesterday! I was seeing a GC collection happen once per-second. Glad to see it has already been fixed, awesome stuff!

scionwest commented Nov 26, 2016

I just ran into this issue over yesterday! I was seeing a GC collection happen once per-second. Glad to see it has already been fixed, awesome stuff!

@scionwest

This comment has been minimized.

Show comment
Hide comment
@scionwest

scionwest Nov 27, 2016

This appears to only be an issue with netcoreapp projects. Using SocketAsyncEventArgs on the full framework, 4.6, does not have this issue. I was able to create the exact same app in both frameworks and see it allocate like crazy in netcoreapp, and run nice and smooth in 4.6. Interesting, looking forward to getting this into my netcoreapp project for sure.

scionwest commented Nov 27, 2016

This appears to only be an issue with netcoreapp projects. Using SocketAsyncEventArgs on the full framework, 4.6, does not have this issue. I was able to create the exact same app in both frameworks and see it allocate like crazy in netcoreapp, and run nice and smooth in 4.6. Interesting, looking forward to getting this into my netcoreapp project for sure.

@karelz karelz modified the milestone: 1.2.0 Dec 3, 2016

buybackoff added a commit to Spreads/Spreads.IPC that referenced this pull request Jul 13, 2017

Run UDP benchmark on .NET Core. A lot of GC, slower, and dotMemory ca…
…nnot attach.

PerfView shows that the issue is solely dotnet/corefx#12664, so should target netstandard2.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment