New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of SslStream reading and writing #12935

Merged
merged 4 commits into from Oct 25, 2016

Conversation

@stephentoub
Member

stephentoub commented Oct 23, 2016

This PR makes a significant dent in the allocations incurred while reading/writing an SslStream (though there is still more that can and should be done).

I wrote a simple benchmark that creates a producer and a consumer, where the producer WriteAsyncs 10K 1024 byte buffers to an SslStream, and a consumer reads with 1024 byte buffers until all data is consumed. To try to keep things isolated from the rest of the networking stack (as some common types are shared, making analysis more difficult), the SslStreams wrap NamedPipeServer/ClientStreams. Prior to this change, the allocations end up looking like this:
image

This PR does two things:

  1. It removes some of the Task-related costs by switching away from using Task.Factory.FromAsync in SslStream.WriteAsync/ReadAsync. When Stream didn't provide the Begin/EndRead/Write methods in the contract, SslStream's Begin/End methods were internal, and the implementation of Read/WriteAsync used FromAsync to delegate to them. But the base Stream's Read/WriteAsync overloads have an optimized implementation for delegating to the Begin/End methods, taking advantage of some Task internals. Now that Stream does expose Begin/EndRead/Write and SslStream overrides those, we can remove the FromAsync calls and just let the base stream do its thing. (In the future, we can improve on this further by avoiding going through Begin/End at all, but for now this simple change improves the allocation profile.)
  2. On Windows, every call to EncryptMessage and DecryptMessage ends up going through a generic helper for dealing with SSPI. The interface to these methods uses only safe code, so arrays of classes end up getting passed, and ends up having to do lots of expensive things in order to deal with arbitrary inputs. But the encrypt/decrypt operations are relatively simple, and we can avoid all of those costs by just going to the SSPI calls directly. Previously, every encrypt/decrypt call would allocate 4 SecurityBuffer instances, a SecurityBuffer[4] array, a SecBuffer[4] array, a GCHandle[4] array, a byte[4][] array, and a SecBufferDesc... after this change, all of those except for the SecBufferDesc are gone.

With these changes, a new profile of the same scenario looks like:
image
reducing allocation numbers/size by ~70%. Throughput also increased by ~15%.

The remaining allocations after this change are:

  • The SecBufferDesc allocated in EncryptMessage/DecryptMessage. (We can get rid of that by changing the type to a struct and changing the interface to take a SecBufferDesc* (it can't be a ref SecBufferDesc as we need to be able to pass in null). That impacts a lot of code paths, though, so I haven't done it in this change.)
  • The Task returned from ReadAsync/WriteAsync. (The one from ReadAsync is mostly unavoidable, but if we rewrite the implementation with async/await, if the underlying write operation completes synchronously, we should be able to avoid the task returned from WriteAsync.)
  • A BufferAsyncResult, AsyncProtocolRequest, and boxed Int32 for each read operation. (By rewriting these methods with async/await, we should be able to eliminate the boxed Int32 entirely, and replace the other two with smaller allocations associated with the async state machine, removing them altogether if the operation completes synchronously.)
  • A LazyAsyncResult and an AsyncProtocol Request for each write operation. (Same possibilities as above.)

We can then further reduce these for the CopyToAsync scenario by ammortizing most of the read-related costs.

(The ReadWriteTask allocations showing up in all of the traces are from the named pipe streams and can be ignored for the purposes of this PR.)

Contributes to #11826
cc: @davidsh, @CIPop, @ericeil, @bartonjs, @davidfowl, @benaadams

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 23, 2016

Member

@dotnet-bot Test Outerloop Windows_NT Debug please
@dotnet-bot Test Outerloop Ubuntu14.04 Debug please

Member

stephentoub commented Oct 23, 2016

@dotnet-bot Test Outerloop Windows_NT Debug please
@dotnet-bot Test Outerloop Ubuntu14.04 Debug please

@davidsh davidsh added this to the 1.2.0 milestone Oct 23, 2016

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 23, 2016

Member

@dotnet-bot Test Innerloop Ubuntu14.04 Release Build and Test please (build hang). FYI @adityamandaleeka, @mmitche, another Ubuntu14.04 hang.

Member

stephentoub commented Oct 23, 2016

@dotnet-bot Test Innerloop Ubuntu14.04 Release Build and Test please (build hang). FYI @adityamandaleeka, @mmitche, another Ubuntu14.04 hang.

@benaadams

This comment has been minimized.

Show comment
Hide comment
@benaadams

benaadams Oct 24, 2016

Collaborator

Great improvements!

Can SecBufferDesc can be made a struct and either address taken or passed by ref?

Collaborator

benaadams commented Oct 24, 2016

Great improvements!

Can SecBufferDesc can be made a struct and either address taken or passed by ref?

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 24, 2016

Member

Great improvements!

Thanks

Can SecBufferDesc can be made a struct and either address taken or passed by ref?

See my comments about it in the PR description.

Member

stephentoub commented Oct 24, 2016

Great improvements!

Thanks

Can SecBufferDesc can be made a struct and either address taken or passed by ref?

See my comments about it in the PR description.

@benaadams

This comment has been minimized.

Show comment
Hide comment
@benaadams

benaadams Oct 24, 2016

Collaborator

See my comments about it in the PR description

Makes perfect sense 😄

LGTM

Collaborator

benaadams commented Oct 24, 2016

See my comments about it in the PR description

Makes perfect sense 😄

LGTM

decspc[2] = new SecurityBuffer(null, SecurityBufferType.SECBUFFER_EMPTY);
decspc[3] = new SecurityBuffer(null, SecurityBufferType.SECBUFFER_EMPTY);
Interop.SECURITY_STATUS errorCode = (Interop.SECURITY_STATUS)SSPIWrapper.DecryptMessage(

This comment has been minimized.

@bartonjs

bartonjs Oct 24, 2016

Member

Are the SSPIWrapper methods still being used elsewhere, or should they be deleted?

@bartonjs

bartonjs Oct 24, 2016

Member

Are the SSPIWrapper methods still being used elsewhere, or should they be deleted?

This comment has been minimized.

@stephentoub

stephentoub Oct 24, 2016

Member

Yeah, from SslStream's initialization and then also from a bunch of code paths in NegotiateStream. I understand the reason why these helpers were put in place, but in the fullness of time it'd probably be nice to either revamp them to be more efficient or to remove them entirely; as this PR shows, they add very non-trivial overhead.

@stephentoub

stephentoub Oct 24, 2016

Member

Yeah, from SslStream's initialization and then also from a bunch of code paths in NegotiateStream. I understand the reason why these helpers were put in place, but in the fullness of time it'd probably be nice to either revamp them to be more efficient or to remove them entirely; as this PR shows, they add very non-trivial overhead.

This comment has been minimized.

@CIPop

CIPop Oct 24, 2016

Member

The wrapper was trying to provide an abstraction over the SSPI API and is used in Desktop for both the Schannel and Negotiate SSPIs.
@stephentoub are you planning on making these changes for NegotiateStream, NTAuthentication and NegoState as well?

@CIPop

CIPop Oct 24, 2016

Member

The wrapper was trying to provide an abstraction over the SSPI API and is used in Desktop for both the Schannel and Negotiate SSPIs.
@stephentoub are you planning on making these changes for NegotiateStream, NTAuthentication and NegoState as well?

Buffer.BlockCopy(input, offset, output, headerSize, size);
const int NumSecBuffers = 4; // header + data + trailer + empty
var unmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[NumSecBuffers];

This comment has been minimized.

@bartonjs

bartonjs Oct 24, 2016

Member

Nit: should this have a plural name?

@bartonjs

bartonjs Oct 24, 2016

Member

Nit: should this have a plural name?

This comment has been minimized.

@stephentoub

stephentoub Oct 24, 2016

Member

Maybe? It's the same name that's used elsewhere for this case, and it does point to the whole buffer which contains several elements. I think I'll leave it as is for now.

@stephentoub

stephentoub Oct 24, 2016

Member

Maybe? It's the same name that's used elsewhere for this case, and it does point to the whole buffer which contains several elements. I think I'll leave it as is for now.

@CIPop

CIPop approved these changes Oct 24, 2016

Nice! Thanks @stephentoub!

Please run the outerloop tests to ensure that the server and client cert paths are exercised.

decspc[2] = new SecurityBuffer(null, SecurityBufferType.SECBUFFER_EMPTY);
decspc[3] = new SecurityBuffer(null, SecurityBufferType.SECBUFFER_EMPTY);
Interop.SECURITY_STATUS errorCode = (Interop.SECURITY_STATUS)SSPIWrapper.DecryptMessage(

This comment has been minimized.

@CIPop

CIPop Oct 24, 2016

Member

The wrapper was trying to provide an abstraction over the SSPI API and is used in Desktop for both the Schannel and Negotiate SSPIs.
@stephentoub are you planning on making these changes for NegotiateStream, NTAuthentication and NegoState as well?

@CIPop

CIPop Oct 24, 2016

Member

The wrapper was trying to provide an abstraction over the SSPI API and is used in Desktop for both the Schannel and Negotiate SSPIs.
@stephentoub are you planning on making these changes for NegotiateStream, NTAuthentication and NegoState as well?

@@ -6,7 +6,7 @@ namespace System.Net
{
internal static class GlobalSSPI
{
internal static SSPIInterface SSPIAuth = new SSPIAuthType();

This comment has been minimized.

@CIPop

CIPop Oct 24, 2016

Member

@stephentoub Would it be simple to add your test to a new PerformanceTests folder for S.N.Security?

@CIPop

CIPop Oct 24, 2016

Member

@stephentoub Would it be simple to add your test to a new PerformanceTests folder for S.N.Security?

This comment has been minimized.

@stephentoub

stephentoub Oct 24, 2016

Member

@stephentoub are you planning on making these changes for NegotiateStream, NTAuthentication and NegoState as well?

I'm not, though I'd have no problem seeing it done. I'm much more concerned with SslStream peformance than I am with NegotiateStream performance, at least at present. I completely understand the desire for having the wrapper as an SSPI API abstraction, but in its current form, it's hugely expensive. If the costs could be removed, it would make sense to continue using it everywhere; otherwise, I do think the direction should be to move off of it so that the right things can be done for each case perf-wise.

Would it be simple to add your test to a new PerformanceTests folder for S.N.Security?

I can look at doing that.

@stephentoub

stephentoub Oct 24, 2016

Member

@stephentoub are you planning on making these changes for NegotiateStream, NTAuthentication and NegoState as well?

I'm not, though I'd have no problem seeing it done. I'm much more concerned with SslStream peformance than I am with NegotiateStream performance, at least at present. I completely understand the desire for having the wrapper as an SSPI API abstraction, but in its current form, it's hugely expensive. If the costs could be removed, it would make sense to continue using it everywhere; otherwise, I do think the direction should be to move off of it so that the right things can be done for each case perf-wise.

Would it be simple to add your test to a new PerformanceTests folder for S.N.Security?

I can look at doing that.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 24, 2016

Member

Nice! Thanks @stephentoub!

😄

Please run the outerloop tests to ensure that the server and client cert paths are exercised.

Yup, already done (though I'll do it again when I make updates):
image

Member

stephentoub commented Oct 24, 2016

Nice! Thanks @stephentoub!

😄

Please run the outerloop tests to ensure that the server and client cert paths are exercised.

Yup, already done (though I'll do it again when I make updates):
image

stephentoub added some commits Oct 22, 2016

Make SslStream.Write/ReadAsync overrides delegate to base
When SslStream didn't override Begin/EndRead/Write, we had to implement Read/WriteAsync with Task.Factory.FromAsync wrapping its internal Begin/End methods.  But FromAsync has more overhead than the base Stream's ReadAsync/WriteAsync implementations, which uses TPL internals access to optimize the wrapping of the Begin/End methods.  Now that SslStream does override Begin/EndReadWrite, we can delegate back to the base implementations.
Remove lots of allocations from encrypt/decrypt on Windows
Every call to each of EncryptMessage and DecryptMessage was allocating:
- 4 SecurityBuffer instances
- A SecurityBuffer[4]
- A SecBuffer[4]
- A GCHandle[4]
- A byte[4][]
- A SecBufferDesc
as well as creating 3 pinned GCHandles for EncryptMessage and 1 pinned GCHandle for DecryptMessage.

This change removes all of that except for the 1 SecBufferDesc instance.
Address PR feedback
- Remove the Read/WriteAsync overloads that now just delegate to the base.  They're not listed explicitly in the contract, so they're not needed.  We can add them back if/when we do something more clever than the base does.
@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 25, 2016

Member

@dotnet-bot Test Innerloop Ubuntu14.04 Release Build and Test please (#12467)
@dotnet-bot Test OuterLoop Ubuntu14.04 Debug please (#12467)

Member

stephentoub commented Oct 25, 2016

@dotnet-bot Test Innerloop Ubuntu14.04 Release Build and Test please (#12467)
@dotnet-bot Test OuterLoop Ubuntu14.04 Debug please (#12467)

@stephentoub stephentoub merged commit c30cb1f into dotnet:master Oct 25, 2016

12 checks passed

Innerloop CentOS7.1 Debug Build and Test Build finished.
Details
Innerloop CentOS7.1 Release Build and Test Build finished.
Details
Innerloop Linux ARM Emulator Debug Cross Build Build finished.
Details
Innerloop Linux ARM Emulator Release Cross Build Build finished.
Details
Innerloop OSX Debug Build and Test Build finished.
Details
Innerloop OSX Release Build and Test Build finished.
Details
Innerloop Ubuntu14.04 Debug Build and Test Build finished.
Details
Innerloop Ubuntu14.04 Release Build and Test Build finished.
Details
Innerloop Windows_NT Debug Build and Test Build finished.
Details
Innerloop Windows_NT Release Build and Test Build finished.
Details
OuterLoop Ubuntu14.04 Debug Build finished.
Details
OuterLoop Windows_NT Debug Build finished.
Details

@stephentoub stephentoub deleted the stephentoub:ssl_perf branch Oct 25, 2016

@stephentoub stephentoub referenced this pull request Jun 23, 2017

Open

Faster SslStream #21371

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment