added NBench support + End2End performance spec for TcpSocketChannel #95
Conversation
Hi @Aaronontheweb, I'm your friendly neighborhood Azure Pull Request Bot (you can call me AZPRBOT). Thanks for your contribution! TTYL, AZPRBOT
FYI - this spec touches a bunch of different features all at once (I tried to do something I thought an end-user might do) and could probably stand to be optimized in a couple of areas.
Initial results, which you can view on pull requests going forward by looking inside the "PerfResults" artifacts folder on each TeamCity build:

DotNetty.Transport.Tests.Performance.Sockets.TcpChannelPerfSpecs+TcpChannel_Duplex_Throughput
Measures how quickly and with how much GC overhead a TcpSocketChannel --> TcpServerSocketChannel connection can decode / encode realistic messages

System Info
NBench=NBench, Version=0.2.1.0, Culture=neutral, PublicKeyToken=null
OS=Microsoft Windows NT 6.2.9200.0
ProcessorCount=4
CLR=4.0.30319.42000, IsMono=False, MaxGcGeneration=2
WorkerThreads=32767, IOThreads=4

NBench Settings
RunMode=Iterations, TestMode=Measurement
NumberOfIterations=13, MaximumRunTime=00:00:01

Data Totals / Per-second Totals / Raw Data, reported per metric:
TotalCollections [Gen0]
TotalCollections [Gen1]
TotalCollections [Gen2]
TotalBytesAllocated
[Counter] inbound ops
[Counter] outbound ops
var counterHandler = new CounterHandlerInbound(this._inboundThroughputCounter);
this._signal = new ManualResetEventSlimReadFinishedSignal(this.ResetEvent);

ServerBootstrap sb = new ServerBootstrap().Group(this.ServerGroup, this.WorkerGroup).Channel<TcpServerSocketChannel>()
configure pooled allocator, e.g.: https://github.com/Azure/azure-iot-protocol-gateway/blob/master/host/ProtocolGateway.Host.Common/Bootstrapper.cs#L147
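The linked bootstrapper pattern boils down to handing a pooled allocator to the bootstrap via `ChannelOption.Allocator`. A minimal, hedged sketch of that suggestion (field names and sizes mirror values discussed elsewhere in this thread; the handler wiring is elided and not the PR's exact code):

```csharp
// Sketch only: configure a PooledByteBufferAllocator on the server bootstrap,
// following the pattern in the linked ProtocolGateway Bootstrapper.
var serverBufferAllocator = new PooledByteBufferAllocator(
    16 * 1024,                                      // buffer size
    10 * 1024 * 1024 / Environment.ProcessorCount); // per-arena pool budget

ServerBootstrap sb = new ServerBootstrap()
    .Group(this.ServerGroup, this.WorkerGroup)
    .Channel<TcpServerSocketChannel>()
    .Option(ChannelOption.Allocator, serverBufferAllocator)       // server channel
    .ChildOption(ChannelOption.Allocator, serverBufferAllocator); // accepted child channels
```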
pls line break the whole thing (including the initializer below) for better readability. You get more LOC as well! 😄
Ran locally with the pooled bytebuf allocators at a message count of 1 million and the reported throughput is nearly identical to what it was before. I'll re-run it on the build server though, which thus far has posted consistently higher numbers than my development machine (which is what you'd expect from hardware that is reserved exclusively for builds and tests).
"Ran locally with the pooled bytebuf allocators" -- on both client and server? Well, at that scale, with no warm-up it might be expected. The pooled allocator allocates buffers as needed right now. The arena-based allocator (coming in the next few days) might change things a little.
Yes - on both sides. There was a noticeable drop in gen 0 GC though. Gen 2 was about the same and I'll have to pull the numbers off of the build server for Gen 1.
To run these locally, btw:
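The command itself did not survive in this transcript; the standard NBench.Runner invocation looks roughly like this (runner path and perf-assembly name are assumptions — adjust them to match your checkout):

```shell
# Hedged sketch: typical NBench.Runner command line for this kind of perf project.
packages\NBench.Runner\tools\NBench.Runner.exe DotNetty.Transport.Tests.Performance.dll output-directory="C:\PerfResults"
```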
Ah, that's a good point - NBench automatically destroys the test fixture and recreates it in-place exactly like XUnit does, so we're starting with a fresh pool on every run.
this.signal = new ManualResetEventSlimReadFinishedSignal(this.ResetEvent);

// reserve up to 10mb of 16kb buffers on both client and server; we're only sending about 700k worth of messages
this.serverBufferAllocator = new PooledByteBufferAllocator(16 * 1024, 10 * 1024 * 1024 / Environment.ProcessorCount);
with this config there are only 640 buffers to use (10 MiB / 16 KiB = 640). Considering the small message size, can you pls adjust to use 1 KB buffers or even 256 bytes?
Let's try 256
First results from the initial pooled byte buffer settings:

DotNetty.Transport.Tests.Performance.Sockets.TcpChannelPerfSpecs+TcpChannel_Duplex_Throughput
Measures how quickly and with how much GC overhead a TcpSocketChannel --> TcpServerSocketChannel connection can decode / encode realistic messages

System Info
NBench=NBench, Version=0.2.1.0, Culture=neutral, PublicKeyToken=null
OS=Microsoft Windows NT 6.2.9200.0
ProcessorCount=4
CLR=4.0.30319.42000, IsMono=False, MaxGcGeneration=2
WorkerThreads=32767, IOThreads=4

NBench Settings
RunMode=Iterations, TestMode=Measurement
NumberOfIterations=13, MaximumRunTime=00:00:01

Data Totals
Per-second Totals
Definitely a drop in throughput, but also a drop in GC. @nayato what's the recommended way to warm up a pool allocator? We should shave off that overhead if possible.
you'd need to iterate through the event loops on the event loop group and execute "take X buffers, then release them" on each of the event loops.
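A hedged sketch of that warm-up idea, assuming the group's loops can be enumerated somehow (the `GetEventLoops` helper and the batch size of 100 are hypothetical; `allocator` is the pooled allocator under test):

```csharp
// Sketch only: pre-touch the pooled allocator on every event loop so its internal
// caches are populated before measurement begins. GetEventLoops is a hypothetical
// helper for enumerating the loops in the group.
foreach (IEventLoop loop in GetEventLoops(this.WorkerGroup))
{
    loop.Execute(() =>
    {
        var taken = new List<IByteBuffer>();
        for (int i = 0; i < 100; i++)
            taken.Add(allocator.Buffer(256)); // take X buffers...
        foreach (IByteBuffer buf in taken)
            buf.Release();                    // ...then release them
    });
}
```

Running the take/release cycle on the loop itself matters because the pool's thread-local caches belong to the thread that allocates, which is why a single warm-up pass from the test thread would not help.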
We'll need to play around with this some more - it seems to be topping out around 45k messages per second no matter what I do today. TBH, I suspect part of the issue here is that we're using auto-scaling on Azure to provision build agents and there's no guarantee that we're going to get identical underlying hardware beneath the VM each time. It could very well be that the VM we used last night is running on newer hardware than the one we're running on right now. Ran a test locally comparing the pooled byte buffer allocator against the direct one again and the pooled one is noticeably more performant even without warmup.
again, lack of batching when sending is the biggest issue here I think. I'll have cycles to address it once I'm through with the new byte buffer pool.
just use Dv2 machines 👍
@Aaronontheweb, something failed on the build. Worth merging anyhow. Pls break apart IReadFinishedSignal.cs (one file per type) and add the license header to all the new .cs files and it should be good to go.
@nayato yeah, NBench exceeded 15 minutes of runtime so FAKE timed it out :p - I'm increasing the FAKE timeout and decreasing the run interval for NBench, since I'm breaking the benchmark out into 3 parts with different flush intervals.
(force-pushed d4f582a to 769e81c)
Looks like we're using standard D12s. I'll change to using a Dv2.
(force-pushed 769e81c to ea678d6)
Ok, added all copyright headers and applied the last formatting changes.
(force-pushed ea678d6 to 31864c6)
Removed the additional 10-messages-per-flush and 1000-messages-per-flush options - they would have taken like 45 minutes to run.
Added NBench support to the build scripts, a sample NBench project, and a port of a benchmark I wrote for Helios that measures duplex end-to-end performance using a realistic encoding pipeline.