GC #21
You are right. Some type of buffer pooling will improve GC time and also avoid potential managed heap fragmentation. It becomes more important if the library is used to build listener- or broker-type applications. Overall I'd like to minimize dependencies so that the library works on all platforms without too much platform-specific code. At the same time, this rule should not get in the way of useful features that are important for common scenarios. One way to achieve both is to have extension points where users can plug in their own implementations as appropriate. A custom buffer manager may be one good example of this. I will think about how this can be done. Any suggestions/ideas are very welcome. Thanks.
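To make the extension-point idea concrete, here is a minimal sketch of what a pluggable pool could look like. The `IBufferPool`/`SimpleBufferPool` names and the power-of-two bucketing are my own illustration, not the library's actual API:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical extension point: a buffer manager hands out pooled byte
// arrays and takes them back when the caller is done with them.
public interface IBufferPool
{
    ArraySegment<byte> TakeBuffer(int size);
    void ReturnBuffer(ArraySegment<byte> buffer);
}

// A minimal pool: one lock-free bucket per power-of-two size class.
public class SimpleBufferPool : IBufferPool
{
    readonly ConcurrentDictionary<int, ConcurrentBag<byte[]>> buckets =
        new ConcurrentDictionary<int, ConcurrentBag<byte[]>>();

    static int RoundUp(int size)
    {
        // Smallest power of two >= size, so returns can be re-bucketed
        // by array length alone.
        int s = 1;
        while (s < size) s <<= 1;
        return s;
    }

    public ArraySegment<byte> TakeBuffer(int size)
    {
        int s = RoundUp(size);
        ConcurrentBag<byte[]> bag = buckets.GetOrAdd(s, _ => new ConcurrentBag<byte[]>());
        byte[] array;
        if (!bag.TryTake(out array)) array = new byte[s];  // allocate only on pool miss
        return new ArraySegment<byte>(array, 0, size);
    }

    public void ReturnBuffer(ArraySegment<byte> buffer)
    {
        buckets.GetOrAdd(buffer.Array.Length, _ => new ConcurrentBag<byte[]>())
               .Add(buffer.Array);
    }
}
```

A real implementation would also cap the total pooled bytes so the pool cannot grow without bound; this sketch omits that for brevity.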
We use the listener portion of the library. We are sending large messages (1-2MB) and have increased the max frame size to 5MB based on the discussion in issue #30. To isolate Amqp.Net Lite I have commented out all of our code in IMessageProcessor.Process. Depending on the number of clients and how frequently they send messages, we are seeing the '% Time in GC' range from 10 to 40%. We would also like to minimize dependencies and would prefer not to introduce a dependency on System.ServiceModel.dll by using BufferManager. My preference is to implement buffer pooling within the library so it can be used by default; I imagine that would work fine for most use cases. If it needs to be extended, a feature can be added to allow someone to inject their own buffer pool.
I can donate a buffer manager (3-clause BSD licensed) to avoid rewriting one. On Thu, Oct 1, 2015 at 7:19 PM, Rob notifications@github.com wrote:
Studying for the Turing test
@gregoryyoung I don't know if we can take BSD-licensed code. Probably not. Writing a buffer manager is not difficult, but in order to use pooled buffers, the library and the user code need to work together to ensure a buffer is appropriately reclaimed. Specifically, the library checks out a buffer and the user needs to tell the library that the message is done so the buffer can be returned. @rr118 what is the body type of your messages (Data, or AmqpValue of byte[])? For a byte[] payload, the library currently allocates the array twice (once for the frame buffer received from the socket and once for the user payload decoded from the message body). With SSL there are even more allocations inside the framework code. I am making the following changes and we can check how much they help GC.
Thanks.
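The checkout/return hand-off described above could be sketched with a reference count, so a buffer goes back to the pool only once both the library and the user have released it. `PooledBuffer` and its members are hypothetical names for illustration, not the library's actual types:

```csharp
using System;
using System.Threading;

// Illustrative sketch of the ownership hand-off: the library takes a
// pooled buffer to receive a frame, the user signals that the message is
// done, and the buffer is returned only when neither side still needs it.
public class PooledBuffer
{
    int refCount = 1;                      // creator holds the first reference
    readonly Action<byte[]> returnToPool;  // called exactly once, on last release

    public byte[] Array { get; }

    public PooledBuffer(byte[] array, Action<byte[]> returnToPool)
    {
        Array = array;
        this.returnToPool = returnToPool;
    }

    // Called when the buffer is handed to another owner (e.g. the user).
    public void AddReference() => Interlocked.Increment(ref refCount);

    // Called by each owner when it is done (e.g. user completes the message,
    // library finishes decoding the frame).
    public void Release()
    {
        if (Interlocked.Decrement(ref refCount) == 0)
            returnToPool(Array);
    }
}
```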
I can make it MIT if that's easier (I am the license owner). On Fri, Oct 2, 2015 at 4:59 PM, Xin Chen notifications@github.com wrote:
Studying for the Turing test
@xinchen10 Our body type is AmqpValue. My tests don't use SSL yet, but we will be adding it soon. I haven't had time to determine whether it is appropriate for this issue, but I found a potentially promising Microsoft-supported library (originally from Bing) for reusing streams that I thought you might be interested in. I'm going to take a deeper look in the next few days. https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream @gregoryyoung Thanks. I'll take a look at this option. We are more comfortable with MIT licenses too.
@xinchen10 If you want me to test the buffer pooling to see how it changes the GC usage, just let me know. I can test whenever you are ready.
@rr118 That would be great. I need some time to make sure the ownership of the buffers is managed correctly during the lifetime of the message, but I will let you know when it is ready for testing.
I enabled buffer pooling on the receiving code path (including container host and listener). The code is published in the buffer-manager branch. I also included a simple buffer manager implementation as an example (Features.BufferManager).
You can call |
Thanks! I'll check it out today and let you know the results.
@xinchen10 My apologies for the slow response. I had another issue come up that I had to prioritize. I've done some initial testing and see moderate improvements. For my initial test I focused on having the server call MessageContext.Complete() without any processing. In my test I have 32 clients (spread across many VMs) sending a message body with a List as fast as possible. The typical bufferSize in IBufferManager I see being requested on the server is 1.4MB. The max frame size is set to 5MB. I am not using SSL. For this test I used your example buffer manager with a minimum size of 128B, a maximum buffer size of 2MB and a total size of 128MB. I confirmed all buffer requests were able to reuse an array from the pool.
With the buffer manager I am seeing 25% time in GC, 240MB allocated per second and 20% CPU usage. Without the buffer manager I am seeing 35% time in GC, 447MB allocated per second and 30% CPU usage. So some decent improvements there. The total receive rate on the server is about 500 Mbps. I still need to dig in further to understand where the 25% is coming from and plan to do that tomorrow. At a very quick glance with PerfView it appeared to come mainly from strings. My hunch is that this is from the StreamData.StreamID property.

Now that I understand the performance of transporting the message to the server, I am adding deserialization to the test. I don't quite understand the suggestion of messageContext.Message.GetBody() to avoid a payload copy, because I eventually have to get the data into the form of a List on the server. Right now I use messageContext.Message.GetBody<List>(). I need to dig into the source more to get a better handle on your suggestion.

At a very initial look, here is what I see for the same test when I add deserialization. With the buffer manager I am seeing 40% time in GC, 345MB allocated per second and 38% CPU usage. Without the buffer manager I am seeing 40% time in GC (not sure why this did not increase), 450MB allocated per second and 45% CPU usage. The total receive rate on the server is about 350 Mbps.

From here my plan is to dig in more, understand where the 40% GC is coming from, and determine whether there are any ways to get it down further. Any suggestions are more than welcome. Admittedly I know this is a large load, but I need to understand the limitations and do what I can to make the application efficient to decrease the number of instances required. So far the buffer manager appears to have had a positive impact on GC usage in terms of allocations and overall processor usage. I think it will be a nice feature addition. More to come on my findings.
The GC time and memory allocation are due to the AmqpContract object deserialization. The serializer has to create the object and allocate a byte[] to copy the bytes from the transport buffer (which is from the buffer manager). I haven't enabled the buffer manager for the serializer, but even if I did, it wouldn't help because of the byte[] data member (memory has to be allocated and copied). One quick solution is to do your own serialization: from your clients, send messages with a byte array body.
Another solution is to send the StreamId as a message property and Data as the body.
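The "do your own serialization" suggestion could be sketched like this: pack the stream id and the raw event bytes into a single byte[] so the message body can be plain binary that the serializer never has to decode. `StreamFrame` and its layout are illustrative assumptions, not part of the library:

```csharp
using System;
using System.IO;
using System.Text;

// Sketch of hand-rolled serialization for a (streamId, payload) pair.
// The wire layout here is arbitrary: a length-prefixed UTF-8 string
// followed by a length-prefixed payload.
public static class StreamFrame
{
    public static byte[] Pack(string streamId, byte[] payload)
    {
        using var ms = new MemoryStream();
        using var w = new BinaryWriter(ms, Encoding.UTF8);
        w.Write(streamId);        // BinaryWriter length-prefixes the string
        w.Write(payload.Length);
        w.Write(payload);
        w.Flush();
        return ms.ToArray();
    }

    public static (string StreamId, byte[] Payload) Unpack(byte[] frame)
    {
        using var r = new BinaryReader(new MemoryStream(frame), Encoding.UTF8);
        string id = r.ReadString();
        int len = r.ReadInt32();
        return (id, r.ReadBytes(len));
    }
}
```

The resulting byte[] could then be sent as the message body, keeping the AMQP serializer out of the hot path entirely.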
Thanks for the ideas. Originally I did think of putting the stream ID in the header. However, the client has many streams, each with a relatively low update frequency. Combining the data from all the streams adds up to a lot of data though, which is why I am batching data from many streams into a single list. Another option I considered is using a map where the key is the stream ID and the value is a List<byte[]> of the events for each stream. This would minimize the streamID strings to be processed and sent over the wire.

I like the idea of the custom serialization and just sending the byte array. In fact that fits relatively well with the rest of the application and would not be hard to implement. I really like the type system in AMQP for making a nice client API, but it may have to be sacrificed in the name of performance.

Here are the new test results now that I understand what you were getting at with accessing the ByteBuffer directly. With 32 clients and the buffer manager, % time in GC is < 1% and allocation is 5MB/s; the total received data on the broker/server is 1.6 Gbps. Without the buffer manager, time in GC is 30% and allocation is 200MB/s. This is a huge improvement!

I'd like to suggest you make the example buffer implementation part of the library. People could use it by setting the BufferManager property to BufferManager.Default(int min, int max, int maxTotal), similar to how you can set the SASL profile. Your buffer manager implementation should work out of the box the majority of the time, and this way people don't have to rewrite their own implementation every time, but can if they want to. You could leave BufferManager set to null by default so the library doesn't use more memory than expected out of the box.

For reference, here is my client and broker/server code. I commented out the line where I set the buffer manager to test without one. Server:
Client:
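The map-keyed-by-stream-ID batching considered above might look like the following sketch, where each stream id string is stored once per batch no matter how many events that stream contributes (`EventBatch` is an illustrative name, not part of either codebase):

```csharp
using System.Collections.Generic;

// Sketch of batching by stream: instead of repeating the stream id on
// every event, group events into a map keyed by stream id so each id
// string is processed and encoded once per batch.
public class EventBatch
{
    readonly Dictionary<string, List<byte[]>> events =
        new Dictionary<string, List<byte[]>>();

    public void Add(string streamId, byte[] payload)
    {
        List<byte[]> list;
        if (!events.TryGetValue(streamId, out list))
            events[streamId] = list = new List<byte[]>();
        list.Add(payload);
    }

    public IReadOnlyDictionary<string, List<byte[]>> Events => events;
}
```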
👍 On Fri, Oct 23, 2015 at 6:25 PM, Xin Chen notifications@github.com wrote:
Studying for the Turing test
I think this line could benefit from reusing memory as opposed to allocating a new buffer each time: https://github.com/Azure/amqpnetlite/blob/master/src/Net/AsyncPump.cs#L59. I can send a PR for this and could use one of the buffer managers MS includes (https://msdn.microsoft.com/en-us/library/ms405814.aspx), but that is not available on all of the supported platforms. I could also include what we use (https://github.com/EventStore/EventStore/blob/release-v3.2.0/src/EventStore.BufferManagement/BufferManager.cs), but I am not sure the PR would be accepted at that point since I would be including code copied from a BSD project.
Thoughts?
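As a sketch of the reuse idea for the receive pump, the loop could check one buffer out per connection and hold it across reads, returning it only when the pump stops. `PumpSketch` and its delegate parameters are my own illustration, not the AsyncPump API:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Sketch of a receive pump that rents its read buffer once instead of
// allocating per read. The take/give delegates stand in for whatever
// TakeBuffer/ReturnBuffer pair the pool ends up exposing.
public static class PumpSketch
{
    public static async Task<long> DrainAsync(
        Stream source, Func<int, byte[]> take, Action<byte[]> give)
    {
        byte[] buffer = take(64 * 1024);   // checked out once, not per read
        long total = 0;
        try
        {
            int n;
            while ((n = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
                total += n;                // frame decoding would happen here
            return total;
        }
        finally
        {
            give(buffer);                  // returned when the pump stops
        }
    }
}
```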