-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Outsourcing of network layer to 3rd party libraries #1372
Comments
The current network layer is battle tested, and I think, it haven't changed for the last year. So instead of throwing it away - let's just keep it among the others libs to come, so users could decide which lib suits their needs. And it can be thrown away at any time, if it's supporting cost will become too much. Also such choice will remove all concerns from the list. |
About opening up the Orleans messaging protocol - something worth looking at is https://github.com/apache/thrift. Maybe we can provide both protocols at the same time - current, heavily optimized, Orleans's internal, and thrift for external apps. |
Of course I totally agree with @wanderq - we would want to refactor the existing messaging protocol implementation to fit into the new "pluggable" layer. We will probably also keep it as the default one at least for some time, until any of the other ones proves better. Sounds like there won't be any objections to the above. It is obvious that this is a good idea and will benefit everyone. The only downsides I see are: What else will we be discussing in the virtual meetup? |
+1 to what @wanderq and @gabikliot said. I also would hate to loose the current "internal" messaging functionality which has proved itself to work well over several years worth of production usage. But some refactoring to a "pluggable" transport layer would be useful to allow greater extensibility for specific environments. Not least of those extensibility scenarios would be to allow someone to implement layered "secure transport" functionality -- say tunneling existing TCP socket messaging over TLS mutual auth network connection [for example #828]. So overriding of either/both "connection establishment" and/or "message transmission" parts of the messaging layer are likely useful extensibility points for the long term. One interesting example of true "messaging extensibility" would be to allow plug-in of native code / non-C# messaging implementations, such as MSRA's rDSN. One additional thing to bear in mind is there may be different extensibility options for (1) "gateway" (external facing) and (2) "silo-to-silo" (internal) communications. +1 to further discussions. |
We'll have a meetup to discuss the current implementation Thursday February 11th, 12:00-13:00 PST / 20:00-21:00 GMTThis is an online meeting for Skype for Business, the professional meetings and communications app formerly known as Lync. Conference ID: 215813217 |
Side note: for those who doesn't have SkypeForBiz installed yet - plugin is required to join and the download size is ~8Mb ( actually it's a full "Skype for Business" but it allows you to join anonymously) |
Sorry, but I can;t make it at this time. I can either before 13:00, or after 14:00, or Friday at 13:00. |
@gabikliot We definitely need you there I think. How about we move by an hour to 12:00-13:00? |
Yes, that would work well for me. |
Ok, moving to one hour earlier then. I'll update the comment above to minimize the potential confusion. |
And I can just call in via regular phone, right? I don't have to install Skype for business? |
Yes, you can simply dial the phone number and punch in the conference ID. Obviously, you'll miss all the visuals. |
So why not use hangouts, like we usually do? |
This is not a normal meetup. We are doing it out of band for people that are interested in this specific topic. We want to try Lync this time instead of Hangouts because of the limits it imposes and issues we've been having with hosting and joining them: two different links, inability to stop/resume, no phone access, camera issues, etc. |
I can't likely make to it, but the following are a few of my personal observations compiled here haphazardly, in case they provide additional ideas. Performance, latency and stabilityDo we have figures how much resources socket code (also kernel resources) makes use of in production loads on current implementations? Any really fast and stable implementation:
The tuning instructions vendor have look like the Mellanox one Performance Tuning Guidelines
The Technet page about RIO mentions
This is something I've found difficult to verify, but if true (could someone ask from sinde MS?), it looks like RIO can take a better advantange of the VT-x (I/O) virtualization extensions to provide a more stable and performant environment on Hyper-V. It looks to me any future, high-performance socket stack should make use of RIO and the abstraction in Orleans should be planned so as to make operating efficiently with it. Maybe even so as to lean towards a 3rd party implementation that is the likeliest to implement it. Testing and tuningIt would be great if Microsoft could provide the testing ground for setups that make use of NIC teaming when and if someone with bandwidth comes along that can provide the scripts to configure actual virtual machines. SecurityLikely a given in this world. A support for TLS is needed. This likely involves also an idea how to provide -- and store -- certificates and hence how to abstract that. Oh, I haven't programmed sockets for a long time. I haven't used RIO, so take with a large pinch of salt. :) |
Summary of the meetup: @jason-bragg presented detailed walk through Orleans messaging stack. Suggested plan:
|
Item 3 say we should focus on client->silo. I dont think so... We cant't have (mutual-)TLS silo-to-silo with current implementation. (Something @ReubenBond questioned me a while ago) RIO is not xplat and has nothing to do with TLS actualy. It is a very low level I/O layer tha some xplat socket library should use if there is the case. I agree with Gabi's strategy except that we should replace the current regular .Net socket library with something xplat and more specific for the job. This must be made in a configurable way so we can enable/disable TLS for example and other features as suggested by @jdom. I think we shoukd create a new issue with checkboxes and tasks + a milestone so we can work on that just like we did with .Net Core migration. |
I went through the recording. Maybe in order to clarify here about RIO is that I used it as a vehicle to expose and API that is different than one's regular socket and it takes into account concepts that I believe are important on stability and performance. In that way it looks like a good API surface to study when thinking what higher level abstractions to take. In other words, that part was not about TLS or even cross-platform more than to point out that it looks plausible that whatever cross-platform library implements it on Windows stands to gain a lot. The high-performane part will be something else on Linux, FreeBSD or Mac. Unfortunately I'm not actively following this space to state well-defined observation, but it looks to me <edit: Further interesting resources on user-space networking Cloudflare: Single RX queue kernel bypass in Netmap for high packet rate networking. <edit 2: Some Linux specific resources Monitoring and Tuning the Linux Networking Stack: Receiving Data. |
For posterity, here's the recording. |
We started a discussion with @nayato about potentially replacing the Orleans messaging stack with DotNetty. So far looks very promising. @jason-bragg is going to try to do that in the next couple of weeks. |
While I understand why your looking at DotNetty I'd prefer it if the team looked at libuv at the work already done by the ASP.NET core team with regards Kestrel and performance. It also makes sense to use/extend the libuv wrapper/library created to allow .net core to send/recive udp datagrams in a platform agnostic maner, rather than push network communication onto another different 3rd party library |
Still ramping up on DotNetty. The examples are clear but not very advanced. Anyone know of any open source projects using DotNetty (or Netty if necessary) that I can check out to see more advanced usages? |
The Akka .Net project is using DotNetty as a networking layer. |
@luomai It looks like Akka is moving to Aeron (via @dVakulen), so perhaps Akka.NET will move to its .NET equivalent if/when it is done. |
@luomai, I don't see the comment here, but I thought I read something you wrote about networking with high performance streams. We've done a fair amount of work on this, but something you may want to be mindful of is that, at it's heart, Orleans is an RPC based system. The networking models for such a system will -never- be as fast as direct dedicated high throughput architectures. Part of the streaming work we've done has considered using dedicated channels between silos for stream data. We've abstained from this approach for a number of reasons, but if the highest throughput is critical to the success of your project, you may want to implement a stream provider that uses dedicated network connections for streams and dispatch the events as immutable data in grain calls. This would avoid all unnecessary data copies and avoid messaging overhead of the RPC model. |
@jason-bragg Thank you very much for the detailed reply. I deleted my comments as I felt that the performance numbers that I used is not really an apple-to-apple comparison yet. So it might be unfair to the current networking layer design. :) Also, thanks a lot for the suggestion of using a throughput-optimized stream provider. It sounds like a pretty reasonable option for us. I will definitely look into it. |
@jason-bragg Following your suggestion, we have been playing with stream providers and move from the RPC based communication (only the throughput-intensive part) to a CPS-like model based on Orleans streams. This moment we are using the default SMS stream provider with a FireAndForget option as well as immutable data objects. We are also wondering if there is an easy way to configure (or easily extend) the SMS stream to use dedicated network connections instead of writing our own stream implementation which may require touching a lot of internal components in the runtime. We are a bit worrying about it seems that using streams that wrap dedicated network connections may break the virtual stream (or actor) interface in the Orleans, right? Can you share some thoughts? Thanks a lot! :) |
The qualification (in bold) is important, because writing a stream provider that uses dedicated connections is non-trivial. I don't see a straight forward way to modify the SMS stream provider to support this either. The prototyping I worked on for this used a networking layer that supported different 'channels' on a connection. I had one dedicated connection to each silo (modeled off of the existing Orleans networking) and a channel per stream. Where this became a problem was that this prototype did not support features like independent delivery of stream data per grain, guaranteed delivery, or recovery. In general, it violated some of the behaviors we'd established for our streaming infrastructure. It’s viable for a more specialized system and is necessary for the best performance, but it means giving up many of the advantages of the RPC model. It really all comes down to requirements, and I don’t know the requirements of your system. If the service has many high throughput streams that need processed immediately, where some data loss (due to transient errors) is acceptable, piping the events into dedicated connections will give you the fastest service, at the cost of development time. If performance requirements allow, optimizing the SMS stream provider may be a better approach. In addition to gains from micro-optimizations, bulking and batched delivery can be added, reducing the cost of the RPC model on high throughput streams. This, however, would not help much if there were large numbers of sparse streams. If the service requires reliability and high throughput, but response time is less important, a persistent stream provider can be used that stores the events in a backend queue and processes them as fast as the cluster can consume them. Can you provide some details regarding the performance requirements of the services streaming components?
|
Hi Jason, thank you very much for the detailed reply. It helps us a lot to clarify the design of our system in using Orleans in a proper way. We are prototyping the system right now and haven't had very clear numbers for the concurrent number of streams. To roughly estimate it: the target deployment cluster has 100 - 1000s servers and each server can have 100s grain instances (we are relying on the default grain deployment strategy this moment). A grain's ingress and egress streams are around the level of 10s. In total, this can sum up to millions of streams in a cluster. We can control the topology of the grains so that the numbers of consumers and producers can be adjusted to resolve single-node bottlenecks. The stream is expected to last for a long time because we aim for supporting continuous stream queries. We have attempted to address the data loss problem in the grain level by implementing a combination of techniques that include message acknowledgement, re-transmission, state checkpoint, and back pressure. Like you suggested, we actually have done quite a lot of work to improve the grain performance. A major part is to mitigate the networking layer's workload, including event batching and lazy-deserialization (i.e., all deserialization are pushed into the grain level using pre-registered types parsing over unsafe byte buffers). After doing this, the messages contain basically raw byte arrays that can be passed over network in a cheap manner. Regarding performance, in a single-server benchmark, 6 virtual CPU cores can read 10 - 20M events (20 to 32 bytes each) from a network and compute results in real-time. They are doing complex stream processing logic including stream aggregation and join. The benchmark generates around 1.5-3.2 Gbps network throughput. The current bottleneck is again the automatically generated Orleans deserializer which are heavily triggered between the Orleans clients and servers, where [Immutable] cannot help. It would be great to see if this number can be further improved when the new networking layer (DotNetty) comes into play. Thanks again for your detailed reply and suggestions. BTW: I noticed that in the Orleans roadmap. There is an opened to-do task of implementing declarative query processing using Orleans. This is similar to what we are doing here right now. Once our codebase become established and well tested. We will try to look at if we can port our code to support the example declarative query in the documentation, and eventually contribute back to the repository. |
This is great to hear. To my knowledge, this is the first time the SMS stream provider has been used in such an ambitious project under such strict performance requirements. Please let us know of any bottlenecks you encounter, as this is a great opportunity to further tune these systems.
The optimizations you've described, transmitting batches of raw binary data, is likely the best one can do. If these batches are large, you may be able to get some additional performance tuning some of the networking knobs, like the default buffer size (BufferPoolBufferSize). EDIT: I missed the "client to server" part, I was thinking grain to grain. The below is not helpful for client to server. Something worth considering is the use of stateless workers as stream consumers. This is not usually suggested, and I'm not actually suggesting it now, as much as raising it as something to consider if the performance of the current architecture is insufficient. Grains that are marked as stateless workers have some very interesting characteristics. First, they are always local to the caller, so no network copy is necessary. Since they are always local grains, they are not registered in the distributed grain directory, making them more resilient to silo outages and cheaper to activate/deactivate. Despite the name, their is nothing strictly prohibiting them from being stateful, they are merely optimized for grains that have no state. Stateless workers, and their performance benefits, can be used for stateful grains given the following conditions.
If your stream topology is such that stream producers and consumers can be limited to being on the same silo, using stateless workers for consumer grains may be a viable optimization. As an optimization, this is not a good general solution as it is viable for a rather narrow set of stream topologies, but for those situations it can make a significant difference. |
This question has been brought up more than once - should we 'outsource' the low level networking/socket layer of the Orleans stack to a 3rd party library instead of maintaining the code that deals with socket connections, sending/receiving messages, and buffer pool management within the Orleans codebase.
Potential benefits:
Concerns:
Another somewhat related question that's been discussed is about opening up the Orleans messaging protocol, so that requests to grains could be sent without the need for OrleansClient and from different software stacks and languages.
The first step is to have a virtual meeting to go over the existing implementation of the networking/messaging layer to get interested people more familiar with the status quo and to answer any questions about it.
This issue is intended for accumulating initial questions and ideas in preparation for the meeting. Please comment to add or correct.
The text was updated successfully, but these errors were encountered: