Become an official dotnet repo? #466
Rx is an essential .NET technology that should get more attention and better care. There haven't been any changes in this repo for months; the last stable release (3.1.1) was more than a year ago (Nov 2016), and even the last pre-release (4.0 Preview 1) was about 9 months ago (May 2017)...
I think the .NET version of Rx should be moved under the official dotnet GitHub page, where it would get more recognition, more attention, and probably better care from the community.
On a side note, there should also be a proper documentation available on the official Microsoft Docs website.
I don't know, it's definitely part of the .NET Foundation, but that doesn't mean anything.
I was also wondering if by any chance the reason the development of Rx/Ix pretty much stopped is because of the upcoming Async Streams? That would be a perfect new home for Ix and many of the tricks that we are doing with Rx nowadays could be achieved with async enumerables as well with the same level of elegance.
But it would be nice to have some comment from the owners.
First, some context. Some 5 years ago, shortly after shipping Rx 2.x, I moved on to co-found an internal project called "Reactor" to build "Reactive-as-a-Service" to drive many 1st party workloads here at Microsoft. Today, we power various products including Cortana, some Bing and MSN workloads, various Office 365 experiences, etc.
"Reactor" pretty much became the de facto place within Microsoft for development of Rx, albeit in a different shape and form to meet the requirements for a high-density event processing service. Such deltas include checkpoint support to persist and recover query operator state, use of expression trees to ship computations across nodes and devices, interoperability with reliable messaging layers using sequence IDs to provide replay capability, different scheduler techniques, etc.
Shipping this work back to the Rx repo over here has proven to be challenging. Building an event processing service brings with it many aspects that have some conflicts with a library-first framework-centric world view, especially if compatibility with existing Rx implementations has to be maintained (in terms of interfaces, the set of operators, etc.). Some of these challenges include weeding out some dependencies, addressing some design aspects, but also more pedestrian issues around actively maintaining an OSS project while meeting 1st party goals and requirements.
This said, some small bits and pieces of Reactor have made it over here, including some optimizations to disposables, the async Rx prototype which is a library-centric makeover of some of Reactor's core assets, and some work on evolving
As one of the original Rx team members, I firmly believe that the future of Rx lies in (distributed) event processing, especially with many other aspects of asynchronous programming having been addressed by complementary technologies. For example, single-value asynchrony is best dealt with using
What's left is multi-value push-based data sequence processing, which is where Rx shines. This world itself is split in a synchronous and asynchronous variant, where the former is "classic Rx" and the dual to
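The sync/async duality alluded to here can be sketched with a few minimal interfaces. This is purely illustrative (the `My` prefixes mark these as stand-ins, not the shipped Rx or async Rx prototype types):

```csharp
using System;
using System.Collections.Generic;

// Classic (synchronous) Rx shape: the push-based dual of IEnumerable<T>.
public interface IMyObserver<in T>
{
    void OnNext(T value);
    void OnError(Exception error);
    void OnCompleted();
}

public interface IMyObservable<out T>
{
    IDisposable Subscribe(IMyObserver<T> observer);
}

// A trivial cold observable pushing a range of integers synchronously.
public sealed class RangeObservable : IMyObservable<int>
{
    private readonly int _start, _count;
    public RangeObservable(int start, int count) { _start = start; _count = count; }

    public IDisposable Subscribe(IMyObserver<int> observer)
    {
        for (int i = 0; i < _count; i++) observer.OnNext(_start + i);
        observer.OnCompleted();
        return new Nop();
    }

    private sealed class Nop : IDisposable { public void Dispose() { } }
}

public sealed class CollectingObserver : IMyObserver<int>
{
    public List<string> Log { get; } = new List<string>();
    public void OnNext(int value) => Log.Add(value.ToString());
    public void OnError(Exception error) => Log.Add("error");
    public void OnCompleted() => Log.Add("done");
}

public static class Program
{
    public static void Main()
    {
        var observer = new CollectingObserver();
        new RangeObservable(1, 3).Subscribe(observer);
        Console.WriteLine(string.Join(" ", observer.Log)); // 1 2 3 done
    }
}
```

The async variant keeps the same overall shape but makes each notification awaitable (e.g. `OnNextAsync` returning a task), which is exactly what makes it line up with async streams.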
So where does this leave us now? In the very near term, we can expect to see a burst of activity around
In the medium term, we're evaluating where to go next with "Reactor" besides continuing to support and grow our 1st party workloads. Deep inside the Reactor technology stack sit many pieces of functionality that can be more widely useful in the shape of libraries, ranging from useful .NET BCL extensions, through advanced expression tree technologies, to our variant of Rx. Coming the other way, one could see us ship parts of Reactor as cloud services that open up to 3rd parties. The story could be either, or it could be both.
Over the next couple of weeks, a number of conversations are taking place on our end to figure out next steps. The immediate term and most relevant one is at the MVP summit in Redmond next week, where we have set up a "Meet the Reactor Team" event to interact with Rx community members that are attending the summit. This discussion thread will definitely be on the agenda, and we'll post a summary here afterwards. Further down on the agenda is a meetup with the .NET folks to check in on alignment with Ix-Async and to discuss where we see Rx land in the bigger .NET picture.
In the meantime, it'd be great to hear from you. What would you like to see happen in terms of the evolution of Rx? Any feedback is welcome: existing feature gaps, platform support, contribution model, future technology directions, etc. In addition, I can free up some time to process PRs to the repo over here.
@petroemil, that's right. During the very early days of Reactor, we had various interactions between the teams to come up with a set of async Rx interfaces, and we landed the one in Orleans to eventually align with the interfaces in Reactor and Rx vNext.
As design went on in the Reactor space, we also "discovered" that Rx really is all about a family of interfaces, each dealing with an orthogonal requirement. The overall shape tends to be the same across the board, but details differ in each such parallel universe:
The first three bullet points are more or less well-understood, but they have some caveats:
These require some design and some awaiting of the outcome of async streams in .NET and C#. This project has acted as input to this design process, because the combination of
On to reliability, the need for sequence identifiers is clear in order to support replay of events in the case of a failover. During the early days where we worked with Orleans, this was already understood, but we ended up with a vector in the design space that's not a base vector in the orthogonal basis. The sequence identifier ended up on
In parallel, in the Reactor world, we also found out that we need more phasing for the lifecycle of hot artifacts (subscriptions, subjects, and observers in Rx parlance). The need for this arises from requiring a separation between instantiating computation graphs (typically compositions of operators in Rx), restoring their state, and kicking off the computation. In Rx, all of these phases are clobbered together in
All of this led to a re-invention of the Rx interfaces with more phasing, so all of the orthogonal concerns described above really take place in a new world where
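To make the phasing concrete, here is a hypothetical sketch (all names are illustrative, not the Reactor API) of a stateful operator whose instantiation, state recovery, and start are separate steps, where classic Rx would do all of this inside a single Subscribe call:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical phased lifecycle: a checkpointing engine needs to instantiate
// a computation, restore its persisted state, and only then start the flow of
// events - three phases that classic Rx folds into Subscribe.
public sealed class RunningSum
{
    private long _sum;
    private bool _started;

    // Phase 2: recover operator state from a persisted checkpoint.
    public void Restore(IReadOnlyDictionary<string, long> checkpoint) =>
        _sum = checkpoint.TryGetValue("sum", out var s) ? s : 0;

    // Phase 3: only now may events start flowing.
    public void Start() => _started = true;

    public void OnNext(long value)
    {
        if (!_started) throw new InvalidOperationException("not started yet");
        _sum += value;
    }

    // Persist operator state so a failover can resume from here.
    public IReadOnlyDictionary<string, long> Checkpoint() =>
        new Dictionary<string, long> { ["sum"] = _sum };
}

public static class Program
{
    public static void Main()
    {
        var op = new RunningSum();                                  // phase 1: instantiate
        op.Restore(new Dictionary<string, long> { ["sum"] = 10 });  // phase 2: recover
        op.Start();                                                 // phase 3: run
        op.OnNext(1); op.OnNext(2); op.OnNext(3);
        Console.WriteLine(op.Checkpoint()["sum"]); // 16
    }
}
```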
Other orthogonal concerns include the use of reified object representations, which is often a necessity in the context of a distributed event processing system, where each event has to be able to be persisted, so
A final orthogonal concern is the use of extrinsic identifiers to refer to reactive artifacts, rather than language/runtime intrinsic object references. This led to a universe of Rx interfaces that allow users to associate identifiers with artifacts, which is useful to later retrieve proxies to these artifacts (e.g. to dispose a subscription that was created a while back). This, too, ended up with some collaboration with Orleans, to land on the
Now, take a step back. We have a hyper-space of N orthogonal concerns, each applying to the Rx interfaces (and, in fact, to the dual enumerable interfaces as well, e.g. an "enumeration" can be identified using an extrinsic identifier, may be described using an expression tree, may be asynchronous, may have a sequence identifier to repeat enumeration from a certain position, and may support
The main trick in Reactor is to accept such a hyperspace of interfaces but make various layers of the system handle one aspect of these by introducing it or shaking it off, typically one at a time. For example, async is introduced or retracted on I/O boundaries; replay out of an in-memory buffer in some compute node takes away the async part. Identifiers may need to be specified by a caller, but end up being shaken off at a service boundary where logical name resolution can resolve these to in-memory objects to interact with. Expression trees may end up being persisted, associated with some identifier, allowing the reactive artifact instance to be re-instantiated whenever needed (in response to receiving an event or when recovering a node after failover). And so on.
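The "introduce an aspect at one boundary, shake it off at another" idea can be illustrated with a toy edge adapter for sequence IDs (again, purely a conceptual sketch under assumed names, not Reactor code):

```csharp
using System;
using System.Collections.Generic;

// Conceptual edge adapter: sequence IDs are introduced where events leave a
// node and shaken off again before re-entering the core engine, so the core
// stays a plain Rx-style implementation with no knowledge of replay.
public readonly struct Sequenced<T>
{
    public Sequenced(long id, T value) { Id = id; Value = value; }
    public long Id { get; }
    public T Value { get; }
}

public sealed class ReplayableEdge<T>
{
    private readonly List<Sequenced<T>> _buffer = new List<Sequenced<T>>();
    private long _next;

    // Producing side: attach a sequence ID and retain the event for replay.
    public Sequenced<T> Publish(T value)
    {
        var e = new Sequenced<T>(_next++, value);
        _buffer.Add(e);
        return e;
    }

    // Consuming side after failover: replay everything past the last
    // acknowledged ID, handing raw values (ID shaken off) to the core.
    public IEnumerable<T> ReplayAfter(long lastAckedId)
    {
        foreach (var e in _buffer)
            if (e.Id > lastAckedId) yield return e.Value;
    }
}

public static class Program
{
    public static void Main()
    {
        var edge = new ReplayableEdge<string>();
        edge.Publish("a"); edge.Publish("b"); edge.Publish("c"); // IDs 0, 1, 2
        Console.WriteLine(string.Join(",", edge.ReplayAfter(0))); // b,c
    }
}
```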
This way, the heart of Reactor has remained a relatively straightforward Rx implementation that does not have to worry much about being hosted with a soup of expression trees, artifact identifiers, sequence identifiers, reified events, etc. around it. In fact, some typical concerns for event flow across nodes in a compute cluster could be handled entirely at the edges of compute nodes, rather than permeating into the Rx library (e.g. with all query operators having to take special measures).
Looking at this interface design philosophy, Reactor is a "supertype" (or even a "type class") to many event processing assets, where we tried to separate as many orthogonal concerns as possible, and while doing so, we kept discovering more of them. Some of this thinking has influenced other technologies, as some of you may have noticed, often representing a snapshot in time of our thinking.
Going forward, it will make sense for us to pick off these pieces one by one, and see what makes sense to transition back into mainstream Rx. As mentioned in my initial reply, "async Rx" with proper alignment with "async streams" in .NET and C# seems to be a promising first start. The two directly complement each other or align well with use requirements (e.g. using async within a query expression). After that, I'm hoping to infuse some rejuvenation juices into expression trees, to hopefully get to a point where expression shipping can be made widely available. Next up would be moving both "sync Rx" and "async Rx" into the direction of supporting operator state persistence, where doing so for the former is well-understood, but for the latter it requires some design work.
@marcpiechura, the work on Reactive Streams is interesting and does in fact have some overlap with Reactor constructs around reliable messaging (cf. my description in the previous reply). It would fit rather well in the picture over here as one of the concerns that are dealt with on the "edge" of a compute node, just like reification of events, dealing with sequence identifiers, splatting of batches, etc.
What is not entirely clear is the extent to which one would need flow control through arbitrary query operators, which may be higher-order in nature. In Reactor, we entirely deal with such things on the edge of the compute graph by monitoring queue lengths and whatnot, which then apply back pressure to senders, which come in from over the network and are often pull-to-push adapters to external sources (e.g. EventHub, Kafka, etc.).
One of the key realizations in Reactor was that it's worth separating the core from the satellites around it. Reliable messaging is one such "satellite", where letting sequence identifiers flow through operators has very limited mileage. The same occurred for many other concerns, so we eventually landed on a layering approach with a (purely functional) core piece with a bunch of satellites around it, all of which are dealt with by some hosting layer. It'd be good to apply this same thinking to Reactive Streams, to evaluate if there's anything truly unique about it, or it's merely another shape used to cast flow control (at which point it'd be trivial to bring in as a satellite, possibly just as a protocol adapter to existing notions of flow control in our stack).
@bartdesmet thx for the detailed answer.
Could you elaborate on what you mean by "reliable messaging"? I would currently read it as reliable over the network (like via EventHub or Kafka), and then I'm not sure how that relates to Reactive Streams.
Are you also looking at Rx on other platforms, such as RxJs and RxJava, or is that collaboration limited to just the sharing of concepts? I'm asking as a consumer of Rx in 3 languages.
It sounds like finding a vision for Rx that aligns with the .NET eco system alone will already prove challenging.
@marcpiechura, thanks for the fruitful discussion.
My reference to reliable messaging is mostly to paint the analogy with another satellite that bridges that gap between logical artifacts (e.g. "observables") and physical realizations thereof. Some other technologies paint this distinction using "virtual" nomenclature.
Either way, the way we interoperate with these is by introducing sequence IDs (when producing) or taking them away (when consuming) in order to adapt to the core event processing framework and take this burden off its shoulders. To me, any backpressure other than the type natively supported by the runtime or framework (i.e. blocking synchronously or asynchronously through a not-yet-completed future) feels like a similar satellite, especially when (numeric) values are being exchanged between consumer and sender, akin to link credit.
At the same time, my reference to reliable messaging in the context of Reactor is also in relation to the common use of pull-to-push adapters, which naturally give rise to backpressure on the "edge" of the core engine. For example, when a core event processing engine performs a checkpoint of operator state that requires a short pause of the computation, it suffices to carry out some form of cooperative pausing at a safe point, which will lead to a subsequent
Previous experiments with backpressure constructs in query operators have led to similar conclusions as the experiments we did with reliable messaging and sequence IDs: namely, that composition over query operators quickly breaks down. Over "simple" unary operators, things look fine at first sight. For example, pushing events through a filter or a projection operator can carry forward sequence IDs (in the case of filter leaving holes, which may or may not be fine depending on context) and/or carry backpressure requests backward. (I'm again referring to reliable messaging here as an example of pushing such a concern down into the core processing logic.) However, the latter is already a stretch, given it fails to account for the computational cost of these operators or their parameterization (e.g. selector or predicate functions).
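The "filter leaves holes" observation is easy to demonstrate. In this self-contained sketch (hypothetical helper names), a filter over sequenced events carries IDs forward unchanged, so the surviving stream has gaps in its ID space that downstream consumers must tolerate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Sequencing
{
    // A filter over sequenced events: the sequence ID travels with the event,
    // so filtered-out events leave holes in the downstream ID space.
    public static IEnumerable<(long Id, int Value)> WhereSequenced(
        this IEnumerable<(long Id, int Value)> source, Func<int, bool> predicate)
    {
        foreach (var e in source)
            if (predicate(e.Value)) yield return e;
    }
}

public static class Program
{
    public static void Main()
    {
        var events = new[] { (1L, 10), (2L, 11), (3L, 12), (4L, 13) };
        var evens = events.WhereSequenced(v => v % 2 == 0);
        // Only IDs 1 and 3 survive; IDs 2 and 4 are holes.
        Console.WriteLine(string.Join(" ", evens.Select(e => e.Id))); // 1 3
    }
}
```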
In Reactor, we've had many cases where we want backpressure due to the composition of N operators exceeding M amount of assigned compute capacity. No single operator can assess this "global" requirement at a fine-grained per-event "local" level, but the hosting layer (including the "edge" where events come in, and where timers are fired from) has a much better view of this with a much simpler mental model. By mediating all internal queues through such a hosting layer, places where buffers tend to blow up become visible, which is already a necessity for high-density compute hosting. It's worth noting that Rx only has a few such queues with
Note that all of the above is even without mentioning n-ary operators where it's unclear how to distribute a downstream backpressure request of multiple upstream sources. With higher-order operators with a dynamic number of upstream or downstream sequences (e.g.
And note further that all of the above quickly breaks down when any form of temporal event processing is needed where events are ordered across streams by application time (a la StreamInsight or Azure Stream Analytics), and the only reasonable place to apply any form of backpressure seems to be on the input adapters, because the downstream computation has a well-defined non-negotiable evaluation order for events, no matter how much "branching" there is in the processing topology created through query operators. Again, it works somewhat okay for unary operators, but for n-ary or higher order operators, the wheels tend to come off quickly.
Now, this is not to say that there wouldn't be any use for carrying backpressure through operators in some scenarios. In Reactor, we are already going down a path of having multiple core query operator library implementations, e.g. for async versus sync. Expression tree based query optimization attempts to tile maximally synchronous islands of computation to bind these to the synchronous query operator implementations for efficiency's sake, in order to go "async where necessary, sync where possible". In a similar manner, predicate pushdown of filters into the reliable messaging layer effectively pushes a filter operation into the domain where events have sequence IDs (possibly over compute node boundaries, closer to the source). All such tricks involve pushing an adapter, like a barrier, delineating different parts of a computation. It is totally conceivable to do something like this where the barrier for backpressure constructs can be moved as far inwards as possible in case that brings a tangible benefit with it. To this day, we haven't encountered such a case, which strengthened our thinking around the role of "satellites".
At the end of the day, I'd like to review these design points in the context of a more complete picture where we also have a final design for
One more thing I'll just toss out here are some thoughts on bi-directional iterators which, combined with async streams, could provide yet another way to flow credit from a consumer to a publisher.
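One way to picture credit flowing from consumer to publisher is a small gate the producer must pass per element. The sketch below is entirely hypothetical (not a proposed API, and `CreditGate` is an invented name); it just shows that asynchronous awaiting of credit gives backpressure without blocking a thread:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical credit-based flow control: the consumer grants credit, the
// producer awaits one credit per element, so demand flows "backwards".
public sealed class CreditGate
{
    private readonly SemaphoreSlim _credits = new SemaphoreSlim(0);
    public void Grant(int n) => _credits.Release(n);     // consumer side
    public Task AcquireAsync() => _credits.WaitAsync();  // producer side
}

public static class Program
{
    public static void Main()
    {
        var gate = new CreditGate();
        var received = new List<int>();

        async Task Produce()
        {
            for (int i = 1; i <= 5; i++)
            {
                await gate.AcquireAsync(); // park here when out of credit
                received.Add(i);
            }
        }

        gate.Grant(2);
        var producer = Produce();
        Console.WriteLine(received.Count); // 2: producer is awaiting more credit

        gate.Grant(3);
        producer.Wait();
        Console.WriteLine(received.Count); // 5
    }
}
```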
@amoerie, this is definitely something for us to consider, at the very least to ensure some degree of consistency across all platforms where Rx is present. This may start off as sharing concepts, but there may equally well be some investment in bringing a cross-language story. For one thing, our own Reactor technology over here increasingly needs client libraries in different languages, and the lack of good quotation (expression tree) support in languages other than .NET is interesting to see.
One should note that the different platforms have historically had somewhat different reasons to use Rx, for example to unify different types of asynchronous computation or event processing. That's an ever-evolving landscape for sure, with a good example being the introduction of
Then there's also the case of client libraries and UI frameworks, where Rx has also gotten quite some love historically. In these spaces, not that much has been moving, and libraries with adapters have done a good job at providing minimal ceremony to bridge with Rx. Things could still be better with first-class events a la F#, but that's mostly a language integration aspect. The most interesting aspect to me is how and if we want an Rx library that scales down and up all the way from devices with sensors, to clients with UIs, and to clouds with reliable high-density compute. With Reactor, we got there to some extent; in fact, all Windows 8 and beyond devices ship with a mini-Reactor engine built-in, to perform processing on client-side signals for Cortana. The same engine (or at least the design blueprints thereof) is used in our cloud.
Thank you @bartdesmet for the lengthy and detailed explanation, this is extremely exciting stuff and I can't wait to hear more about it and maybe even start experimenting with it myself.
But... all of this sounds like "future plans" and not something you are planning to reveal and make publicly available any time soon - though I hope I'm wrong here.
So to go back to the original issue - is there any short-term plan for Rx itself? There are a number of outstanding issues waiting to be fixed (or PRs fixing these issues waiting to be accepted), and there's a v4.0 waiting to be released.
Also, while it's not closely related to this particular topic: since you mentioned that you are working to make Rx (and potentially Ix) work in distributed environments, I strongly suspect that it must involve an underlying actor framework, either from the Orleans team or the Service Fabric team. What would be your suggestion to someone who is thinking about choosing one or the other? As of right now Orleans is more feature rich, and the 2.0 release is coming up, but if Microsoft is putting its weight behind Service Fabric (and its own reliable actors), then I should also probably look into learning and using that.
@bartdesmet I wanted to come back to adapting Reactive Streams as a ground layer - what is the actual state of this proposal?
I really appreciate that you're taking the time to write down such a detailed explanation. Even though I can't get into every detail, I would like to add a few things.
As @Horusiath already mentioned, most of the optimizations are possible and have been implemented; furthermore, all of the operators, including bidirectional flows (which are the core building blocks of an entire HTTP server), are available too. Please don't get me wrong, my intention is not to advocate Akka.Streams, I only want to point out that it's possible to build those things based on RS.
I've also built a simplified version of an AsyncEnumerable based on Reactive.Streams, and even though the implementation itself isn't that important, it shows IMHO one of the key benefits of having a shared SPI that is implemented by many different libraries: you can combine different libraries based on the SPI, which handles resource management and flow control completely transparently to the user across the complete stack.
In the end, all I'm trying to say is: there is an SPI, evolved in a much bigger and richer ecosystem, which has produced many innovations in the past decade when it comes to stream processing (except of course for Rx).
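The shared-SPI point is easy to see in code. Below, the Reactive Streams shapes are redeclared in a few lines purely to keep the sketch self-contained (the real SPI lives in the Reactive.Streams package), and a naive pull-to-push adapter drains an enumerable only as fast as the subscriber signals demand. Note this toy ignores the spec's reentrancy and thread-safety rules:

```csharp
using System;
using System.Collections.Generic;

// Minimal Reactive Streams shapes, redeclared here for self-containment.
public interface ISubscription { void Request(long n); void Cancel(); }

public interface ISubscriber<in T>
{
    void OnSubscribe(ISubscription subscription);
    void OnNext(T item);
    void OnError(Exception error);
    void OnComplete();
}

public interface IPublisher<out T> { void Subscribe(ISubscriber<T> subscriber); }

// Pull-to-push adapter: the enumerator is advanced only on demand, so
// backpressure is handled at the boundary rather than inside operators.
public sealed class EnumerablePublisher<T> : IPublisher<T>
{
    private readonly IEnumerable<T> _source;
    public EnumerablePublisher(IEnumerable<T> source) => _source = source;

    public void Subscribe(ISubscriber<T> subscriber) =>
        subscriber.OnSubscribe(new Sub(_source.GetEnumerator(), subscriber));

    private sealed class Sub : ISubscription
    {
        private readonly IEnumerator<T> _enumerator;
        private readonly ISubscriber<T> _subscriber;
        private bool _done;
        public Sub(IEnumerator<T> e, ISubscriber<T> s) { _enumerator = e; _subscriber = s; }

        public void Request(long n)
        {
            while (!_done && n-- > 0)
            {
                if (_enumerator.MoveNext()) _subscriber.OnNext(_enumerator.Current);
                else { _done = true; _subscriber.OnComplete(); }
            }
        }

        public void Cancel() => _done = true;
    }
}

public sealed class TakeTwoSubscriber : ISubscriber<int>
{
    public List<string> Log { get; } = new List<string>();
    public void OnSubscribe(ISubscription s) => s.Request(2); // demand exactly two
    public void OnNext(int item) => Log.Add(item.ToString());
    public void OnError(Exception error) => Log.Add("error");
    public void OnComplete() => Log.Add("done");
}

public static class Program
{
    public static void Main()
    {
        var sub = new TakeTwoSubscriber();
        new EnumerablePublisher<int>(new[] { 1, 2, 3, 4 }).Subscribe(sub);
        Console.WriteLine(string.Join(" ", sub.Log)); // 1 2
    }
}
```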
@bartdesmet thank you so much for this update on the future of Rx.
Over the last few years I have been developing a general-purpose visual programming language on top of ReactiveX expression trees, Bonsai, which straddles exactly these different levels, from devices with sensors, to VR/AR applications, networking events, etc.
It uses many of the concepts you discuss here, including query compilation (with type inference), higher-order operators, sharing of query expressions, etc. I am really interested in following up this discussion, as the language is now growing to be used by hundreds of research labs, and there may be a lot of interest in sharing/co-developing some of these concepts, at least to the extent where things could be folded into Rx.
What I find particularly interesting in the way we have been doing things is that we are now at the stage where entire applications, including signal processing, render and visualization engines, data logging, batching (and any combination of the former), can be developed inside the language directly. In fact, the whole application can be driven by a single reactive expression tree, with sub-modules and state workflow logic expressed using
I have been impressed by how much we have been able to express in this way, and how it is obviating the need for inversion of control architectures, while retaining modularity, especially when using type-agnostic expression trees. The exercise has been greatly illuminating, and I wonder how many of the design patterns we have discovered are being reused by you guys behind the scenes. It would be incredible if we could compare notes in some form.
@petroemil, Reactor is not using an actor framework, though it was something they explored early on. It's running on Service Fabric, but I don't think it'd be using the Reliable Services APIs. Bart has done a shit-tonne of brilliant Channel 9 talks on Reactor over the years—search for his name on there.
(FWIW I think both will be well-supported going forward. After using the SF actor implementation for the last couple of years... you'll probably get more joy out of Orleans, but the Service Fabric runtime comes with so many more things like killer orchestration. There's an Orleans-on-Service-Fabric effort if you want to go with the why-not-both option.)
See these three twitter threads for a status update on the future of
If you want to help out then speak up and do. Otherwise
I'll happily increase my efforts on this repo. I myself have around 5 inactive and abandoned open pull requests and became quite frustrated by the state of this repo. I strongly believe that there's enough people around willing to contribute if they just had the feeling to be heard. I can offer:
I'd rather not
Count me in. Just get in touch with me here or at reactivex.slack.com.
The overwhelming vote from folks is to move over to
Update 2018-May-09: I retract my earlier comment about v3.1.1 bad internal asm versions "breaking builds". I was not able to reproduce the issue with VS2017 15.7 preview with a standard 2-layer app. My apologies for complaining about my "corner case" in such a central discussion.