ARCHIVED -- Event Store Improvements for v4 #1307
Comments
@jeremydmiller Thanks for this write-up! Here are my thoughts and some other ideas I have been thinking about:
Async Daemon
The internals of its implementation are still enigmatic to me, so I can't give more detailed answers. To start, I'd propose providing good documentation and samples for it (we still lack them; afaik there is still only your blog post about it).
I like the idea of prebuilt apps and samples with some integration with other tools. Regarding which cloud: dunno. I still believe that Azure is the poor man's AWS, but on the other hand the .NET community is in love with Microsoft tools like MSSQL, so it would probably be better to start with Azure. Maybe some cooperation with Microsoft would give us some grants, or at least marketing?
Partitioning/Multi Tenancy
Imho that's a must-have. As I gathered recently, performance is one of people's fears about Event Sourcing. Also, when I explain Marten to new people there is always the question of how a single events table will handle a big load. Although I think those fears are exaggerated, I see the point of making our store more performant and of giving people hard numbers that say "yes, we can do it, see, there is no reason to be afraid".
Projections
I fully agree, the current ViewProjection mechanism is hard to maintain. I'm currently working on #1302. I already started some small unification of the projections mechanism to make it (at least from the abstractions perspective) more generic. Maybe that would be a good start for discussions around the potential refactoring? I could provide my first PoC, and with that concrete proposal in hand we could work to make it right.
About projection snapshotting: it would be nice to offer a flexible way of snapshotting. I'm not sure that taking a snapshot once every few events would be a huge benefit, but if we give the user the possibility to define that they would like it once per day, or by some other custom filter expression, then imho that might be a huge benefit. I think that two other types of projections are also low-hanging fruit and would be good "marketing" for coexistence with/migration from the ORMs, like:
I'm all for making this pluggable for other solutions (e.g. Liquid Projections) 👍.
Event Metadata
I think that it's a must-have. For sure it should be optional, but for the distributed systems things like
I think that it would be worth checking how NEventStore handles that, as I know they have quite a good implementation of the metadata.
Other things that I consider
Integration with messaging systems
It's not easy for those systems to always keep the ordering of events, and it's rare for them to have "exactly once delivery" semantics. Normally consumers need to handle idempotency themselves. Currently Marten doesn't allow appending events out of order (e.g. 2nd, 1st, 3rd). We'd need to change the current versioning mechanism to allow that and projection rebuilds. Imho it shouldn't be super hard to deliver a first option that lets the user set the version number for imported events. We discussed some time ago that a mechanism similar to the Async Daemon might also be a potential option for that.
Integration points with other Event Stores / UI
I'd like to create the integration point as I described here: #1194 (comment) and discussed with @gregoryyoung. So start with exposing our event store features as an Atom feed. Then maybe provide some simple Swagger-like UI (that might also be used for the document part).
Long Version for Events
#1080 - imho this is a must-have for version 4.0 if we'd like to make it high scale.
@jeremydmiller what are your thoughts? I probably forgot about something, so I might add something later. |
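To make the ordering and idempotency point above concrete, here is a minimal, language-neutral sketch in Python of a consumer that tolerates duplicate and out-of-order deliveries. It is not Marten's API: `Envelope`, `OrderedConsumer`, and the rule that per-stream versions start at 1 with no gaps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Envelope:
    """Hypothetical message wrapper: stream id, per-stream version, payload."""
    stream_id: str
    version: int  # assumed to start at 1 and to have no gaps
    data: Any


@dataclass
class OrderedConsumer:
    """Applies each stream's events in version order, at most once each."""
    applied: dict[str, list[Any]] = field(default_factory=dict)
    _pending: dict[str, dict[int, Envelope]] = field(default_factory=dict)

    def receive(self, msg: Envelope) -> None:
        done = self.applied.setdefault(msg.stream_id, [])
        if msg.version <= len(done):
            return  # duplicate delivery: already applied, so ignore it
        waiting = self._pending.setdefault(msg.stream_id, {})
        waiting[msg.version] = msg  # park it until the gap before it is filled
        # Drain every parked event that is now contiguous with the applied ones.
        while (next_version := len(done) + 1) in waiting:
            done.append(waiting.pop(next_version).data)


consumer = OrderedConsumer()
# Delivered as 2nd, 1st, 1st again (duplicate), 3rd.
for version, data in [(2, "b"), (1, "a"), (1, "a"), (3, "c")]:
    consumer.receive(Envelope("order-1", version, data))
```

The consumer relies on the sender assigning the version, which is the same capability the comment asks for when importing events.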
@oskardudycz I say we just convert to long ids for 4.0. Will have to explicitly test for the migration scripts, but that was coming regardless. |
@jeremydmiller great 👍 |
More on the async daemon
Convention Based Projections Concept
The main idea here is to allow users more flexibility to do whatever it is they need to do with less code ceremony and easier to author code. Drop mandatory base classes and interfaces (they're still there, just wrapped around it). Marten itself will use some kind of dynamic code generation (à la Jasper or Lamar from Jeremy's prior work) to create an
The following shows some of the possible method signatures and the hopefully minimal set of optional attributes:
|
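As a rough illustration of the convention-based idea in the comment above (not the signatures the comment originally listed, and not Marten's API), the Python sketch below discovers handler methods on a plain class by naming convention and parameter type. The `apply_` prefix, the `Account` aggregate, and both event types are hypothetical.

```python
import inspect
from dataclasses import dataclass
from typing import get_type_hints


@dataclass(frozen=True)
class AccountOpened:
    owner: str


@dataclass(frozen=True)
class MoneyDeposited:
    amount: int


class Account:
    """Plain class: no base class or interface, only 'apply_*' methods."""

    def __init__(self) -> None:
        self.owner = ""
        self.balance = 0

    def apply_opened(self, event: AccountOpened) -> None:
        self.owner = event.owner

    def apply_deposit(self, event: MoneyDeposited) -> None:
        self.balance += event.amount


def build_dispatch(aggregate_type):
    """Map event type -> handler, using the type hint of the single event parameter."""
    table = {}
    for name, fn in inspect.getmembers(aggregate_type, inspect.isfunction):
        if name.startswith("apply_"):
            hints = get_type_hints(fn)
            hints.pop("return", None)
            (event_type,) = hints.values()  # exactly one annotated parameter expected
            table[event_type] = fn
    return table


def aggregate(aggregate_type, events):
    dispatch = build_dispatch(aggregate_type)
    instance = aggregate_type()
    for event in events:
        handler = dispatch.get(type(event))
        if handler is not None:  # events without a handler are skipped
            handler(instance, event)
    return instance


account = aggregate(
    Account,
    [AccountOpened("ada"), MoneyDeposited(40), "unknown event", MoneyDeposited(2)],
)
```

Here the dispatch table is built by reflection at startup; the comment proposes generating equivalent code dynamically, which is a different mechanism serving the same convention.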
Anything that would increase the performance of rebuilding projections in the Async Daemon would be huge for us. Snapshotting and the performance optimizations for rebuilding that @jeremydmiller was mentioning would be great. For context, we currently have >3 million events in our event store and are storing ~25k new events per day now. Our rebuilding performance has gotten noticeably worse as the number of events increases. I am definitely willing to help contribute to any of these event store improvements for v4. |
Some observations based on our usage of Marten.
I'm glad to help with some of these improvements. First one will probably be metadata. |
Re-booting V4 Event Store Work
Alright, time to get this thing rolling again. V4 is heavily in flight on the Document Db side, and it's shortly going to come down to just the Event Sourcing work. This comment pretty well only covers projections. I'll have to follow up another day with metadata improvements, snapshotting, archiving, sharding, up/downcasters, and everything else I missed here... Here's some miscellaneous things:
Projections
Here are what I think are the main points of change and goals for the projections in V4:
Projection Patterns
I could really use some feedback from real users on this section, please.
Defining Projections
Projection "Modes"
Projections are applied and calculated at different times, and there are some significant opportunities for optimization. From my notes, projections will be processed in these modes:
Now, one at a time:
Inline Projections
Live
Async Projections
This is the big one for v4. All of the notes here are for the "project a single aggregated document for a stream" type of projection so far:
When receiving a "page" of new events in the async projection runner:
Rebuild Mode
Async Daemon
For v4, let's assume that we're only building out a much improved version of the current Async Daemon that depends on polling the database. For v4+, we should consider alternatives using queueing, messaging, cloud technologies, CDC replication from Postgresql, etc. Some major things:
|
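The detailed notes for the async runner are not reproduced in this thread, so the following Python sketch is only one guess at the general shape: a polling runner receives a page of events, groups it by stream so that each aggregated document is loaded and stored once per page, and advances a progress marker. `StoredEvent`, `process_page`, and the integer-sum "document" are assumptions for illustration, not Marten's design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredEvent:
    sequence: int   # assumed global, monotonically increasing position
    stream_id: str
    amount: int


def process_page(page, documents, progress):
    """Apply one page of events and return the new progress marker."""
    by_stream = defaultdict(list)
    for event in page:
        if event.sequence > progress:  # skip anything already processed
            by_stream[event.stream_id].append(event)

    for stream_id, events in by_stream.items():
        total = documents.get(stream_id, 0)        # "load" the document once
        for event in sorted(events, key=lambda e: e.sequence):
            total += event.amount
        documents[stream_id] = total               # "store" the document once

    return max([progress, *(e.sequence for e in page)])


documents = {}
progress = 0
page_one = [StoredEvent(1, "a", 10), StoredEvent(2, "b", 5), StoredEvent(3, "a", 1)]
progress = process_page(page_one, documents, progress)
# The second page re-delivers sequence 3, which must not be applied twice.
page_two = [StoredEvent(3, "a", 1), StoredEvent(4, "b", 2)]
progress = process_page(page_two, documents, progress)
```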
@jeremydmiller is there a way to break things early, like giving us ability to check how our API usage is "wrong" for the future and then future-proof a bit? Or is a "reset your database with 4.0" kind of a thing? We are building a system with the event store that should hit production in the coming months 🙂 I love the ideas and improvements, just worried about the migration path. |
@lahma Hang on, I wasn't finished with the previous comment yet ;-) And we generally do worry about any breaking changes -- especially to the database structure, but this is a full point release. |
@jacobpovar If you're still interested, I think we're gonna have to talk about some of your items. I'd wanna know why you needed to customize so many things. |
@jeremydmiller I already had a call with @jacobpovar, he has shown me what they're doing and what are his pain points. I can try to pass you that knowledge ;) @lahma when we have public API and implementation stabilized then for sure we'll publish prerelease and try to search for early adopters to get the feedback 👍 |
One pain point for me is being able to externally define events and how to apply them to projections that exist in another assembly. I.e., the projection is in Assembly A and the event is in Assembly B, which depends on Assembly A. Without inverting this dependency (A depends on B instead of B depending on A), it is currently not possible to define how that projection should handle this new event. Looking at the ViewProjection API and other suggestions, I don't see a way to do so. It may be that I'm just not familiar enough with the ViewProjection API, but if it is possible, an example would be good. I would also like to see a way to initiate a rebuild via the AsyncDaemon through an IDocumentSession. |
@malscent Why are you needing to do that is my question. |
@jeremydmiller which issue?
Also, another thing I have been experimenting with is the ability to define event transforms for obsoleting events, where I can define a method that will translate one event type to another event type for the purpose of building a projection, should the replacement of an event be necessary. |
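A minimal sketch of the event-transform idea in the comment above, written in Python rather than against Marten: obsolete event types are translated to their replacements before a projection sees them. The `upcasts` decorator, the registry, and both `CustomerRenamed` versions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class CustomerRenamedV1:   # obsolete shape, still present in old streams
    full_name: str


@dataclass(frozen=True)
class CustomerRenamedV2:   # replacement shape that projections understand
    first_name: str
    last_name: str


# Registry: obsolete event type -> function producing the newer event.
UPCASTERS: dict[type, Callable[[Any], Any]] = {}


def upcasts(old_type):
    """Register a transform for an obsolete event type."""
    def register(fn):
        UPCASTERS[old_type] = fn
        return fn
    return register


@upcasts(CustomerRenamedV1)
def rename_v1_to_v2(old: CustomerRenamedV1) -> CustomerRenamedV2:
    first, _, last = old.full_name.partition(" ")
    return CustomerRenamedV2(first, last)


def upcast(event):
    """Follow transforms until the type has none (a transform must change the type)."""
    while type(event) in UPCASTERS:
        event = UPCASTERS[type(event)](event)
    return event


current = CustomerRenamedV2("Grace", "Hopper")
translated = upcast(CustomerRenamedV1("Ada Lovelace"))
```

The stored events stay untouched; only what the projection receives changes.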
#1. Why do you need to build your assembly references that way? How are you wanting things to be plugged in? Would it be good enough if projections could use base types or interfaces? That negates some of the original optimizations I'd intended, but at least it would work. For #2, maybe introduce projection versions? That could be a way to trigger a rebuild when a version change is detected. For the translation, @oskardudycz has some ideas and thoughts on that one. |
@jeremydmiller We built our software with a "Core" module that contains a large amount of functionality with core events and projections. From there, because we wanted our software installations to be highly customizable, we built "Modules" hosted in separate assemblies that can be added to the project or not, depending upon the necessary functionality. Marten works with this fairly well: when it encounters an event on a stream that cannot be projected, it simply ignores it. So we can add/remove modules as necessary to add/remove functionality based on what our customers need. The real struggle, however, is defining what a projection should do with an event. To tackle this, I built a customized projection that uses inheritance to determine which "Apply" method to use. However, this is severely limited, in that all my events must behave in predefined ways, or else the projection doesn't know how to use them. I would like to be able to define how an event is applied to a projection with the event definition, even if that is just an "Apply" method on the event that takes the projection as a parameter and returns the projection. (Granted, this will struggle with private members of a projection, but if that is a requirement, reflection can always be used?) |
Having some kind of Don't use reflection. If need be, use [InternalsVisibleTo]. In .Net Core world, that just takes the assembly name. No strong naming necessary anymore. |
Absolutely.. The idea is to try to divorce the logic of applying events from the projection and allow it to be defined with the events. So having some kind of interface that I can implement on the event that would override the projection's apply event would be great. |
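One way to picture the request in the last few comments, sketched in Python with hypothetical names only: a "core" projection handles the events it knows, while an event defined in a separate module may carry its own `apply_to` method that takes precedence. `AppliesItself`, `OrderView`, and the event types are illustrative and not part of Marten.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@runtime_checkable
class AppliesItself(Protocol):
    """Hypothetical opt-in interface: the event knows how to update the view."""
    def apply_to(self, view: "OrderView") -> "OrderView": ...


@dataclass
class OrderView:
    """'Core' projection: knows nothing about module-defined events."""
    status: str = "new"
    notes: list[str] = field(default_factory=list)

    def apply(self, event: object) -> "OrderView":
        if isinstance(event, AppliesItself):
            return event.apply_to(self)          # event-defined logic wins
        handler = getattr(self, f"on_{type(event).__name__}", None)
        return handler(event) if handler else self  # unknown events are ignored

    def on_OrderShipped(self, event: "OrderShipped") -> "OrderView":
        self.status = "shipped"
        return self


@dataclass(frozen=True)
class OrderShipped:        # core event, handled by the projection itself
    pass


@dataclass(frozen=True)
class GiftNoteAdded:       # "module" event, defined where the core cannot see it
    text: str

    def apply_to(self, view: OrderView) -> OrderView:
        view.notes.append(self.text)
        return view


@dataclass(frozen=True)
class Unrelated:           # no handler anywhere
    pass


view = OrderView()
for event in [OrderShipped(), GiftNoteAdded("happy birthday"), Unrelated()]:
    view = view.apply(event)
```

The module depends on the core's `OrderView`, never the reverse, which is the dependency direction described above. The event only touches public members here, so no reflection is involved.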
This will break some of our code. But that is fine, I hate that code anyway. As long as there are ways for us to do the same things, I certainly won't mind not having to use this clunky interface.
I think that the answer is a resounding yes. Cross-stream aggregates are very common, at least for us. We have more of them than we have 1 stream-1 document projections. We also have projections that project 1 event to several documents. Imho, 1stream-1document projections are the "easy mode" that obviously have to work well. But being able to do cross-stream projections or fan-out projections in a nice way is a must for the projection support to be actually useable IRL. |
And as for 2-stage projections to solve cross-stream aggregation... eh... I don't think that's gonna be very nice to work with? But I'd need an example to understand what you are getting at, because I might be misunderstanding you @jeremydmiller |
@wastaz Hey man, that's the kind of feedback I needed. For the aggregation across streams, how do you identify the proper aggregate identity for the individual events? Some kind of well known field within the event? Something arbitrary? Could you determine that through the event json, or does it have to be within code? For the 2-stage aggregation, I was thinking about finally adding some kind of background map/reduce process, but I haven't thought through the mechanics much. The 1 event to 1 document pattern is easy at least. And I had it in mind that you hated the |
@jeremydmiller Good that its useful! :)
In basically all cases we have some identifying field in the events that we can use to "twist the angle" of the projection. To give you some examples so you understand a bit more of some common cases we have. We have an application that handles accounts. A customer can have several accounts, and each account contains 1-n periods. We have (among some others as well ofc) these projections
As you can see, we do a lot of projecting in different dimensions. Part of this is because of an early choice about stream boundaries that we are now looking at maybe refactoring at some point. However, being able to "twist the angle" of projections in order to do cross-stream, 1-1, 1-n projections etc. has been very helpful in getting to the point where we can now talk about refactoring some of these things. And looking at what we are doing and how our problems look, I'm convinced that even after the refactoring we will still want to do some of these things. What is common, though, is that in basically every case I list we determine the aggregate identity via some field in the event json. The exception is case 4, where we project to multiple documents from the same event: there we generate new ids for the projections (since the document ids need to be unique) but have an indexed
I think we could make do with some kind of map-reduce. It's not impossible at least.
Honestly, "hate" is a strong word. It's my escape hatch. We try to use the "simplest" way there is to do each projection: ITransform, AggregateStream, ViewProjection etc. And IProjection is what we reach for when everything else fails. So for case 4, that is an IProjection. It's an annoying thing to work with, but one thing that has been very nice with Marten is that there are ways to project on different levels of abstraction. There's not just a raw low-level IProjection, or just a high-level AggregateStream; there are several "steps" of abstraction, and you can start at the highest and successively drop lower as you need, until you are down at IProjection. I know "there are many ways to do it" isn't usually hailed as the best design (and is a hassle to maintain), but it is very pragmatic when done right. I won't blame you for trying to unify the projection stuff a bit more though. I'd probably also try to do that in your shoes. |
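The "twist the angle" approach described above, where the aggregate identity comes from a field in the event rather than from the stream, can be sketched as follows. This is a language-neutral Python illustration; the field names (`customer_id`, `period`, `amount`) and the helper `project_by` are assumptions, not the commenter's schema or Marten's API.

```python
def project_by(events, identity, fold):
    """Fold events into one document per key, where the key comes from the event."""
    documents = {}
    for event in events:
        key = identity(event)
        documents[key] = fold(documents.get(key), event)
    return documents


def add_amount(total, event):
    return (total or 0) + event["amount"]


# Hypothetical events from three account streams belonging to two customers.
events = [
    {"stream": "account-1", "customer_id": "c1", "period": "2020-01", "amount": 100},
    {"stream": "account-2", "customer_id": "c1", "period": "2020-01", "amount": 50},
    {"stream": "account-3", "customer_id": "c2", "period": "2020-02", "amount": 7},
    {"stream": "account-1", "customer_id": "c1", "period": "2020-02", "amount": 1},
]

# Same events, three different "angles".
per_account = project_by(events, lambda e: e["stream"], add_amount)        # 1 stream -> 1 document
per_customer = project_by(events, lambda e: e["customer_id"], add_amount)  # cross-stream
per_period = project_by(events, lambda e: e["period"], add_amount)         # cross-stream
```

Only the identity selector changes between the three projections, which is why a declarative way to name that field matters.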
Okay, couple thoughts here.
"Exploding" the root event into multiple documents:
I think this is gonna be relatively simple for the main async / inline / live flow, you'd just have a method with a signature like:
That'd be a little problematic for optimizing the rebuilding of the projection, but I'm not sure you'd be doing that very often. That might be a case where it'd be easiest to "rebuild" the projection by running a document transform instead -- which wouldn't hurt if we added more documentation, examples, and possibly some support for that in V4. If you did want to rebuild the projected data, I think we could do a producer/consumer that fetches the raw events in the "producer", then the "consumer" explodes that out into the proper
One of the next things I wanna design out is how big event stores can have their projected views rebuilt with no down time. I have some thoughts, but it's too late in my night to get much out here.
One event to one document
I think this is pretty easy, but I'd still want it to be as declarative as possible so we can get some optimizations around it.
Aggregate across the streams
If we can do something like this for the definition:
Then we could do similar kinds of optimizations in the async daemon to the ones I was outlining in the "aggregate by stream" notes. In rebuilds, we could parallelize the different aggregates by maybe first reaching into the events and finding the unique aggregate ids. If we were really good, and in some cases there were a finite number of unique aggregate ids (like by region, or country, or something where there's not a lot of cardinality), you could run parallel async daemons for each aggregate id to do much more in parallel. |
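A rough Python sketch of the parallel rebuild idea just described: find the unique aggregate ids first, partition the events by id, and rebuild each aggregate independently. The `rebuild` function, the thread pool, and the tuple-shaped events are assumptions for illustration; a real daemon would read pages from the database instead of an in-memory list.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def rebuild(events, identity, fold, workers=4):
    """Rebuild one document per aggregate id, one partition per task."""
    # Pass 1: find the unique aggregate ids and partition events by id.
    # Events are assumed to arrive in global order, so each partition stays ordered.
    partitions = defaultdict(list)
    for event in events:
        partitions[identity(event)].append(event)

    def rebuild_one(item):
        key, partition = item
        state = None
        for event in partition:
            state = fold(state, event)
        return key, state

    # Pass 2: aggregates do not share state, so partitions can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(rebuild_one, partitions.items()))


# Low-cardinality identity (region), as in the comment's example.
sales = [("eu", 1), ("us", 4), ("eu", 2), ("us", 6)]
totals = rebuild(
    sales,
    identity=lambda e: e[0],
    fold=lambda state, e: (state or 0) + e[1],
)
```

Threads only pay off here when each partition spends its time waiting on I/O such as database reads; the point of the sketch is the partitioning, not the executor.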
I am still digesting a lot of this, but a couple things.
This would be huge for us. Right now it is totally infeasible for us to rebuild any projections. We have 20+ million events in our store and it can take hours to rebuild anything. Our current approach is creating new versions of the projection and letting those catch up before switching the code over to use them. So all of the performance improvements to the async daemon sound
1.) Yeah, got that from @wastaz. It's just a matter of thinking through how to optimize that pattern |
THIS ISSUE HAS BEEN REPLACED BY #1608.
WIP: I just got back from a vacation and got to thinking about the event store after getting enough rest for once
Big Existing Issues
Other ideas for improvements
Maybe the async daemon gets completely rewritten with RxExtensions as opposed to the TPL Dataflow. I like the Dataflow lib personally. An easy way to deal with the async daemon and multi-tenancy is to split streams by tenant and use a separate document session per tenant as necessary. Either way, I want the async daemon a bit more optimized for rebuilds and regular projections
Possibly do either a sample app or a pre-built app that hosts the async daemon process. We could go super slim or build out full blown Azure and/or AWS infrastructure for monitoring and maybe an admin UI. Some kind of support for clustering the daemon with failover. Some kind of support for triggering rebuilds? I'm already dreading the arguments over exactly what technology stack to use, but oh well.
Possibly a different pre-built application that incorporates some kind of service bus or queuing mechanism to pipe events captured in the DocumentSession through a listener to a queue where the projections would be built by some kind of lightweight (or just the real thing in a slightly different mode) async daemon. We'd have to deal with some message sequencing to make that work, but it's possible. Not a slam dunk 'cause some projection types have to be singletons because they're stateful
Projection snapshotting. Like maybe you take a snapshot of an aggregate every 5 events and store that so that on-demand aggregations are much faster. Kind of a hybrid between live and on demand. Plenty of folks have asked for this over the years.
Add extra, extending interfaces on top of IProjection that could refine the behavior of the async daemon for better efficiency. Stuff like, "does it need event metadata at all, or just the event data?" or "does it aggregate one stream at a time" that might change how the async daemon would work, especially for rebuilds.
I would like to see us do a full replacement for both the existing Aggregator implementation and possibly ViewProjection. I've got some ideas for this, but haven't written anything down yet. Don't scream at me yet ;)
Possibly do adapters so you can use existing projection libs like Liquid Projections from within Marten
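For the snapshotting idea in the list above ("take a snapshot of an aggregate every 5 events"), here is a minimal Python sketch of the hybrid between live and stored aggregation. `Store`, its in-memory dictionaries, and the integer-sum aggregate are assumptions for illustration; nothing here reflects how Marten stores events or snapshots.

```python
from dataclasses import dataclass, field

SNAPSHOT_EVERY = 5  # the "every 5 events" figure from the idea above


@dataclass
class Store:
    events: dict[str, list[int]] = field(default_factory=dict)
    # stream id -> (version the snapshot covers, aggregate state at that version)
    snapshots: dict[str, tuple[int, int]] = field(default_factory=dict)

    def append(self, stream: str, amount: int) -> None:
        log = self.events.setdefault(stream, [])
        log.append(amount)
        if len(log) % SNAPSHOT_EVERY == 0:
            state, _ = self.aggregate(stream)
            self.snapshots[stream] = (len(log), state)

    def aggregate(self, stream: str) -> tuple[int, int]:
        """Return (state, number of events replayed on top of the snapshot)."""
        version, state = self.snapshots.get(stream, (0, 0))
        tail = self.events.get(stream, [])[version:]  # only events after the snapshot
        return state + sum(tail), len(tail)


store = Store()
for _ in range(7):
    store.append("invoice-1", 1)
state, replayed = store.aggregate("invoice-1")
```

After seven events the live aggregation replays only the two events recorded since the snapshot at version 5, instead of all seven.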