Feature: The ability to fold historical events into one another (to keep the Event Store size manageable) #179

ReinderReinders · 2023-01-18T18:37:34Z

This one is perhaps not a direct feature request but more of a concept I am thinking about and I thought I might share it here. Perhaps it doesn't really fit into the design and philosophy of Eventuous, but I'm curious what others might think about it.

I am currently building a multi-tenant cloud product that will use an Event Store (PostgreSQL) as one of the components within the system. Multiple applications will have subscriptions to this Event Store (and will also each push Aggregates). The requirements demand that the Event Store is a historical system, since other applications might be added to the product later and these will need to be able to subscribe to the Event Store and build a full read model (including Aggregates that have not been updated in years). This means that I cannot use the 'Archive' feature as it is described in this section of the documentation, as the documentation warns against it:

To take the use case from the documentation as an illustration: certain products might not care about the history (the Booking system is not interested in Bookings from five years ago) but for my product the history is a requirement. But perhaps not the entire history (see below for my idea).
My concern is that, especially because the product will be multi-tenant, the Event Store will grow to a huge size and this will impact performance. This applies especially in the case described above, where a new application is added and has to catch up on the entire history; that could eventually stretch into an import lasting days. We are already discussing mitigating strategies, such as a separate Event Store for each tenant (which is a good idea in any case), but even then I expect a great number of events per tenant. I might further subdivide the data for a single tenant into multiple Event Stores (for instance, splitting the Aggregates 50-50 between two stores), but this would be difficult to configure or implement once the system is in Production. It would also be hard to predict beforehand which domain Aggregates are likely to receive the most events, which makes a 50-50 split basically a shot in the dark. And even with all mitigating strategies in place (one Event Store per tenant, holding only one of several domain sections of an application), the nature of a historical system means that twenty years from now there will still be a huge history.
However, my requirements only demand that a new application must rebuild the current state of the Aggregates; each individual event from five years ago (configurable, of course) is not interesting, only the result (current state). So I am thinking about a concept of 'folding' old events into a single event that contains only the end result at that time.
An example (using Create, Update and Delete events and Entities instead of Aggregates since I don't (yet) use true DDD in the product):

jan 1st, 2020: Entity 1 Created (Name=Demo)
sep 1st, 2021: Entity 1 Updated (Description=Later Update)
jan 1st, 2022: Entity 1 Deleted

Execute the 'fold' over the Event Store with jan 1st, 2022 (midnight) as the parameter provided (i.e. all history before that date may be folded).
Result:

jan 1st, 2022: Entity 1 Created (Name=Demo, Description=Later Update, HistoryFolded=jan 1st, 2022)
jan 1st, 2022 (but later; timestamp excluded for brevity): Entity 1 Deleted

The HistoryFolded field (or something like that) would tell a consuming application that no historical events are known from before jan 1st, 2022. This would be enough for the needs of my product.
The reason I want to retain a Created event is twofold. One, I need a place for the HistoryFolded field. Two, one of my consuming applications is interested in retaining some fields even for deleted Entities, so the user might for instance be shown a view with deleted entries: "Entity 1, named 'Demo', was deleted on jan 1st, 2022." In other words, my consuming applications might still be interested to know that there once existed an Entity named Demo, but that it has since been deleted.
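To make the idea concrete, here is a minimal sketch of such a fold over an in-memory list of events. Everything in it is hypothetical and for illustration only: `EntityEvent`, `EventFolder.Fold` and the string-based `Kind` are not Eventuous types. It also assumes that a Deleted event from before the cutoff is re-emitted right after the folded Created event, which is one possible choice rather than a settled one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event shape for illustration; not an Eventuous type.
public record EntityEvent(
    string EntityId,
    string Kind, // "Created", "Updated" or "Deleted"
    DateTime Timestamp,
    IReadOnlyDictionary<string, string> Fields,
    DateTime? HistoryFolded = null);

public static class EventFolder
{
    // Merges all events older than the cutoff into a single Created event per entity.
    // Events at or after the cutoff are kept unchanged.
    public static List<EntityEvent> Fold(IEnumerable<EntityEvent> events, DateTime cutoff)
    {
        var result = new List<EntityEvent>();

        foreach (var stream in events.GroupBy(e => e.EntityId))
        {
            var ordered = stream.OrderBy(e => e.Timestamp).ToList();
            var old = ordered.Where(e => e.Timestamp < cutoff).ToList();

            if (old.Count > 0)
            {
                // Later values overwrite earlier ones, leaving the state as of the cutoff.
                var fields = new Dictionary<string, string>();
                foreach (var e in old)
                    foreach (var (key, value) in e.Fields)
                        fields[key] = value;

                result.Add(new EntityEvent(stream.Key, "Created", cutoff, fields, HistoryFolded: cutoff));

                // Assumption: an old deletion survives the fold as its own event.
                if (old.Any(e => e.Kind == "Deleted"))
                    result.Add(new EntityEvent(stream.Key, "Deleted", cutoff, new Dictionary<string, string>()));
            }

            result.AddRange(ordered.Where(e => e.Timestamp >= cutoff));
        }

        return result;
    }
}
```

Applied to the three events above with a cutoff of jan 1st, 2022 (midnight), this sketch yields two events: a Created event carrying both Name and Description plus the HistoryFolded marker, followed by the untouched Deleted event.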

I could implement something like this by creating an application that reads and folds events from Store 1 and writes the 'folded' Entities to Store 2, but this would still require downtime in order to switch all applications over from Store 1 to Store 2. Technically you would have to turn the entire system off during the operation to avoid missing new events written to Store 1 after a 'fold' has already been executed; in other words, no application can append events to Store 1 while a fold is occurring. This would not be my chosen solution.

Could Eventuous possibly support something like a folding feature, or have I just pitched one of the cardinal sins of Event Sourcing (I am still learning the concept, and have not read all there is to read about the topic)?
One of the concerns I could identify would be, what would happen to a subscription that is currently (re)building a view model while a stream is being folded? You can't really shut down subscriptions at runtime. So you can't really get around the issue of requiring downtime.

I'm curious to see what others might think about this.

alexeyzimarev · 2023-01-19T09:10:33Z

It might be related to logical or physical snapshots. The logical snapshotting is the easiest to implement as it's just an event. I think you can do it already now. Just fold the state to an event-like object and apply it as normal. Add a handler to the state record to apply the compacted state event to the state itself (basically, replacing all the information in the state).
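As a rough illustration of the logical snapshot described above, the sketch below folds a state into an event-like object and handles that object like any other event. It deliberately uses plain C# records and a `When` method instead of the Eventuous state base class, and all names (`EntityCreated`, `EntityDescribed`, `EntityStateCompacted`, `EntityState`, `ToSnapshot`) are hypothetical.

```csharp
using System;

// Hypothetical events; the names are illustrative and not part of Eventuous.
public record EntityCreated(string Name);
public record EntityDescribed(string Description);

// The "compacted state" event: an event-like object holding the whole state.
public record EntityStateCompacted(string Name, string Description, DateTime CompactedAt);

public record EntityState(string Name = "", string Description = "", DateTime? CompactedAt = null)
{
    // Plain pattern-matching fold, standing in for the state's event handlers.
    public EntityState When(object evt) => evt switch
    {
        EntityCreated e => this with { Name = e.Name },
        EntityDescribed e => this with { Description = e.Description },
        // The snapshot handler replaces all information in the state.
        EntityStateCompacted e => new EntityState(e.Name, e.Description, e.CompactedAt),
        _ => this
    };

    // Folds the current state into the event-like snapshot object.
    public EntityStateCompacted ToSnapshot(DateTime at) => new(Name, Description, at);
}
```

With this shape, a reader that starts from the compacted event ends up with the same state as one that replayed the full history, apart from the `CompactedAt` marker.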

The only missing part is stream truncation, which you'd need to do manually. I still haven't decided how to automate it, but the initial discussion happened in the old issue #82.

mytresor · 2024-03-15T14:13:26Z

There is an issue currently with the snapshotting implementation (like the one you've described). It affects the Optimistic Concurrency. Currently, OC depends on these props:

public int OriginalVersion => Original.Length - 1;
public int CurrentVersion => OriginalVersion + Changes.Count;

As a result, the Aggregate always requires the entire event history to keep OriginalVersion in sync. Even if we have a snapshot event with the current state, we still need to load all of the previous events to get a valid array Length.
We recently needed to implement snapshotting and got stuck at this point.

Maybe it's a good idea to have OriginalVersion virtual or at least internally changeable? This way we can store the position in the snapshot, read only the snapshot and events after the snapshot, and restore the position during the snapshot unfold. It will keep ESDB optimistic concurrency control happy.
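A minimal sketch of what that could look like, outside of the actual Eventuous `Aggregate` class. The `Snapshot` record, `SnapshotAwareAggregate` and `LoadFromSnapshot` are hypothetical names, and the sketch assumes that the snapshot stores the stream position of the snapshot event itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot event. Version is assumed to be the stream position
// of the snapshot event itself.
public record Snapshot(int Version, string State);

public class SnapshotAwareAggregate
{
    readonly List<object> _changes = new();
    int _baseVersion = -1; // position just before the first loaded event
    int _loadedCount;      // number of events read after that position

    // Without a snapshot this equals Original.Length - 1, as in the current code.
    public int OriginalVersion => _baseVersion + _loadedCount;
    public int CurrentVersion => OriginalVersion + _changes.Count;

    // Full replay: every event of the stream was read.
    public void Load(IReadOnlyCollection<object> events)
    {
        _baseVersion = -1;
        _loadedCount = events.Count;
    }

    // Partial replay: only the snapshot and the events after it were read.
    public void LoadFromSnapshot(Snapshot snapshot, IReadOnlyCollection<object> eventsAfter)
    {
        _baseVersion = snapshot.Version;
        _loadedCount = eventsAfter.Count;
    }

    public void Apply(object evt) => _changes.Add(evt);
}
```

For a stream of ten events where the snapshot sits at position 6, a full load and a load from the snapshot plus the three later events both report an OriginalVersion of 9, so the expected version passed to the store stays the same.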
