
Isn't replaying domain message events by default slow as hell #144

Closed
justageek opened this issue Nov 8, 2021 · 15 comments

@justageek

As the list of events grows for an aggregate root, loading the aggregate root and having all those events replayed is just going to get slower and slower. If I am missing something, please help me understand.

@frankdejonge
Member

Not really, firstly because PHP is blazing fast. But even when you're slowing down (and I promise you it's not as early as you might think), there is a thing called snapshotting, which effectively allows you to roll a set of events into a combined state and use that as a starting point. In general it's more of a mental safety net; in most cases there is no actual need for one.
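
To make that concrete, here is a minimal sketch of the retrieval flow with a snapshot in play. SnapshotRepository::retrieve() and MessageRepository::retrieveAllAfterVersion() come from the library; the surrounding wiring and variable names are illustrative:

// Sketch: start from the latest snapshot instead of replaying the whole stream.
$snapshot = $snapshotRepository->retrieve($aggregateRootId);

// Replay only the events recorded after the snapshot's version.
$messages = $messageRepository->retrieveAllAfterVersion(
    $aggregateRootId,
    $snapshot ? $snapshot->aggregateRootVersion() : 0,
);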

@justageek
Author

Thanks for the reply. I only ask because in my limited use of the tools it seems we do already notice that loading older aggregate roots is slower than newer ones, although it is hard to measure in our environment. There are any number of reasons why it could be slower, I'm sure. I would love to see an example of persisting a snapshot to MySQL, in case we need to set that up.

@frankdejonge
Member

frankdejonge commented Nov 8, 2021

You can check out the interface provided by the library; it's not difficult to imagine what a MySQL implementation would look like.
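
For illustration, here is a minimal PDO-backed sketch of the library's SnapshotRepository interface. The snapshots table layout and the JSON serialization are assumptions, not part of the library; verify the interface shape against your EventSauce version:

use EventSauce\EventSourcing\AggregateRootId;
use EventSauce\EventSourcing\Snapshotting\Snapshot;
use EventSauce\EventSourcing\Snapshotting\SnapshotRepository;

// Sketch only: assumes a `snapshots` table with columns
// (aggregate_root_id VARCHAR PRIMARY KEY, version INT, state JSON).
final class PdoMysqlSnapshotRepository implements SnapshotRepository
{
    public function __construct(private \PDO $pdo) {}

    public function persist(Snapshot $snapshot): void
    {
        // REPLACE keeps only the latest snapshot per aggregate root.
        $statement = $this->pdo->prepare(
            'REPLACE INTO snapshots (aggregate_root_id, version, state) VALUES (?, ?, ?)'
        );
        $statement->execute([
            $snapshot->aggregateRootId()->toString(),
            $snapshot->aggregateRootVersion(),
            json_encode($snapshot->state()),
        ]);
    }

    public function retrieve(AggregateRootId $id): ?Snapshot
    {
        $statement = $this->pdo->prepare(
            'SELECT version, state FROM snapshots WHERE aggregate_root_id = ?'
        );
        $statement->execute([$id->toString()]);
        $row = $statement->fetch(\PDO::FETCH_ASSOC);

        return $row
            ? new Snapshot($id, (int) $row['version'], json_decode($row['state'], true))
            : null;
    }
}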

@justageek
Author

The aggregate root in question has 12,000+ domain messages, so what number would you consider large enough to slow things down?

@frankdejonge
Member

That's a sizeable amount, although there are cases where even up to 50k you'd still have an acceptable latency. It very much depends on the infrastructure of course and what your latency limits are. Do you have a domain where there is a natural snapshot? In finance you have things like closing the books on a year, which can act as a natural cut-off point and snapshot payload. How much is your case being slowed down?

@justageek
Author

justageek commented Nov 8, 2021

The scenario is a merchant running credit card transactions for a stored customer, and this merchant has been around for a while, so it has definitely doubled the time it takes to load the merchant and run a transaction. I also realize that the underlying code in the aggregate root could be part of the problem; I inherited this code base and will not claim they used your library in a good way.

The aggregate root is part of a Laravel application, and each instance ends up with a number of Laravel collections as private properties. Each collection can hold some number of objects, so this could be slowing things down as well: this merchant has 125 customers, and one Customer data object gets stored in the collection for every "customerWasAdded" event that has been recorded. So far it has been very hard to pin down the bottleneck, but the data is definitely the thing that is different; the same code runs much faster for new merchants with little historical data (events, customers, etc.).

@frankdejonge
Member

Are they using the aggregate root for display purposes? I usually only store the information needed for the next decision, and I doubt they need to re-hydrate the full transaction history to do that. Additionally, I think they probably chose the wrong aggregate root. I tend to limit my aggregate root to what needs internal consistency; I don't expect such far-stretching consistency to be needed. I work in FinTech, so I know something about transaction processing.

@justageek
Author

I do think it was done poorly, and no, it doesn't get used for display. But I think a crap ton of stuff was piled into this one aggregate root that doesn't belong there; all the customer stuff, for example, should be its own agg root. The merchant stores collections of customers, custom data field definitions (used to customize the transaction form), a different thing called info fields (similar to custom fields, just definitions of which fields are used), available subscription plans, available credit card processors (a small list, typically fewer than 10), and buy and sell rates for things they resell. All in all, a bunch of data in lists / collections. Again, I don't know for sure that this is where the bottleneck lies; it's just one possibility.

@frankdejonge
Member

Oh wow, yeah. They basically overloaded that single aggregate with a TON of responsibilities. In general it's better to model things as processes and look for consistency requirements during those processes. If there are no consistency requirements across several of the processes, it's better to split them. Sounds like you've got your work cut out for you. A benefit is that you can decorate your message storage and run a migration in a background process that filters events, effectively separating the processes and splitting the aggregates. It's work, but it's doable.

@justageek
Author

What do you mean by "decorate your message storage"? Sorry if that is a basic question.

@justageek
Author

You are saying to try to write some sort of process that will split merchant logic and data and customer logic and data into 2 agg roots instead of 1? That sounds daunting for sure; I'm not sure I'd even know where to begin.

@justageek
Author

Also, do you think it is fairly straightforward to "convert" an existing aggregate root class to one with snapshots, by changing its interface and adding the new snapshot methods that are needed?
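
From a quick read of the library source, the conversion looks roughly like the sketch below. It assumes the SnapshottingBehaviour trait with createSnapshotState() / reconstituteFromSnapshotState() hooks; the exact names and signatures may differ between EventSauce versions, so treat this as an outline:

use EventSauce\EventSourcing\AggregateRootBehaviour;
use EventSauce\EventSourcing\AggregateRootId;
use EventSauce\EventSourcing\Snapshotting\AggregateRootWithSnapshotting;
use EventSauce\EventSourcing\Snapshotting\SnapshottingBehaviour;

final class Merchant implements AggregateRootWithSnapshotting
{
    use AggregateRootBehaviour;
    use SnapshottingBehaviour;

    private array $customers = [];

    // Roll the current in-memory state into a serializable payload.
    protected function createSnapshotState(): array
    {
        return ['customers' => $this->customers];
    }

    // Rebuild the aggregate from that payload instead of replaying every event.
    protected static function reconstituteFromSnapshotState(
        AggregateRootId $id,
        $state
    ): AggregateRootWithSnapshotting {
        $merchant = new static($id);
        $merchant->customers = $state['customers'];

        return $merchant;
    }
}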

@frankdejonge
Member

frankdejonge commented Nov 9, 2021

I had a night of sleep in between :)

The decoration suggestion would be to compose a repository that reads from old and new while you migrate. Pseudo code would look like this:

$messageRepository = new MessageRepositoryThatTriedOldFirstAndNewLater(
    $oldMessageRepository,
    $newMessageRepository,
);

MessageRepositoryThatTriedOldFirstAndNewLater {
  persist -> only store messages in the new repo
  retrieveAll -> read from old, upon last message, use retrieveAllAfterVersion to retrieve latest from new repository
}

This would allow you to have a background process that only transports related events to a new repository, filtering out and reducing the size for reconstitution.
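
Fleshed out, that decorator might look something like the sketch below. It is written against the library's MessageRepository interface and assumes Message::aggregateVersion() for the cut-off bookkeeping; depending on your EventSauce version, the generator return values may need adjusting:

use EventSauce\EventSourcing\AggregateRootId;
use EventSauce\EventSourcing\Message;
use EventSauce\EventSourcing\MessageRepository;

final class MessageRepositoryThatTriesOldFirstAndNewLater implements MessageRepository
{
    public function __construct(
        private MessageRepository $oldMessageRepository,
        private MessageRepository $newMessageRepository,
    ) {}

    // New messages only ever go to the new repository.
    public function persist(Message ...$messages): void
    {
        $this->newMessageRepository->persist(...$messages);
    }

    // Drain the old repository first, then continue from the new one,
    // using the version of the last old message as the cut-off.
    public function retrieveAll(AggregateRootId $id): Generator
    {
        $lastVersion = 0;

        foreach ($this->oldMessageRepository->retrieveAll($id) as $message) {
            $lastVersion = $message->aggregateVersion();
            yield $message;
        }

        // The caller may read the generator's return value as the final version.
        $finalVersion = yield from $this->newMessageRepository->retrieveAllAfterVersion($id, $lastVersion);

        return max($lastVersion, (int) $finalVersion);
    }

    public function retrieveAllAfterVersion(AggregateRootId $id, int $aggregateRootVersion): Generator
    {
        $lastVersion = $aggregateRootVersion;

        foreach ($this->oldMessageRepository->retrieveAllAfterVersion($id, $aggregateRootVersion) as $message) {
            $lastVersion = $message->aggregateVersion();
            yield $message;
        }

        $finalVersion = yield from $this->newMessageRepository->retrieveAllAfterVersion($id, $lastVersion);

        return max($lastVersion, (int) $finalVersion);
    }
}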

@justageek
Author

Thanks very much for all your input. I guess I have one basic question: is event sourcing useful in a single monolithic application, or is it only really helpful where you are using a message broker servicing multiple microservices? We are pretty much a monolith, and I don't think it is really providing us with any benefits at this time.

@frankdejonge
Member

It's useful in many cases and is not limited to microservices or monoliths. It's especially useful for business processes that are not CRUD, and for things where change over time is an aspect, which is usually the case for any process modelling. If the model is effectively storage-only, or there are not a lot of interesting events, you often end up with ThisWasCreated/Updated/Deleted events, which are generally not so useful.
