Akka.Persistence.Sql SqlJournal caching all Persistence Ids in memory does not scale #4524
Comments
SqlJournal is based on the LevelDB one in the JVM, so it definitely has its limitations. I wonder if at this point what you really need is a different journal. @Aaronontheweb this one would be interesting to port in the long run.
@Arkatufus and I have been discussing this recently since we think the current in-memory query architecture is a little.... weird. Looks like the way they do it in the JDBC implementation is to just run a live query and not store anything in memory at all. That makes sense, now that I think about it - the way the PersistenceId queries were implemented for LevelDb was essentially the same model as SQLite - all of the data stored by that journal is local to that node since it all gets persisted on the file system. That approach will not work for long-uptime database implementations. We should probably rewrite this to run the queries against the database and re-model the existing queries accordingly.
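As a rough sketch of the live-query approach described above (using SQLite and a hypothetical `event_journal` table with a `persistence_id` column - the names are illustrative, not the actual Akka.Persistence schema), persistence ids can be streamed from the database in keyset-paginated pages instead of being cached in memory:

```python
import sqlite3

# Hypothetical stand-in for a persistence journal table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_journal ("
    "ordering INTEGER PRIMARY KEY, persistence_id TEXT, payload BLOB)"
)
conn.executemany(
    "INSERT INTO event_journal (persistence_id, payload) VALUES (?, ?)",
    [(f"actor-{i % 5}", b"") for i in range(20)],
)

def all_persistence_ids(conn, page_size=2):
    """Stream distinct persistence ids page by page (keyset pagination),
    so the full id set is never materialized in memory at once."""
    last = ""
    while True:
        rows = conn.execute(
            "SELECT DISTINCT persistence_id FROM event_journal "
            "WHERE persistence_id > ? ORDER BY persistence_id LIMIT ?",
            (last, page_size),
        ).fetchall()
        if not rows:
            return
        for (pid,) in rows:
            yield pid
        last = rows[-1][0]

print(sorted(all_persistence_ids(conn)))
```

The key design point is that each page is fetched "hot off the server", so memory use is bounded by the page size rather than by the total number of unique ids.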
@Arkatufus what do you make of this?
@ismaelhamed @Aaronontheweb I think the JDBC implementation solves at least one other issue: BatchingSQLJournal currently batches both deletes and writes together in the same batch. In SQL Server this leads to a heavy risk of deadlock contention that can do very unkind things to your persisted data. It looks like in this implementation writes are still batched at some level but deletes are kept separate, which is a vast improvement.
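A toy illustration of the separation being described, again using SQLite with an illustrative schema (SQLite does not exhibit SQL Server's lock-escalation behavior, so this only sketches the structure, not the deadlock semantics): writes and deletes each get their own short transaction instead of sharing one batch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_journal ("
    "ordering INTEGER PRIMARY KEY, persistence_id TEXT, sequence_nr INTEGER)"
)

def write_batch(conn, events):
    # Writes are committed together in one short transaction...
    with conn:
        conn.executemany(
            "INSERT INTO event_journal (persistence_id, sequence_nr) VALUES (?, ?)",
            events,
        )

def delete_batch(conn, persistence_id, to_seq_nr):
    # ...while deletes run in a separate transaction, so the two
    # workloads never hold conflicting locks within a single batch.
    with conn:
        conn.execute(
            "DELETE FROM event_journal WHERE persistence_id = ? AND sequence_nr <= ?",
            (persistence_id, to_seq_nr),
        )

write_batch(conn, [("a", 1), ("a", 2), ("a", 3)])
delete_batch(conn, "a", 2)
print(conn.execute("SELECT COUNT(*) FROM event_journal").fetchone()[0])  # prints 1
```

Keeping the transactions short and single-purpose is the general mitigation for the kind of delete/write lock contention mentioned above.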
Yeah, we should implement that change too @to11mtm
I'll try to revamp how
Would appreciate all of your inputs on this implementation.
Definitely take a look at #4531 as a fix for this. I'll be reviewing it today or tomorrow.
Closed via #4531 - should see a version of this in Akka.Persistence.SqlServer shortly after v1.4.10 is released.
Amazing, thank you @Aaronontheweb and especially @Arkatufus for such a quick turnaround on this not-so-trivial problem! Our path to the release is now clear. I am a bit nervous upgrading from v1.3 (especially with the persistence and cluster changes), but hopefully 1.4 is stable by now :) Thanks again!
@ondrejpialek happy to help! Akka.Persistence.SqlServer 1.4.10 will be released with these changes in a few moments: akkadotnet/Akka.Persistence.SqlServer#170
Hello,

I was diagnosing OutOfMemory exceptions we've been seeing recently and discovered that the SqlJournal stores all PersistenceIds in memory. We have over 10GB of data in our event stream, with millions of unique PersistenceIds. At the time of writing, these ids take about 700MB of memory, and I am not sure how long it takes for them to be read from the DB - it must have a noticeable impact on startup time...

I would argue that storing all this data is not scalable. Additionally, it seems that this is only there to support the (All/Live)PersistenceIds queries, which not everyone uses (we don't, for example). I wonder if this could somehow be improved - I have some ideas at the end.

Some background about our setup:
The PersistenceIds on each VM take over 2GB of memory, leaving only 1.5GB for the OS and our user code.

Soon we will be adding one more service type, leading to up to 4 nodes per VM. We cannot do this right now, as we are out of memory already. This new release will also create a new space of persistence IDs with many thousands added in, so it will likely increase the memory needed by at least another 100 MB.
Ideas for addressing this issue:
- Allow this caching to be turned off: queries to get live PersistenceIds will not work, and queries to get all PersistenceIds will read hot off the server.
- Extract the PersistenceId set from SQLJournal (which already feels like not the right place for it) and have it ping a PersistenceIdProvider with every PersistenceId encountered, AND allow the user to replace the default PersistenceIdProvider (that would cache in memory) with one that is, for example, a Cluster Singleton.

I think that overall this functionality can be useful, but since it may not be used by everyone, I feel its cost right now is too high and it should therefore be made opt-in (or at least opt-out).
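The second idea above could be sketched as a small extension point. This is a hypothetical interface, not anything that exists in Akka.Persistence.Sql - the name `PersistenceIdProvider` comes from the proposal itself, and the shape of the API is my assumption:

```python
from abc import ABC, abstractmethod

class PersistenceIdProvider(ABC):
    """Hypothetical extension point: the journal reports every
    persistence id it encounters instead of caching them itself."""

    @abstractmethod
    def notify(self, persistence_id: str) -> None:
        """Called by the journal for each persistence id it sees."""

    @abstractmethod
    def current_ids(self) -> set:
        """Return the ids known to this provider."""

class InMemoryProvider(PersistenceIdProvider):
    """Default provider: cache in memory (the current behavior,
    but now swappable for e.g. a cluster-singleton-backed one)."""

    def __init__(self):
        self._ids = set()

    def notify(self, persistence_id):
        self._ids.add(persistence_id)

    def current_ids(self):
        return set(self._ids)

provider = InMemoryProvider()
for pid in ["a", "b", "a"]:
    provider.notify(pid)
print(sorted(provider.current_ids()))  # prints ['a', 'b']
```

The point of the indirection is that users who never run the (All/Live)PersistenceIds queries could plug in a no-op provider and pay no memory cost at all.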
Is there anything I missed? Depending on your preferred approach to dealing with this, I might submit a PR.
Many thanks,
Ondrej.