
PostgreSQL database bloat #1254

Closed
OKaluza opened this issue Oct 19, 2022 · 6 comments


@OKaluza
Contributor

OKaluza commented Oct 19, 2022

Hi Piero, just opening this issue to get some help tracking down a couple of related problems we're having. In some cases a processing ODM job gets stuck without throwing an error. It seems to have a few possible causes, often erroneous input data or bad parameters; I'm not exactly sure yet, because it's hard to debug and usually only happens with large datasets. The process just gets stuck doing nothing until killed.
That's the first issue, but I'm not asking for help with it here, as I don't yet have a reliable way of reproducing it.

The more pressing problem is that when this happens, the database can sometimes start growing until it fills the disk and errors out, bringing everything down. To recover, I have to bring the db container back up with a larger volume size, then issue `VACUUM FULL;`, and the db returns to its normal size.
From what I can find, this is caused by TOAST records in pg_toast. My guess is that the app is stuck in a very fast loop of updating the status of the stuck task, and these updates outpace the autovacuum rate, but I'm no DBA.
(https://dba.stackexchange.com/questions/274100/monitoring-for-and-dealing-with-sudden-jump-in-toast-bloat,
https://medium.com/compass-true-north/dealing-with-significant-postgres-database-bloat-what-are-your-options-a6c1814a03a5)
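To help confirm whether TOAST is really what's growing, here is a minimal sketch of a check against the PostgreSQL catalogs. The SQL uses standard catalog functions (`pg_class`, `pg_table_size`); the helper function and the bloat ratio threshold are my own hypothetical additions, and running the query against the actual database connection is left out.

```python
# Hedged sketch: a catalog query to spot tables whose TOAST segment
# dominates their on-disk size, plus a small helper that flags offenders
# from the query results. The 0.5 ratio threshold is an assumption.

TOAST_BLOAT_SQL = """
SELECT c.relname                                                 AS table_name,
       pg_table_size(c.oid)                                      AS table_bytes,
       COALESCE(pg_table_size(NULLIF(c.reltoastrelid, 0)), 0)    AS toast_bytes
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY toast_bytes DESC
LIMIT 20;
"""

def flag_toast_heavy(rows, ratio=0.5):
    """Given (table_name, table_bytes, toast_bytes) rows from the query
    above, return (table_name, toast_bytes) for tables whose TOAST data
    accounts for more than `ratio` of the table's total size."""
    flagged = []
    for name, table_bytes, toast_bytes in rows:
        if table_bytes > 0 and toast_bytes / table_bytes > ratio:
            flagged.append((name, toast_bytes))
    return flagged
```

If the stuck task's table shows up here with its TOAST segment ballooning while total row count stays flat, that would support the rapid-update theory.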

So, have you seen this happen before? Do you think I'm right that rapid updates to the task status are causing this, and if so, could we put some kind of rate limit on status updates?
If not, I'll keep working on a test case that reproduces the underlying issue and report back.
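For what it's worth, here is a minimal sketch of the kind of app-side throttle I have in mind, under the assumption that the bloat comes from the same status row being rewritten in a tight loop. The `ThrottledStatusWriter` class and the `persist` callable are hypothetical, not part of the actual codebase.

```python
import time

class ThrottledStatusWriter:
    """Hedged sketch: persist a task's status only when it has actually
    changed AND a minimum interval has elapsed since the last write, so a
    task stuck in a tight update loop cannot generate dead row versions
    faster than autovacuum can reclaim them.

    In a real implementation, terminal states (e.g. COMPLETED/FAILED)
    should bypass the throttle so the final status is never dropped.
    """
    def __init__(self, persist, min_interval=5.0, clock=time.monotonic):
        self.persist = persist          # hypothetical DB write callable
        self.min_interval = min_interval
        self.clock = clock
        self._last_status = None
        self._last_write = float("-inf")

    def update(self, status):
        now = self.clock()
        changed = status != self._last_status
        due = (now - self._last_write) >= self.min_interval
        if changed and due:
            self.persist(status)
            self._last_status = status
            self._last_write = now
            return True
        return False                    # write suppressed
```

Even a coarse interval like this would cap the dead-tuple churn on the status row to a rate autovacuum can keep up with.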

@pierotofy
Member

Hi @OKaluza,

It looks like there is some information missing from your issue that will be needed in order to diagnose and fix the problem at hand. Please take a look at the Issue Template, which will tell you exactly what your issue has to contain in order to be processable.

Also, double check that this is the right place. If you are just asking for information, reporting feedback or proposing a new feature, the right place to ask is the Community Forum, not here.

I'm marking this one now as needing some more information. Please understand that if you do not provide that information within the next week (until 2022-10-19 23:15) I'll close this issue so it doesn't clutter the bug tracker.

Cheers!
~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.

@Saijin-Naib
Contributor

Hey, @OKaluza !

We're tracking the database bloat issue here:
#1128

I'm not a PGSQL person (geopackage/SQLite) and I never run projects big enough to hit this issue, but I think we can tune some things. We're also looking to make a massive PGSQL database version upgrade in the nearish future that may resolve this behavior as a matter of course, without needing our tuning.

@OKaluza
Contributor Author

OKaluza commented Oct 19, 2022

Thanks @Saijin-Naib, I didn't see that! I'll keep an eye on that issue; great to hear about the pgsql upgrade.
I still think it might be worth rate limiting something on the app side, as I've only had problems when it gets into a confused state. There shouldn't be that many frequent updates to the db, and it usually seems fine with the default autovacuum settings, but some edge case is making it go crazy. I wish I could pin down the cause.
Feel free to close if you like, I can re-open if I find something more specific.

@Saijin-Naib
Contributor

Would you maybe like to add your thoughts/experience from above into Issue #1128 before I close this?

Any more hands-on experience and thoughts on how we can get an STR (Steps To Reproduce) is hugely valuable, as this doesn't get hit all that often by many folks.

@OKaluza
Contributor Author

OKaluza commented Oct 19, 2022

Sure thing, I'll try to duplicate it with a known problem job and collect some more info first.

@pierotofy
Member

Since apparently some of the required information is still missing, I'm closing this now, sorry. Feel free to reopen this or create a new issue providing all required information.
