Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor backend_xmin #201

Closed
anayrat opened this issue Aug 31, 2018 · 6 comments
Closed

Monitor backend_xmin #201

anayrat opened this issue Aug 31, 2018 · 6 comments
Assignees

Comments

@anayrat
Copy link
Collaborator

anayrat commented Aug 31, 2018

Hi,

When you use hot_standby_feedback you may want to monitor xmin delta between primary and secondary to be alerted in case of long running transaction on secondary which cause bloat on primary.

Regards,

@ioguix ioguix added this to the release 2.5 milestone Jan 29, 2019
@ioguix ioguix removed this from the release 2.5 milestone Dec 19, 2019
@ioguix
Copy link
Member

ioguix commented Dec 19, 2019

Can you elaborate here?
What's the point as the same xact running on the primary would result in the same bloat?
What would be the threshold criteria?
How would you set it?

I would rather add in backends_status the label xact_age (whatever its status) to check long running xact on both primary or secondary.

@anayrat
Copy link
Collaborator Author

anayrat commented Dec 19, 2019

Can you elaborate here?
What's the point as the same xact running on the primary would result in the same bloat?

The issue is similar to have a long running xact on a primary : we can't vacuum tuple. For example if you have a long running pg_dump on a standby server, this could degrade primary performance: index containing old tuple, HOT chain not cleaned etc

I've seen write spikes on primary server when the xact is "freed", backends can prune record by themselves and when you have a big workload this could lead to production incident.

What would be the threshold criteria?
How would you set it?

Hard to say, I mostly depends on your workload. Maybe 100K transactions is enough?

I would rather add in backends_status the label xact_age (whatever its status) to check long running xact on both primary or secondary.

That's a good idea but I think we should also check backend_xmin in pg_stat_replication from primary.

@rjuju
Copy link
Member

rjuju commented Dec 19, 2019

I agree that retained xmin horizon can be a real problem in some environment.

Counter proposal: a new check, say oldest_xmin, that would give the oldest age(xmin) and oldest xmin for usual causes:

  • single query
  • transaction (when xact_start != query_start)
  • 2PC
  • replication slot
  • walsender
  • the overall oldest. This way you can easily know if and why bloat is accumulating

@ioguix
Copy link
Member

ioguix commented Dec 19, 2019

@anayrat @rjuju: Good for me. It would requires some more specs now. And who want to do it now?

@rjuju
Copy link
Member

rjuju commented Dec 19, 2019

I'm volunteering to work on that.

@ioguix
Copy link
Member

ioguix commented Nov 4, 2020

done

@ioguix ioguix closed this as completed Nov 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants