[bitnami/postgresql-ha] watchdog capabilities #22529

Closed
DorBreger opened this issue Jan 20, 2024 · 8 comments

Comments

@DorBreger
Contributor

DorBreger commented Jan 20, 2024

Name and Version

bitnami/postgresql-ha

What is the problem this feature will solve?

Currently the chart offers a Deployment that can create multiple Pgpool-II instances. Unfortunately, these instances aren't clustered, even though Pgpool-II itself supports clustering. Because the instances don't coordinate, several of them could promote different standby servers to primary when the primary node fails, or a single Pgpool-II instance that loses its connection to the primary because of in-cluster networking problems could act on its own. Either case can lead to a split-brain scenario.

What is the feature you are proposing to solve the problem?

To enable watchdog (Pgpool-II's clustering capability), Pgpool-II will have to switch from a Deployment to a StatefulSet so that each instance has a deterministic DNS name. Alternatively, the choice between Deployment and StatefulSet could be made configurable in values.yaml. I would be very glad to open a PR for the change, as I already have something cooking locally.
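The deterministic DNS names mentioned above follow from how Kubernetes names StatefulSet pods. The Python sketch below is illustrative only and is not part of the chart: the StatefulSet name `pgpool`, the headless Service `pgpool-headless` and the namespace `db` are hypothetical, and it assumes the StatefulSet is governed by a headless Service in a cluster that uses the default `cluster.local` domain. It shows how each Pgpool-II pod could derive the hostnames of its watchdog peers.

```python
def statefulset_pod_hosts(
    statefulset: str,
    headless_service: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",
) -> list[str]:
    """Return the stable DNS name of every pod in a StatefulSet.

    Kubernetes names StatefulSet pods <statefulset>-<ordinal>; with a governing
    headless Service, each pod gets the DNS record
    <pod>.<service>.<namespace>.svc.<cluster-domain>.
    """
    return [
        f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def watchdog_peers(self_ordinal: int, hosts: list[str]) -> list[str]:
    """Hosts of the *other* watchdog nodes, as seen from pod `self_ordinal`."""
    return [host for i, host in enumerate(hosts) if i != self_ordinal]


if __name__ == "__main__":
    # Hypothetical names: StatefulSet "pgpool", headless Service
    # "pgpool-headless", namespace "db", three replicas.
    hosts = statefulset_pod_hosts("pgpool", "pgpool-headless", "db", 3)
    # Peers as seen from pod pgpool-0.
    for host in watchdog_peers(0, hosts):
        print(host)
```

With a Deployment, pod names carry random suffixes, so no such list can be computed ahead of time. Depending on the Pgpool-II version, the watchdog configuration may expect every node (including the local one) or only the other nodes, so `watchdog_peers` may not be needed.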

@github-actions github-actions bot added the triage Triage is needed label Jan 20, 2024
@DorBreger DorBreger changed the title watchdog capabilities [bitnami\postgresql-ha] watchdog capabilities Jan 20, 2024
@DorBreger DorBreger changed the title [bitnami\postgresql-ha] watchdog capabilities [bitnami/postgresql-ha] watchdog capabilities Jan 20, 2024
@github-actions github-actions bot removed the triage Triage is needed label Jan 20, 2024
@github-actions github-actions bot assigned CeliaGMqrz and unassigned carrodher Jan 20, 2024

github-actions bot commented Feb 5, 2024

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Feb 5, 2024
@CeliaGMqrz CeliaGMqrz removed the stale 15 days without activity label Feb 5, 2024
@CeliaGMqrz
Contributor

Hi @DorBreger

Thanks for your feature request!

We are glad to hear that you are willing to open a PR to make this change. It sounds like a great improvement to enable watchdog and I appreciate your efforts to make this happen. The Bitnami team will be happy to review it and provide feedback. Here you can find the contributing guidelines.

@DorBreger
Contributor Author

I have experimented a bit, and I am now wondering whether such a feature is necessary. It seems the pgpools don't promote a new primary themselves; rather, they only look for the new one when the primary fails. Watchdog would mean that only one pgpool is active at a time, so if that pod becomes unavailable for any reason (even during a node drain), the database is inaccessible until failover happens at the pgpool/watchdog level. On the other hand, watchdog would help in some corner cases of network partition. Looking forward to your feedback.


github-actions bot commented Mar 4, 2024

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Mar 4, 2024
@CeliaGMqrz
Contributor

Hi @DorBreger,

Sorry for the delay.

Thanks for your feedback and your initiative to contribute to this feature. To better understand the impact and necessity of this improvement, we need more information about the split-brain scenarios you mentioned, especially in the context of a single Pgpool-II instance.

We acknowledge that this feature would be beneficial to improve high availability. However, if implemented, it should be made optional. Also, we think it's important to include a clear warning about the drawbacks you mentioned, specifically regarding the database becoming unreachable.

We would greatly appreciate it if you could provide us with more details on this matter.

@CeliaGMqrz CeliaGMqrz removed the stale 15 days without activity label Mar 6, 2024
@DorBreger
Contributor Author

Hi @CeliaGMqrz, thank you for responding.
In some configurations, Pgpool-II initiates a failover when it finds the primary backend node unavailable. With watchdog enabled, the pgpools vote on whether the primary node is actually unavailable.
However, in our configuration it seems that repmgr initiates failovers, and when a failover is detected, the pgpool simply looks for the new primary. Therefore, to benefit from watchdog, we would need Pgpool-II, rather than repmgr, to initiate failovers. Otherwise, the current configuration of multiple non-clustered pgpools works well enough, and it has an advantage: if a pod goes down for any reason, even a planned worker outage, the database stays available. In a watchdog configuration, by contrast, the primary pgpool is the only one that can serve the application's database traffic.
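The voting idea can be illustrated with a small, conceptual Python sketch. It is not Pgpool-II's implementation: real watchdog quorum and consensus behavior is governed by Pgpool-II's own configuration and differs in details such as the handling of even node counts. The function names and node names here are hypothetical. The sketch only shows why a majority vote prevents a single pgpool with a broken network path from triggering a failover on its own.

```python
def has_quorum(alive_nodes: int, total_nodes: int) -> bool:
    """A strict majority of watchdog nodes must be reachable."""
    return alive_nodes * 2 > total_nodes


def failover_approved(votes: dict[str, bool], total_nodes: int) -> bool:
    """Decide whether a primary failover may proceed (conceptual model).

    `votes` maps each reachable watchdog node to True if that node reports the
    primary backend as down. A failover is approved only when the cluster has
    quorum and a strict majority of *all* nodes agree the primary is down.
    """
    if not has_quorum(len(votes), total_nodes):
        return False
    down_votes = sum(1 for sees_primary_down in votes.values() if sees_primary_down)
    return down_votes * 2 > total_nodes


if __name__ == "__main__":
    # One pgpool lost its network path to the primary; the others still see it.
    print(failover_approved({"pgpool-0": True, "pgpool-1": False, "pgpool-2": False}, 3))
    # Two of three pgpools agree the primary is down.
    print(failover_approved({"pgpool-0": True, "pgpool-1": True, "pgpool-2": False}, 3))
    # An isolated pgpool cannot reach its peers, so it has no quorum.
    print(failover_approved({"pgpool-0": True}, 3))
```

This only matters when Pgpool-II itself triggers the failover; when repmgr promotes the new primary, the pgpools' vote never decides anything.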

@DorBreger
Contributor Author

I think the other issue I opened, regarding a pgpool exporter, would do more to improve the availability of the services this Helm chart provides. It is fairly easy to implement and I'm willing to do so; I only need some guidance from you.

@CeliaGMqrz
Contributor

Hi @DorBreger

After considering the incompatibilities and discussing them with the team, we have decided to focus on the other open request. Thanks for your feedback.
