PGO v5.3.1, shutdown master DB server, new master in recovery state for 20 min #3673
Comments
Hello @stanleyyzhu, thank you for the detailed description of your scenario. One other thing that might be helpful would be a copy of your PostgresCluster manifest, particularly if you are using any customized Postgres settings, as shown in our documentation.
Thanks a lot @tjmoore4 for your guidance.
@stanleyyzhu Thank you for providing your manifest. Two separate options might be worth exploring in more detail.

Certain Postgres configuration adjustments would likely minimize the cluster's wait time, for example, setting …

That said, a more robust option would be to define another pgBackRest repo in addition to your local repo-host. As described in the "Set Up Multiple Backup Repositories" section of the documentation, this would allow you to store backups in cloud-based storage in addition to your local backups, which I would expect to minimize your recovery time as well.

Hope this helps! Feel free to let us know how any further testing goes.
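For readers who want to try the multi-repo route, here is a minimal sketch of the kind of change involved, assuming the cluster is named vdmfpgo (taken from the pod names later in this thread) and lives in a postgres-operator namespace; the bucket, endpoint, storage size, and secret name are placeholders:

```
# Sketch only: add an S3-compatible repo2 next to the existing PVC-backed repo1.
# A merge patch replaces the whole repos list, so repo1 must be restated too.
cat > repo2-patch.yaml <<'EOF'
spec:
  backups:
    pgbackrest:
      configuration:
      - secret:
          name: pgo-s3-creds        # holds s3.conf with repo2-s3-key / repo2-s3-key-secret
      global:
        repo2-path: /pgbackrest/repo2
      repos:
      - name: repo1                 # the existing local PVC-backed repo
        volume:
          volumeClaimSpec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi
      - name: repo2                 # the new S3-compatible repo
        s3:
          bucket: pgo-backups
          endpoint: s3.example.internal
          region: us-east-1
EOF
kubectl -n postgres-operator patch postgrescluster vdmfpgo \
  --type merge --patch-file repo2-patch.yaml
```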
Many thanks @tjmoore4 for your suggestions, I'll test them.

Regarding multiple repos, I actually thought about this, but it might not work. The reason is that the two repos would be managed by the same repo-host, if my understanding is correct (please correct me if I'm wrong). We are not allowed to use public clouds, but I did a test with two local repos. IMHO one local PV repo + one remote repo would hit the same problem: both repos are managed by the same repo-host (i.e. one pod), and the local repo is a block device with "RWO", so this local PVC cannot be mounted onto multiple instances. When the worker node is down, K8S/OpenShift has no chance to unmount the PVC from the old repo-host, so the old repo-host gets stuck in "Terminating" status. Consequently, K8S/OpenShift cannot create a new pod to take over the repo-host's resources (it doesn't matter whether the 2nd repo is local or remote). Correct me if I'm wrong.

In the meantime, speaking of remote backups, I also have a side question: is there any plan to add SFTP as an option, or is there a reason PGO doesn't want to use SFTP? I think it could be a useful supplement for environments where public cloud is not allowed.

Thanks a lot for your help
This is a quick update: I have changed … The difficulty here is that the log doesn't show me what is running during the 20 min, so it's hard to find the right parameter to adjust :(
Logs from the new master:
Hello @stanleyyzhu. Sorry to hear that your adjustments so far haven't resulted in any improvement. A couple of thoughts to consider: you mentioned that public cloud solutions are not an option, but would a self-hosted S3-compatible system (like MinIO) be an option for testing? That should also allow for backups without requiring an additional PVC. Also, we were wondering: in your original scenario, what did your Pods and Nodes look like when things started working again and your primary was back up?
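For anyone who wants to reproduce this, a throwaway MinIO test instance can be stood up with something like the following sketch; the namespace and names are assumptions, and this is deliberately not production-grade (default credentials, no TLS, no persistent storage):

```
# Disposable MinIO deployment for testing an S3-compatible pgBackRest repo.
kubectl -n postgres-operator create deployment minio \
  --image=quay.io/minio/minio -- minio server /data
kubectl -n postgres-operator expose deployment minio --port=9000
# pgBackRest would then use endpoint minio.postgres-operator.svc:9000; for a
# test setup like this, the options repo2-s3-uri-style=path and
# repo2-storage-verify-tls=n are typically also needed.
```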
Hi @tjmoore4, sorry for the delay; I was tied up. I'll deploy a MinIO server and share the test results ASAP (please bear with me, as I have no experience with MinIO and am not sure how easy it will be to bring one up). And thanks a lot for introducing MinIO, I didn't know we could deploy an S3-compatible system locally. About the pods/nodes allocation and status changes:
Below is the test result.

Pod allocation before shutting down a worker node:
The master DB instance and the local repo-host were assigned to w03. Then I shut down the w03 worker node at 21:53:36 UTC (the w03 worker node "STATUS" became "NotReady").
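The node and pod states described in these steps are the kind of thing visible with standard commands like the following (namespace is an assumption):

```
kubectl get nodes                              # w03 shows STATUS "NotReady"
kubectl -n postgres-operator get pods -o wide  # shows which node each pod runs on
```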
PGO detected that the master DB was unreachable, and instance "vdmfpgo-instance-wwq8-0" was promoted to the new master.
At this point, the new master DB was read-only because it was trying to connect to the repo-host, which was also unreachable.
About 5-6 min later (~21:59 UTC), OpenShift started trying to terminate the pods running on the w03 worker node.
This status remained forever (below I will post the status after waiting for 1 hour). About 20 min after shutting down the w03 worker node (~22:14 UTC), the new master DB became read-write (i.e. recovered).
The pods with "STATUS=Terminating" remained, and I double-checked the new master DB logs: the "unable to find a valid repository" error still popped up, confirming that the local repo-host was still down.
I left the system there for about 1 hour and checked again at about 23:17 UTC: the pods with "STATUS=Terminating" were still there, and the local repo-host was still down.
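As a side note (not something tried in this thread): the usual, if blunt, escape hatch for pods stuck in "Terminating" on a dead node is a force delete, which removes the pod object so the StatefulSet can recreate it once the RWO volume can be reattached:

```
# Use with care: the kubelet on the dead node can no longer confirm
# that the container actually stopped.
kubectl -n postgres-operator delete pod vdmfpgo-repo-host-0 \
  --force --grace-period=0
```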
Then I started up the worker node w03; in less than 1 min, those "STATUS=Terminating" pods were all recreated:
And then everything went back to normal.
@stanleyyzhu Thank you for the detailed information, this has been very helpful in understanding your scenario. We've done a bit of testing, but we haven't been able to replicate your exact scenario. One thing we were curious about was the describe output from your repo-host Pod after you shut down your w03 node, i.e. …
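The requested command was presumably along these lines (pod name from earlier in the thread; namespace is an assumption):

```
kubectl -n postgres-operator describe pod vdmfpgo-repo-host-0
```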
Thank you very much @tjmoore4 and the PGO team. Below are the describe outputs of repo-host-0 at different stages. Since you mentioned you couldn't replicate the scenario, I'm wondering whether your repo-host got stuck in "Terminating" status? Basically, my OpenShift environment uses "ocs-storagecluster-ceph-rbd" as the storageclass, and the access mode is "ReadWriteOnce". BTW, it might be unfair to let you and your team spend too much time on this (as this is the community version). I'll be more than happy to continue testing and report back if your team is interested in this scenario. But if your team doesn't have much time, that's totally understandable and I'm OK as well (at least the service comes back after about 20 min).
Summary of what I saw from the describe of repo-host-0

After shutting down the worker node, repo-host-0 popped up a warning event "NodeNotReady", and "Conditions.Ready" became "False".

Test procedure and test result

Before I shut down any worker node, the master DB pod and repo-host pod were running on w01.
At this moment, the describe output of repo-host was:
Then I shut down w01 and waited about 30 sec; the describe output of repo-host was:
Then I checked the describe again about 4 min later (before OpenShift terminated repo-host-0): same as the previous describe.
About 5-6 min later, OpenShift changed the repo-host-0 status to "Terminating".
At this moment, the describe of repo-host-0 was:
About 10 min later, I checked the describe of repo-host-0 again:
About 20 min later, when the new DB primary became read-write, I checked the describe of repo-host-0 again:
Hi @stanleyyzhu, as we haven't received any updates on this issue for some time, we are closing it now. If you require further assistance, or if this issue is still relevant to the latest CPK v5 release, please feel free to reopen this issue or ask a question in our Discord server. For additional information about Crunchy Postgres for Kubernetes v5, including guidance for upgrading to the latest version of CPK v5, please refer to the latest documentation: https://access.crunchydata.com/documentation/postgres-operator/latest/
Problem description
PGO v5.3.1: three worker nodes run 3 Postgres DB pods + 1 local backup repository, so the local backup repository and one of the 3 DB pods must be running on the same worker node.
When I shut down the worker node where the master Postgres DB is running (to simulate a power outage or a network issue), and the backup repo pod also runs on that worker node, the backup repo goes out of service as well. In this situation, the newly promoted master stays in recovery mode for 20 min.
My question is whether this 20 min can be reduced.
I understand that the best solution is to have one more worker node, so the Postgres DB pods and the backup repo pod can be deployed to independent worker nodes; then shutting down the worker node with the master DB won't impact the backup repo.
However, this is not doable in my environment. So the alternative could be: can I control a timer or threshold so that the newly promoted master DB pod stops waiting to reconnect to the backup repo sooner?
Environment
Steps to reproduce the issue
PostgreSQL DB has 3 replicas + 1 local backup repo running on 3 worker nodes
The master DB (jltz) is running on w03, which also has the backup repo (vdmfpgo-repo-host-0) running on it.
Shut down the w03 worker node at 14:11:31 UTC, wait a bit, and check pod status again:
The Pods' "STATUS" has not changed yet (that takes about 5 min, which is normal), but the old master DB pod (jltz) and the backup repo (vdmfpgo-repo-host-0) are already unavailable at this point. The newly promoted master is vdmfpgo-instance-kpmd-0.
Log in to the new master pod (kpmd); "pg_is_in_recovery()" returns t (i.e. the DB is in recovery).
Kept checking the state of the newly promoted DB; "pg_is_in_recovery()" turned to "f" after about 20 min.
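One way to run this check from outside the pod (the namespace is an assumption; in CPK v5 the Postgres container in an instance pod is named "database"):

```
kubectl -n postgres-operator exec vdmfpgo-instance-kpmd-0 -c database \
  -- psql -c "SELECT pg_is_in_recovery();"
```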
Check the logs on the new master DB pod (kpmd):
14:11:45 UTC: the w03 worker node was shut down, so the connection to the primary DB was terminated.
Then this pod received a promote request.
However, from the DB log, I can see that the DB on this pod was actually still trying to find a valid backup repository.
At 14:31:39, the new master pod completed the archive recovery (it looks like a timeout, after which it stopped waiting for a valid backup repo), and the DB on this pod changed to "read-write" mode.
If the backup repo is online (i.e. the backup repo runs on a different worker node from the old master DB), then DB failover is fairly quick.