[Bug]: pg_rewind: error: restore_command is not set in the target cluster #3698
Comments
Can you please share the content of:
(They are inside PGDATA). It strikes me that you are running with |
I think this is related to #3680 |
I'm fully aware of what |
And what about the former primary? |
They look quite similar. I should note, nothing is particularly custom about my setup. |
Have the same issue after deleting the PostgreSQL pods to update a service account. One of two instances can no longer boot back up. Downgrading cloudnativepg does not solve the problem. |
Seeing the same issue running a custom image (timescale+jsquery+pgsql 16 base / gke 1.26 / cnpg 1.22). The system comes up and runs fine, but once we start failover testing there seems to be a 50/50 chance that a pod becomes unrecoverable. |
This bug is a regression caused by a bad interaction between #3535 and #3545, combined with a bug in pg_rewind's behavior. The patch in #3728 will fix it. We will prepare a patch release next week to address the issue. |
As a workaround until their systems are upgraded, can users enable the ALTER SYSTEM command for now @mnencia? |
@gbartolini, this is correct, but only if the bug has not been hit yet. Once you see the error, the only simple remedy is to delete the node. |
pg_rewind needs to be able to write all the files in the PostgreSQL data directory. For this reason, we always set `postgresql.auto.conf` mode to 600 before running it. After the PostgreSQL data directory is ready to be used, we revert the permission to be coherent with what the user specified in the `enableAlterSystem` configuration parameter.

Closes: #3698

Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
pg_rewind needs to be able to write all the files in the PostgreSQL data directory. For this reason, we always set `postgresql.auto.conf` mode to 600 before running it. After the PostgreSQL data directory is ready to be used, we revert the permission to be coherent with what the user specified in the `enableAlterSystem` configuration parameter.

Closes: #3698

Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>

(cherry picked from commit 9ff7942)
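For readers who want to see the shape of the fix described in the commit message, here is a minimal sketch in Python. It is not the operator's actual code. The helper names (`auto_conf_mode`, `with_writable_auto_conf`), the `action` callback standing in for the pg_rewind invocation, and the read-only mode `0o400` used when `enableAlterSystem` is off are all assumptions made for illustration. Only the `600` mode and the "relax, run, then restore" order come from the commit message.

```python
import os
import stat
import tempfile
from pathlib import Path
from typing import Callable

AUTO_CONF = "postgresql.auto.conf"


def auto_conf_mode(enable_alter_system: bool) -> int:
    """Final mode for postgresql.auto.conf.

    Assumption: read-only (0o400) when ALTER SYSTEM is disabled,
    owner read/write (0o600) when it is enabled.
    """
    return 0o600 if enable_alter_system else 0o400


def with_writable_auto_conf(
    pgdata: str, enable_alter_system: bool, action: Callable[[], None]
) -> None:
    """Make postgresql.auto.conf writable, run `action`, then restore the mode.

    `action` is a hypothetical stand-in for whatever runs pg_rewind.
    """
    path = Path(pgdata) / AUTO_CONF
    os.chmod(path, 0o600)  # pg_rewind must be able to write every file
    try:
        action()
    finally:
        # Design choice of this sketch: restore the mode even if action fails.
        os.chmod(path, auto_conf_mode(enable_alter_system))


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real PGDATA.
    with tempfile.TemporaryDirectory() as pgdata:
        conf = Path(pgdata) / AUTO_CONF
        conf.touch()
        os.chmod(conf, 0o400)  # simulate enableAlterSystem being off

        def fake_rewind() -> None:
            print("during:", oct(stat.S_IMODE(conf.stat().st_mode)))

        with_writable_auto_conf(pgdata, False, fake_rewind)
        print("after: ", oct(stat.S_IMODE(conf.stat().st_mode)))
```

Restoring the mode in a `finally` block is a choice of this sketch. The commit message only says the permission is reverted once the data directory is ready to be used.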
Thank you! @leonardoce @gbartolini @mnencia |
Is there an existing issue already for this bug?
I have read the troubleshooting guide
I am running a supported version of CloudNativePG
Contact Details
skre@skre.me
Version
Helm
0.20.0
What version of Kubernetes are you using?
K8s
1.29.0
What is your Kubernetes environment?
Talos Linux
How did you install the operator?
HelmRelease: https://github.com/buroa/k8s-gitops/blob/master/kubernetes/apps/databases/cloudnative-pg/app/helmrelease.yaml
Cluster: https://github.com/buroa/k8s-gitops/blob/master/kubernetes/apps/databases/cloudnative-pg/cluster/cluster.yaml
What happened?
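After rebooting a node that hosts the primary instance, the instance fails to come back up and reports the error in the title (`pg_rewind: error: restore_command is not set in the target cluster`).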
Cluster resource
No response
Relevant log output
Code of Conduct