fix: prevent stuck cluster when primary database is down but pod is up #2966
Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>
❗ By default, the pull request is configured to backport to all release branches.
Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
/test
@jsilvela, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/6383672533
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
593520e to 91cbd7c
/test
@leonardoce, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/6391306590
#2966)

Fixes a condition that could result in the primary database being down while the corresponding pod is still up. When this happens, the reconciliation of replication slots and managed roles will error out, preventing the instance manager from shutting down the primary and preventing switch-overs and fail-overs from finishing correctly.

Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
(cherry picked from commit 5e36af8)
Fixes a condition that could occur when the primary database
is down but its pod is still up.
In such a case, the reconciliation of replication slots or managed roles
could error out and prevent the instance manager from shutting down
the primary. This would leave the cluster unable to fail over.