fix: prevent stuck cluster when primary database is down but pod is up #2966

Merged: leonardoce merged 2 commits into main from dev/cnp-4198 on Oct 3, 2023

Conversation

@jsilvela jsilvela commented Oct 2, 2023

Fixes a condition that can occur when the primary database
is down but its pod is still up.
In that case, reconciling replication slots or managed roles
could fail with an error and prevent the instance manager from
shutting down the primary, leaving the cluster unable to fail over.

github-actions bot added the backport-requested ◀️ (this pull request should be backported to all supported releases), release-1.19, and release-1.20 labels Oct 2, 2023
github-actions bot commented Oct 2, 2023

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

github-actions bot commented Oct 2, 2023

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this pr, remove the label: backport-requested ◀️ or add the label 'do not backport'
  • To stop backporting this pr to a certain release branch, remove the specific branch label: release-x.y

Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
jsilvela changed the title from "fix: prevent deadlock when database is down" to "fix: prevent deadlock when primary database is down" Oct 2, 2023
jsilvela commented Oct 2, 2023

/test

github-actions bot commented Oct 2, 2023

@jsilvela, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/6383672533

jsilvela changed the title from "fix: prevent deadlock when primary database is down" to "fix: prevent stuck cluster when primary database is down but pod is up" Oct 3, 2023
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
@leonardoce

/test

github-actions bot commented Oct 3, 2023

@leonardoce, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/6391306590

@leonardoce leonardoce merged commit 5e36af8 into main Oct 3, 2023
21 of 22 checks passed
@leonardoce leonardoce deleted the dev/cnp-4198 branch October 3, 2023 12:15
cnpg-bot pushed a commit that referenced this pull request Oct 3, 2023
#2966)

Fixes a condition that could result in the primary database being down
while the corresponding pod is still up.

When this happens, the reconciliation of replication slots and managed
roles will error out, preventing the instance manager from shutting down
the primary and preventing switch-overs and fail-overs from finishing
correctly.

Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>

Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
(cherry picked from commit 5e36af8)
leonardoce added a commit that referenced this pull request Oct 3, 2023
#2966)

Labels: backport-requested ◀️, no-issue, release-1.19, release-1.20
3 participants