[DPE-9010] fix(backups): reject standby backup actions and harden async replication #1602

Merged: marceloneppel merged 5 commits into 16/edge from fix/1329-standby-backup-jubilant on Apr 14, 2026

@marceloneppel (Member) commented Apr 8, 2026

Issue

In async replication setups, backup actions on standby clusters should fail with clear guidance to run on the primary cluster. Today those actions can still run and fail later with less actionable errors.

Also, create-replication could hit an uncaught StopIteration when the relation exists but no remote units have published addresses yet.
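
For readers unfamiliar with this failure mode, here is a minimal sketch of how taking the first item of an empty collection raises StopIteration, and how a default value avoids it. The function names and the address list are hypothetical and are not taken from the charm code.

```python
from typing import Iterable, Optional


def unguarded_first(addresses: Iterable[str]) -> str:
    # Raises StopIteration when `addresses` is empty, for example when the
    # relation exists but no remote unit has published an address yet.
    return next(iter(addresses))


def first_remote_address(addresses: Iterable[str]) -> Optional[str]:
    # Passing a default to next() returns None instead of raising,
    # so the caller can fail the action with a clear message.
    return next(iter(addresses), None)
```

A caller can then check for `None` and fail early instead of letting the exception propagate out of the action handler.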

Solution

  • Add standby-cluster detection in PostgreSQLBackups and fail early for create-backup, list-backups, and restore with explicit action-specific messages.
  • Harden async replication relation handling by failing early when the relation is missing or has no remote units yet, and by guarding relation-changed re-emission when there are no related units.
  • Add a Juju3/Ceph-backed integration test that verifies backup works on the primary cluster and is rejected on the standby cluster with the expected message.
  • Update the microceph fixture to use a non-loopback host IP for TLS SAN/certificate usage (this allows local testing in more environments).
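
The fail-early behaviour described in the first bullet could look roughly like the sketch below. It is illustrative only: `FakeActionEvent`, `run_backup_action`, and the message wording are assumptions made for this example, not the actual PostgreSQLBackups implementation or its real messages.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeActionEvent:
    """Hypothetical stand-in for an action event with fail/set_results."""

    results: dict = field(default_factory=dict)
    failure: Optional[str] = None

    def fail(self, message: str) -> None:
        self.failure = message

    def set_results(self, results: dict) -> None:
        self.results = results


def standby_rejection_message(action: str) -> str:
    # Assumed wording; the real messages are action-specific in the PR.
    return (
        f"Cannot run {action} on a standby cluster. "
        "Run this action on the primary cluster instead."
    )


def run_backup_action(
    action: str,
    event: FakeActionEvent,
    is_standby: bool,
    handler: Callable[[FakeActionEvent], None],
) -> bool:
    """Reject the action up front on a standby cluster, else run the handler."""
    if is_standby:
        event.fail(standby_rejection_message(action))
        return False
    handler(event)
    return True
```

Checking the cluster role before any backup work starts means the operator sees the guidance immediately, rather than a later and less specific failure.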

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

Fixes #1329.

Fail create-backup, list-backups, and restore on standby clusters with
explicit guidance to run those actions on the primary cluster.

Also guard async replication relation handling when no remote units are
present yet, preventing uncaught StopIteration during create-replication.

Add unit coverage for both fixes, add a Juju3 Ceph-backed integration test
for async replication backup behavior, and switch microceph TLS setup to a
non-loopback host IP.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@github-actions (bot) added the "Libraries: OK" label (The charm libs used are OK and in-sync) on Apr 8, 2026
@codecov (bot) commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 86.20690% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.59%. Comparing base (9af522d) to head (87581fe).
⚠️ Report is 2 commits behind head on 16/edge.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/relations/async_replication.py | 66.66% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           16/edge    #1602      +/-   ##
===========================================
+ Coverage    70.43%   70.59%   +0.16%     
===========================================
  Files           15       15              
  Lines         4282     4309      +27     
  Branches       694      700       +6     
===========================================
+ Hits          3016     3042      +26     
+ Misses        1057     1056       -1     
- Partials       209      211       +2     

@marceloneppel added the "bug" label (Something isn't working as expected) on Apr 8, 2026
@marceloneppel changed the title from "fix(backups): reject standby backup actions and harden async replication" to "[DPE-9010] fix(backups): reject standby backup actions and harden async replication" on Apr 8, 2026
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…ackup-jubilant

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…dentials

Wait for s3-integrator units to exist and agents to be idle after deploy/config
before running sync-s3-credentials, preventing CI races that fail with
"no actions defined on charm".

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
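
The wait described in this commit message can be pictured as a simple polling loop. This is a generic sketch under stated assumptions: `wait_until`, `units_ready`, and the status dictionary shape are hypothetical and do not reflect the helpers the integration tests actually use.

```python
import time
from typing import Callable, Dict


def units_ready(agent_status: Dict[str, str]) -> bool:
    """True when at least one unit exists and every unit agent is idle."""
    return bool(agent_status) and all(
        status == "idle" for status in agent_status.values()
    )


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("units did not become ready in time")
        sleep(interval)
```

Only after `wait_until` returns would the test run the action, which avoids racing against a unit that has not finished deploying.
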
@marceloneppel marked this pull request as ready for review on April 13, 2026 20:24
@marceloneppel requested a review from a team as a code owner on April 13, 2026 20:24
@marceloneppel requested review from carlcsaposs-canonical, dragomirp, juju-charm-bot, and taurus-forever, and removed the request for a team on April 13, 2026 20:24
@taurus-forever (Contributor) left a comment

THANK YOU! ❤️

@marceloneppel merged commit a09d1ef into 16/edge on Apr 14, 2026
697 of 741 checks passed
@marceloneppel deleted the fix/1329-standby-backup-jubilant branch on April 14, 2026 21:11
Labels

  • bug: Something isn't working as expected
  • Libraries: OK: The charm libs used are OK and in-sync
