[DPE-2897] Cross-region async replication #447

Merged
marceloneppel merged 9 commits into main from dpe-2897-async-replication
May 3, 2024

Conversation

marceloneppel
Member

@marceloneppel marceloneppel commented Apr 17, 2024

Issue

It is currently not possible to replicate data between clusters deployed in different regions.

Solution

Implement cross-region async replication. This PR is a rebranded and more stable version of #368.

With this PR, it's no longer necessary to remove the relation and relate again when a switchover is needed.

The relation names can also easily be changed, for example to cluster-one and cluster-two, to avoid confusing users.

Important changes:

  • src/relations/async_replication.py contains the logic that makes one cluster the primary and the other the standby. For the standby cluster to follow the primary cluster, the standby candidate needs to be restarted.

    • Most of this logic lives in the _on_async_relation_changed handler, which restarts the databases on the standby cluster's units so that they replicate data from the primary cluster.
  • The 127.0.0.6/32 address added to the Patroni configuration file is needed so that Envoy can let different clusters communicate when Istio is in use.

  • Password updates will be implemented in a separate PR, as this one is already large.

  • If the relation is removed from the standby cluster, that cluster goes into read-only mode and can later be promoted to a normal cluster through the promote-cluster action.
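
As a rough illustration of the 127.0.0.6/32 point above: in an Istio mesh, inbound traffic forwarded by the Envoy sidecar typically reaches the application with 127.0.0.6 as its source address, so PostgreSQL's host-based authentication has to accept that address. The sketch below renders one pg_hba-style rule for it. The helper name, the replication user and database, and the md5 auth method are assumptions made for illustration and are not taken from this PR.

```python
import ipaddress

# Source address the PR adds to the Patroni configuration for Istio/Envoy traffic.
ENVOY_SOURCE_CIDR = "127.0.0.6/32"


def render_pg_hba_rule(
    cidr: str,
    database: str = "replication",
    user: str = "replication",
    method: str = "md5",
) -> str:
    """Render one pg_hba.conf-style host rule (hypothetical helper).

    The default database, user and auth method are assumptions for this
    sketch; the charm's real template may use different values.
    """
    # Validate the CIDR up front so a typo fails here instead of
    # producing a rule that PostgreSQL would reject later.
    network = ipaddress.ip_network(cidr, strict=True)
    return f"host {database} {user} {network} {method}"


rule = render_pg_hba_rule(ENVOY_SOURCE_CIDR)
print(rule)
```

Validating with strict=True means a network such as 127.0.0.6/24, which has host bits set, raises ValueError rather than silently widening the rule.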

How to deploy: https://discourse.charmhub.io/t/charmed-postgresql-k8s-deploy-async-replication/13895
How to trigger a switchover: https://discourse.charmhub.io/t/charmed-postgresql-k8s-deploy-async-replication/13895

Additional instructions:

Integration tests: #448

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…ation

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Contributor

@taurus-forever taurus-forever left a comment

I have finished the initial testing phase; let's handle the remaining UX improvements as follow-ups.

filename = f"{POSTGRESQL_DATA_PATH}-{str(datetime.now()).replace(' ', '-').replace(':', '-')}.tar.gz"
self.container.exec(
    f"tar -zcf {filename} {POSTGRESQL_DATA_PATH}".split()
).wait_output()
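
A possible tidier variant of the snippet above, shown only as a sketch: strftime produces the timestamp directly, and building the argument list explicitly avoids relying on str.split() if a path ever contains whitespace. The POSTGRESQL_DATA_PATH value and the helper name are assumptions for illustration. Unlike str(datetime.now()), this format drops the microseconds.

```python
from datetime import datetime

# Assumed value for illustration; the charm defines its own constant.
POSTGRESQL_DATA_PATH = "/var/lib/postgresql/data/pgdata"


def build_backup_command(data_path: str, now: datetime) -> list[str]:
    """Return the tar argv that archives data_path into a timestamped file."""
    # Only digits and dashes, so the name is safe to use in a file path.
    timestamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    filename = f"{data_path}-{timestamp}.tar.gz"
    # An explicit list keeps each path as a single argument.
    return ["tar", "-zcf", filename, data_path]


command = build_backup_command(POSTGRESQL_DATA_PATH, datetime(2024, 5, 3, 2, 29, 0))
print(command)
```

The returned list has the same shape as what the original passes to self.container.exec, so it could be used there in place of the split string.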
Contributor

Please create follow-up tickets:

  • pack the archive in the background (otherwise demotion to standby will take a LOT of time if the local DB is huge)
  • warn users about existing backup archives they can clean up (to free disk space); goss is a good match here
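
The two follow-ups above could look roughly like the sketch below. It is a local stand-in only: in the charm the archive is created inside the workload container, so a thread pool and the tarfile module here merely illustrate the idea of not blocking on the archive and of listing leftover archives to warn about. All function names and the file naming pattern are hypothetical.

```python
import tarfile
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path

# A single worker is enough: only one archive is expected at a time.
_executor = ThreadPoolExecutor(max_workers=1)


def archive_directory(source: Path, target: Path) -> Path:
    """Pack source into a gzip-compressed tarball at target."""
    with tarfile.open(target, "w:gz") as archive:
        archive.add(source, arcname=source.name)
    return target


def archive_in_background(source: Path, target: Path) -> Future:
    """Start the archive without blocking the caller (hypothetical helper)."""
    return _executor.submit(archive_directory, source, target)


def find_old_archives(directory: Path, prefix: str) -> list[Path]:
    """List archives a user could be warned about to free disk space."""
    return sorted(directory.glob(f"{prefix}-*.tar.gz"))


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    data_dir = base / "pgdata"
    data_dir.mkdir()
    (data_dir / "PG_VERSION").write_text("14\n")

    future = archive_in_background(data_dir, base / "pgdata-2024-05-03.tar.gz")
    archive_path = future.result()  # a real caller would check back later

    archives = [path.name for path in find_old_archives(base, "pgdata")]
    with tarfile.open(archive_path) as archive:
        members = archive.getnames()

print(archives, members)
```

The caller gets a Future back immediately, so whatever triggers the archive can continue and check for completion in a later step.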

Member Author

…ation

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
poetry.lock Outdated
@@ -1645,7 +1645,6 @@ files = [
{file = "PyYAML-6.0.1-cp311-cp311-win_amd64.whl", hash = "sha256:bf07ee2fef7014951eeb99f56f39c9bb4af143d8aa3c21b1677805985307da34"},
{file = "PyYAML-6.0.1-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:855fb52b0dc35af121542a76b9a84f8d1cd886ea97c84703eaa6d88e37a2ad28"},
{file = "PyYAML-6.0.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:40df9b996c2b73138957fe23a16a4f0ba614f4c0efce1e9406a184b6d07fa3a9"},
{file = "PyYAML-6.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a08c6f0fe150303c1c6b71ebcd7213c2858041a7e01975da3a99aed1e7a378ef"},
Contributor

I think you can avoid those unrelated changes by running poetry cache clear PyPI --all and then poetry lock --no-update.

Member Author

Thanks. It did the trick.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel marceloneppel merged commit a0e0562 into main May 3, 2024
45 checks passed
@marceloneppel marceloneppel deleted the dpe-2897-async-replication branch May 3, 2024 02:29
BON4 pushed a commit to BON4/postgresql-k8s-operator that referenced this pull request May 20, 2024
* Add async replication implementation

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Backup standby pgdata folder

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Improve comments and logs

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Remove unused constant

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Remove warning log call and add optional type hint

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Revert poetry.lock

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Revert poetry.lock

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
3 participants