streamingest: support reversing replication direction #117656
Merged
First couple commits are #117636 so I'd review that PR first.
dt force-pushed the pcr/failback branch 4 times, most recently from ca711ec to 92993b4 (January 11, 2024 14:14)
rebased on #117636 and RFAL
msbutler approved these changes (Jan 12, 2024)
dt force-pushed the pcr/failback branch 2 times, most recently from 7f5a862 to 8e9da4e (January 12, 2024 21:18)
Release note: none. Epic: none.
After promoting a standby that was replicating from some primary to be its own active cluster, turning it into the new primary, it is often desirable to reverse the replication direction, so that changes made to this now-primary cluster are replicated _back_ to the former primary, now operating as a standby.

Turning a formerly active, primary cluster into a replicating standby cluster is particularly common during "failback" flows, where the original primary cluster is returned to primary status after the standby had temporarily been made the active cluster. Re-promoting the primary in such cases requires that it have a virtual cluster that is fully caught up with the promoted standby that is serving traffic, followed by a cutover from that standby back to the primary.

This _could_ be done by creating a completely new virtual cluster in the primary cluster from a replication stream of the temporarily active standby. Just like the creation of a normal secondary replicating cluster, this would start by backfilling all data from the source (the promoted standby) and then continuously apply changes as they are streamed to it.

However, when this is being done on a cluster _that previously was the primary cluster_, that cluster may still have a nearly up-to-date copy of the virtual cluster, missing only those writes that the promoted standby has applied since cutover. In such cases, backfilling a completely new virtual cluster from the promoted standby copies far more data than needed; most of that data is _already on the primary_.

Instead, the new syntax `ALTER VIRTUAL CLUSTER a START REPLICATION FROM a ON x` can be used to indicate that virtual cluster 'a' should be rewound to the time at which virtual cluster 'a' on physical cluster 'x' (the promoted standby) diverged from it.
This will check with cluster x to confirm that its virtual cluster a was indeed replicated from the cluster running the command, and cluster x then communicates the time after which they diverged, i.e. when cluster x was made active and started accepting new writes. The cluster then rewinds its virtual cluster a back to that timestamp and starts replicating from cluster x at that timestamp.

Release note (enterprise change): A virtual cluster that was previously the source for physical cluster replication into a standby in another cluster, where that standby has since been activated, can now be reconfigured to become a standby of the now-promoted cluster. This reverses the direction of the replication stream and reuses the existing data as much as possible.

Epic: CRDB-34233.
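To make the reverse-replication steps above concrete, here is a minimal Go sketch under heavy simplifying assumptions: timestamps are plain integers rather than HLC timestamps, a virtual cluster's data is a toy in-memory multi-version map, and the replication stream is just a slice of events. Every name here (`Store`, `RevertTo`, `ReplicationInfo`, `DivergedAt`, `StartReverseReplication`, the `"cluster-A"` ID) is hypothetical and is not CockroachDB's actual internals. It only models the three steps the description names: confirm the remote virtual cluster was replicated from this cluster, rewind local data to the divergence time, then apply newer changes streamed from the promoted standby.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Timestamp is a hypothetical stand-in for an HLC timestamp.
type Timestamp int64

type version struct {
	ts    Timestamp
	value string
}

// Store is a hypothetical toy multi-version keyspace for one virtual cluster.
type Store struct {
	versions map[string][]version // per key, sorted by ascending timestamp
}

func NewStore() *Store { return &Store{versions: map[string][]version{}} }

// Put records a new version of key at ts.
func (s *Store) Put(key, value string, ts Timestamp) {
	vs := append(s.versions[key], version{ts: ts, value: value})
	sort.Slice(vs, func(i, j int) bool { return vs[i].ts < vs[j].ts })
	s.versions[key] = vs
}

// Get returns the newest version of key, if any.
func (s *Store) Get(key string) (string, bool) {
	vs := s.versions[key]
	if len(vs) == 0 {
		return "", false
	}
	return vs[len(vs)-1].value, true
}

// RevertTo discards every version written after ts, rewinding the keyspace.
func (s *Store) RevertTo(ts Timestamp) {
	for key, vs := range s.versions {
		kept := vs[:0] // filter in place
		for _, v := range vs {
			if v.ts <= ts {
				kept = append(kept, v)
			}
		}
		if len(kept) == 0 {
			delete(s.versions, key)
		} else {
			s.versions[key] = kept
		}
	}
}

// ReplicationInfo is what the promoted standby reports (hypothetical).
type ReplicationInfo struct {
	ReplicatedFrom string    // ID of the physical cluster it was a standby of
	DivergedAt     Timestamp // when it was activated and began accepting writes
}

// Event is one change streamed from the promoted standby.
type Event struct {
	Key, Value string
	TS         Timestamp
}

// StartReverseReplication checks provenance, rewinds local data to the
// divergence time, then applies streamed changes newer than that time.
func StartReverseReplication(local *Store, localID string, remote ReplicationInfo, stream []Event) error {
	if remote.ReplicatedFrom != localID {
		return errors.New("remote virtual cluster was not replicated from this cluster")
	}
	local.RevertTo(remote.DivergedAt)
	for _, e := range stream {
		if e.TS > remote.DivergedAt { // older data is already present locally
			local.Put(e.Key, e.Value, e.TS)
		}
	}
	return nil
}

func main() {
	local := NewStore()
	local.Put("k1", "a1", 1)
	local.Put("k2", "b1", 2)
	local.Put("k3", "c1", 6) // written after divergence; the standby never saw it
	remote := ReplicationInfo{ReplicatedFrom: "cluster-A", DivergedAt: 5}
	stream := []Event{{Key: "k1", Value: "a2", TS: 7}}
	if err := StartReverseReplication(local, "cluster-A", remote, stream); err != nil {
		fmt.Println("error:", err)
		return
	}
	v1, _ := local.Get("k1")
	v2, _ := local.Get("k2")
	_, hasK3 := local.Get("k3")
	fmt.Println(v1, v2, hasK3) // a2 b1 false
}
```

The provenance check comes first because the rewind is only safe when both clusters share the same history up to the divergence point; rewinding an unrelated virtual cluster would not produce a state matching the remote's at that timestamp. In this sketch only `k3`, the write the standby never received, is discarded, and only the one post-divergence change is copied, rather than backfilling every key.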
TFTR! bors r=msbutler
Build succeeded.