Feedback: Physical Cluster Replication - wrong cluster specified for pausing or canceling schedule on changefeeds after cutover #18601

Open
hand-crdb opened this issue May 28, 2024 · 7 comments

@hand-crdb

hand-crdb commented May 28, 2024

Steven Hand (hand-crdb) commented:

Page: https://cockroachlabs.com/docs/v24.1/physical-cluster-replication-overview.html

What is the reason for your feedback?

[ ] Missing the information I need

[ ] Too complicated

[ ] Out of date

[x] Something is broken

[ ] Other

Additional details

The page specifies the wrong cluster on which to pause or cancel the schedule of changefeeds after cutover. It says:

After the cutover process for physical cluster replication, scheduled changefeeds will continue on the promoted cluster. You will need to manage pausing or canceling the schedule on the promoted standby cluster to avoid two clusters running the same changefeed to one sink. #123776

Per the linked issue #123776, it should not say (my emph.):

You will need to manage pausing or canceling the schedule on the promoted standby cluster

Instead, it should say (my emph.):

You will need to manage pausing or canceling the schedule on the original primary cluster

Jira Issue: DOC-10376

@msbutler

@hand-crdb The docs are actually correct. We recommend pausing or cancelling the scheduled changefeed on the newly promoted cluster (backup schedules pause automatically on the newly promoted cluster). PCR was designed under the assumption that the user does not need to do anything on the original source cluster.

After pausing the schedule, to avoid the conflict problem, the user should then decide if they want to create a new schedule on the newly promoted cluster or pause the original source schedule and unpause the schedule they just paused.

We want users to assume all replicated schedules will (or should) be paused. In 24.2, the changefeed schedule on the promoted standby cluster will automatically pause, just like backup schedules.
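
For readers following that workflow, here is a minimal sketch in SQL, assuming a scheduled changefeed was replicated to the newly promoted cluster under the hypothetical label 'orders_changefeed' and the hypothetical schedule ID shown; SHOW SCHEDULES, PAUSE/RESUME/DROP SCHEDULE(S) are CockroachDB's schedule-management statements, but verify the exact syntax and behavior against your version's docs:

```sql
-- On the newly promoted cluster: list schedules to find the replicated
-- changefeed schedule. The label and ID below are hypothetical placeholders.
SHOW SCHEDULES;

-- Pause a single schedule by ID ...
PAUSE SCHEDULE 831076219;

-- ... or pause in bulk by label.
PAUSE SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'orders_changefeed';

-- Once you decide which cluster should own the changefeed, either resume
-- the paused schedule here ...
RESUME SCHEDULE 831076219;

-- ... or drop it here and leave the schedule running on the other cluster.
DROP SCHEDULE 831076219;
```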

@alicia-l2

@msbutler In a split-brain situation where the primary is still up (maybe a graceful failover scenario), technically changefeeds and backups would still be running as background jobs on the primary, right?

@msbutler

In a split-brain situation where the primary is still up (maybe a graceful failover scenario), technically changefeeds and backups would still be running as background jobs on the primary, right?

Yes, correct.

@alicia-l2

alicia-l2 commented Jun 25, 2024

@kathancox As we discussed in our sync, let's document the two cases: when the primary is down (DR) and when the primary is still up (graceful failover).

@hand-crdb The docs were written from the perspective of the primary cluster being offline, but we realize there may be scenarios where the primary is still running and therefore would still have background jobs. Thanks for calling this out; we will document both scenarios :)

@kathancox
Contributor

@alicia-l2 Just wanted to double-check: aren't we trying to avoid two sets of schedules running? Even in the graceful failover scenario you mention, if you leave the schedules running on both, will there not be conflicts?

@hand-crdb
Author

FYI I added a comment with my perspective on the larger context to #123776 "Update scheduled changefeeds to pause if resumed on a different cluster"

@alicia-l2

@kathancox Yup, especially when the changefeeds are going to the same place, for example. Oh, I see what you mean; maybe we don't need to separate the scenarios, and we could just have a blanket statement saying you don't want two sets of schedules running.
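
To make that blanket statement actionable, here is a quick audit sketch, assuming SHOW SCHEDULES on your version exposes the id, label, schedule_status, and next_run columns (check your version's docs). Run it on both the original primary and the promoted cluster and confirm that, for any given sink, only one of them has the changefeed schedule active:

```sql
-- Run on each cluster (original primary and promoted standby).
-- For a given changefeed sink, only one cluster's schedule should be ACTIVE;
-- the other should be PAUSED or dropped.
SELECT id, label, schedule_status, next_run
  FROM [SHOW SCHEDULES]
 ORDER BY label;
```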
