Feedback: Physical Cluster Replication - wrong cluster specified for pausing or canceling schedule on changefeeds after cutover #18601

Open
hand-crdb opened this issue May 28, 2024 · 7 comments

@hand-crdb

hand-crdb commented May 28, 2024

Steven Hand (hand-crdb) commented:

Page: https://cockroachlabs.com/docs/v24.1/physical-cluster-replication-overview.html

What is the reason for your feedback?

[ ] Missing the information I need

[ ] Too complicated

[ ] Out of date

[x] Something is broken

[ ] Other

Additional details

The page specifies the wrong cluster on which to pause or cancel the schedule of changefeeds after cutover. It says:

After the cutover process for physical cluster replication, scheduled changefeeds will continue on the promoted cluster. You will need to manage pausing or canceling the schedule on the promoted standby cluster to avoid two clusters running the same changefeed to one sink. #123776

Per the linked issue #123776, it should not say (my emph.):

You will need to manage pausing or canceling the schedule on the promoted standby cluster

Instead, it should say (my emph.):

You will need to manage pausing or canceling the schedule on the original primary cluster

Jira Issue: DOC-10376

@msbutler

@hand-crdb The docs are actually correct. We recommend pausing or cancelling the scheduled changefeed on the newly promoted cluster (backup schedules pause automatically on the newly promoted cluster). PCR was designed under the assumption that the user does not need to do anything on the original source cluster.

After pausing the schedule, to avoid the conflict problem, the user should then decide if they want to create a new schedule on the newly promoted cluster or pause the original source schedule and unpause the schedule they just paused.

We want users to assume all replicated schedules will (or should) be paused. In 24.2, the changefeed schedule on the promoted standby cluster will automatically pause, just like backup schedules.
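
For readers following that workflow, here is a minimal sketch in SQL, assuming a scheduled changefeed was replicated to the newly promoted cluster under the hypothetical label 'orders_changefeed' and the hypothetical schedule ID shown; SHOW SCHEDULES, PAUSE/RESUME/DROP SCHEDULE(S) are CockroachDB's schedule-management statements, but verify the exact syntax and behavior against your version's docs:

```sql
-- On the newly promoted cluster: list schedules to find the replicated
-- changefeed schedule. The label and ID below are hypothetical placeholders.
SHOW SCHEDULES;

-- Pause a single schedule by ID ...
PAUSE SCHEDULE 831076219;

-- ... or pause in bulk by label.
PAUSE SCHEDULES WITH x AS (SHOW SCHEDULES) SELECT id FROM x WHERE label = 'orders_changefeed';

-- Once you decide which cluster should own the changefeed, either resume
-- the paused schedule here ...
RESUME SCHEDULE 831076219;

-- ... or drop it here and leave the schedule running on the other cluster.
DROP SCHEDULE 831076219;
```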

@alicia-l2

@msbutler In a split-brain situation where the primary is still up (maybe a graceful failover scenario), technically changefeeds and backups would still be running as background jobs on the primary, right?

@msbutler

In a split-brain situation where the primary is still up (maybe a graceful failover scenario), technically changefeeds and backups would still be running as background jobs on the primary, right?

Yes, correct.

@alicia-l2

alicia-l2 commented Jun 25, 2024

@kathancox As we discussed in our sync, let's document the two cases: when the primary is down (DR) and when the primary is still up (graceful failover).

@hand-crdb The docs were written from the perspective of the primary cluster being offline, but we realize there may be scenarios where the primary is still running and therefore would still have background jobs. Thanks for calling this out; we will document both scenarios :)

@kathancox
Contributor

@alicia-l2 Just wanted to double-check: aren't we trying to avoid two sets of schedules running? Even in the graceful failover scenario you mention, if you leave the schedules running on both, will there not be conflicts?

@hand-crdb
Author

FYI I added a comment with my perspective on the larger context to #123776 "Update scheduled changefeeds to pause if resumed on a different cluster"

@alicia-l2

@kathancox Yup, especially when the changefeeds are going to the same place, for example. Oh, I see what you mean; maybe we don't need to separate the scenarios, and we could just have a blanket statement saying you don't want two sets of schedules running.
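
To make that blanket statement actionable, here is a quick audit sketch, assuming SHOW SCHEDULES on your version exposes the id, label, schedule_status, and next_run columns (check your version's docs). Run it on both the original primary and the promoted cluster and confirm that, for any given sink, only one of them has the changefeed schedule active:

```sql
-- Run on each cluster (original primary and promoted standby).
-- For a given changefeed sink, only one cluster's schedule should be ACTIVE;
-- the other should be PAUSED or dropped.
SELECT id, label, schedule_status, next_run
  FROM [SHOW SCHEDULES]
 ORDER BY label;
```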
