diff --git a/src/current/_data/redirects.yml b/src/current/_data/redirects.yml
index 74bc3ffcd40..74b54ef703f 100644
--- a/src/current/_data/redirects.yml
+++ b/src/current/_data/redirects.yml
@@ -233,11 +233,11 @@
- destination: dev/failover-replication.md
sources: ['cutover-replication.md']
- versions: ['24.3']
+ versions: ['24.3', '24.1']
- destination: failover-replication.md
sources: ['cutover-replication.md']
- versions: ['24.3']
+ versions: ['24.3', '24.1']
- destination: fips.md
sources: ['fips-compliance.md']
diff --git a/src/current/_includes/v24.1/known-limitations/pcr-scheduled-changefeeds.md b/src/current/_includes/v24.1/known-limitations/pcr-scheduled-changefeeds.md
index 31fbf83187c..3d6b8aa8628 100644
--- a/src/current/_includes/v24.1/known-limitations/pcr-scheduled-changefeeds.md
+++ b/src/current/_includes/v24.1/known-limitations/pcr-scheduled-changefeeds.md
@@ -1 +1 @@
-After the [cutover process]({% link {{ page.version.version }}/cutover-replication.md %}) for [physical cluster replication]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}), [scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the promoted standby cluster to avoid two clusters running the same changefeed to one sink. [#123776](https://github.com/cockroachdb/cockroach/issues/123776)
\ No newline at end of file
+After the [failover process]({% link {{ page.version.version }}/failover-replication.md %}) for [physical cluster replication]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}), [scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the promoted standby cluster to avoid two clusters running the same changefeed to one sink. [#123776](https://github.com/cockroachdb/cockroach/issues/123776)
\ No newline at end of file
diff --git a/src/current/_includes/v24.1/known-limitations/physical-cluster-replication.md b/src/current/_includes/v24.1/known-limitations/physical-cluster-replication.md
index 7deb5acd139..4fe4c51174c 100644
--- a/src/current/_includes/v24.1/known-limitations/physical-cluster-replication.md
+++ b/src/current/_includes/v24.1/known-limitations/physical-cluster-replication.md
@@ -1,5 +1,4 @@
- Physical cluster replication is supported in CockroachDB {{ site.data.products.core }} clusters on v23.2 or later. The primary cluster can be a [new]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-1-create-the-primary-cluster) or [existing]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#set-up-pcr-from-an-existing-cluster) cluster. The standby cluster must be a [new cluster started with the `--virtualized-empty` flag]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-2-create-the-standby-cluster).
-- Read queries are not supported on the standby cluster before [cutover]({% link {{ page.version.version }}/cutover-replication.md %}).
-- The primary and standby clusters must have the same [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}).
-- Before cutover to the standby, the standby cluster does not support running [backups]({% link {{ page.version.version }}/backup-and-restore-overview.md %}) or [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}).
+- Read queries are not supported on the standby cluster before [failover]({% link {{ page.version.version }}/failover-replication.md %}).
+- In CockroachDB {{ site.data.products.core }}, the primary and standby clusters must have the same [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) in order to respect data placement configurations.
diff --git a/src/current/_includes/v24.1/physical-replication/fast-cutback-latest-timestamp.md b/src/current/_includes/v24.1/physical-replication/fast-cutback-latest-timestamp.md
index 51dfd6d90dc..a950221bb9b 100644
--- a/src/current/_includes/v24.1/physical-replication/fast-cutback-latest-timestamp.md
+++ b/src/current/_includes/v24.1/physical-replication/fast-cutback-latest-timestamp.md
@@ -1 +1 @@
-When you [cut back]({% link {{ page.version.version }}/cutover-replication.md %}#cutback) to a cluster that was previously the primary cluster, you should cut over to the `LATEST` timestamp. Using a [historical timestamp]({% link {{ page.version.version }}/as-of-system-time.md %}) may lead to the cutback failing. {% if page.name == "cutover-replication.md" %} Refer to the [PCR known limitations]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}#known-limitations).{% endif %}
+When you [fail back]({% link {{ page.version.version }}/failover-replication.md %}#failback) to a cluster that was previously the primary cluster, you should fail over to the `LATEST` timestamp. Using a [historical timestamp]({% link {{ page.version.version }}/as-of-system-time.md %}) may lead to the failback failing. {% if page.name == "failover-replication.md" %} Refer to the [PCR known limitations]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}#known-limitations).{% endif %}
diff --git a/src/current/_includes/v24.1/physical-replication/interface-virtual-cluster.md b/src/current/_includes/v24.1/physical-replication/interface-virtual-cluster.md
index 02890c3fc83..6bfae39096e 100644
--- a/src/current/_includes/v24.1/physical-replication/interface-virtual-cluster.md
+++ b/src/current/_includes/v24.1/physical-replication/interface-virtual-cluster.md
@@ -1,2 +1,2 @@
- The system virtual cluster manages the cluster's control plane and the replication of the cluster's data. Admins connect to the system virtual cluster to configure and manage the underlying CockroachDB cluster, set up PCR, create and manage a virtual cluster, and observe metrics and logs for the CockroachDB cluster and each virtual cluster.
-- Each other virtual cluster manages its own data plane. Users connect to a virtual cluster by default, rather than the system virtual cluster. To connect to the system virtual cluster, the connection string must be modified. Virtual clusters contain user data and run application workloads. When PCR is enabled, the non-system virtual cluster on both primary and secondary clusters is named `main`.
+- The application virtual cluster manages the cluster's data plane. Application virtual clusters contain user data and run application workloads.
diff --git a/src/current/_includes/v24.1/physical-replication/retention.md b/src/current/_includes/v24.1/physical-replication/retention.md
index ed2089bc033..303fe6ebc79 100644
--- a/src/current/_includes/v24.1/physical-replication/retention.md
+++ b/src/current/_includes/v24.1/physical-replication/retention.md
@@ -1 +1 @@
-We do not recommend setting `RETENTION` much higher than the 24-hour default on the standby cluster. Accumulated data from an excessive [retention (cutover) window]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#cutover-and-promotion-process) could affect queries running on the standby cluster that is active following a [cutover]({% link {{ page.version.version }}/cutover-replication.md %}).
\ No newline at end of file
+We do not recommend setting `RETENTION` much higher than the 24-hour default on the standby cluster. Accumulated data from an excessive [retention (failover) window]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process) could affect queries running on the standby cluster that is active following a [failover]({% link {{ page.version.version }}/failover-replication.md %}).
\ No newline at end of file
diff --git a/src/current/_includes/v24.1/physical-replication/show-virtual-cluster-responses.md b/src/current/_includes/v24.1/physical-replication/show-virtual-cluster-responses.md
index 545fc73058c..97c962a2547 100644
--- a/src/current/_includes/v24.1/physical-replication/show-virtual-cluster-responses.md
+++ b/src/current/_includes/v24.1/physical-replication/show-virtual-cluster-responses.md
@@ -2,14 +2,14 @@ Field | Response
---------+----------
`id` | The ID of a virtual cluster.
`name` | The name of the standby (destination) virtual cluster.
-`data_state` | The state of the data on a virtual cluster. This can show one of the following: `initializing replication`, `ready`, `replicating`, `replication paused`, `replication pending cutover`, `replication cutting over`, `replication error`. Refer to [Data state](#data-state) for more detail on each response.
+`data_state` | The state of the data on a virtual cluster. This can show one of the following: `initializing replication`, `ready`, `replicating`, `replication paused`, `replication pending failover`, `replication failing over`, `replication error`. Refer to [Data state](#data-state) for more detail on each response.
`service_mode` | The service mode shows whether a virtual cluster is ready to accept SQL requests. This can show `none` or `shared`. When `shared`, a virtual cluster's SQL connections will be served by the same nodes that are serving the system virtual cluster.
`source_tenant_name` | The name of the primary (source) virtual cluster.
`source_cluster_uri` | The URI of the primary (source) cluster. The standby cluster connects to the primary cluster using this URI when [starting a replication stream]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-4-start-replication).
-`replicated_time` | The latest timestamp at which the standby cluster has consistent data — that is, the latest time you can cut over to. This time advances automatically as long as the replication proceeds without error. `replicated_time` is updated periodically (every `30s`).
-`retained_time` | The earliest timestamp at which the standby cluster has consistent data — that is, the earliest time you can cut over to.
+`replicated_time` | The latest timestamp at which the standby cluster has consistent data — that is, the latest time you can fail over to. This time advances automatically as long as the replication proceeds without error. `replicated_time` is updated periodically (every `30s`).
+`retained_time` | The earliest timestamp at which the standby cluster has consistent data — that is, the earliest time you can fail over to.
`replication_lag` | The time between the most up-to-date replicated time and the actual time. Refer to the [Technical Overview]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) for more detail.
-`cutover_time` | The time at which the cutover will begin. This can be in the past or the future. Refer to [Cut over to a point in time]({% link {{ page.version.version }}/cutover-replication.md %}#cut-over-to-a-point-in-time).
-`status` | The status of the replication stream. This can show one of the following: `initializing replication`, `ready`, `replicating`, `replication paused`, `replication pending cutover`, `replication cutting over`, `replication error`. Refer to [Data state](#data-state) for more detail on each response.
+`failover_time` | The time at which the failover will begin. This can be in the past or the future. Refer to [Fail over to a point in time]({% link {{ page.version.version }}/failover-replication.md %}#fail-over-to-a-point-in-time).
+`status` | The status of the replication stream. This can show one of the following: `initializing replication`, `ready`, `replicating`, `replication paused`, `replication pending failover`, `replication failing over`, `replication error`. Refer to [Data state](#data-state) for more detail on each response.
`capability_name` | The [capability]({% link {{ page.version.version }}/create-virtual-cluster.md %}#capabilities) name.
`capability_value` | Whether the [capability]({% link {{ page.version.version }}/create-virtual-cluster.md %}#capabilities) is enabled for a virtual cluster.
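+
+For example, a minimal sketch of retrieving these fields, assuming a standby virtual cluster named `main` as used elsewhere in these docs:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Run from the standby cluster's SQL shell to inspect replication status.
+SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
+~~~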
diff --git a/src/current/_includes/v24.1/sidebar-data/self-hosted-deployments.json b/src/current/_includes/v24.1/sidebar-data/self-hosted-deployments.json
index 8722b6ebc2a..d3fb4a6e8e6 100644
--- a/src/current/_includes/v24.1/sidebar-data/self-hosted-deployments.json
+++ b/src/current/_includes/v24.1/sidebar-data/self-hosted-deployments.json
@@ -655,9 +655,9 @@
]
},
{
- "title": "Cut Over from a Primary to a Standby Cluster",
+ "title": "Fail Over from a Primary to a Standby Cluster",
"urls": [
- "/${VERSION}/cutover-replication.html"
+ "/${VERSION}/failover-replication.html"
]
},
{
diff --git a/src/current/_includes/v24.3/known-limitations/physical-cluster-replication.md b/src/current/_includes/v24.3/known-limitations/physical-cluster-replication.md
index 5131f578df1..243dbed8fe5 100644
--- a/src/current/_includes/v24.3/known-limitations/physical-cluster-replication.md
+++ b/src/current/_includes/v24.3/known-limitations/physical-cluster-replication.md
@@ -1,4 +1,5 @@
-- Physical cluster replication is supported in CockroachDB {{ site.data.products.core }} clusters on v23.2 or later. The primary cluster can be a [new]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-1-create-the-primary-cluster) or [existing]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#set-up-pcr-from-an-existing-cluster) cluster. The standby cluster must be a [new cluster started with the `--virtualized-empty` flag]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-2-create-the-standby-cluster).
-- The primary and standby clusters must have the same [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}).
-- Before failover to the standby, the standby cluster does not support running [backups]({% link {{ page.version.version }}/backup-and-restore-overview.md %}) or [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}).
+- Physical cluster replication is supported in:
+ - CockroachDB {{ site.data.products.core }} clusters on v23.2 or later. The primary cluster can be a [new]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-1-create-the-primary-cluster) or [existing]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#set-up-pcr-from-an-existing-cluster) cluster. The standby cluster must be a [new cluster started with the `--virtualized-empty` flag]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-2-create-the-standby-cluster).
+ - [CockroachDB {{ site.data.products.advanced }} clusters]({% link cockroachcloud/physical-cluster-replication.md %}) on v24.3 or later.
+- In CockroachDB {{ site.data.products.core }}, the primary and standby clusters must have the same [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) in order to respect data placement configurations.
diff --git a/src/current/_includes/v24.3/physical-replication/interface-virtual-cluster.md b/src/current/_includes/v24.3/physical-replication/interface-virtual-cluster.md
index 02890c3fc83..6bfae39096e 100644
--- a/src/current/_includes/v24.3/physical-replication/interface-virtual-cluster.md
+++ b/src/current/_includes/v24.3/physical-replication/interface-virtual-cluster.md
@@ -1,2 +1,2 @@
- The system virtual cluster manages the cluster's control plane and the replication of the cluster's data. Admins connect to the system virtual cluster to configure and manage the underlying CockroachDB cluster, set up PCR, create and manage a virtual cluster, and observe metrics and logs for the CockroachDB cluster and each virtual cluster.
-- Each other virtual cluster manages its own data plane. Users connect to a virtual cluster by default, rather than the system virtual cluster. To connect to the system virtual cluster, the connection string must be modified. Virtual clusters contain user data and run application workloads. When PCR is enabled, the non-system virtual cluster on both primary and secondary clusters is named `main`.
+- The application virtual cluster manages the cluster's data plane. Application virtual clusters contain user data and run application workloads.
diff --git a/src/current/_includes/v25.2/known-limitations/physical-cluster-replication.md b/src/current/_includes/v25.2/known-limitations/physical-cluster-replication.md
index fd48773c4b2..bcb768cbc87 100644
--- a/src/current/_includes/v25.2/known-limitations/physical-cluster-replication.md
+++ b/src/current/_includes/v25.2/known-limitations/physical-cluster-replication.md
@@ -1,6 +1,4 @@
- Physical cluster replication is supported in:
- CockroachDB {{ site.data.products.core }} clusters on v23.2 or later. The primary cluster can be a [new]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-1-create-the-primary-cluster) or [existing]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#set-up-pcr-from-an-existing-cluster) cluster. The standby cluster must be a [new cluster started with the `--virtualized-empty` flag]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-2-create-the-standby-cluster).
- - [CockroachDB {{ site.data.products.advanced }} in clusters]({% link cockroachcloud/physical-cluster-replication.md %}) on v24.3 or later.
-- The primary and standby clusters must have the same [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) in CockroachDB self-hosted.
-- The primary and standby clusters must have the same [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}).
-- Before failover to the standby, the standby cluster does not support running [backups]({% link {{ page.version.version }}/backup-and-restore-overview.md %}) or [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}).
+ - [CockroachDB {{ site.data.products.advanced }} clusters]({% link cockroachcloud/physical-cluster-replication.md %}) on v24.3 or later.
+- In CockroachDB {{ site.data.products.core }}, the primary and standby clusters must have the same [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) in order to respect data placement configurations.
diff --git a/src/current/_includes/v25.2/physical-replication/interface-virtual-cluster.md b/src/current/_includes/v25.2/physical-replication/interface-virtual-cluster.md
index 02890c3fc83..6bfae39096e 100644
--- a/src/current/_includes/v25.2/physical-replication/interface-virtual-cluster.md
+++ b/src/current/_includes/v25.2/physical-replication/interface-virtual-cluster.md
@@ -1,2 +1,2 @@
- The system virtual cluster manages the cluster's control plane and the replication of the cluster's data. Admins connect to the system virtual cluster to configure and manage the underlying CockroachDB cluster, set up PCR, create and manage a virtual cluster, and observe metrics and logs for the CockroachDB cluster and each virtual cluster.
-- Each other virtual cluster manages its own data plane. Users connect to a virtual cluster by default, rather than the system virtual cluster. To connect to the system virtual cluster, the connection string must be modified. Virtual clusters contain user data and run application workloads. When PCR is enabled, the non-system virtual cluster on both primary and secondary clusters is named `main`.
+- The application virtual cluster manages the cluster's data plane. Application virtual clusters contain user data and run application workloads.
diff --git a/src/current/v24.1/alter-virtual-cluster.md b/src/current/v24.1/alter-virtual-cluster.md
index e95f81fbd45..4dcd6b29193 100644
--- a/src/current/v24.1/alter-virtual-cluster.md
+++ b/src/current/v24.1/alter-virtual-cluster.md
@@ -9,7 +9,7 @@ docs_area: reference.sql
{% include feature-phases/preview.md %}
{{site.data.alerts.end}}
-The `ALTER VIRTUAL CLUSTER` statement initiates a [_cutover_](#start-the-cutover-process) or [_cutback_](#start-the-cutback-process) in a [**physical cluster replication (PCR)** job]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}) and manages a virtual cluster.
+The `ALTER VIRTUAL CLUSTER` statement initiates a [_failover_](#start-the-failover-process) or [_failback_](#start-the-failback-process) in a [**physical cluster replication (PCR)** job]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}) and manages a virtual cluster.
{% include {{ page.version.version }}/physical-replication/phys-rep-sql-pages.md %}
@@ -40,9 +40,9 @@ Parameter | Description
`virtual_cluster_spec` | The virtual cluster's name.
`PAUSE REPLICATION` | Pause the replication stream.
`RESUME REPLICATION` | Resume the replication stream.
-`COMPLETE REPLICATION TO` | Set the time to complete the replication. Use:
- `SYSTEM TIME` to specify a [timestamp]({% link {{ page.version.version }}/as-of-system-time.md %}). Refer to [Cut over to a point in time]({% link {{ page.version.version }}/cutover-replication.md %}#cut-over-to-a-point-in-time) for an example.
- `LATEST` to specify the most recent replicated timestamp. Refer to [Cut over to a point in time]({% link {{ page.version.version }}/cutover-replication.md %}#cut-over-to-the-most-recent-replicated-time) for an example.
-`START REPLICATION OF virtual_cluster_spec ON physical_cluster` | Reset a virtual cluster to the time when the virtual cluster on the promoted standby diverged from it. To reuse as much of the existing data on the original primary cluster as possible, you can run this statement as part of the [cutback]({% link {{ page.version.version }}/cutover-replication.md %}#cutback) process. This command fails if the virtual cluster was not originally replicated from the original primary cluster.
-`START SERVICE SHARED` | Start a virtual cluster so it is ready to accept SQL connections after cutover.
+`COMPLETE REPLICATION TO` | Set the time to complete the replication. Use:
+ `SYSTEM TIME` to specify a [timestamp]({% link {{ page.version.version }}/as-of-system-time.md %}). Refer to [Fail over to a point in time]({% link {{ page.version.version }}/failover-replication.md %}#fail-over-to-a-point-in-time) for an example.
+ `LATEST` to specify the most recent replicated timestamp. Refer to [Fail over to the most recent replicated time]({% link {{ page.version.version }}/failover-replication.md %}#fail-over-to-the-most-recent-replicated-time) for an example.
+`START REPLICATION OF virtual_cluster_spec ON physical_cluster` | Reset a virtual cluster to the time when the virtual cluster on the promoted standby diverged from it. To reuse as much of the existing data on the original primary cluster as possible, you can run this statement as part of the [failback]({% link {{ page.version.version }}/failover-replication.md %}#failback) process. This command fails if the virtual cluster was not originally replicated from the original primary cluster.
+`START SERVICE SHARED` | Start a virtual cluster so it is ready to accept SQL connections after failover.
`RENAME TO virtual_cluster_spec` | Rename a virtual cluster.
`STOP SERVICE` | Stop the `shared` service for a virtual cluster. The virtual cluster's `data_state` will still be `ready` so that the service can be restarted.
`GRANT ALL CAPABILITIES` | Grant a virtual cluster all [capabilities]({% link {{ page.version.version }}/create-virtual-cluster.md %}#capabilities).
@@ -52,13 +52,13 @@ Parameter | Description
## Examples
-### Start the cutover process
+### Start the failover process
-To start the [cutover]({% link {{ page.version.version }}/cutover-replication.md %}) process, use `COMPLETE REPLICATION` and provide the timestamp to restore as of:
+To start the [failover]({% link {{ page.version.version }}/failover-replication.md %}) process, use `COMPLETE REPLICATION` and provide the timestamp to restore as of:
{% include_cached copy-clipboard.html %}
~~~ sql
-ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO {cutover time specification};
+ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO {failover time specification};
~~~
You can use either:
@@ -66,7 +66,7 @@ You can use either:
- `SYSTEM TIME` to specify a [timestamp]({% link {{ page.version.version }}/as-of-system-time.md %}).
- `LATEST` to specify the most recent replicated timestamp.
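+
+For example, a sketch of both forms, assuming the `main` virtual cluster used throughout this page (the relative timestamp is illustrative):
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Fail over to a specific historical timestamp:
+ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO SYSTEM TIME '-1h';
+
+-- Fail over to the most recent replicated timestamp:
+ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO LATEST;
+~~~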
-When a virtual cluster is [`ready`]({% link {{ page.version.version }}/show-virtual-cluster.md %}#responses) after initiating the cutover process, you must start the service so that the virtual cluster is ready to accept SQL connections:
+When a virtual cluster is [`ready`]({% link {{ page.version.version }}/show-virtual-cluster.md %}#responses) after initiating the failover process, you must start the service so that the virtual cluster is ready to accept SQL connections:
{% include_cached copy-clipboard.html %}
~~~ sql
@@ -80,19 +80,19 @@ To stop the `shared` service for a virtual cluster and prevent it from accepting
ALTER VIRTUAL CLUSTER main STOP SERVICE;
~~~
-### Start the cutback process
+### Start the failback process
-To [cut back]({% link {{ page.version.version }}/cutover-replication.md %}#cutback) to a cluster that was previously the primary cluster, use the `ALTER VIRTUAL CLUSTER` syntax:
+To [fail back]({% link {{ page.version.version }}/failover-replication.md %}#failback) to a cluster that was previously the primary cluster, use the `ALTER VIRTUAL CLUSTER` syntax:
{% include_cached copy-clipboard.html %}
~~~ sql
ALTER VIRTUAL CLUSTER {original_primary_vc} START REPLICATION OF {promoted_standby_vc} ON {connection_string_standby};
~~~
-The original primary virtual cluster may be almost up to date with the promoted standby's virtual cluster. The difference in data between the two virtual clusters will include only the writes that have been applied to the promoted standby after cutover from the primary cluster.
+The original primary virtual cluster may be almost up to date with the promoted standby's virtual cluster. The difference in data between the two virtual clusters will include only the writes that have been applied to the promoted standby after failover from the primary cluster.
{{site.data.alerts.callout_info}}
-If you started the original PCR stream on an existing cluster without virtualization enabled, refer to the [Cut back after PCR from an existing cluster]({% link {{ page.version.version }}/cutover-replication.md %}) section for instructions.
+If you started the original PCR stream on an existing cluster without virtualization enabled, refer to the [Fail back after replicating from an existing primary cluster]({% link {{ page.version.version }}/failover-replication.md %}) section for instructions.
{{site.data.alerts.end}}
## See also
diff --git a/src/current/v24.1/create-virtual-cluster.md b/src/current/v24.1/create-virtual-cluster.md
index 8da40eb4662..1b5859b1fba 100644
--- a/src/current/v24.1/create-virtual-cluster.md
+++ b/src/current/v24.1/create-virtual-cluster.md
@@ -54,7 +54,7 @@ To form a connection string similar to the example, include the following values
Value | Description
----------------+------------
-`{replication user}` | The user on the primary cluster that has the `REPLICATION` system privilege. Refer to the [Create a replication user and password]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#create-a-replication-user-and-password) for more detail.
+`{replication user}` | The user on the primary cluster that has the `REPLICATION` system privilege. Refer to [Create a user with replication privileges]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#create-a-user-with-replication-privileges) for more detail.
`{password}` | The replication user's password.
`{node ID or hostname}` | The node IP address or hostname of any node from the primary cluster.
`options=ccluster=system` | The parameter to connect to the system virtual cluster on the primary cluster.
diff --git a/src/current/v24.1/cutover-replication.md b/src/current/v24.1/failover-replication.md
similarity index 69%
rename from src/current/v24.1/cutover-replication.md
rename to src/current/v24.1/failover-replication.md
index 446b9daf236..8bf9262890c 100644
--- a/src/current/v24.1/cutover-replication.md
+++ b/src/current/v24.1/failover-replication.md
@@ -1,6 +1,6 @@
---
-title: Cut Over from a Primary Cluster to a Standby Cluster
-summary: A guide to complete physical cluster replication and cut over from a primary to a standby cluster.
+title: Fail Over from a Primary Cluster to a Standby Cluster
+summary: A guide to complete physical cluster replication and fail over from a primary to a standby cluster.
toc: true
docs_area: manage
---
@@ -9,45 +9,48 @@ docs_area: manage
Physical cluster replication is supported in CockroachDB {{ site.data.products.core }} clusters.
{{site.data.alerts.end}}
-_Cutover_ in [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) allows you to switch from the active primary cluster to the passive standby cluster that has ingested replicated data. When you complete the replication stream to initiate a cutover, the job stops replicating data from the primary, sets the standby [virtual cluster]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) to a point in time (in the past or future) where all ingested data is consistent, and then makes the standby virtual cluster ready to accept traffic.
+_Failover_ in [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) allows you to switch from the active primary cluster to the passive standby cluster that has ingested replicated data. When you complete the replication stream to initiate a failover, the job stops replicating data from the primary, sets the standby [virtual cluster]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) to a point in time (in the past or future) where all ingested data is consistent, and then makes the standby virtual cluster ready to accept traffic.
-_Cutback_ in PCR switches operations back to the original primary cluster (or a new cluster) after a cutover event. When you initiate a cutback, the job ensures the original primary is up to date with writes from the standby that happened after cutover. The original primary cluster is then set as ready to accept application traffic once again.
+_Failback_ in PCR switches operations back to the original primary cluster (or a new cluster) after a failover event. When you initiate a failback, the job ensures the original primary is up to date with writes from the standby that happened after failover. The original primary cluster is then set as ready to accept application traffic once again.
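+
+Both directions are initiated with `ALTER VIRTUAL CLUSTER`, as detailed in the sections below. As a preview, a sketch assuming the `main` virtual cluster and the placeholder connection string used in this page's examples:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Failover: run on the standby cluster to complete replication.
+ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO LATEST;
+
+-- Failback: run on the original primary cluster to resynchronize from the promoted standby.
+ALTER VIRTUAL CLUSTER main START REPLICATION OF main ON '{connection_string_standby}';
+~~~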
This page describes:
-- [**Cutover**](#cutover) from the primary cluster to the standby cluster.
-- [**Cutback**](#cutback):
- - From the original standby cluster (after it was promoted during cutover) to the original primary cluster.
+- [**Failover**](#failover) from the primary cluster to the standby cluster.
+- [**Failback**](#failback):
+ - From the original standby cluster (after it was promoted during failover) to the original primary cluster.
- After the PCR stream used an existing cluster as the primary cluster.
-- [**Job management**](#job-management) after a cutover or cutback.
+- [**Job management**](#job-management) after a failover or failback.
{{site.data.alerts.callout_danger}}
-Cutover and cutback do **not** redirect traffic automatically to the standby cluster. Once the cutover or cutback is complete, you must redirect application traffic to the standby (new) cluster. If you do not redirect traffic manually, writes to the primary (original) cluster may be lost.
+Failover and failback do **not** redirect traffic automatically to the standby cluster. Once the failover or failback is complete, you must redirect application traffic to the standby (new) cluster. If you do not redirect traffic manually, writes to the primary (original) cluster may be lost.
{{site.data.alerts.end}}
-## Cutover
+## Failover
-The cutover is a two-step process on the standby cluster:
+The failover is a two-step process on the standby cluster:
-1. [Initiating the cutover](#step-1-initiate-the-cutover).
-1. [Completing the cutover](#step-2-complete-the-cutover).
+1. [Initiating the failover](#step-1-initiate-the-failover).
+1. [Completing the failover](#step-2-complete-the-failover).
### Before you begin
-During PCR, jobs running on the primary cluster will replicate to the standby cluster. Before you cut over to the standby cluster, or cut back to the original primary cluster, consider how you will manage running (replicated) jobs between the clusters. Refer to [Job management](#job-management) for instructions.
+During PCR, jobs running on the primary cluster will replicate to the standby cluster. Before you fail over to the standby cluster, or fail back to the original primary cluster, consider how you will manage running (replicated) jobs between the clusters. Refer to [Job management](#job-management) for instructions.
-### Step 1. Initiate the cutover
+### Step 1. Initiate the failover
-To initiate a cutover to the standby cluster, you can specify the point in time for the standby's promotion. That is, the standby cluster's live data at the point of cutover. Refer to the following sections for steps:
+To initiate a failover to the standby cluster, you can specify the point in time for the standby's promotion, that is, the timestamp that the standby cluster's live data will reflect at failover. Refer to the following sections for steps:
-- [`LATEST`](#cut-over-to-the-most-recent-replicated-time): The most recent replicated timestamp.
-- [Point-in-time](#cut-over-to-a-point-in-time):
- - Past: A past timestamp within the [cutover window]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#cutover-and-promotion-process).
- - Future: A future timestamp for planning a cutover.
+- [`LATEST`](#fail-over-to-the-most-recent-replicated-time): The most recent replicated timestamp.
+- [Point-in-time](#fail-over-to-a-point-in-time):
+ - Past: A timestamp up to 4 hours in the past, within the [failover window]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process).
+ {{site.data.alerts.callout_success}}
+ Failing over to a past point in time is useful if you need to recover from a recent human error.
+ {{site.data.alerts.end}}
+ - Future: A future timestamp for planning a failover.
-#### Cut over to the most recent replicated time
+#### Fail over to the most recent replicated time
-To initiate a cutover to the most recent replicated timestamp, you can specify `LATEST` when you start the cutover. The latest replicated time may be behind the actual time if there is [_replication lag_]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#cutover-and-promotion-process) in the stream. Replication lag is the time between the most up-to-date replicated time and the actual time.
+To initiate a failover to the most recent replicated timestamp, specify `LATEST`. Due to [_replication lag_]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process), the most recent replicated time may be behind the current actual time. Replication lag is the time difference between the most recent replicated time and the actual time.
1. To view the current replication timestamp, use:
@@ -58,7 +61,7 @@ To initiate a cutover to the most recent replicated timestamp, you can specify `
{% include_cached copy-clipboard.html %}
~~~
- id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | cutover_time | status
+ id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | failover_time | status
-----+------+--------------------+-------------------------------------------------+---------------------------------+------------------------+-----------------+--------------+--------------
3 | main | main | postgresql://user@hostname or IP:26257?redacted | 2024-04-18 10:07:45.000001+00 | 2024-04-18 14:07:45+00 | 00:00:19.602682 | NULL | replicating
(1 row)
@@ -68,25 +71,25 @@ To initiate a cutover to the most recent replicated timestamp, you can specify `
You can view the [**Replication Lag** graph]({% link {{ page.version.version }}/ui-physical-cluster-replication-dashboard.md %}#replication-lag) in the standby cluster's DB Console.
{{site.data.alerts.end}}
-1. Run the following from the standby cluster's SQL shell to start the cutover:
+1. Run the following from the standby cluster's SQL shell to start the failover:
{% include_cached copy-clipboard.html %}
~~~ sql
ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO LATEST;
~~~
- The `cutover_time` is the timestamp at which the replicated data is consistent. The cluster will revert any replicated data above this timestamp to ensure that the standby is consistent with the primary at that timestamp:
+ The `failover_time` is the timestamp at which the replicated data is consistent. The cluster will revert any replicated data above this timestamp to ensure that the standby is consistent with the primary at that timestamp:
~~~
- cutover_time
+ failover_time
----------------------------------
1695922878030920020.0000000000
(1 row)
~~~
-#### Cut over to a point in time
+#### Fail over to a point in time
-You can control the point in time that the PCR stream will cut over to.
+You can control the point in time that the PCR stream will fail over to.
1. To select a [specific time]({% link {{ page.version.version }}/as-of-system-time.md %}) in the past, use:
@@ -95,10 +98,10 @@ You can control the point in time that the PCR stream will cut over to.
SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
~~~
- The `retained_time` response provides the earliest time to which you can cut over.
+ The `retained_time` response provides the earliest time to which you can fail over.
~~~
- id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | cutover_time | status
+ id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | failover_time | status
-----+------+--------------------+-------------------------------------------------+-------------------------------+------------------------+-----------------+--------------+--------------
3 | main | main | postgresql://user@hostname or IP:26257?redacted | 2024-04-18 10:07:45.000001+00 | 2024-04-18 14:07:45+00 | 00:00:19.602682 | NULL | replicating
(1 row)
@@ -113,14 +116,14 @@ You can control the point in time that the PCR stream will cut over to.
Refer to [Using different timestamp formats]({% link {{ page.version.version }}/as-of-system-time.md %}#using-different-timestamp-formats) for more information.
- Similarly, to cut over to a specific time in the future:
+ Similarly, to fail over to a specific time in the future:
{% include_cached copy-clipboard.html %}
~~~ sql
ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO SYSTEM TIME '+5h';
~~~
- A future cutover will proceed once the replicated data has reached the specified time.
+ A future failover will proceed once the replicated data has reached the specified time.
{{site.data.alerts.callout_info}}
To monitor for when the replication stream completes, do the following:
@@ -129,7 +132,7 @@ To monitor for when the replication stream completes, do the following:
1. Run `SHOW JOB WHEN COMPLETE job_id`. Refer to the `SHOW JOBS` page for [details]({% link {{ page.version.version }}/show-jobs.md %}#parameters) and an [example]({% link {{ page.version.version }}/show-jobs.md %}#show-job-when-complete).
{{site.data.alerts.end}}
-### Step 2. Complete the cutover
+### Step 2. Complete the failover
1. The completion of the replication is asynchronous; to monitor its progress use:
@@ -138,9 +141,9 @@ To monitor for when the replication stream completes, do the following:
SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
~~~
~~~
- id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | cutover_time | status
+ id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | failover_time | status
---+------+--------------------+-------------------------------------------------+-------------------------------+------------------------------+-----------------+--------------------------------+--------------
- 3 | main | main | postgresql://user@hostname or IP:26257?redacted | 2023-09-28 16:09:04.327473+00 | 2023-09-28 17:41:18.03092+00 | 00:00:19.602682 | 1695922878030920020.0000000000 | replication pending cutover
+ 3 | main | main | postgresql://user@hostname or IP:26257?redacted | 2023-09-28 16:09:04.327473+00 | 2023-09-28 17:41:18.03092+00 | 00:00:19.602682 | 1695922878030920020.0000000000 | replication pending failover
(1 row)
~~~
@@ -170,29 +173,29 @@ To monitor for when the replication stream completes, do the following:
At this point, the primary and standby clusters are entirely independent. You will need to use your own network load balancers, DNS servers, or other network configuration to direct application traffic to the standby (now primary). To manage replicated jobs on the promoted standby, refer to [Job management](#job-management).
-To enable PCR again, from the new primary to the original primary (or a completely different cluster), refer to [Cut back to the primary cluster](#cut-back-to-the-original-primary-cluster).
+To enable PCR again, from the new primary to the original primary (or a completely different cluster), refer to [Fail back to the original primary cluster](#fail-back-to-the-original-primary-cluster).
-## Cutback
+## Failback
-After cutting over to the standby cluster, you may need to cut back to the original primary-standby cluster setup cluster to serve your application. Depending on the configuration of the primary cluster in the original PCR stream, use one of the following workflows:
+After failing over to the standby cluster, you may need to fail back to the original primary cluster so that it serves your application once again. Depending on the configuration of the primary cluster in the original PCR stream, use one of the following workflows:
-- [From the original standby cluster (after it was promoted during cutover) to the original primary cluster](#cut-back-to-the-original-primary-cluster).
-- [After the PCR stream used an existing cluster as the primary cluster](#cut-back-after-pcr-from-an-existing-cluster).
+- [From the original standby cluster (after it was promoted during failover) to the original primary cluster](#fail-back-to-the-original-primary-cluster). If this failback is initiated within 24 hours of the failover, PCR replicates the net-new changes from the standby cluster to the primary cluster, rather than fully replacing the existing data in the primary cluster.
+- [After the PCR stream used an existing cluster as the primary cluster](#fail-back-after-replicating-from-an-existing-primary-cluster).
{{site.data.alerts.callout_info}}
To move back to a different cluster that was not involved in the original PCR stream, set up a new PCR stream following the PCR [setup]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}) guide.
{{site.data.alerts.end}}
-### Cut back to the original primary cluster
+### Fail back to the original primary cluster
-This section illustrates the steps to cut back to the original primary cluster from the promoted standby cluster that is currently serving traffic.
+This section illustrates the steps to fail back to the original primary cluster from the promoted standby cluster that is currently serving traffic.
- **Cluster A** = original primary cluster
- **Cluster B** = original standby cluster
-**Cluster B** is serving application traffic after the [cutover](#step-2-complete-the-cutover).
+**Cluster B** is serving application traffic after the [failover](#step-2-complete-the-failover).
-1. To begin the cutback to **Cluster A**, the virtual cluster must first stop accepting connections. Connect to the system virtual on **Cluster A**:
+1. To begin the failback to **Cluster A**, the virtual cluster must first stop accepting connections. Connect to the system virtual cluster on **Cluster A**:
{% include_cached copy-clipboard.html %}
~~~ shell
@@ -268,7 +271,7 @@ This section illustrates the steps to cut back to the original primary cluster f
(2 rows)
~~~
-1. From **Cluster A**, start the cutover:
+1. From **Cluster A**, start the failover:
{% include_cached copy-clipboard.html %}
~~~ sql
@@ -279,10 +282,10 @@ This section illustrates the steps to cut back to the original primary cluster f
{% include {{ page.version.version }}/physical-replication/fast-cutback-latest-timestamp.md %}
{{site.data.alerts.end}}
- The `cutover_time` is the timestamp at which the replicated data is consistent. The cluster will revert any replicated data above this timestamp to ensure that the standby is consistent with the primary at that timestamp:
+ The `failover_time` is the timestamp at which the replicated data is consistent. The cluster will revert any replicated data above this timestamp to ensure that the standby is consistent with the primary at that timestamp:
~~~
- cutover_time
+ failover_time
----------------------------------
1714497890000000000.0000000000
(1 row)
@@ -304,11 +307,11 @@ This section illustrates the steps to cut back to the original primary cluster f
At this point, **Cluster A** is once again the primary and **Cluster B** is once again the standby. The clusters are entirely independent. To direct application traffic to the primary (**Cluster A**), you will need to use your own network load balancers, DNS servers, or other network configuration to direct application traffic to **Cluster A**. To enable PCR again, from the primary to the standby (or a completely different cluster), refer to [Set Up Physical Cluster Replication]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}).
-### Cut back after PCR from an existing cluster
+### Fail back after replicating from an existing primary cluster
{% include_cached new-in.html version="v24.1" %} You can replicate data from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled to a standby cluster with cluster virtualization enabled. For instructions on setting up PCR in this way, refer to [Set up PCR from an existing cluster]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#set-up-pcr-from-an-existing-cluster).
-After a [cutover](#cutover) to the standby cluster, you may want to then set up PCR from the original standby cluster, which is now the primary, to another cluster, which will become the standby. There are couple of ways to set up a new standby, and some considerations.
+After a [failover](#failover) to the standby cluster, you may want to set up PCR from the original standby cluster, which is now the primary, to another cluster, which will become the standby. There are multiple ways to set up a new standby, each with its own considerations.
In the example, the clusters are named for reference:
@@ -316,7 +319,7 @@ In the example, the clusters are named for reference:
- **B** = The original standby cluster, which started with virtualization.
1. You run PCR from cluster **A** to cluster **B**.
-1. You initiate a cutover from cluster **A** to cluster **B**.
+1. You initiate a failover from cluster **A** to cluster **B**.
1. You promote the `main` virtual cluster on cluster **B** and start serving application traffic from **B** (that acts as the primary).
1. You need to create a standby cluster for cluster **B** to replicate changes to. You can do one of the following:
- [Create a new virtual cluster]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-4-start-replication) (`main`) on cluster **A** from the replication of cluster **B**. Cluster **A** is now virtualized. This will start an initial scan because the PCR stream will ignore the former workload tables in the system virtual cluster that were [originally replicated to **B**]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#set-up-pcr-from-an-existing-cluster). You can [drop the tables]({% link {{ page.version.version }}/drop-table.md %}) that were in the system virtual cluster, because the new virtual cluster will now hold the workload replicating from cluster **B**.
@@ -324,22 +327,22 @@ In the example, the clusters are named for reference:
## Job management
-During a replication stream, jobs running on the primary cluster will replicate to the standby cluster. Once you have [completed a cutover](#step-2-complete-the-cutover) (or a [cutback](#cut-back-to-the-original-primary-cluster)), refer to the following sections for details on resuming jobs on the promoted cluster.
+During a replication stream, jobs running on the primary cluster will replicate to the standby cluster. Once you have [completed a failover](#step-2-complete-the-failover) (or a [failback](#fail-back-to-the-original-primary-cluster)), refer to the following sections for details on resuming jobs on the promoted cluster.
### Backup schedules
-[Backup schedules]({% link {{ page.version.version }}/manage-a-backup-schedule.md %}) will pause after cutover on the promoted cluster. Take the following steps to resume jobs:
+[Backup schedules]({% link {{ page.version.version }}/manage-a-backup-schedule.md %}) will pause after failover on the promoted cluster. Take the following steps to resume jobs:
1. Verify that there are no other schedules running backups to the same [collection of backups]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#backup-collections), i.e., the schedule that was running on the original primary cluster.
1. Resume the backup schedule on the promoted cluster.
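+
+For example, a sketch using a hypothetical schedule ID (find the actual ID with `SHOW SCHEDULES`):
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- List schedules on the promoted cluster to find the paused backup schedule:
+SHOW SCHEDULES;
+
+-- Resume the schedule by its ID (illustrative ID only):
+RESUME SCHEDULE 802342993263755265;
+~~~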
{{site.data.alerts.callout_info}}
-If your backup schedule was created on a cluster in v23.1 or earlier, it will **not** pause automatically on the promoted cluster after cutover. In this case, you must pause the schedule manually on the promoted cluster and then take the outlined steps.
+If your backup schedule was created on a cluster in v23.1 or earlier, it will **not** pause automatically on the promoted cluster after failover. In this case, you must pause the schedule manually on the promoted cluster and then take the outlined steps.
{{site.data.alerts.end}}
### Changefeeds
-[Changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) will fail on the promoted cluster immediately after cutover to avoid two clusters running the same changefeed to one sink. We recommend that you recreate changefeeds on the promoted cluster.
+[Changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) will fail on the promoted cluster immediately after failover to avoid two clusters running the same changefeed to one sink. We recommend that you recreate changefeeds on the promoted cluster.
[Scheduled changefeeds]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}) will continue on the promoted cluster. You will need to manage [pausing]({% link {{ page.version.version }}/pause-schedules.md %}) or [canceling]({% link {{ page.version.version }}/drop-schedules.md %}) the schedule on the promoted standby cluster to avoid two clusters running the same changefeed to one sink.
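+
+For example, a sketch that pauses or cancels a replicated changefeed schedule by a hypothetical ID (find the actual ID with `SHOW SCHEDULES`):
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Pause the duplicate changefeed schedule on the promoted standby:
+PAUSE SCHEDULE 802342993263755265;
+
+-- Or cancel it entirely:
+DROP SCHEDULE 802342993263755265;
+~~~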
diff --git a/src/current/v24.1/known-limitations.md b/src/current/v24.1/known-limitations.md
index 53286f0239e..c42ebef7e9a 100644
--- a/src/current/v24.1/known-limitations.md
+++ b/src/current/v24.1/known-limitations.md
@@ -31,7 +31,7 @@ Limitations will be added as they are discovered.
- Routines cannot be created if they return fewer columns than declared. For example, `CREATE FUNCTION f(OUT sum INT, INOUT a INT, INOUT b INT) LANGUAGE SQL AS $$ SELECT (a + b, b); $$;`. [#121247](https://github.com/cockroachdb/cockroach/issues/121247)
- A `RECORD`-returning UDF cannot be created without a `RETURN` statement in the root block, which would restrict the wildcard type to a concrete one. [#122945](https://github.com/cockroachdb/cockroach/issues/122945)
-### Physical cluster replication cut back to primary cluster
+### Physical cluster replication fail back to primary cluster
{% include {{ page.version.version }}/known-limitations/fast-cutback-latest-timestamp.md %}
diff --git a/src/current/v24.1/physical-cluster-replication-monitoring.md b/src/current/v24.1/physical-cluster-replication-monitoring.md
index 05ffb3aa7e8..c531d8fab34 100644
--- a/src/current/v24.1/physical-cluster-replication-monitoring.md
+++ b/src/current/v24.1/physical-cluster-replication-monitoring.md
@@ -11,13 +11,13 @@ You can monitor a [**physical cluster replication (PCR)**]({% link {{ page.versi
- The [**Physical Cluster Replication** dashboard]({% link {{ page.version.version }}/ui-physical-cluster-replication-dashboard.md %}) on the [DB Console](#db-console).
- [Prometheus and Alertmanager](#prometheus) to track and alert on replication metrics.
-When you complete a [cutover]({% link {{ page.version.version }}/cutover-replication.md %}), there will be a gap in the primary cluster's metrics whether you are monitoring via the [DB Console](#db-console) or [Prometheus](#prometheus).
+When you complete a [failover]({% link {{ page.version.version }}/failover-replication.md %}), there will be a gap in the primary cluster's metrics whether you are monitoring via the [DB Console](#db-console) or [Prometheus](#prometheus).
-The standby cluster will also require separate monitoring to ensure observability during the cutover period. You can use the DB console to track the relevant metrics, or you can use a tool like [Grafana]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}#step-5-visualize-metrics-in-grafana) to create two separate dashboards, one for each cluster, or a single dashboard with data from both clusters.
+The standby cluster will also require separate monitoring to ensure observability during the failover period. You can use the DB Console to track the relevant metrics, or you can use a tool like [Grafana]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}#step-5-visualize-metrics-in-grafana) to create two separate dashboards, one for each cluster, or a single dashboard with data from both clusters.
## SQL Shell
-In the standby cluster's SQL shell, you can query `SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS` for detail on status and timestamps for planning [cutover]({% link {{ page.version.version }}/cutover-replication.md %}):
+In the standby cluster's SQL shell, you can query `SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS` for detail on status and timestamps for planning [failover]({% link {{ page.version.version }}/failover-replication.md %}):
{% include_cached copy-clipboard.html %}
~~~ sql
@@ -27,7 +27,7 @@ SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
Refer to [Responses](#responses) for a description of each field.
~~~
-id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | cutover_time | status
+id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | failover_time | status
---+------+--------------------+-------------------------------------------------+-------------------------------+------------------------------+-----------------+--------------------------------+--------------
3 | main | main | postgresql://user@hostname or IP:26257?redacted | 2023-09-28 16:09:04.327473+00 | 2023-09-28 17:41:18.03092+00 | 00:00:19.602682 | 1695922878030920020.0000000000 | replicating
(1 row)
@@ -56,8 +56,7 @@ You can use Prometheus and Alertmanager to track and alert on PCR metrics. Refer
We recommend tracking the following metrics:
- `physical_replication.logical_bytes`: The logical bytes (the sum of all keys and values) ingested by all PCR jobs.
-- `physical_replication.sst_bytes`: The [SST]({% link {{ page.version.version }}/architecture/storage-layer.md %}#ssts) bytes (compressed) sent to the KV layer by all PCR jobs.
-- `physical_replication.replicated_time_seconds`: The [replicated time]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#cutover-and-promotion-process) of the physical replication stream in seconds since the Unix epoch.
+- `physical_replication.replicated_time_seconds`: The [replicated time]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process) of the physical replication stream in seconds since the Unix epoch.
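+
+As a quick sanity check without Prometheus, you can read these counters directly from a SQL session on the standby cluster. The following is a sketch that uses the `crdb_internal.node_metrics` virtual table, whose schema is internal and subject to change:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+SELECT name, value
+FROM crdb_internal.node_metrics
+WHERE name LIKE 'physical_replication%';
+~~~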
## Data verification
diff --git a/src/current/v24.1/physical-cluster-replication-overview.md b/src/current/v24.1/physical-cluster-replication-overview.md
index 468d1b504c6..57f448c696a 100644
--- a/src/current/v24.1/physical-cluster-replication-overview.md
+++ b/src/current/v24.1/physical-cluster-replication-overview.md
@@ -7,7 +7,7 @@ docs_area: manage
CockroachDB **physical cluster replication (PCR)** continuously sends all data at the cluster level from a _primary_ cluster to an independent _standby_ cluster. Existing data and ongoing changes on the active primary cluster, which is serving application data, replicate asynchronously to the passive standby cluster.
-You can [_cut over_]({% link {{ page.version.version }}/cutover-replication.md %}) from the primary cluster to the standby cluster. This will stop the replication stream, reset the standby cluster to a point in time (in the past or future) where all ingested data is consistent, and make the standby ready to accept application traffic.
+You can [_fail over_]({% link {{ page.version.version }}/failover-replication.md %}) from the primary cluster to the standby cluster. This will stop the replication stream, reset the standby cluster to a point in time (in the past or future) where all ingested data is consistent, and make the standby ready to accept application traffic.
For a list of requirements for PCR, refer to the [Before you begin]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#before-you-begin) section of the [setup tutorial]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}).
@@ -18,15 +18,16 @@ You can use PCR to:
- Meet your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) requirements. PCR provides lower RTO and RPO than [backup and restore]({% link {{ page.version.version }}/backup-and-restore-overview.md %}).
- Automatically replicate everything in your primary cluster to recover quickly from a control plane or full cluster failure.
- Protect against region failure when you cannot use individual [multi-region clusters]({% link {{ page.version.version }}/multiregion-overview.md %})—for example, if you have a two-datacenter architecture and do not have access to three regions; or, you need low-write latency in a single region. PCR allows for an active-passive (primary-standby) structure across two clusters with the passive cluster in a different region.
-- Quickly recover from user error (for example, dropping a database) by [failing over]({% link {{ page.version.version }}/cutover-replication.md %}) to a time in the near past.
+- Quickly recover from user error (for example, dropping a database) by [failing over]({% link {{ page.version.version }}/failover-replication.md %}) to a time in the near past.
- Create a [blue-green deployment model](https://en.wikipedia.org/wiki/Blue%E2%80%93green_deployment) by using the standby cluster for testing upgrades and hardware changes.
## Features
- **Asynchronous cluster-level replication**: When you initiate a replication stream, it will replicate byte-for-byte all of the primary cluster's existing user data and associated metadata to the standby cluster asynchronously. From then on, it will continuously replicate the primary cluster's data and metadata to the standby cluster. PCR will automatically replicate changes related to operations such as [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}), user and [privilege]({% link {{ page.version.version }}/security-reference/authorization.md %}#managing-privileges) modifications, and [zone configuration]({% link {{ page.version.version }}/show-zone-configurations.md %}) updates without any manual work.
- **Transactional consistency**: Avoid conflicts in data after recovery; the replication completes to a transactionally consistent state as of a certain point in time.
-- **Improved RPO and RTO**: Depending on workload and deployment configuration, [replication lag]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) between the primary and standby is generally in the tens-of-seconds range. The cutover process from the primary cluster to the standby should typically happen within five minutes when completing a cutover to the latest replicated time using [`LATEST`]({% link {{ page.version.version }}/alter-virtual-cluster.md %}#synopsis).
-- **Cutover to a timestamp in the past or the future**: In the case of logical disasters or mistakes, you can [cut over]({% link {{ page.version.version }}/cutover-replication.md %}) from the primary to the standby cluster to a timestamp in the past. This means that you can return the standby to a timestamp before the mistake was replicated to the standby. Furthermore, you can plan a cutover by specifying a timestamp in the future.
+- **Improved RPO and RTO**: Depending on workload and deployment configuration, [replication lag]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) between the primary and standby is generally in the tens-of-seconds range. The failover process from the primary cluster to the standby should typically happen within five minutes when completing a failover to the latest replicated time using [`LATEST`]({% link {{ page.version.version }}/alter-virtual-cluster.md %}#synopsis).
+- **Failover to a timestamp in the past or the future**: In the case of logical disasters or mistakes, you can [fail over]({% link {{ page.version.version }}/failover-replication.md %}) from the primary to the standby cluster to a timestamp in the past. This means that you can return the standby to a timestamp before the mistake was replicated to the standby. Furthermore, you can plan a failover by specifying a timestamp in the future.
+- **Fast failback**: Switch back from the promoted standby cluster to the original primary cluster after a failover event by replicating net-new changes rather than running a full initial scan to replace existing data.
- **Monitoring**: To monitor the replication's initial progress, current status, and performance, you can use metrics available in the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) and [Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}). For more detail, refer to [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}).
## Known limitations
@@ -46,9 +47,10 @@ This section is a quick overview of the initial requirements to start a replicat
For more comprehensive guides, refer to:
+- [Cluster Virtualization Overview]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}): for information on enabling cluster virtualization, a requirement for setting up PCR.
- [Set Up Physical Cluster Replication]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}): for a tutorial on how to start a replication stream.
- [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}): for detail on metrics and observability into a replication stream.
-- [Cut Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/cutover-replication.md %}): for a guide on how to complete a replication stream and cut over to the standby cluster.
+- [Fail Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/failover-replication.md %}): for a guide on how to complete a replication stream and fail over to the standby cluster.
- [Technical Overview]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}): to understand PCR in more depth before setup.
### Manage replication in the SQL shell
@@ -60,14 +62,14 @@ Statement | Action
[`CREATE VIRTUAL CLUSTER ... FROM REPLICATION OF ...`]({% link {{ page.version.version }}/create-virtual-cluster.md %}) | Start a replication stream.
[`ALTER VIRTUAL CLUSTER ... PAUSE REPLICATION`]({% link {{ page.version.version }}/alter-virtual-cluster.md %}) | Pause a running replication stream.
[`ALTER VIRTUAL CLUSTER ... RESUME REPLICATION`]({% link {{ page.version.version }}/alter-virtual-cluster.md %}) | Resume a paused replication stream.
-[`ALTER VIRTUAL CLUSTER ... START SERVICE SHARED`]({% link {{ page.version.version }}/alter-virtual-cluster.md %}) | Initiate a [cutover]({% link {{ page.version.version }}/cutover-replication.md %}).
+[`ALTER VIRTUAL CLUSTER ... START SERVICE SHARED`]({% link {{ page.version.version }}/alter-virtual-cluster.md %}) | Initiate a [failover]({% link {{ page.version.version }}/failover-replication.md %}).
[`SHOW VIRTUAL CLUSTER`]({% link {{ page.version.version }}/show-virtual-cluster.md %}) | Show all virtual clusters.
[`DROP VIRTUAL CLUSTER`]({% link {{ page.version.version }}/drop-virtual-cluster.md %}) | Remove a virtual cluster.
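+
+For example, assuming the virtual cluster is named `main`, you can pause and resume its replication stream from the standby cluster's system virtual cluster:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+ALTER VIRTUAL CLUSTER main PAUSE REPLICATION;
+ALTER VIRTUAL CLUSTER main RESUME REPLICATION;
+~~~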
## Cluster versions and upgrades
-{{site.data.alerts.callout_danger}}
-The standby cluster must be at the same version as, or one version ahead of, the primary's virtual cluster.
+{{site.data.alerts.callout_info}}
+The entire standby cluster must be at the same version as, or one version ahead of, the primary's virtual cluster.
{{site.data.alerts.end}}
When PCR is enabled, upgrade with the following procedure. This upgrades the standby cluster before the primary cluster. Within the primary and standby CockroachDB clusters, the system virtual cluster must be at a cluster version greater than or equal to that of the virtual cluster:
diff --git a/src/current/v24.1/physical-cluster-replication-technical-overview.md b/src/current/v24.1/physical-cluster-replication-technical-overview.md
index ed9d553c1da..3d899e6e052 100644
--- a/src/current/v24.1/physical-cluster-replication-technical-overview.md
+++ b/src/current/v24.1/physical-cluster-replication-technical-overview.md
@@ -5,11 +5,11 @@ toc: true
docs_area: manage
---
-[**Physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) automatically and continuously streams data from an active _primary_ CockroachDB cluster to a passive _standby_ cluster. Each cluster contains: a _system virtual cluster_ and an application [virtual cluster]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}):
+[**Physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) continuously and asynchronously replicates data from an active _primary_ CockroachDB cluster to a passive _standby_ cluster. When both clusters are virtualized, each cluster contains a _system virtual cluster_ and an application [virtual cluster]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) during the PCR stream:
{% include {{ page.version.version }}/physical-replication/interface-virtual-cluster.md %}
-This separation of concerns means that the replication stream can operate without affecting work happening in a virtual cluster.
+If you use the read on standby feature in PCR, the standby cluster has an additional reader virtual cluster that safely serves read requests on the replicating virtual cluster.
### Replication stream start-up sequence
@@ -20,7 +20,7 @@ This separation of concerns means that the replication stream can operate withou
The stream initialization proceeds as follows:
-1. The standby's consumer job connects via its system virtual cluster to the primary cluster and starts the primary cluster's physical stream producer job.
+1. The standby's consumer job connects to the primary cluster via the standby's system virtual cluster and starts the primary cluster's `REPLICATION STREAM PRODUCER` job.
1. The primary cluster chooses a timestamp at which to start the physical replication stream. Data on the primary is protected from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) until it is replicated to the standby using a [protected timestamp]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps).
1. The primary cluster returns the timestamp and a [job ID]({% link {{ page.version.version }}/show-jobs.md %}#response) for the replication job.
1. The standby cluster retrieves a list of all nodes in the primary cluster. It uses this list to distribute work across all nodes in the standby cluster.
@@ -31,7 +31,7 @@ The stream initialization proceeds as follows:
### During the replication stream
-The replication happens at the byte level, which means that the job is unaware of databases, tables, row boundaries, and so on. However, when a [cutover](#cutover-and-promotion-process) to the standby cluster is initiated, the replication job ensures that the cluster is in a transactionally consistent state as of a certain point in time. Beyond the application data, the job will also replicate users, privileges, basic zone configuration, and schema changes.
+The replication happens at the byte level, which means that the job is unaware of databases, tables, row boundaries, and so on. However, when a [failover](#failover-and-promotion-process) to the standby cluster is initiated, the replication job ensures that the cluster is in a transactionally consistent state as of a certain point in time. Beyond the application data, the job will also replicate users, privileges, basic zone configuration, and schema changes.
During the job, [rangefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) periodically emit resolved timestamps, which mark the time at which the ingested data is known to be consistent. Resolved timestamps provide a guarantee that there are no new writes from before that timestamp. This allows the standby cluster to move the [protected timestamp]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) forward as the replicated timestamp advances. This information is sent to the primary cluster, which allows [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) to continue as the replication stream on the standby cluster advances.
@@ -39,18 +39,18 @@ During the job, [rangefeeds]({% link {{ page.version.version }}/create-and-confi
If the primary cluster does not receive replicated time information from the standby after 24 hours, it cancels the replication job. This ensures that an inactive replication job will not prevent garbage collection.
{{site.data.alerts.end}}
-### Cutover and promotion process
+### Failover and promotion process
-The tracked replicated time and the advancing protected timestamp allows the replication stream to also track _retained time_, which is a timestamp in the past indicating the lower bound that the replication stream could cut over to. Therefore, the _cutover window_ for a replication job falls between the retained time and the replicated time.
+The tracked replicated time and the advancing protected timestamp allow the replication stream to also track _retained time_, which is a timestamp in the past indicating the lower bound that the replication stream could fail over to. The retained time can be up to four hours in the past, due to the protected timestamp. Therefore, the _failover window_ for a replication job falls between the retained time and the replicated time.
-
+
_Replication lag_ is the time between the most up-to-date replicated time and the actual time. While replication stays as close to the actual time as possible, the replication lag window is where there is potential for data loss.
-For the [cutover process]({% link {{ page.version.version }}/cutover-replication.md %}), the standby cluster waits until it has reached the specified cutover time, which can be in the [past]({% link {{ page.version.version }}/cutover-replication.md %}#cut-over-to-a-point-in-time) (retained time), the [`LATEST`]({% link {{ page.version.version }}/cutover-replication.md %}#cut-over-to-the-most-recent-replicated-time) timestamp, or in the [future]({% link {{ page.version.version }}/cutover-replication.md %}#cut-over-to-a-point-in-time). Once that timestamp has been reached, the replication stream stops and any data in the standby cluster that is **above** the cutover time is removed. Depending on how much data the standby needs to revert, this can affect the duration of RTO (recovery time objective).
+For the [failover process]({% link {{ page.version.version }}/failover-replication.md %}), the standby cluster waits until it has reached the specified failover time, which can be in the [past]({% link {{ page.version.version }}/failover-replication.md %}#fail-over-to-a-point-in-time) (retained time), the [`LATEST`]({% link {{ page.version.version }}/failover-replication.md %}#fail-over-to-the-most-recent-replicated-time) timestamp, or in the [future]({% link {{ page.version.version }}/failover-replication.md %}#fail-over-to-a-point-in-time). Once that timestamp has been reached, the replication stream stops and any data in the standby cluster that is **above** the failover time is removed. Depending on how much data the standby needs to revert, this can affect the duration of RTO (recovery time objective).
After reverting any necessary data, the standby virtual cluster is promoted as available to serve traffic and the replication job ends.
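+
+In SQL terms, a failover to the most recent replicated time is initiated from the standby cluster's system virtual cluster with statements like the following, where `main` is an example virtual cluster name:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO LATEST;
+ALTER VIRTUAL CLUSTER main START SERVICE SHARED;
+~~~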
{{site.data.alerts.callout_info}}
-For detail on cutting back to the primary cluster following a cutover, refer to [Cut back to the primary cluster]({% link {{ page.version.version }}/cutover-replication.md %}#cut-back-to-the-original-primary-cluster).
+For detail on failing back to the primary cluster following a failover, refer to [Fail back to the primary cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-to-the-original-primary-cluster).
{{site.data.alerts.end}}
diff --git a/src/current/v24.1/set-up-physical-cluster-replication.md b/src/current/v24.1/set-up-physical-cluster-replication.md
index 81b0dfa1203..6c678ec3c51 100644
--- a/src/current/v24.1/set-up-physical-cluster-replication.md
+++ b/src/current/v24.1/set-up-physical-cluster-replication.md
@@ -35,11 +35,11 @@ The high-level steps in this tutorial are:
## Before you begin
-- Two separate CockroachDB clusters (primary and standby) with a minimum of three nodes each, and each using the same CockroachDB {{page.version.version}} version. The standby cluster should be the same version or one version ahead of the primary cluster. The primary and standby clusters must be configured with similar hardware profiles, number of nodes, and overall size. Significant discrepancies in the cluster configurations may result in degraded performance.
+- You need two separate CockroachDB clusters (primary and standby), each with a minimum of three nodes. The standby cluster should be the same version or one version ahead of the primary cluster. The primary and standby clusters must be configured with similar hardware profiles, number of nodes, and overall size. Significant discrepancies in the cluster configurations may result in degraded performance.
- To set up each cluster, you can follow [Deploy CockroachDB on Premises]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}). When you initialize the cluster with the [`cockroach init`]({% link {{ page.version.version }}/cockroach-init.md %}) command, you **must** pass the `--virtualized` or `--virtualized-empty` flag. Refer to the cluster creation steps for the [primary cluster](#initialize-the-primary-cluster) and for the [standby cluster](#initialize-the-standby-cluster) for details.
- The [Deploy CockroachDB on Premises]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}) tutorial creates a self-signed certificate for each {{ site.data.products.core }} cluster. To create certificates signed by an external certificate authority, refer to [Create Security Certificates using OpenSSL]({% link {{ page.version.version }}/create-security-certificates-openssl.md %}).
-- All nodes in each cluster will need access to the Certificate Authority for the other cluster. Refer to [Manage the cluster certificates](#step-3-manage-the-cluster-certificates).
-- The primary and standby clusters **must have the same [region topology]({% link {{ page.version.version }}/topology-patterns.md %})**. For example, replicating a multi-region primary cluster to a single-region standby cluster is not supported. Mismatching regions between a multi-region primary and standby cluster is also not supported.
+- All nodes in each cluster will need access to the Certificate Authority for the other cluster. Refer to [Manage cluster certificates](#step-3-manage-cluster-certificates-and-generate-connection-strings).
+- The primary and standby clusters can have different [region topologies]({% link {{ page.version.version }}/topology-patterns.md %}). However, behavior for features that rely on multi-region primitives, such as `REGIONAL BY ROW` and `REGIONAL BY TABLE` tables, may be affected.
## Step 1. Create the primary cluster
@@ -103,7 +103,7 @@ Connect to your primary cluster's system virtual cluster using [`cockroach sql`]
Because this is the primary cluster rather than the standby cluster, the `data_state` of all rows is `ready`, rather than `replicating` or another [status]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}).
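+
+For example, to list the virtual clusters and their current `data_state`, run the following from the system virtual cluster:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+SHOW VIRTUAL CLUSTERS;
+~~~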
-### Create a replication user and password
+### Create a user with replication privileges
The standby cluster connects to the primary cluster's system virtual cluster using an identity with the `REPLICATION` privilege. Connect to the primary cluster's system virtual cluster and create a user with a password:
@@ -114,6 +114,8 @@ The standby cluster connects to the primary cluster's system virtual cluster usi
CREATE USER {your username} WITH PASSWORD '{your password}';
~~~
+ If you need to change the password later, refer to [`ALTER USER`]({% link {{ page.version.version }}/alter-user.md %}).
+
1. Grant the [`REPLICATION` system privilege]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) to your user:
{% include_cached copy-clipboard.html %}
@@ -121,8 +123,6 @@ The standby cluster connects to the primary cluster's system virtual cluster usi
GRANT SYSTEM REPLICATION TO {your username};
~~~
- If you need to change the password later, refer to [`ALTER USER`]({% link {{ page.version.version }}/alter-user.md %}).
-
### Connect to the primary virtual cluster (optional)
1. If you would like to run a sample workload on the primary's virtual cluster, open a new terminal window and use [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}) to run the workload.
@@ -234,7 +234,7 @@ Connect to your standby cluster's system virtual cluster using [`cockroach sql`]
(1 row)
~~~
-### Create a user for the standby cluster
+### Create a user with replication privileges on the standby cluster
If you would like to access the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) to observe your replication, you will need to create a user:
@@ -254,7 +254,7 @@ If you would like to access the [DB Console]({% link {{ page.version.version }}/
Open the DB Console in your web browser: `https://{node IP or hostname}:8080/`, where you will be prompted for these credentials. Refer to [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}) for more detail on tracking relevant metrics for your replication stream.
-## Step 3. Manage the cluster certificates
+## Step 3. Manage cluster certificates and generate connection strings
{{site.data.alerts.callout_danger}}
It is important to carefully manage the exchange of CA certificates between clusters if you have generated self-signed certificates with `cockroach cert` as part of the [prerequisite deployment tutorial]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}).
@@ -262,17 +262,13 @@ It is important to carefully manage the exchange of CA certificates between clus
To create certificates signed by an external certificate authority, refer to [Create Security Certificates using OpenSSL]({% link {{ page.version.version }}/create-security-certificates-openssl.md %}).
{{site.data.alerts.end}}
-At this point, the primary and standby clusters are both running. The next step allows the standby cluster to connect to the primary cluster and begin ingesting its data. Depending on how you manage certificates, you must ensure that all nodes on the primary and the standby cluster have access to the certificate of the other cluster.
-
-{% include_cached new-in.html version="v24.1" %} You can use the `cockroach encode-uri` command to generate a connection string containing a cluster's certificate for any [PCR statements]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}#manage-replication-in-the-sql-shell) that require a connection string.
-
-For example, in this tutorial you will need a connection string for the primary cluster when you start the replication stream from the standby.
+At this point, the primary and standby clusters are both running. The next step creates a connection URI with the certificates needed to connect the two clusters. In most cases, we recommend ensuring that all nodes on the primary cluster have access to the certificate of the standby cluster, and vice versa. This ensures that PCR is able to parallelize the work.
-To generate a connection string, pass the replication user, IP and port, along with the directory to the certificate for the primary cluster:
+Use the `cockroach encode-uri` command to generate a connection string containing a cluster's certificate for any [PCR statements]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}#manage-replication-in-the-sql-shell) that require a connection string. Pass the replication user, node IP or hostname, and port, along with the path to the certificate for the **primary cluster**, into the `encode-uri` command:
{% include_cached copy-clipboard.html %}
~~~ shell
-cockroach encode-uri {replication user}:{password}@{node IP or hostname}:26257 --ca-cert certs/ca.crt --inline
+cockroach encode-uri {replication user}:{password}@{node IP or hostname}:26257 --ca-cert {path to certs directory}/ca.crt --inline
~~~
The connection string output contains the primary cluster's certificate:
@@ -286,11 +282,11 @@ Copy the output ready for [Step 4](#step-4-start-replication), which requires th
## Step 4. Start replication
-The system virtual cluster in the standby cluster initiates and controls the replication stream by pulling from the primary cluster. In this section, you will connect to the primary from the standby to initiate the replication stream.
+The system virtual cluster in the standby cluster initializes and controls the replication stream by pulling from the primary cluster. In this section, you will connect to the primary from the standby to initiate the replication stream.
1. From the **standby** cluster, use your connection string to the primary:
- If you generated the connection string using [`cockroach encode-uri`](#step-3-manage-the-cluster-certificates):
+ If you generated the connection string using [`cockroach encode-uri`](#step-3-manage-cluster-certificates-and-generate-connection-strings):
{% include_cached copy-clipboard.html %}
~~~ sql
@@ -300,7 +296,7 @@ The system virtual cluster in the standby cluster initiates and controls the rep
~~~
Otherwise, pass the connection string that contains:
- - The replication user and password that you [created for the primary cluster](#create-a-replication-user-and-password).
+ - The replication user and password that you [created for the primary cluster](#create-a-user-with-replication-privileges).
- The node IP address or hostname of one node from the primary cluster.
- The path to the primary node's certificate on the standby cluster.
@@ -330,7 +326,7 @@ The system virtual cluster in the standby cluster initiates and controls the rep
(2 rows)
~~~
- The standby cluster's virtual cluster is offline while the replication stream is running. To bring it online, you must explicitly [start its service after cutover]({% link {{ page.version.version }}/cutover-replication.md %}#step-2-complete-the-cutover).
+ The standby cluster's virtual cluster is offline while the replication stream is running. To bring it online, you must explicitly [start its service after failover]({% link {{ page.version.version }}/failover-replication.md %}#step-2-complete-the-failover).
1. To manage the replication stream, you can [pause and resume]({% link {{ page.version.version }}/alter-virtual-cluster.md %}) the replication stream as well as [show]({% link {{ page.version.version }}/show-virtual-cluster.md %}) the current details for the job:
@@ -351,7 +347,7 @@ The system virtual cluster in the standby cluster initiates and controls the rep
{% include_cached copy-clipboard.html %}
~~~
- id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | cutover_time | status
+ id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | failover_time | status
---+------+--------------------+--------------------------------------------------------+-------------------------------+------------------------+-----------------+--------------+--------------
3 | main | main | postgresql://user@{node IP or hostname}:{26257}?redacted | 2024-04-17 20:14:31.952783+00 | 2024-04-17 20:18:50+00 | 00:00:08.738176 | NULL | replicating
(1 row)
@@ -361,12 +357,12 @@ The system virtual cluster in the standby cluster initiates and controls the rep
## Set up PCR from an existing cluster
-{% include_cached new-in.html version="v24.1" %} You can replicate data from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled to a standby cluster with cluster virtualization enabled. In the [PCR setup]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}), the existing cluster is the primary cluster, which serves application traffic.
+{% include_cached new-in.html version="v24.1" %} You can set up PCR from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled. However, the standby cluster must have cluster virtualization enabled. In the [PCR setup]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}), the existing cluster is the primary cluster.
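+
+As a sketch, the replication stream from a non-virtualized primary is created from the standby cluster's system virtual cluster with a statement like the following, where `main` is an illustrative name for the standby's virtual cluster and the non-virtualized primary is addressed as `system`:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+CREATE VIRTUAL CLUSTER main FROM REPLICATION OF system ON '{connection string generated by cockroach encode-uri}';
+~~~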
{{site.data.alerts.callout_info}}
-When you start PCR with an existing primary cluster that does **not** have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled, you will not be able to [_cut back_]({% link {{ page.version.version }}/cutover-replication.md %}#cutback) to the original primary cluster from the promoted, original standby.
+When you start PCR with an existing primary cluster that does **not** have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled, you will not be able to [_fail back_]({% link {{ page.version.version }}/failover-replication.md %}#failback) to the original primary cluster from the promoted, original standby.
-For more details on the cutback process when you have started PCR with a non-virtualized primary, refer to [Cut back after PCR from an existing cluster]({% link {{ page.version.version }}/cutover-replication.md %}#cut-back-after-pcr-from-an-existing-cluster).
+For more details on the failback process when you have started PCR with a non-virtualized primary, refer to [Fail back after replicating from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-replicating-from-an-existing-primary-cluster).
{{site.data.alerts.end}}
Before you begin, you will need:
@@ -407,7 +403,7 @@ Before you begin, you will need:
(1 row)
~~~
-1. To create the replication job, you will need a connection string for the **primary cluster** containing its CA certificate. For steps to generate a connection string with `cockroach encode-uri`, refer to [Step 3. Manage the cluster certificates](#step-3-manage-the-cluster-certificates).
+1. To create the replication job, you will need a connection string for the **primary cluster** containing its CA certificate. For steps to generate a connection string with `cockroach encode-uri`, refer to [Step 3. Manage cluster certificates and generate connection strings](#step-3-manage-cluster-certificates-and-generate-connection-strings).
1. If you would like to run a test workload on your existing **primary cluster**, you can use [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}) like the following:
@@ -450,9 +446,9 @@ Before you begin, you will need:
At this point, your replication stream will be running.
-To _cut over_ to the standby cluster, follow the instructions on the [Cut Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/cutover-replication.md %}) page.
+To _fail over_ to the standby cluster, follow the instructions on the [Fail Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/failover-replication.md %}) page.
-For details on how to _cut back_ after replicating a non-virtualized cluster, refer to [Cut back after PCR from an existing cluster]({% link {{ page.version.version }}/cutover-replication.md %}#cut-back-after-pcr-from-an-existing-cluster).
+For details on how to _fail back_ after replicating a non-virtualized cluster, refer to [Fail back after replicating from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-replicating-from-an-existing-primary-cluster).
## Connection reference
@@ -465,12 +461,12 @@ Cluster | Virtual Cluster | Usage | URL and Parameters
Primary | System | Set up a replication user and view running virtual clusters. Connect with [`cockroach sql`]({% link {{ page.version.version }}/cockroach-sql.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=system&sslmode=verify-full"`<br><br>- `options=-ccluster=system`<br>- `sslmode=verify-full`<br><br>Use the `--certs-dir` flag to specify the path to your certificate.
Primary | Main | Add and run a workload with [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=main&sslmode=verify-full&sslrootcert=certs/ca.crt&sslcert=certs/client.root.crt&sslkey=certs/client.root.key"`<br><br>{% include {{ page.version.version }}/connect/cockroach-workload-parameters.md %} As a result, for the example in this tutorial, you will need:<br>- `options=-ccluster={virtual_cluster_name}`<br>- `sslmode=verify-full`<br>- `sslrootcert={path}/certs/ca.crt`<br>- `sslcert={path}/certs/client.root.crt`<br>- `sslkey={path}/certs/client.root.key`
Standby | System | Manage the replication stream. Connect with [`cockroach sql`]({% link {{ page.version.version }}/cockroach-sql.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=system&sslmode=verify-full"`<br><br>- `options=-ccluster=system`<br>- `sslmode=verify-full`<br><br>Use the `--certs-dir` flag to specify the path to your certificate.
-Standby/Primary | System | Connect to the other cluster. | `"postgresql://{replication user}:{password}@{node IP or hostname}:{26257}/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A"`<br><br>Generate the connection string with [`cockroach encode-uri`](#step-3-manage-the-cluster-certificates). Use the generated connection string in:<br>- `CREATE VIRTUAL CLUSTER` statements to [start the replication stream](#step-4-start-replication).<br>- `ALTER VIRTUAL CLUSTER` statements to [cut back to the primary cluster]({% link {{ page.version.version }}/cutover-replication.md %}#cutback).
+Standby/Primary | System | Connect to the other cluster. | `"postgresql://{replication user}:{password}@{node IP or hostname}:{26257}/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A"`<br><br>Generate the connection string with [`cockroach encode-uri`](#step-3-manage-cluster-certificates-and-generate-connection-strings). Use the generated connection string in:<br>- `CREATE VIRTUAL CLUSTER` statements to [start the replication stream](#step-4-start-replication).<br>- `ALTER VIRTUAL CLUSTER` statements to [fail back to the primary cluster]({% link {{ page.version.version }}/failover-replication.md %}#failback).
## What's next
- [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %})
-- [Cut Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/cutover-replication.md %})
+- [Fail Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/failover-replication.md %})
- [`CREATE VIRTUAL CLUSTER`]({% link {{ page.version.version }}/create-virtual-cluster.md %})
- [`ALTER VIRTUAL CLUSTER`]({% link {{ page.version.version }}/alter-virtual-cluster.md %})
- [`DROP VIRTUAL CLUSTER`]({% link {{ page.version.version }}/drop-virtual-cluster.md %})
diff --git a/src/current/v24.3/create-virtual-cluster.md b/src/current/v24.3/create-virtual-cluster.md
index 994eb925fe4..d6727f5b8fc 100644
--- a/src/current/v24.3/create-virtual-cluster.md
+++ b/src/current/v24.3/create-virtual-cluster.md
@@ -62,7 +62,7 @@ To form a connection string similar to the example, include the following values
Value | Description
----------------+------------
-`{replication user}` | The user on the primary cluster that has the `REPLICATION` system privilege. Refer to the [Create a replication user and password]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#create-a-replication-user-and-password) for more detail.
+`{replication user}` | The user on the primary cluster that has the `REPLICATION` system privilege. Refer to [Create a user with replication privileges]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#create-a-user-with-replication-privileges) for more detail.
`{password}` | The replication user's password.
`{node ID or hostname}` | The node IP address or hostname of any node from the primary cluster.
`options=ccluster=system` | The parameter to connect to the system virtual cluster on the primary cluster.
diff --git a/src/current/v24.3/failover-replication.md b/src/current/v24.3/failover-replication.md
index 2647aa94e69..2ad88e4af87 100644
--- a/src/current/v24.3/failover-replication.md
+++ b/src/current/v24.3/failover-replication.md
@@ -5,13 +5,9 @@ toc: true
key: cutover-replication.html
---
-{{site.data.alerts.callout_info}}
-Physical cluster replication is supported in CockroachDB {{ site.data.products.core }} clusters.
-{{site.data.alerts.end}}
+_Failover_ in [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) allows you to move application traffic from the active primary cluster to the passive standby cluster. When you complete the replication stream to initiate a failover, the job stops replicating data from the primary, sets the standby [virtual cluster]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) to a point in time (in the past or future) where all ingested data is consistent, and then makes the standby virtual cluster ready to accept traffic.
-_Failover_ in [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) allows you to switch from the active primary cluster to the passive standby cluster that has ingested replicated data. When you complete the replication stream to initiate a failover, the job stops replicating data from the primary, sets the standby [virtual cluster]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) to a point in time (in the past or future) where all ingested data is consistent, and then makes the standby virtual cluster ready to accept traffic.
-
-_Failback_ in PCR switches operations back to the original primary cluster (or a new cluster) after a failover event. When you initiate a failback, the job ensures the original primary is up to date with writes from the standby that happened after failover. The original primary cluster is then set as ready to accept application traffic once again.
+After a failover event, you may want to return your operations to the original primary cluster (or a new cluster). _Failback_ in PCR does this by replicating the writes that the promoted standby has received back to the original primary cluster. When you initiate a failback, the job ensures the original primary is up to date with writes from the standby that happened after failover. The original primary cluster is then set as ready to accept application traffic once again.
This page describes:
@@ -21,8 +17,8 @@ This page describes:
- After the PCR stream used an existing cluster as the primary cluster.
- [**Job management**](#job-management) after a failover or failback.
-{{site.data.alerts.callout_danger}}
-Failover and failback do **not** redirect traffic automatically to the standby cluster. Once the failover or failback is complete, you must redirect application traffic to the standby (new) cluster. If you do not redirect traffic manually, writes to the primary (original) cluster may be lost.
+{{site.data.alerts.callout_info}}
+Failover and failback do **not** redirect traffic automatically to the standby cluster. Once the failover or failback is complete, you must redirect application traffic to the standby cluster.
{{site.data.alerts.end}}
## Failover
@@ -38,16 +34,19 @@ During PCR, jobs running on the primary cluster will replicate to the standby cl
### Step 1. Initiate the failover
-To initiate a failover to the standby cluster, you can specify the point in time for the standby's promotion in the following ways. That is, the standby cluster's live data at the point of failover. Refer to the following sections for steps:
+To initiate a failover to the standby cluster, specify the point in time for its promotion. At failover, the standby cluster's data will reflect the state of the primary at the specified moment. Refer to the following sections for steps:
-- [`LATEST`](#fail-over-to-the-most-recent-replicated-time): The most recent replicated timestamp.
+- [`LATEST`](#fail-over-to-the-most-recent-replicated-time): The most recent replicated timestamp. This minimizes data loss from the replication lag inherent in asynchronous replication.
- [Point-in-time](#fail-over-to-a-point-in-time):
- - Past: A past timestamp within the [failover window]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process).
+  - Past: A timestamp within the [failover window]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process), which extends up to four hours into the past.
+ {{site.data.alerts.callout_success}}
+ Failing over to a past point in time is useful if you need to recover from a recent human error.
+ {{site.data.alerts.end}}
- Future: A future timestamp for planning a failover.
#### Fail over to the most recent replicated time
-To initiate a failover to the most recent replicated timestamp, you can specify `LATEST` when you start the failover. The latest replicated time may be behind the actual time if there is [_replication lag_]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process) in the stream. Replication lag is the time between the most up-to-date replicated time and the actual time.
+To initiate a failover to the most recent replicated timestamp, specify `LATEST`. Due to [_replication lag_]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process), the most recent replicated time may be behind the current actual time. Replication lag is the time difference between the most recent replicated time and the actual time.
1. To view the current replication timestamp, use:
@@ -95,7 +94,7 @@ You can control the point in time that the PCR stream will fail over to.
SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
~~~
- The `retained_time` response provides the earliest time to which you can fail over.
+  The `retained_time` response provides the earliest time to which you can fail over. This can be up to four hours in the past.
~~~
id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | failover_time | status
@@ -174,10 +173,10 @@ To enable PCR again, from the new primary to the original primary (or a complete
## Failback
-After failing over to the standby cluster, you may need to fail back to the original primary-standby cluster setup cluster to serve your application. Depending on the configuration of the primary cluster in the original PCR stream, use one of the following workflows:
+After failing over to the standby cluster, you may want to return to your original configuration by failing back to the original primary-standby cluster setup. Depending on the configuration of the primary cluster in the original PCR stream, use one of the following workflows:
-- [From the original standby cluster (after it was promoted during failover) to the original primary cluster](#fail-back-to-the-original-primary-cluster).
-- [After the PCR stream used an existing cluster as the primary cluster](#fail-back-after-pcr-from-an-existing-cluster).
+- [From the original standby cluster (after it was promoted during failover) to the original primary cluster](#fail-back-to-the-original-primary-cluster). If this failback is initiated within 24 hours of the failover, PCR replicates the net-new changes from the standby cluster to the primary cluster, rather than fully replacing the existing data in the primary cluster.
+- [After the PCR stream used an existing cluster as the primary cluster](#fail-back-after-replicating-from-an-existing-primary-cluster).
{{site.data.alerts.callout_info}}
To move back to a different cluster that was not involved in the original PCR stream, set up a new PCR stream following the PCR [setup]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}) guide.
@@ -208,7 +207,7 @@ This section illustrates the steps to fail back to the original primary cluster
ALTER VIRTUAL CLUSTER {cluster_a} STOP SERVICE;
~~~
-1. Open another terminal window and generate a connection string for **Cluster B** using `cockroach encode-uri`:
+1. Open another terminal window and generate a connection string for **Cluster B** using [`cockroach encode-uri`]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-3-manage-cluster-certificates-and-generate-connection-strings):
{% include_cached copy-clipboard.html %}
~~~ shell
@@ -279,7 +278,7 @@ This section illustrates the steps to fail back to the original primary cluster
ALTER VIRTUAL CLUSTER {cluster_a} COMPLETE REPLICATION TO LATEST;
~~~
- The `failover_time` is the timestamp at which the replicated data is consistent. The cluster will revert any replicated data above this timestamp to ensure that the standby is consistent with the primary at that timestamp:
+  After the failover has completed successfully, the statement returns a `failover_time` timestamp, representing the time at which the replicated data is consistent. Note that the cluster reverts any replicated data above the `failover_time` to ensure that the standby is consistent with the primary at that time:
~~~
failover_time
@@ -302,13 +301,13 @@ This section illustrates the steps to fail back to the original primary cluster
SET CLUSTER SETTING server.controller.default_target_cluster='{cluster_a}';
~~~
-At this point, **Cluster A** is once again the primary and **Cluster B** is once again the standby. The clusters are entirely independent. To direct application traffic to the primary (**Cluster A**), you will need to use your own network load balancers, DNS servers, or other network configuration to direct application traffic to **Cluster A**. To enable PCR again, from the primary to the standby (or a completely different cluster), refer to [Set Up Physical Cluster Replication]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}).
+At this point, **Cluster A** has caught up to **Cluster B**. The clusters are entirely independent. To enable PCR again from the primary to the standby, refer to [Set Up Physical Cluster Replication]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}).
-### Fail back after PCR from an existing cluster
+### Fail back after replicating from an existing primary cluster
You can replicate data from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled to a standby cluster with cluster virtualization enabled. For instructions on setting up a PCR in this way, refer to [Set up PCR from an existing cluster]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#set-up-pcr-from-an-existing-cluster).
-After a [failover](#failover) to the standby cluster, you may want to then set up PCR from the original standby cluster, which is now the primary, to another cluster, which will become the standby. There are couple of ways to set up a new standby, and some considerations.
+After a [failover](#failover) to the standby cluster, you may want to set up PCR from the original standby cluster, which is now the primary, to another cluster, which will become the standby. There are multiple ways to set up a new standby, along with some considerations to keep in mind.
In the example, the clusters are named for reference:
@@ -324,11 +323,11 @@ In the example, the clusters are named for reference:
## Job management
-During a replication stream, jobs running on the primary cluster will replicate to the standby cluster. Once you have [completed a failover](#step-2-complete-the-failover) (or a [failback](#failback)), refer to the following sections for details on resuming jobs on the promoted cluster.
+During PCR, jobs running on the primary cluster replicate to the standby cluster. Once you have [completed a failover](#step-2-complete-the-failover) (or a [failback](#failback)), refer to the following sections for details on resuming jobs on the promoted cluster.
### Backup schedules
-[Backup schedules]({% link {{ page.version.version }}/manage-a-backup-schedule.md %}) will pause after failover on the promoted cluster. Take the following steps to resume jobs:
+[Backup schedules]({% link {{ page.version.version }}/manage-a-backup-schedule.md %}) pause after failover on the promoted standby cluster. Take the following steps to resume jobs:
1. Verify that no other schedules are running backups to the same [collection of backups]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#backup-collections), in particular, the schedule that was running on the original primary cluster.
1. [Resume]({% link {{ page.version.version }}/resume-schedules.md %}) the backup schedule on the promoted cluster, as in the sketch below.
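+
+A minimal sketch of these steps in SQL, where the schedule ID is illustrative:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- List schedules on the promoted cluster to find the paused backup schedule.
+SHOW SCHEDULES;
+-- Resume it by ID (the ID below is a placeholder).
+RESUME SCHEDULE 887862643147341825;
+~~~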
diff --git a/src/current/v24.3/physical-cluster-replication-monitoring.md b/src/current/v24.3/physical-cluster-replication-monitoring.md
index 755f166ea80..68c2a8273ea 100644
--- a/src/current/v24.3/physical-cluster-replication-monitoring.md
+++ b/src/current/v24.3/physical-cluster-replication-monitoring.md
@@ -55,7 +55,6 @@ You can use Prometheus and Alertmanager to track and alert on PCR metrics. Refer
We recommend tracking the following metrics:
- `physical_replication.logical_bytes`: The logical bytes (the sum of all keys and values) ingested by all PCR jobs.
-- `physical_replication.sst_bytes`: The [SST]({% link {{ page.version.version }}/architecture/storage-layer.md %}#ssts) bytes (compressed) sent to the KV layer by all PCR jobs.
- `physical_replication.replicated_time_seconds`: The [replicated time]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process) of the physical replication stream in seconds since the Unix epoch.
## Data verification
diff --git a/src/current/v24.3/physical-cluster-replication-overview.md b/src/current/v24.3/physical-cluster-replication-overview.md
index 135b5abef5c..49fe46a9f80 100644
--- a/src/current/v24.3/physical-cluster-replication-overview.md
+++ b/src/current/v24.3/physical-cluster-replication-overview.md
@@ -31,7 +31,7 @@ You can use PCR to:
- **Transactional consistency**: Avoid conflicts in data after recovery; the replication completes to a transactionally consistent state.
- **Improved RPO and RTO**: Depending on workload and deployment configuration, [replication lag]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) between the primary and standby is generally in the tens-of-seconds range. The failover process from the primary cluster to the standby should typically happen within five minutes when completing a failover to the latest replicated time using [`LATEST`]({% link {{ page.version.version }}/alter-virtual-cluster.md %}#synopsis).
- **Failover to a timestamp in the past or the future**: In the case of logical disasters or mistakes, you can [fail over]({% link {{ page.version.version }}/failover-replication.md %}) from the primary to the standby cluster to a timestamp in the past. This means that you can return the standby to a timestamp before the mistake was replicated to the standby. Furthermore, you can plan a failover by specifying a timestamp in the future.
-- **Fast failback**: Switch back from the promoted standby cluster to the original primary cluster after a failover event without an initial scan.
+- **Fast failback**: Switch back from the promoted standby cluster to the original primary cluster after a failover event by replicating net-new changes rather than running a full initial scan to replace existing data.
- {% include_cached new-in.html version="v24.3" %} **Read from standby cluster**: You can configure PCR to allow `SELECT` queries on the standby cluster. For more details, refer to [Start a PCR stream with read from standby]({% link {{ page.version.version }}/create-virtual-cluster.md %}#start-a-pcr-stream-with-read-from-standby).
- **Monitoring**: To monitor the replication's initial progress, current status, and performance, you can use metrics available in the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) and [Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}). For more details, refer to [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}).
@@ -50,6 +50,7 @@ This section is a quick overview of the initial requirements to start a replicat
For more comprehensive guides, refer to:
+- [Cluster Virtualization Overview]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}): for information on enabling cluster virtualization, a requirement for setting up PCR.
- [Set Up Physical Cluster Replication]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}): for a tutorial on how to start a replication stream.
- [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}): for detail on metrics and observability into a replication stream.
- [Fail Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/failover-replication.md %}): for a guide on how to complete a replication stream and fail over to the standby cluster.
@@ -70,8 +71,8 @@ Statement | Action
## Cluster versions and upgrades
-{{site.data.alerts.callout_danger}}
-The standby cluster must be at the same version as, or one version ahead of, the primary's virtual cluster.
+{{site.data.alerts.callout_info}}
+The entire standby cluster must be at the same version as, or one version ahead of, the primary's virtual cluster.
{{site.data.alerts.end}}
When PCR is enabled, upgrade with the following procedure. This upgrades the standby cluster before the primary cluster. Within the primary and standby CockroachDB clusters, the system virtual cluster must be at a cluster version greater than or equal to the virtual cluster:
@@ -82,8 +83,6 @@ When PCR is enabled, upgrade with the following procedure. This upgrades the sta
1. [Finalize]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#finalize-a-major-version-upgrade-manually) the upgrade on the standby's virtual cluster.
1. [Finalize]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#finalize-a-major-version-upgrade-manually) the upgrade on the primary's virtual cluster.
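For example, manual finalization is a single statement run from a SQL session on each virtual cluster. The following is a minimal sketch; the version string is illustrative and depends on the release you are upgrading to:

~~~ sql
-- Run on the standby's virtual cluster first, then on the primary's
-- virtual cluster, once you are ready to finalize the upgrade:
SET CLUSTER SETTING version = '24.3';
~~~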
-The standby cluster must be at the same version as, or one version ahead of, the primary's virtual cluster at the time of [failover]({% link {{ page.version.version }}/failover-replication.md %}).
-
## Demo video
Learn how to use PCR to meet your RTO and RPO requirements with the following demo:
diff --git a/src/current/v24.3/physical-cluster-replication-technical-overview.md b/src/current/v24.3/physical-cluster-replication-technical-overview.md
index 0eb5c70c931..1b779db9c25 100644
--- a/src/current/v24.3/physical-cluster-replication-technical-overview.md
+++ b/src/current/v24.3/physical-cluster-replication-technical-overview.md
@@ -5,11 +5,11 @@ toc: true
docs_area: manage
---
-[**Physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) automatically and continuously streams data from an active _primary_ CockroachDB cluster to a passive _standby_ cluster. Each cluster contains: a _system virtual cluster_ and an application [virtual cluster]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}):
+[**Physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) continuously and asynchronously replicates data from an active _primary_ CockroachDB cluster to a passive _standby_ cluster. When both clusters are virtualized, each cluster contains a _system virtual cluster_ and an application [virtual cluster]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) during the PCR stream:
{% include {{ page.version.version }}/physical-replication/interface-virtual-cluster.md %}
-This separation of concerns means that the replication stream can operate without affecting work happening in a virtual cluster.
+If you use the [read on standby](#start-up-sequence-with-read-on-standby) feature in PCR, the standby cluster has an additional reader virtual cluster that safely serves read requests against the replicating virtual cluster.
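As a sketch, assuming the replicating virtual cluster is named `main` (so the reader virtual cluster is `main-readonly`) and a hypothetical `movr.rides` table:

~~~ sql
-- Connected to the standby's reader virtual cluster (for example, via
-- cockroach sql with options=-ccluster=main-readonly), reads are served
-- from the replicating data:
SELECT count(*) FROM movr.rides;
~~~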
### PCR stream start-up sequence
@@ -20,7 +20,7 @@ This separation of concerns means that the replication stream can operate withou
The stream initialization proceeds as follows:
-1. The standby's consumer job connects via its system virtual cluster to the primary cluster and starts the primary cluster's physical stream producer job.
+1. The standby's consumer job connects to the primary cluster via the standby's system virtual cluster and starts the primary cluster's `REPLICATION STREAM PRODUCER` job.
1. The primary cluster chooses a timestamp at which to start the physical replication stream. Data on the primary is protected from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) until it is replicated to the standby using a [protected timestamp]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps).
1. The primary cluster returns the timestamp and a [job ID]({% link {{ page.version.version }}/show-jobs.md %}#response) for the replication job.
1. The standby cluster retrieves a list of all nodes in the primary cluster. It uses this list to distribute work across all nodes in the standby cluster.
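For reference, the consumer job in step 1 is created from the standby's system virtual cluster with a statement along the following lines; the virtual cluster name and connection string values are placeholders:

~~~ sql
-- Run from the standby's system virtual cluster; this starts the
-- consumer job and, on the primary, the producer job:
CREATE VIRTUAL CLUSTER main
  FROM REPLICATION OF main
  ON 'postgresql://{replication user}:{password}@{primary host}:26257?options=-ccluster%3Dsystem&sslmode=verify-full';
~~~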
@@ -53,7 +53,7 @@ If the primary cluster does not receive replicated time information from the sta
### Failover and promotion process
-The tracked replicated time and the advancing protected timestamp allows the replication stream to also track _retained time_, which is a timestamp in the past indicating the lower bound that the replication stream could fail over to. Therefore, the _failover window_ for a replication job falls between the retained time and the replicated time.
+The tracked replicated time and the advancing protected timestamp allow the replication stream to also track _retained time_, which is a timestamp in the past indicating the lower bound that the replication stream could fail over to. Because of the protected timestamp, the retained time can be up to 4 hours in the past. Therefore, the _failover window_ for a replication job falls between the retained time and the replicated time.
diff --git a/src/current/v24.3/set-up-physical-cluster-replication.md b/src/current/v24.3/set-up-physical-cluster-replication.md
index c039e3dc30a..1902fc7d837 100644
--- a/src/current/v24.3/set-up-physical-cluster-replication.md
+++ b/src/current/v24.3/set-up-physical-cluster-replication.md
@@ -5,7 +5,11 @@ toc: true
docs_area: manage
---
-In this tutorial, you will set up [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) between a primary cluster and standby cluster. The primary cluster is _active_, serving application traffic. The standby cluster is _passive_, accepting updates from the primary cluster. The replication stream will send changes from the primary to the standby.
+{{site.data.alerts.callout_info}}
+Physical cluster replication is supported in CockroachDB {{ site.data.products.core }} clusters and is in [limited access]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}) on [CockroachDB {{ site.data.products.cloud }}]({% link cockroachcloud/physical-cluster-replication.md %}).
+{{site.data.alerts.end}}
+
+In this tutorial, you will set up [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) between a primary cluster and standby cluster. The primary cluster is _active_, serving application traffic. The standby cluster is _passive_, continuously receiving updates from the primary cluster. The replication stream replicates changes from the primary to the standby.
The unit of replication is a [virtual cluster]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}), which is part of the underlying infrastructure in the primary and standby clusters.
@@ -27,12 +31,12 @@ The high-level steps in this tutorial are:
## Before you begin
-- Two separate CockroachDB clusters (primary and standby) with a minimum of three nodes each, and each using the same CockroachDB {{page.version.version}} version. The standby cluster should be the same version or one version ahead of the primary cluster. The primary and standby clusters must be configured with similar hardware profiles, number of nodes, and overall size. Significant discrepancies in the cluster configurations may result in degraded performance.
+- You need two separate CockroachDB clusters (primary and standby), each with a minimum of three nodes. The standby cluster should be the same version or one version ahead of the primary cluster. The primary and standby clusters must be configured with similar hardware profiles, number of nodes, and overall size. Significant discrepancies in the cluster configurations may result in degraded performance.
- To set up each cluster, you can follow [Deploy CockroachDB on Premises]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}). When you initialize the cluster with the [`cockroach init`]({% link {{ page.version.version }}/cockroach-init.md %}) command, you **must** pass the `--virtualized` or `--virtualized-empty` flag. Refer to the cluster creation steps for the [primary cluster](#initialize-the-primary-cluster) and for the [standby cluster](#initialize-the-standby-cluster) for details.
- The [Deploy CockroachDB on Premises]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}) tutorial creates a self-signed certificate for each {{ site.data.products.core }} cluster. To create certificates signed by an external certificate authority, refer to [Create Security Certificates using OpenSSL]({% link {{ page.version.version }}/create-security-certificates-openssl.md %}).
-- All nodes in each cluster will need access to the Certificate Authority for the other cluster. Refer to [Manage the cluster certificates](#step-3-manage-the-cluster-certificates).
+- All nodes in each cluster will need access to the Certificate Authority for the other cluster. Refer to [Manage cluster certificates](#step-3-manage-cluster-certificates-and-generate-connection-strings).
- An [{{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#types-of-licenses) on the primary **and** standby clusters. You must use the system virtual cluster on the primary and standby clusters to enable your {{ site.data.products.enterprise }} license.
-- The primary and standby clusters **must have the same [region topology]({% link {{ page.version.version }}/topology-patterns.md %})**. For example, replicating a multi-region primary cluster to a single-region standby cluster is not supported. Mismatching regions between a multi-region primary and standby cluster is also not supported.
+- The primary and standby clusters can have different [region topologies]({% link {{ page.version.version }}/topology-patterns.md %}). However, the behavior of features that rely on multi-region primitives, such as `REGIONAL BY ROW` and `REGIONAL BY TABLE` tables, may be affected.
{{site.data.alerts.callout_info}}
To set up PCR from an existing CockroachDB cluster, which will serve as the primary cluster, refer to [Set up PCR from an existing cluster](#set-up-pcr-from-an-existing-cluster).
@@ -100,7 +104,7 @@ Connect to your primary cluster's system virtual cluster using [`cockroach sql`]
Because this is the primary cluster rather than the standby cluster, the `data_state` of all rows is `ready`, rather than `replicating` or another [status]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}).
-### Create a replication user and password
+### Create a user with replication privileges
The standby cluster connects to the primary cluster's system virtual cluster using an identity with the `REPLICATION` privilege. Connect to the primary cluster's system virtual cluster and create a user with a password:
@@ -111,6 +115,8 @@ The standby cluster connects to the primary cluster's system virtual cluster usi
CREATE USER {your username} WITH PASSWORD '{your password}';
~~~
+ If you need to change the password later, refer to [`ALTER USER`]({% link {{ page.version.version }}/alter-user.md %}).
+
1. Grant the [`REPLICATION` system privilege]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) to your user:
{% include_cached copy-clipboard.html %}
@@ -118,8 +124,6 @@ The standby cluster connects to the primary cluster's system virtual cluster usi
GRANT SYSTEM REPLICATION TO {your username};
~~~
- If you need to change the password later, refer to [`ALTER USER`]({% link {{ page.version.version }}/alter-user.md %}).
-
### Connect to the primary virtual cluster (optional)
1. If you would like to run a sample workload on the primary's virtual cluster, open a new terminal window and use [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}) to run the workload.
@@ -231,7 +235,7 @@ Connect to your standby cluster's system virtual cluster using [`cockroach sql`]
(1 rows)
~~~
-### Create a user for the standby cluster
+### Create a user with replication privileges on the standby cluster
If you would like to access the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) to observe your replication, you will need to create a user:
@@ -251,7 +255,7 @@ If you would like to access the [DB Console]({% link {{ page.version.version }}/
Open the DB Console in your web browser: `https://{node IP or hostname}:8080/`, where you will be prompted for these credentials. Refer to [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}) for more detail on tracking relevant metrics for your replication stream.
-## Step 3. Manage the cluster certificates
+## Step 3. Manage cluster certificates and generate connection strings
{{site.data.alerts.callout_danger}}
It is important to carefully manage the exchange of CA certificates between clusters if you have generated self-signed certificates with `cockroach cert` as part of the [prerequisite deployment tutorial]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}).
@@ -259,17 +263,13 @@ It is important to carefully manage the exchange of CA certificates between clus
To create certificates signed by an external certificate authority, refer to [Create Security Certificates using OpenSSL]({% link {{ page.version.version }}/create-security-certificates-openssl.md %}).
{{site.data.alerts.end}}
-At this point, the primary and standby clusters are both running. The next step allows the standby cluster to connect to the primary cluster and begin ingesting its data. Depending on how you manage certificates, you must ensure that all nodes on the primary and the standby cluster have access to the certificate of the other cluster.
-
-You can use the `cockroach encode-uri` command to generate a connection string containing a cluster's certificate for any [PCR statements]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}#manage-replication-in-the-sql-shell) that require a connection string.
-
-For example, in this tutorial you will need a connection string for the primary cluster when you start the replication stream from the standby.
+At this point, the primary and standby clusters are both running. The next step creates a connection URI with the certificates needed to connect the two clusters. In most cases, we recommend ensuring that all nodes on the primary cluster have access to the certificate of the standby cluster, and vice versa. This ensures that PCR is able to parallelize the work.
-To generate a connection string, pass the replication user, IP and port, along with the directory to the certificate for the primary cluster:
+Use the `cockroach encode-uri` command to generate a connection string containing a cluster's certificate for any [PCR statements]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}#manage-replication-in-the-sql-shell) that require a connection string. Pass the replication user, IP and port, along with the path to the certificate for the **primary cluster**, into the `encode-uri` command:
{% include_cached copy-clipboard.html %}
~~~ shell
-cockroach encode-uri {replication user}:{password}@{node IP or hostname}:26257 --ca-cert certs/ca.crt --inline
+cockroach encode-uri {replication user}:{password}@{node IP or hostname}:26257 --ca-cert {path to certs directory}/certs/ca.crt --inline
~~~
The connection string output contains the primary cluster's certificate:
@@ -283,11 +283,11 @@ Copy the output ready for [Step 4](#step-4-start-replication), which requires th
## Step 4. Start replication
-The system virtual cluster in the standby cluster initiates and controls the replication stream by pulling from the primary cluster. In this section, you will connect to the primary from the standby to initiate the replication stream.
+The system virtual cluster in the standby cluster initializes and controls the replication stream by pulling from the primary cluster. In this section, you will connect to the primary from the standby to initiate the replication stream.
1. From the **standby** cluster, use your connection string to the primary:
- If you generated the connection string using [`cockroach encode-uri`](#step-3-manage-the-cluster-certificates):
+ If you generated the connection string using [`cockroach encode-uri`](#step-3-manage-cluster-certificates-and-generate-connection-strings):
{% include_cached copy-clipboard.html %}
~~~ sql
@@ -297,7 +297,7 @@ The system virtual cluster in the standby cluster initiates and controls the rep
~~~
Otherwise, pass the connection string that contains:
- - The replication user and password that you [created for the primary cluster](#create-a-replication-user-and-password).
+ - The replication user and password that you [created for the primary cluster](#create-a-user-with-replication-privileges).
- The node IP address or hostname of one node from the primary cluster.
- The path to the primary node's certificate on the standby cluster.
@@ -362,12 +362,12 @@ The system virtual cluster in the standby cluster initiates and controls the rep
## Set up PCR from an existing cluster
-You can replicate data from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled to a standby cluster with cluster virtualization enabled. In the [PCR setup]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}), the existing cluster is the primary cluster, which serves application traffic.
+You can set up PCR from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled. However, the standby cluster must have cluster virtualization enabled. In the [PCR setup]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}), the existing cluster is the primary cluster.
{{site.data.alerts.callout_info}}
When you start PCR with an existing primary cluster that does **not** have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled, you will not be able to [_fail back_]({% link {{ page.version.version }}/failover-replication.md %}) to the original primary cluster from the promoted, original standby.
-For more details on the failback process when you have started PCR with a non-virtualized primary, refer to [Fail back after PCR from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-pcr-from-an-existing-cluster).
+For more details on the failback process when you have started PCR with a non-virtualized primary, refer to [Fail back after replicating from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-replicating-from-an-existing-primary-cluster).
{{site.data.alerts.end}}
Before you begin, you will need:
@@ -408,7 +408,7 @@ Before you begin, you will need:
(1 row)
~~~
-1. To create the replication job, you will need a connection string for the **primary cluster** containing its CA certificate. For steps to generate a connection string with `cockroach encode-uri`, refer to [Step 3. Manage the cluster certificates](#step-3-manage-the-cluster-certificates).
+1. To create the replication job, you will need a connection string for the **primary cluster** containing its CA certificate. For steps to generate a connection string with `cockroach encode-uri`, refer to [Step 3. Manage cluster certificates and generate connection strings](#step-3-manage-cluster-certificates-and-generate-connection-strings).
1. If you would like to run a test workload on your existing **primary cluster**, you can use [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}) like the following:
@@ -453,7 +453,7 @@ At this point, your replication stream will be running.
To _fail over_ to the standby cluster, follow the instructions on the [Fail Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/failover-replication.md %}) page.
-For details on how to _fail back_ after replicating a non-virtualized cluster, refer to [Fail back after PCR from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-pcr-from-an-existing-cluster).
+For details on how to _fail back_ after replicating a non-virtualized cluster, refer to [Fail back after replicating from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-replicating-from-an-existing-primary-cluster).
## Connection reference
@@ -472,7 +472,7 @@ Cluster | Virtual Cluster | Usage | URL and Parameters
Primary | System | Set up a replication user and view running virtual clusters. Connect with [`cockroach sql`]({% link {{ page.version.version }}/cockroach-sql.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=system&sslmode=verify-full"`<br>- `options=-ccluster=system`<br>- `sslmode=verify-full`<br>Use the `--certs-dir` flag to specify the path to your certificate.
Primary | Main | Add and run a workload with [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=main&sslmode=verify-full&sslrootcert=certs/ca.crt&sslcert=certs/client.root.crt&sslkey=certs/client.root.key"`<br>{% include {{ page.version.version }}/connect/cockroach-workload-parameters.md %} As a result, for the example in this tutorial, you will need:<br>- `options=-ccluster={virtual_cluster_name}`<br>- `sslmode=verify-full`<br>- `sslrootcert={path}/certs/ca.crt`<br>- `sslcert={path}/certs/client.root.crt`<br>- `sslkey={path}/certs/client.root.key`
Standby | System | Manage the replication stream. Connect with [`cockroach sql`]({% link {{ page.version.version }}/cockroach-sql.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=system&sslmode=verify-full"`<br>- `options=-ccluster=system`<br>- `sslmode=verify-full`<br>Use the `--certs-dir` flag to specify the path to your certificate.
-Standby/Primary | System | Connect to the other cluster. | `"postgresql://{replication user}:{password}@{node IP or hostname}:{26257}/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A"`<br>Generate the connection string with [`cockroach encode-uri`](#step-3-manage-the-cluster-certificates). Use the generated connection string in:<br>- `CREATE VIRTUAL CLUSTER` statements to [start the replication stream](#step-4-start-replication).<br>- `ALTER VIRTUAL CLUSTER` statements to [fail back to the primary cluster]({% link {{ page.version.version }}/failover-replication.md %}#failback).
+Standby/Primary | System | Connect to the other cluster. | `"postgresql://{replication user}:{password}@{node IP or hostname}:{26257}/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A"`<br>Generate the connection string with [`cockroach encode-uri`](#step-3-manage-cluster-certificates-and-generate-connection-strings). Use the generated connection string in:<br>- `CREATE VIRTUAL CLUSTER` statements to [start the replication stream](#step-4-start-replication).<br>- `ALTER VIRTUAL CLUSTER` statements to [fail back to the primary cluster]({% link {{ page.version.version }}/failover-replication.md %}#failback).
Standby | Read only | Run read queries on the standby's replicating virtual cluster. | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=main-readonly&sslmode=verify-full"`<br>- `options=-ccluster=main-readonly`<br>- `sslmode=verify-full`<br>Use the `--certs-dir` flag to specify the path to your certificate.
## What's next
diff --git a/src/current/v25.2/create-virtual-cluster.md b/src/current/v25.2/create-virtual-cluster.md
index 137367a88d3..435cec521d7 100644
--- a/src/current/v25.2/create-virtual-cluster.md
+++ b/src/current/v25.2/create-virtual-cluster.md
@@ -62,7 +62,7 @@ To form a connection string similar to the example, include the following values
Value | Description
----------------+------------
-`{replication user}` | The user on the primary cluster that has the `REPLICATION` system privilege. Refer to the [Create a replication user and password]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#create-a-replication-user-and-password) for more detail.
+`{replication user}` | The user on the primary cluster that has the `REPLICATION` system privilege. Refer to [Create a user with replication privileges]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#create-a-user-with-replication-privileges) for more detail.
`{password}` | The replication user's password.
`{node ID or hostname}` | The node IP address or hostname of any node from the primary cluster.
`options=ccluster=system` | The parameter to connect to the system virtual cluster on the primary cluster.
diff --git a/src/current/v25.2/failover-replication.md b/src/current/v25.2/failover-replication.md
index 75788baf1c1..dfe3edaf194 100644
--- a/src/current/v25.2/failover-replication.md
+++ b/src/current/v25.2/failover-replication.md
@@ -5,13 +5,9 @@ toc: true
key: cutover-replication.html
---
-{{site.data.alerts.callout_info}}
-Physical cluster replication is supported in CockroachDB {{ site.data.products.core }} clusters.
-{{site.data.alerts.end}}
+_Failover_ in [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) allows you to move application traffic from the active primary cluster to the passive standby cluster. When you complete the replication stream to initiate a failover, the job stops replicating data from the primary, sets the standby [virtual cluster]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) to a point in time (in the past or future) where all ingested data is consistent, and then makes the standby virtual cluster ready to accept traffic.
-_Failover_ in [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) allows you to switch from the active primary cluster to the passive standby cluster that has ingested replicated data. When you complete the replication stream to initiate a failover, the job stops replicating data from the primary, sets the standby [virtual cluster]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) to a point in time (in the past or future) where all ingested data is consistent, and then makes the standby virtual cluster ready to accept traffic.
-
-_Failback_ in PCR switches operations back to the original primary cluster (or a new cluster) after a failover event. When you initiate a failback, the job ensures the original primary is up to date with writes from the standby that happened after failover. The original primary cluster is then set as ready to accept application traffic once again.
+After a failover event, you may want to return your operations to the original primary cluster (or a new cluster). _Failback_ in PCR does this by replicating the changes written to the promoted standby back onto the original primary cluster. When you initiate a failback, the job ensures the original primary is up to date with writes from the standby that happened after failover. The original primary cluster is then set as ready to accept application traffic once again.
This page describes:
@@ -21,8 +17,8 @@ This page describes:
- After the PCR stream used an existing cluster as the primary cluster.
- [**Job management**](#job-management) after a failover or failback.
-{{site.data.alerts.callout_danger}}
-Failover and failback do **not** redirect traffic automatically to the standby cluster. Once the failover or failback is complete, you must redirect application traffic to the standby (new) cluster. If you do not redirect traffic manually, writes to the primary (original) cluster may be lost.
+{{site.data.alerts.callout_info}}
+Failover and failback do **not** redirect traffic automatically to the standby cluster. Once the failover or failback is complete, you must redirect application traffic to the standby cluster.
{{site.data.alerts.end}}
## Failover
@@ -38,16 +34,19 @@ During PCR, jobs running on the primary cluster will replicate to the standby cl
### Step 1. Initiate the failover
-To initiate a failover to the standby cluster, you can specify the point in time for the standby's promotion in the following ways. That is, the standby cluster's live data at the point of failover. Refer to the following sections for steps:
+To initiate a failover to the standby cluster, specify the point in time for its promotion. At failover, the standby cluster's data will reflect the state of the primary at the specified moment. Refer to the following sections for steps, or see the sketch that follows this list:
-- [`LATEST`](#fail-over-to-the-most-recent-replicated-time): The most recent replicated timestamp.
+- [`LATEST`](#fail-over-to-the-most-recent-replicated-time): The most recent replicated timestamp. This minimizes data loss from replication lag in asynchronous replication.
- [Point-in-time](#fail-over-to-a-point-in-time):
- - Past: A past timestamp within the [failover window]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process).
+ - Past: A past timestamp within the [failover window]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process), which extends up to 4 hours into the past.
+ {{site.data.alerts.callout_success}}
+ Failing over to a past point in time is useful if you need to recover from a recent human error.
+ {{site.data.alerts.end}}
- Future: A future timestamp for planning a failover.
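The following is a minimal sketch of both options, run from the standby's system virtual cluster and assuming a virtual cluster named `main`; the timestamp is a placeholder:

~~~ sql
-- Fail over to the most recent replicated time:
ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO LATEST;

-- Or fail over to a specific point in time within the failover window:
ALTER VIRTUAL CLUSTER main COMPLETE REPLICATION TO SYSTEM TIME '{timestamp}';
~~~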
#### Fail over to the most recent replicated time
-To initiate a failover to the most recent replicated timestamp, you can specify `LATEST` when you start the failover. The latest replicated time may be behind the actual time if there is [_replication lag_]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process) in the stream. Replication lag is the time between the most up-to-date replicated time and the actual time.
+To initiate a failover to the most recent replicated timestamp, specify `LATEST`. Due to [_replication lag_]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process), the most recent replicated time may be behind the current actual time. Replication lag is the time difference between the most recent replicated time and the actual time.
1. To view the current replication timestamp, use:
@@ -95,7 +94,7 @@ You can control the point in time that the PCR stream will fail over to.
SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
~~~
- The `retained_time` response provides the earliest time to which you can fail over.
+ The `retained_time` response provides the earliest time to which you can fail over. This can be up to 4 hours in the past.
~~~
id | name | source_tenant_name | source_cluster_uri | retained_time | replicated_time | replication_lag | failover_time | status
@@ -174,10 +173,10 @@ To enable PCR again, from the new primary to the original primary (or a complete
## Failback
-After failing over to the standby cluster, you may need to fail back to the original primary-standby cluster setup cluster to serve your application. Depending on the configuration of the primary cluster in the original PCR stream, use one of the following workflows:
+After failing over to the standby cluster, you may want to return to your original configuration by failing back to the original primary-standby cluster setup. Depending on the configuration of the primary cluster in the original PCR stream, use one of the following workflows:
-- [From the original standby cluster (after it was promoted during failover) to the original primary cluster](#fail-back-to-the-original-primary-cluster).
-- [After the PCR stream used an existing cluster as the primary cluster](#fail-back-after-pcr-from-an-existing-cluster).
+- [From the original standby cluster (after it was promoted during failover) to the original primary cluster](#fail-back-to-the-original-primary-cluster). If this failback is initiated within 24 hours of the failover, PCR replicates the net-new changes from the standby cluster to the primary cluster, rather than fully replacing the existing data in the primary cluster.
+- [After the PCR stream used an existing cluster as the primary cluster](#fail-back-after-replicating-from-an-existing-primary-cluster).
{{site.data.alerts.callout_info}}
To move back to a different cluster that was not involved in the original PCR stream, set up a new PCR stream following the PCR [setup]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}) guide.
@@ -208,7 +207,7 @@ This section illustrates the steps to fail back to the original primary cluster
ALTER VIRTUAL CLUSTER {cluster_a} STOP SERVICE;
~~~
-1. Open another terminal window and generate a connection string for **Cluster B** using `cockroach encode-uri`:
+1. Open another terminal window and generate a connection string for **Cluster B** using [`cockroach encode-uri`]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#step-3-manage-cluster-certificates-and-generate-connection-strings):
{% include_cached copy-clipboard.html %}
~~~ shell
@@ -279,7 +278,7 @@ This section illustrates the steps to fail back to the original primary cluster
ALTER VIRTUAL CLUSTER {cluster_a} COMPLETE REPLICATION TO LATEST;
~~~
- The `failover_time` is the timestamp at which the replicated data is consistent. The cluster will revert any replicated data above this timestamp to ensure that the standby is consistent with the primary at that timestamp:
+ After the failover completes successfully, the statement returns a `failover_time` timestamp, which represents the time at which the replicated data is consistent. The cluster reverts any replicated data above the `failover_time` to ensure that the standby is consistent with the primary at that time:
~~~
failover_time
@@ -302,13 +301,13 @@ This section illustrates the steps to fail back to the original primary cluster
SET CLUSTER SETTING server.controller.default_target_cluster='{cluster_a}';
~~~
-At this point, **Cluster A** is once again the primary and **Cluster B** is once again the standby. The clusters are entirely independent. To direct application traffic to the primary (**Cluster A**), you will need to use your own network load balancers, DNS servers, or other network configuration to direct application traffic to **Cluster A**. To enable PCR again, from the primary to the standby (or a completely different cluster), refer to [Set Up Physical Cluster Replication]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}).
+At this point, **Cluster A** has caught up to **Cluster B**. The clusters are entirely independent. To enable PCR again from the primary to the standby, refer to [Set Up Physical Cluster Replication]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}).
-### Fail back after PCR from an existing cluster
+### Fail back after replicating from an existing primary cluster
You can replicate data from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled to a standby cluster with cluster virtualization enabled. For instructions on setting up a PCR in this way, refer to [Set up PCR from an existing cluster]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}#set-up-pcr-from-an-existing-cluster).
-After a [failover](#failover) to the standby cluster, you may want to then set up PCR from the original standby cluster, which is now the primary, to another cluster, which will become the standby. There are couple of ways to set up a new standby, and some considerations.
+After a [failover](#failover) to the standby cluster, you may want to set up PCR from the original standby cluster, which is now the primary, to another cluster, which will become the standby. There are multiple ways to set up a new standby, along with some considerations to keep in mind.
In the example, the clusters are named for reference:
@@ -324,11 +323,11 @@ In the example, the clusters are named for reference:
## Job management
-During PCR, jobs running on the primary cluster will replicate to the standby cluster. Once you have [completed a failover](#step-2-complete-the-failover) (or a [failback](#failback)), refer to the following sections for details on resuming jobs on the promoted cluster.
+During PCR, jobs running on the primary cluster replicate to the standby cluster. Once you have [completed a failover](#step-2-complete-the-failover) (or a [failback](#failback)), refer to the following sections for details on resuming jobs on the promoted cluster.
### Backup schedules
-[Backup schedules]({% link {{ page.version.version }}/manage-a-backup-schedule.md %}) will pause after failover on the promoted cluster. Take the following steps to resume jobs:
+[Backup schedules]({% link {{ page.version.version }}/manage-a-backup-schedule.md %}) pause after failover on the promoted standby cluster. Take the following steps to resume jobs:
1. Verify that there are no other schedules running backups to the same [collection of backups]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#backup-collections), i.e., the schedule that was running on the original primary cluster.
1. [Resume]({% link {{ page.version.version }}/resume-schedules.md %}) the backup schedule on the promoted cluster.
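For example, a minimal sketch; the schedule ID shown is illustrative:

~~~ sql
-- On the promoted cluster, find the paused backup schedule:
SHOW SCHEDULES;

-- Resume it by ID once you have confirmed that no other schedule
-- writes to the same backup collection:
RESUME SCHEDULE 872355112496447489;
~~~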
diff --git a/src/current/v25.2/physical-cluster-replication-monitoring.md b/src/current/v25.2/physical-cluster-replication-monitoring.md
index 755f166ea80..68c2a8273ea 100644
--- a/src/current/v25.2/physical-cluster-replication-monitoring.md
+++ b/src/current/v25.2/physical-cluster-replication-monitoring.md
@@ -55,7 +55,6 @@ You can use Prometheus and Alertmanager to track and alert on PCR metrics. Refer
We recommend tracking the following metrics:
- `physical_replication.logical_bytes`: The logical bytes (the sum of all keys and values) ingested by all PCR jobs.
-- `physical_replication.sst_bytes`: The [SST]({% link {{ page.version.version }}/architecture/storage-layer.md %}#ssts) bytes (compressed) sent to the KV layer by all PCR jobs.
- `physical_replication.replicated_time_seconds`: The [replicated time]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}#failover-and-promotion-process) of the physical replication stream in seconds since the Unix epoch.
## Data verification
diff --git a/src/current/v25.2/physical-cluster-replication-overview.md b/src/current/v25.2/physical-cluster-replication-overview.md
index b66b219c1c8..2ed567f7596 100644
--- a/src/current/v25.2/physical-cluster-replication-overview.md
+++ b/src/current/v25.2/physical-cluster-replication-overview.md
@@ -31,7 +31,7 @@ You can use PCR to:
- **Transactional consistency**: Avoid conflicts in data after recovery; the replication completes to a transactionally consistent state.
- **Improved RPO and RTO**: Depending on workload and deployment configuration, [replication lag]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}) between the primary and standby is generally in the tens-of-seconds range. The failover process from the primary cluster to the standby should typically happen within five minutes when completing a failover to the latest replicated time using [`LATEST`]({% link {{ page.version.version }}/alter-virtual-cluster.md %}#synopsis).
- **Failover to a timestamp in the past or the future**: In the case of logical disasters or mistakes, you can [fail over]({% link {{ page.version.version }}/failover-replication.md %}) from the primary to the standby cluster to a timestamp in the past. This means that you can return the standby to a timestamp before the mistake was replicated to the standby. Furthermore, you can plan a failover by specifying a timestamp in the future.
-- **Fast failback**: Switch back from the promoted standby cluster to the original primary cluster after a failover event without an initial scan.
+- **Fast failback**: Switch back from the promoted standby cluster to the original primary cluster after a failover event by replicating only the net-new changes, rather than performing a full initial scan to replace the existing data.
- **Read from standby cluster**: You can configure PCR to allow `SELECT` queries on the standby cluster. For more details, refer to [Start a PCR stream with read from standby]({% link {{ page.version.version }}/create-virtual-cluster.md %}#start-a-pcr-stream-with-read-from-standby).
- **Monitoring**: To monitor the replication's initial progress, current status, and performance, you can use metrics available in the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) and [Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}). For more details, refer to [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}).
@@ -48,6 +48,7 @@ Frequent large schema changes or imports may cause a significant spike in [repli
This section is a quick overview of the initial requirements to start a replication stream. For more comprehensive guides, refer to:
+- [Cluster Virtualization Overview]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}): for information on enabling cluster virtualization, a requirement for setting up PCR.
- [Set Up Physical Cluster Replication]({% link {{ page.version.version }}/set-up-physical-cluster-replication.md %}): for a tutorial on how to start a replication stream.
- [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}): for detail on metrics and observability into a replication stream.
- [Fail Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/failover-replication.md %}): for a guide on how to complete a replication stream and fail over to the standby cluster.
@@ -68,8 +69,8 @@ Statement | Action
## Cluster versions and upgrades
-{{site.data.alerts.callout_danger}}
-The standby cluster must be at the same version as, or one version ahead of, the primary's virtual cluster.
+{{site.data.alerts.callout_info}}
+The entire standby cluster must be at the same version as, or one version ahead of, the primary's virtual cluster.
{{site.data.alerts.end}}
When PCR is enabled, upgrade with the following procedure. This upgrades the standby cluster before the primary cluster. Within the primary and standby CockroachDB clusters, the system virtual cluster must be at a cluster version greater than or equal to the virtual cluster:
@@ -80,8 +81,6 @@ When PCR is enabled, upgrade with the following procedure. This upgrades the sta
1. [Finalize]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#finalize-a-major-version-upgrade-manually) the upgrade on the standby's virtual cluster.
1. [Finalize]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#finalize-a-major-version-upgrade-manually) the upgrade on the primary's virtual cluster.
-The standby cluster must be at the same version as, or one version ahead of, the primary's virtual cluster at the time of [failover]({% link {{ page.version.version }}/failover-replication.md %}).
-
## Demo video
Learn how to use PCR to meet your RTO and RPO requirements with the following demo:
diff --git a/src/current/v25.2/physical-cluster-replication-technical-overview.md b/src/current/v25.2/physical-cluster-replication-technical-overview.md
index 83b4c3d30a9..1f7fb41af43 100644
--- a/src/current/v25.2/physical-cluster-replication-technical-overview.md
+++ b/src/current/v25.2/physical-cluster-replication-technical-overview.md
@@ -5,11 +5,11 @@ toc: true
docs_area: manage
---
-[**Physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) automatically and continuously streams data from an active _primary_ CockroachDB cluster to a passive _standby_ cluster. Each cluster contains: a _system virtual cluster_ and an application [virtual cluster]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) during the PCR stream:
+[**Physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) continuously and asynchronously replicates data from an active _primary_ CockroachDB cluster to a passive _standby_ cluster. When both clusters are virtualized, each cluster contains a _system virtual cluster_ and an application [virtual cluster]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) during the PCR stream:
{% include {{ page.version.version }}/physical-replication/interface-virtual-cluster.md %}
-This separation of concerns means that the replication stream can operate without affecting work happening in a virtual cluster.
+If you use the [read on standby](#start-up-sequence-with-read-on-standby) feature in PCR, the standby cluster has an additional reader virtual cluster that safely serves read requests against the replicating virtual cluster.
### PCR stream start-up sequence
@@ -20,7 +20,7 @@ This separation of concerns means that the replication stream can operate withou
The stream initialization proceeds as follows:
-1. The standby's consumer job connects via its system virtual cluster to the primary cluster and starts the primary cluster's physical stream producer job.
+1. The standby's consumer job connects to the primary cluster via the standby's system virtual cluster and starts the primary cluster's `REPLICATION STREAM PRODUCER` job.
1. The primary cluster chooses a timestamp at which to start the physical replication stream. Data on the primary is protected from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) until it is replicated to the standby using a [protected timestamp]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps).
1. The primary cluster returns the timestamp and a [job ID]({% link {{ page.version.version }}/show-jobs.md %}#response) for the replication job.
1. The standby cluster retrieves a list of all nodes in the primary cluster. It uses this list to distribute work across all nodes in the standby cluster.
@@ -53,7 +53,7 @@ If the primary cluster does not receive replicated time information from the sta
### Failover and promotion process
-The tracked replicated time and the advancing protected timestamp allows the replication stream to also track _retained time_, which is a timestamp in the past indicating the lower bound that the replication stream could fail over to. Therefore, the _failover window_ for a replication job falls between the retained time and the replicated time.
+The tracked replicated time and the advancing protected timestamp allow the replication stream to also track _retained time_, which is a timestamp in the past indicating the lower bound that the replication stream could fail over to. Because of the protected timestamp, the retained time can be up to 4 hours in the past. Therefore, the _failover window_ for a replication job falls between the retained time and the replicated time.
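Both bounds of the failover window are visible from the standby's system virtual cluster; as a sketch, assuming a virtual cluster named `main`:

~~~ sql
-- The retained_time and replicated_time columns in the output
-- bound the failover window for the replication job:
SHOW VIRTUAL CLUSTER main WITH REPLICATION STATUS;
~~~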
diff --git a/src/current/v25.2/set-up-physical-cluster-replication.md b/src/current/v25.2/set-up-physical-cluster-replication.md
index b69c3752065..1ed0e999277 100644
--- a/src/current/v25.2/set-up-physical-cluster-replication.md
+++ b/src/current/v25.2/set-up-physical-cluster-replication.md
@@ -5,7 +5,11 @@ toc: true
docs_area: manage
---
-In this tutorial, you will set up [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) between a primary cluster and standby cluster. The primary cluster is _active_, serving application traffic. The standby cluster is _passive_, accepting updates from the primary cluster. The replication stream will send changes from the primary to the standby.
+{{site.data.alerts.callout_info}}
+Physical cluster replication is supported in CockroachDB {{ site.data.products.core }} clusters and is in [limited access]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}) on [CockroachDB {{ site.data.products.cloud }}]({% link cockroachcloud/physical-cluster-replication.md %}).
+{{site.data.alerts.end}}
+
+In this tutorial, you will set up [**physical cluster replication (PCR)**]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}) between a primary cluster and standby cluster. The primary cluster is _active_, serving application traffic. The standby cluster is _passive_, continuously receiving updates from the primary cluster. The replication stream replicates changes from the primary to the standby.
The unit of replication is a [virtual cluster]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}), which is part of the underlying infrastructure in the primary and standby clusters.
@@ -31,11 +35,11 @@ To set up PCR from an existing CockroachDB cluster, which will serve as the prim
## Before you begin
-- Two separate CockroachDB clusters (primary and standby) with a minimum of three nodes each, and each using the same CockroachDB {{page.version.version}} version. The standby cluster should be the same version or one version ahead of the primary cluster. The primary and standby clusters must be configured with similar hardware profiles, number of nodes, and overall size. Significant discrepancies in the cluster configurations may result in degraded performance.
+- You need two separate CockroachDB clusters (primary and standby), each with a minimum of three nodes. The standby cluster should be the same version or one version ahead of the primary cluster. The primary and standby clusters must be configured with similar hardware profiles, number of nodes, and overall size. Significant discrepancies in the cluster configurations may result in degraded performance.
- To set up each cluster, you can follow [Deploy CockroachDB on Premises]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}). When you initialize the cluster with the [`cockroach init`]({% link {{ page.version.version }}/cockroach-init.md %}) command, you **must** pass the `--virtualized` or `--virtualized-empty` flag. Refer to the cluster creation steps for the [primary cluster](#initialize-the-primary-cluster) and for the [standby cluster](#initialize-the-standby-cluster) for details.
- The [Deploy CockroachDB on Premises]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}) tutorial creates a self-signed certificate for each {{ site.data.products.core }} cluster. To create certificates signed by an external certificate authority, refer to [Create Security Certificates using OpenSSL]({% link {{ page.version.version }}/create-security-certificates-openssl.md %}).
-- All nodes in each cluster will need access to the Certificate Authority for the other cluster. Refer to [Manage the cluster certificates](#step-3-manage-the-cluster-certificates).
-- The primary and standby clusters **must have the same [region topology]({% link {{ page.version.version }}/topology-patterns.md %})**. For example, replicating a multi-region primary cluster to a single-region standby cluster is not supported. Mismatching regions between a multi-region primary and standby cluster is also not supported.
+- All nodes in each cluster will need access to the Certificate Authority for the other cluster. Refer to [Manage cluster certificates](#step-3-manage-cluster-certificates-and-generate-connection-strings).
+- The primary and standby clusters can have different [region topologies]({% link {{ page.version.version }}/topology-patterns.md %}). However, the behavior of features that rely on multi-region primitives, such as `REGIONAL BY ROW` and `REGIONAL BY TABLE` tables, may be affected.
## Step 1. Create the primary cluster
@@ -99,7 +103,7 @@ Connect to your primary cluster's system virtual cluster using [`cockroach sql`]
Because this is the primary cluster rather than the standby cluster, the `data_state` of all rows is `ready`, rather than `replicating` or another [status]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}).
-### Create a replication user and password
+### Create a user with replication privileges
The standby cluster connects to the primary cluster's system virtual cluster using an identity with the `REPLICATIONSOURCE` [privilege]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges). Connect to the primary cluster's system virtual cluster and create a user with a password:
@@ -110,6 +114,8 @@ The standby cluster connects to the primary cluster's system virtual cluster usi
CREATE USER {your username} WITH PASSWORD '{your password}';
~~~
+ If you need to change the password later, refer to [`ALTER USER`]({% link {{ page.version.version }}/alter-user.md %}).
+
1. Grant the [`REPLICATIONSOURCE` privilege]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) to your user:
{% include_cached copy-clipboard.html %}
@@ -117,8 +123,6 @@ The standby cluster connects to the primary cluster's system virtual cluster usi
GRANT SYSTEM REPLICATIONSOURCE TO {your username};
~~~
-If you need to change the password later, refer to [`ALTER USER`]({% link {{ page.version.version }}/alter-user.md %}).
-
### Connect to the primary virtual cluster (optional)
1. If you would like to run a sample workload on the primary's virtual cluster, open a new terminal window and use [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}) to run the workload.
@@ -219,7 +223,7 @@ Connect to your standby cluster's system virtual cluster using [`cockroach sql`]
(1 rows)
~~~
-### Create a user for the standby cluster
+### Create a user with replication privileges on the standby cluster
Create a user to run the PCR stream and access the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) to observe the job:
@@ -239,7 +243,7 @@ Create a user to run the PCR stream and access the [DB Console]({% link {{ page.
Open the DB Console in your web browser: `https://{node IP or hostname}:8080/`, where you will be prompted for these credentials. Refer to [Physical Cluster Replication Monitoring]({% link {{ page.version.version }}/physical-cluster-replication-monitoring.md %}) for more detail on tracking relevant metrics for your replication stream.
-## Step 3. Manage the cluster certificates
+## Step 3. Manage cluster certificates and generate connection strings
{{site.data.alerts.callout_danger}}
It is important to carefully manage the exchange of CA certificates between clusters if you have generated self-signed certificates with `cockroach cert` as part of the [prerequisite deployment tutorial]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}).
@@ -247,17 +251,13 @@ It is important to carefully manage the exchange of CA certificates between clus
To create certificates signed by an external certificate authority, refer to [Create Security Certificates using OpenSSL]({% link {{ page.version.version }}/create-security-certificates-openssl.md %}).
{{site.data.alerts.end}}
-At this point, the primary and standby clusters are both running. The next step allows the standby cluster to connect to the primary cluster and begin ingesting its data. Depending on how you manage certificates, you must ensure that all nodes on the primary and the standby cluster have access to the certificate of the other cluster.
-
-You can use the `cockroach encode-uri` command to generate a connection string containing a cluster's certificate for any [PCR statements]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}#manage-replication-in-the-sql-shell) that require a connection string.
-
-For example, in this tutorial you will need a connection string for the primary cluster when you start the replication stream from the standby.
+At this point, the primary and standby clusters are both running. The next step creates a connection URI with the certificates needed to connect the two clusters. In most cases, we recommend ensuring that all nodes on the primary cluster have access to the certificate of the standby cluster, and vice versa, so that PCR can parallelize the work across nodes.
-To generate a connection string, pass the replication user, IP and port, along with the directory to the certificate for the primary cluster:
+Use the `cockroach encode-uri` command to generate a connection string containing a cluster's certificate for any [PCR statements]({% link {{ page.version.version }}/physical-cluster-replication-overview.md %}#manage-replication-in-the-sql-shell) that require a connection string. Pass the replication user, IP, and port, along with the path to the **primary cluster's** certificate, to the `encode-uri` command:
{% include_cached copy-clipboard.html %}
~~~ shell
-cockroach encode-uri {replication user}:{password}@{node IP or hostname}:26257 --ca-cert certs/ca.crt --inline
+cockroach encode-uri {replication user}:{password}@{node IP or hostname}:26257 --ca-cert {path to certs directory}/ca.crt --inline
~~~
The connection string output contains the primary cluster's certificate:
@@ -271,11 +271,11 @@ Copy the output ready for [Step 4](#step-4-start-replication), which requires th
## Step 4. Start replication
-The system virtual cluster in the standby cluster initiates and controls the replication stream by pulling from the primary cluster. In this section, you will connect to the primary from the standby to initiate the replication stream.
+The system virtual cluster in the standby cluster initializes and controls the replication stream by pulling from the primary cluster. In this section, you will connect to the primary from the standby to initiate the replication stream.
1. From the **standby** cluster, use your connection string to the primary:
- If you generated the connection string using [`cockroach encode-uri`](#step-3-manage-the-cluster-certificates):
+ If you generated the connection string using [`cockroach encode-uri`](#step-3-manage-cluster-certificates-and-generate-connection-strings):
{% include_cached copy-clipboard.html %}
~~~ sql
@@ -285,7 +285,7 @@ The system virtual cluster in the standby cluster initiates and controls the rep
~~~
Otherwise, pass the connection string that contains:
- - The replication user and password that you [created for the primary cluster](#create-a-replication-user-and-password).
+ - The replication user and password that you [created for the primary cluster](#create-a-user-with-replication-privileges).
- The node IP address or hostname of one node from the primary cluster.
- The path to the primary node's certificate on the standby cluster.
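+
+Assembled by hand, such a connection string takes roughly the following shape (a sketch; the exact TLS parameters depend on how you manage certificates):
+
+~~~
+postgresql://{replication user}:{password}@{node IP or hostname}:26257?options=-ccluster=system&sslmode=verify-full&sslrootcert={path to the primary CA certificate on the standby}
+~~~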
@@ -350,12 +350,12 @@ The system virtual cluster in the standby cluster initiates and controls the rep
## Set up PCR from an existing cluster
-You can replicate data from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled to a standby cluster with cluster virtualization enabled. In the [PCR setup]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}), the existing cluster is the primary cluster, which serves application traffic.
+You can set up PCR from an existing CockroachDB cluster that does not have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled. However, the standby cluster must have cluster virtualization enabled. In the [PCR setup]({% link {{ page.version.version }}/physical-cluster-replication-technical-overview.md %}), the existing cluster is the primary cluster.
{{site.data.alerts.callout_info}}
When you start PCR with an existing primary cluster that does **not** have [cluster virtualization]({% link {{ page.version.version }}/cluster-virtualization-overview.md %}) enabled, you will not be able to [_fail back_]({% link {{ page.version.version }}/failover-replication.md %}#failback) to the original primary cluster from the promoted, original standby.
-For more details on the failback process when you have started PCR with a non-virtualized primary, refer to [Fail back after PCR from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-pcr-from-an-existing-cluster).
+For more details on the failback process when you have started PCR with a non-virtualized primary, refer to [Fail back after replicating from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-replicating-from-an-existing-primary-cluster).
{{site.data.alerts.end}}
Before you begin, you will need:
@@ -396,7 +396,7 @@ Before you begin, you will need:
(1 row)
~~~
-1. To create the replication job, you will need a connection string for the **primary cluster** containing its CA certificate. For steps to generate a connection string with `cockroach encode-uri`, refer to [Step 3. Manage the cluster certificates](#step-3-manage-the-cluster-certificates).
+1. To create the replication job, you will need a connection string for the **primary cluster** containing its CA certificate. For steps to generate a connection string with `cockroach encode-uri`, refer to [Step 3. Manage cluster certificates and generate connection strings](#step-3-manage-cluster-certificates-and-generate-connection-strings).
1. If you would like to run a test workload on your existing **primary cluster**, you can use [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}) like the following:
@@ -441,7 +441,7 @@ At this point, your replication stream will be running.
To _fail over_ to the standby cluster, follow the instructions on the [Fail Over from a Primary Cluster to a Standby Cluster]({% link {{ page.version.version }}/failover-replication.md %}) page.
-For details on how to _fail back_ after replicating a non-virtualized cluster, refer to [Fail back after PCR from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-pcr-from-an-existing-cluster).
+For details on how to _fail back_ after replicating a non-virtualized cluster, refer to [Fail back after replicating from an existing cluster]({% link {{ page.version.version }}/failover-replication.md %}#fail-back-after-replicating-from-an-existing-primary-cluster).
## Connection reference
@@ -460,7 +460,7 @@ Cluster | Virtual Cluster | Usage | URL and Parameters
Primary | System | Set up a replication user and view running virtual clusters. Connect with [`cockroach sql`]({% link {{ page.version.version }}/cockroach-sql.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=system&sslmode=verify-full"`<br><br>- `options=-ccluster=system`<br>- `sslmode=verify-full`<br><br>Use the `--certs-dir` flag to specify the path to your certificate.
Primary | Main | Add and run a workload with [`cockroach workload`]({% link {{ page.version.version }}/cockroach-workload.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=main&sslmode=verify-full&sslrootcert=certs/ca.crt&sslcert=certs/client.root.crt&sslkey=certs/client.root.key"`<br><br>{% include {{ page.version.version }}/connect/cockroach-workload-parameters.md %} As a result, for the example in this tutorial, you will need:<br>- `options=-ccluster={virtual_cluster_name}`<br>- `sslmode=verify-full`<br>- `sslrootcert={path}/certs/ca.crt`<br>- `sslcert={path}/certs/client.root.crt`<br>- `sslkey={path}/certs/client.root.key`
Standby | System | Manage the replication stream. Connect with [`cockroach sql`]({% link {{ page.version.version }}/cockroach-sql.md %}). | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=system&sslmode=verify-full"`<br><br>- `options=-ccluster=system`<br>- `sslmode=verify-full`<br><br>Use the `--certs-dir` flag to specify the path to your certificate.
-Standby/Primary | System | Connect to the other cluster. | `"postgresql://{replication user}:{password}@{node IP or hostname}:{26257}/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A"`<br><br>Generate the connection string with [`cockroach encode-uri`](#step-3-manage-the-cluster-certificates). Use the generated connection string in:<br>- `CREATE VIRTUAL CLUSTER` statements to [start the replication stream](#step-4-start-replication).<br>- `ALTER VIRTUAL CLUSTER` statements to [fail back to the primary cluster]({% link {{ page.version.version }}/failover-replication.md %}#failback).
+Standby/Primary | System | Connect to the other cluster. | `"postgresql://{replication user}:{password}@{node IP or hostname}:{26257}/defaultdb?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded_cert}-----END+CERTIFICATE-----%0A"`<br><br>Generate the connection string with [`cockroach encode-uri`](#step-3-manage-cluster-certificates-and-generate-connection-strings). Use the generated connection string in:<br>- `CREATE VIRTUAL CLUSTER` statements to [start the replication stream](#step-4-start-replication).<br>- `ALTER VIRTUAL CLUSTER` statements to [fail back to the primary cluster]({% link {{ page.version.version }}/failover-replication.md %}#failback).
Standby | Read only | Run read queries on the standby's replicating virtual cluster. | `"postgresql://root@{node IP or hostname}:{26257}?options=-ccluster=main-readonly&sslmode=verify-full"`<br><br>- `options=-ccluster=main-readonly`<br>- `sslmode=verify-full`<br><br>Use the `--certs-dir` flag to specify the path to your certificate.
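+
+For example, to open a SQL shell on a system virtual cluster with the URL and flag described in the table above (a sketch; point `--certs-dir` at your own certificate directory):
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+cockroach sql --url "postgresql://root@{node IP or hostname}:26257?options=-ccluster=system&sslmode=verify-full" --certs-dir=certs
+~~~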
## What's next