diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index db0137ca..1de294e2 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -8,12 +8,12 @@ .{product} * xref:ROOT:introduction.adoc[] -* Planning +* Plan your migration ** xref:ROOT:feasibility-checklists.adoc[] ** xref:ROOT:deployment-infrastructure.adoc[] ** xref:ROOT:create-target.adoc[] ** xref:ROOT:rollback.adoc[] -* Phase 1 +* Phase 1: Deploy {product-proxy} ** xref:ROOT:phase1.adoc[] ** xref:ROOT:setup-ansible-playbooks.adoc[] ** xref:ROOT:deploy-proxy-monitoring.adoc[] @@ -21,17 +21,10 @@ ** xref:ROOT:connect-clients-to-proxy.adoc[] ** xref:ROOT:metrics.adoc[] ** xref:ROOT:manage-proxy-instances.adoc[] -* Phase 2 -** xref:ROOT:migrate-and-validate-data.adoc[] -** xref:sideloader:sideloader-zdm.adoc[] -** xref:ROOT:cassandra-data-migrator.adoc[] -** xref:ROOT:dsbulk-migrator.adoc[] -* Phase 3 -** xref:ROOT:enable-async-dual-reads.adoc[] -* Phase 4 -** xref:ROOT:change-read-routing.adoc[] -* Phase 5 -** xref:ROOT:connect-clients-to-target.adoc[] +* xref:ROOT:migrate-and-validate-data.adoc[] +* xref:ROOT:enable-async-dual-reads.adoc[] +* xref:ROOT:change-read-routing.adoc[] +* xref:ROOT:connect-clients-to-target.adoc[] * xref:ROOT:troubleshooting-tips.adoc[] * xref:ROOT:faqs.adoc[] * Release notes @@ -47,8 +40,8 @@ * xref:sideloader:troubleshoot-sideloader.adoc[] .{cass-migrator} -* xref:ROOT:cdm-overview.adoc[] +* xref:ROOT:cassandra-data-migrator.adoc[] * {cass-migrator-repo}/releases[{cass-migrator-short} release notes] .{dsbulk-migrator} -* xref:ROOT:dsbulk-migrator-overview.adoc[] \ No newline at end of file +* xref:ROOT:dsbulk-migrator.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index 576cc572..501e6c36 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -1,8 +1,351 @@ = Use {cass-migrator} with {product-proxy} :navtitle: Use {cass-migrator} :description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. -:page-aliases: cdm-parameters.adoc, ROOT:cdm-steps.adoc +:page-aliases: cdm-parameters.adoc, ROOT:cdm-steps.adoc, ROOT:cdm-overview.adoc -//This page was an exact duplicate of cdm-overview.adoc and the (now deleted) cdm-steps.adoc, they are just in different parts of the nav. +{description} -include::ROOT:partial$cassandra-data-migrator-body.adoc[] \ No newline at end of file +[IMPORTANT] +==== +To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas. +==== + +{cass-migrator-short} is best for large or complex migrations that benefit from advanced features and configuration options, such as the following: + +* Logging and run tracking +* Automatic reconciliation +* Performance tuning +* Record filtering +* Column renaming +* Support for advanced data types, including sets, lists, maps, and UDTs +* Support for SSL, including custom cipher algorithms +* Use `writetime` timestamps to maintain chronological write history +* Use Time To Live (TTL) values to maintain data lifecycles + +For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository]. 
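+
+For example, after a migration you can spot-check that origin write timestamps and TTLs were preserved by reading them back with CQL.
+The following sketch is illustrative only; the `demo_ks.users` table, its `email` column, and the key value are hypothetical stand-ins for your own schema:
+
+[source,cql]
+----
+-- Run against both clusters and compare the returned values.
+-- WRITETIME() and TTL() are standard CQL functions.
+SELECT id,
+       WRITETIME(email) AS email_writetime,
+       TTL(email) AS email_ttl
+FROM demo_ks.users
+WHERE id = 'some-id';
+----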
+
+== {cass-migrator-short} last-write-wins with {product-proxy}
+
+You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool.
+
+When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes.
+
+Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write.
+
+For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`.
+
+== Install {cass-migrator-short}
+
+{company} recommends that you always install the latest version of {cass-migrator-short} to get the latest features, dependencies, and bug fixes.
+
+[tabs]
+======
+Install as a container::
++
+--
+Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub].
+
+The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`.
+--
+
+Install as a JAR file::
++
+--
+. Install Java 11 or later, which is required by the Spark binaries.
+
+. Install https://spark.apache.org/downloads.html[Apache Spark(TM)] version 3.5.x, built with Scala 2.13 and Hadoop 3.3 or later.
++
+[tabs]
+====
+Single VM::
++
+For one-off migrations, you can install the Spark binary on a single VM where you will run the {cass-migrator-short} job.
++
+. Get the Spark tarball from the Apache Spark archive.
++
+[source,bash,subs="+quotes"]
+----
+wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz
+----
++
+Replace `**PATCH**` with your Spark patch version.
++
+. Change to the directory where you want to install Spark, and then extract the tarball:
++
+[source,bash,subs="+quotes"]
+----
+tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz
+----
++
+Replace `**PATCH**` with your Spark patch version.
+
+Spark cluster::
++
+For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use a Spark cluster or Spark Serverless platform.
++
+If you deploy {cass-migrator-short} on a Spark cluster, you must modify your `spark-submit` commands as follows:
++
+* Replace `--master "local[*]"` with the host and port for your Spark cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`.
+* Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`.
+====
+
+. Download the latest {cass-migrator-repo}/packages/1832128/versions[cassandra-data-migrator JAR file] {cass-migrator-shield}.
+
+. Add the `cassandra-data-migrator` dependency to `pom.xml`:
++
+[source,xml,subs="+quotes"]
+----
+<dependency>
+  <groupId>datastax.cdm</groupId>
+  <artifactId>cassandra-data-migrator</artifactId>
+  <version>**VERSION**</version>
+</dependency>
+----
++
+Replace `**VERSION**` with your {cass-migrator-short} version.
+
+. Run `mvn install`.
+
+If you need to build the JAR for local development or your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README].
+--
+======
+
+== Configure {cass-migrator-short}
+
+. Create a `cdm.properties` file.
++ +If you use a different name, make sure you specify the correct filename in your `spark-submit` commands. + +. Configure the properties for your environment. ++ +In the {cass-migrator-short} repository, you can find a {cass-migrator-repo}/blob/main/src/resources/cdm.properties[sample properties file with default values], as well as a {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. ++ +{cass-migrator-short} jobs process all uncommented parameters. +Any parameters that are commented out are ignored or use default values. ++ +If you want to reuse a properties file created for a previous {cass-migrator-short} version, make sure it is compatible with the version you are currently using. +Check the {cass-migrator-repo}/releases[{cass-migrator-short} release notes] for possible breaking changes in interim releases. +For example, the 4.x series of {cass-migrator-short} isn't backwards compatible with earlier properties files. + +. Store your properties file where it can be accessed while running {cass-migrator-short} jobs using `spark-submit`. + +[#migrate] +== Run a {cass-migrator-short} data migration job + +A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster. + +To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table. + +The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file. +The migration job is specified in the `--class` argument. + +[tabs] +====== +Local installation:: ++ +-- +[source,bash,subs="+quotes,+attributes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ +--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- + +Spark cluster:: ++ +-- +[source,bash,subs="+quotes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "spark://**MASTER_HOST**:**PORT**" \ +--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. 
+ +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--master`: Provide the URL of your Spark cluster. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- +====== + +This command generates a log file (`logfile_name_**TIMESTAMP**.txt`) instead of logging output to the console. + +For additional modifications to this command, see <>. + +[#cdm-validation-steps] +== Run a {cass-migrator-short} data validation job + +After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records. + +Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation. + +. Use the following `spark-submit` command to run a data validation job using the configuration in your properties file. +The data validation job is specified in the `--class` argument. ++ +[tabs] +====== +Local installation:: ++ +-- +[source,bash,subs="+quotes,+attributes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ +--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. +-- + +Spark cluster:: ++ +-- +[source,bash,subs="+quotes"] +---- +./spark-submit --properties-file cdm.properties \ +--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ +--master "spark://**MASTER_HOST**:**PORT**" \ +--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt +---- + +Replace or modify the following, if needed: + +* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. ++ +Depending on where your properties file is stored, you might need to specify the full or relative file path. + +* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. ++ +You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. + +* `--master`: Provide the URL of your Spark cluster. + +* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. 
+-- +====== + +. Allow the command some time to run, and then open the log file (`logfile_name_**TIMESTAMP**.txt`) and look for `ERROR` entries. ++ +The {cass-migrator-short} validation job records differences as `ERROR` entries in the log file, listed by primary key values. +For example: ++ +[source,plaintext] +---- +23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999) +23/04/06 08:43:06 ERROR DiffJobSession: Corrected mismatch row in target: [key3] +23/04/06 08:43:06 ERROR DiffJobSession: Missing target row found for key: [key2] +23/04/06 08:43:06 ERROR DiffJobSession: Inserted missing row in target: [key2] +---- ++ +When validating large datasets or multiple tables, you might want to extract the complete list of missing or mismatched records. +There are many ways to do this. +For example, you can grep for all `ERROR` entries in your {cass-migrator-short} log files or use the `log4j2` example provided in the {cass-migrator-repo}?tab=readme-ov-file#steps-for-data-validation[{cass-migrator-short} repository]. + +=== Run a validation job in AutoCorrect mode + +Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect** mode, which offers the following functions: + +* `autocorrect.missing`: Add any missing records in the target with the value from the origin. + +* `autocorrect.mismatch`: Reconcile any mismatched records between the origin and target by replacing the target value with the origin value. ++ +[IMPORTANT] +==== +Timestamps have an effect on this function. + +If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster. + +This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster. +==== + +* `autocorrect.missing.counter`: By default, counter tables are not copied when missing, unless explicitly set. + +In your `cdm.properties` file, use the following properties to enable (`true`) or disable (`false`) autocorrect functions: + +[source,properties] +---- +spark.cdm.autocorrect.missing false|true +spark.cdm.autocorrect.mismatch false|true +spark.cdm.autocorrect.missing.counter false|true +---- + +The {cass-migrator-short} validation job never deletes records from either the origin or target. +Data validation only inserts or updates data on the target. + +For an initial data validation, consider disabling AutoCorrect so that you can generate a list of data discrepancies, investigate those discrepancies, and then decide whether you want to rerun the validation with AutoCorrect enabled. + +[#advanced] +== Additional {cass-migrator-short} options + +You can modify your properties file or append additional `--conf` arguments to your `spark-submit` commands to customize your {cass-migrator-short} jobs. +For example, you can do the following: + +* Check for large field guardrail violations before migrating. +* Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges. +* Use the `track-run` feature to monitor progress and rerun a failed migration or validation job from point of failure. + +For all options, see the {cass-migrator-repo}[{cass-migrator-short} repository]. +Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. 
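+
+For example, the following sketch restricts a migration job to a specific token range by appending `--conf` arguments to the <<migrate,migration command>>.
+It assumes the fully qualified property names `spark.cdm.filter.cassandra.partition.min` and `spark.cdm.filter.cassandra.partition.max` and the example token values apply to your {cass-migrator-short} version; confirm the exact names in the fully annotated properties file before running:
+
+[source,bash,subs="+quotes,+attributes"]
+----
+# Migrate only the rows whose partition tokens fall in the specified range.
+./spark-submit --properties-file cdm.properties \
+--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \
+--conf spark.cdm.filter.cassandra.partition.min=-9223372036854775808 \
+--conf spark.cdm.filter.cassandra.partition.max=-4611686018427387905 \
+--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \
+--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
+----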
+ +== Troubleshoot {cass-migrator-short} + +.Java NoSuchMethodError +[%collapsible] +==== +If you installed Spark as a JAR file, and your Spark and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such a the following: + +[source,console] +---- +Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()' +---- + +Make sure that your Spark binary is compatible with your {cass-migrator-short} version. +If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier Spark binary. +==== + +.Rerun a failed or partially completed job +[%collapsible] +==== +You can use the `track-run` feature to track the progress of a migration or validation, and then, if necessary, use the `run-id` to rerun a failed job from the last successful migration or validation point. + +For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. +==== \ No newline at end of file diff --git a/modules/ROOT/pages/cdm-overview.adoc b/modules/ROOT/pages/cdm-overview.adoc deleted file mode 100644 index de50f252..00000000 --- a/modules/ROOT/pages/cdm-overview.adoc +++ /dev/null @@ -1,4 +0,0 @@ -= {cass-migrator} ({cass-migrator-short}) overview -:description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases. - -include::ROOT:partial$cassandra-data-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/change-read-routing.adoc b/modules/ROOT/pages/change-read-routing.adoc index 777eea64..3c6cb325 100644 --- a/modules/ROOT/pages/change-read-routing.adoc +++ b/modules/ROOT/pages/change-read-routing.adoc @@ -1,96 +1,182 @@ -= Route reads to the target += Phase 4: Route reads to the target -This topic explains how you can configure {product-proxy} to route all reads to the target cluster instead of the origin cluster. +After you migrate and validate your data in xref:ROOT:migrate-and-validate-data.adoc[Phase 2], and then test your target cluster's production readiness in xref:ROOT:enable-async-dual-reads.adoc[Phase 3], you can configure {product-proxy} to route _all_ read requests to the target cluster instead of the origin cluster. -image::migration-phase4ra9.png["Phase 4 diagram shows read routing on {product-proxy} was switched to the target."] - -For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. - -== Steps - -You would typically perform these steps once you have migrated all the existing data from the origin cluster, and completed all validation checks and reconciliation if necessary. +[IMPORTANT] +==== +This phase routes production read requests to the target cluster exclusively. +Make sure all data is present on the target cluster, and it is prepared to handle full-scale production workloads. +==== -This operation is a configuration change that can be carried out as explained xref:manage-proxy-instances.adoc#change-mutable-config-variable[here]. +image::migration-phase4ra9.png[In migration Phase 4, {product-proxy}'s read routing switches to the target cluster] -[TIP] -==== -If you xref:enable-async-dual-reads.adoc[enabled asynchronous dual reads] to test your target cluster's performance, make sure that you disable asynchronous dual reads when you're done testing. 
+== Prerequisites -To do this, edit the `vars/zdm_proxy_core_config.yml` file, and then set the `read_mode` variable to `PRIMARY_ONLY`. +* Complete xref:ROOT:migrate-and-validate-data.adoc[Phase 2], including thorough data validation and reconciliation of any discrepancies. ++ +The success of Phase 4 depends on the target cluster having all the data from the origin cluster. ++ +If your migration was idle for some time after completing Phase 2, or you skipped Phase 3, {company} recommends re-validating the data on the target cluster before proceeding. -If you don't disable asynchronous dual reads, {product-proxy} instances send asynchronous, duplicate read requests to your origin cluster. +* Complete xref:ROOT:enable-async-dual-reads.adoc[Phase 3], and then disable asynchronous dual reads by setting `read_mode` to `PRIMARY_ONLY`. ++ +If you don't disable asynchronous dual reads, {product-proxy} sends asynchronous, duplicate read requests to your origin cluster. This is harmless but unnecessary. -==== -== Changing the read routing configuration +[#change-the-read-routing-configuration] +== Change the read routing configuration -If you're not there already, `ssh` back into the jumphost: +Read routing is controlled by a mutable configuration variable. +For more information, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]. +. Connect to your Ansible Control Host container. ++ +For example, `ssh` into the jumphost: ++ [source,bash] ---- ssh -F ~/.ssh/zdm_ssh_config jumphost ---- - -On the jumphost, connect to the Ansible Control Host container: ++ +Then, connect to the Ansible Control Host container: ++ [source,bash] ---- docker exec -it zdm-ansible-container bash ---- - -You will see a prompt like: ++ +.Result +[%collapsible] +==== [source,bash] ---- ubuntu@52772568517c:~$ ---- +==== -Now open the configuration file `vars/zdm_proxy_core_config.yml` for editing. - -Change the variable `primary_cluster` to `TARGET`. +. Edit the {product-proxy} core configuration file: `vars/zdm_proxy_core_config.yml`. -Run the playbook that changes the configuration of the existing {product-proxy} deployment: +. Change the `primary_cluster` variable to `TARGET`. +. Run the rolling restart playbook to apply the configuration change to your entire {product-proxy} deployment: ++ [source,bash] ---- ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory ---- -Wait for the {product-proxy} instances to be restarted by Ansible, one by one. -All instances will now send all reads to the target cluster instead of the origin cluster. +. Wait while Ansible restarts the {product-proxy} instances, one by one. -At this point, the target cluster becomes the primary cluster, but {product-proxy} still keeps the origin cluster up-to-date through dual writes. +Once the instances are restarted, all reads are routed to the target cluster instead of the origin cluster. -== Verifying the read routing change +At this point, the target cluster is considered the primary cluster, but {product-proxy} still keeps the origin cluster synchronized through dual writes. -Once the read routing configuration change has been rolled out, you may want to verify that reads are correctly sent to the target cluster, as expected. -This is not a required step, but you may wish to do it for peace of mind. +== Verify the read routing change -[TIP] -==== -Issuing a `DESCRIBE` or a read to any system table through {product-proxy} isn't a valid verification. 
+Once the read routing configuration change has been rolled out, you might want to verify that reads are being sent to the target cluster as expected. +This isn't required, but it can provide confirmation that the change was applied successfully. -{product-proxy} handles reads to system tables differently, by intercepting them and always routing them to the origin, in some cases partly populating them at the proxy level. +However, it is difficult to assess read routing because the purpose of {product-short} is to align the clusters and provide an invisible proxy layer between your client application and the database clusters. +By design, the data is expected to be identical on both clusters, and your client application has no awareness of which cluster is servicing its requests. -This means that system reads don't represent how {product-proxy} routes regular user reads. -Even after you switched the configuration to read the target cluster as the primary cluster, all system reads still go to the origin. +For this reason, the only way to manually test read routing is to intentionally write mismatched test data to the clusters. +Then, you can send a read request to {product-proxy} and see which cluster-specific data is returned, which indicates the cluster that received the read request. +There are two ways to do this. -Although `DESCRIBE` requests are not system requests, they are also generally resolved in a different way to regular requests, and should not be used as a means to verify the read routing behavior. +[tabs] +====== +Manually create mismatched tables:: ++ +-- +To manually create mismatched data, you can create a test table on each cluster, and then write different data to each table. + +[IMPORTANT] +==== +When you write the mismatched data to the tables, make sure you connect to each cluster directly. +Don't connect to {product-proxy}, because {product-proxy} will, by design, write the same data to both clusters through dual writes. ==== -Verifying that the correct routing is taking place is a slightly cumbersome operation, due to the fact that the purpose of the {product-short} process is to align the clusters and therefore, by definition, the data will be identical on both sides. +. Create a small test table on both clusters, such as a simple key/value table. +You can use an existing keyspace, or create one for this test specifically. +For example: ++ +[source,cql] +---- +CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT); +---- + +. Use `cqlsh` to connect _directly to the origin cluster_, and then insert a row with any key and a value that is specific to the origin cluster. +For example: ++ +[source,cql] +---- +INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the origin cluster!'); +---- + +. Use `cqlsh` to connect _directly to the target cluster_, and then insert a row with the same key and a value that is specific to the target cluster. +For example: ++ +[source,cql] +---- +INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the target cluster!'); +---- + +. Use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to {product-proxy}], and then issue a read request to your test table. +For example: ++ +[source,cql] +---- +SELECT * FROM test_keyspace.test_table WHERE k = '1'; +---- ++ +The cluster-specific value in the response tells you which cluster received the read request. 
+For example: ++ +* If the read request was correctly routed to the target cluster, the result from `test_table` contains `Hello from the target cluster!`. +* If the read request was incorrectly routed to the origin cluster, the result from `test_table` contains `Hello from the origin cluster!`. + +. When you're done testing, drop the test tables from both clusters. +If you created dedicated test keyspaces, drop the keyspaces as well. +-- + +Use the Themis sample client application:: ++ +-- +The xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application] connects directly to the origin cluster, the target cluster, and {product-proxy}. +It inserts some test data in its own, dedicated table. +Then, you can view the results of reads from each source. +For more information, see the https://github.com/absurdfarce/themis/blob/main/README.md[Themis README]. +-- +====== + +=== System tables cannot validate read routing + +Issuing a `DESCRIBE` command or read request to any system table through {product-proxy} cannot sufficiently validate read routing. + +When {product-proxy} receives system reads, it intercepts them and always routes them to the origin, regardless of the `primary_cluster` variable. +In some cases, {product-proxy} partially populates these queries at the proxy level. + +This means that system reads don't represent how {product-proxy} routes regular read requests. + +Although `DESCRIBE` requests aren't system reads, they are also resolved differently than other `DESCRIBE` requests. +Don't use `DESCRIBE` requests to verify read routing behavior. + +== Monitor and troubleshoot read performance + +After changing read routing, monitor the performance of {product-proxy} and the target cluster to ensure reads are succeeding and meeting your performance expectations. + +If read requests fail or perform poorly, you can <> back to `ORIGIN` while you investigate the issue. + +If read requests fail due to missing data, go back to xref:ROOT:migrate-and-validate-data.adoc[Phase 2] and repeat your data validation and reconciliation processes as needed to rectify the missing data errors. + +If your data model includes non-idempotent operations, ensure that this data is handled correctly during data migration, reconciliation, and ongoing dual writes. +For more information, see xref:ROOT:feasibility-checklists.adoc#non-idempotent-operations[Lightweight Transactions and other non-idempotent operations]. + +If your target cluster performs poorly, or you skipped Phase 3 previously, go back to xref:ROOT:enable-async-dual-reads.adoc[Phase 3] to test, adjust, and retest the target cluster before reattempting Phase 4. -For this reason, the only way to do a manual verification test is to force a discrepancy of some test data between the clusters. -To do this, you could consider using the xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application]. -This client application connects directly to the origin cluster, the target cluster, and {product-proxy}. -It inserts some test data in its own table, and then you can view the results of reads from each source. -Refer to the Themis README for more information. +== Next steps -Alternatively, you could follow this manual procedure: +You can stay at this phase as long as you like. +{product-proxy} continues to perform dual writes to both clusters, keeping the origin and target clusters synchronized. 
-* Create a small test table on both clusters, for example a simple key/value table (it could be in an existing keyspace, or in one that you create specifically for this test). -For example `CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT);`. -* Use `cqlsh` to connect *directly to the origin cluster*. -Insert a row with any key, and with a value specific to the origin cluster, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the origin cluster!');`. -* Now, use `cqlsh` to connect *directly to the target cluster*. -Insert a row with the same key as above, but with a value specific to the target cluster, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the target cluster!');`. -* Now, use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to {product-proxy}], and then issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`. -The result will clearly show you where the read actually comes from. +When you're ready to complete the migration and stop using your origin cluster, proceed to xref:ROOT:connect-clients-to-target.adoc[Phase 5] to disable dual writes and cut over to the target cluster exclusively. \ No newline at end of file diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 510fff29..6fbb47fa 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -111,7 +111,7 @@ You can use these tools alone or with {product-proxy}. {sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-short}-based cluster. This tool is exclusively for migrations that move data to {astra-db}. -For more information, see xref:sideloader:sideloader-zdm.adoc[]. +For more information, see xref:sideloader:sideloader-overview.adoc[]. === {cass-migrator} diff --git a/modules/ROOT/pages/connect-clients-to-proxy.adoc b/modules/ROOT/pages/connect-clients-to-proxy.adoc index 80c542ea..b1af5465 100644 --- a/modules/ROOT/pages/connect-clients-to-proxy.adoc +++ b/modules/ROOT/pages/connect-clients-to-proxy.adoc @@ -1,5 +1,4 @@ -= Connect your client applications to {product-proxy} -:navtitle: Connect client applications to {product-proxy} += Connect client applications to {product-proxy} {product-proxy} is designed to mimic communication with a typical cluster based on {cass-reg}. This means that your client applications connect to {product-proxy} in the same way that they already connect to your existing {cass-short}-based clusters. @@ -246,4 +245,8 @@ If you need to provide credentials for an {astra-db} database, don't use the {sc Instead, use the token-based authentication option explained in <>. If you include the {scb-short}, `cqlsh` ignores all other connection arguments and connects exclusively to your {astra-db} database instead of {product-proxy}. -==== \ No newline at end of file +==== + +== Next steps + +After you connect your client applications to {product-proxy}, you can begin xref:ROOT:migrate-and-validate-data.adoc[Phase 2] of the migration, which is the data migration phase. 
\ No newline at end of file diff --git a/modules/ROOT/pages/connect-clients-to-target.adoc b/modules/ROOT/pages/connect-clients-to-target.adoc index 3b0764e4..c8d732e5 100644 --- a/modules/ROOT/pages/connect-clients-to-target.adoc +++ b/modules/ROOT/pages/connect-clients-to-target.adoc @@ -1,11 +1,12 @@ -= Phase 5: Connect your client applications directly to the target -:navtitle: Phase 5: Connect client applications directly to the target += Phase 5: Connect client applications to the target cluster +:navtitle: Phase 5: Connect client applications to the target -Phase 5 is the last phase of the xref:ROOT:introduction.adoc[migration process]. -In this phase, you configure your client applications to connect directly and exclusively to the target cluster. -This removes the dependency on {product-proxy} and completes the migration. +Phase 5 is the last phase of the xref:ROOT:introduction.adoc[migration process], after you route all read requests to the target cluster in xref:ROOT:change-read-routing.adoc[Phase 4]. -image::migration-phase5ra.png[In Phase 5, your applications no longer using the proxy and, instead, connect directly to the target.] +In this final phase, you connect your client applications directly and exclusively to the target cluster. +This removes the dependency on {product-proxy} and the origin cluster, thereby completing the migration process. + +image::migration-phase5ra.png[In Phase 5, your applications no longer use the proxy and, instead, connect directly to the target cluster] The minimum requirements for reconfiguring these connections depend on whether your target cluster is {astra-db} or a generic CQL cluster, such as {cass-reg}, {dse}, or {hcd}. @@ -185,7 +186,7 @@ Depending on your application's requirements, you might need to make these chang == Switch to the Data API -If you migrated to {astra-db} or {hcd-short}, and you have the option of using the Data API instead of, or in addition to, a {cass-short} driver. +If you migrated to {astra-db} or {hcd-short}, you have the option of using the Data API instead of, or in addition to, a {cass-short} driver. Although the Data API can read and write to CQL tables, it is significantly different from driver code. To use the Data API, you must rewrite your application code or create a new application. @@ -201,6 +202,6 @@ For more information, see the following: Your migration is now complete, and your target cluster is the source of truth for your client applications and data. -When you are ready, you can decommission your origin cluster and {product-proxy}, as these are no longer needed and clean xref:ROOT:rollback.adoc[rollback] is no longer possible. +When you are ready, you can decommission your origin cluster and {product-proxy} because these are no longer needed and xref:ROOT:rollback.adoc[seamless rollback] is no longer possible. -If you need to revert to the origin cluster after this point, you must perform a full migration with your previous origin cluster as the target to ensure that all data is rewritten and synchronized back to the origin. \ No newline at end of file +If you need to revert to the origin cluster after this point, you must perform a full migration in the opposite direction, with your previous origin cluster as the target, to ensure that all data is rewritten and synchronized back to the origin. 
\ No newline at end of file diff --git a/modules/ROOT/pages/create-target.adoc b/modules/ROOT/pages/create-target.adoc index c04e57af..67bc1588 100644 --- a/modules/ROOT/pages/create-target.adoc +++ b/modules/ROOT/pages/create-target.adoc @@ -1,7 +1,6 @@ -= Create the target environment for your migration -:navtitle: Create target environment for migration += Create the target environment -In this phase of the migration, you must create and prepare a new database (cluster) to be the target for your migration. +Before you begin your migration, you must create and prepare a new database (cluster) to be the target for your migration. You must also gather authentication credentials to allow {product-proxy} and your client applications to connect to the new database. == Prepare the target database @@ -125,4 +124,4 @@ include::ROOT:partial$multi-region-migrations.adoc[] == Next steps -* xref:ROOT:rollback.adoc[] \ No newline at end of file +Learn about xref:ROOT:rollback.adoc[rollback options] before you begin Phase 1 of the migration process. \ No newline at end of file diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index 4d9ff623..bdaac209 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -396,3 +396,7 @@ Login with: ==== Details about the metrics you can observe are available in xref:ROOT:metrics.adoc[]. ==== + +== Next steps + +To continue Phase 1 of the migration, xref:ROOT:connect-clients-to-proxy.adoc[connect your client applications to {product-proxy}]. \ No newline at end of file diff --git a/modules/ROOT/pages/deployment-infrastructure.adoc b/modules/ROOT/pages/deployment-infrastructure.adoc index 10bdcb45..cba607c1 100644 --- a/modules/ROOT/pages/deployment-infrastructure.adoc +++ b/modules/ROOT/pages/deployment-infrastructure.adoc @@ -1,8 +1,9 @@ -= Deployment and infrastructure considerations += Prepare the {product-proxy} deployment infrastructure +:navtitle: Prepare {product-proxy} infrastructure As part of planning your migration, you need to prepare your infrastructure. -== Choosing where to deploy the proxy +== Choose where to deploy the proxy A typical {product-proxy} deployment is made up of multiple proxy instances. A minimum of three proxy instances is recommended for any deployment apart from those for demo or local testing purposes. @@ -171,4 +172,4 @@ ssh -F zdm_ssh_config zdm-proxy-0 == Next steps -* xref:ROOT:create-target.adoc[] \ No newline at end of file +Next, xref:ROOT:create-target.adoc[prepare the target cluster] for your migration. \ No newline at end of file diff --git a/modules/ROOT/pages/dsbulk-migrator-overview.adoc b/modules/ROOT/pages/dsbulk-migrator-overview.adoc deleted file mode 100644 index 5ec1a6c5..00000000 --- a/modules/ROOT/pages/dsbulk-migrator-overview.adoc +++ /dev/null @@ -1,4 +0,0 @@ -= {dsbulk-migrator} overview -:description: {dsbulk-migrator} extends {dsbulk-loader} with migration commands. - -include::ROOT:partial$dsbulk-migrator-body.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/dsbulk-migrator.adoc b/modules/ROOT/pages/dsbulk-migrator.adoc index 79e1822e..4ff82116 100644 --- a/modules/ROOT/pages/dsbulk-migrator.adoc +++ b/modules/ROOT/pages/dsbulk-migrator.adoc @@ -1,7 +1,846 @@ = Use {dsbulk-migrator} with {product-proxy} :navtitle: Use {dsbulk-migrator} :description: {dsbulk-migrator} extends {dsbulk-loader} with migration commands. 
+:page-aliases: ROOT:dsbulk-migrator-overview.adoc -//TODO: Reorganize this page and consider breaking it up into smaller pages. +{dsbulk-migrator} is an extension of xref:dsbulk:overview:dsbulk-about.adoc[{dsbulk-loader}] that adds the following three commands: -include::ROOT:partial$dsbulk-migrator-body.adoc[] \ No newline at end of file +* `migrate-live`: Immediately runs a live data migration using {dsbulk-loader}. + +* `generate-script`: Generates a migration script that you can use to run a data migration with a standalone {dsbulk-loader} installation. +This command _doesn't_ trigger the migration; it only generates the migration script. + +* `generate-ddl`: Reads the origin cluster's schema, and then generates CQL files that you can use to recreate the schema on your target cluster in preparation for data migration. + +{dsbulk-migrator} is best for smaller migrations and migrations that don't require extensive data validation aside from post-migration row counts. +You might also use this tool for migrations where you can shard data from large tables into more manageable quantities. + +You can use {dsbulk-migrator} alone or with {product-proxy}. + +== Install {dsbulk-migrator} + +. Install Java 11 and https://maven.apache.org/download.cgi[Maven] 3.9.x. + +. Optional: If you don't want to use the embedded {dsbulk-loader} that is bundled with {dsbulk-migrator}, you must xref:dsbulk:overview:install.adoc[install {dsbulk-loader}] before installing {dsbulk-migrator}. + +. Clone the {dsbulk-migrator-repo}[{dsbulk-migrator} repository]: ++ +[source,bash] +---- +git clone git@github.com:datastax/dsbulk-migrator.git +---- + +. Change to the cloned directory: ++ +[source,bash] +---- +cd dsbulk-migrator +---- + +. Use Maven to build {dsbulk-migrator}: ++ +[source,bash] +---- +mvn clean package +---- ++ +[[dsbulk-jar]]The build produces two distributable fat jars. +You will use one of these jars when you run a {dsbulk-migrator} command. ++ +* `dsbulk-migrator-**VERSION**-embedded-dsbulk.jar`: Contains an embedded {dsbulk-loader} installation and an embedded Java driver. ++ +Supports all {dsbulk-migrator} operations, but it is larger than the other JAR due to the presence of the {dsbulk-loader} classes. ++ +Use this jar if you _don't_ want to use your own {dsbulk-loader} installation. + +* `dsbulk-migrator-**VERSION**-embedded-driver.jar`: Contains an embedded Java driver only. ++ +Suitable for using the `generate-script` and `migrate-live` commands with your own {dsbulk-loader} installation. ++ +You cannot use this jar for `migrate-live` with the embedded {dsbulk-loader} because the required {dsbulk-loader} classes aren't present in this jar. + +. https://github.com/datastax/simulacron[Clone and build Simulacron], which is required for some {dsbulk-migrator} integration tests. ++ +Note the https://github.com/datastax/simulacron?tab=readme-ov-file#prerequisites[prerequisites for Simulacron], particularly for macOS. + +. Run the {dsbulk-migrator} integration tests: ++ +[source,bash] +---- +mvn clean verify +---- + +After you install, build, and test {dsbulk-migrator}, you can run it from the command line, specifying your desired jar, command, and options. + +For a quick test, try the `<>` option. 
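+
+For example, assuming the build wrote the fat jars to the default Maven `target` directory, a quick smoke test could list the jars and print the available commands (the version in the jar filenames depends on the release you built):
+
+[source,bash,subs="+quotes,+attributes"]
+----
+# List the fat jars produced by the build, then print the top-level help.
+ls target/dsbulk-migrator-{asterisk}.jar
+java -jar target/dsbulk-migrator-**VERSION**-embedded-driver.jar --help
+----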
+ +For information and examples for each command, see the following: + +* <> +* <> +* <> + +[#get-help-for-dsbulk-migrator] +== Get help for {dsbulk-migrator} + +Use `--help` (`-h`) to get information about {dsbulk-migrator} commands and options: + +* Print the available {dsbulk-migrator} commands: ++ +[source,bash] +---- +java -jar /path/to/dsbulk-migrator.jar --help +---- ++ +Replace `/path/to/dsbulk-migrator.jar` with the path to your <>. + +* Print help for a specific command: ++ +[source,bash,subs="+quotes"] +---- +java -jar /path/to/dsbulk-migrator.jar **COMMAND** --help +---- ++ +Replace the following: ++ +** `/path/to/dsbulk-migrator.jar`: The path to your <>. +** `COMMAND`: The command for which you want to get help, one of `migrate-live`, `generate-script`, or `generate-ddl`. + +[#dsbulk-live] +== Run a live migration + +The `migrate-live` command immediately runs a live data migration using the embedded version of {dsbulk-loader} or your own {dsbulk-loader} installation. +A _live migration_ means the data migration starts immediately, and it is handled by the migrator tool through the specified {dsbulk-loader} installation. + +To run the `migrate-live` command, provide the path to your <> followed by `migrate-live` and any options: + +[source,bash,subs="+quotes"] +---- +java -jar /path/to/dsbulk-migrator.jar migrate-live **OPTIONS** +---- + +The following examples show how to use either fat jar to perform a live migration where the target cluster is an {astra-db} database. +The password parameters are left blank so that {dsbulk-migrator} prompts for them interactively during the migration. +All unspecified options use their default values. + +[tabs] +====== +Use the embedded {dsbulk-loader}:: ++ +-- +If you want to run the migration with the embedded {dsbulk-loader}, you must use the `dsbulk-migrator-**VERSION**-embedded-dsbulk.jar` fat jar and the `--dsbulk-use-embedded` option: + +[source,bash,subs="+quotes"] +---- + java -jar target/dsbulk-migrator-**VERSION**-embedded-dsbulk.jar migrate-live \ + --data-dir=/path/to/data/dir \ + --dsbulk-use-embedded \ + --dsbulk-log-dir=/path/to/log/dir \ + --export-host=**ORIGIN_CLUSTER_HOSTNAME** \ + --export-username=**ORIGIN_USERNAME** \ + --export-password # Origin password will be prompted \ + --export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \ + --export-dsbulk-option "--executor.maxPerSecond=1000" \ + --import-bundle=/path/to/scb.zip \ + --import-username=token \ + --import-password # Application token will be prompted \ + --import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \ + --import-dsbulk-option "--executor.maxPerSecond=1000" +---- +-- + +Use your own {dsbulk-loader} installation:: ++ +-- +If you want to run the migration with your own {dsbulk-loader} installation, use the `dsbulk-migrator-**VERSION**-embedded-driver.jar` fat jar, and use the `--dsbulk-cmd` option to specify the path to your {dsbulk-loader} installation: + +[source,bash,subs="+quotes,macros"] +---- + java -jar target/dsbulk-migrator-**VERSION**-embedded-driver.jar migrate-live \ + --data-dir=/path/to/data/dir \ + --dsbulk-cmd=pass:q[${DSBULK_ROOT}]/bin/dsbulk \ + --dsbulk-log-dir=/path/to/log/dir \ + --export-host=**ORIGIN_CLUSTER_HOSTNAME** \ + --export-username=**ORIGIN_USERNAME** \ + --export-password # Origin password will be prompted \ + --import-bundle=/path/to/scb.zip \ + --import-username=token \ + --import-password # Application token will be prompted +---- + +-- +====== + +=== Options for migrate-live + +Options for the 
`migrate-live` command are used to configure the migration parameters and connect to the origin and target clusters. + +Most options have sensible default values and don't need to be specified unless you want to override the default value. + +[cols="1,3"] +|=== +| Option | Description + +| `--data-dir` (`-d`) +| The directory where data is exported to and imported from. +The directory is created if it doesn't exist. + +The default is a `data` subdirectory in the current working directory. + +Tables are exported and imported in subdirectories of the specified data directory: One subdirectory is created for each keyspace, and then one subdirectory is created for each table within each keyspace subdirectory. + +| `--dsbulk-cmd` (`-c`) +| The path to your own external (non-embedded) {dsbulk-loader} installation, such as `--dsbulk-cmd=pass:q[${DSBULK_ROOT}]/bin/dsbulk`. + +The default is `dsbulk`, which assumes that the command is available through the `PATH` variable contents. + +Ignored if the embedded {dsbulk-loader} is used (`--dsbulk-use-embedded`). + +| `--dsbulk-log-dir` (`-l`) +| The path to the directory where you want to store {dsbulk-loader} logs, such as `--dsbulk-log-dir=~/tmp/dsbulk-logs`. +The directory is created if it doesn't exist. + +The default is a `logs` subdirectory in the current working directory. + +Each {dsbulk-loader} operation creates its own subdirectory within the specified log directory. + +This parameter applies whether you use the embedded {dsbulk-loader} or your own external (non-embedded) {dsbulk-loader} installation. + +| `--dsbulk-use-embedded` (`-e`) +| Use the embedded {dsbulk-loader}. +Accepts no arguments; it's either included (enabled) or not (disabled). + +By default, this option is disabled/omitted, and `migrate-live` expects to use an external (non-embedded) {dsbulk-loader} installation. +If disabled/omitted, set the path to your {dsbulk-loader} installation in `--dsbulk-cmd`. + +| `--dsbulk-working-dir` (`-w`) +| The path to the directory where you want to run `dsbulk`, such as `--dsbulk-working-dir=~/tmp/dsbulk-work`. +The default is the current working directory. + +Only applicable when using your own external (non-embedded) {dsbulk-loader} installation with the `--dsbulk-cmd` option. +Ignored if the embedded {dsbulk-loader} is used (`--dsbulk-use-embedded`). + +| `--export-bundle` +| If your origin cluster is an {astra-db} database, provide the path to your database's {scb}, such as `--export-bundle=/path/to/scb.zip`. + +Cannot be used with `--export-host`. + +| `--export-consistency` +| The consistency level to use when exporting data. +The default is `--export-consistency=LOCAL_QUORUM`. + +| `--export-dsbulk-option` +| An additional xref:dsbulk:reference:dsbulk-cmd.adoc#options[{dsbulk-loader} option] to use when exporting data. + +The expected format is `--export-dsbulk-option "--option.full.name=value"`. + +You must use the option's full long form name and leading dashes; short form options will fail. +You must wrap the entire expression in quotes so that it is handled correctly by {dsbulk-migrator}. +This is in addition to any xref:dsbulk:reference:dsbulk-cmd.adoc#escape-and-quote-command-line-arguments[escaping] required for {dsbulk-loader} to process the option correctly. + +To pass multiple additional options, pass each option separately with `--export-dsbulk-option`. +For example: `--export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" --export-dsbulk-option "--executor.maxPerSecond=1000"`. 
+ +| `--export-host` +a| The origin cluster's host name or IP address, and an optional port for a node in the origin cluster. +The default port is `9042` if not specified. +For example: + +* Hostname with default port: `--export-host=db2.example.com` +* Hostname with custom port: `--export-host=db1.example.com:9001` +* IP address with default port: `--export-host=1.2.3.5` +* IP address with custom port: `--export-host=1.2.3.4:9001` + +This option can be passed multiple times. + +If your origin cluster is an {astra-db} database, use `--export-bundle` instead of `--export-host`. + +| `--export-max-concurrent-files` +| The maximum number of concurrent files to write to when exporting data from the origin cluster. + +Can be either `AUTO` (default) or a positive integer, such as `--export-max-concurrent-files=8`. + +| `--export-max-concurrent-queries` +| The maximum number of concurrent queries to execute. + +Can be either `AUTO` (default) or a positive integer, such as `--export-max-concurrent-queries=8`. + +| `--export-max-records` +| The maximum number of records to export for each table. + +The default is `-1`, which exports the entire table (all records). + +To export a fixed number of records, set to a positive integer, such as `--export-max-records=10000`. + +| `--export-password` +| The password for authentication to the origin cluster. + +You can either provide the password directly (`--export-password=pass:q[${ORIGIN_PASSWORD}]`), or pass the option without a value (`--export-password`) to be prompted for the password interactively. + +If set, then `--export-username` is required. + +If the cluster doesn't require authentication, omit both `--export-username` and `--export-password`. + +If your origin cluster is an {astra-db} database, the password is an {astra} application token. + +| `--export-protocol-version` +| The protocol version to use when connecting to the origin cluster, such as `--export-protocol-version=V4`. + +If unspecified, the driver negotiates the highest version supported by both the client and the server. + +Specify only if you want to force the protocol version. + +| `--export-splits` +a| The maximum number of token range queries to generate. + +This is an advanced setting that {company} doesn't recommend modifying unless you have a specific need to do so. + +Can be either of the following: + +* A positive integer, such as `--export-splits=16`. +* A multiple of the number of available cores, specified as `NC` where `N` is the number of cores, such as `--export-splits=8C`. + +The default is `8C` (8 times the number of available cores). + +| `--export-username` +| The username for authentication to the origin cluster. + +If set, then `--export-password` is required. + +If the cluster doesn't require authentication, omit both `--export-username` and `--export-password`. + +If your origin cluster is an {astra-db} database, the username is the literal string `token`, such as `--export-username=token`. + +| `--import-bundle` +| If your target cluster is an {astra-db} database, provide the path to your database's {scb}, such as `--import-bundle=/path/to/scb.zip`. + +Cannot be used with `--import-host`. + +| `--import-consistency` +| The consistency level to use when importing data. +The default is `--import-consistency=LOCAL_QUORUM`. + +| `--import-default-timestamp` +| The default timestamp to use when importing data. +Must be a valid instant in ISO-8601 format. +The default is `--import-default-timestamp=1970-01-01T00:00:00Z`. 
+ +| `--import-dsbulk-option` +| An additional xref:dsbulk:reference:dsbulk-cmd.adoc#options[{dsbulk-loader} option] to use when importing data. + +The expected format is `--import-dsbulk-option "--option.full.name=value"`. + +You must use the option's full long form name and leading dashes; short form options will fail. +You must wrap the entire expression in quotes so that it is handled correctly by {dsbulk-migrator}. +This is in addition to any xref:dsbulk:reference:dsbulk-cmd.adoc#escape-and-quote-command-line-arguments[escaping] required for {dsbulk-loader} to process the option correctly. + +To pass multiple additional options, pass each option separately with `--import-dsbulk-option`. +For example: `--import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" --import-dsbulk-option "--executor.maxPerSecond=1000"`. + +| `--import-host` +a| The target cluster's host name or IP address, and an optional port for a node in the target cluster. +The default port is `9042` if not specified. +For example: + +* Hostname with default port: `--import-host=db2.example.com` +* Hostname with custom port: `--import-host=db1.example.com:9001` +* IP address with default port: `--import-host=1.2.3.5` +* IP address with custom port: `--import-host=1.2.3.4:9001` + +This option can be passed multiple times. + +If your target cluster is an {astra-db} database, use `--import-bundle` instead of `--import-host`. + +| `--import-max-concurrent-files` +| The maximum number of concurrent files to read from when importing data to the target cluster. + +Can be either `AUTO` (default) or a positive integer, such as `--import-max-concurrent-files=8`. + +| `--import-max-concurrent-queries` +| The maximum number of concurrent queries to execute. + +Can be either `AUTO` (default) or a positive integer, such as `--import-max-concurrent-queries=8`. + +| `--import-max-errors` +| The maximum number of failed records to tolerate when importing data. + +Must be a positive integer, such as `--import-max-errors=5000`. +The default is `1000`. + +Failed records are written to a `load.bad` file in the {dsbulk-loader} operation directory. + +| `--import-password` +| The password for authentication to the target cluster. + +You can either provide the password directly (`--import-password=pass:q[${TARGET_PASSWORD}]`), or pass the option without a value (`--import-password`) to be prompted for the password interactively. + +If set, then `--import-username` is required. + +If the cluster doesn't require authentication, omit both `--import-username` and `--import-password`. + +If your target cluster is an {astra-db} database, the password is an {astra} application token. + +| `--import-protocol-version` +| The protocol version to use when connecting to the target cluster, such as `--import-protocol-version=V4`. + +If unspecified, the driver negotiates the highest version supported by both the client and the server. + +Specify only if you want to force the protocol version. + +| `--import-username` +| The username for authentication to the target cluster. + +If set, then `--import-password` is required. + +If the cluster doesn't require authentication, omit both `--import-username` and `--import-password`. + +If your target cluster is an {astra-db} database, the username is the literal string `token`, such as `--import-username=token`. + +| `--keyspaces` (`-k`) +| A regular expression to select keyspaces to migrate, such as `--keyspaces="^(my_keyspace\|anotherKeyspace)$"`. 
+ +The default expression is `^(?!system\|dse\|OpsCenter)\\w+$`, which migrates all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace if these are present on the origin cluster. + +Case-sensitive keyspace names must be specified by their exact case. + +| `--max-concurrent-ops` +| The maximum number of concurrent operations (exports and imports) to carry. + +The default is `1`. + +Increase this value to allow exports and imports to occur concurrently. +For example, if `--max-concurrent-ops=2`, then each table is imported as soon as it is exported, and the next table immediately begins being exported as soon as the previous table starts importing. + +| `--skip-truncate-confirmation` +| Whether to bypass truncation confirmation before actually truncating counter tables. + +The default is disabled/omitted, which means you must confirm truncation before counter tables are truncated. + +Only applicable when migrating counter tables. +This option is ignored otherwise. + +| `--tables` (`-t`) +| A regular expression to select tables to migrate, such as `--tables="^(table1\|table_two)$"`. + +The default expression is `.{asterisk}`, which migrates all tables in the keyspaces that are selected by the `--keyspaces` option. + +Case-sensitive table names must be specified by their exact case. + +| `--table-types` +a| The table types to migrate: + +* `--table-types=regular`: Migrate only regular tables. +* `--table-types=counter`: Migrate only counter tables. +* `--table-types=all` (default): Migrate both regular and counter tables. + +| `--truncate-before-export` +| Truncate counter tables before exporting them, rather than truncating them afterwards. + +The default is disabled/omitted, which means counter tables are truncated after being exported. + +Only applicable when migrating counter tables. +This option is ignored otherwise. +|=== + +[#dsbulk-script] +== Generate a migration script + +The `generate-script` command generates a migration script that you can use to perform a data migration with your own {dsbulk-loader} installation. +This command _doesn't_ trigger the migration; it only generates the migration script that you must then run. + +If you want to run a migration immediately, or you want to use the embedded {dsbulk-loader}, use the `migrate-live` command instead. + +To run the `generate-script` command, provide the path to your <> followed by `generate-script` and any options: + +[source,bash,subs="+quotes"] +---- +java -jar /path/to/dsbulk-migrator.jar generate-script **OPTIONS** +---- + +The following example generates a migration script where the target cluster is an {astra-db} database. +The `--dsbulk-cmd` option specifies the path to the {dsbulk-loader} installation that you plan to use to run the generated migration script. +All unspecified options use their default values. + +[source,bash,subs="+quotes,macros"] +---- + java -jar target/dsbulk-migrator-**VERSION**-embedded-driver.jar generate-script \ + --data-dir=/path/to/data/dir \ + --dsbulk-cmd=pass:q[${DSBULK_ROOT}]/bin/dsbulk \ + --dsbulk-log-dir=/path/to/log/dir \ + --export-host=**ORIGIN_CLUSTER_HOSTNAME** \ + --export-username=**ORIGIN_USERNAME** \ + --export-password=**ORIGIN_PASSWORD** \ + --import-bundle=/path/to/scb.zip \ + --import-username=token \ + --import-password=**ASTRA_APPLICATION_TOKEN** +---- + +=== Options for generate-script + +The options for the `generate-script` command become options in the generated migration script. 
+The only exceptions are the origin cluster connection parameters (`export-username`, `export-password`, `export-host`, `export-bundle`), which are used in the migration script _and_ by {dsbulk-migrator} to gather metadata about the tables to migrate.

Most options have sensible default values and don't need to be specified unless you want to override the default value.

[cols="1,3"]
|===
| Option | Description

| `--data-dir` (`-d`)
| The directory where you want to store the generated migration script files.
The directory is created if it doesn't exist.

The default is a `data` subdirectory in the current working directory.

| `--dsbulk-cmd` (`-c`)
| The path to an external (non-embedded) {dsbulk-loader} installation, such as `--dsbulk-cmd=pass:q[${DSBULK_ROOT}]/bin/dsbulk`.

The default is `dsbulk`, which assumes that the command is available on your `PATH`.

| `--dsbulk-log-dir` (`-l`)
| The path to the directory where you want to store {dsbulk-loader} logs, such as `--dsbulk-log-dir=~/tmp/dsbulk-logs`.
The directory is created if it doesn't exist.

The default is a `logs` subdirectory in the current working directory.

Each {dsbulk-loader} operation creates its own subdirectory within the specified log directory.

| `--dsbulk-working-dir` (`-w`)
| The path to the directory where you want to run `dsbulk`, such as `--dsbulk-working-dir=~/tmp/dsbulk-work`.
The default is the current working directory.

| `--export-bundle`
| If your origin cluster is an {astra-db} database, provide the path to your database's {scb}, such as `--export-bundle=/path/to/scb.zip`.

Cannot be used with `--export-host`.

| `--export-consistency`
| The consistency level to use when exporting data.
The default is `--export-consistency=LOCAL_QUORUM`.

| `--export-dsbulk-option`
| An additional xref:dsbulk:reference:dsbulk-cmd.adoc#options[{dsbulk-loader} option] to use when exporting data.

The expected format is `--export-dsbulk-option "--option.full.name=value"`.

You must use the option's full long form name and leading dashes; short form options will fail.
You must wrap the entire expression in quotes so that it is handled correctly by {dsbulk-migrator}.
This is in addition to any xref:dsbulk:reference:dsbulk-cmd.adoc#escape-and-quote-command-line-arguments[escaping] required for {dsbulk-loader} to process the option correctly.

To pass multiple additional options, pass each option separately with `--export-dsbulk-option`.
For example: `--export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" --export-dsbulk-option "--executor.maxPerSecond=1000"`.

| `--export-host`
a| The origin cluster's host name or IP address, and an optional port for a node in the origin cluster.
The default port is `9042` if not specified.
For example:

* Hostname with default port: `--export-host=db2.example.com`
* Hostname with custom port: `--export-host=db1.example.com:9001`
* IP address with default port: `--export-host=1.2.3.5`
* IP address with custom port: `--export-host=1.2.3.4:9001`

This option can be passed multiple times.

If your origin cluster is an {astra-db} database, use `--export-bundle` instead of `--export-host`.

| `--export-max-concurrent-files`
| The maximum number of concurrent files to write to when exporting data from the origin cluster.

Can be either `AUTO` (default) or a positive integer, such as `--export-max-concurrent-files=8`.
+ +| `--export-max-concurrent-queries` +| The maximum number of concurrent queries to execute. + +Can be either `AUTO` (default) or a positive integer, such as `--export-max-concurrent-queries=8`. + +| `--export-max-records` +| The maximum number of records to export for each table. + +The default is `-1`, which exports the entire table (all records). + +To export a fixed number of records, set to a positive integer, such as `--export-max-records=10000`. + +| `--export-password` +| The password for authentication to the origin cluster. + +You can either provide the password directly (`--export-password=pass:q[${ORIGIN_PASSWORD}]`), or pass the option without a value (`--export-password`) to be prompted for the password interactively. + +If set, then `--export-username` is required. + +If the cluster doesn't require authentication, omit both `--export-username` and `--export-password`. + +If your origin cluster is an {astra-db} database, the password is an {astra} application token. + +| `--export-protocol-version` +| The protocol version to use when connecting to the origin cluster, such as `--export-protocol-version=V4`. + +If unspecified, the driver negotiates the highest version supported by both the client and the server. + +Specify only if you want to force the protocol version. + +| `--export-splits` +a| The maximum number of token range queries to generate. + +This is an advanced setting that {company} doesn't recommend modifying unless you have a specific need to do so. + +Can be either of the following: + +* A positive integer, such as `--export-splits=16`. +* A multiple of the number of available cores, specified as `NC` where `N` is the number of cores, such as `--export-splits=8C`. + +The default is `8C` (8 times the number of available cores). + +| `--export-username` +| The username for authentication to the origin cluster. + +If set, then `--export-password` is required. + +If the cluster doesn't require authentication, omit both `--export-username` and `--export-password`. + +If your origin cluster is an {astra-db} database, the username is the literal string `token`, such as `--export-username=token`. + +| `--import-bundle` +| If your target cluster is an {astra-db} database, provide the path to your database's {scb}, such as `--import-bundle=/path/to/scb.zip`. + +Cannot be used with `--import-host`. + +| `--import-consistency` +| The consistency level to use when importing data. +The default is `--import-consistency=LOCAL_QUORUM`. + +| `--import-default-timestamp` +| The default timestamp to use when importing data. +Must be a valid instant in ISO-8601 format. +The default is `--import-default-timestamp=1970-01-01T00:00:00Z`. + +| `--import-dsbulk-option` +| An additional xref:dsbulk:reference:dsbulk-cmd.adoc#options[{dsbulk-loader} option] to use when importing data. + +The expected format is `--import-dsbulk-option "--option.full.name=value"`. + +You must use the option's full long form name and leading dashes; short form options will fail. +You must wrap the entire expression in quotes so that it is handled correctly by {dsbulk-migrator}. +This is in addition to any xref:dsbulk:reference:dsbulk-cmd.adoc#escape-and-quote-command-line-arguments[escaping] required for {dsbulk-loader} to process the option correctly. + +To pass multiple additional options, pass each option separately with `--import-dsbulk-option`. +For example: `--import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" --import-dsbulk-option "--executor.maxPerSecond=1000"`. 
+ +| `--import-host` +a| The target cluster's host name or IP address, and an optional port for a node in the target cluster. +The default port is `9042` if not specified. +For example: + +* Hostname with default port: `--import-host=db2.example.com` +* Hostname with custom port: `--import-host=db1.example.com:9001` +* IP address with default port: `--import-host=1.2.3.5` +* IP address with custom port: `--import-host=1.2.3.4:9001` + +This option can be passed multiple times. + +If your target cluster is an {astra-db} database, use `--import-bundle` instead of `--import-host`. + +| `--import-max-concurrent-files` +| The maximum number of concurrent files to read from when importing data to the target cluster. + +Can be either `AUTO` (default) or a positive integer, such as `--import-max-concurrent-files=8`. + +| `--import-max-concurrent-queries` +| The maximum number of concurrent queries to execute. + +Can be either `AUTO` (default) or a positive integer, such as `--import-max-concurrent-queries=8`. + +| `--import-max-errors` +| The maximum number of failed records to tolerate when importing data. + +Must be a positive integer, such as `--import-max-errors=5000`. +The default is `1000`. + +Failed records are written to a `load.bad` file in the {dsbulk-loader} operation directory. + +| `--import-password` +| The password for authentication to the target cluster. + +You can either provide the password directly (`--import-password=pass:q[${TARGET_PASSWORD}]`), or pass the option without a value (`--import-password`) to be prompted for the password interactively. + +If set, then `--import-username` is required. + +If the cluster doesn't require authentication, omit both `--import-username` and `--import-password`. + +If your target cluster is an {astra-db} database, the password is an {astra} application token. + +| `--import-protocol-version` +| The protocol version to use when connecting to the target cluster, such as `--import-protocol-version=V4`. + +If unspecified, the driver negotiates the highest version supported by both the client and the server. + +Specify only if you want to force the protocol version. + +| `--import-username` +| The username for authentication to the target cluster. + +If set, then `--import-password` is required. + +If the cluster doesn't require authentication, omit both `--import-username` and `--import-password`. + +If your target cluster is an {astra-db} database, the username is the literal string `token`, such as `--import-username=token`. + +| `--keyspaces` (`-k`) +| A regular expression to select keyspaces to migrate, such as `--keyspaces="^(my_keyspace\|anotherKeyspace)$"`. + +The default expression is `^(?!system\|dse\|OpsCenter)\\w+$`, which migrates all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace if these are present on the origin cluster. + +Case-sensitive keyspace names must be specified by their exact case. + +| `--tables` (`-t`) +| A regular expression to select tables to migrate, such as `--tables="^(table1\|table_two)$"`. + +The default expression is `.{asterisk}`, which migrates all tables in the keyspaces that are selected by the `--keyspaces` option. + +Case-sensitive table names must be specified by their exact case. + +| `--table-types` +a| The table types to migrate: + +* `--table-types=regular`: Migrate only regular tables. +* `--table-types=counter`: Migrate only counter tables. +* `--table-types=all` (default): Migrate both regular and counter tables. 
+|=== + +=== Unsupported live migration options for migration scripts + +The following `migrate-live` options cannot be set in `generate-script`. +If you want to use these options, you must run the migration directly with `migrate-live` instead of generating a script. + +* `--dsbulk-use-embedded`: Not applicable to `generate-script` because the resulting script is intended to be run with your own (non-embedded) {dsbulk-loader} installation. + +* `--max-concurrent-ops`: Cannot be customized in `generate-script`. +Uses the default value of `1`. + +* `--skip-truncate-confirmation`: Cannot be customized in `generate-script`. +Uses the default behavior of requiring confirmation before truncating counter tables. + +* `--truncate-before-export`: Cannot be customized in `generate-script`. +Uses the default behavior of truncating counter tables after exporting them. + +* `--data-dir`: In `generate-script`, this parameter sets the location to store the generated script files. +There is no `generate-script` option to set a custom data directory for the migration's actual import and export operations. +When you run the migration script, the default data directory is used for the data export and import operations, which is a `data` subdirectory in the current working directory. + +[#dsbulk-ddl] +== Generate DDL files + +The `generate-ddl` command reads the origin cluster's schema, and then generates CQL files that you can use to recreate the schema on your target CQL-compatible cluster. + +To run the `generate-ddl` command, provide the path to your <> followed by `generate-ddl` and any options: + +[source,bash,subs="+quotes"] +---- +java -jar /path/to/dsbulk-migrator.jar generate-ddl **OPTIONS** +---- + +The following example generates DDL files that are optimized for recreating the schema on an {astra-db} database: + +[source,bash,subs="+quotes"] +---- + java -jar target/dsbulk-migrator-**VERSION**-embedded-driver.jar generate-ddl \ + --data-dir=/path/to/data/directory \ + --export-host=**ORIGIN_CLUSTER_HOSTNAME** \ + --export-username=**ORIGIN_USERNAME** \ + --export-password=**ORIGIN_PASSWORD** \ + --optimize-for-astra +---- + +=== Options for generate-ddl + +The `generate-ddl` command ignores all `import-{asterisk}` options and {dsbulk-loader}-related options because they aren't relevant to this operation. + +Origin cluster connection details (`export-{asterisk}` options) are required so that {dsbulk-migrator} can access the origin cluster to gather metadata about the keyspaces and tables for the DDL statements. + +Most options have sensible default values and don't need to be specified unless you want to override the default value. + +[cols="1,3"] +|=== +| Option | Description + +| `--data-dir` (`-d`) +| The directory where you want to store the generated CQL files. +The directory is created if it doesn't exist. + +The default is a `data` subdirectory in the current working directory. + +| `--export-bundle` +| If your origin cluster is an {astra-db} database, provide the path to your database's {scb}, such as `--export-bundle=/path/to/scb.zip`. + +Cannot be used with `--export-host`. + +| `--export-host` +a| The origin cluster's host name or IP address, and an optional port for a node in the origin cluster. +The default port is `9042` if not specified. 
+For example: + +* Hostname with default port: `--export-host=db2.example.com` +* Hostname with custom port: `--export-host=db1.example.com:9001` +* IP address with default port: `--export-host=1.2.3.5` +* IP address with custom port: `--export-host=1.2.3.4:9001` + +This option can be passed multiple times. + +If your origin cluster is an {astra-db} database, use `--export-bundle` instead of `--export-host`. + +| `--export-password` +| The password for authentication to the origin cluster. + +You can either provide the password directly (`--export-password=pass:q[${ORIGIN_PASSWORD}]`), or pass the option without a value (`--export-password`) to be prompted for the password interactively. + +If set, then `--export-username` is required. + +If the cluster doesn't require authentication, omit both `--export-username` and `--export-password`. + +If your origin cluster is an {astra-db} database, the password is an {astra} application token. + +| `--export-protocol-version` +| The protocol version to use when connecting to the origin cluster, such as `--export-protocol-version=V4`. + +If unspecified, the driver negotiates the highest version supported by both the client and the server. + +Specify only if you want to force the protocol version. + +| `--export-username` +| The username for authentication to the origin cluster. + +If set, then `--export-password` is required. + +If the cluster doesn't require authentication, omit both `--export-username` and `--export-password`. + +If your origin cluster is an {astra-db} database, the username is the literal string `token`, such as `--export-username=token`. + +| `--keyspaces` (`-k`) +| A regular expression to select keyspaces to include in the generated CQL files, such as `--keyspaces="^(my_keyspace\|anotherKeyspace)$"`. + +The default expression is `^(?!system\|dse\|OpsCenter)\\w+$`, which includes all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace if these are present on the origin cluster. + +Case-sensitive keyspace names must be specified by their exact case. + +| `--optimize-for-astra` (`-a`) +| Produce CQL files optimized for {astra-db}. + +xref:astra-db-serverless:cql:develop-with-cql.adoc#unsupported-values-are-ignored[{astra-db} doesn't support all CQL options in DDL statements]. +This option omits forbidden CQL options from the generated CQL files so you can use them to create the schema in your {astra-db} database without producing warnings or errors. + +The default is disabled/omitted, which generates the CQL files as-is without any {astra-db}-specific optimizations. + +| `--tables` (`-t`) +| A regular expression to select tables to include in the generated CQL files, such as `--tables="^(table1\|table_two)$"`. + +The default expression is `.{asterisk}`, which includes all tables in the keyspaces that are selected by the `--keyspaces` option. + +Case-sensitive table names must be specified by their exact case. + +| `--table-types` +a| The table types to include in the generated CQL files: + +* `--table-types=regular`: Include only regular tables. +* `--table-types=counter`: Include only counter tables. +* `--table-types=all` (default): Include both regular and counter tables. 
+|=== \ No newline at end of file diff --git a/modules/ROOT/pages/enable-async-dual-reads.adoc b/modules/ROOT/pages/enable-async-dual-reads.adoc index 8d8057f1..0a0e0616 100644 --- a/modules/ROOT/pages/enable-async-dual-reads.adoc +++ b/modules/ROOT/pages/enable-async-dual-reads.adoc @@ -1,7 +1,10 @@ -= Enable asynchronous dual reads += Phase 3: Enable asynchronous dual reads :description: Use asynchronous dual reads to test your target database's ability to handle a simulated production workload. -In this optional phase, you can enable the _asynchronous dual reads_ feature to test the secondary (target) cluster's ability to handle a production workload before you permanently redirect read requests in xref:ROOT:change-read-routing.adoc[phase 4]. +After migrating and validating your data in xref:ROOT:migrate-and-validate-data.adoc[Phase 2], you can begin to test your target cluster's production readiness. + +Phase 3 is optional but highly recommended. +In this phase, you enable the _asynchronous dual reads_ feature to test the secondary (target) cluster's ability to handle a production workload before you permanently redirect read requests in xref:ROOT:change-read-routing.adoc[Phase 4]. By default, {product-proxy} sends all reads to the primary (origin) cluster, and then returns the result to the client application. @@ -10,7 +13,7 @@ When you enable _asynchronous dual reads_, {product-proxy} sends asynchronous re At this point in the migration process, the secondary cluster is typically the target cluster. Because this feature is intended to test your target cluster's ability to handle a simulated production workload, there is no reason to run it against the origin cluster that is already capable of handling production workloads. -image:migration-phase3ra.png["Migration phase 3 diagram with asynchronous dual reads sent to the secondary cluster."] +image:migration-phase3ra.png[In migration Phase 3, enable asynchronous dual reads to send read requests to both clusters] This allows you to assess the target cluster's performance and make any adjustments before permanently switching your workloads to the target cluster. @@ -93,6 +96,23 @@ To assess performance, you can monitor the following: If needed, adjust the target cluster's configuration and continue monitoring until the cluster reaches your performance targets. +If read requests fail due to missing data, go back to xref:ROOT:migrate-and-validate-data.adoc[Phase 2] and repeat your data validation and reconciliation processes as needed to rectify the missing data errors. + +If your data model includes non-idempotent operations, ensure that this data is handled correctly during data migration, reconciliation, and ongoing dual writes. +For more information, see xref:ROOT:feasibility-checklists.adoc#non-idempotent-operations[Lightweight Transactions and other non-idempotent operations]. + == Next steps -When you are confident that the target cluster is prepared to handle production workloads, you can <>, and then permanently xref:ROOT:change-read-routing.adoc[route read requests to the target cluster]. \ No newline at end of file +[IMPORTANT] +==== +Don't rush through this phase. + +Take time to verify that the target cluster can handle the expected production workloads successfully before proceeding to Phase 4. +Make adjustments and reassess performance until you are confident in the cluster's readiness. + +Asynchronous dual reads simulate real read requests without impacting your applications or end users. 
+However, in Phase 4, genuine production read requests are directed to the target cluster _exclusively_. +If your target cluster hasn't been fully tested and prepared for production workloads, your applications can experience performance degradation, timeouts, and other issues. +==== + +When you are confident that the target cluster is prepared to handle production workloads, you can <>, and then proceed to xref:ROOT:change-read-routing.adoc[Phase 4] where you will permanently route read requests to the target cluster. \ No newline at end of file diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc index 42fabb83..cdfe3a36 100644 --- a/modules/ROOT/pages/feasibility-checklists.adoc +++ b/modules/ROOT/pages/feasibility-checklists.adoc @@ -239,4 +239,4 @@ The origin and target clusters can have different authentication configurations == Next steps -* xref:ROOT:deployment-infrastructure.adoc[] \ No newline at end of file +Next, xref:ROOT:deployment-infrastructure.adoc[prepare the {product-proxy} infrastructure]. \ No newline at end of file diff --git a/modules/ROOT/pages/index.adoc b/modules/ROOT/pages/index.adoc index 31cab81b..c00e946b 100644 --- a/modules/ROOT/pages/index.adoc +++ b/modules/ROOT/pages/index.adoc @@ -73,7 +73,7 @@ svg::sideloader:astra-migration-toolkit.svg[role="absolute bottom-1/2 translate-

{cass-migrator-short} can migrate and validate data between {cass-short}-based clusters, with optional logging and reconciliation support.

- xref:ROOT:cdm-overview.adoc[Get started with {cass-migrator-short}] + xref:ROOT:cassandra-data-migrator.adoc[Get started with {cass-migrator-short}]
diff --git a/modules/ROOT/pages/introduction.adoc b/modules/ROOT/pages/introduction.adoc index 63ad2bda..aa612ed0 100644 --- a/modules/ROOT/pages/introduction.adoc +++ b/modules/ROOT/pages/introduction.adoc @@ -48,7 +48,7 @@ The _target_ is your new {cass-short}-based environment where you want to migrat Before you begin a migration, your client applications perform read/write operations with your existing xref:cql:ROOT:index.adoc[CQL]-compatible database, such as {cass}, {dse-short}, {hcd-short}, or {astra-db}. -image:pre-migration0ra.png["Pre-migration environment."] +image:pre-migration0ra.png[Before the migration begins, your applications connect exclusively to your origin cluster] While your application is stable with the current data model and database platform, you might need to make some adjustments before enabling {product-proxy}. @@ -74,9 +74,9 @@ In this first phase, deploy the {product-proxy} instances and connect client app This phase activates the dual-write logic. Writes are sent to both the origin and target databases, while reads are executed on the origin only. -For more information and instructions, see xref:ROOT:phase1.adoc[]. +For more information and instructions, see xref:ROOT:phase1.adoc[Phase 1: Deploy and connect {product-proxy}]. -image:migration-phase1ra.png["Migration Phase 1."] +image:migration-phase1ra.png[In Phase 1, you deploy and connect {product-proxy}] === Phase 2: Migrate data @@ -86,7 +86,7 @@ Then, you thoroughly validate the migrated data, resolving missing and mismatche For more information and instructions, see xref:ROOT:migrate-and-validate-data.adoc[]. -image:migration-phase2ra.png["Migration Phase 2."] +image:migration-phase2ra.png[In Phase 2, you migrate and validate data from the origin cluster to the target cluster] === Phase 3: Enable asynchronous dual reads @@ -98,7 +98,7 @@ When enabled, {product-proxy} sends asynchronous read requests to the secondary For more information, see xref:ROOT:enable-async-dual-reads.adoc[] and xref:ROOT:components.adoc#how_zdm_proxy_handles_reads_and_writes[How {product-proxy} handles reads and writes]. -image:migration-phase3ra.png["Migration Phase 3."] +image:migration-phase3ra.png[In Phase 3, you test your target cluster's production readiness] === Phase 4: Route reads to the target database @@ -109,7 +109,7 @@ At this point, the target database becomes the primary database. For more information and instructions, see xref:ROOT:change-read-routing.adoc[]. -image:migration-phase4ra9.png["Migration Phase 4."] +image:migration-phase4ra9.png[In Phase 4, you route reads to the target cluster exclusively] === Phase 5: Connect directly to the target database @@ -122,7 +122,7 @@ However, be aware that the origin database is no longer synchronized with the ta For more information, see xref:ROOT:connect-clients-to-target.adoc[]. 
-image:migration-phase5ra.png["Migration Phase 5."] +image:migration-phase5ra.png[In Phase 5, you connect your client applications directly and exclusively to the target cluster] [#lab] == {product} interactive lab diff --git a/modules/ROOT/pages/migrate-and-validate-data.adoc b/modules/ROOT/pages/migrate-and-validate-data.adoc index e2a8e2e6..5be69633 100644 --- a/modules/ROOT/pages/migrate-and-validate-data.adoc +++ b/modules/ROOT/pages/migrate-and-validate-data.adoc @@ -1,8 +1,11 @@ -= Migrate and validate data += Phase 2: Migrate and validate data +:page-aliases: ROOT:sideloader-zdm.adoc + +In xref:ROOT:phase1.adoc[Phase 1], you set up {product-proxy} to orchestrate live traffic to your origin and target clusters. In Phase 2 of {product}, you migrate data from the origin to the target, and then validate the migrated data. -image::migration-phase2ra.png[In {product-short} Phase 2, you migrate data from the origin cluster to the target cluster.] +image::migration-phase2ra.png[In {product-short} Phase 2, you migrate data from the origin cluster to the target cluster] To move and validate data, you can use a dedicated data migration tool, such as {sstable-sideloader}, {cass-migrator}, or {dsbulk-migrator}, or your can create your own custom data migration script. @@ -10,19 +13,31 @@ To move and validate data, you can use a dedicated data migration tool, such as == {sstable-sideloader} -{sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-reg}-based cluster. This tool is exclusively for migrations that move data to {astra-db}. +{sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-reg}-based cluster. +Because it imports data directly, {sstable-sideloader} can offer several advantages over CQL-based tools like {dsbulk-migrator} and {cass-migrator}, including faster, more cost-effective data loading, and minimal performance impacts on your origin cluster and target database. + +To migrate data with {sstable-sideloader}, you use `nodetool`, a cloud provider's CLI, and the {astra} {devops-api}: + +* *`nodetool`*: Create snapshots of your existing {dse-short}, {hcd-short}, or open-source {cass-short} cluster. +For compatible origin clusters, see xref:ROOT:astra-migration-paths.adoc[]. +* *Cloud provider CLI*: Upload snapshots to a dedicated cloud storage bucket for your migration. +* *{astra} {devops-api}*: Run the {sstable-sideloader} commands to write the data from cloud storage to your {astra-db} database. + You can use {sstable-sideloader} alone or with {product-proxy}. -For more information, see xref:sideloader:sideloader-zdm.adoc[]. +For more information and instructions, see xref:sideloader:sideloader-overview.adoc[]. + +.Use {sstable-sideloader} with {product-proxy} +svg::sideloader:astra-migration-toolkit.svg[] == {cass-migrator} You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-short}-based databases. It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation. -You can use {cass-migrator-short} by itself, with {product-proxy}, or for data validation after using another data migration tool. +You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool. For more information, see xref:ROOT:cassandra-data-migrator.adoc[]. 
@@ -48,4 +63,17 @@ This is crucial to a successful migration. * Preserves the data model, including column names and data types, so that {product-proxy} can send the same read/write statements to both databases successfully. + Migrations that perform significant data transformations might not be compatible with {product-proxy}. -The impact of data transformations depends on your specific data model, database platforms, and the scale of your migration. \ No newline at end of file +The impact of data transformations depends on your specific data model, database platforms, and the scale of your migration. + +== Next steps + +[IMPORTANT] +==== +Don't proceed to Phase 3 until you have replicated _all_ preexisting data from your origin cluster to your target cluster, _and_ you have taken time to validate that the data was migrated correctly and completely. + +The success of your migration and future performance of the target cluster depends on correct and complete data. + +If your chosen data migration tool doesn't have built-in validation features, you need to use a separate tool for validation. +==== + +After using your chosen data migration tool to migrate and thoroughly validate your data, proceed to xref:ROOT:enable-async-dual-reads.adoc[Phase 3] to test your target cluster's production readiness. \ No newline at end of file diff --git a/modules/ROOT/pages/phase1.adoc b/modules/ROOT/pages/phase1.adoc index 78473841..ab913321 100644 --- a/modules/ROOT/pages/phase1.adoc +++ b/modules/ROOT/pages/phase1.adoc @@ -1,11 +1,14 @@ -= Phase 1: Deploy {product-proxy} and connect client applications += Deploy and connect {product-proxy} -This section presents the following: +After you plan and prepare for your migration, you can start Phase 1 of the migration process where you deploy and connect {product-proxy}. -* xref:setup-ansible-playbooks.adoc[] -* xref:deploy-proxy-monitoring.adoc[] -** xref:tls.adoc[] -* xref:connect-clients-to-proxy.adoc[] -* xref:manage-proxy-instances.adoc[] +image::migration-phase1ra.png[In migration Phase 1, you deploy {product-proxy} instances, and then connect your client applications to the proxies] -image::migration-phase1ra.png[Phase 1 diagram shows deployed {product-proxy} instances, client app connections to proxies, and the target cluster is setup.] \ No newline at end of file +To complete Phase 1, do the following: + +. xref:setup-ansible-playbooks.adoc[]. +. xref:deploy-proxy-monitoring.adoc[] with optional xref:tls.adoc[TLS]. +. xref:connect-clients-to-proxy.adoc[]. + +During the migration you will modify {product-proxy} configuration settings and monitor your {product-proxy} instances. +Before you proceed to Phase 2, make sure you understand how to xref:manage-proxy-instances.adoc[manage {product-proxy} instances] and xref:metrics.adoc[use {product-proxy} metrics]. \ No newline at end of file diff --git a/modules/ROOT/pages/rollback.adoc b/modules/ROOT/pages/rollback.adoc index 45cab323..5d7c7712 100644 --- a/modules/ROOT/pages/rollback.adoc +++ b/modules/ROOT/pages/rollback.adoc @@ -1,5 +1,4 @@ -= Understand the rollback options -:navtitle: Understand rollback options += Understand rollback options At any point from Phase 1 through Phase 4, if you encounter an unexpected issue and need to stop or roll back the migration, you can revert your client applications to connect directly to the origin cluster. 
@@ -19,4 +18,4 @@ In this case, you use your original target cluster as the new origin cluster, an == Next steps -* xref:ROOT:phase1.adoc[] +After preparing the infrastructure for {product-proxy} and your target cluster, begin xref:ROOT:phase1.adoc[Phase 1] of the migration. \ No newline at end of file diff --git a/modules/ROOT/pages/setup-ansible-playbooks.adoc b/modules/ROOT/pages/setup-ansible-playbooks.adoc index 07065911..905b6bab 100644 --- a/modules/ROOT/pages/setup-ansible-playbooks.adoc +++ b/modules/ROOT/pages/setup-ansible-playbooks.adoc @@ -232,3 +232,7 @@ image::zdm-go-utility-results3.png[A summary of the configuration provided is di + image::zdm-go-utility-success3.png[Ansible Docker container success messages] + +== Next steps + +After you use {product-utility} to set up the Ansible Control Host container, you can xref:deploy-proxy-monitoring.adoc[use {product-automation} to deploy your {product-proxy} instances and the monitoring stack]. \ No newline at end of file diff --git a/modules/ROOT/partials/cassandra-data-migrator-body.adoc b/modules/ROOT/partials/cassandra-data-migrator-body.adoc deleted file mode 100644 index f1ba3f01..00000000 --- a/modules/ROOT/partials/cassandra-data-migrator-body.adoc +++ /dev/null @@ -1,344 +0,0 @@ -{description} -It is best for large or complex migrations that benefit from advanced features and configuration options, such as the following: - -* Logging and run tracking -* Automatic reconciliation -* Performance tuning -* Record filtering -* Column renaming -* Support for advanced data types, including sets, lists, maps, and UDTs -* Support for SSL, including custom cipher algorithms -* Use `writetime` timestamps to maintain chronological write history -* Use Time To Live (TTL) values to maintain data lifecycles - -For more information and a complete list of features, see the {cass-migrator-repo}?tab=readme-ov-file#features[{cass-migrator-short} GitHub repository]. - -== {cass-migrator} requirements - -To use {cass-migrator-short} successfully, your origin and target clusters must be {cass-short}-based databases with matching schemas. - -== {cass-migrator-short} with {product-proxy} - -You can use {cass-migrator-short} alone, with {product-proxy}, or for data validation after using another data migration tool. - -When using {cass-migrator-short} with {product-proxy}, {cass-short}'s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes. - -Last-write-wins compares the `writetime` of conflicting records, and then retains the most recent write. - -For example, if a new write occurs in your target cluster with a `writetime` of `2023-10-01T12:05:00Z`, and then {cass-migrator-short} migrates a record against the same row with a `writetime` of `2023-10-01T12:00:00Z`, the target cluster retains the data from the new write because it has the most recent `writetime`. - -== Install {cass-migrator} - -{company} recommends that you always install the latest version of {cass-migrator-short} to get the latest features, dependencies, and bug fixes. - -[tabs] -====== -Install as a container:: -+ --- -Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub]. - -The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`. --- - -Install as a JAR file:: -+ --- -. Install Java 11 or later, which includes Spark binaries. - -. 
Install https://spark.apache.org/downloads.html[Apache Spark(TM)] version 3.5.x with Scala 2.13 and Hadoop 3.3 and later. -+ -[tabs] -==== -Single VM:: -+ -For one-off migrations, you can install the Spark binary on a single VM where you will run the {cass-migrator-short} job. -+ -. Get the Spark tarball from the Apache Spark archive. -+ -[source,bash,subs="+quotes"] ----- -wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz ----- -+ -Replace `**PATCH**` with your Spark patch version. -+ -. Change to the directory where you want install Spark, and then extract the tarball: -+ -[source,bash,subs="+quotes"] ----- -tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz ----- -+ -Replace `**PATCH**` with your Spark patch version. - -Spark cluster:: -+ -For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use a Spark cluster or Spark Serverless platform. -+ -If you deploy CDM on a Spark cluster, you must modify your `spark-submit` commands as follows: -+ -* Replace `--master "local[*]"` with the host and port for your Spark cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`. -* Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`. -==== - -. Download the latest {cass-migrator-repo}/packages/1832128/versions[cassandra-data-migrator JAR file] {cass-migrator-shield}. - -. Add the `cassandra-data-migrator` dependency to `pom.xml`: -+ -[source,xml,subs="+quotes"] ----- - - datastax.cdm - cassandra-data-migrator - **VERSION** - ----- -+ -Replace `**VERSION**` with your {cass-migrator-short} version. - -. Run `mvn install`. - -If you need to build the JAR for local development or your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README]. --- -====== - -== Configure {cass-migrator-short} - -. Create a `cdm.properties` file. -+ -If you use a different name, make sure you specify the correct filename in your `spark-submit` commands. - -. Configure the properties for your environment. -+ -In the {cass-migrator-short} repository, you can find a {cass-migrator-repo}/blob/main/src/resources/cdm.properties[sample properties file with default values], as well as a {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. -+ -{cass-migrator-short} jobs process all uncommented parameters. -Any parameters that are commented out are ignored or use default values. -+ -If you want to reuse a properties file created for a previous {cass-migrator-short} version, make sure it is compatible with the version you are currently using. -Check the {cass-migrator-repo}/releases[{cass-migrator-short} release notes] for possible breaking changes in interim releases. -For example, the 4.x series of {cass-migrator-short} isn't backwards compatible with earlier properties files. - -. Store your properties file where it can be accessed while running {cass-migrator-short} jobs using `spark-submit`. - -[#migrate] -== Run a {cass-migrator-short} data migration job - -A data migration job copies data from a table in your origin cluster to a table with the same schema in your target cluster. - -To optimize large-scale migrations, {cass-migrator-short} can run multiple concurrent migration jobs on the same table. 
- -The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file. -The migration job is specified in the `--class` argument. - -[tabs] -====== -Local installation:: -+ --- -[source,bash,subs="+quotes,+attributes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ ---class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- - -Spark cluster:: -+ --- -[source,bash,subs="+quotes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "spark://**MASTER_HOST**:**PORT**" \ ---class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--master`: Provide the URL of your Spark cluster. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- -====== - -This command generates a log file (`logfile_name_**TIMESTAMP**.txt`) instead of logging output to the console. - -For additional modifications to this command, see <>. - -[#cdm-validation-steps] -== Run a {cass-migrator-short} data validation job - -After migrating data, use {cass-migrator-short}'s data validation mode to identify any inconsistencies between the origin and target tables, such as missing or mismatched records. - -Optionally, {cass-migrator-short} can automatically correct discrepancies in the target cluster during validation. - -. Use the following `spark-submit` command to run a data validation job using the configuration in your properties file. -The data validation job is specified in the `--class` argument. 
-+ -[tabs] -====== -Local installation:: -+ --- -[source,bash,subs="+quotes,+attributes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ ---class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- - -Spark cluster:: -+ --- -[source,bash,subs="+quotes"] ----- -./spark-submit --properties-file cdm.properties \ ---conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \ ---master "spark://**MASTER_HOST**:**PORT**" \ ---class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ----- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--master`: Provide the URL of your Spark cluster. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- -====== - -. Allow the command some time to run, and then open the log file (`logfile_name_**TIMESTAMP**.txt`) and look for `ERROR` entries. -+ -The {cass-migrator-short} validation job records differences as `ERROR` entries in the log file, listed by primary key values. -For example: -+ -[source,plaintext] ----- -23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999) -23/04/06 08:43:06 ERROR DiffJobSession: Corrected mismatch row in target: [key3] -23/04/06 08:43:06 ERROR DiffJobSession: Missing target row found for key: [key2] -23/04/06 08:43:06 ERROR DiffJobSession: Inserted missing row in target: [key2] ----- -+ -When validating large datasets or multiple tables, you might want to extract the complete list of missing or mismatched records. -There are many ways to do this. -For example, you can grep for all `ERROR` entries in your {cass-migrator-short} log files or use the `log4j2` example provided in the {cass-migrator-repo}?tab=readme-ov-file#steps-for-data-validation[{cass-migrator-short} repository]. 
- -=== Run a validation job in AutoCorrect mode - -Optionally, you can run {cass-migrator-short} validation jobs in **AutoCorrect** mode, which offers the following functions: - -* `autocorrect.missing`: Add any missing records in the target with the value from the origin. - -* `autocorrect.mismatch`: Reconcile any mismatched records between the origin and target by replacing the target value with the origin value. -+ -[IMPORTANT] -==== -Timestamps have an effect on this function. - -If the `writetime` of the origin record (determined with `.writetime.names`) is before the `writetime` of the corresponding target record, then the original write won't appear in the target cluster. - -This comparative state can be challenging to troubleshoot if individual columns or cells were modified in the target cluster. -==== - -* `autocorrect.missing.counter`: By default, counter tables are not copied when missing, unless explicitly set. - -In your `cdm.properties` file, use the following properties to enable (`true`) or disable (`false`) autocorrect functions: - -[source,properties] ----- -spark.cdm.autocorrect.missing false|true -spark.cdm.autocorrect.mismatch false|true -spark.cdm.autocorrect.missing.counter false|true ----- - -The {cass-migrator-short} validation job never deletes records from either the origin or target. -Data validation only inserts or updates data on the target. - -For an initial data validation, consider disabling AutoCorrect so that you can generate a list of data discrepancies, investigate those discrepancies, and then decide whether you want to rerun the validation with AutoCorrect enabled. - -[#advanced] -== Additional {cass-migrator-short} options - -You can modify your properties file or append additional `--conf` arguments to your `spark-submit` commands to customize your {cass-migrator-short} jobs. -For example, you can do the following: - -* Check for large field guardrail violations before migrating. -* Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges. -* Use the `track-run` feature to monitor progress and rerun a failed migration or validation job from point of failure. - -For all options, see the {cass-migrator-repo}[{cass-migrator-short} repository]. -Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. - -== Troubleshoot {cass-migrator-short} - -.Java NoSuchMethodError -[%collapsible] -==== -If you installed Spark as a JAR file, and your Spark and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such a the following: - -[source,console] ----- -Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()' ----- - -Make sure that your Spark binary is compatible with your {cass-migrator-short} version. -If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier Spark binary. -==== - -.Rerun a failed or partially completed job -[%collapsible] -==== -You can use the `track-run` feature to track the progress of a migration or validation, and then, if necessary, use the `run-id` to rerun a failed job from the last successful migration or validation point. - -For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. 
-==== \ No newline at end of file diff --git a/modules/ROOT/partials/dsbulk-migrator-body.adoc b/modules/ROOT/partials/dsbulk-migrator-body.adoc deleted file mode 100644 index 45ea3680..00000000 --- a/modules/ROOT/partials/dsbulk-migrator-body.adoc +++ /dev/null @@ -1,642 +0,0 @@ -{dsbulk-migrator} is an extension of {dsbulk-loader}. -It is best for smaller migrations or migrations that don't require extensive data validation, aside from post-migration row counts. -You can also consider this tool for migrations where you can shard data from large tables into more manageable quantities. - -{dsbulk-migrator} extends {dsbulk-loader} with the following commands: - -* `migrate-live`: Start a live data migration using the embedded version of {dsbulk-loader} or your own {dsbulk-loader} installation. -A live migration means that the data migration starts immediately and is performed by the migrator tool through the specified {dsbulk-loader} installation. - -* `generate-script`: Generate a migration script that you can execute to perform a data migration with a your own {dsbulk-loader} installation. -This command _doesn't_ trigger the migration; it only generates the migration script that you must then execute. - -* `generate-ddl`: Read the schema from origin, and then generate CQL files to recreate it in your target {astra-db} database. - -[[prereqs-dsbulk-migrator]] -== {dsbulk-migrator} prerequisites - -* Java 11 - -* https://maven.apache.org/download.cgi[Maven] 3.9.x - -* Optional: If you don't want to use the embedded {dsbulk-loader} that is bundled with {dsbulk-migrator}, xref:dsbulk:overview:install.adoc[install {dsbulk-loader}] before installing {dsbulk-migrator}. - -== Build {dsbulk-migrator} - -. Clone the {dsbulk-migrator-repo}[{dsbulk-migrator} repository]: -+ -[source,bash] ----- -cd ~/github -git clone git@github.com:datastax/dsbulk-migrator.git -cd dsbulk-migrator ----- - -. Use Maven to build {dsbulk-migrator}: -+ -[source,bash] ----- -mvn clean package ----- - -The build produces two distributable fat jars: - -* `dsbulk-migrator-**VERSION**-embedded-driver.jar` contains an embedded Java driver. -Suitable for script generation or live migrations using an external {dsbulk-loader}. -+ -This jar isn't suitable for live migrations that use the embedded {dsbulk-loader} because no {dsbulk-loader} classes are present. - -* `dsbulk-migrator-**VERSION**-embedded-dsbulk.jar` contains an embedded {dsbulk-loader} and an embedded Java driver. -Suitable for all operations. -Much larger than the other JAR due to the presence of {dsbulk-loader} classes. - -== Test {dsbulk-migrator} - -The {dsbulk-migrator} project contains some integration tests that require https://github.com/datastax/simulacron[Simulacron]. - -. Clone and build Simulacron, as explained in the https://github.com/datastax/simulacron[Simulacron GitHub repository]. -Note the prerequisites for Simulacron, particularly for macOS. - -. Run the tests: - -[source,bash] ----- -mvn clean verify ----- - -== Run {dsbulk-migrator} - -Launch {dsbulk-migrator} with the command and options you want to run: - -[source,bash] ----- -java -jar /path/to/dsbulk-migrator.jar { migrate-live | generate-script | generate-ddl } [OPTIONS] ----- - -The role and availability of the options depends on the command you run: - -* During a live migration, the options configure {dsbulk-migrator} and establish connections to -the clusters. - -* When generating a migration script, most options become default values in the generated scripts. 
-However, even when generating scripts, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the tables to migrate. - -* When generating a DDL file, import options and {dsbulk-loader}-related options are ignored. -However, {dsbulk-migrator} still needs to access the origin cluster to gather metadata about the keyspaces and tables for the DDL statements. - -For more information about the commands and their options, see the following references: - -* <> -* <> -* <> - -For help and examples, see <> and <>. - -[[dsbulk-live]] -== Live migration command-line options - -The following options are available for the `migrate-live` command. -Most options have sensible default values and do not need to be specified, unless you want to override the default value. - -[cols="2,8,14"] -|=== - -| `-c` -| `--dsbulk-cmd=CMD` -| The external {dsbulk-loader} command to use. -Ignored if the embedded {dsbulk-loader} is being used. -The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. - -| `-d` -| `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a `data` subdirectory in the current working directory. -The data directory will be created if it does not exist. -Tables will be exported and imported in subdirectories of the data directory specified here. -There will be one subdirectory per keyspace in the data directory, then one subdirectory per table in each keyspace directory. - -| `-e` -| `--dsbulk-use-embedded` -| Use the embedded {dsbulk-loader} version instead of an external one. -The default is to use an external {dsbulk-loader} command. - -| -| `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-consistency=CONSISTENCY` -| The consistency level to use when exporting data. -The default is `LOCAL_QUORUM`. - -| -| `--export-dsbulk-option=OPT=VALUE` -| An extra {dsbulk-loader} option to use when exporting. -Any valid {dsbulk-loader} option can be specified here, and it will passed as is to the {dsbulk-loader} process. -{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`. -Short options are not supported. - -| -| `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--export-host` and `--export-bundle` are mutually exclusive. - -| -| `--export-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to write to. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. -The default is `AUTO`. - -| -| `--export-max-records=NUM` -| The maximum number of records to export for each table. -Must be a positive number or `-1`. -The default is `-1` (export the entire table). - -| -| `--export-password` -| The password to use to authenticate against the origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. -Omit the parameter value to be prompted for the password interactively. 
-
-|
-| `--export-splits=NUM\|NC`
-| The maximum number of token range queries to generate.
-Use the `NC` syntax to specify a multiple of the number of available cores.
-For example, `8C` = 8 times the number of available cores.
-The default is `8C`.
-This is an advanced setting; you should rarely need to modify the default value.
-
-|
-| `--export-username=STRING`
-| The username to use to authenticate against the origin cluster.
-Options `--export-username` and `--export-password` must be provided together, or not at all.
-
-| `-h`
-| `--help`
-| Displays this help text.
-
-|
-| `--import-bundle=PATH`
-| The path to a {scb} to connect to a target {astra-db} cluster.
-Options `--import-host` and `--import-bundle` are mutually exclusive.
-
-|
-| `--import-consistency=CONSISTENCY`
-| The consistency level to use when importing data.
-The default is `LOCAL_QUORUM`.
-
-|
-| `--import-default-timestamp=`
-| The default timestamp to use when importing data.
-Must be a valid instant in ISO-8601 syntax.
-The default is `1970-01-01T00:00:00Z`.
-
-|
-| `--import-dsbulk-option=OPT=VALUE`
-| An extra {dsbulk-loader} option to use when importing.
-Any valid {dsbulk-loader} option can be specified here, and it will be passed as is to the {dsbulk-loader} process.
-{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`.
-Short options are not supported.
-
-|
-| `--import-host=HOST[:PORT]`
-| The host name or IP and, optionally, the port of a node on the target cluster.
-If the port is not specified, it will default to `9042`.
-This option can be specified multiple times.
-Options `--import-host` and `--import-bundle` are mutually exclusive.
-
-|
-| `--import-max-concurrent-files=NUM\|AUTO`
-| The maximum number of concurrent files to read from.
-Must be a positive number or the special value `AUTO`.
-The default is `AUTO`.
-
-|
-| `--import-max-concurrent-queries=NUM\|AUTO`
-| The maximum number of concurrent queries to execute.
-Must be a positive number or the special value `AUTO`.
-The default is `AUTO`.
-
-|
-| `--import-max-errors=NUM`
-| The maximum number of failed records to tolerate when importing data.
-The default is `1000`.
-Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory.
-
-|
-| `--import-password`
-| The password to use to authenticate against the target cluster.
-Options `--import-username` and `--import-password` must be provided together, or not at all.
-Omit the parameter value to be prompted for the password interactively.
-
-|
-| `--import-username=STRING`
-| The username to use to authenticate against the target cluster.
-Options `--import-username` and `--import-password` must be provided together, or not at all.
-
-| `-k`
-| `--keyspaces=REGEX`
-| A regular expression to select keyspaces to migrate.
-The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace.
-Case-sensitive keyspace names must be entered in their exact case.
-
-| `-l`
-| `--dsbulk-log-dir=PATH`
-| The directory where the {dsbulk-loader} should store its logs.
-The default is a `logs` subdirectory in the current working directory.
-This subdirectory will be created if it does not exist.
-Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here.
-
-|
-| `--max-concurrent-ops=NUM`
-| The maximum number of concurrent operations (exports and imports) to carry out.
-The default is `1`.
-Set this to higher values to allow exports and imports to occur concurrently.
-For example, with a value of `2`, each table will be imported as soon as it is exported, while the next table is being exported.
-
-|
-| `--skip-truncate-confirmation`
-| Skip truncate confirmation before actually truncating tables.
-Only applicable when migrating counter tables; ignored otherwise.
-
-| `-t`
-| `--tables=REGEX`
-| A regular expression to select tables to migrate.
-The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`.
-Case-sensitive table names must be entered in their exact case.
-
-|
-| `--table-types=regular\|counter\|all`
-| The table types to migrate.
-The default is `all`.
-
-|
-| `--truncate-before-export`
-| Truncate tables before the export instead of after.
-The default is to truncate after the export.
-Only applicable when migrating counter tables; ignored otherwise.
-
-| `-w`
-| `--dsbulk-working-dir=PATH`
-| The directory where `dsbulk` should be executed.
-Ignored if the embedded {dsbulk-loader} is being used.
-If unspecified, it defaults to the current working directory.
-
-|===
-
-[[dsbulk-script]]
-== Script generation command-line options
-
-The following options are available for the `generate-script` command.
-Most options have sensible default values and do not need to be specified, unless you want to override the default value.
-
-
-[cols="2,8,14"]
-|===
-
-| `-c`
-| `--dsbulk-cmd=CMD`
-| The {dsbulk-loader} command to use.
-The default is simply `dsbulk`, which assumes that the command is available through the `PATH` environment variable.
-
-| `-d`
-| `--data-dir=PATH`
-| The directory where data will be exported to and imported from.
-The default is a `data` subdirectory in the current working directory.
-The data directory will be created if it does not exist.
-
-|
-| `--export-bundle=PATH`
-| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster.
-Options `--export-host` and `--export-bundle` are mutually exclusive.
-
-|
-| `--export-consistency=CONSISTENCY`
-| The consistency level to use when exporting data.
-The default is `LOCAL_QUORUM`.
-
-|
-| `--export-dsbulk-option=OPT=VALUE`
-| An extra {dsbulk-loader} option to use when exporting.
-Any valid {dsbulk-loader} option can be specified here, and it will be passed as is to the {dsbulk-loader} process.
-{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`.
-Short options are not supported.
-
-|
-| `--export-host=HOST[:PORT]`
-| The host name or IP and, optionally, the port of a node from the origin cluster.
-If the port is not specified, it will default to `9042`.
-This option can be specified multiple times.
-Options `--export-host` and `--export-bundle` are mutually exclusive.
-
-|
-| `--export-max-concurrent-files=NUM\|AUTO`
-| The maximum number of concurrent files to write to.
-Must be a positive number or the special value `AUTO`.
-The default is `AUTO`.
-
-|
-| `--export-max-concurrent-queries=NUM\|AUTO`
-| The maximum number of concurrent queries to execute.
-Must be a positive number or the special value `AUTO`.
-The default is `AUTO`.
-
-|
-| `--export-max-records=NUM`
-| The maximum number of records to export for each table.
-Must be a positive number or `-1`.
-The default is `-1` (export the entire table).
-
-|
-| `--export-password`
-| The password to use to authenticate against the origin cluster.
-Options `--export-username` and `--export-password` must be provided together, or not at all.
-Omit the parameter value to be prompted for the password interactively.
-
-|
-| `--export-splits=NUM\|NC`
-| The maximum number of token range queries to generate.
-Use the `NC` syntax to specify a multiple of the number of available cores.
-For example, `8C` = 8 times the number of available cores.
-The default is `8C`.
-This is an advanced setting.
-You should rarely need to modify the default value.
-
-|
-| `--export-username=STRING`
-| The username to use to authenticate against the origin cluster.
-Options `--export-username` and `--export-password` must be provided together, or not at all.
-
-| `-h`
-| `--help`
-| Displays this help text.
-
-|
-| `--import-bundle=PATH`
-| The path to a {scb} to connect to a target {astra-db} cluster.
-Options `--import-host` and `--import-bundle` are mutually exclusive.
-
-|
-| `--import-consistency=CONSISTENCY`
-| The consistency level to use when importing data.
-The default is `LOCAL_QUORUM`.
-
-|
-| `--import-default-timestamp=`
-| The default timestamp to use when importing data.
-Must be a valid instant in ISO-8601 syntax.
-The default is `1970-01-01T00:00:00Z`.
-
-|
-| `--import-dsbulk-option=OPT=VALUE`
-| An extra {dsbulk-loader} option to use when importing.
-Any valid {dsbulk-loader} option can be specified here, and it will be passed as is to the {dsbulk-loader} process.
-{dsbulk-loader} options, including driver options, must be passed as `--long.option.name=`.
-Short options are not supported.
-
-|
-| `--import-host=HOST[:PORT]`
-| The host name or IP and, optionally, the port of a node on the target cluster.
-If the port is not specified, it will default to `9042`.
-This option can be specified multiple times.
-Options `--import-host` and `--import-bundle` are mutually exclusive.
-
-|
-| `--import-max-concurrent-files=NUM\|AUTO`
-| The maximum number of concurrent files to read from.
-Must be a positive number or the special value `AUTO`.
-The default is `AUTO`.
-
-|
-| `--import-max-concurrent-queries=NUM\|AUTO`
-| The maximum number of concurrent queries to execute.
-Must be a positive number or the special value `AUTO`.
-The default is `AUTO`.
-
-|
-| `--import-max-errors=NUM`
-| The maximum number of failed records to tolerate when importing data.
-The default is `1000`.
-Failed records will appear in a `load.bad` file in the {dsbulk-loader} operation directory.
-
-|
-| `--import-password`
-| The password to use to authenticate against the target cluster.
-Options `--import-username` and `--import-password` must be provided together, or not at all.
-Omit the parameter value to be prompted for the password interactively.
-
-|
-| `--import-username=STRING`
-| The username to use to authenticate against the target cluster.
-Options `--import-username` and `--import-password` must be provided together, or not at all.
-
-| `-k`
-| `--keyspaces=REGEX`
-| A regular expression to select keyspaces to migrate.
-The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace.
-Case-sensitive keyspace names must be entered in their exact case.
-
-| `-l`
-| `--dsbulk-log-dir=PATH`
-| The directory where {dsbulk-loader} should store its logs.
-The default is a `logs` subdirectory in the current working directory.
-This subdirectory will be created if it does not exist.
-Each {dsbulk-loader} operation will create a subdirectory in the log directory specified here.
-
-| `-t`
-| `--tables=REGEX`
-| A regular expression to select tables to migrate.
-The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`.
-Case-sensitive table names must be entered in their exact case.
-
-|
-| `--table-types=regular\|counter\|all`
-| The table types to migrate. The default is `all`.
-
-|===
-
-
-[[dsbulk-ddl]]
-== DDL generation command-line options
-
-The following options are available for the `generate-ddl` command.
-Most options have sensible default values and do not need to be specified, unless you want to override the default value.
-
-[cols="2,8,14"]
-|===
-
-| `-a`
-| `--optimize-for-astra`
-| Produce CQL scripts optimized for {company} {astra-db}.
-{astra-db} does not allow some options in DDL statements.
-When you use this option, {dsbulk-migrator} omits the options that {astra-db} doesn't allow from the generated CQL files.
-
-| `-d`
-| `--data-dir=PATH`
-| The directory where data will be exported to and imported from.
-The default is a `data` subdirectory in the current working directory.
-The data directory will be created if it does not exist.
-
-|
-| `--export-bundle=PATH`
-| The path to a secure connect bundle to connect to the origin cluster, if that cluster is a {company} {astra-db} cluster.
-Options `--export-host` and `--export-bundle` are mutually exclusive.
-
-|
-| `--export-host=HOST[:PORT]`
-| The host name or IP and, optionally, the port of a node from the origin cluster.
-If the port is not specified, it will default to `9042`.
-This option can be specified multiple times.
-Options `--export-host` and `--export-bundle` are mutually exclusive.
-
-|
-| `--export-password`
-| The password to use to authenticate against the origin cluster.
-Options `--export-username` and `--export-password` must be provided together, or not at all.
-Omit the parameter value to be prompted for the password interactively.
-
-|
-| `--export-username=STRING`
-| The username to use to authenticate against the origin cluster.
-Options `--export-username` and `--export-password` must be provided together, or not at all.
-
-| `-h`
-| `--help`
-| Displays this help text.
-
-| `-k`
-| `--keyspaces=REGEX`
-| A regular expression to select keyspaces to migrate.
-The default is to migrate all keyspaces except system keyspaces, {dse-short}-specific keyspaces, and the OpsCenter keyspace.
-Case-sensitive keyspace names must be entered in their exact case.
-
-| `-t`
-| `--tables=REGEX`
-| A regular expression to select tables to migrate.
-The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`.
-Case-sensitive table names must be entered in their exact case.
-
-|
-| `--table-types=regular\|counter\|all`
-| The table types to migrate.
-The default is `all`.
-
-|===
-
-[[dsbulk-examples]]
-== {dsbulk-migrator} examples
-
-These examples show sample `username` and `password` values that are for demonstration purposes only.
-Don't use these values in your environment.
-In each example, replace `**VERSION**` with your {dsbulk-migrator} version.
-
-=== Generate a migration script
-
-Generate a migration script to migrate from an existing origin cluster to a target {astra-db} cluster:
-
-[source,bash,subs="+quotes"]
-----
- java -jar target/dsbulk-migrator-**VERSION**-embedded-driver.jar generate-script \
- --data-dir=/path/to/data/dir \
- --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
- --dsbulk-log-dir=/path/to/log/dir \
- --export-host=my-origin-cluster.com \
- --export-username=user1 \
- --export-password=s3cr3t \
- --import-bundle=/path/to/bundle \
- --import-username=user1 \
- --import-password=s3cr3t
-----
-
-=== Live migration with an external {dsbulk-loader} installation
-
-Perform a live migration from an existing origin cluster to a target {astra-db} cluster using an external {dsbulk-loader} installation:
-
-[source,bash,subs="+quotes"]
-----
- java -jar target/dsbulk-migrator-**VERSION**-embedded-driver.jar migrate-live \
- --data-dir=/path/to/data/dir \
- --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
- --dsbulk-log-dir=/path/to/log/dir \
- --export-host=my-origin-cluster.com \
- --export-username=user1 \
- --export-password # password will be prompted \
- --import-bundle=/path/to/bundle \
- --import-username=user1 \
- --import-password # password will be prompted
-----
-
-Passwords are prompted interactively.
-
-=== Live migration with the embedded {dsbulk-loader}
-
-Perform a live migration from an existing origin cluster to a target {astra-db} cluster using the embedded {dsbulk-loader} installation:
-
-[source,bash,subs="+quotes"]
-----
- java -jar target/dsbulk-migrator-**VERSION**-embedded-dsbulk.jar migrate-live \
- --data-dir=/path/to/data/dir \
- --dsbulk-use-embedded \
- --dsbulk-log-dir=/path/to/log/dir \
- --export-host=my-origin-cluster.com \
- --export-username=user1 \
- --export-password # password will be prompted \
- --export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \
- --export-dsbulk-option "--executor.maxPerSecond=1000" \
- --import-bundle=/path/to/bundle \
- --import-username=user1 \
- --import-password # password will be prompted \
- --import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \
- --import-dsbulk-option "--executor.maxPerSecond=1000"
-----
-
-Passwords are prompted interactively.
-
-This example also passes additional {dsbulk-loader} options through `--export-dsbulk-option` and `--import-dsbulk-option`.
-
-This example requires the `dsbulk-migrator-**VERSION**-embedded-dsbulk.jar` fat jar.
-If you use the `embedded-driver` jar instead, an error is raised because no embedded {dsbulk-loader} can be found.
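-
-=== Live migration limited to specific keyspaces and tables
-
-The `--keyspaces` and `--tables` filters described in the preceding options tables work with any of these examples if you only need to migrate part of the schema.
-The following command is a minimal sketch that assumes hypothetical names: a keyspace named `my_keyspace` and tables named `users` and `orders`.
-Adjust the regular expressions to match your own schema.
-
-[source,bash,subs="+quotes"]
-----
- # Hypothetical keyspace and table names; adjust the regex filters for your schema.
- java -jar target/dsbulk-migrator-**VERSION**-embedded-driver.jar migrate-live \
- --data-dir=/path/to/data/dir \
- --dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
- --dsbulk-log-dir=/path/to/log/dir \
- --keyspaces='^my_keyspace$' \
- --tables='^(users|orders)$' \
- --export-host=my-origin-cluster.com \
- --export-username=user1 \
- --export-password=s3cr3t \
- --import-bundle=/path/to/bundle \
- --import-username=user1 \
- --import-password=s3cr3t
-----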
-
-=== Generate DDL files to recreate the origin schema on the target cluster
-
-Generate DDL files to recreate the origin schema on a target {astra-db} cluster:
-
-[source,bash,subs="+quotes"]
-----
- java -jar target/dsbulk-migrator-**VERSION**-embedded-driver.jar generate-ddl \
- --data-dir=/path/to/data/dir \
- --export-host=my-origin-cluster.com \
- --export-username=user1 \
- --export-password=s3cr3t \
- --optimize-for-astra
-----
-
-[[getting-help-with-dsbulk-migrator]]
-== Get help with {dsbulk-migrator}
-
-Use the following command to display the available {dsbulk-migrator} commands:
-
-[source,bash]
-----
-java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar --help
-----
-
-To display help for an individual command and its options:
-
-[source,bash]
-----
-java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar COMMAND --help
-----
-
-== See also
-
-* xref:dsbulk:overview:dsbulk-about.adoc[{dsbulk-loader}]
-* xref:dsbulk:reference:dsbulk-cmd.adoc#escape-and-quote-command-line-arguments[Escape and quote {dsbulk-loader} command line arguments]
\ No newline at end of file
diff --git a/modules/sideloader/pages/sideloader-overview.adoc b/modules/sideloader/pages/sideloader-overview.adoc
index 1b6cd07b..9765c11d 100644
--- a/modules/sideloader/pages/sideloader-overview.adoc
+++ b/modules/sideloader/pages/sideloader-overview.adoc
@@ -115,7 +115,10 @@ include::sideloader:partial$validate.adoc[]
 
 == Use {sstable-sideloader} with {product-proxy}
 
-include::sideloader:partial$sideloader-zdm.adoc[]
+If you need to migrate a live database, you can use {sstable-sideloader} instead of {dsbulk-migrator} or {cass-migrator} during xref:ROOT:migrate-and-validate-data.adoc[Phase 2 of {product}].
+
+.Use {sstable-sideloader} with {product-proxy}
+svg::sideloader:astra-migration-toolkit.svg[]
 
 == Next steps
 
diff --git a/modules/sideloader/pages/sideloader-zdm.adoc b/modules/sideloader/pages/sideloader-zdm.adoc
deleted file mode 100644
index 1111f833..00000000
--- a/modules/sideloader/pages/sideloader-zdm.adoc
+++ /dev/null
@@ -1,25 +0,0 @@
-= Use {sstable-sideloader} with {product-proxy}
-:navtitle: Use {sstable-sideloader}
-:description: {sstable-sideloader} is a service running in {astra-db} that imports data from snapshots of your existing {cass-short}-based cluster.
-
-{description}
-This tool is exclusively for migrations that move data to {astra-db}.
-
-Because it imports data directly, {sstable-sideloader} can offer several advantages over CQL-based tools like {dsbulk-migrator} and {cass-migrator}, including faster, more cost-effective data loading, and minimal performance impacts on your origin cluster and target database.
-
-== Migrate data with {sstable-sideloader}
-
-To migrate data with {sstable-sideloader}, you use `nodetool`, a cloud provider's CLI, and the {astra} {devops-api}:
-
-* *`nodetool`*: Create snapshots of your existing {dse-short}, {hcd-short}, or open-source {cass-short} cluster.
-For compatible origin clusters, see xref:ROOT:astra-migration-paths.adoc[].
-* *Cloud provider CLI*: Upload snapshots to a dedicated cloud storage bucket for your migration.
-* *{astra} {devops-api}*: Run the {sstable-sideloader} commands to write the data from cloud storage to your {astra-db} database.
-
-For more information and instructions, see xref:sideloader:sideloader-overview.adoc[].
-
-== Use {sstable-sideloader} with {product-proxy}
-
-You can use {sstable-sideloader} alone or with {product-proxy}.
-
-include::sideloader:partial$sideloader-zdm.adoc[]
\ No newline at end of file
diff --git a/modules/sideloader/partials/sideloader-zdm.adoc b/modules/sideloader/partials/sideloader-zdm.adoc
deleted file mode 100644
index bf4fd583..00000000
--- a/modules/sideloader/partials/sideloader-zdm.adoc
+++ /dev/null
@@ -1,4 +0,0 @@
-If you need to migrate a live database, you can use {sstable-sideloader} instead of {dsbulk-migrator} or {cass-migrator} during xref:ROOT:migrate-and-validate-data.adoc[Phase 2 of {product}].
-
-.Use {sstable-sideloader} with {product-proxy}
-svg::sideloader:astra-migration-toolkit.svg[]
\ No newline at end of file