From 320416c8ef84995f4f30d298e713b670a568cbe6 Mon Sep 17 00:00:00 2001
From: Lokesh Jain
Date: Wed, 13 Sep 2023 15:05:43 +0530
Subject: [PATCH 1/2] [DOCS] Add documentation around downgrading hudi table

---
 website/docs/deployment.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/website/docs/deployment.md b/website/docs/deployment.md
index 088f99834ad2a..b965a4e715aae 100644
--- a/website/docs/deployment.md
+++ b/website/docs/deployment.md
@@ -10,7 +10,7 @@ This section provides all the help you need to deploy and operate Hudi tables at
 Specifically, we will cover the following aspects.
 
  - [Deployment Model](#deploying) : How various Hudi components are deployed and managed.
- - [Upgrading Versions](#upgrading) : Picking up new releases of Hudi, guidelines and general best-practices.
+ - [Upgrading/Downgrading Versions](#upgrading--downgrading) : Picking up new releases of Hudi, guidelines and general best-practices.
  - [Migrating to Hudi](#migrating) : How to migrate your existing tables to Apache Hudi.
 
 ## Deploying
@@ -151,9 +151,10 @@ inputDF.write()
        .save(basePath);
 ```
 
-## Upgrading
+## Upgrading / Downgrading
 
-New Hudi releases are listed on the [releases page](/releases/download), with detailed notes which list all the changes, with highlights in each release.
+New Hudi releases are listed on the [releases page](/releases/download), with detailed notes which list all the changes, with highlights in each release.
+Upgrade/Downgrade can be performed using Hudi CLI. For more details please refer documentation [here](cli#upgrade-and-downgrade-table).
 At the end of the day, Hudi is a storage system and with that comes a lot of responsibilities, which we take seriously.
 As general guidelines,
@@ -161,9 +162,9 @@ As general guidelines,
 
  - We strive to keep all changes backwards compatible (i.e new code can read old data/timeline files) and when we cannot, we will provide upgrade/downgrade tools via the CLI
  - We cannot always guarantee forward compatibility (i.e old code being able to read data/timeline files written by a greater version). This is generally the norm, since no new features can be built otherwise. However any large such changes, will be turned off by default, for smooth transition to newer release. After a few releases and once enough users deem the feature stable in production, we will flip the defaults in a subsequent release.
- - Always upgrade the query bundles (mr-bundle, presto-bundle, spark-bundle) first and then upgrade the writers (Hudi Streamer, spark jobs using datasource). This often provides the best experience and it's easy to fix
+ - Always upgrade/downgrade the query bundles (mr-bundle, presto-bundle, spark-bundle) first and then upgrade/downgrade the writers (Hudi Streamer, spark jobs using datasource). This often provides the best experience and it's easy to fix
    any issues by rolling forward/back the writer code (which typically you might have more control over)
- - With large, feature rich releases we recommend migrating slowly, by first testing in staging environments and running your own tests. Upgrading Hudi is no different than upgrading any database system.
+ - With large, feature rich releases we recommend migrating slowly, by first testing in staging environments and running your own tests. Upgrading/Downgrading Hudi is no different than upgrading/downgrading any database system.
 
 Note that release notes can override this information with specific instructions, applicable on case-by-case basis.
From 85af1293688cd98ff733c25dc45c132bcb7ae1ec Mon Sep 17 00:00:00 2001
From: Lokesh Jain
Date: Thu, 14 Sep 2023 18:22:56 +0530
Subject: [PATCH 2/2] Address review comments

---
 website/docs/deployment.md | 93 +++++++++++++++++++++++++++++++++++---
 1 file changed, 87 insertions(+), 6 deletions(-)

diff --git a/website/docs/deployment.md b/website/docs/deployment.md
index b965a4e715aae..52f502ffb79fe 100644
--- a/website/docs/deployment.md
+++ b/website/docs/deployment.md
@@ -10,7 +10,8 @@ This section provides all the help you need to deploy and operate Hudi tables at
 Specifically, we will cover the following aspects.
 
  - [Deployment Model](#deploying) : How various Hudi components are deployed and managed.
- - [Upgrading/Downgrading Versions](#upgrading--downgrading) : Picking up new releases of Hudi, guidelines and general best-practices.
+ - [Upgrading Versions](#upgrading) : Picking up new releases of Hudi, guidelines and general best-practices.
+ - [Downgrading Versions](#downgrading) : Reverting back to an older version of Hudi
  - [Migrating to Hudi](#migrating) : How to migrate your existing tables to Apache Hudi.
 
 ## Deploying
@@ -151,10 +152,9 @@ inputDF.write()
        .save(basePath);
 ```
 
-## Upgrading / Downgrading
+## Upgrading
 
-New Hudi releases are listed on the [releases page](/releases/download), with detailed notes which list all the changes, with highlights in each release.
-Upgrade/Downgrade can be performed using Hudi CLI. For more details please refer documentation [here](cli#upgrade-and-downgrade-table).
+New Hudi releases are listed on the [releases page](/releases/download), with detailed notes which list all the changes, with highlights in each release.
 At the end of the day, Hudi is a storage system and with that comes a lot of responsibilities, which we take seriously.
 As general guidelines,
@@ -162,12 +162,93 @@ As general guidelines,
 
  - We strive to keep all changes backwards compatible (i.e new code can read old data/timeline files) and when we cannot, we will provide upgrade/downgrade tools via the CLI
  - We cannot always guarantee forward compatibility (i.e old code being able to read data/timeline files written by a greater version). This is generally the norm, since no new features can be built otherwise. However any large such changes, will be turned off by default, for smooth transition to newer release. After a few releases and once enough users deem the feature stable in production, we will flip the defaults in a subsequent release.
- - Always upgrade/downgrade the query bundles (mr-bundle, presto-bundle, spark-bundle) first and then upgrade/downgrade the writers (Hudi Streamer, spark jobs using datasource). This often provides the best experience and it's easy to fix
+ - Always upgrade the query bundles (mr-bundle, presto-bundle, spark-bundle) first and then upgrade the writers (Hudi Streamer, spark jobs using datasource). This often provides the best experience and it's easy to fix
    any issues by rolling forward/back the writer code (which typically you might have more control over)
- - With large, feature rich releases we recommend migrating slowly, by first testing in staging environments and running your own tests. Upgrading/Downgrading Hudi is no different than upgrading/downgrading any database system.
+ - With large, feature rich releases we recommend migrating slowly, by first testing in staging environments and running your own tests. Upgrading Hudi is no different than upgrading any database system.
 
 Note that release notes can override this information with specific instructions, applicable on case-by-case basis.
 
+## Downgrading
+
+An upgrade is applied automatically whenever a new Hudi version is used, whereas a downgrade is a manual step: the Hudi
+CLI must be used to downgrade a table from a higher table version to a lower one.
+Let's consider an example where we create a table using
+0.11.0, upgrade it to 0.13.1 and then downgrade it via the Hudi CLI.
+
+Launch a Spark shell with Hudi version 0.11.0.
+```shell
+spark-shell \
+  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
+  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
+```
+
+Create a Hudi table using the Scala script below.
+```scala
+import org.apache.hudi.QuickstartUtils._
+import scala.collection.JavaConversions._
+import org.apache.spark.sql.SaveMode._
+import org.apache.hudi.DataSourceReadOptions._
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.config.HoodieWriteConfig._
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.timeline.HoodieTimeline
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.HoodieDataSourceHelpers
+
+val dataGen = new DataGenerator
+val tableType = MOR_TABLE_TYPE_OPT_VAL
+val basePath = "file:///tmp/hudi_table"
+val tableName = "hudi_table"
+
+val inserts = convertToStringList(dataGen.generateInserts(100)).toList
+val insertDf = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+insertDf.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(TABLE_TYPE_OPT_KEY, tableType).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  option(OPERATION.key(), INSERT_OPERATION_OPT_VAL).
+  mode(Append).
+  save(basePath)
+```
+
+You will see an entry for the table version in hoodie.properties, which states that the table version is 4.
+```shell
+bash$ grep hoodie.table.version /tmp/hudi_table/.hoodie/hoodie.properties
+hoodie.table.version=4
+```
+
+Launch a new Spark shell using version 0.13.1 and append to the same table using the script above. Note that the
+upgrade happens automatically with the new version.
+```shell
+spark-shell \
+  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.13.1 \
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
+  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
+```
+
+After the upgrade, the table version is updated to 5.
+```shell
+bash$ grep hoodie.table.version /tmp/hudi_table/.hoodie/hoodie.properties
+hoodie.table.version=5
+```
+
+Let's try downgrading the table back to version 4. To downgrade, we need to launch the Hudi CLI and execute the
+downgrade command. For more details, please refer to the documentation [here](cli#upgrade-and-downgrade-table).
+```shell
+connect --path /tmp/hudi_table
+downgrade table --toVersion 4
+```
+
+After the downgrade, the table version is updated to 4.
+```shell
+bash$ grep hoodie.table.version /tmp/hudi_table/.hoodie/hoodie.properties
+hoodie.table.version=4
+```
+
 ## Migrating
 
 Currently migrating to Hudi can be done using two approaches