
[HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default … #6489

Merged

merged 1 commit into apache:master on Sep 19, 2022

Conversation

paul8263
Contributor

@paul8263 paul8263 commented Aug 25, 2022

…value for show fsview all pathRegex parameter.

Change Logs

In order to fix HUDI-4485, we bumped spring shell to 2.1.1 and updated the default value for show fsview all pathRegex parameter.

Impact

Public API and user-facing features are not affected, but there may be a performance impact.

Risk level: medium

Updated the unit tests; all hudi-cli tests pass.

The functionality has also been tested in a real-world environment.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@paul8263 paul8263 force-pushed the HUDI-4485 branch 5 times, most recently from a6e48d3 to a0e2f52 on August 26, 2022 01:38
@paul8263
Contributor Author

The unit tests crashed on hudi_utilities_2.11 due to insufficient heap memory. I plan to increase the limit to 4G.
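The 4G limit discussed here maps to the shared Surefire argLine property in the root pom.xml; a minimal sketch of the change (exact placement within the pom is assumed):

```xml
<!-- Root pom.xml: give the Surefire fork a 4G heap (previously -Xmx2g). -->
<properties>
  <argLine>-Xms4g -Xmx4g</argLine>
</properties>
```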

@paul8263
Contributor Author

Hi @nsivabalan and @codope ,

The CI report shows a failure:

https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=11000&view=logs&j=3b6e910d-b98f-5de6-b9cb-1e5ff571f5de&t=30b5aae4-0ea0-5566-42d0-febf71a7061a&l=25048

However, the hudi-cli unit tests run successfully on my local Linux machine. It is a strange problem.

@paul8263
Contributor Author

paul8263 commented Sep 2, 2022

@hudi-bot run azure

@paul8263
Contributor Author

paul8263 commented Sep 2, 2022

Hi community,

After testing some compaction commands I found a problem with SparkUtil::initLauncher. Spring Shell 2.x requires Spring Boot, but the catch is that the spring-boot-maven-plugin repackages everything from src into /BOOT-INF/classes inside the jar, not at the jar root. As a result SparkLauncher cannot find the main class. I am currently working out how to solve this packaging problem.
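To make the class-lookup failure concrete, here is a small, self-contained sketch of where a class file ends up in a plain jar versus a Spring Boot repackaged jar (the package name below is assumed for illustration):

```java
// Sketch: a plain jar stores a class at its package path from the jar root,
// while a Spring Boot repackaged jar nests it under BOOT-INF/classes/.
// A launcher that resolves the main class against the jar root therefore
// fails for the repackaged jar.
public class BootJarLayout {
    static String entryFor(String className, boolean bootRepackaged) {
        String path = className.replace('.', '/') + ".class";
        return bootRepackaged ? "BOOT-INF/classes/" + path : path;
    }

    public static void main(String[] args) {
        String cls = "org.apache.hudi.cli.commands.SparkMain";
        System.out.println(entryFor(cls, false)); // plain jar entry path
        System.out.println(entryFor(cls, true));  // Spring Boot fat-jar entry path
    }
}
```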

@paul8263 paul8263 force-pushed the HUDI-4485 branch 4 times, most recently from 2528206 to 61586bd on September 5, 2022 02:12
@paul8263
Contributor Author

paul8263 commented Sep 5, 2022

Hi community,

After testing some compaction commands I found a problem with SparkUtil::initLauncher. Spring Shell 2.x requires Spring Boot, but the catch is that the spring-boot-maven-plugin repackages everything from src into /BOOT-INF/classes inside the jar, not at the jar root. As a result SparkLauncher cannot find the main class. I am currently working out how to solve this packaging problem.

The hudi-cli packaging and SparkUtil issue has been fixed.

@paul8263 paul8263 force-pushed the HUDI-4485 branch 2 times, most recently from 8aa3985 to e019b79 on September 5, 2022 09:04
@yihua yihua added the dependencies, cli, and priority:critical labels Sep 6, 2022
@apache apache deleted a comment from hudi-bot Sep 6, 2022
@paul8263 paul8263 force-pushed the HUDI-4485 branch 2 times, most recently from 8b67443 to 82f1001 on September 6, 2022 07:58
@paul8263
Contributor Author

paul8263 commented Sep 7, 2022

Hi @codope and @yihua ,
The hudi-integ-test errors are almost all cleared. The only one left is:

org.apache.hudi.integ.command.ITTestHoodieSyncCommand.testValidateSync(ITTestHoodieSyncCommand.java:56)

https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=11183&view=logs&j=3b6e910d-b98f-5de6-b9cb-1e5ff571f5de&t=30b5aae4-0ea0-5566-42d0-febf71a7061a&l=146906

Is there a way to view the detailed error log in the docker container via Azure?

@paul8263
Contributor Author

paul8263 commented Sep 8, 2022

Hi @codope and @yihua , The hudi-integ-test errors are almost all cleared. The only one left is:

org.apache.hudi.integ.command.ITTestHoodieSyncCommand.testValidateSync(ITTestHoodieSyncCommand.java:56)

https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=11183&view=logs&j=3b6e910d-b98f-5de6-b9cb-1e5ff571f5de&t=30b5aae4-0ea0-5566-42d0-febf71a7061a&l=146906

Is there a way to view the detailed error log in the docker container via Azure?

Finally, all test failures have been resolved.

@paul8263
Contributor Author

paul8263 commented Sep 9, 2022

@hudi-bot run azure

@paul8263
Contributor Author

paul8263 commented Sep 9, 2022

Hi @yihua and @codope ,

How can I rerun the CI? Commenting the bot command seems to have no effect.

The commit 3ae4fb8 passed the CI (https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=11215&view=logs&j=3b6e910d-b98f-5de6-b9cb-1e5ff571f5de) but hit a ClassNotFoundException in the final GitHub check stages. I added a dependency, but surprisingly CI then failed with a timeout exception (https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=11258&view=logs&j=b1544eb9-7ff1-5db9-0187-3e05abf459bc&t=0ec7e803-cfc6-5180-f2a9-ea971f54ee54&l=9346). The error has nothing to do with the latest change.

Member

@codope codope left a comment

@paul8263 Thanks for working on this. The upgrade is much needed.
I assume you would have already tested locally. Did you also run the CLI commands for a Hudi table on cloud storage (e.g. S3)?

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
// Licensed to the Apache Software Foundation (ASF) under one
Member

Why is this required? Please revert this change if not necessary.

Contributor Author

The reason is that Spring Shell 2.1.1 only treats lines starting with // as comments, while the original Spring Shell used a leading #. To avoid command parsing errors we had to change this.
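For example, in a hudi-cli script file (the command below is only illustrative), Spring Shell 2.1.1 treats the first line as a comment, whereas a line starting with # would be handed to the parser as a command and fail:

```
// comments in Spring Shell 2.x script files must start with //
connect --path /tmp/hudi/test_table
```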

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
// Licensed to the Apache Software Foundation (ASF) under one
Member

same here and other files. revert if not needed.

Contributor Author

Same as above.

pom.xml Outdated
@@ -182,7 +182,7 @@
<spark.bundle.hive.shade.prefix/>
<utilities.bundle.hive.scope>provided</utilities.bundle.hive.scope>
<utilities.bundle.hive.shade.prefix/>
<argLine>-Xmx2g</argLine>
<argLine>-Xms4g -Xmx4g</argLine>
Member

why are we increasing the jvm allocation?

Contributor Author

I will restore this config and try running again.

Previously the Maven Surefire JVM could crash with no specific reason. Increasing the JVM heap memory can alleviate this.

hudi-cli/pom.xml (resolved)
hudi-cli/pom.xml (resolved)
hudi-cli/pom.xml (resolved)
hudi-cli/pom.xml (resolved, outdated)
hudi-examples/hudi-examples-flink/pom.xml (resolved)
@paul8263
Contributor Author

@paul8263 Thanks for working on this. The upgrade is much needed. I assume you would have already tested locally. Did you also run the CLI commands for a Hudi table on cloud storage (e.g. S3)?

Hi @codope ,
Thank you very much for your suggestion. Sorry, I don't have access to a cloud environment, so I was not able to test there.

We might need someone to help test on the cloud.

@paul8263 paul8263 force-pushed the HUDI-4485 branch 2 times, most recently from 543b108 to 7ea1f72 on September 13, 2022 08:59
@yihua
Contributor

yihua commented Sep 14, 2022

@rahil-c Could you review this PR as well? It's related to Hudi CLI.

@rahil-c
Contributor

rahil-c commented Sep 15, 2022

@rahil-c Could you review this PR as well? It's related to Hudi CLI.

will take a look as well

@@ -86,7 +87,7 @@
*/
public class SparkMain {

private static final Logger LOG = Logger.getLogger(SparkMain.class);
private static final Logger LOG = LogManager.getLogger(SparkMain.class);
Contributor

Just want to know why we have to change Logger to LogManager here (and also in other places)?

Contributor Author

Hi @rahil-c ,
Logger here comes from log4j-1.2-api, while LogManager is the Log4j 2.x API. Previously I changed all the loggers to SLF4J, but I have now restored all of them to Log4j 2.

Contributor

Thanks!

@paul8263
Contributor Author

Pushed to resolve the conflicts.

Member

@codope codope left a comment

LGTM.
I ran the CLI built with this patch. I only noticed that the logging is very verbose (INFO level, see an example). Can we fix the logging level?

…alue for show fsview all pathRegex parameter.
@paul8263
Contributor Author

paul8263 commented Sep 19, 2022

LGTM. I ran the CLI built with this patch. I only noticed that the logging is very verbose (INFO level, see an example). Can we fix the logging level?

Hi @codope
Thank you for your suggestion. I updated log4j2.properties and limited the logging level of packages outside of Hudi and Spark.
I also corrected org/apache/hudi/cli/utils/InputStreamConsumer.java by changing its logger from java.util.logging to the Log4j 2 logger.
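For reference, a minimal sketch of what such a log4j2.properties could look like; the package names and levels here are illustrative, not the exact configuration committed in this PR:

```properties
# Quiet third-party packages by default; keep Hudi and Spark at INFO.
rootLogger.level = warn
rootLogger.appenderRef.stdout.ref = CONSOLE

appender.console.type = Console
appender.console.name = CONSOLE
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{HH:mm:ss} %p %c - %m%n

logger.hudi.name = org.apache.hudi
logger.hudi.level = info
logger.spark.name = org.apache.spark
logger.spark.level = info
```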

@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

Member

@codope codope left a comment

Thanks for addressing the comments. Looks good to me now.

@codope codope merged commit c0eae6d into apache:master Sep 19, 2022
yuzhaojing pushed a commit to yuzhaojing/hudi that referenced this pull request Sep 26, 2022
Bumped spring shell to 2.1.1 and updated the default 
value for show fsview all `pathRegex` parameter.
yuzhaojing pushed a commit to yuzhaojing/hudi that referenced this pull request Sep 29, 2022
Bumped spring shell to 2.1.1 and updated the default 
value for show fsview all `pathRegex` parameter.
yuzhaojing pushed a commit that referenced this pull request Sep 29, 2022
Bumped spring shell to 2.1.1 and updated the default 
value for show fsview all `pathRegex` parameter.
@@ -526,6 +528,7 @@
<exclude>**/target/**</exclude>
<exclude>**/generated-sources/**</exclude>
<exclude>.github/**</exclude>
<exclude>**/banner.txt</exclude>
Member

This causes a compliance issue with the release, which requires source files to have a license header.

Member

Hotfix in #6865.

TengHuo pushed a commit to TengHuo/hudi that referenced this pull request Nov 28, 2022
Bumped spring shell to 2.1.1 and updated the default 
value for show fsview all `pathRegex` parameter.
neverdizzy pushed a commit to neverdizzy/hudi that referenced this pull request Dec 13, 2022
Bumped spring shell to 2.1.1 and updated the default
value for show fsview all `pathRegex` parameter.

(cherry picked from commit c0eae6d)
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
Bumped spring shell to 2.1.1 and updated the default 
value for show fsview all `pathRegex` parameter.
vinishjail97 added a commit to vinishjail97/hudi that referenced this pull request Dec 15, 2023
* [HUDI-4354] Add --force-empty-sync flag to deltastreamer (apache#6027)

* [HUDI-4601] Read error from MOR table after compaction with timestamp partitioning (apache#6365)

* read error from mor after compaction

Co-authored-by: 吴文池 <wuwenchi@deepexi.com>

* [MINOR] Update DOAP with 0.12.0 Release (apache#6413)

* [HUDI-4529] Tweak some default config options for flink (apache#6287)

* [HUDI-4632] Remove the force active property for flink1.14 profile (apache#6415)

* [HUDI-4551] Tweak the default parallelism of flink pipeline to execution env  parallelism (apache#6312)

* [MINOR] Improve code style of CLI Command classes (apache#6427)

* [HUDI-3625] Claim RFC-60 for Federated Storage Layer (apache#6440)

* [HUDI-4616] Adding `PulsarSource` to `DeltaStreamer` to support ingesting from Apache Pulsar (apache#6386)

- Adding PulsarSource to DeltaStreamer to support ingesting from Apache Pulsar.
- Current implementation of PulsarSource is relying on "pulsar-spark-connector" to ingest using Spark instead of building similar pipeline from scratch.

* [HUDI-3579] Add timeline commands in hudi-cli (apache#5139)

* [HUDI-4638] Rename payload clazz and preCombine field options for flink sql (apache#6434)

* Revert "[HUDI-4632] Remove the force active property for flink1.14 profile (apache#6415)" (apache#6449)

This reverts commit 9055b2f.

* [HUDI-4643] MergeInto syntax WHEN MATCHED is optional but must be set (apache#6443)

* [HUDI-4644] Change default flink profile to 1.15.x (apache#6445)

* [HUDI-4678] Claim RFC-61 for Snapshot view management (apache#6461)

Co-authored-by: jian.feng <jian.feng@shopee.com>

* [HUDI-4676] infer cleaner policy when write concurrency mode is OCC (apache#6459)

* [HUDI-4676] infer cleaner policy when write concurrency mode is OCC
Co-authored-by: jian.feng <jian.feng@shopee.com>

* [HUDI-4683] Use enum class value for default value in flink options (apache#6453)

* [HUDI-4584] Cleaning up Spark utilities (apache#6351)

Cleans up Spark utilities and removes duplication

* [HUDI-4686] Flip option 'write.ignore.failed' to default false (apache#6467)

Also fix the flaky test

* [HUDI-4515] Fix savepoints will be cleaned in keeping latest versions policy (apache#6267)

* [HUDI-4637] Release thread in RateLimiter doesn't been terminated (apache#6433)

* [HUDI-4698] Rename the package 'org.apache.flink.table.data' to avoid conflicts with flink table core (apache#6481)

* HUDI-4687 add show_invalid_parquet procedure (apache#6480)

Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>

* [HUDI-4584] Fixing `SQLConf` not being propagated to executor (apache#6352)

Fixes `HoodieSparkUtils.createRDD` to make sure `SQLConf` is properly propagated to the executor (required by `AvroSerializer`)

* [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies (apache#6170)

* [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false (apache#6450)

* [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite (apache#6490)

* [HUDI-4696] Fix flaky TestHoodieCombineHiveInputFormat (apache#6494)

* Revert "[HUDI-3669] Add a remote request retry mechanism for 'Remotehoodietablefiles… (apache#5884)" (apache#6501)

This reverts commit 660177b.

* [Stacked on 6386] Fixing `DebeziumSource` to properly commit offsets; (apache#6416)

* [HUDI-4399][RFC-57] Protobuf support in DeltaStreamer (apache#6111)

* [HUDI-4703] use the historical schema to response time travel query (apache#6499)

* [HUDI-4703] use the historical schema to response time travel query

* [HUDI-4549]  Remove avro from hudi-hive-sync-bundle and hudi-aws-bundle (apache#6472)

* Remove avro shading from hudi-hive-sync-bundle
   and hudi-aws-bundle.

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4482] remove guava and use caffeine instead for cache (apache#6240)

* [HUDI-4483] Fix checkstyle in integ-test module (apache#6523)

* [HUDI-4340] fix not parsable text DateTimeParseException by adding a method parseDateFromInstantTimeSafely for parsing timestamps when outputting metrics (apache#6000)

* [DOCS] Add docs about javax.security.auth.login.LoginException when starting Hudi Sink Connector (apache#6255)

* [HUDI-4327] Fixing flaky deltastreamer test (testCleanerDeleteReplacedDataWithArchive) (apache#6533)

* [HUDI-4730] Fix batch job cannot clean old commits files (apache#6515)

* [HUDI-4370] Fix batch job cannot clean old commits files

Co-authored-by: jian.feng <jian.feng@shopee.com>

* [HUDI-4740] Add metadata fields for hive catalog #createTable (apache#6541)

* [HUDI-4695] Fixing flaky TestInlineCompaction#testCompactionRetryOnFailureBasedOnTime (apache#6534)

* [HUDI-4193] change protoc version to unblock hudi compilation on m1 mac (apache#6535)

* [HUDI-4438] Fix flaky TestCopyOnWriteActionExecutor#testPartitionMetafileFormat (apache#6546)

* [MINOR] Fix typo in HoodieArchivalConfig (apache#6542)

* [HUDI-4582] Support batch synchronization of partition to HMS to avoid timeout (apache#6347)


Co-authored-by: xxhua <xxhua@freewheel.tv>

* [HUDI-4742] Fix AWS Glue partition's location is wrong when updatePartition (apache#6545)

Co-authored-by: xxhua <xxhua@freewheel.tv>

* [HUDI-4418] Add support for ProtoKafkaSource (apache#6135)

- Adds PROTO to Source.SourceType enum.
- Handles PROTO type in SourceFormatAdapter by converting to Avro from proto Message objects. 
   Conversion to Row goes Proto -> Avro -> Row currently.
- Added ProtoClassBasedSchemaProvider to generate schemas for a proto class that is currently on the classpath.
- Added ProtoKafkaSource which parses byte[] into a class that is on the path.
- Added ProtoConversionUtil which exposes methods for creating schemas and 
   translating from Proto messages to Avro GenericRecords.
- Added KafkaSource which provides a base class for the other Kafka sources to use.

* [HUDI-4642] Adding support to hudi-cli to repair deprecated partition (apache#6438)

* [HUDI-4751] Fix owner instants for transaction manager api callers (apache#6549)

* [HUDI-4739] Wrong value returned when key's length equals 1 (apache#6539)

* extracts key fields

Co-authored-by: 吴文池 <wuwenchi@deepexi.com>

* [HUDI-4528] Add diff tool to compare commit metadata (apache#6485)

* Add diff tool to compare commit metadata
* Add partition level info to commits and compaction command
* Partition support for compaction archived timeline
* Add diff command test

* [HUDI-4648] Support rename partition through CLI (apache#6569)

* [HUDI-4775] Fixing incremental source for MOR table (apache#6587)

* Fixing incremental source for MOR table

* Remove unused import

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* [HUDI-4694] Print testcase running time for CI jobs (apache#6586)

* [RFC] Claim RFC-62 for Diagnostic Reporter (apache#6599)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>

* [minor] following HUDI-4739, fix the extraction for simple record keys (apache#6594)

* [HUDI-4619] Add a remote request retry mechanism for 'Remotehoodietablefilesystemview'. (apache#6393)

* [HUDI-4720] Fix HoodieInternalRow return wrong num of fields when source not contains meta fields (apache#6500)

Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com>

* [HUDI-4389] Make HoodieStreamingSink idempotent (apache#6098)

* Support checkpoint and idempotent writes in HoodieStreamingSink

- Use batchId as the checkpoint key and add to commit metadata
- Support multi-writer for checkpoint data model

* Walk back previous commits until checkpoint is found

* Handle delete operation and fix test

* [MINOR] Remove redundant braces (apache#6604)

* [HUDI-4618] Separate log word for CommitUitls class (apache#6392)

* [HUDI-4776] Fix merge into use unresolved assignment (apache#6589)

* [HUDI-4795] Fix KryoException when bulk insert into a not bucket index hudi table

Co-authored-by: hbg <bingeng.huang@shopee.com>

* [HUDI-4615] Return checkpoint as null for empty data from events queue.  (apache#6387)


Co-authored-by: sivabalan <n.siva.b@gmail.com>

* [HUDI-4782] Support TIMESTAMP_LTZ type for flink (apache#6607)

* [HUDI-4731] Shutdown CloudWatch reporter when query completes (apache#6468)

* [HUDI-4793] Fixing ScalaTest tests to properly respect Log4j2 configs (apache#6617)

* [HUDI-4766] Strengthen flink clustering job (apache#6566)

* Allow rollbacks if required during clustering
* Allow size to be defined in Long instead of Integer
* Fix bug where clustering will produce files of 120MB in the same filegroup
* Added clean task
* Fix scheduling config to be consistent with that with compaction
* Fix filter mode getting ignored issue
* Add --instant-time parameter
* Prevent no execute() calls exception from being thrown (clustering & compaction)

* [HUDI-4797] fix merge into table for source table with different column order (apache#6620)

Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>

* [MINOR] Typo fix for kryo in flink-bundle (apache#6639)

* [HUDI-4811] Fix the checkstyle of hudi flink (apache#6633)

* [HUDI-4465] Optimizing file-listing sequence of Metadata Table (apache#6016)

Optimizes file-listing sequence of the Metadata Table to make sure it's on par or better than FS-based file-listing

Change log:

- Cleaned up avoidable instantiations of Hadoop's Path
- Replaced new Path w/ createUnsafePath where possible
- Cached TimestampFormatter, DateFormatter for timezone
- Avoid loading defaults in Hadoop conf when init-ing HFile reader
- Avoid re-instantiating BaseTableMetadata twice w/in BaseHoodieTableFileIndex
- Avoid looking up FileSystem for every partition when listing partitioned table, instead do it just once

* [HUDI-4807] Use base table instant for metadata initialization (apache#6629)

* [HUDI-3453] Fix HoodieBackedTableMetadata concurrent reading issue (apache#5091)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* [HUDI-4518] Add unit test for reentrant lock in diff lockProvider (apache#6624)

* [HUDI-4810] Fixing Hudi bundles requiring log4j2 on the classpath (apache#6631)

Downgrading all of the log4j2 deps to "provided" scope, since these are not API modules (as advertised), but rather fully-fledged implementations adding dependency on other modules (like log4j2 in the case of "log4j-1.2-api")

* [HUDI-4826] Update RemoteHoodieTableFileSystemView to allow base path in UTF-8 (apache#6544)

* [HUDI-4763] Allow hoodie read client to choose index (apache#6506)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [DOCS] Fix Slack invite link in README.md (apache#6648)

* [HUDI-3558] Consistent bucket index: bucket resizing (split&merge) & concurrent write during resizing (apache#4958)

RFC-42 implementation
- Implement bucket resizing for consistent hashing index.
- Support concurrent write during bucket resizing.

This change added tests and can be verified as follows:
- The test of the consistent bucket index is enhanced to include the case of bucket resizing.
- Tests of different bucket resizing cases.
- Tests of concurrent resizing, and concurrent writes during resizing.

* [MINOR] Add dev setup and spark 3.3 profile to readme (apache#6656)

* [HUDI-4831] Fix AWSDmsAvroPayload#getInsertValue,combineAndGetUpdateValue to invoke correct api (apache#6637)

Co-authored-by: Rahil Chertara <rchertar@amazon.com>

* [HUDI-4806] Use Avro version from the root pom for Flink bundle (apache#6628)

Co-authored-by: Shawn Chang <yxchang@amazon.com>

* [HUDI-4833] Add Postgres Schema Name to Postgres Debezium Source (apache#6616)

* [HUDI-4825] Remove redundant fields in serialized commit metadata in JSON (apache#6646)

* [MINOR] Insert should call validateInsertSchema in HoodieFlinkWriteClient (apache#5919)

Co-authored-by: 徐帅 <xushuai@MacBook-Pro-6.local>

* [HUDI-3879] Suppress exceptions that are not fatal in HoodieMetadataTableValidator (apache#5344)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-3998] Fix getCommitsSinceLastCleaning failed when async cleaning (apache#5478)

- The last completed commit timestamp is used to calculate how many commit have been completed since the last clean. we might need to save this w/ clean plan so that next time when we trigger clean, we can start calculating from that.

* [HUDI-3994] - Added support for initializing DeltaStreamer without a defined Spark Master (apache#5630)

That will enable the usage of DeltaStreamer on environments such
as AWS Glue or other serverless environments where the spark master is
inherited and we do not have access to it.

Co-authored-by: Angel Conde Manjon <acmanjon@amazon.com>

* [HUDI-4628] Hudi-flink support GLOBAL_BLOOM,GLOBAL_SIMPLE,BUCKET index type (apache#6406)

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4814] Schedules new clustering plan based on latest clustering instant (apache#6574)

* Keep a clustering running at the same time
* Simplify filtering logic

Co-authored-by: dongsj <dongsj@asiainfo.com>

* [HUDI-4817] Delete markers after full-record bootstrap operation (apache#6667)

* [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module (apache#6550)

As part of adding support for Spark 3.3 in Hudi 0.12, a lot of the logic 
from Spark 3.2 module has been simply copied over.

This PR is rectifying that by:
1. Creating new module "hudi-spark3.2plus-common" 
    (that is shared across Spark 3.2 and Spark 3.3)
2. Moving shared components under "hudi-spark3.2plus-common"

* [HUDI-4752] Add dedup support for MOR table in cli (apache#6608)

* [HUDI-4837] Stop sleeping where it is not necessary after the success (apache#6270)

Co-authored-by: Volodymyr Burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4843] Delete the useless timer in BaseRollbackActionExecutor (apache#6671)

Co-authored-by: 吴文池 <wuwenchi@deepexi.com>

* [HUDI-4780] hoodie.logfile.max.size It does not take effect, causing the log file to be too large (apache#6602)

* hoodie.logfile.max.size It does not take effect, causing the log file to be too large

Co-authored-by: 854194341@qq.com <loukey_7821>

* [HUDI-4844] Skip partition value resolving when the field does not exists for MergeOnReadInputFormat#getReader (apache#6678)

* [MINOR] Fix the Spark job status description for metadata-only bootstrap operation (apache#6666)

* [HUDI-3403] Ensure keygen props are set for bootstrap (apache#6645)

* [HUDI-4193] Upgrade Protobuf to 3.21.5 (apache#5784)

* [HUDI-4785] Fix partition discovery in bootstrap operation (apache#6673)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type (apache#6486)

InternalSchemaChangeApplier#applyAddChange forget to remove parent name when calling ColumnAddChange#addColumns

* [HUDI-4851] Fixing CSI not handling `InSet` operator properly (apache#6685)

* [HUDI-4796] MetricsReporter stop bug (apache#6619)

* [HUDI-3861] update tblp 'path' when rename table (apache#5320)

* [HUDI-4853] Get field by name for OverwriteNonDefaultsWithLatestAvroPayload to avoid schema mismatch (apache#6689)

* [HUDI-4813] Fix infer keygen not work in sparksql side issue (apache#6634)

* [HUDI-4813] Fix infer keygen not work in sparksql side issue

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4856] Missing option for HoodieCatalogFactory (apache#6693)

* [HUDI-4864] Fix AWSDmsAvroPayload#combineAndGetUpdateValue when using MOR snapshot query after delete operations with test (apache#6688)

Co-authored-by: Rahil Chertara <rchertar@amazon.com>

* [HUDI-4841] Fix sort idempotency issue (apache#6669)

* [HUDI-4865] Optimize HoodieAvroUtils#isMetadataField to use O(1) complexity (apache#6702)

* [HUDI-4736] Fix inflight clean action preventing clean service to continue when multiple cleans are not allowed (apache#6536)

* [HUDI-4842] Support compaction strategy based on delta log file num (apache#6670)

Co-authored-by: 苏承祥 <sucx@tuya.com>

* [HUDI-4282] Repair IOException in CHDFS when check block corrupted in HoodieLogFileReader (apache#6031)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4757] Create pyspark examples (apache#6672)

* [HUDI-3959] Rename class name for spark rdd reader (apache#5409)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4828] Fix the extraction of record keys which may be cut out (apache#6650)

Co-authored-by: yangshuo3 <yangshuo3@kingsoft.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4873] Report number of messages to be processed via metrics (apache#6271)

Co-authored-by: Volodymyr Burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4870] Improve compaction config description (apache#6706)

* [HUDI-3304] Support partial update payload (apache#4676)


Co-authored-by: jian.feng <jian.feng@shopee.com>

* [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in lo… (apache#6630)

* [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in log file issue

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4485] Bump spring shell to 2.1.1 in CLI (apache#6489)

Bumped spring shell to 2.1.1 and updated the default 
value for show fsview all `pathRegex` parameter.

* [minor] following 3304, some code refactoring (apache#6713)

* [HUDI-4832] Fix drop partition meta sync (apache#6662)

* [HUDI-4810] Fix log4j imports to use bridge API  (apache#6710)


Co-authored-by: dongsj <dongsj@asiainfo.com>

* [HUDI-4877] Fix org.apache.hudi.index.bucket.TestHoodieSimpleBucketIndex#testTagLocation not work correct issue (apache#6717)


Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4326] add updateTableSerDeInfo for HiveSyncTool (apache#5920)

- This pull request fixes "[SUPPORT] Hudi spark datasource error after migrate from 0.8 to 0.11" (apache#5861)
- The issue is caused by the table SerDeInfo going missing after changing the table to a Spark data source table.

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* [MINOR] fix indent to make build pass (apache#6721)

* [HUDI-3478] Implement CDC Write in Spark (apache#6697)

* [HUDI-4326] Fix hive sync serde properties (apache#6722)

* [HUDI-4875] Fix NoSuchTableException when dropping temporary view after applied HoodieSparkSessionExtension in Spark 3.2 (apache#6709)

* [DOCS] Improve the quick start guide for Kafka Connect Sink (apache#6708)

* [HUDI-4729] Fix file group pending compaction cannot be queried when query _ro table (apache#6516)

File group in pending compaction can not be queried 
when query _ro table with spark. This commit fixes that.

Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* [HUDI-3983] Fix ClassNotFoundException when using hudi-spark-bundle to write table with hbase index (apache#6715)

* [HUDI-4758] Add validations to java spark examples (apache#6615)

* [HUDI-4792] Batch clean files to delete (apache#6580)

This patch uses a single batched call to fetch the file groups to delete during cleaning, instead of one call per partition.
This limits the number of calls to the view and should fix the trouble with the metadata table when there are many partitions.
Fixes issue apache#6373

Co-authored-by: sivabalan <n.siva.b@gmail.com>
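
The batching idea above can be sketched as follows (a minimal illustration; the `view` object and its method names are hypothetical stand-ins, not Hudi's actual file-system-view API):

```python
def file_groups_to_clean(view, partitions):
    """Fetch file groups to delete with one batched view call instead of
    one call per partition. `view` and its methods are hypothetical
    stand-ins used for illustration only."""
    if hasattr(view, "get_file_groups_batch"):
        # single round trip covering every partition
        return view.get_file_groups_batch(partitions)
    # fallback: N round trips, one per partition
    return {p: view.get_file_groups(p) for p in partitions}
```

With many partitions backed by the metadata table, collapsing N lookups into one is what avoids the trouble described above.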

* [HUDI-4363] Support Clustering row writer to improve performance (apache#6046)

* [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data (apache#6734)

* [HUDI-4851] Fixing handling of `UTF8String` w/in `InSet` operator (apache#6739)


Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-3901] Correct the description of hoodie.index.type (apache#6749)

* [MINOR] Add .mvn directory to gitignore (apache#6746)

Co-authored-by: Rahil Chertara <rchertar@amazon.com>

* add support for unraveling proto schemas

* fix some compile issues

* [HUDI-4901] Add avro.version to Flink profiles (apache#6757)

* Add avro.version to Flink profiles

Co-authored-by: Shawn Chang <yxchang@amazon.com>

* [HUDI-4559] Support hiveSync command based on Call Produce Command (apache#6322)

* [HUDI-4883] Supporting delete savepoint for MOR (apache#6744)

Users can delete unnecessary savepoints
and unblock archival for MOR tables.

* [HUDI-4897] Refactor the merge handle in CDC mode (apache#6740)

* [HUDI-3523] Introduce AddColumnSchemaPostProcessor to support add columns to the end of a schema (apache#5031)

* Revert "[HUDI-3523] Introduce AddColumnSchemaPostProcessor to support add columns to the end of a schema (apache#5031)" (apache#6768)

This reverts commit 092375f.

* [HUDI-3523] Introduce AddPrimitiveColumnSchemaPostProcessor to support add new primitive column to the end of a schema (apache#6769)

* [HUDI-4903] Fix TestHoodieLogFormat`s minor typo (apache#6762)

* [MINOR] Drastically reducing concurrency level (to avoid CI flakiness) (apache#6754)

* Update HoodieIndex.java

Fix a typo

* [HUDI-4906] Fix the local tests for hudi-flink (apache#6763)

* [HUDI-4899] Fixing compatibility w/ Spark 3.2.2 (apache#6755)

* [HUDI-4892] Fix hudi-spark3-bundle (apache#6735)

* [MINOR] Fix a few typos in HoodieIndex (apache#6784)

Co-authored-by: xingjunwang <xingjunwang@tencent.com>

* [HUDI-4412] Fix multi writer INSERT_OVERWRITE NPE bug (apache#6130)

There are two minor issues fixed here:

1. When the insert_overwrite operation is performed, the
    clusteringPlan in the requestedReplaceMetadata will be
    null, so calling getFileIdsFromRequestedReplaceMetadata will cause an NPE.

2. During an insert_overwrite operation with inflightCommitMetadata != null,
    getOperationType should be obtained from getHoodieInflightReplaceMetadata;
    the original code dereferenced a null pointer.
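
A minimal sketch of the first fix, using plain dicts with hypothetical field names (not Hudi's actual metadata classes): treat a missing clustering plan as empty rather than dereferencing it.

```python
def get_file_ids_from_requested_replace_metadata(metadata):
    """Return the file ids referenced by a requested replace metadata.
    For insert_overwrite the clustering plan is absent, so guard against
    None instead of raising. Field names are illustrative only."""
    plan = metadata.get("clusteringPlan")
    if plan is None:
        return []
    return [group["fileId"] for group in plan.get("inputGroups", [])]
```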

* [MINOR] retain avro's namespace (apache#6783)

* [MINOR] Simple logging fix in LockManager (apache#6765)

Co-authored-by: 苏承祥 <sucx@tuya.com>

* [HUDI-4433] hudi-cli repair deduplicate not working with non-partitioned dataset (apache#6349)

When using the repair deduplicate command with hudi-cli,
there was no way to run it on a non-partitioned dataset,
so the CLI parameter is modified.

Co-authored-by: Xingjun Wang <wongxingjun@126.com>

* [RFC-51][HUDI-3478] Update RFC: CDC support (apache#6256)

* [HUDI-4915] improve avro serializer/deserializer (apache#6788)

* [HUDI-3478] Implement CDC Read in Spark (apache#6727)

* naming and style updates

* [HUDI-4830] Fix testNoGlobalConfFileConfigured when add hudi-defaults.conf in default dir (apache#6652)

* make test data random, reuse code

* [HUDI-4760] Fixing repeated trigger of data file creations w/ clustering (apache#6561)

- In clustering, data file creation was triggered twice: the write status was not cached, and the isEmpty call on the JavaRDD used for validation re-triggered the action. This patch fixes the double de-referencing.

* [HUDI-4914] Managed memory weight should be set when sort clustering is enabled (apache#6792)

* [HUDI-4910] Fix unknown variable or type "Cast" (apache#6778)

* [HUDI-4918] Fix NullPointerException when trying to show a non-existing key from env (apache#6794)

* [HUDI-4718] Add Kerberos kinit command support. (apache#6719)

* add test for 2 different recursion depths, fix schema cache key

* add unsigned long support

* better handle other types

* rebase on 4904

* get all tests working

* fix oneof expected schema, update tests after rebase

* [HUDI-4902] Set default partitioner for SIMPLE BUCKET index (apache#6759)

* [MINOR] Update PR template with documentation update (apache#6748)

* revert scala binary change

* try a different method to avoid avro version

* [HUDI-4904] Add support for unraveling proto schemas in ProtoClassBasedSchemaProvider (apache#6761)

If a user provides a recursive proto schema, the write to parquet will fail. We need to allow the user to specify how many levels of recursion they want before truncating the remaining data.

Main changes to existing code:

- ProtoClassBasedSchemaProvider tracks the number of times a message descriptor is seen within a branch of the schema traversal.
- Once that count exceeds the user-provided limit, the field is set to a preset record containing two fields: 1) the remaining data serialized as a proto byte array, 2) the descriptor's full name for context about what is in that byte array.
- Converting from a proto to an avro now accounts for this truncation of the input.
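
The truncation can be sketched as a depth-limited traversal (a simplified stand-in, not the actual ProtoClassBasedSchemaProvider code): the per-branch count of each message name is tracked, and once it exceeds the limit the subtree is replaced by a placeholder recording the type name, standing in for the serialized-bytes-plus-descriptor record described above.

```python
def truncate_recursive_schema(children_of, root, max_depth):
    """Walk a schema graph, counting how often each message name appears
    on the current branch; past max_depth occurrences, emit a placeholder
    instead of recursing further. `children_of` maps a message name to
    its child message names (illustrative structure only)."""
    def walk(name, seen):
        count = seen.get(name, 0)
        if count >= max_depth:
            # the real change stores serialized bytes + descriptor full name
            return {"truncated": name}
        branch_seen = {**seen, name: count + 1}
        return {name: [walk(c, branch_seen) for c in children_of.get(name, [])]}
    return walk(root, {})
```

Because the count is tracked per branch (a fresh copy per recursion path), a type may legitimately appear in several sibling fields without being truncated.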

* delete unused file

* [HUDI-4907] Prevent single commit multi instant issue (apache#6766)


Co-authored-by: TengHuo <teng_huo@outlook.com>
Co-authored-by: yuzhao.cyz <yuzhao.cyz@gmail.com>

* [HUDI-4923] Fix flaky TestHoodieReadClient.testReadFilterExistAfterBulkInsertPrepped (apache#6801)



Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4848] Fixing repair deprecated partition tool (apache#6731)

* [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS (apache#6785)

* address PR feedback, update decimal precision

* fix isNullable issue, check if class is Int64value

* checkstyle fix

* change wrapper descriptor set initialization

* add in testing for unsigned long to BigInteger conversion

* [HUDI-4453] Fix schema to include partition columns in bootstrap operation (apache#6676)

Turn off the type inference of the partition column to be consistent with 
existing behavior. Add notes around partition column type inference.

* [HUDI-2780] Fix the issue of Mor log skipping complete blocks when reading data (apache#4015)


Co-authored-by: huangjing02 <huangjing02@bilibili.com>
Co-authored-by: sivabalan <n.siva.b@gmail.com>

* [HUDI-4924] Auto-tune dedup parallelism (apache#6802)

* [HUDI-4687] Avoid setAccessible which breaks strong encapsulation (apache#6657)

Use JOL GraphLayout for estimating deep size.

* [MINOR] fixing validate async operations to poll completed clean instances (apache#6814)

* [HUDI-4734] Deltastreamer table config change validation (apache#6753)


Co-authored-by: sivabalan <n.siva.b@gmail.com>

* [HUDI-4934] Revert batch clean files (apache#6813)

* Revert "[HUDI-4792] Batch clean files to delete (apache#6580)"
This reverts commit cbf9b83.

* [HUDI-4722] Added locking metrics for Hudi (apache#6502)

* [HUDI-4936] Fix `as.of.instant` not recognized as hoodie config (apache#5616)


Co-authored-by: leon <leon@leondeMacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4861] Relaxing `MERGE INTO` constraints to permit limited casting operations w/in matched-on conditions (apache#6820)

* [HUDI-4885] Adding org.apache.avro to hudi-hive-sync bundle (apache#6729)

* [HUDI-4951] Fix incorrect use of Long.getLong() (apache#6828)

* [MINOR] Use base path URI in ITTestDataStreamWrite (apache#6826)

* [HUDI-4308] READ_OPTIMIZED read mode will temporary loss of data when compaction (apache#6664)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4237] Fixing empty partition-values being sync'd to HMS (apache#6821)

Co-authored-by: dujunling <dujunling@bytedance.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4925] Should Force to use ExpressionPayload in MergeIntoTableCommand (apache#6355)


Co-authored-by: jian.feng <jian.feng@shopee.com>

* [HUDI-4850] Add incremental source from GCS to Hudi (apache#6665)

Adds an incremental source from GCS based on a similar design 
as https://hudi.apache.org/blog/2021/08/23/s3-events-source

* [HUDI-4957] Shade JOL in bundles to fix NoClassDefFoundError:GraphLayout (apache#6839)

* [HUDI-4718] Add Kerberos kdestroy command support (apache#6810)

* [HUDI-4916] Implement change log feed for Flink (apache#6840)

* [HUDI-4769] Option read.streaming.skip_compaction skips delta commit (apache#6848)

* [HUDI-4949] optimize cdc read to avoid the problem of reusing buffer underlying the Row (apache#6805)

* [HUDI-4966] Add a partition extractor to handle partition values with slashes (apache#6851)

* [MINOR] Fix testUpdateRejectForClustering (apache#6852)

* [HUDI-4962] Move cloud dependencies to cloud modules (apache#6846)

* [HOTFIX] Fix source release validate script (apache#6865)

* [HUDI-4980] Calculate avg record size using commit only (apache#6864)

Calculate average record size for Spark upsert partitioner 
based on commit instants only. Previously it was based on
both commit and replacecommit instants; the latter may be
created by clustering, which has inaccurately smaller
average record sizes, which could result in OOM
due to size underestimation.
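
The sizing change can be illustrated with a small sketch (the tuple layout is hypothetical, not Hudi's HoodieCommitMetadata): only `commit` instants contribute to the average, so clustering's `replacecommit` instants no longer drag the estimate down.

```python
def avg_record_size(instants):
    """instants: iterable of (action, total_bytes_written, total_records_written)
    tuples (illustrative layout). Only 'commit' actions are counted;
    returns None when no records have been seen."""
    total_bytes = total_records = 0
    for action, nbytes, nrecords in instants:
        if action != "commit":
            continue
        total_bytes += nbytes
        total_records += nrecords
    return total_bytes / total_records if total_records else None
```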

* shade protobuf dependency

* Revert "[HUDI-4915] improve avro serializer/deserializer (apache#6788)" (apache#6809)

This reverts commit 79b3e2b.

* [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create (apache#6857)

* Enhancing README for multi-writer tests (apache#6870)

* [MINOR] Fix deploy script for flink 1.15 (apache#6872)

* [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (apache#6883)

* Revert "shade protobuf dependency"

This reverts commit f03f961.

* [HUDI-4972] Fixes to make unit tests work on m1 mac (apache#6751)

* [HUDI-2786] Docker demo on mac aarch64 (apache#6859)

* [HUDI-4971] Fix shading kryo-shaded with reusing configs (apache#6873)

* [HUDI-3900] [UBER] Support log compaction action for MOR tables (apache#5958)

- Adds log compaction support to MOR tables. Subsequent log blocks can now be compacted into larger log blocks without a full compaction (i.e., without merging with the base file).
- A new timeline action is introduced for this purpose.

Co-authored-by: sivabalan <n.siva.b@gmail.com>

* Relocate apache http package (apache#6874)

* [HUDI-4975] Fix datahub bundle dependency (apache#6896)

* [HUDI-4999] Refactor FlinkOptions#allOptions and CatalogOptions#allOptions (apache#6901)

* [MINOR] Update GitHub setting for merge button (apache#6922)

Only allow squash and merge. Disable merge and rebase

* [HUDI-4993] Make DataPlatform name and Dataset env configurable in DatahubSyncTool (apache#6885)

* [MINOR] Fix name spelling for RunBootstrapProcedure

* [HUDI-4754] Add compliance check in github actions (apache#6575)

* [HUDI-4963] Extend InProcessLockProvider to support multiple table ingestion (apache#6847)


Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

* [HUDI-4994] Fix bug that prevents re-ingestion of soft-deleted Datahub entities (apache#6886)

* Implement Create/Drop/Show/Refresh Secondary Index (apache#5933)

* remove oss pr compliance

* different approach for shutdown all metrics instances

* remove flink testing, update metrics shutdown

Co-authored-by: Qi Ji <qjqqyy@users.noreply.github.com>
Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
Co-authored-by: 吴文池 <wuwenchi@deepexi.com>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Danny Chan <yuzhao.cyz@gmail.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>
Co-authored-by: 董可伦 <dongkelun01@inspur.com>
Co-authored-by: 冯健 <fengjian428@gmail.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
Co-authored-by: hehuiyuan <471627698@qq.com>
Co-authored-by: Zouxxyy <zouxxyy@qq.com>
Co-authored-by: Manu <36392121+xicm@users.noreply.github.com>
Co-authored-by: shaoxiong.zhan <31836510+microbearz@users.noreply.github.com>
Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
Co-authored-by: Shiyan Xu <2701446+xushiyan@users.noreply.github.com>
Co-authored-by: Yann Byron <biyan900116@gmail.com>
Co-authored-by: KnightChess <981159963@qq.com>
Co-authored-by: Teng <teng_huo@outlook.com>
Co-authored-by: leandro-rouberte <37634317+leandro-rouberte@users.noreply.github.com>
Co-authored-by: Jon Vexler <jbvexler@gmail.com>
Co-authored-by: smilecrazy <smilecrazy1h@gmail.com>
Co-authored-by: xxhua <xxhua@freewheel.tv>
Co-authored-by: YueZhang <69956021+zhangyue19921010@users.noreply.github.com>
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: HunterXHunter <1356469429@qq.com>
Co-authored-by: komao <masterwangzx@gmail.com>
Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com>
Co-authored-by: felixYyu <felix2003@live.cn>
Co-authored-by: Bingeng Huang <304979636@qq.com>
Co-authored-by: hbg <bingeng.huang@shopee.com>
Co-authored-by: Vinish Reddy <vinishreddygunner17@gmail.com>
Co-authored-by: junyuc25 <10862251+junyuc25@users.noreply.github.com>
Co-authored-by: voonhous <voonhousu@gmail.com>
Co-authored-by: Xingcan Cui <xcui@wealthsimple.com>
Co-authored-by: Yuwei XIAO <ywxiaozero@gmail.com>
Co-authored-by: wangp-nhlab <95683046+wangp-nhlab@users.noreply.github.com>
Co-authored-by: Nicolas Paris <nicolas.paris@riseup.net>
Co-authored-by: Rahil C <32500120+rahil-c@users.noreply.github.com>
Co-authored-by: Rahil Chertara <rchertar@amazon.com>
Co-authored-by: Shawn Chang <42792772+CTTY@users.noreply.github.com>
Co-authored-by: Shawn Chang <yxchang@amazon.com>
Co-authored-by: Abhishek Modi <modi@makenotion.com>
Co-authored-by: shuai.xu <chiggics@gmail.com>
Co-authored-by: 徐帅 <xushuai@MacBook-Pro-6.local>
Co-authored-by: Angel Conde <neuw84@gmail.com>
Co-authored-by: Angel Conde Manjon <acmanjon@amazon.com>
Co-authored-by: FocusComputing <xiaoxingstack@gmail.com>
Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>
Co-authored-by: eric9204 <90449228+eric9204@users.noreply.github.com>
Co-authored-by: dongsj <dongsj@asiainfo.com>
Co-authored-by: Volodymyr Burenin <vburenin@gmail.com>
Co-authored-by: Volodymyr Burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: luokey <loukey.j@gmail.com>
Co-authored-by: Sylwester Lachiewicz <slachiewicz@apache.org>
Co-authored-by: 苏承祥 <scx_white@aliyun.com>
Co-authored-by: 苏承祥 <sucx@tuya.com>
Co-authored-by: 5herhom <543872547@qq.com>
Co-authored-by: Jon Vexler <jon@onehouse.ai>
Co-authored-by: simonsssu <barley0806@gmail.com>
Co-authored-by: y0908105023 <283999377@qq.com>
Co-authored-by: yangshuo3 <yangshuo3@kingsoft.com>
Co-authored-by: Paul Zhang <xzhangyao@126.com>
Co-authored-by: Kyle Zhike Chen <zk.chan007@gmail.com>
Co-authored-by: dohongdayi <dohongdayi@126.com>
Co-authored-by: RexAn <bonean131@gmail.com>
Co-authored-by: ForwardXu <forwardxu315@gmail.com>
Co-authored-by: wangxianghu <wangxianghu@apache.org>
Co-authored-by: wulei <wulei.1023@bytedance.com>
Co-authored-by: Xingjun Wang <wongxingjun@126.com>
Co-authored-by: Prasanna Rajaperumal <prasanna.raj@live.com>
Co-authored-by: xingjunwang <xingjunwang@tencent.com>
Co-authored-by: liujinhui <965147871@qq.com>
Co-authored-by: ChanKyeong Won <brightwon.dev@gmail.com>
Co-authored-by: Forus <70357858+Forus0322@users.noreply.github.com>
Co-authored-by: hj2016 <hj3245459@163.com>
Co-authored-by: huangjing02 <huangjing02@bilibili.com>
Co-authored-by: jsbali <jsbali@uber.com>
Co-authored-by: Leon Tsao <31072303+gnailJC@users.noreply.github.com>
Co-authored-by: leon <leon@leondeMacBook-Pro.local>
Co-authored-by: 申胜利 <48829688+shenshengli@users.noreply.github.com>
Co-authored-by: aiden.dong <782112163@qq.com>
Co-authored-by: dujunling <dujunling@bytedance.com>
Co-authored-by: Pramod Biligiri <pramodbiligiri@gmail.com>
Co-authored-by: Zouxxyy <zouxinyu.zxy@alibaba-inc.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Co-authored-by: Surya Prasanna <syalla@uber.com>
Co-authored-by: Rajesh Mahindra <76502047+rmahindra123@users.noreply.github.com>
Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: huberylee <shibei.lh@foxmail.com>
vinishjail97 added a commit to vinishjail97/hudi that referenced this pull request Dec 15, 2023
* [DOCS] Fix Slack invite link in README.md (apache#6648)

* [HUDI-3558] Consistent bucket index: bucket resizing (split&merge) & concurrent write during resizing (apache#4958)

RFC-42 implementation
- Implement bucket resizing for consistent hashing index.
- Support concurrent write during bucket resizing.

This change added tests and can be verified as follows:
- The test of the consistent bucket index is enhanced to include the case of bucket resizing.
- Tests of different bucket resizing cases.
- Tests of concurrent resizing, and concurrent writes during resizing.
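
The core property that makes resizing safe — splitting or merging a bucket only remaps keys in one arc of the hash ring — can be sketched with a toy consistent-hash lookup (illustrative only; the actual consistent bucket index keeps per-partition hashing metadata):

```python
import hashlib

RING_SIZE = 2 ** 32

def _pos(s):
    # deterministic position on the ring (md5 is a toy choice here)
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

def bucket_of(key, buckets):
    """Route key to the first bucket clockwise on the ring (toy model)."""
    ring = sorted((_pos(b), b) for b in buckets)
    h = _pos(key)
    for pos, b in ring:
        if h <= pos:
            return b
    return ring[0][1]  # wrap around to the smallest position
```

Adding a bucket (a split) only moves the keys that now land on the new bucket; every other key keeps its previous assignment, which is what allows concurrent writes during resizing.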

* [MINOR] Add dev setup and spark 3.3 profile to readme (apache#6656)

* [HUDI-4831] Fix AWSDmsAvroPayload#getInsertValue,combineAndGetUpdateValue to invoke correct api (apache#6637)

Co-authored-by: Rahil Chertara <rchertar@amazon.com>

* [HUDI-4806] Use Avro version from the root pom for Flink bundle (apache#6628)

Co-authored-by: Shawn Chang <yxchang@amazon.com>

* [HUDI-4833] Add Postgres Schema Name to Postgres Debezium Source (apache#6616)

* [HUDI-4825] Remove redundant fields in serialized commit metadata in JSON (apache#6646)

* [MINOR] Insert should call validateInsertSchema in HoodieFlinkWriteClient (apache#5919)

Co-authored-by: 徐帅 <xushuai@MacBook-Pro-6.local>

* [HUDI-3879] Suppress exceptions that are not fatal in HoodieMetadataTableValidator (apache#5344)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-3998] Fix getCommitsSinceLastCleaning failed when async cleaning (apache#5478)

- The last completed commit timestamp is used to calculate how many commits have completed since the last clean. We might need to save this with the clean plan so that the next time clean is triggered, the calculation can start from there.

* [HUDI-3994] - Added support for initializing DeltaStreamer without a defined Spark Master (apache#5630)

This enables using DeltaStreamer in environments such
as AWS Glue or other serverless environments where the Spark master is
inherited and we do not have access to it.

Co-authored-by: Angel Conde Manjon <acmanjon@amazon.com>

* [HUDI-4628] Hudi-flink support GLOBAL_BLOOM,GLOBAL_SIMPLE,BUCKET index type (apache#6406)

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4814] Schedules new clustering plan based on latest clustering instant (apache#6574)

* Keep a clustering running at the same time
* Simplify filtering logic

Co-authored-by: dongsj <dongsj@asiainfo.com>

* [HUDI-4817] Delete markers after full-record bootstrap operation (apache#6667)

* [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module (apache#6550)

As part of adding support for Spark 3.3 in Hudi 0.12, a lot of the logic 
from Spark 3.2 module has been simply copied over.

This PR is rectifying that by:
1. Creating new module "hudi-spark3.2plus-common" 
    (that is shared across Spark 3.2 and Spark 3.3)
2. Moving shared components under "hudi-spark3.2plus-common"

* [HUDI-4752] Add dedup support for MOR table in cli (apache#6608)

* [HUDI-4837] Stop sleeping where it is not necessary after the success (apache#6270)

Co-authored-by: Volodymyr Burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4843] Delete the useless timer in BaseRollbackActionExecutor (apache#6671)

Co-authored-by: 吴文池 <wuwenchi@deepexi.com>

* [HUDI-4780] hoodie.logfile.max.size does not take effect, causing the log file to be too large (apache#6602)

* hoodie.logfile.max.size does not take effect, causing the log file to be too large

Co-authored-by: 854194341@qq.com <loukey_7821>

* [HUDI-4844] Skip partition value resolving when the field does not exists for MergeOnReadInputFormat#getReader (apache#6678)

* [MINOR] Fix the Spark job status description for metadata-only bootstrap operation (apache#6666)

* [HUDI-3403] Ensure keygen props are set for bootstrap (apache#6645)

* [HUDI-4193] Upgrade Protobuf to 3.21.5 (apache#5784)

* [HUDI-4785] Fix partition discovery in bootstrap operation (apache#6673)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type (apache#6486)

InternalSchemaChangeApplier#applyAddChange forgets to remove the parent name when calling ColumnAddChange#addColumns

* [HUDI-4851] Fixing CSI not handling `InSet` operator properly (apache#6685)

* [HUDI-4796] MetricsReporter stop bug (apache#6619)

* [HUDI-3861] update tblp 'path' when rename table (apache#5320)

* [HUDI-4853] Get field by name for OverwriteNonDefaultsWithLatestAvroPayload to avoid schema mismatch (apache#6689)

* [HUDI-4813] Fix infer keygen not work in sparksql side issue (apache#6634)

* [HUDI-4813] Fix infer keygen not work in sparksql side issue

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4856] Missing option for HoodieCatalogFactory (apache#6693)

* [HUDI-4864] Fix AWSDmsAvroPayload#combineAndGetUpdateValue when using MOR snapshot query after delete operations with test (apache#6688)

Co-authored-by: Rahil Chertara <rchertar@amazon.com>

* [HUDI-4841] Fix sort idempotency issue (apache#6669)

* [HUDI-4865] Optimize HoodieAvroUtils#isMetadataField to use O(1) complexity (apache#6702)

* [HUDI-4736] Fix inflight clean action preventing clean service to continue when multiple cleans are not allowed (apache#6536)

* [HUDI-4842] Support compaction strategy based on delta log file num (apache#6670)

Co-authored-by: 苏承祥 <sucx@tuya.com>

* [HUDI-4282] Repair IOException in CHDFS when check block corrupted in HoodieLogFileReader (apache#6031)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4757] Create pyspark examples (apache#6672)

* [HUDI-3959] Rename class name for spark rdd reader (apache#5409)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4828] Fix the extraction of record keys which may be cut out (apache#6650)

Co-authored-by: yangshuo3 <yangshuo3@kingsoft.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4873] Report number of messages to be processed via metrics (apache#6271)

Co-authored-by: Volodymyr Burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

as https://hudi.apache.org/blog/2021/08/23/s3-events-source

* [HUDI-4957] Shade JOL in bundles to fix NoClassDefFoundError:GraphLayout (apache#6839)

* [HUDI-4718] Add Kerberos kdestroy command support (apache#6810)

* [HUDI-4916] Implement change log feed for Flink (apache#6840)

* [HUDI-4769] Option read.streaming.skip_compaction skips delta commit (apache#6848)

* [HUDI-4949] optimize cdc read to avoid the problem of reusing buffer underlying the Row (apache#6805)

* [HUDI-4966] Add a partition extractor to handle partition values with slashes (apache#6851)

* [MINOR] Fix testUpdateRejectForClustering (apache#6852)

* [HUDI-4962] Move cloud dependencies to cloud modules (apache#6846)

* [HOTFIX] Fix source release validate script (apache#6865)

* [HUDI-4980] Calculate avg record size using commit only (apache#6864)

Calculate the average record size for the Spark upsert partitioner 
based on commit instants only. Previously it was based on both 
commit and replacecommit instants; the latter may be created by 
clustering, which produces inaccurately small average record 
sizes and could result in OOM due to size underestimation.
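The heuristic described above amounts to something like the following sketch. It is an illustration with made-up instant records, not the actual Spark upsert partitioner code; the `avg_record_size` helper and the field names are hypothetical.

```python
# Illustrative sketch: estimate average record size from commit instants only,
# skipping replacecommit instants (clustering output skews sizes low).

def avg_record_size(instants):
    total_bytes, total_records = 0, 0
    for inst in instants:
        if inst["action"] != "commit":   # ignore replacecommit etc.
            continue
        total_bytes += inst["bytes_written"]
        total_records += inst["records_written"]
    return total_bytes // total_records if total_records else None

timeline = [
    {"action": "commit", "bytes_written": 10_000_000, "records_written": 10_000},
    # Clustering rewrites many small records into few large files; including
    # it would drag the average down and underestimate record size.
    {"action": "replacecommit", "bytes_written": 1_000_000, "records_written": 50_000},
]
size = avg_record_size(timeline)
```

Here the estimate stays at 1000 bytes per record instead of being pulled down by the replacecommit.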

* shade protobuf dependency

* Revert "[HUDI-4915] improve avro serializer/deserializer (apache#6788)" (apache#6809)

This reverts commit 79b3e2b.

* [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create (apache#6857)

* Enhancing README for multi-writer tests (apache#6870)

* [MINOR] Fix deploy script for flink 1.15 (apache#6872)

* [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (apache#6883)

* Revert "shade protobuf dependency"

This reverts commit f03f961.

* [HUDI-4972] Fixes to make unit tests work on m1 mac (apache#6751)

* [HUDI-2786] Docker demo on mac aarch64 (apache#6859)

* [HUDI-4971] Fix shading kryo-shaded with reusing configs (apache#6873)

* [HUDI-3900] [UBER] Support log compaction action for MOR tables (apache#5958)

- Adding log compaction support to MOR tables. Subsequent log blocks can now be compacted into larger log blocks without needing a full compaction (merging with the base file).
- A new timeline action is introduced for the purpose.

Co-authored-by: sivabalan <n.siva.b@gmail.com>

* Relocate apache http package (apache#6874)

* [HUDI-4975] Fix datahub bundle dependency (apache#6896)

* [HUDI-4999] Refactor FlinkOptions#allOptions and CatalogOptions#allOptions (apache#6901)

* [MINOR] Update GitHub setting for merge button (apache#6922)

Only allow squash and merge. Disable merge and rebase

* [HUDI-4993] Make DataPlatform name and Dataset env configurable in DatahubSyncTool (apache#6885)

* [MINOR] Fix name spelling for RunBootstrapProcedure

* [HUDI-4754] Add compliance check in github actions (apache#6575)

* [HUDI-4963] Extend InProcessLockProvider to support multiple table ingestion (apache#6847)


Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

* [HUDI-4994] Fix bug that prevents re-ingestion of soft-deleted Datahub entities (apache#6886)

* Implement Create/Drop/Show/Refresh Secondary Index (apache#5933)

* [MINOR] Moved readme from  .github to the workflows folder (apache#6932)

* [HUDI-4952] Fixing reading from metadata table when there are no inflight commits (apache#6836)

* Fixing reading from metadata table when there are no inflight commits
* Fixing reading from metadata if not fully built out
* addressing minor comments
* fixing sql conf and options interplay
* addressing minor refactoring

* [HUDI-1575][RFC-56] Early Conflict Detection For Multi-writer (apache#6003)

Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net>
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-5006] Use the same wrapper for timestamp type metadata for parquet and log files (apache#6918)

Before this patch, for the timestamp type, we used LongWrapper for parquet and TimestampMicrosWrapper for avro logs.
They may keep values at different precisions: for example, with timestamp(3), LongWrapper keeps the value as milliseconds
since the epoch, while TimestampMicrosWrapper keeps it as microseconds.

Spark uses microseconds internally for timestamp values, while Flink uses TimestampData internally;
we had better keep the same precision for better compatibility.
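To see why a shared precision matters, consider the same instant encoded both ways. This is a standalone illustration; the LongWrapper / TimestampMicrosWrapper behavior is paraphrased from the description above, not taken from Hudi's code.

```python
# The same timestamp(3) instant encoded as epoch-milliseconds (LongWrapper-style)
# and epoch-microseconds (TimestampMicrosWrapper-style).
from datetime import datetime, timezone

ts = datetime(2022, 10, 1, 12, 30, 45, 123000, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
delta = ts - epoch

millis = delta.days * 86_400_000 + delta.seconds * 1_000 + delta.microseconds // 1_000
micros = delta.days * 86_400_000_000 + delta.seconds * 1_000_000 + delta.microseconds

# The same instant yields two numeric values a factor of 1000 apart; comparing
# them directly as longs (e.g. merging parquet stats with log-file stats) is
# wrong unless both sides agree on the unit.
assert micros == millis * 1000
```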

* [HUDI-5016] Flink clustering does not reserve commit metadata (apache#6929)

* [HUDI-3900] Fixing hdfs setup and tear down in tests to avoid flakiness (apache#6912)

* [HUDI-5002] Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement (apache#6909)

Co-authored-by: slfan1989 <louj1988@@>

* [HUDI-5010] Fix flink hive catalog external config not work (apache#6923)

* fix flink catalog external config not work

* [HUDI-4948] Improve CDC Write (apache#6818)

* improve cdc write to support multiple log files
* update: use map to store the cdc stats

* [HUDI-5030] Fix TestPartialUpdateAvroPayload.testUseLatestRecordMetaValue(apache#6948)

* [HUDI-5033] Fix Broken Link In MultipleSparkJobExecutionStrategy (apache#6951)

Co-authored-by: slfan1989 <louj1988@@>

* [HUDI-5037] Upgrade org.apache.thrift:libthrift to 0.14.0 (apache#6941)

* [MINOR] Fixing verbosity of docker set up (apache#6944)

* [HUDI-5022] Make better error messages for pr compliance (apache#6934)

* [HUDI-5003] Fix the type of InLineFileSystem`startOffset to long (apache#6916)

* [HUDI-4855] Add missing table configs for bootstrap in Deltastreamer (apache#6694)

* [MINOR] Handling null event time (apache#6876)

* [MINOR] Update DOAP with 0.12.1 Release (apache#6988)

* [MINOR] Increase maxParameters size in scalastyle (apache#6987)

* [HUDI-3900] Closing resources in TestHoodieLogRecord (apache#6995)

* [MINOR] Test case for hoodie.merge.allow.duplicate.on.inserts (apache#6949)

* [HUDI-4982] Add validation job for spark bundles in GitHub Actions (apache#6954)

* [HUDI-5041] Fix lock metric register confict error (apache#6968)

Co-authored-by: hbg <bingeng.huang@shopee.com>

* [HUDI-4998] Infer partition extractor class first from meta sync partition fields (apache#6899)

* [HUDI-4781] Allow omit metadata fields for hive sync (apache#6471)


Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4997] Use jackson-v2 import instead of jackson-v1 (apache#6893)



Co-authored-by: slfan1989 <louj1988@@>

* [HUDI-3900] Fixing tempDir usage in TestHoodieLogFormat (apache#6981)

* [HUDI-4995] Relocate httpcomponents (apache#6906)

* [MINOR] Update GitHub setting for branch protection (apache#7008)

- require at least 1 approving review

* [HUDI-4960] Upgrade jetty version for timeline server (apache#6844)

Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-5046] Support all the hive sync options for flink sql (apache#6985)

* [MINOR] fix cdc flake ut (apache#7016)

* [MINOR] Remove redundant space in PR compliance check (apache#7022)

* [HUDI-5063] Enabling run time stats to be serialized with commit metadata (apache#7006)

* [HUDI-5070] Adding lock provider to testCleaner tests since async cleaning is invoked (apache#7023)

* [HUDI-5070] Move flaky cleaner tests to separate class (apache#7034)

* [HUDI-4971] Remove direct use of kryo from `SerDeUtils` (apache#7014)



Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>

* [HUDI-5081] Tests clean up in hudi-utilities (apache#7033)

* [HUDI-5027] Replace hardcoded hbase config keys with constant variables  (apache#6946)

* [MINOR] add commit_action output in show_commits (apache#7012)

Co-authored-by: 苏承祥 <sucx@tuya.com>

* [HUDI-5061] bulk insert operation don't throw other exception except IOE Exception (apache#7001)

Co-authored-by: liufangqi.chenfeng <liufangqi.chenfeng@BYTEDANCE.COM>

* [MINOR] Skip loading last completed txn for single writer (apache#6660)


Co-authored-by: sivabalan <n.siva.b@gmail.com>

* [HUDI-4281] Using hudi to build a large number of tables in spark on hive causes OOM (apache#5903)

* [HUDI-5042] Fix clustering schedule problem in flink when enable schedule clustering and disable async clustering (apache#6976)

Co-authored-by: hbg <bingeng.huang@shopee.com>

* [HUDI-4753] more accurate record size estimation for log writing and spillable map (apache#6632)

* [HUDI-4201] Cli tool to get warned about empty non-completed instants from timeline (apache#6867)

* [HUDI-5038] Increase default num_instants to fetch for incremental source (apache#6955)

* [HUDI-5049] Supports dropPartition for Flink catalog (apache#6991)

* for both dfs and hms catalogs

* [HUDI-4809] glue support drop partitions (apache#7007)

Co-authored-by: xxhua <xxhua@freewheel.tv>

* [HUDI-5057] Fix msck repair hudi table (apache#6999)

* [HUDI-4959] Fixing Avro's `Utf8` serialization in Kryo (apache#7024)

* temp_view_support (apache#6990)

Co-authored-by: 苏承祥 <sucx@tuya.com>

* [HUDI-4982] Add Utilities and Utilities Slim + Spark Bundle testing to GH Actions (apache#7005)


Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-5085]When a flink job has multiple sink tables, the index loading status is abnormal (apache#7051)

* [HUDI-5089] Refactor HoodieCommitMetadata deserialization (apache#7055)

* [HUDI-5058] Fix flink catalog read spark table error : primary key col can not be nullable (apache#7009)

* [HUDI-5087] Fix incorrect merging sequence for Column Stats Record in `HoodieMetadataPayload` (apache#7053)

* [HUDI-5087] Fix incorrect maxValue read from the metadata table

* Fixed `HoodieMetadataPayload` merging seq;
Added test

* Fixing handling of deletes;
Added tests for handling deletes;

* Added tests for combining partition files-list record

Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>

* [HUDI-4946] fix merge into with no preCombineField having dup row by only insert (apache#6824)

* [HUDI-5072] Extract `ExecutionStrategy#transform` duplicate code (apache#7030)

* [HUDI-3287] Remove hudi-spark dependencies from hudi-kafka-connect-bundle (apache#6079)

* [HUDI-5000] Support schema evolution for Hive/presto (apache#6989)

Co-authored-by: z00484332 <zhaolong36@huawei.com>

* [HUDI-4716] Avoid parquet-hadoop-bundle in hudi-hadoop-mr (apache#6930)

* [HUDI-5035] Remove usage of deprecated HoodieTimer constructor (apache#6952)

Co-authored-by: slfan1989 <louj1988@@>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-5083]Fixed a bug when schema evolution (apache#7045)

* [HUDI-5102] source operator(monitor and reader) support user uid  (apache#7085)

* Update HoodieTableSource.java

Co-authored-by: chenzhiming <chenzhm@chinatelecom.cn>

* [HUDI-5057] Fix msck repair external hudi table (apache#7084)

* [MINOR] Fix typos in Spark client related classes (apache#7083)

* [HUDI-4741] hotfix to avoid partial failover cause restored subtask timeout (apache#6796)

Co-authored-by: jian.feng <jian.feng@shopee.com>

* [MINOR] use default maven version since it already fix the warnings recently (apache#6863)

Co-authored-by: jian.feng <jian.feng@shopee.com>

* Revert "[HUDI-4741] hotfix to avoid partial failover cause restored subtask timeout (apache#6796)" (apache#7090)

This reverts commit e222693.

* [MINOR] Fix doc of org.apache.hudi.sink.meta.CkpMetadata#bootstrap (apache#7048)

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4799] improve analyzer exception tip when cannot resolve expression (apache#6625)

* [HUDI-5096] Upgrade jcommander to 1.78 (apache#7068)

- resolves security vulnerability
- resolves NPE issues with HiveSyncTool args parsing

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-5105] Add Call show_commit_extra_metadata for spark sql (apache#7091)

* [HUDI-5105] Add Call show_commit_extra_metadata for spark sql

* remove pr compliance from open source

* fix test issues

* fix bad merge files

* ignoring Spark3DDL tests, as they are failing in OSS master too against spark3.2, scala2.12

* remove flakey test case

* Update HoodieMultiTableCommitStatsManager when creating job info (apache#122)

* Update HoodieMultiTableCommitStatsManager when creating job info

* Tidying up

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: Yuwei XIAO <ywxiaozero@gmail.com>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Rahil C <32500120+rahil-c@users.noreply.github.com>
Co-authored-by: Rahil Chertara <rchertar@amazon.com>
Co-authored-by: Shawn Chang <42792772+CTTY@users.noreply.github.com>
Co-authored-by: Shawn Chang <yxchang@amazon.com>
Co-authored-by: Abhishek Modi <modi@makenotion.com>
Co-authored-by: shuai.xu <chiggics@gmail.com>
Co-authored-by: 徐帅 <xushuai@MacBook-Pro-6.local>
Co-authored-by: YueZhang <69956021+zhangyue19921010@users.noreply.github.com>
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: 董可伦 <dongkelun01@inspur.com>
Co-authored-by: Angel Conde <neuw84@gmail.com>
Co-authored-by: Angel Conde Manjon <acmanjon@amazon.com>
Co-authored-by: FocusComputing <xiaoxingstack@gmail.com>
Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>
Co-authored-by: eric9204 <90449228+eric9204@users.noreply.github.com>
Co-authored-by: dongsj <dongsj@asiainfo.com>
Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>
Co-authored-by: Manu <36392121+xicm@users.noreply.github.com>
Co-authored-by: Volodymyr Burenin <vburenin@gmail.com>
Co-authored-by: Volodymyr Burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
Co-authored-by: 吴文池 <wuwenchi@deepexi.com>
Co-authored-by: luokey <loukey.j@gmail.com>
Co-authored-by: Danny Chan <yuzhao.cyz@gmail.com>
Co-authored-by: Sylwester Lachiewicz <slachiewicz@apache.org>
Co-authored-by: komao <masterwangzx@gmail.com>
Co-authored-by: KnightChess <981159963@qq.com>
Co-authored-by: voonhous <voonhousu@gmail.com>
Co-authored-by: 苏承祥 <scx_white@aliyun.com>
Co-authored-by: 苏承祥 <sucx@tuya.com>
Co-authored-by: 5herhom <543872547@qq.com>
Co-authored-by: Jon Vexler <jon@onehouse.ai>
Co-authored-by: simonsssu <barley0806@gmail.com>
Co-authored-by: y0908105023 <283999377@qq.com>
Co-authored-by: yangshuo3 <yangshuo3@kingsoft.com>
Co-authored-by: 冯健 <fengjian428@gmail.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
Co-authored-by: Paul Zhang <xzhangyao@126.com>
Co-authored-by: Kyle Zhike Chen <zk.chan007@gmail.com>
Co-authored-by: Yann Byron <biyan900116@gmail.com>
Co-authored-by: Shiyan Xu <2701446+xushiyan@users.noreply.github.com>
Co-authored-by: dohongdayi <dohongdayi@126.com>
Co-authored-by: shaoxiong.zhan <31836510+microbearz@users.noreply.github.com>
Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
Co-authored-by: Nicolas Paris <nicolas.paris@riseup.net>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: RexAn <bonean131@gmail.com>
Co-authored-by: ForwardXu <forwardxu315@gmail.com>
Co-authored-by: wangxianghu <wangxianghu@apache.org>
Co-authored-by: wulei <wulei.1023@bytedance.com>
Co-authored-by: Xingjun Wang <wongxingjun@126.com>
Co-authored-by: Prasanna Rajaperumal <prasanna.raj@live.com>
Co-authored-by: xingjunwang <xingjunwang@tencent.com>
Co-authored-by: liujinhui <965147871@qq.com>
Co-authored-by: ChanKyeong Won <brightwon.dev@gmail.com>
Co-authored-by: Zouxxyy <zouxxyy@qq.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Co-authored-by: Forus <70357858+Forus0322@users.noreply.github.com>
Co-authored-by: TengHuo <teng_huo@outlook.com>
Co-authored-by: hj2016 <hj3245459@163.com>
Co-authored-by: huangjing02 <huangjing02@bilibili.com>
Co-authored-by: jsbali <jsbali@uber.com>
Co-authored-by: Leon Tsao <31072303+gnailJC@users.noreply.github.com>
Co-authored-by: leon <leon@leondeMacBook-Pro.local>
Co-authored-by: 申胜利 <48829688+shenshengli@users.noreply.github.com>
Co-authored-by: aiden.dong <782112163@qq.com>
Co-authored-by: dujunling <dujunling@bytedance.com>
Co-authored-by: Pramod Biligiri <pramodbiligiri@gmail.com>
Co-authored-by: Zouxxyy <zouxinyu.zxy@alibaba-inc.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Co-authored-by: Surya Prasanna <syalla@uber.com>
Co-authored-by: Rajesh Mahindra <76502047+rmahindra123@users.noreply.github.com>
Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: huberylee <shibei.lh@foxmail.com>
Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net>
Co-authored-by: slfan1989 <55643692+slfan1989@users.noreply.github.com>
Co-authored-by: slfan1989 <louj1988@@>
Co-authored-by: 吴祥平 <408317717@qq.com>
Co-authored-by: wangzeyu <hameizi369@gmail.com>
Co-authored-by: vvsd <40269480+vvsd@users.noreply.github.com>
Co-authored-by: Zhaojing Yu <yuzhaojing@bytedance.com>
Co-authored-by: Bingeng Huang <304979636@qq.com>
Co-authored-by: hbg <bingeng.huang@shopee.com>
Co-authored-by: that's cool <1059023054@qq.com>
Co-authored-by: liufangqi.chenfeng <liufangqi.chenfeng@BYTEDANCE.COM>
Co-authored-by: gavin <zhangrenhuaman@163.com>
Co-authored-by: Jon Vexler <jbvexler@gmail.com>
Co-authored-by: Xixi Hua <smilecrazy1h@gmail.com>
Co-authored-by: xxhua <xxhua@freewheel.tv>
Co-authored-by: YangXiao <919869387@qq.com>
Co-authored-by: chao chen <59957056+waywtdcc@users.noreply.github.com>
Co-authored-by: Zhangshunyu <zhangshunyu1990@126.com>
Co-authored-by: Long Zhao <294514940@qq.com>
Co-authored-by: z00484332 <zhaolong36@huawei.com>
Co-authored-by: 矛始 <1032851561@qq.com>
Co-authored-by: chenzhiming <chenzhm@chinatelecom.cn>
Co-authored-by: lvhu-goodluck <81349721+lvhu-goodluck@users.noreply.github.com>
Co-authored-by: harshal patil <harshal.j.patil@gmail.com>
Co-authored-by: Vinish Reddy <vinishreddypannala@infinilake.com>
vinishjail97 pushed a commit to vinishjail97/hudi that referenced this pull request Dec 15, 2023
* [HUDI-4282] Repair IOException in CHDFS when check block corrupted in HoodieLogFileReader (apache#6031)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4757] Create pyspark examples (apache#6672)

* [HUDI-3959] Rename class name for spark rdd reader (apache#5409)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4828] Fix the extraction of record keys which may be cut out (apache#6650)

Co-authored-by: yangshuo3 <yangshuo3@kingsoft.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4873] Report number of messages to be processed via metrics (apache#6271)

Co-authored-by: Volodymyr Burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4870] Improve compaction config description (apache#6706)

* [HUDI-3304] Support partial update payload (apache#4676)


Co-authored-by: jian.feng <jian.feng@shopee.com>

* [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in lo… (apache#6630)

* [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in log file issue

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4485] Bump spring shell to 2.1.1 in CLI (apache#6489)

Bumped spring shell to 2.1.1 and updated the default 
value for show fsview all `pathRegex` parameter.

* [minor] following 3304, some code refactoring (apache#6713)

* [HUDI-4832] Fix drop partition meta sync (apache#6662)

* [HUDI-4810] Fix log4j imports to use bridge API  (apache#6710)


Co-authored-by: dongsj <dongsj@asiainfo.com>

* [HUDI-4877] Fix org.apache.hudi.index.bucket.TestHoodieSimpleBucketIndex#testTagLocation not work correct issue (apache#6717)


Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4326] add updateTableSerDeInfo for HiveSyncTool (apache#5920)

- This pull request fixes [SUPPORT] Hudi spark datasource error after migrating from 0.8 to 0.11 apache#5861
- The issue is caused by the table SerDeInfo going missing after changing the table to a spark data source table.

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* [MINOR] fix indent to make build pass (apache#6721)

* [HUDI-3478] Implement CDC Write in Spark (apache#6697)

* [HUDI-4326] Fix hive sync serde properties (apache#6722)

* [HUDI-4875] Fix NoSuchTableException when dropping temporary view after applied HoodieSparkSessionExtension in Spark 3.2 (apache#6709)

* [DOCS] Improve the quick start guide for Kafka Connect Sink (apache#6708)

* [HUDI-4729] Fix file group pending compaction cannot be queried when query _ro table (apache#6516)

File groups in pending compaction could not be queried 
when querying the _ro table with Spark. This commit fixes that.

Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* [HUDI-3983] Fix ClassNotFoundException when using hudi-spark-bundle to write table with hbase index (apache#6715)

* [HUDI-4758] Add validations to java spark examples (apache#6615)

* [HUDI-4792] Batch clean files to delete (apache#6580)

This patch uses a batch call to get the file groups to delete during cleaning, instead of one call per partition.
This limits the number of calls to the view and should fix the trouble with the metadata table when there are a lot of partitions.
Fixes issue apache#6373

Co-authored-by: sivabalan <n.siva.b@gmail.com>

* [HUDI-4363] Support Clustering row writer to improve performance (apache#6046)

* [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data (apache#6734)

* [HUDI-4851] Fixing handling of `UTF8String` w/in `InSet` operator (apache#6739)


Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-3901] Correct the description of hoodie.index.type (apache#6749)

* [MINOR] Add .mvn directory to gitignore (apache#6746)

Co-authored-by: Rahil Chertara <rchertar@amazon.com>

* add support for unraveling proto schemas

* fix some compile issues

* [HUDI-4901] Add avro.version to Flink profiles (apache#6757)

* Add avro.version to Flink profiles

Co-authored-by: Shawn Chang <yxchang@amazon.com>

* [HUDI-4559] Support hiveSync command based on Call Produce Command (apache#6322)

* [HUDI-4883] Supporting delete savepoint for MOR (apache#6744)

Users can now delete unnecessary savepoints 
and unblock archival for MOR tables.

* [HUDI-4897] Refactor the merge handle in CDC mode (apache#6740)

* [HUDI-3523] Introduce AddColumnSchemaPostProcessor to support add columns to the end of a schema (apache#5031)

* Revert "[HUDI-3523] Introduce AddColumnSchemaPostProcessor to support add columns to the end of a schema (apache#5031)" (apache#6768)

This reverts commit 092375f.

* [HUDI-3523] Introduce AddPrimitiveColumnSchemaPostProcessor to support add new primitive column to the end of a schema (apache#6769)

* [HUDI-4903] Fix TestHoodieLogFormat`s minor typo (apache#6762)

* [MINOR] Drastically reducing concurrency level (to avoid CI flakiness) (apache#6754)

* Update HoodieIndex.java

Fix a typo

* [HUDI-4906] Fix the local tests for hudi-flink (apache#6763)

* [HUDI-4899] Fixing compatibility w/ Spark 3.2.2 (apache#6755)

* [HUDI-4892] Fix hudi-spark3-bundle (apache#6735)

* [MINOR] Fix a few typos in HoodieIndex (apache#6784)

Co-authored-by: xingjunwang <xingjunwang@tencent.com>

* [HUDI-4412] Fix multi writer INSERT_OVERWRITE NPE bug (apache#6130)

There are two minor issues fixed here:

1. When the insert_overwrite operation is performed, the 
    clusteringPlan in the requestedReplaceMetadata will be 
    null. Calling getFileIdsFromRequestedReplaceMetadata will cause an NPE.

2. When performing an insert_overwrite operation with inflightCommitMetadata != null, 
    getOperationType should be obtained from getHoodieInflightReplaceMetadata; 
    the original code would hit a null pointer.

* [MINOR] retain avro's namespace (apache#6783)

* [MINOR] Simple logging fix in LockManager (apache#6765)

Co-authored-by: 苏承祥 <sucx@tuya.com>

* [HUDI-4433] hudi-cli repair deduplicate not working with non-partitioned dataset (apache#6349)

When using the repair deduplicate command with hudi-cli, 
there was no way to run it on an unpartitioned dataset, 
so this modifies the CLI parameter.

Co-authored-by: Xingjun Wang <wongxingjun@126.com>

* [RFC-51][HUDI-3478] Update RFC: CDC support (apache#6256)

* [HUDI-4915] improve avro serializer/deserializer (apache#6788)

* [HUDI-3478] Implement CDC Read in Spark (apache#6727)

* naming and style updates

* [HUDI-4830] Fix testNoGlobalConfFileConfigured when add hudi-defaults.conf in default dir (apache#6652)

* make test data random, reuse code

* [HUDI-4760] Fixing repeated trigger of data file creations w/ clustering (apache#6561)

- In clustering, data file creations were triggered twice: since we don't cache the write status, calling isEmpty on the JavaRDD for validation ended up re-triggering the action. This patch fixes the double de-referencing.

* [HUDI-4914] Managed memory weight should be set when sort clustering is enabled (apache#6792)

* [HUDI-4910] Fix unknown variable or type "Cast" (apache#6778)

* [HUDI-4918] Fix bugs about when trying to show the non -existing key from env, NullPointException occurs. (apache#6794)

* [HUDI-4718] Add Kerberos kinit command support. (apache#6719)

* add test for 2 different recursion depths, fix schema cache key

* add unsigned long support

* better handle other types

* rebase on 4904

* get all tests working

* fix oneof expected schema, update tests after rebase

* [HUDI-4902] Set default partitioner for SIMPLE BUCKET index (apache#6759)

* [MINOR] Update PR template with documentation update (apache#6748)

* revert scala binary change

* try a different method to avoid avro version

* [HUDI-4904] Add support for unraveling proto schemas in ProtoClassBasedSchemaProvider (apache#6761)

If a user provides a recursive proto schema, it will fail when we write to parquet. We need to allow the user to specify how many levels of recursion they want before truncating the remaining data.

Main changes to existing code:

ProtoClassBasedSchemaProvider tracks the number of times a message descriptor is seen within a branch of the schema traversal.
Once the number of times that descriptor is seen exceeds the user-provided limit, the field is set to a preset record containing two fields: 1) the remaining data serialized as a proto byte array, 2) the descriptor's full name, for context about what is in that byte array.
Converting from a proto to an avro now accounts for this truncation of the input.

* delete unused file

* [HUDI-4907] Prevent single commit multi instant issue (apache#6766)


Co-authored-by: TengHuo <teng_huo@outlook.com>
Co-authored-by: yuzhao.cyz <yuzhao.cyz@gmail.com>

* [HUDI-4923] Fix flaky TestHoodieReadClient.testReadFilterExistAfterBulkInsertPrepped (apache#6801)



Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4848] Fixing repair deprecated partition tool (apache#6731)

* [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS (apache#6785)

* address PR feedback, update decimal precision

* fix isNullable issue, check if class is Int64value

* checkstyle fix

* change wrapper descriptor set initialization

* add in testing for unsigned long to BigInteger conversion

* [HUDI-4453] Fix schema to include partition columns in bootstrap operation (apache#6676)

Turn off the type inference of the partition column to be consistent with 
existing behavior. Add notes around partition column type inference.

* [HUDI-2780] Fix the issue of Mor log skipping complete blocks when reading data (apache#4015)


Co-authored-by: huangjing02 <huangjing02@bilibili.com>
Co-authored-by: sivabalan <n.siva.b@gmail.com>

* [HUDI-4924] Auto-tune dedup parallelism (apache#6802)

* [HUDI-4687] Avoid setAccessible which breaks strong encapsulation (apache#6657)

Use JOL GraphLayout for estimating deep size.

* [MINOR] fixing validate async operations to poll completed clean instances (apache#6814)

* [HUDI-4734] Deltastreamer table config change validation (apache#6753)


Co-authored-by: sivabalan <n.siva.b@gmail.com>

* [HUDI-4934] Revert batch clean files (apache#6813)

* Revert "[HUDI-4792] Batch clean files to delete (apache#6580)"
This reverts commit cbf9b83.

* [HUDI-4722] Added locking metrics for Hudi (apache#6502)

* [HUDI-4936] Fix `as.of.instant` not recognized as hoodie config (apache#5616)


Co-authored-by: leon <leon@leondeMacBook-Pro.local>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4861] Relaxing `MERGE INTO` constraints to permit limited casting operations w/in matched-on conditions (apache#6820)

* [HUDI-4885] Adding org.apache.avro to hudi-hive-sync bundle (apache#6729)

* [HUDI-4951] Fix incorrect use of Long.getLong() (apache#6828)

* [MINOR] Use base path URI in ITTestDataStreamWrite (apache#6826)

* [HUDI-4308] READ_OPTIMIZED read mode will temporary loss of data when compaction (apache#6664)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-4237] Fixing empty partition-values being sync'd to HMS (apache#6821)

Co-authored-by: dujunling <dujunling@bytedance.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4925] Should Force to use ExpressionPayload in MergeIntoTableCommand (apache#6355)


Co-authored-by: jian.feng <jian.feng@shopee.com>

* [HUDI-4850] Add incremental source from GCS to Hudi (apache#6665)

Adds an incremental source from GCS based on a similar design 
as https://hudi.apache.org/blog/2021/08/23/s3-events-source

* [HUDI-4957] Shade JOL in bundles to fix NoClassDefFoundError:GraphLayout (apache#6839)

* [HUDI-4718] Add Kerberos kdestroy command support (apache#6810)

* [HUDI-4916] Implement change log feed for Flink (apache#6840)

* [HUDI-4769] Option read.streaming.skip_compaction skips delta commit (apache#6848)

* [HUDI-4949] optimize cdc read to avoid the problem of reusing buffer underlying the Row (apache#6805)

* [HUDI-4966] Add a partition extractor to handle partition values with slashes (apache#6851)

* [MINOR] Fix testUpdateRejectForClustering (apache#6852)

* [HUDI-4962] Move cloud dependencies to cloud modules (apache#6846)

* [HOTFIX] Fix source release validate script (apache#6865)

* [HUDI-4980] Calculate avg record size using commit only (apache#6864)

Calculate the average record size for the Spark upsert
partitioner based on commit instants only. Previously it
was based on both commit and replacecommit instants; the
latter may be created by clustering, which reports
inaccurately small average record sizes and could result
in OOM due to size underestimation.
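The sizing logic described above can be sketched as follows; the function name and instant fields are illustrative, not Hudi's actual API:

```python
# Illustrative sketch (not Hudi's real implementation): estimate the average
# record size from timeline instants, skipping "replacecommit" instants
# produced by clustering, whose per-record sizes are misleadingly small.

def avg_record_size(instants, default_size=1024):
    """instants: list of dicts with 'action', 'total_bytes', 'total_records'."""
    total_bytes = 0
    total_records = 0
    for instant in instants:
        if instant["action"] != "commit":  # ignore replacecommit etc.
            continue
        total_bytes += instant["total_bytes"]
        total_records += instant["total_records"]
    if total_records == 0:
        return default_size  # fall back when no usable commits exist
    return total_bytes // total_records

timeline = [
    {"action": "commit", "total_bytes": 10_000_000, "total_records": 10_000},
    {"action": "replacecommit", "total_bytes": 1_000_000, "total_records": 10_000},
]
print(avg_record_size(timeline))  # only the commit is counted -> 1000
```

Counting the replacecommit here would halve the estimate and over-pack files, which is exactly the underestimation the fix avoids.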

* shade protobuf dependency

* Revert "[HUDI-4915] improve avro serializer/deserializer (apache#6788)" (apache#6809)

This reverts commit 79b3e2b.

* [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create (apache#6857)

* Enhancing README for multi-writer tests (apache#6870)

* [MINOR] Fix deploy script for flink 1.15 (apache#6872)

* [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (apache#6883)

* Revert "shade protobuf dependency"

This reverts commit f03f961.

* [HUDI-4972] Fixes to make unit tests work on m1 mac (apache#6751)

* [HUDI-2786] Docker demo on mac aarch64 (apache#6859)

* [HUDI-4971] Fix shading kryo-shaded with reusing configs (apache#6873)

* [HUDI-3900] [UBER] Support log compaction action for MOR tables (apache#5958)

- Adds log compaction support for MOR tables. Subsequent log blocks can now be compacted into larger log blocks without a full compaction (i.e., without merging with the base file).
- A new timeline action is introduced for this purpose.

Co-authored-by: sivabalan <n.siva.b@gmail.com>

* Relocate apache http package (apache#6874)

* [HUDI-4975] Fix datahub bundle dependency (apache#6896)

* [HUDI-4999] Refactor FlinkOptions#allOptions and CatalogOptions#allOptions (apache#6901)

* [MINOR] Update GitHub setting for merge button (apache#6922)

Only allow squash and merge. Disable merge and rebase

* [HUDI-4993] Make DataPlatform name and Dataset env configurable in DatahubSyncTool (apache#6885)

* [MINOR] Fix name spelling for RunBootstrapProcedure

* [HUDI-4754] Add compliance check in github actions (apache#6575)

* [HUDI-4963] Extend InProcessLockProvider to support multiple table ingestion (apache#6847)


Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

* [HUDI-4994] Fix bug that prevents re-ingestion of soft-deleted Datahub entities (apache#6886)

* Implement Create/Drop/Show/Refresh Secondary Index (apache#5933)

* [MINOR] Moved readme from  .github to the workflows folder (apache#6932)

* [HUDI-4952] Fixing reading from metadata table when there are no inflight commits (apache#6836)

* Fixing reading from metadata table when there are no inflight commits
* Fixing reading from metadata if not fully built out
* addressing minor comments
* fixing sql conf and options interplay
* addressing minor refactoring

* [HUDI-1575][RFC-56] Early Conflict Detection For Multi-writer (apache#6003)

Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net>
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-5006] Use the same wrapper for timestamp type metadata for parquet and log files (apache#6918)

Before this patch, for the timestamp type we used LongWrapper for parquet and TimestampMicrosWrapper for avro log files.
They may keep values at different precisions: for example, with timestamp(3), LongWrapper keeps the value as milliseconds since the epoch,
while TimestampMicrosWrapper keeps it as microseconds.

Spark internally uses microseconds for timestamp values, while Flink uses TimestampData;
keeping the same precision in both wrappers gives better compatibility.
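A tiny illustration of the precision mismatch described above (plain Python, not Hudi code): the same instant encoded as epoch milliseconds and epoch microseconds differs by a factor of 1000, so both wrappers must agree on the unit or readers will disagree on the value.

```python
# The same timestamp(3) instant, encoded two ways: epoch-millis (what a
# LongWrapper-style field would hold) vs epoch-micros (TimestampMicrosWrapper
# style). Mixing the two without converting corrupts the value by 1000x.
from datetime import datetime, timezone

ts = datetime(2022, 10, 1, 12, 0, 0, 123000, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
delta = ts - epoch

millis = delta.days * 86_400_000 + delta.seconds * 1_000 + delta.microseconds // 1_000
micros = delta.days * 86_400_000_000 + delta.seconds * 1_000_000 + delta.microseconds

# Same instant, two incompatible encodings unless both sides agree:
assert micros == millis * 1000
print(millis, micros)
```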

* [HUDI-5016] Flink clustering does not reserve commit metadata (apache#6929)

* [HUDI-3900] Fixing hdfs setup and tear down in tests to avoid flakiness (apache#6912)

* [HUDI-5002] Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement (apache#6909)

Co-authored-by: slfan1989 <louj1988@@>

* [HUDI-5010] Fix flink hive catalog external config not work (apache#6923)

* Fix flink catalog external config not working

* [HUDI-4948] Improve CDC Write (apache#6818)

* improve cdc write to support multiple log files
* update: use map to store the cdc stats

* [HUDI-5030] Fix TestPartialUpdateAvroPayload.testUseLatestRecordMetaValue (apache#6948)

* [HUDI-5033] Fix Broken Link In MultipleSparkJobExecutionStrategy (apache#6951)

Co-authored-by: slfan1989 <louj1988@@>

* [HUDI-5037] Upgrade org.apache.thrift:libthrift to 0.14.0 (apache#6941)

* [MINOR] Fixing verbosity of docker set up (apache#6944)

* [HUDI-5022] Make better error messages for pr compliance (apache#6934)

* [HUDI-5003] Fix the type of InLineFileSystem`startOffset to long (apache#6916)

* [HUDI-4855] Add missing table configs for bootstrap in Deltastreamer (apache#6694)

* [MINOR] Handling null event time (apache#6876)

* [MINOR] Update DOAP with 0.12.1 Release (apache#6988)

* [MINOR] Increase maxParameters size in scalastyle (apache#6987)

* [HUDI-3900] Closing resources in TestHoodieLogRecord (apache#6995)

* [MINOR] Test case for hoodie.merge.allow.duplicate.on.inserts (apache#6949)

* [HUDI-4982] Add validation job for spark bundles in GitHub Actions (apache#6954)

* [HUDI-5041] Fix lock metric register conflict error (apache#6968)

Co-authored-by: hbg <bingeng.huang@shopee.com>

* [HUDI-4998] Infer partition extractor class first from meta sync partition fields (apache#6899)

* [HUDI-4781] Allow omit metadata fields for hive sync (apache#6471)


Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-4997] Use jackson-v2 import instead of jackson-v1 (apache#6893)



Co-authored-by: slfan1989 <louj1988@@>

* [HUDI-3900] Fixing tempDir usage in TestHoodieLogFormat (apache#6981)

* [HUDI-4995] Relocate httpcomponents (apache#6906)

* [MINOR] Update GitHub setting for branch protection (apache#7008)

- require at least 1 approving review

* [HUDI-4960] Upgrade jetty version for timeline server (apache#6844)

Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-5046] Support all the hive sync options for flink sql (apache#6985)

* [MINOR] fix cdc flake ut (apache#7016)

* [MINOR] Remove redundant space in PR compliance check (apache#7022)

* [HUDI-5063] Enabling run time stats to be serialized with commit metadata (apache#7006)

* [HUDI-5070] Adding lock provider to testCleaner tests since async cleaning is invoked (apache#7023)

* [HUDI-5070] Move flaky cleaner tests to separate class (apache#7034)

* [HUDI-4971] Remove direct use of kryo from `SerDeUtils` (apache#7014)



Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>

* [HUDI-5081] Tests clean up in hudi-utilities (apache#7033)

* [HUDI-5027] Replace hardcoded hbase config keys with constant variables  (apache#6946)

* [MINOR] add commit_action output in show_commits (apache#7012)

Co-authored-by: 苏承祥 <sucx@tuya.com>

* [HUDI-5061] bulk insert operation don't throw other exception except IOE Exception (apache#7001)

Co-authored-by: liufangqi.chenfeng <liufangqi.chenfeng@BYTEDANCE.COM>

* [MINOR] Skip loading last completed txn for single writer (apache#6660)


Co-authored-by: sivabalan <n.siva.b@gmail.com>

* [HUDI-4281] Using hudi to build a large number of tables in spark on hive causes OOM (apache#5903)

* [HUDI-5042] Fix clustering schedule problem in flink when enable schedule clustering and disable async clustering (apache#6976)

Co-authored-by: hbg <bingeng.huang@shopee.com>

* [HUDI-4753] more accurate record size estimation for log writing and spillable map (apache#6632)

* [HUDI-4201] Cli tool to get warned about empty non-completed instants from timeline (apache#6867)

* [HUDI-5038] Increase default num_instants to fetch for incremental source (apache#6955)

* [HUDI-5049] Supports dropPartition for Flink catalog (apache#6991)

* for both dfs and hms catalogs

* [HUDI-4809] glue support drop partitions (apache#7007)

Co-authored-by: xxhua <xxhua@freewheel.tv>

* [HUDI-5057] Fix msck repair hudi table (apache#6999)

* [HUDI-4959] Fixing Avro's `Utf8` serialization in Kryo (apache#7024)

* temp_view_support (apache#6990)

Co-authored-by: 苏承祥 <sucx@tuya.com>

* [HUDI-4982] Add Utilities and Utilities Slim + Spark Bundle testing to GH Actions (apache#7005)


Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-5085] When a flink job has multiple sink tables, the index loading status is abnormal (apache#7051)

* [HUDI-5089] Refactor HoodieCommitMetadata deserialization (apache#7055)

* [HUDI-5058] Fix flink catalog read spark table error : primary key col can not be nullable (apache#7009)

* [HUDI-5087] Fix incorrect merging sequence for Column Stats Record in `HoodieMetadataPayload` (apache#7053)

* [HUDI-5087] Fix incorrect maxValue fetched from the metadata table

* Fixed `HoodieMetadataPayload` merging seq;
Added test

* Fixing handling of deletes;
Added tests for handling deletes;

* Added tests for combining partition files-list record

Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>

* [HUDI-4946] fix merge into with no preCombineField having dup row by only insert (apache#6824)

* [HUDI-5072] Extract `ExecutionStrategy#transform` duplicate code (apache#7030)

* [HUDI-3287] Remove hudi-spark dependencies from hudi-kafka-connect-bundle (apache#6079)

* [HUDI-5000] Support schema evolution for Hive/presto (apache#6989)

Co-authored-by: z00484332 <zhaolong36@huawei.com>

* [HUDI-4716] Avoid parquet-hadoop-bundle in hudi-hadoop-mr (apache#6930)

* [HUDI-5035] Remove usage of deprecated HoodieTimer constructor (apache#6952)

Co-authored-by: slfan1989 <louj1988@@>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-5083] Fixed a bug when schema evolution (apache#7045)

* [HUDI-5102] source operator(monitor and reader) support user uid  (apache#7085)

* Update HoodieTableSource.java

Co-authored-by: chenzhiming <chenzhm@chinatelecom.cn>

* [HUDI-5057] Fix msck repair external hudi table (apache#7084)

* [MINOR] Fix typos in Spark client related classes (apache#7083)

* [HUDI-4741] hotfix to avoid partial failover cause restored subtask timeout (apache#6796)

Co-authored-by: jian.feng <jian.feng@shopee.com>

* [MINOR] use default maven version since it already fix the warnings recently (apache#6863)

Co-authored-by: jian.feng <jian.feng@shopee.com>

* Revert "[HUDI-4741] hotfix to avoid partial failover cause restored subtask timeout (apache#6796)" (apache#7090)

This reverts commit e222693.

* [MINOR] Fix doc of org.apache.hudi.sink.meta.CkpMetadata#bootstrap (apache#7048)

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [HUDI-4799] improve analyzer exception tip when cannot resolve expression (apache#6625)

* [HUDI-5096] Upgrade jcommander to 1.78 (apache#7068)

- resolves security vulnerability
- resolves NPE issues with HiveSyncTool args parsing

Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-5105] Add Call show_commit_extra_metadata for spark sql (apache#7091)

* [HUDI-5105] Add Call show_commit_extra_metadata for spark sql

* [HUDI-5107] Fix hadoop config in DirectWriteMarkers, HoodieFlinkEngineContext and StreamerUtil are not consistent issue (apache#7094)

Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>

* [MINOR] Fix OverwriteWithLatestAvroPayload full class name (apache#7096)

* [HUDI-5074] Warn if table for metastore sync has capitals in it (apache#7077)


Co-authored-by: Jonathan Vexler <=>

* [HUDI-5124] Fix HoodieInternalRowFileWriter#canWrite error return tag. (apache#7107)

Co-authored-by: slfan1989 <louj1988@@>

* [MINOR] update commons-codec:commons-codec 1.4 to 1.13 (apache#6959)

* [HUDI-5148] Claim RFC-63 for Index on Function and Logical Partitioning (apache#7114)

* [HUDI-5065] Call close on SparkRDDWriteClient in HoodieCleaner (apache#7101)



Co-authored-by: Jonathan Vexler <=>

* [HUDI-4624] Implement Closable for S3EventsSource (apache#7086)


Co-authored-by: Jonathan Vexler <=>

* [HUDI-5045] Adding support to configure index type with integ tests (apache#6982)

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

* [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency  (apache#5416)

https://issues.apache.org/jira/browse/HUDI-3963
RFC design : apache#5567

Adds a lock-free executor to improve Hudi write throughput and execution efficiency.
Disruptor introduction: https://lmax-exchange.github.io/disruptor/user-guide/index.html#_introduction. The existing BoundedInMemory executor remains the default; users can enable the Disruptor-based one as needed.
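For intuition, here is a minimal single-producer/single-consumer ring buffer in the Disruptor spirit (pre-allocated slots, sequence counters, no per-element locking). It is an illustrative sketch only, not the actual Disruptor or Hudi executor code, which additionally relies on memory barriers and configurable wait strategies.

```python
# Minimal SPSC ring buffer sketch: a power-of-two capacity lets the slot index
# be computed with a bitmask, and the head/tail sequence counters replace
# per-element locks for a single producer and single consumer.

class SpscRingBuffer:
    def __init__(self, capacity):
        assert capacity > 0 and (capacity & (capacity - 1)) == 0, "power of two"
        self.buf = [None] * capacity
        self.mask = capacity - 1
        self.head = 0  # next slot to read  (consumer-owned)
        self.tail = 0  # next slot to write (producer-owned)

    def offer(self, item):
        if self.tail - self.head == len(self.buf):
            return False  # full: producer must wait or retry
        self.buf[self.tail & self.mask] = item
        self.tail += 1
        return True

    def poll(self):
        if self.head == self.tail:
            return None  # empty: consumer must wait or retry
        item = self.buf[self.head & self.mask]
        self.head += 1
        return item

rb = SpscRingBuffer(4)
for i in range(4):
    rb.offer(i)
assert not rb.offer(99)          # full
assert [rb.poll() for _ in range(4)] == [0, 1, 2, 3]
assert rb.poll() is None         # empty
```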

Co-authored-by: yuezhang <yuezhang@freewheel.tv>

* [HUDI-5076] Fixing non serializable path used in engineContext with metadata table intialization (apache#7036)

* [HUDI-5032] Add archive to cli (apache#7076)

Adds archiving capability to the CLI.

Co-authored-by: Jonathan Vexler <=>

* [HUDI-4880] Fix corrupted parquet file issue left over by cancelled compaction task (apache#6733)

* [HUDI-5147] Flink data skipping doesn't work when HepPlanner calls copy()… (apache#7113)

* [HUDI-5147] Flink data skipping doesn't work when HepPlanner calls copy() on HoodieTableSource

* [MINOR] Fixing broken test (apache#7123)

* [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table (apache#6741)

* [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

* Update HiveAvroSerializer.java otherwise payload string type combine field will cause cast exception

* [HUDI-5126] Delete duplicate configuration items PAYLOAD_CLASS_NAME (apache#7103)

* [HUDI-4989] Fixing deltastreamer init failures (apache#6862)

Fixes handling of a missing hoodie.properties file

* [MINOR] Fix flaky test in ITTestHoodieDataSource (apache#7134)

* [HUDI-4071] Remove default value for mandatory record key field (apache#6681)

* [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table (apache#7056)

* sync `_hoodie_operation` meta field if changelog mode is enabled.

* [MINOR] Removing spark2 scala12 combinations from readme (apache#7112)

* [HUDI-5153] Fix the write token name resolution of cdc log file (apache#7128)

* [HUDI-5066] Support flink hoodie source metaclient cache (apache#7017)

* [HUDI-5132] Add hadoop-mr bundle validation (apache#7157)

* [HUDI-2673] Add kafka connect bundle to validation test (apache#7131)

* [HUDI-5082] Improve the cdc log file name format (apache#7042)

* [HUDI-5154] Improve hudi-spark-client Lambda writing (apache#7127)

Co-authored-by: slfan1989 <louj1988@@>

* [HUDI-5178] Add Call show_table_properties for spark sql (apache#7161)

* [HUDI-5067] Merge the columns stats of multiple log blocks from the same log file (apache#7018)

* [HUDI-5025] Rollback failed with log file not found when rollOver in rollback process (apache#6939)

* fix rollback file not found

* [HUDI-4526] Improve spillableMapBasePath when disk directory is full (apache#6284)

* [minor] Refactor the code for CkpMetadata (apache#7166)

* [HUDI-5111] Improve integration test coverage (apache#7092)



Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

* [HUDI-5187] Remove the preCondition check of BucketAssigner assign state (apache#7170)

* [HUDI-5145] Avoid starting HDFS in hudi-utilities tests (apache#7171)

* [MINOR] Performance improvement of flink ITs with reused miniCluster (apache#7151)

* implement MiniCluster extension compatible with junit5

* Make local build work

* Delete files removed in OSS

* Fix bug in testing

* Upgrade to version release-v0.10.0

Co-authored-by: 5herhom <543872547@qq.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: Jon Vexler <jon@onehouse.ai>
Co-authored-by: simonsssu <barley0806@gmail.com>
Co-authored-by: y0908105023 <283999377@qq.com>
Co-authored-by: yangshuo3 <yangshuo3@kingsoft.com>
Co-authored-by: Volodymyr Burenin <vburenin@gmail.com>
Co-authored-by: Volodymyr Burenin <volodymyr.burenin@cloudkitchens.com>
Co-authored-by: 冯健 <fengjian428@gmail.com>
Co-authored-by: jian.feng <jian.feng@shopee.com>
Co-authored-by: FocusComputing <xiaoxingstack@gmail.com>
Co-authored-by: xiaoxingstack <xiaoxingstack@didiglobal.com>
Co-authored-by: Paul Zhang <xzhangyao@126.com>
Co-authored-by: Danny Chan <yuzhao.cyz@gmail.com>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: eric9204 <90449228+eric9204@users.noreply.github.com>
Co-authored-by: dongsj <dongsj@asiainfo.com>
Co-authored-by: Kyle Zhike Chen <zk.chan007@gmail.com>
Co-authored-by: Yann Byron <biyan900116@gmail.com>
Co-authored-by: Shiyan Xu <2701446+xushiyan@users.noreply.github.com>
Co-authored-by: dohongdayi <dohongdayi@126.com>
Co-authored-by: shaoxiong.zhan <31836510+microbearz@users.noreply.github.com>
Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com>
Co-authored-by: Manu <36392121+xicm@users.noreply.github.com>
Co-authored-by: Nicolas Paris <nicolas.paris@riseup.net>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: RexAn <bonean131@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>
Co-authored-by: Rahil C <32500120+rahil-c@users.noreply.github.com>
Co-authored-by: Rahil Chertara <rchertar@amazon.com>
Co-authored-by: Timothy Brown <tim@onehouse.ai>
Co-authored-by: Shawn Chang <42792772+CTTY@users.noreply.github.com>
Co-authored-by: Shawn Chang <yxchang@amazon.com>
Co-authored-by: ForwardXu <forwardxu315@gmail.com>
Co-authored-by: wangxianghu <wangxianghu@apache.org>
Co-authored-by: wulei <wulei.1023@bytedance.com>
Co-authored-by: Xingjun Wang <wongxingjun@126.com>
Co-authored-by: Prasanna Rajaperumal <prasanna.raj@live.com>
Co-authored-by: xingjunwang <xingjunwang@tencent.com>
Co-authored-by: liujinhui <965147871@qq.com>
Co-authored-by: 苏承祥 <scx_white@aliyun.com>
Co-authored-by: 苏承祥 <sucx@tuya.com>
Co-authored-by: ChanKyeong Won <brightwon.dev@gmail.com>
Co-authored-by: Zouxxyy <zouxxyy@qq.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Co-authored-by: KnightChess <981159963@qq.com>
Co-authored-by: Forus <70357858+Forus0322@users.noreply.github.com>
Co-authored-by: voonhous <voonhousu@gmail.com>
Co-authored-by: TengHuo <teng_huo@outlook.com>
Co-authored-by: hj2016 <hj3245459@163.com>
Co-authored-by: huangjing02 <huangjing02@bilibili.com>
Co-authored-by: jsbali <jsbali@uber.com>
Co-authored-by: Leon Tsao <31072303+gnailJC@users.noreply.github.com>
Co-authored-by: leon <leon@leondeMacBook-Pro.local>
Co-authored-by: 申胜利 <48829688+shenshengli@users.noreply.github.com>
Co-authored-by: aiden.dong <782112163@qq.com>
Co-authored-by: dujunling <dujunling@bytedance.com>
Co-authored-by: Pramod Biligiri <pramodbiligiri@gmail.com>
Co-authored-by: Zouxxyy <zouxinyu.zxy@alibaba-inc.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Co-authored-by: Surya Prasanna <syalla@uber.com>
Co-authored-by: Rajesh Mahindra <76502047+rmahindra123@users.noreply.github.com>
Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>
Co-authored-by: huberylee <shibei.lh@foxmail.com>
Co-authored-by: YueZhang <69956021+zhangyue19921010@users.noreply.github.com>
Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net>
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: slfan1989 <55643692+slfan1989@users.noreply.github.com>
Co-authored-by: slfan1989 <louj1988@@>
Co-authored-by: 吴祥平 <408317717@qq.com>
Co-authored-by: wangzeyu <hameizi369@gmail.com>
Co-authored-by: vvsd <40269480+vvsd@users.noreply.github.com>
Co-authored-by: Zhaojing Yu <yuzhaojing@bytedance.com>
Co-authored-by: Bingeng Huang <304979636@qq.com>
Co-authored-by: hbg <bingeng.huang@shopee.com>
Co-authored-by: that's cool <1059023054@qq.com>
Co-authored-by: liufangqi.chenfeng <liufangqi.chenfeng@BYTEDANCE.COM>
Co-authored-by: Yuwei XIAO <ywxiaozero@gmail.com>
Co-authored-by: gavin <zhangrenhuaman@163.com>
Co-authored-by: Jon Vexler <jbvexler@gmail.com>
Co-authored-by: Xixi Hua <smilecrazy1h@gmail.com>
Co-authored-by: xxhua <xxhua@freewheel.tv>
Co-authored-by: YangXiao <919869387@qq.com>
Co-authored-by: chao chen <59957056+waywtdcc@users.noreply.github.com>
Co-authored-by: Zhangshunyu <zhangshunyu1990@126.com>
Co-authored-by: Long Zhao <294514940@qq.com>
Co-authored-by: z00484332 <zhaolong36@huawei.com>
Co-authored-by: 矛始 <1032851561@qq.com>
Co-authored-by: chenzhiming <chenzhm@chinatelecom.cn>
Co-authored-by: lvhu-goodluck <81349721+lvhu-goodluck@users.noreply.github.com>
Co-authored-by: alberic <cnuliuweiren@gmail.com>
Co-authored-by: lxxyyds <114218541+lxxawfl@users.noreply.github.com>
Co-authored-by: Alexander Trushev <42293632+trushev@users.noreply.github.com>
Co-authored-by: xiarixiaoyao <mengtao0326@qq.com>
Co-authored-by: windWheel <1817802738@qq.com>
Co-authored-by: Alexander Trushev <trushev.alex@gmail.com>
Co-authored-by: Shizhi Chen <107476116+chenshzh@users.noreply.github.com>
Labels: cli, dependencies, priority:critical