[HUDI-4132] Fixing determining target table schema for delta sync with empty batch #5648

nsivabalan · 2022-05-20T16:12:12Z

What is the purpose of the pull request

When DeltaStreamer pulls an empty batch from the source but still has to commit in order to move the checkpoint forward, it needs to read the latest schema from the target table. However, if the table itself was empty, this schema lookup ran into issues. This patch fixes that.
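The situation described above can be sketched as a small decision function. This is a hypothetical illustration, not DeltaStreamer code: `EmptyBatchCommitDecision`, `Action`, and `decide` are invented names, and checkpoints are modeled as plain strings. It only shows why a commit can be needed even when the batch carries no records, which is exactly when the write schema cannot come from the batch.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: decides whether a DeltaStreamer-style sync round must
 * commit even though the source returned no records. Names are illustrative,
 * not Hudi APIs.
 */
public class EmptyBatchCommitDecision {

  /** Possible outcomes of one sync round. */
  enum Action { SKIP, COMMIT_RECORDS, COMMIT_EMPTY_TO_ADVANCE_CHECKPOINT }

  static Action decide(String lastCheckpoint, String newCheckpoint, long recordCount) {
    // Source made no progress: nothing to write and no checkpoint to move.
    if (Objects.equals(lastCheckpoint, newCheckpoint)) {
      return Action.SKIP;
    }
    // Records arrived: a normal commit whose schema comes from the batch.
    if (recordCount > 0) {
      return Action.COMMIT_RECORDS;
    }
    // Empty batch but the checkpoint moved: an empty commit records the new
    // checkpoint, and its write schema has to come from somewhere else.
    return Action.COMMIT_EMPTY_TO_ADVANCE_CHECKPOINT;
  }

  public static void main(String[] args) {
    System.out.println(decide("offset-10", "offset-10", 0)); // prints SKIP
    System.out.println(decide("offset-10", "offset-20", 5)); // prints COMMIT_RECORDS
    System.out.println(decide("offset-10", "offset-20", 0)); // prints COMMIT_EMPTY_TO_ADVANCE_CHECKPOINT
  }
}
```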

Brief change log

Fixed DeltaStreamer to set the right schema in HoodieWriteConfig when an empty batch is committed. The fix fetches the table schema and sets it in the write config.
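A minimal sketch of the schema selection, assuming an empty batch is marked by a placeholder "null" schema and that the table schema is only read once the table has at least one completed commit. `WriteSchemaResolver`, `NULL_SCHEMA`, and `resolveWriteSchema` are hypothetical names; schemas are plain strings rather than Avro `Schema` objects, and this is not the code in the patch.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of choosing the write schema for a commit. All names are
 * illustrative; schemas are plain strings instead of Avro Schema objects.
 */
public class WriteSchemaResolver {

  /** Placeholder standing in for the schema an empty batch carries (assumption). */
  static final String NULL_SCHEMA = "null";

  /**
   * @param targetSchema      schema reported by the batch or schema provider
   * @param completedCommits  number of completed commits on the target table
   * @param latestTableSchema latest schema recorded in the table, if readable
   */
  static String resolveWriteSchema(String targetSchema, int completedCommits,
                                   Optional<String> latestTableSchema) {
    // Non-empty batch: its own schema is used as-is.
    if (!NULL_SCHEMA.equals(targetSchema)) {
      return targetSchema;
    }
    // Empty batch on a table that already has commits: reuse the table's latest schema.
    if (completedCommits > 0 && latestTableSchema.isPresent()) {
      return latestTableSchema.get();
    }
    // Empty batch on an empty table: there is no table schema to read yet,
    // so keep the target schema instead of failing the lookup.
    return targetSchema;
  }

  public static void main(String[] args) {
    System.out.println(resolveWriteSchema("{\"type\":\"record\"}", 3, Optional.of("table-v2"))); // prints {"type":"record"}
    System.out.println(resolveWriteSchema(NULL_SCHEMA, 3, Optional.of("table-v2")));             // prints table-v2
    System.out.println(resolveWriteSchema(NULL_SCHEMA, 0, Optional.empty()));                    // prints null
  }
}
```

Checking the commit count before touching the table schema keeps the empty-table case away from the lookup entirely, which is the case this PR targets.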

Verify this pull request

TestHoodieDeltaStreamer.testParquetDFSSourceForEmptyBatch

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

hudi-bot · 2022-05-23T17:39:11Z

CI report:

fa11e50 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure re-run the last Azure build


…che#37) * [MINOR] Update alter rename command class type for pattern matching (apache#5381) * [HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException (apache#5432) * Claim RFC 52 for Introduce Secondary Index to Improve HUDI Query Performance (apache#5441) * [HUDI-3945] After the async compaction operation is complete, the task should exit. (apache#5391) Co-authored-by: y00617041 <yangxuan42@huawei.com> * [HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (apache#5368) Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com> * [HUDI-3943] Some description fixes for 0.10.1 docs (apache#5447) * [MINOR] support different cleaning policy for flink (apache#5459) * [HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index (apache#5185) * fix duplicate fileId with bucket Index * replace to load FileGroup from FileSystemView * [MINOR] Fix CI by ignoring SparkContext error (apache#5468) Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers * [HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (apache#5308) Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-3978] Fix use of partition path field as hive partition field in flink (apache#5434) * Fix partition path fields as hive sync partition fields error * [MINOR] Update DOAP for release 0.11.0 (apache#5467) * [HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (apache#4563) * Add RFC doc Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> * Add note regarding catalog naming Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> * [MINOR] Update RFC status (apache#5486) * [HUDI-4005] Update release scripts to help validation (apache#5479) * [HUDI-4031] Avoid clustering update handling when no pending replacecommit (apache#5487) * [HUDI-3667] Run unit tests of hudi-integ-tests in CI (apache#5078) * [MINOR] Optimize code logic (apache#5499) * [HUDI-2875] Make 
HoodieParquetWriter Thread safe and memory executor exit gracefully (apache#4264) * [HUDI-4042] Support truncate-partition for Spark-3.2 (apache#5506) * [HUDI-4017] Improve spark sql coverage in CI (apache#5512) Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2. * [HUDI-3675] Adding post write termination strategy to deltastreamer continuous mode (apache#5073) - Added a postWriteTerminationStrategy to deltastreamer continuous mode. One can enable by setting the appropriate termination strategy using DeltastreamerConfig.postWriteTerminationStrategyClass. If not, continuous mode is expected to run forever. - Added one concrete impl for termination strategy as NoNewDataTerminationStrategy which shuts down deltastreamer if there is no new data to consume from source for N consecutive rounds. * [HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (apache#5287) * [MINOR] Fixing class not found when using flink and enable metadata table (apache#5527) * [MINOR] fixing flaky tests in deltastreamer tests (apache#5521) * [HUDI-4055]refactor ratelimiter to avoid stack overflow (apache#5530) * [MINOR] Fixing close for HoodieCatalog's test (apache#5531) * [MINOR] Fixing close for HoodieCatalog's test * [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOpti… (apache#5526) * [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-3995] Making perf optimizations for bulk insert row writer path (apache#5462) - Avoid using udf for key generator for SimpleKeyGen and NonPartitionedKeyGen. - Fixed NonPartitioned Key generator to directly fetch record key from row rather than involving GenericRecord. - Other minor fixes around using static values instead of looking up hashmap. 
* [HUDI-4044] When reading data from flink-hudi to external storage, the … (apache#5516) Co-authored-by: aliceyyan <aliceyyan@tencent.com> * [HUDI-4003] Try to read all the log file to parse schema (apache#5473) * [HUDI-4038] Avoid calling `getDataSize` after every record written (apache#5497) - getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed Column Groups in memory. Instead we can sample these calls to getDataSize to amortize its cost. Co-authored-by: sivabalan <n.siva.b@gmail.com> * [HUDI-4079] Supports showing table comment for hudi with spark3 (apache#5546) * [HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHoodieDeltaStreamer (apache#5559) * [HUDI-3963][Claim RFC number 53] Use Lock-Free Message Queue Improving Hoodie Writing Efficiency. (apache#5562) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete partition support to integ tests (apache#5501) - Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it. - Added delete_partition support to integ test framework using spark-datasource. - Added a single yaml to test all non core write operations (insert overwrite, insert overwrite table and delete partitions) - Added tests for 4 concurrent spark datasource writers (multi-writer tests). - Fixed readme w/ sample commands for multi-writer. 
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (apache#5528) * [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink * [MINOR] Fix a NPE for Option (apache#5461) * [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compact… (apache#5545) * [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files * [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (apache#5574) * [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink * [HUDI-4072] Fix NULL schema for empty batches in deltastreamer (apache#5543) * [HUDI-4097] add table info to jobStatus (apache#5529) Co-authored-by: wqwl611 <wqwl611@gmail.com> * [HUDI-3980] Suport kerberos hbase index (apache#5464) - Add configurations in HoodieHBaseIndexConfig.java to support kerberos hbase connection. Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-4001] Filter the properties should not be used when create table for Spark SQL (apache#5495) * fix hive sync no partition table error (apache#5585) * [HUDI-3123] consistent hashing index: basic write path (upsert/insert) (apache#4480) 1. basic write path(insert/upsert) implementation 2. adapt simple bucket index * [HUDI-4098] Metadata table heartbeat for instant has expired, last heartbeat 0 (apache#5583) * [HUDI-4103] [HUDI-4001] Filter the properties should not be used when create table for Spark SQL * [HUDI-3654] Preparations for hudi metastore. (apache#5572) * [HUDI-3654] Preparations for hudi metastore. 
Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com> * [HUDI-4104] DeltaWriteProfile includes the pending compaction file slice when deciding small buckets (apache#5594) * [HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion (apache#5590) * [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand (apache#5564) * [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand * Set hoodie.query.as.ro.table in serde properties * [HUDI-4110] Clean the marker files for flink compaction (apache#5604) * [MINOR] Fixing spark long running yaml for non-partitioned (apache#5607) * [minor] Some code refactoring for LogFileComparator and Instant instantiation (apache#5600) * [HUDI-4109] Copy the old record directly when it is chosen for merging (apache#5603) * Clean the marker files for flink compaction (apache#5611) Co-authored-by: 854194341@qq.com <loukey_7821> * [HUDI-3942] [RFC-50] Improve Timeline Server (apache#5392) * [HUDI-4111] Bump ANTLR runtime version in Spark 3.x (apache#5606) * Revert "[HUDI-3870] Add timeout rollback for flink online compaction (apache#5314)" (apache#5622) This reverts commit 6f9b02d. * [HUDI-4116] Unify clustering/compaction related procedures' output type (apache#5620) * Unify clustering/compaction related procedures' output type * Address review comments * [HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#initTable (apache#5617) No need to #sync actively because the table instance is instantiated freshly, its view manager has empty fiew instantces, the fs view would be synced lazily when is it requested. 
* [HUDI-4119] the first read result is incorrect when Flink upsert- Kafka connector is used in HUDi (apache#5626) * HUDI-4119 the first read result is incorrect when Flink upsert- Kafka connector is used in HUDi Co-authored-by: aliceyyan <aliceyyan@tencent.com> * [HUDI-4130] Remove the upgrade/downgrade for flink #initTable (apache#5642) * [HUDI-3985] Refactor DLASyncTool to support read hoodie table as spark datasource table (apache#5532) * [MINOR] Minor fixes to exception log and removing unwanted metrics flush in integ test (apache#5646) * [HUDI-4122] Fix NPE caused by adding kafka nodes (apache#5632) * [MINOR] remove unused gson test dependency (apache#5652) * [HUDI-3858] Shade javax.servlet for Spark bundle jar (apache#5295) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4100] CTAS failed to clean up when given an illegal MANAGED table definition (apache#5588) * [HUDI-3890] fix rat plugin issue with sql files (apache#5644) * [HUDI-4051] Allow nested field as primary key and preCombineField in spark sql (apache#5517) * [HUDI-4051] Allow nested field as preCombineField in spark sql * relax validation for primary key * [HUDI-4129] Initializes a new fs view for WriteProfile#reload (apache#5640) Co-authored-by: zhangyuang <zhangyuang@corp.netease.com> * [HUDI-4142] Claim RFC-54 for new table APIs (apache#5665) * [HUDI-3933] Add UT cases to cover different key gen (apache#5638) * [MINOR] Removing redundant semicolons and line breaks (apache#5662) * [HUDI-4134] Fix Method naming consistency issues in FSUtils (apache#5655) * [HUDI-4084] Add support to test async table services with integ test suite framework (apache#5557) * Add support to test async table services with integ test suite framework * Make await time for validation configurable * [HUDI-4138] Fix the concurrency modification of hoodie table config for flink (apache#5660) * Remove the metadata cleaning strategy for flink, that means the multi-modal index may be affected * Improve the 
HoodieTable#clearMetadataTablePartitionsConfig to only update table config when necessary * Remove the modification of read code path in HoodieTableConfig * [HUDI-2473] Fixing compaction write operation in commit metadata (apache#5203) * [HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (apache#5669) * [HUDI-4135] remove netty and netty-all (apache#5663) * [HUDI-2207] Support independent flink hudi clustering function * [HUDI-4132] Fixing determining target table schema for delta sync with empty batch (apache#5648) * [MINOR] Fix a potential NPE and some finer points of hudi cli (apache#5656) * [HUDI-4146] Claim RFC-55 for Improve Hive/Meta sync class design and hierachies (apache#5682) * [HUDI-3193] Decouple hudi-aws from hudi-client-common (apache#5666) Move HoodieMetricsCloudWatchConfig to hudi-client-common * [HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (part2) (apache#5676) * [HUDI-4040] Bulk insert Support CustomColumnsSortPartitioner with Row (apache#5502) * Along the lines of RDDCustomColumnsSortPartitioner but for Row * [HUDI-4023] Decouple hudi-spark from hudi-utilities-slim-bundle (apache#5641) * [HUDI-4124] Add valid check in Spark Datasource configs (apache#5637) Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com> * [HUDI-3963][RFC-53] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency (apache#5567) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4162] Fixed some constant mapping issues. 
(apache#5700) Co-authored-by: y00617041 <yangxuan42@huawei.com> * [HUDI-4161] Make sure partition values are taken from partition path (apache#5699) * [MINOR] Fix the issue when handling conf hoodie.datasource.write.operation=bulk_insert in sql mode (apache#5679) Co-authored-by: Rex An <bonean131@gmail.com> * [HUDI-4151] flink split_reader supports rocksdb (apache#5675) * [HUDI-4151] flink split_reader supports rocksdb * [HUDI-4160] Make database regex of MaxwellJsonKafkaSourcePostProcessor optional (apache#5697) * [MINOR] Fix Hive and meta sync config for sql statement (apache#5316) * [HUDI-4166] Added SimpleClient plugin for integ test (apache#5710) * [HUDI-3551] Add the Oracle Cloud Infrastructure (oci) Object Storage URI scheme (apache#4952) * [HUDI-3551] Fix testStorageSchemes for oci storage (apache#5711) * [HUDI-4086] Use CustomizedThreadFactory in async compaction and clustering (apache#5563) Co-authored-by: 苏承祥 <sucx@tuya.com> * [HUDI-4163] Catch general exception instead of IOException while fetching rollback plan during rollback (apache#5703) If the avro file is corrupted, an InvalidAvroMagicException throws. * [HUDI-4149] Drop-Table fails when underlying table directory is broken (apache#5672) * [HUDI-4107] Added --sync-tool-classes config option in HoodieMultiTableDeltaStreamer (apache#5597) * added --sync-tool-classes config option in multitable delta streamer * added a testcase to assert if syncClientToolClassNames is getting picked to the deltastreamer execution context * [HUDI-4174] Add hive conf dir option for flink sink (apache#5725) * [HUDI-4011] Add hudi-aws-bundle (apache#5674) Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com> * [HUDI-3670] free temp views in sql transformers (apache#5080) * [HUDI-4167] Remove the timeline refresh with initializing hoodie table (apache#5716) The timeline refresh on table initialization invokes the fs view #sync, which has two actions now: 1. 
reload the timeline of the fs view, so that the next fs view request is based on this timeline metadata 2. if this is a local fs view, clear all the local states; if this is a remote fs view, send request to sync the remote fs view But, let's see the construction, the meta client is instantiated freshly so the timeline is already the latest, the table is also constructed freshly, so the fs view has no local states, that means, the #sync is unnecessary totally. In this patch, the metadata lifecycle and data set fs view are kept in sync, when the fs view is refreshed, the underneath metadata is also refreshed synchronouly. The freshness of the metadata follows the same rules as data fs view: 1. if the fs view is local, the visibility is based on the client table metadata client's latest commit 2. if the fs view is remote, the timeline server would #sync the fs view and metadata together based on the lagging server local timeline From the perspective of client, no need to care about the refresh action anymore no matter whether the metadata table is enabled or not. That make the client logic more clear and less error-prone. Removes the timeline refresh has another benefit: if avoids unncecessary #refresh of the remote fs view, if all the clients send request to #sync the remote fs view, the server would encounter conflicts and the client encounters a response error. 
* [HUDI-4179] Cluster with sort cloumns invalid (apache#5739) * [HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (apache#5743) * [HUDI-4187] Fix partition order in aws glue sync (apache#5731) * [HUDI-4168] Add Call Procedure for marker deletion (apache#5738) * Add Call Procedure for marker deletion * [HUDI-4190] Include hbase-protocol for shading in the bundles (apache#5750) * [HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw NullPointerException (apache#5755) SeekTo top cells avoid NullPointerException * [HUDI-4188] Fix flaky ITTestDataSTreamWrite.testWriteCopyOnWrite (apache#5749) * [HUDI-4195] Bulk insert should use right keygen for non-partitioned table (apache#5759) * [HUDI-4101] When BucketIndexPartitioner take partition path for dispersion may cause the fileID of the task to not be loaded correctly (apache#5763) Co-authored-by: john.wick <john.wick@vipshop.com> * [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing (apache#5733) As has been outlined in HUDI-4176, we've hit a roadblock while testing Hudi on a large dataset (~1Tb) having pretty fat commits where Hudi's commit metadata could reach into 100s of Mbs. Given the size some of ours commit metadata instances Spark's parsing and resolving phase (when spark.sql(...) is involved, but before returned Dataset is dereferenced) starts to dominate some of our queries' execution time. 
- Rebased onto new APIs to avoid excessive Hadoop `Path` allocations - Eliminated hasOperationField completely to avoid repetitive computations - Cleaning up duplication in HoodieActiveTimeline - Added caching for common instances of HoodieCommitMetadata - Made tableStructSchema lazy; * [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys (apache#5664) Bulk insert row writer code path had a gap wrt hive style partitioning and default partition when virtual keys are enabled with SimpleKeyGen. This patch fixes the issue. * [HUDI-4197] Fix Async indexer to support building FILES partition (apache#5766) - When async indexer is invoked only with "FILES" partition, it fails. Fixing it to work with Async indexer. Also, if the metadata table itself is not initialized and someone is looking to build indexes via AsyncIndexer, they are expected to first index the "FILES" partition, followed by other partitions. In general, AsyncIndexer is limited to building one index at a time, hence guards have been added to ensure these conditions are met. * [HUDI-4171] Fixing Non partitioned with virtual keys in read path (apache#5747) - When Non partitioned key gen is used with virtual keys, read path could break since partition path may not exist. * [MINOR] Mark AWSGlueCatalogSyncClient experimental (apache#5775) * [MINOR][RFC-53] Fix typos (apache#5764) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4200] Fixing sorting of keys fetched from metadata table (apache#5773) - Keys fetched from the metadata table, especially from the base file reader, are not sorted, and hence may result in an NPE (key prefix search) or unnecessary seeks to the start of the HFile (full key lookups). This patch fixes that. This is not an issue with log blocks, since sorting is taken care of within HoodieHfileDataBlock. 
- Commit where the sorting was mistakenly reverted [HUDI-3760] Adding capability to fetch Metadata Records by prefix apache#5208 * [HUDI-4198] Fix hive config for AWSGlueClientFactory (apache#5768) * HiveConf needs to load fs conf to allow instantiation via AWSGlueClientFactory * Resolve metastore uri config before loading fs conf * Skip hiveql due to CI issue Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> * [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration (apache#5737) There are multiple issues with our current DataSource V2 integrations: because we advertise Hudi tables as V2, Spark expects them to implement certain APIs that are not implemented at the moment, so we instead use a custom resolution rule (in HoodieSpark3Analysis) to manually fall back to V1 APIs. This commit fixes the issue by reverting DSv2 APIs and making Spark use V1, except for schema evaluation logic. * [MINOR][DOCS] Update the README.md file in hudi-examples (apache#5803) * [MINOR] FlinkStateBackendConverter add more exception message (apache#5809) * [MINOR] FlinkStateBackendConverter add more exception message * [HUDI-4213] Infer keygen clazz for Spark SQL (apache#5815) * [HUDI-4139]improvement for flink write operator name to identify tables easily (apache#5744) Co-authored-by: yanenze <yanenze@keytop.com.cn> * [HUDI-3889] Do not validate table config if save mode is set to Overwrite (apache#5619) Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-4221] Fixing getAllPartitionPaths perf hit w/ FileSystemBackedMetadata (apache#5829) * [HUDI-4223] Fix NullPointerException from getLogRecordScanner when reading metadata table (apache#5840) When explicitly specifying the metadata table path for reading in Spark, "hoodie.metadata.enable" is overwritten to true for proper read behavior. 
* [HUDI-4205] Fix NullPointerException in HFile reader creation (apache#5841) Replace SerializableConfiguration with SerializableWritable for broadcasting the hadoop configuration before initializing HFile readers * [HUDI-4224] Fix CI issues (apache#5842) - Upgrade junit to 5.7.2 - Downgrade surefire and failsafe to 2.22.2 - Fix test failures that were previously not reported - Improve azure pipeline configs Co-authored-by: liujinhui1994 <965147871@qq.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com> * [MINOR] fix AvroSchemaConverter duplicate branch in 'switch' (apache#5813) * Strip extra spaces when creating new configuration (apache#5849) Co-authored-by: superche <superche@tencent.com> * [HUDI-3682] testReaderFilterRowKeys fails in TestHoodieOrcReaderWriter (apache#5790) testReaderFilterRowKeys needs to get the key from RECORD_KEY_METADATA_FIELD, but the writer in the current UT does not populate the meta field, and the schema does not contain meta fields. This fix writes data with a schema that contains meta fields and calls writeAvroWithMetadata for writing. 
Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-3863] Add UT for drop partition column in deltastreamer testsuite (apache#5727) * [HUDI-4006] failOnDataLoss on delta-streamer kafka sources (apache#5718) Add new config key hoodie.deltastreamer.source.kafka.enable.failOnDataLoss. When failOnDataLoss=false (current behaviour, the default), log a warning instead of silently seeking to the earliest offset; when failOnDataLoss is set, fail explicitly. * [HUDI-4207] HoodieFlinkWriteClient.getOrCreateWriteHandle throws an e… (apache#5788) Adding more logs to assist in debugging when HoodieFlinkWriteClient.getOrCreateWriteHandle throws an exception * [MINOR] Fix typo of DisruptorExecutor in RFC 53 (apache#5860) * [minor] Following HUDI-4207, remote the new wrapper #init method (apache#5865) * [HUDI-4255] Make the flink merge and replace handle intermediate file visible (apache#5866) * [HUDI-3499] Add Call Procedure for show rollbacks (apache#5848) * Add Call Procedure for show rollbacks * fix * add ut for show_rollback_detail and exception handle Co-authored-by: superche <superche@tencent.com> * [HUDI-4218] [HUDI-4218] Expose the real exception information when an exception occurs in the tableExists method (apache#5827) * [HUDI-4217] improve repeat init object in ExpressionPayload (apache#5825) * [HUDI-4214] improve repeat init write schema in ExpressionPayload (apache#5820) * [HUDI-4214] improve repeat init write schema in ExpressionPayload * [HUDI-4265] Deprecate useless targetTableName parameter in HoodieMultiTableDeltaStreamer (apache#5883) * [HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (apache#5761) * Support Create/Drop/Show/Refresh Index Syntax for Spark SQL * [HUDI-3507] Support export command based on Call Produce Command (apache#5901) * [HUDI-4275] Refactor rollback inflight instant for clustering/compaction to reuse some code (apache#5894) * [MINOR] Add "spillable_map_path" in FlinkCompactionConfig. 
To avoid the disk space of "/tmp" full when compacting offline. (apache#5905) * [HUDI-4277] supoort flink table source with computed column (apache#5897) Co-authored-by: chenshizhi <chenshizhi@bilibili.com> * fix remove redundant Variable (apache#5806) * [HUDI-4259] Flink create avro schema not conformance to standards (apache#5878) * flink create avro schema not conformance to standards Co-authored-by: 854194341@qq.com <loukey_7821> * [HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job (apache#5876) * [HUDI-4258] Fix when HoodieTable removes data file before the end of Flink job * [MINOR] Update DOAP with 0.11.1 Release (apache#5908) * [HUDI-4173] Fix wrong results if the user read no base files hudi table by glob paths (apache#5723) * [HUDI-4251] Fix the problem that the command 'commits sync' description does not match. (apache#5881) * [HUDI-4177] Fix hudi-cli rollback with rollbackUsingMarkers method call (apache#5734) * Fix hudi-cli rollback with rollbackUsingMarkers method call * Add test for hudi-cli rollbackUsingMarkers Co-authored-by: Shawn Chang <yxchang@amazon.com> * [HUDI-4270] Bootstrap op data loading missing (apache#5888) * [HUDI-3475] Initialize hudi table management module. * udate * Revert master (apache#5925) * Revert "udate" This reverts commit 092e35c. * Revert "[HUDI-3475] Initialize hudi table management module." This reverts commit 4640a3b. * [HUDI-4279] Strength the remote fs view lagging check when latest commit refresh is enabled (apache#5917) Signed-off-by: LinMingQiang <1356469429@qq.com> * [minor] following 4270, add unit tests for the keys lost case (apache#5918) * [HUDI-3508] Add call procedure for FileSystemViewCommand (apache#5929) * [HUDI-3508] Add call procedure for FileSystemView * minor Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com> * [HUDI-4299] Fix problem about hudi-example-java run failed on idea. 
(apache#5936) * [HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups (apache#5941) * [HUDI-4290] Fix fetchLatestBaseFiles to filter replaced filegroups * Separate out incremental sync fsview test with clustering * [HUDI-3509] Add call procedure for HoodieLogFileCommand (apache#5949) Co-authored-by: zhanshaoxiong <jiimmyzhan@tencent.com> * [HUDI-4273] Support inline schedule clustering for Flink stream (apache#5890) * [HUDI-4273] Support inline schedule clustering for Flink stream * delete deprecated clustering plan strategy and add clustering ITTest * [HUDI-3735] TestHoodieSparkMergeOnReadTableRollback is flaky (apache#5874) * [HUDI-4260] Change KEYGEN_CLASS_NAME without default value (apache#5877) * Change KEYGEN_CLASS_NAME without default value Co-authored-by: 854194341@qq.com <loukey_7821> * [HUDI-3512] Add call procedure for StatsCommand (apache#5955) Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com> * [TEST][DO_NOT_MERGE]fix random failed for ci (apache#5948) * Revert "[TEST][DO_NOT_MERGE]fix random failed for ci (apache#5948)" (apache#5971) This reverts commit e8fbd4d. 
* [HUDI-4319] Fixed Parquet's `PLAIN_DICTIONARY` encoding not being applied when bulk-inserting (apache#5966) * Fixed Dictionary encoding config not being properly propagated to Parquet writer (making it unable to apply it, substantially bloating the storage footprint) * [HUDI-4296] Fix the bug that TestHoodieSparkSqlWriter.testSchemaEvolutionForTableType is flaky (apache#5973) * [HUDI-3502] Support hdfs parquet import command based on Call Produce Command (apache#5956) * [MINOR] Remove -T option from CI build (apache#5972) * [HUDI-5246] Bumping mysql connector version due to security vulnerability (apache#5851) * [HUDI-4309] Spark3.2 custom parser should not throw exception (apache#5947) * [HUDI-4316] Support for spillable diskmap configuration when constructing HoodieMergedLogRecordScanner (apache#5959) * [HUDI-4315] Do not throw exception in BaseSpark3Adapter#toTableIdentifier (apache#5957) * [HUDI-3504] Support bootstrap command based on Call Produce Command (apache#5977) * [HUDI-4311] Fix Flink lose data on some rollback scene (apache#5950) * [HUDI-4291] Fix flaky TestCleanPlanExecutor#testKeepLatestFileVersions (apache#5930) * [HUDI-3506] Add call procedure for CommitsCommand (apache#5974) * [HUDI-3506] Add call procedure for CommitsCommand Co-authored-by: superche <superche@tencent.com> * [HUDI-4325] fix spark sql procedure cause ParseException with semicolon (apache#5982) * [HUDI-4325] fix saprk sql procedure cause ParseException with semicolon * [HUDI-4333] fix HoodieFileIndex's listFiles method log print skipping percent NaN (apache#5990) * [HUDI-4332] The current instant may be wrong under some extreme conditions in AppendWriteFunction. 
(apache#5988) * [HUDI-4320] Make sure `HoodieStorageConfig.PARQUET_WRITE_LEGACY_FORMAT_ENABLED` could be specified by the writer (apache#5970) Fixed sequence determining whether Parquet's legacy-format writing property should be overridden to only kick in when it has not been explicitly specified by the caller * [HUDI-1176] Upgrade hudi to log4j2 (apache#5366) * Move to log4j2 cr: https://code.amazon.com/reviews/CR-71010705 * Upgrade unit tests to log4j2 * update exclusion Co-authored-by: Brandon Scheller <bschelle@amazon.com> * [HUDI-4334] close SparkRDDWriteClient after usage in Create/Delete/RollbackSavepointsProcedure (apache#5994) * [HUDI-1575] Claim RFC-56: Early Conflict Detection For Multi-writer (apache#6002) Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net> * [MINOR] Make CLI 'commit rollback' using rollbackUsingMarkers false as default (apache#5174) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4331] Allow loading external config file from class loader (apache#5987) Co-authored-by: Wenning Ding <wenningd@amazon.com> * [HUDI-4336] Fix records overwritten bug with binary primary key (apache#5996) * [MINOR] Following apache#2070, Fix BindException when running tests on shared machines. 
(apache#5951) * [HUDI-4346] Fix params not update BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED (apache#5999) * [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeseria… (apache#5907) * [HUDI-4285] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer * add ut Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com> * [HUDI-3984] Remove mandatory check of partiton path for cli command (apache#5458) * [HUDI-3634] Could read empty or partial HoodieCommitMetaData in downstream if using HDFS (apache#5048) Add the differentiated logic of creating immutable file in HDFS by first creating the file.tmp and then renaming the file * [HUDI-3953]Flink Hudi module should support low-level source and sink api (apache#5445) Co-authored-by: jerryyue <jerryyue@didiglobal.com> * [HUDI-4353] Column stats data skipping for flink (apache#6026) * [HUDI-3505] Add call procedure for UpgradeOrDowngradeCommand (apache#6012) Co-authored-by: superche <superche@tencent.com> * [HUDI-3730] Improve meta sync class design and hierarchies (apache#5854) * [HUDI-3730] Improve meta sync class design and hierarchies (apache#5754) * Implements class design proposed in RFC-55 Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com> * [HUDI-3511] Add call procedure for MetadataCommand (apache#6018) * [HUDI-3730] Add ConfigTool#toMap UT (apache#6035) Co-authored-by: voonhou.su <voonhou.su@shopee.com> * [MINOR] Improve variable names (apache#6039) * [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job. 
(apache#4459) Co-authored-by: yuezhang <yuezhang@freewheel.tv> * [HUDI-4360] Fix HoodieDropPartitionsTool based on refactored meta sync (apache#6043) * [HUDI-3836] Improve the way of fetching metadata partitions from table (apache#5286) Co-authored-by: xicm <xicm@asiainfo.com> * [HUDI-4359] Support show_fs_path_detail command on Call Produce Command (apache#6042) * [HUDI-4356] Fix the error when sync hive in CTAS (apache#6029) * [HUDI-4219] Merge Into when update expression "col=s.col+2" on precombine cause exception (apache#5828) * [HUDI-4357] Support flink 1.15.x (apache#6050) * [HUDI-4152] Flink offline compaction support compacting multi compaction plan at once (apache#5677) * [HUDI-4152] Flink offline compaction allow compact multi compaction plan at once * [HUDI-4152] Fix exception for duplicated uid when multi compaction plan are compacted * [HUDI-4152] Provider UT & IT for compact multi compaction plan * [HUDI-4152] Put multi compaction plans into one compaction plan source * [HUDI-4152] InstantCompactionPlanSelectStrategy allow multi instant by using comma * [HUDI-4152] Add IT for InstantCompactionPlanSelectStrategy * [HUDI-4309] fix spark32 repartition error (apache#6033) * [HUDI-4366] Synchronous cleaning for flink bounded source (apache#6051) * [minor] following 4152, refactor the clazz about plan selection strategy (apache#6060) * [HUDI-4367] Support copyToTable on call (apache#6054) * [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution. (apache#5995) * fix for updateTableParameters which is not excluding partition columns and updateTableProperties boolean check * Fix - serde parameters getting overrided on table property update * removing stale syncConfig * [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields (apache#6017) * [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields. 
* fix comments Co-authored-by: public (bdcee5037027) <mengtao0326@qq.com> * [HUDI-3500] Add call procedure for RepairsCommand (apache#6053) * [HUDI-2150] Rename/Restructure configs for better modularity (apache#6061) - Move clean related configuration to HoodieCleanConfig - Move Archival related configuration to HoodieArchivalConfig - hoodie.compaction.payload.class move this to HoodiePayloadConfig * [MINOR] Bump xalan from 2.7.1 to 2.7.2 (apache#6062) Bumps xalan from 2.7.1 to 2.7.2. --- updated-dependencies: - dependency-name: xalan:xalan dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [HUDI-4324] Remove use_jdbc config from hudi sync (apache#6072) * [HUDI-4324] Remove use_jdbc config from hudi sync * Users should use HIVE_SYNC_MODE instead * [HUDI-3730][RFC-55] Improve hudi-sync classes design and simplify configs (apache#5695) * [HUDI-4146] RFC for Improve Hive/Meta sync class design and hierarchies Co-authored-by: jian.feng <jian.feng@shopee.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com> * [HUDI-4323] Make database table names optional in sync tool (apache#6073) * [HUDI-4323] Make database table names optional in sync tool * Infer from these properties from the table config * [MINOR] Update RFCs status (apache#6078) * [HUDI-4298] When reading the mor table with QUERY_TYPE_SNAPSHOT,Unabl… (apache#5937) * [HUDI-4298] Add test case for reading mor table Signed-off-by: LinMingQiang <1356469429@qq.com> * [HUDI-4379] Bump Flink versions to 1.14.5 and 1.15.1 (apache#6080) * [HUDI-4391] Incremental read from archived commits for flink (apache#6096) * [RFC-51] [HUDI-3478] Hudi to support Change-Data-Capture (apache#5436) Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com> * [HUDI-4393] Add marker file for target file when flink merge handle rolls over (apache#6103) * [HUDI-4399][RFC-57] Claim 
RFC 57 for DeltaStreamer proto support (apache#6112) * [HUDI-4397] Flink Inline Cluster and Compact plan distribute strategy changed from rebalance to hash to avoid potential multiple threads accessing the same file (apache#6106) Co-authored-by: jerryyue <jerryyue@didiglobal.com> * [MINOR] Disable TestHiveSyncGlobalCommitTool (apache#6119) * [HUDI-4403] Fix the end input metadata for bounded source (apache#6116) * [HUDI-4408] Reuse old rollover file as base file for flink merge handle (apache#6120) * [HUDI-3503] Add call procedure for CleanCommand (apache#6065) * [HUDI-3503] Add call procedure for CleanCommand Co-authored-by: simonssu <simonssu@tencent.com> * [HUDI-4249] Fixing in-memory `HoodieData` implementation to operate lazily (apache#5855) * [HUDI-4170] Make user can use hoodie.datasource.read.paths to read necessary files (apache#5722) * Rebase codes * Move listFileSlices to HoodieBaseRelation * Fix review * Fix style * Fix bug * Remove a few files that were removed in upstream master * Fix build issues Co-authored-by: KnightChess <981159963@qq.com> Co-authored-by: Danny Chan <yuzhao.cyz@gmail.com> Co-authored-by: huberylee <shibei.lh@foxmail.com> Co-authored-by: watermelon12138 <49849410+watermelon12138@users.noreply.github.com> Co-authored-by: y00617041 <yangxuan42@huawei.com> Co-authored-by: Ibson <pushengli@163.com> Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com> Co-authored-by: LiChuang <64473732+CodeCooker17@users.noreply.github.com> Co-authored-by: Gary Li <yanjia.gary.li@gmail.com> Co-authored-by: 吴祥平 <408317717@qq.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com> Co-authored-by: xicm <36392121+xicm@users.noreply.github.com> Co-authored-by: xicm <xicm@asiainfo.com> Co-authored-by: Wangyh <763941163@qq.com> Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com> Co-authored-by: Todd Gao <todd.gao.2013@gmail.com> Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: qianchutao 
<72595723+qianchutao@users.noreply.github.com> Co-authored-by: guanziyue <30882822+guanziyue@users.noreply.github.com> Co-authored-by: Jin Xing <jinxing.corey@gmail.com> Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com> Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com> Co-authored-by: BruceLin <brucekellan@gmail.com> Co-authored-by: ForwardXu <forwardxu315@gmail.com> Co-authored-by: aliceyyan <104287562+aliceyyan@users.noreply.github.com> Co-authored-by: aliceyyan <aliceyyan@tencent.com> Co-authored-by: Lanyuanxiaoyao <lanyuanxiaoyao@gmail.com> Co-authored-by: Alexey Kudinkin <alexey@infinilake.com> Co-authored-by: YueZhang <69956021+zhangyue19921010@users.noreply.github.com> Co-authored-by: yuezhang <yuezhang@freewheel.tv> Co-authored-by: Bo Cui <cuibo0108@163.com> Co-authored-by: Xingcan Cui <xcui@wealthsimple.com> Co-authored-by: wqwl611 <67826098+wqwl611@users.noreply.github.com> Co-authored-by: wqwl611 <wqwl611@gmail.com> Co-authored-by: 董可伦 <dongkelun01@inspur.com> Co-authored-by: 陈浩 <bettermouse94@gmail.com> Co-authored-by: Yuwei XIAO <ywxiaozero@gmail.com> Co-authored-by: Shawy Geng <gengxiaoyu1996@gmail.com> Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com> Co-authored-by: luokey <854194341@qq.com> Co-authored-by: Zhaojing Yu <yuzhaojing@bytedance.com> Co-authored-by: wangxianghu <wangxianghu@apache.org> Co-authored-by: uday08bce <uday08bce@gmail.com> Co-authored-by: YuangZhang <z_yuang@foxmail.com> Co-authored-by: zhangyuang <zhangyuang@corp.netease.com> Co-authored-by: felixYyu <felix2003@live.cn> Co-authored-by: Heap <35054152+h1ap@users.noreply.github.com> Co-authored-by: liujinhui <965147871@qq.com> Co-authored-by: luoyajun <luoyajun1010@gmail.com> Co-authored-by: 冯健 <fengjian428@gmail.com> Co-authored-by: RexAn <anh131@126.com> Co-authored-by: komao <masterwangzx@gmail.com> Co-authored-by: wangzixuan.wzxuan <wangzixuan.wzxuan@bytedance.com> Co-authored-by: Rex An <bonean131@gmail.com> Co-authored-by: Carter Shanklin 
<cartershanklin@users.noreply.github.com> Co-authored-by: 苏承祥 <scx_white@aliyun.com> Co-authored-by: 苏承祥 <sucx@tuya.com> Co-authored-by: Kumud Kumar Srivatsava Tirupati <kumudkumartirupati@users.noreply.github.com> Co-authored-by: Qi Ji <qjqqyy@users.noreply.github.com> Co-authored-by: leesf <490081539@qq.com> Co-authored-by: Nicolas Paris <nicolas.paris@riseup.net> Co-authored-by: Saisai Shao <sai.sai.shao@gmail.com> Co-authored-by: marchpure <marchpure@126.com> Co-authored-by: HunterXHunter <1356469429@qq.com> Co-authored-by: john.wick <john.wick@vipshop.com> Co-authored-by: liuzhuang2017 <95120044+liuzhuang2017@users.noreply.github.com> Co-authored-by: sandyfog <154525105@qq.com> Co-authored-by: yanenze <34880077+yanenze@users.noreply.github.com> Co-authored-by: yanenze <yanenze@keytop.com.cn> Co-authored-by: superche <73096722+hechao-ustc@users.noreply.github.com> Co-authored-by: superche <superche@tencent.com> Co-authored-by: 5herhom <35916131+5herhom@users.noreply.github.com> Co-authored-by: Shizhi Chen <107476116+chenshzh@users.noreply.github.com> Co-authored-by: chenshizhi <chenshizhi@bilibili.com> Co-authored-by: Alexander Trushev <42293632+trushev@users.noreply.github.com> Co-authored-by: Forus <70357858+Forus0322@users.noreply.github.com> Co-authored-by: Shawn Chang <42792772+CTTY@users.noreply.github.com> Co-authored-by: Shawn Chang <yxchang@amazon.com> Co-authored-by: jiz <31836510+microbearz@users.noreply.github.com> Co-authored-by: jiimmyzhan <jiimmyzhan@tencent.com> Co-authored-by: zhanshaoxiong <shaoxiong0001@@gmail.com> Co-authored-by: xiarixiaoyao <mengtao0326@qq.com> Co-authored-by: bschell <bdscheller@gmail.com> Co-authored-by: Brandon Scheller <bschelle@amazon.com> Co-authored-by: Teng <teng_huo@outlook.com> Co-authored-by: yuezhang <yuezhang@yuezhang-mac.freewheelmedia.net> Co-authored-by: wenningd <wenningding95@gmail.com> Co-authored-by: Wenning Ding <wenningd@amazon.com> Co-authored-by: miomiocat <284487410@qq.com> Co-authored-by: 
JerryYue-M <272614347@qq.com> Co-authored-by: jerryyue <jerryyue@didiglobal.com> Co-authored-by: jian.feng <fengjian428@gmial.com> Co-authored-by: jian.feng <jian.feng@shopee.com> Co-authored-by: voonhous <voonhousu@gmail.com> Co-authored-by: voonhou.su <voonhou.su@shopee.com> Co-authored-by: shenjiayu17 <54424149+shenjiayu17@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Luning (Lucas) Wang <rsl4@foxmail.com> Co-authored-by: Yann Byron <biyan900116@gmail.com> Co-authored-by: Tim Brown <tim.brown126@gmail.com> Co-authored-by: simonsssu <barley0806@gmail.com> Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>
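The HUDI-4176 item in the squashed commit message above mentions caching common `HoodieCommitMetadata` instances, so that very large commit files are not parsed over and over. Below is a minimal sketch of that general idea. It assumes a plain memoizing cache keyed by instant time. `CommitMetadataCache` and its parser function are hypothetical names, not Hudi's API, and the actual commit may cache differently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical memoizing cache: parses each instant's commit metadata at most once. */
final class CommitMetadataCache<M> {
    private final ConcurrentHashMap<String, M> parsedByInstant = new ConcurrentHashMap<>();
    private final Function<String, M> parser;            // expensive step, e.g. deserializing a large commit file
    private final AtomicInteger parseCount = new AtomicInteger();

    CommitMetadataCache(Function<String, M> parser) {
        this.parser = parser;
    }

    /** Returns the parsed metadata for an instant, invoking the parser only on the first request. */
    M get(String instantTime) {
        return parsedByInstant.computeIfAbsent(instantTime, t -> {
            parseCount.incrementAndGet();
            return parser.apply(t);
        });
    }

    /** Number of times the expensive parser actually ran. */
    int parseCount() {
        return parseCount.get();
    }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the mapping function at most once per key, even when several threads call it concurrently. That property keeps the expensive parse off the hot path.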
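The HUDI-4200 item above explains that unsorted keys cause prefix searches or full-key lookups against an HFile to misbehave or to seek back to the start of the file. Below is a simplified model of why sorting the requested keys first matters. `ForwardOnlySortedReader` is a hypothetical stand-in for a sorted, forward-scanning reader, not `HoodieHFileReader`. The model only counts "rewinds" to the start of the file; it does not reproduce the NPE.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an HFile-style reader: sorted keys scanned with a forward-only cursor. */
final class ForwardOnlySortedReader {
    private final List<String> sortedKeys;
    private int cursor = 0;          // index of the first key not yet skipped
    private String lastPrefix = null;
    private int rewinds = 0;         // seeks back to the beginning caused by out-of-order lookups

    ForwardOnlySortedReader(List<String> keys) {
        this.sortedKeys = new ArrayList<>(keys);
        Collections.sort(this.sortedKeys);
    }

    /** Returns every stored key that starts with the given prefix. */
    List<String> scanPrefix(String prefix) {
        if (lastPrefix != null && prefix.compareTo(lastPrefix) < 0) {
            cursor = 0;              // prefix is behind the cursor: seek back to the start
            rewinds++;
        }
        lastPrefix = prefix;
        // Keys sharing a prefix form one contiguous run that begins at the first key >= prefix.
        while (cursor < sortedKeys.size() && sortedKeys.get(cursor).compareTo(prefix) < 0) {
            cursor++;
        }
        List<String> matches = new ArrayList<>();
        for (int i = cursor; i < sortedKeys.size() && sortedKeys.get(i).startsWith(prefix); i++) {
            matches.add(sortedKeys.get(i));
        }
        return matches;
    }

    int rewinds() {
        return rewinds;
    }

    /** Sorts the requested prefixes first so the reader only ever moves forward. */
    static List<String> lookupAll(ForwardOnlySortedReader reader, List<String> prefixes) {
        List<String> sorted = new ArrayList<>(prefixes);
        Collections.sort(sorted);
        List<String> result = new ArrayList<>();
        for (String prefix : sorted) {
            result.addAll(reader.scanPrefix(prefix));
        }
        return result;
    }
}
```

With sorted lookups the cursor only advances, so the whole batch costs a single pass over the file. Lookups that arrive out of order force a seek back to the beginning for each inversion.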
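The HUDI-4006 item above introduces `hoodie.deltastreamer.source.kafka.enable.failOnDataLoss`. It describes two behaviours: warn-and-continue by default, or fail explicitly when the flag is set. Below is a minimal sketch of that decision for a single partition. It assumes data loss is detected by comparing the checkpointed offset with the earliest offset Kafka still retains. `KafkaOffsetResolver` and `resolveStartOffset` are hypothetical names; the real source works on per-partition offsets via the Kafka consumer APIs.

```java
import java.util.Properties;
import java.util.logging.Logger;

/** Hypothetical, single-partition illustration of the failOnDataLoss switch. */
final class KafkaOffsetResolver {
    // Config key as named in the HUDI-4006 commit message.
    static final String FAIL_ON_DATA_LOSS = "hoodie.deltastreamer.source.kafka.enable.failOnDataLoss";
    private static final Logger LOG = Logger.getLogger(KafkaOffsetResolver.class.getName());

    private KafkaOffsetResolver() {}

    /** Chooses where to resume when the checkpoint may point at offsets Kafka no longer retains. */
    static long resolveStartOffset(long checkpointOffset, long earliestAvailable, Properties props) {
        if (checkpointOffset >= earliestAvailable) {
            return checkpointOffset; // nothing was lost; resume from the checkpoint
        }
        boolean failOnDataLoss = Boolean.parseBoolean(props.getProperty(FAIL_ON_DATA_LOSS, "false"));
        String msg = "Checkpoint offset " + checkpointOffset
                + " is before the earliest available offset " + earliestAvailable;
        if (failOnDataLoss) {
            throw new IllegalStateException(msg + "; failing because " + FAIL_ON_DATA_LOSS + "=true");
        }
        LOG.warning(msg + "; seeking to the earliest offset"); // default: warn instead of staying silent
        return earliestAvailable;
    }
}
```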
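The HUDI-3634 item above avoids downstream readers seeing empty or partial `HoodieCommitMetaData` on HDFS. It does this by first creating `file.tmp` and then renaming it into place. The sketch below shows the same write-then-rename pattern using `java.nio.file` on a local file system as a stand-in. Hudi's change targets the Hadoop `FileSystem` API, and `ImmutableFileWriter` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: readers see either no file or the complete file, never a partial one. */
final class ImmutableFileWriter {
    private ImmutableFileWriter() {}

    static void createImmutableFile(Path target, String content) throws IOException {
        if (Files.exists(target)) {
            throw new IOException("Immutable file already exists: " + target);
        }
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        // Write the full content to the temporary file first.
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        try {
            // Publish it with a single rename; ATOMIC_MOVE fails rather than degrading to copy+delete.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a stray temp file behind
            throw e;
        }
    }
}
```

If the file system cannot rename atomically, `Files.move` with `ATOMIC_MOVE` throws `AtomicMoveNotSupportedException` instead of silently falling back. In that case the sketch cleans up the temporary file and surfaces the error.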

nsivabalan changed the title ~~[MINOR] Fixing determining target table schema for delta sync with empty batch~~ [HUDI-4132] Fixing determining target table schema for delta sync with empty batch May 20, 2022

Fixing determining target table schema for delta sync with empty batch

fa11e50
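This commit changes which schema DeltaSync sets on `HoodieWriteConfig` when the source returns an empty batch but a commit is still needed to move the checkpoint. Below is a minimal sketch of the shape of that decision, not the actual patch. It assumes schemas are plain strings (the real code works with Avro schemas). It also assumes a table with no commits is reported as an absent schema rather than as an error. `EmptyBatchSchemaResolver` and its parameters are hypothetical.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical illustration of picking the writer schema for one DeltaSync round. */
final class EmptyBatchSchemaResolver {
    private EmptyBatchSchemaResolver() {}

    /**
     * @param batchSchema       schema of the incoming batch; empty when the source returned no data
     * @param latestTableSchema looks up the latest committed table schema; empty when the table has no commits
     */
    static Optional<String> resolveWriterSchema(Optional<String> batchSchema,
                                                Supplier<Optional<String>> latestTableSchema) {
        if (batchSchema.isPresent()) {
            return batchSchema; // non-empty batch: the incoming data defines the schema
        }
        // Empty batch that must still be committed to advance the checkpoint:
        // use the table's latest schema, which may itself be absent for an empty table.
        return latestTableSchema.get();
    }
}
```

Passing the table lookup as a `Supplier` means the table is consulted only for empty batches. An empty table yields `Optional.empty()`, which the caller has to handle explicitly instead of failing deep inside schema resolution.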

nsivabalan force-pushed the deltaSyncEmptySchemaEmptyTableFix branch from fe54afa to fa11e50 May 23, 2022 12:14

harsh1231 approved these changes May 24, 2022

nsivabalan merged commit 10363c1 into apache:master May 24, 2022

yihua pushed a commit to yihua/hudi that referenced this pull request Jun 3, 2022

[HUDI-4132] Fixing determining target table schema for delta sync with empty batch (apache#5648)

723e54a
