
[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes #5424

Merged
merged 4 commits into apache:master on Apr 26, 2022

Conversation

nsivabalan
Contributor

What is the purpose of the pull request

Fixing hoodie.properties/tableConfig when no preCombine field is set.

Brief change log

  • Fixed instantiation of a new table so that preCombine is left unset (null) in the table config when the user does not explicitly set it (see the sketch below).

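The gist of the change, as a minimal sketch; the class and helper names here are illustrative, not the literal patch:

```java
import java.util.Properties;

// Hypothetical sketch of the fix: when initializing a new table, write
// hoodie.table.precombine.field into hoodie.properties only if the user
// explicitly configured one; otherwise leave the key absent.
public class TablePropertiesInitSketch {
  static final String PRECOMBINE_KEY = "hoodie.table.precombine.field";

  static Properties buildTableProperties(String userPreCombineField) {
    Properties tableProps = new Properties();
    if (userPreCombineField != null && !userPreCombineField.isEmpty()) {
      tableProps.setProperty(PRECOMBINE_KEY, userPreCombineField);
    }
    // With no user value, the key is simply absent, so readers and writers
    // see "no preCombine field" instead of a spurious default.
    return tableProps;
  }
}
```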
Verify this pull request

This change added tests and can be verified as follows:

  • Added tests TestCOWDataSource.testNoPreCombine and TestMORDataSource.testNoPreCombine; a sketch of the assertion follows below.

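The shape of the assertion, as a rough sketch; basePath and hadoopConf are assumed to come from the test harness:

```java
import static org.junit.jupiter.api.Assertions.assertNull;

import org.apache.hadoop.conf.Configuration;
import org.apache.hudi.common.table.HoodieTableMetaClient;

// Illustrative check: after a write with no preCombine field configured,
// the persisted table config should report none.
class NoPreCombineAssertionSketch {
  void assertNoPreCombine(Configuration hadoopConf, String basePath) {
    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
        .setConf(hadoopConf)
        .setBasePath(basePath)
        .build();
    assertNull(metaClient.getTableConfig().getPreCombineField());
  }
}
```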
Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@alexeykudinkin
Contributor

@hudi-bot run azure

@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@nsivabalan nsivabalan merged commit 762623a into apache:master Apr 26, 2022
yihua pushed a commit to yihua/hudi that referenced this pull request on Jun 3, 2022
[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes (apache#5424)

Fixed instantiation of a new table so that preCombine is left unset (null) when the user does not explicitly set it.
vinishjail97 pushed a commit to vinishjail97/hudi that referenced this pull request on Dec 15, 2023
* [HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine field with writes (apache#5424)

Fixed instantiation of a new table so that preCombine is left unset (null) when the user does not explicitly set it.

* [HUDI-3478] Claim RFC 51 For CDC (apache#5437)

* [MINOR] Update alter rename command class type for pattern matching (apache#5381)

* [HUDI-3977] Flink hudi table with date type partition path throws HoodieNotSupportedException (apache#5432)

* Claim RFC 52 for Introduce Secondary Index to Improve HUDI Query Performance (apache#5441)

* [HUDI-3945] After the async compaction operation is complete, the task should exit. (apache#5391)

Co-authored-by: y00617041 <yangxuan42@huawei.com>

* [HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (apache#5368)

Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com>

* [HUDI-3943] Some description fixes for 0.10.1 docs (apache#5447)

* [MINOR] support different cleaning policy for flink (apache#5459)

* [HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash index (apache#5185)

* Fix duplicate fileId with bucket index
* Switch to loading FileGroup from FileSystemView

* [MINOR] Fix CI by ignoring SparkContext error (apache#5468)

Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers
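The setting in question looks roughly like this; a sketch, not the exact UtilHelpers change:

```java
import org.apache.spark.SparkConf;

// Tolerate a lingering SparkContext from an earlier test by allowing
// multiple contexts in the same JVM when building the conf.
final class SparkConfSketch {
  static SparkConf buildConf(String appName) {
    return new SparkConf()
        .setAppName(appName) // illustrative app name handling
        .set("spark.driver.allowMultipleContexts", "true");
  }
}
```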

* [HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (apache#5308)

Co-authored-by: xicm <xicm@asiainfo.com>

* [HUDI-3978] Fix use of partition path field as hive partition field in flink (apache#5434)

* Fix error when using partition path fields as hive sync partition fields

* [MINOR] Update DOAP for release 0.11.0 (apache#5467)

* [HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (apache#4563)

* Add RFC doc

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* Add note regarding catalog naming

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

* [MINOR] Update RFC status (apache#5486)

* [HUDI-4005] Update release scripts to help validation (apache#5479)

* [HUDI-4031] Avoid clustering update handling when no pending replacecommit (apache#5487)

* [HUDI-3667] Run unit tests of hudi-integ-tests in CI (apache#5078)

* [MINOR] Optimize code logic (apache#5499)

* [HUDI-2875] Make HoodieParquetWriter thread-safe and memory executor exit gracefully (apache#4264)

* [HUDI-4042] Support truncate-partition for Spark-3.2 (apache#5506)

* [HUDI-4017] Improve spark sql coverage in CI (apache#5512)

Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2.

* [HUDI-3675] Adding post write termination strategy to deltastreamer continuous mode (apache#5073)

- Added a postWriteTerminationStrategy to deltastreamer continuous mode. It can be enabled by setting the appropriate termination strategy class via DeltastreamerConfig.postWriteTerminationStrategyClass; when unset, continuous mode runs forever.
- Added one concrete implementation, NoNewDataTerminationStrategy, which shuts down deltastreamer when there is no new data to consume from the source for N consecutive rounds (sketched below).
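A sketch of the strategy contract; the method name and signature below are assumptions for illustration, not the exact Hudi interface:

```java
// Hypothetical shape of a post-write termination strategy.
public interface PostWriteTerminationStrategySketch {
  /** Returns true when the continuous deltastreamer loop should stop. */
  boolean shouldShutdown(boolean newDataConsumedThisRound);
}

// Shuts down after N consecutive rounds with no new data from the source.
class NoNewDataTerminationStrategySketch implements PostWriteTerminationStrategySketch {
  private final int maxRoundsWithoutData;
  private int roundsWithoutData = 0;

  NoNewDataTerminationStrategySketch(int maxRoundsWithoutData) {
    this.maxRoundsWithoutData = maxRoundsWithoutData;
  }

  @Override
  public boolean shouldShutdown(boolean newDataConsumedThisRound) {
    roundsWithoutData = newDataConsumedThisRound ? 0 : roundsWithoutData + 1;
    return roundsWithoutData >= maxRoundsWithoutData;
  }
}
```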

* [HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (apache#5287)

* [MINOR] Fixing class not found when using flink and enabling the metadata table (apache#5527)

* [MINOR] fixing flaky tests in deltastreamer tests (apache#5521)

* [HUDI-4055] Refactor rate limiter to avoid stack overflow (apache#5530)

* [MINOR] Fixing close for HoodieCatalog's test (apache#5531)

* [MINOR] Fixing close for HoodieCatalog's test

* [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized (apache#5526)

* [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized

Co-authored-by: xicm <xicm@asiainfo.com>

* [HUDI-3995] Making perf optimizations for bulk insert row writer path (apache#5462)

- Avoid using a UDF for the key generator for SimpleKeyGen and NonPartitionedKeyGen.
- Fixed the NonPartitioned key generator to fetch the record key directly from the Row rather than going through a GenericRecord (sketched below).
- Other minor fixes, such as using static values instead of repeated hashmap lookups.
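Conceptually, the NonPartitioned key generator change looks like this sketch; the field name "uuid" and the caching scheme are illustrative:

```java
import org.apache.spark.sql.Row;

// Illustrative sketch: read the record key straight from the Row by a cached
// field position, skipping the Row -> GenericRecord conversion entirely.
final class RowKeyExtractorSketch {
  private final int recordKeyPos;

  RowKeyExtractorSketch(Row sampleRow) {
    this.recordKeyPos = sampleRow.schema().fieldIndex("uuid"); // resolve once
  }

  String getRecordKey(Row row) {
    return row.getString(recordKeyPos); // per-row work is a positional read
  }
}
```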

* [HUDI-4044] When reading data from flink-hudi to external storage, the … (apache#5516)


Co-authored-by: aliceyyan <aliceyyan@tencent.com>

* [HUDI-4003] Try to read all the log files to parse schema (apache#5473)

* [HUDI-4038] Avoid calling `getDataSize` after every record written (apache#5497)

- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already composed column groups in memory. Instead, we can sample the calls to getDataSize to amortize their cost (sketched below).

Co-authored-by: sivabalan <n.siva.b@gmail.com>
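The sampling idea, as an illustrative sketch; the interval of 1000 records is an assumption:

```java
import org.apache.parquet.hadoop.ParquetWriter;

// Instead of asking the writer for its size after every record, sample the
// getDataSize() call every N records and reuse the last observed value.
final class SampledSizeTrackerSketch {
  private static final long RECORDS_PER_SIZE_CHECK = 1000;
  private final ParquetWriter<?> writer;
  private long writtenRecords = 0;
  private long lastSampledSize = 0;

  SampledSizeTrackerSketch(ParquetWriter<?> writer) {
    this.writer = writer;
  }

  /** Returns an amortized estimate of the bytes written so far. */
  long estimatedDataSize() {
    if (writtenRecords++ % RECORDS_PER_SIZE_CHECK == 0) {
      lastSampledSize = writer.getDataSize(); // expensive: walks column groups
    }
    return lastSampledSize;
  }
}
```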

* [HUDI-4079] Supports showing table comment for hudi with spark3 (apache#5546)

* [HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHoodieDeltaStreamer (apache#5559)

* [HUDI-3963][Claim RFC number 53] Use Lock-Free Message Queue Improving Hoodie Writing Efficiency. (apache#5562)


Co-authored-by: yuezhang <yuezhang@freewheel.tv>

* [HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-cases. Added delete partition support to integ tests (apache#5501)

- Added pure immutable test yamls to the integ test framework, including SparkBulkInsertNode.
- Added delete_partition support to the integ test framework using spark-datasource.
- Added a single yaml to test all non-core write operations (insert overwrite, insert overwrite table and delete partitions).
- Added tests for 4 concurrent spark datasource writers (multi-writer tests).
- Fixed the README with sample commands for multi-writer.

* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (apache#5528)

* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink

* [MINOR] Fix a NPE for Option (apache#5461)

* [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files (apache#5545)

* [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files

* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (apache#5574)

* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink

* [HUDI-4072] Fix NULL schema for empty batches in deltastreamer (apache#5543)

* [HUDI-4097] add table info to jobStatus (apache#5529)


Co-authored-by: wqwl611 <wqwl611@gmail.com>

* [HUDI-3980] Support kerberos hbase index (apache#5464)

- Add configurations in HoodieHBaseIndexConfig.java to support Kerberos HBase connections (a generic sketch follows).

Co-authored-by: xicm <xicm@asiainfo.com>
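For orientation, a kerberized HBase connection generally needs settings along these lines. This is a generic sketch using standard Hadoop/HBase APIs; the Hudi-specific config keys added by the commit are not reproduced here, and the principal/keytab values are placeholders:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.security.UserGroupInformation;

// Generic sketch of logging in to a Kerberos-secured HBase cluster.
final class KerberosHBaseLoginSketch {
  static Configuration kerberizedHBaseConf(String principal, String keytabPath)
      throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.security.authentication", "kerberos");
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
    return conf;
  }
}
```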

* [HUDI-4001] Filter out the properties that should not be used when creating a table for Spark SQL (apache#5495)

* Fix hive sync error for non-partitioned tables (apache#5585)

* [HUDI-3123] consistent hashing index: basic write path (upsert/insert) (apache#4480)

 1. Basic write path (insert/upsert) implementation
 2. Adapt simple bucket index

* [HUDI-4098] Metadata table heartbeat for instant has expired, last heartbeat 0 (apache#5583)

* [HUDI-4103] [HUDI-4001] Filter out the properties that should not be used when creating a table for Spark SQL

* [HUDI-3654] Preparations for hudi metastore. (apache#5572)

* [HUDI-3654] Preparations for hudi metastore.

Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com>

* [HUDI-4104] DeltaWriteProfile includes the pending compaction file slice when deciding small buckets (apache#5594)

* [HUDI-4101] BucketIndexPartitioner should take partition path for better dispersion (apache#5590)

* [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand (apache#5564)

* [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand

* Set hoodie.query.as.ro.table in serde properties

* [HUDI-4110] Clean the marker files for flink compaction (apache#5604)

* [MINOR] Fixing spark long running yaml for non-partitioned (apache#5607)

* [minor] Some code refactoring for LogFileComparator and Instant instantiation (apache#5600)

* [HUDI-4109] Copy the old record directly when it is chosen for merging (apache#5603)

* Clean the marker files for flink compaction (apache#5611)

Co-authored-by: 854194341@qq.com <loukey_7821>

* [HUDI-3942] [RFC-50] Improve Timeline Server (apache#5392)

* [HUDI-4111] Bump ANTLR runtime version in Spark 3.x (apache#5606)

* Revert "[HUDI-3870] Add timeout rollback for flink online compaction (apache#5314)" (apache#5622)

This reverts commit 6f9b02d.

* [HUDI-4116] Unify clustering/compaction related procedures' output type (apache#5620)

* Unify clustering/compaction related procedures' output type

* Address review comments

* [HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#initTable (apache#5617)

No need to call #sync actively because the table instance is freshly instantiated; its view manager holds empty view instances, and the fs view will be synced lazily when it is requested.

* [HUDI-4119] The first read result is incorrect when the Flink upsert-Kafka connector is used with Hudi (apache#5626)

* HUDI-4119 The first read result is incorrect when the Flink upsert-Kafka connector is used with Hudi

Co-authored-by: aliceyyan <aliceyyan@tencent.com>

* [HUDI-4130] Remove the upgrade/downgrade for flink #initTable (apache#5642)

* [HUDI-3985] Refactor DLASyncTool to support reading a hoodie table as a spark datasource table (apache#5532)

* [MINOR] Minor fixes to exception log and removing unwanted metrics flush in integ test (apache#5646)

* [HUDI-4122] Fix NPE caused by adding kafka nodes (apache#5632)

* [MINOR] remove unused gson test dependency (apache#5652)

* [HUDI-3858] Shade javax.servlet for Spark bundle jar (apache#5295)

Co-authored-by: yuezhang <yuezhang@freewheel.tv>

* [HUDI-4100] CTAS failed to clean up when given an illegal MANAGED table definition (apache#5588)

* [HUDI-3890] fix rat plugin issue with sql files (apache#5644)

* [HUDI-4051] Allow nested field as primary key and preCombineField in spark sql (apache#5517)

* [HUDI-4051] Allow nested field as preCombineField in spark sql

* relax validation for primary key

* [HUDI-4129] Initializes a new fs view for WriteProfile#reload (apache#5640)

Co-authored-by: zhangyuang <zhangyuang@corp.netease.com>

* [HUDI-4142] Claim RFC-54 for new table APIs (apache#5665)

* [HUDI-3933] Add UT cases to cover different key gen (apache#5638)

* [MINOR] Removing redundant semicolons and line breaks (apache#5662)

* [HUDI-4134] Fix Method naming consistency issues in FSUtils (apache#5655)

* [HUDI-4084] Add support to test async table services with integ test suite framework (apache#5557)

* Add support to test async table services with integ test suite framework

* Make await time for validation configurable

* [HUDI-4138] Fix the concurrency modification of hoodie table config for flink (apache#5660)

* Remove the metadata cleaning strategy for flink, which means the multi-modal index may be affected
* Improve HoodieTable#clearMetadataTablePartitionsConfig to only update the table config when necessary
* Remove the modification of the read code path in HoodieTableConfig

* [HUDI-2473] Fixing compaction write operation in commit metadata (apache#5203)

* [HUDI-4145] Archives the metadata file in HoodieInstant.State sequence (apache#5669)

* [HUDI-4135] remove netty and netty-all (apache#5663)

* [HUDI-2207] Support independent flink hudi clustering function

* [HUDI-4132] Fixing determining target table schema for delta sync with empty batch (apache#5648)

* [MINOR] Fix a potential NPE and some finer points of hudi cli (apache#5656)

* [HUDI-4146] Claim RFC-55 for Improve Hive/Meta sync class design and hierarchies (apache#5682)

* [HUDI-3193] Decouple hudi-aws from hudi-client-common (apache#5666)

Move HoodieMetricsCloudWatchConfig to hudi-client-common

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
Co-authored-by: Yann Byron <biyan900116@gmail.com>
Co-authored-by: KnightChess <981159963@qq.com>
Co-authored-by: Danny Chan <yuzhao.cyz@gmail.com>
Co-authored-by: huberylee <shibei.lh@foxmail.com>
Co-authored-by: watermelon12138 <49849410+watermelon12138@users.noreply.github.com>
Co-authored-by: y00617041 <yangxuan42@huawei.com>
Co-authored-by: Ibson <pushengli@163.com>
Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com>
Co-authored-by: LiChuang <64473732+CodeCooker17@users.noreply.github.com>
Co-authored-by: Gary Li <yanjia.gary.li@gmail.com>
Co-authored-by: 吴祥平 <408317717@qq.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: xicm <36392121+xicm@users.noreply.github.com>
Co-authored-by: xicm <xicm@asiainfo.com>
Co-authored-by: Wangyh <763941163@qq.com>
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
Co-authored-by: Todd Gao <todd.gao.2013@gmail.com>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: qianchutao <72595723+qianchutao@users.noreply.github.com>
Co-authored-by: guanziyue <30882822+guanziyue@users.noreply.github.com>
Co-authored-by: Jin Xing <jinxing.corey@gmail.com>
Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com>
Co-authored-by: BruceLin <brucekellan@gmail.com>
Co-authored-by: ForwardXu <forwardxu315@gmail.com>
Co-authored-by: aliceyyan <104287562+aliceyyan@users.noreply.github.com>
Co-authored-by: aliceyyan <aliceyyan@tencent.com>
Co-authored-by: Lanyuanxiaoyao <lanyuanxiaoyao@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>
Co-authored-by: YueZhang <69956021+zhangyue19921010@users.noreply.github.com>
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Bo Cui <cuibo0108@163.com>
Co-authored-by: Xingcan Cui <xcui@wealthsimple.com>
Co-authored-by: wqwl611 <67826098+wqwl611@users.noreply.github.com>
Co-authored-by: wqwl611 <wqwl611@gmail.com>
Co-authored-by: 董可伦 <dongkelun01@inspur.com>
Co-authored-by: 陈浩 <bettermouse94@gmail.com>
Co-authored-by: Yuwei XIAO <ywxiaozero@gmail.com>
Co-authored-by: Shawy Geng <gengxiaoyu1996@gmail.com>
Co-authored-by: gengxiaoyu <gengxiaoyu@bytedance.com>
Co-authored-by: luokey <854194341@qq.com>
Co-authored-by: Zhaojing Yu <yuzhaojing@bytedance.com>
Co-authored-by: wangxianghu <wangxianghu@apache.org>
Co-authored-by: uday08bce <uday08bce@gmail.com>
Co-authored-by: YuangZhang <z_yuang@foxmail.com>
Co-authored-by: zhangyuang <zhangyuang@corp.netease.com>
Co-authored-by: felixYyu <felix2003@live.cn>
Co-authored-by: Heap <35054152+h1ap@users.noreply.github.com>
Co-authored-by: liujinhui <965147871@qq.com>
Co-authored-by: luoyajun <luoyajun1010@gmail.com>
Co-authored-by: 冯健 <fengjian428@gmail.com>
Co-authored-by: Rajesh Mahindra <rmahindra@gmail.com>