[HUDI-5932] Make the combine step in Call run_bootstrap Procedure optional by huangxiaopingRD · Pull Request #8179 · apache/hudi

huangxiaopingRD · 2023-03-14T10:06:58Z

Change Logs

In the existing implementation, if the preCombine field is not specified, its default value (ts) is used. In a full-record bootstrap, the "ts" field is not found in the source records, so generating input records fails with the exception below. This change therefore removes the need to specify the preCombine field when executing bootstrap.

Caused by: org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were :[timestamp, _row_key, partition_path, rider, driver, begin_lat, begin_lon, end_lat, end_lon, fare, tip_history, _hoodie_is_deleted, datestr]
	at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:557)
	at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldValAsString(HoodieAvroUtils.java:535)
	at org.apache.hudi.bootstrap.SparkFullBootstrapDataProviderBase.lambda$generateInputRecords$cbf13809$1(SparkFullBootstrapDataProviderBase.java:87)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
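
The failure mode above can be illustrated with a minimal, self-contained sketch. This is not Hudi's code: the class `PreCombineSketch`, the method `orderingValue`, and the fallback ordering value `0` are all hypothetical names and choices made for illustration, and a plain `Map` stands in for the Avro record. The sketch only shows the idea of treating the preCombine field as optional instead of always resolving a default such as `ts`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class PreCombineSketch {

    // Hypothetical helper: returns the ordering value for a record.
    // With no preCombine field configured, a constant is returned and the
    // record is never probed for a default field such as "ts".
    static Comparable<?> orderingValue(Map<String, Object> record,
                                       Optional<String> preCombineField) {
        if (!preCombineField.isPresent()) {
            return 0; // assumed fallback: every record gets the same ordering value
        }
        String field = preCombineField.get();
        if (!record.containsKey(field)) {
            // Mirrors the shape of the reported failure; the message is illustrative.
            throw new IllegalArgumentException(
                field + " field not found in record. Acceptable fields were :" + record.keySet());
        }
        return (Comparable<?>) record.get(field);
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("timestamp", 1678788418L);
        record.put("_row_key", "key-1");

        // Optional preCombine field: no lookup, no failure.
        System.out.println(orderingValue(record, Optional.empty()));

        // Forcing the default "ts" reproduces the kind of error in the stack trace.
        try {
            orderingValue(record, Optional.of("ts"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the field left unset, the record is accepted even though it only carries `timestamp`; with `ts` forced as a default, the lookup fails in the same way as the reported exception.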

Impact

Users no longer need to specify a preCombine field when executing bootstrap.

Risk level (write none, low, medium or high below)

None

Documentation Update

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed


hudi-bot · 2023-04-19T21:59:38Z

CI report:

405ddf7 Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

huangxiaopingRD force-pushed the HUDI-5932 branch 2 times, most recently from 85352a3 to 088416b Compare March 15, 2023 02:13

codope added priority:high Significant impact; potential bugs area:sql SQL interfaces labels Mar 20, 2023

[HUDI-5932] Make the combine step in Call run_bootstrap Procedure opt…

e4cbe3b

…ional

huangxiaopingRD force-pushed the HUDI-5932 branch from 0e5ea03 to e4cbe3b Compare March 27, 2023 12:08

huangxiaopingRD added 3 commits March 31, 2023 10:42

Merge branch 'apache:master' into HUDI-5932

77a3bea

Merge branch 'apache:master' into HUDI-5932

33db584

Merge branch 'apache:master' into HUDI-5932

405ddf7

github-actions bot added the size:S PR with lines of changes in (10, 100] label Feb 26, 2024

huangxiaopingRD closed this Jan 22, 2025

hudi-bot mentioned this pull request Dec 9, 2025

Make the combine step in Call run_bootstrap Procedure optional #15833

Open

huangxiaopingRD wants to merge 4 commits into apache:master from huangxiaopingRD:HUDI-5932
