[HUDI-2214] fix the bug that residual temporary files after clustering are not cleaned up #3335

xiarixiaoyao · 2021-07-23T09:41:40Z

Tips

Thank you very much for contributing to Apache Hudi.
Please review https://hudi.apache.org/contributing.html before opening a pull request.

What is the purpose of the pull request

This PR fixes a bug where temporary files left behind by clustering are not cleaned up.

Steps to reproduce:

Step 1: run a bulk insert with inline clustering enabled.

val records1 = recordsToStrings(dataGen.generateInserts("001", 1000)).toList
val inputDF1: Dataset[Row] = spark.read.json(spark.sparkContext.parallelize(records1, 2))
inputDF1.write.format("org.apache.hudi")
  .options(commonOpts)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY.key(), DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key(), DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
  // options for clustering: trigger inline clustering after every commit
  .option("hoodie.parquet.small.file.limit", "0")
  .option("hoodie.clustering.inline", "true")
  .option("hoodie.clustering.inline.max.commits", "1")
  .option("hoodie.clustering.plan.strategy.small.file.limit", "629145600")
  .option("hoodie.clustering.plan.strategy.max.bytes.per.group", Long.MaxValue.toString)
  // target.file.max.bytes was set twice in the original snippet; only the later value (12 MiB) took effect
  .option("hoodie.clustering.plan.strategy.target.file.max.bytes", String.valueOf(12 * 1024 * 1024L))
  .option("hoodie.clustering.plan.strategy.sort.columns", "begin_lat, begin_lon")
  .mode(SaveMode.Overwrite)
  .save(basePath)

Step 2: check the temp directory. /tmp/junit1835474867260509758/dataset/.hoodie/.temp/ is not empty; the following directory has not been cleaned up:

/tmp/junit1835474867260509758/dataset/.hoodie/.temp/20210723171208
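The check in step 2 can be scripted rather than done by hand. The following is only a minimal sketch, not the unit test added in this PR: the object name `TempDirCheck` and the helper `leftoverTempEntries` are hypothetical, and it assumes the table lives on the local filesystem (as in the junit run above), so it uses plain `java.nio` instead of the Hadoop `FileSystem` API.

```scala
import java.nio.file.{Files, Path, Paths}

object TempDirCheck {
  // Hypothetical helper (not part of Hudi): lists whatever is left under
  // <basePath>/.hoodie/.temp. Assumes a local filesystem path.
  def leftoverTempEntries(basePath: String): List[Path] = {
    val tempDir = Paths.get(basePath, ".hoodie", ".temp")
    if (!Files.isDirectory(tempDir)) {
      // No temp directory at all means nothing was left behind.
      List.empty
    } else {
      val stream = Files.newDirectoryStream(tempDir)
      try {
        val entries = List.newBuilder[Path]
        val it = stream.iterator()
        while (it.hasNext) entries += it.next()
        entries.result()
      } finally {
        stream.close()
      }
    }
  }
}
```

After the write in step 1, a test could assert that `TempDirCheck.leftoverTempEntries(basePath)` is empty; before the fix, it would be expected to contain an entry such as `20210723171208`.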

Brief change log

(for example:)

Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

A unit test was added.

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

hudi-bot · 2021-07-23T09:45:24Z

CI report:

9bebeaf Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run travis: re-run the last Travis build
@hudi-bot run azure: re-run the last Azure build

xiarixiaoyao · 2021-07-26T01:35:00Z

@garyli1019 could you help me to review this pr, thanks

garyli1019 · 2021-07-26T03:58:29Z

> @garyli1019 could you help me to review this pr, thanks

@xiarixiaoyao Thanks for your contribution. I am not quite familiar with the clustering code. Might need help from @satishkotha

xiarixiaoyao · 2021-07-26T14:53:01Z

@garyli1019 thanks. @satishkotha could you please help me review this PR?

[HUDI-2214]residual temporary files after clustering are not cleaned up

9bebeaf

garyli1019 assigned satishkotha Jul 26, 2021

garyli1019 requested a review from satishkotha July 26, 2021 03:56

satishkotha merged commit 5353243 into apache:master Jul 26, 2021

liujinhui1994 pushed a commit to liujinhui1994/hudi that referenced this pull request Aug 12, 2021

[HUDI-2214]residual temporary files after clustering are not cleaned up (apache#3335)

19f21b9
