[HUDI-4056] Refine partition matching in when drop / truncate partition.#5533

Closed
jinxing64 wants to merge 1 commit into apache:master from jinxing64:HUDI-4045

@jinxing64
Contributor

What is the purpose of the pull request

In the current code, AlterHoodieTableDropPartitionCommand and TruncateHoodieTableCommand require every partition field to be specified when dropping or truncating partition(s); otherwise they fail with an AnalysisException (HoodieSqlCommonUtils#normalizePartitionSpec).

Native Spark/Hive SQL has no such limitation: partition matching is provided to help users manage partitions easily. For example, a single statement "alter table test drop partition (year='2020')" drops every partition under that year, rather than requiring the user to specify each partition from (year='2020', month='01', day='01') to (year='2020', month='12', day='31').

This PR proposes to refine the partition matching logic for drop / truncate partition and to remove the limitation described above.

Brief change log

  • Refine HoodieSqlCommonUtils#normalizePartitionSpec: reuse Spark utilities for column verification;
  • Add HoodieSqlCommonUtils#getMatchingPartitions as a utility for partition matching;
  • Make the corresponding changes in AlterHoodieTableDropPartitionCommand and TruncateHoodieTableCommand.
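
The matching idea behind the change log can be illustrated with a minimal, dependency-free sketch. It is not the PR's HoodieSqlCommonUtils#getMatchingPartitions; the object name, method names, and the assumption of hive-style "column=value" partition paths are all hypothetical and chosen only for illustration.

```scala
// Illustrative sketch only: names and the hive-style path layout are assumptions,
// not the actual Hudi implementation.
object PartitionMatchingSketch {

  // Parse a path such as "year=2020/month=01/day=01" into column -> value pairs.
  // Segments that are not of the form column=value are ignored.
  def parsePartitionPath(path: String): Map[String, String] =
    path.split("/").toSeq.flatMap { segment =>
      segment.split("=", 2) match {
        case Array(column, value) => Some(column -> value)
        case _                    => None
      }
    }.toMap

  // Keep every partition whose values agree with all columns of the partial spec.
  // Columns missing from the spec are treated as wildcards.
  def matchingPartitions(
      allPartitionPaths: Seq[String],
      partialSpec: Map[String, String]): Seq[String] =
    allPartitionPaths.filter { path =>
      val values = parsePartitionPath(path)
      partialSpec.forall { case (column, value) => values.get(column).contains(value) }
    }
}
```

With a partial spec such as Map("year" -> "2020"), the sketch selects every partition under that year, which mirrors what "alter table test drop partition (year='2020')" is expected to cover.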

Verify this pull request

  • Added tests in TestAlterTableDropPartition and TestTruncateTable

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@jinxing64 jinxing64 changed the title [HUDI-4045] Refine partition matching in when drop / truncate partition. [HUDI-4056] Refine partition matching in when drop / truncate partition. May 9, 2022
@jinxing64
Contributor Author

wip

@jinxing64 jinxing64 force-pushed the HUDI-4045 branch 2 times, most recently from 849f505 to 551c71b Compare May 9, 2022 05:36
val tableName = generateTableName
val tablePath = s"${tmp.getCanonicalPath}/$tableName"

import spark.implicits._
Contributor Author

It makes no sense to configure urlencode=false when a partition value contains a slash ('/'): it creates obstacles when parsing partition values from the path, for example when executing 'show partitions'. I replaced "/" with "#", which still triggers character escaping but doesn't break partition path parsing.
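
A small sketch of why an unescaped slash is a problem, under the assumption that partition paths are hive-style "column=value" segments joined by '/'. The helper names and sample values are hypothetical and do not come from the test in this PR.

```scala
// Illustrative sketch only: hypothetical helpers showing how an unescaped '/'
// inside a partition value changes the number of path segments.
object SlashInPartitionValueSketch {

  // Build a partition path without any escaping (as if url-encoding were disabled).
  def buildPath(spec: Seq[(String, String)]): String =
    spec.map { case (column, value) => s"$column=$value" }.mkString("/")

  // Split a partition path back into its segments.
  def segments(path: String): Seq[String] = path.split("/").toSeq
}
```

For two partition columns, a value such as "2021/10/01" produces four segments instead of two, so the values can no longer be recovered from the path, while "2021#10#01" keeps the expected two segments.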

@hudi-bot
Collaborator

hudi-bot commented May 9, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build
