[HUDI-4396] Add a boolean parameter to decide whether the partition is cascade or not when hive table columns changes #6139

Closed
wants to merge 7 commits into from

honeyaya
Contributor

What is the purpose of the pull request

Add a boolean parameter that controls whether column changes on a Hive table cascade to its partitions.
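
For context, Hive's `ALTER TABLE ... REPLACE COLUMNS` statement accepts an optional `CASCADE` keyword: with it, the column change is also applied to the metadata of existing partitions, while the default (`RESTRICT`) changes only the table-level metadata. A minimal sketch of how a boolean flag could toggle that keyword is shown here. The class and method names are hypothetical and this is not the actual hudi-hive-sync code.

```java
// Hypothetical sketch only: names are illustrative, not taken from hudi-hive-sync.
final class CascadeDdlSketch {

    /**
     * Builds a Hive "ALTER TABLE ... REPLACE COLUMNS" statement.
     * When cascade is true, the CASCADE keyword is appended so that Hive also
     * updates the column metadata of existing partitions. When it is false,
     * Hive's default behavior (RESTRICT) applies and only the table-level
     * metadata changes.
     */
    static String replaceColumnsDdl(String table, String columns, boolean cascade) {
        StringBuilder sql = new StringBuilder("ALTER TABLE ")
            .append(table)
            .append(" REPLACE COLUMNS (")
            .append(columns)
            .append(")");
        if (cascade) {
            sql.append(" CASCADE");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Print both variants for a made-up table and column list.
        System.out.println(replaceColumnsDdl("trips", "id bigint, fare double", true));
        System.out.println(replaceColumnsDdl("trips", "id bigint, fare double", false));
    }
}
```

With the flag set to `false`, partitions that already exist keep their old column metadata until they are altered individually.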

Brief change log

  • Change module: hudi-hive-sync

Verify this pull request

This pull request is already covered by existing tests. The default value of the parameter is true.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@honeyaya honeyaya changed the title HUDI-4396 [HUDI-4396] Add a boolean parameter to decide whether the partition is cascade or not when hive table columns changes Jul 19, 2022
@honeyaya
Contributor Author

@hudi-bot run azure

@honeyaya
Contributor Author

@yihua could you help me review this pr? thanks.

@honeyaya
Contributor Author

@hudi-bot run azure

Comment on lines +132 to +133
@Parameter(names = {"--partition-cascade"}, description = "Partition cascade when table columns change.")
public Boolean partitionCascade;
Member

In general, we should avoid adding parameters/configs to tweak the logic inside the sync. There are already so many configs exposing implementation details and making meta sync hard to use.

For this logic of determining cascade, can you figure out a way to detect it from schema/column changes?

Contributor Author

Hi, good point about the config settings.

As far as I can tell, the cascade decision cannot be detected from schema/column changes. Table parameters might be another way to set this config, but then the user would need to create/alter the table with TBLPROPERTIES: "hoodie.datasource.hive_sync.partition_cascade=false" through HiveSQL, which is not straightforward to use.

Comparing the two plans, I prefer the original one. What's your opinion?
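
To make the table-property alternative mentioned above concrete, here is a rough sketch of resolving the flag from a table's properties, falling back to the default of true stated in the PR description. The property key is the one quoted in this comment. The class name, method name, and the idea of passing the properties as a plain `Map` are assumptions for illustration, not part of Hudi.

```java
import java.util.Map;

// Hypothetical sketch only: not part of hudi-hive-sync.
final class CascadeFromTableProperties {

    // Property key as quoted in the discussion above.
    static final String CASCADE_KEY = "hoodie.datasource.hive_sync.partition_cascade";

    /**
     * Returns the cascade setting stored in the table properties.
     * A missing key falls back to true (the default named in the PR).
     * Any value other than "true" (case-insensitive) is treated as false,
     * because that is how Boolean.parseBoolean behaves.
     */
    static boolean resolveCascade(Map<String, String> tableProperties) {
        String raw = tableProperties.get(CASCADE_KEY);
        return raw == null || Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolveCascade(java.util.Collections.emptyMap()));
        System.out.println(resolveCascade(
            java.util.Collections.singletonMap(CASCADE_KEY, "false")));
    }
}
```

The drawback raised in the comment still applies: the value has to be written into the table properties through HiveSQL before the sync can read it.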

@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua added the meta-sync, priority:major (degraded perf; unable to move forward; potential bugs), and priority:minor (everything else; usability gaps; questions; feature reqs) labels and removed the priority:major label on Sep 12, 2022
@honeyaya honeyaya closed this Oct 25, 2022
Labels
meta-sync priority:minor everything else; usability gaps; questions; feature reqs
Projects
Status: Done
4 participants