[HUDI-6551] A new slashed month partition value extractor#9184

Closed
ban1989ban wants to merge 4 commits into apache:master from ban1989ban:feature/MonthWisePartitioner

@ban1989ban ban1989ban commented Jul 13, 2023

Change Logs

Adds support for a month-wise partitioner in Hudi Hive sync.

Impact

With this change, users can run Hudi Hive sync on tables that require a month-wise partitioner by passing --partition-value-extractor org.apache.hudi.hive.SlashEncodedMonthPartitionValueExtractor

Risk level (write none, low medium or high below)

None

Documentation Update

To use the month-wise partitioner with the Hive sync tool for Hudi tables, use the following command:
$HUDI_HOME/hudi-sync/hudi-hive-sync/run_sync_tool.sh --jdbc-url jdbc:hive2://localhost:10000 --partitioned-by partitionid --base-path "hdfs://NameNodeIp:port/<path to table>" --user hive --pass hive --database default --table <table_name> --partition-value-extractor org.apache.hudi.hive.SlashEncodedMonthPartitionValueExtractor

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

* in the format 'yyyy/mm'.
*/
public class SlashEncodedYearMonthPartitionValueExtractor implements PartitionValueExtractor {

Contributor

@danny0405 danny0405 Jul 18, 2023

How about we name it: SlashEncodedMonthPartitionValueExtractor

Author

@ban1989ban ban1989ban Jul 18, 2023

Yeah, we can do this.

Author

I have made the changes.

@danny0405 danny0405 changed the title from "Feature/month wise partitioner" to "[HUDI-6551] A new slashed month partition value extractor" Jul 18, 2023
@danny0405 danny0405 self-assigned this Jul 19, 2023
@danny0405 danny0405 added the component:catalog-sync Catalog-sync related label Jul 19, 2023
* PartitionValueExtractor interface to support extracting partition values from paths
* in the format 'yyyy/mm'.
*/
public class SlashEncodedMonthPartitionValueExtractor implements PartitionValueExtractor {
Contributor

The functionality of this extractor is covered by SinglePartPartitionValueExtractor. Is SinglePartPartitionValueExtractor not enough in your use case?

Contributor

Maybe not, the user actually wants the data format with yyyy/mm and ignores the datetime.

Contributor

I understand that. SinglePartPartitionValueExtractor serves the same purpose by transforming yyyy/mm to yyyy-mm.

Contributor

The user wants yyyy/mm instead.

Contributor

Not following, the logic and test show that yyyy/mm is transformed to yyyy-mm during extraction.
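
For readers following the thread, the yyyy/mm to yyyy-mm transformation under discussion can be sketched as below. This is a minimal illustration, not the code from this PR: the nested `PartitionValueExtractor` interface is a local stand-in assumed to mirror Hudi's interface of the same name, and the class names `MonthPartitionSketch` and `SlashEncodedMonthSketch` are hypothetical.

```java
import java.util.Collections;
import java.util.List;

public class MonthPartitionSketch {

    /** Local stand-in (assumption) for Hudi's PartitionValueExtractor interface. */
    interface PartitionValueExtractor {
        List<String> extractPartitionValuesInPath(String partitionPath);
    }

    /** Hypothetical extractor: maps a "yyyy/mm" partition path to a single "yyyy-mm" value. */
    static class SlashEncodedMonthSketch implements PartitionValueExtractor {
        @Override
        public List<String> extractPartitionValuesInPath(String partitionPath) {
            // Accept exactly four digits, a slash, and two digits.
            if (!partitionPath.matches("\\d{4}/\\d{2}")) {
                throw new IllegalArgumentException(
                    "Partition path " + partitionPath + " is not in the form yyyy/mm");
            }
            // Replace the path separator so the value reads as yyyy-mm.
            return Collections.singletonList(partitionPath.replace('/', '-'));
        }
    }

    public static void main(String[] args) {
        PartitionValueExtractor extractor = new SlashEncodedMonthSketch();
        System.out.println(extractor.extractPartitionValuesInPath("2023/07")); // prints [2023-07]
    }
}
```

The sketch rejects any path that is not exactly yyyy/mm rather than passing it through, which is one possible design choice; whether the extractor in this PR or `SinglePartPartitionValueExtractor` validates input the same way is not shown in this conversation.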

Contributor

@banank1989 could you clarify?

Contributor

@bvaradar bvaradar left a comment

Agree with @yihua: the functionality is available in SinglePartPartitionValueExtractor. @banank1989: is it OK if we close this PR?

Contributor

yihua commented Oct 24, 2023

Closing this PR now. @banank1989 feel free to reopen it if you need additional functionality.

Labels

component:catalog-sync Catalog-sync related

Projects

Status: ✅ Done

6 participants