Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-7183] Fix static insert overwrite partitions issue #10254

Merged
merged 2 commits into from
Dec 17, 2023

Conversation

wecharyu
Copy link
Contributor

@wecharyu wecharyu commented Dec 6, 2023

Change Logs

  1. Refactor overwrite config deduce:
    Record matching static partitions for static insert_overwrite operation;
    Read replaced partition fileIds through write status for dynamic insert overwrite.
  2. Reorder partition fields in hoodie table schema.
  3. Add test cases for insert overwrite multiple partitioned table.

Impact

Fix the issue that static insert overwrite not work for multiple partitioned table.

Risk level (write none, low medium or high below)

low.

Documentation Update

none.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@danny0405
Copy link
Contributor

cc @boneanxs for the reiview

}
val overwriteMode = if (isOverWriteTable) SaveMode.Overwrite else SaveMode.Append
val staticPartitions = if (isStaticOverwrite && !isOverWriteTable) {
val fileIndex = HoodieFileIndex(sparkSession, catalogTable.metaClient, None, combinedOpts, FileStatusCache.getOrCreate(sparkSession))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @beyond1920, this pr tries to align with Spark that only under static overwrite mode, insert overwrite an empty dataset will overwrite the old data, this is a breaking change with #9811. Is that suitable to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the behavior makes sense only with static partitioning. #9811 covers the empty input df case which is only relevant with static partitioning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late response. @boneanxs I think the behavior is reasonable.

}
val staticOverwritePartitionPathOpt = getStaticOverwritePartitionPath(catalogTable, partitionSpec, isOverWritePartition)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old method can be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bvaradar
Copy link
Contributor

@wecharyu : can you look into the test case failures

@bvaradar bvaradar self-assigned this Dec 13, 2023
@apache apache deleted a comment from hudi-bot Dec 15, 2023
@boneanxs
Copy link
Contributor

@hudi-bot run azure

@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Copy link
Contributor

@bvaradar bvaradar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bvaradar bvaradar merged commit 27ac428 into apache:master Dec 17, 2023
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

6 participants