I am also facing this issue in Hudi version 0.11.1. The strange thing is that it occurs on some tables some of the time, but not on all of the Hudi tables I am managing.
Hi Hudi Team,
Is it possible to change the behaviour of Hudi when specifying the hoodie.datasource.write.partitionpath.field configuration for a table? I notice that the data is partitioned as expected; however, the dataset also contains the columns that were specified in the hoodie.datasource.write.partitionpath.field configuration. This behaviour differs from the native spark.write.partitionBy operation, which partitions the data on the specified columns and removes those columns from the dataset. Is there a way to match this behaviour? Here is an example of the behaviour I am referring to: https://stackoverflow.com/questions/36164914/prevent-dataframe-partitionby-from-removing-partitioned-columns-from-schema/47104251
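For reference, recent Hudi releases expose a write config, hoodie.datasource.write.drop.partition.columns, that is intended to produce the partitionBy-like behaviour described above. A minimal PySpark-style sketch (untested; the table name, record key, and partition columns are hypothetical placeholders):

```python
# Sketch: Hudi write options that ask Hudi not to persist the partition
# columns inside the data files, mirroring spark.write.partitionBy.
# Assumes a Hudi release where hoodie.datasource.write.drop.partition.columns
# is available; all field names below are illustrative only.
hudi_options = {
    "hoodie.table.name": "my_table",                          # hypothetical
    "hoodie.datasource.write.recordkey.field": "id",          # hypothetical
    "hoodie.datasource.write.partitionpath.field": "region",  # hypothetical
    # Drop the partition column(s) from the stored dataset:
    "hoodie.datasource.write.drop.partition.columns": "true",
}

# With a live SparkSession and Hudi bundle on the classpath, the write
# itself would look something like:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

Note that the partition values then live only in the partition path, so readers must rely on Hudi/Spark re-deriving those columns at query time.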
Cheers,
Brandon Stanley