[HUDI-8631] Fix the bug where the Flink table config hoodie.populate.meta.fields is not effective and optimize write performance #12404
Dear Danny, could you please take a look at this PR? @danny0405
Summary
1. When enable
@usberkeley, @yihua, @danny0405
which will lead to … But at this stage the file writer has already been initialized with the row type: … So we can use the row type without metadata columns and, in append mode, write only the initial Flink row data.
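The idea in the comment above (initialize the writer with the plain row type and pass rows through unchanged when meta fields are off) can be sketched as below. This is a minimal illustration, not Hudi's writer code: the class and the helpers `writerFields` and `toWriteRow` are hypothetical, and `Object[]` stands in for Flink's `RowData`. Only the five `_hoodie_*` column names are Hudi's real metadata columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the actual Hudi/Flink writer implementation.
public class MetaFieldsSketch {
    // The five Hudi metadata columns that are prepended when meta fields are populated.
    static final List<String> META_FIELDS = Arrays.asList(
        "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
        "_hoodie_partition_path", "_hoodie_file_name");

    // Hypothetical helper: the column list the file writer would be initialized with.
    static List<String> writerFields(List<String> dataFields, boolean populateMetaFields) {
        if (!populateMetaFields) {
            // No metadata columns: the writer schema is just the Flink row type.
            return Collections.unmodifiableList(new ArrayList<>(dataFields));
        }
        List<String> fields = new ArrayList<>(META_FIELDS);
        fields.addAll(dataFields);
        return Collections.unmodifiableList(fields);
    }

    // Hypothetical helper: the values handed to the writer for one row.
    static Object[] toWriteRow(Object[] dataRow, boolean populateMetaFields, Object[] metaValues) {
        if (!populateMetaFields) {
            // Append mode without meta fields: pass the incoming row through unchanged (no copy).
            return dataRow;
        }
        if (metaValues == null || metaValues.length != META_FIELDS.size()) {
            throw new IllegalArgumentException("expected " + META_FIELDS.size() + " meta values");
        }
        // With meta fields: allocate a wider row and copy meta values followed by data values.
        Object[] out = new Object[META_FIELDS.size() + dataRow.length];
        System.arraycopy(metaValues, 0, out, 0, metaValues.length);
        System.arraycopy(dataRow, 0, out, metaValues.length, dataRow.length);
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("id", "name", "ts");
        System.out.println(writerFields(data, false));       // prints [id, name, ts]
        System.out.println(writerFields(data, true).size()); // prints 8
    }
}
```

The design point the sketch tries to capture is that the meta-fields-off path avoids building a wider row per record, which is one plausible source of the write speedup reported in this PR.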
@usberkeley, @yihua, @danny0405 For support of … By any chance, could this MR cover refactoring and optimizing everything related to …?
@usberkeley, if you don't mind, it would be better to reopen and use the already created HUDI-8308 for the optimizations.
Dear geserdugarov, this looks amazing. I've been busy with a major feature recently, so I'll take a closer look at your PR a bit later.
Change Logs
1. Fix the bug where hoodie.populate.meta.fields in the table config (hoodie.properties) is not effective
2. Optimize write performance
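For change 1, one way to picture "the table config is effective" is that the value persisted in `hoodie.properties` is what the Flink writer actually uses. The sketch below assumes a hypothetical precedence rule (table config first, then the job option, then a default of `true`, which is Hudi's documented default for this key); the class and method names are illustrative and are not part of Hudi.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch of resolving hoodie.populate.meta.fields; not Hudi's actual code.
public class PopulateMetaFieldsConfig {
    static final String KEY = "hoodie.populate.meta.fields";
    // Assumed default: meta fields are populated unless configured otherwise.
    static final boolean DEFAULT_VALUE = true;

    // Assumed precedence: table config (hoodie.properties) > job option > default.
    static boolean resolvePopulateMetaFields(Properties tableConfig, Properties jobOptions) {
        String fromTable = tableConfig.getProperty(KEY);
        if (fromTable != null) {
            return Boolean.parseBoolean(fromTable.trim());
        }
        String fromJob = jobOptions.getProperty(KEY);
        return fromJob != null ? Boolean.parseBoolean(fromJob.trim()) : DEFAULT_VALUE;
    }

    // Parses properties text in the same key=value format as hoodie.properties.
    static Properties parse(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }

    public static void main(String[] args) throws IOException {
        Properties table = parse("hoodie.table.name=t1\nhoodie.populate.meta.fields=false\n");
        Properties job = new Properties(); // the Flink job did not set the option
        System.out.println(resolvePopulateMetaFields(table, job)); // prints false
    }
}
```

Whether the real fix gives the table config priority over an explicit job option is not stated in this PR description; the rule above is only an assumption chosen to make the idea concrete.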
Impact
Improve write performance. After optimization, writing with hoodie.populate.meta.fields=false takes 42.9% less time than with hoodie.populate.meta.fields=true (12 hours 14 minutes vs. 21 hours 25 minutes in the test below).
Testing method
Consume from the earliest position in Kafka until all messages are consumed (Kafka Lag = 0), and compare the time taken by the two configurations.
1) Populate meta fields

Time taken: 21 hours 25 minutes
2) No meta fields

Time taken: 12 hours 14 minutes
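The 42.9% figure can be checked from the two measured times. Assuming both runs consumed the same Kafka backlog, the same numbers also correspond to roughly 1.75x throughput; the helper names in this small calculation are illustrative only.

```java
// Recomputes the reported improvement from the two measured wall-clock times.
public class SpeedupCheck {
    static int minutes(int hours, int mins) {
        return hours * 60 + mins;
    }

    // Fraction of wall-clock time saved relative to the baseline, as a percentage.
    static double timeSavedPercent(int baselineMin, int optimizedMin) {
        return (1.0 - (double) optimizedMin / baselineMin) * 100.0;
    }

    // Throughput gain for the same amount of data, as a percentage.
    static double throughputGainPercent(int baselineMin, int optimizedMin) {
        return ((double) baselineMin / optimizedMin - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        int withMeta = minutes(21, 25);    // 1285 minutes
        int withoutMeta = minutes(12, 14); // 734 minutes
        System.out.printf("time saved: %.1f%%%n", timeSavedPercent(withMeta, withoutMeta));           // 42.9%
        System.out.printf("throughput gain: %.1f%%%n", throughputGainPercent(withMeta, withoutMeta)); // 75.1%
    }
}
```

So "42.9% faster" reads as 42.9% less wall-clock time; expressed as throughput, the same run is about 75% faster.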
Risk level (write none, low, medium or high below)
medium
Documentation Update
none
Contributor's checklist