[HUDI-5157] Adding capability to remove all meta fields from source hudi table with Hudi incr source #7132
final Dataset<Row> src = source.drop(HoodieRecord.HOODIE_META_COLUMNS.stream()
.filter(x -> !x.equals(HoodieRecord.PARTITION_PATH_METADATA_FIELD)).toArray(String[]::new));
String[] colsToDrop = dropAllMetaFields ? HoodieRecord.HOODIE_META_COLUMNS.stream().toArray(String[]::new) :
HoodieRecord.HOODIE_META_COLUMNS.stream().filter(x -> !x.equals(HoodieRecord.PARTITION_PATH_METADATA_FIELD)).toArray(String[]::new);
Do we know what was the reason we strictly dropped partition path in the first place?
I could not decode that. I don't see a need unless we want to carry over the partitioning from tableA to tableB.
…ith hudi incr source (apache#7132)
…ith hudi incr source (apache#7132) (apache#162) Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
Change Logs
HoodieIncrSource was dropping every meta field from the source except the partition path. When three Hudi tables are chained (tableB incrementally reads from tableA, and tableC incrementally reads from tableB), the read from tableB into tableC fails with duplicate columns. This change adds a config to drop all meta fields. Tested by chaining three Hudi tables in a row, and it worked.
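The column selection described above can be sketched without Spark or Hudi on the classpath. This is a minimal illustration, not the code from the PR: the class `MetaFieldSelection` and the method `colsToDrop` are hypothetical names, and the meta column names are written out as plain strings to stand in for `HoodieRecord.HOODIE_META_COLUMNS`.

```java
import java.util.Arrays;
import java.util.List;

public class MetaFieldSelection {
    // Stand-in for HoodieRecord.HOODIE_META_COLUMNS (assumption: copied here
    // as literals so the sketch compiles without Hudi).
    public static final List<String> HOODIE_META_COLUMNS = Arrays.asList(
        "_hoodie_commit_time",
        "_hoodie_commit_seqno",
        "_hoodie_record_key",
        "_hoodie_partition_path",
        "_hoodie_file_name");

    // Stand-in for HoodieRecord.PARTITION_PATH_METADATA_FIELD.
    public static final String PARTITION_PATH_METADATA_FIELD = "_hoodie_partition_path";

    // Hypothetical helper: returns the meta columns to drop from the source rows.
    // With the flag off, the partition path column is kept (the earlier behavior);
    // with the flag on, every meta column is dropped.
    public static String[] colsToDrop(boolean dropAllMetaFields) {
        return dropAllMetaFields
            ? HOODIE_META_COLUMNS.toArray(new String[0])
            : HOODIE_META_COLUMNS.stream()
                .filter(c -> !c.equals(PARTITION_PATH_METADATA_FIELD))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(colsToDrop(false)));
        System.out.println(Arrays.toString(colsToDrop(true)));
    }
}
```

In the actual source, the resulting array would be passed to `Dataset.drop(String...)` on the incrementally read rows.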
Impact
Any number of Hudi tables can now be chained with HoodieIncrSource.
Risk level (write none, low, medium or high below)
low.
Documentation Update
New config added:
hoodie.deltastreamer.source.hoodieincr.drop.all.meta.fields.from.source
Default value is false. When set to true, chaining any number of tables works.
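As a rough sketch of how the flag and its default might be read, the example below uses `java.util.Properties` as a stand-in for the properties object Hudi passes to sources. The class `DropAllMetaFieldsConfig` and the method `dropAllMetaFields` are hypothetical; only the config key and its default of false come from this PR.

```java
import java.util.Properties;

public class DropAllMetaFieldsConfig {
    // Config key added by this PR.
    public static final String DROP_ALL_META_FIELDS =
        "hoodie.deltastreamer.source.hoodieincr.drop.all.meta.fields.from.source";

    // Hypothetical helper: reads the flag, falling back to the documented default (false).
    public static boolean dropAllMetaFields(Properties props) {
        return Boolean.parseBoolean(props.getProperty(DROP_ALL_META_FIELDS, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Key not set: the default applies and the partition path column is kept.
        System.out.println(dropAllMetaFields(props));

        // Opt in for chained tables (tableA -> tableB -> tableC).
        props.setProperty(DROP_ALL_META_FIELDS, "true");
        System.out.println(dropAllMetaFields(props));
    }
}
```

In a properties file handed to the ingestion job, the equivalent would be a single line setting the key to `true`.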