[HUDI-5157] Adding capability to remove all meta fields from source hudi table with Hudi incr source #7132

Merged: 1 commit, Nov 22, 2022

@nsivabalan (Contributor) commented Nov 3, 2022

Change Logs

HoodieIncrSource dropped every meta field from the source except the partition path. When three Hudi tables are chained (tableB incrementally reads from tableA, and tableC incrementally reads from tableB), the third table fails with duplicate columns when reading from tableB. This PR adds a config to drop all meta fields. Tested by chaining three Hudi tables in a row, and it worked.

Impact

Any number of Hudi tables can now be chained with HoodieIncrSource.

Risk level (write none, low, medium, or high below)

low.

Documentation Update

New config added:
hoodie.deltastreamer.source.hoodieincr.drop.all.meta.fields.from.source
The default value is false. When set to true, all meta fields are dropped from the source table, which allows chaining any number of tables.
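
To illustrate the default behavior described above, here is a minimal sketch of reading the new flag. It uses plain `java.util.Properties` as a stand-in for Hudi's own properties class, and the class name and the `shouldDropAllMetaFields` helper are hypothetical names used only for this illustration. Only the config key and its default come from this PR.

```java
import java.util.Properties;

final class DropAllMetaFieldsConfigSketch {
    // Config key introduced by this PR.
    static final String DROP_ALL_META_FIELDS_KEY =
        "hoodie.deltastreamer.source.hoodieincr.drop.all.meta.fields.from.source";

    // Hypothetical helper: reads the flag, falling back to the documented default of false.
    static boolean shouldDropAllMetaFields(Properties props) {
        return Boolean.parseBoolean(props.getProperty(DROP_ALL_META_FIELDS_KEY, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Key not set: the default applies.
        System.out.println(shouldDropAllMetaFields(props));

        // Opt in to dropping all meta fields so that tables can be chained.
        props.setProperty(DROP_ALL_META_FIELDS_KEY, "true");
        System.out.println(shouldDropAllMetaFields(props));
    }
}
```

In a real pipeline the key would presumably be set in the properties passed to the streamer job for the downstream table. How those properties are supplied is outside the scope of this sketch.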

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan added labels on Nov 3, 2022: priority:critical (production down; pipelines stalled; need help asap), hudistreamer (issues related to Hudi streamer, formerly deltastreamer)
final Dataset<Row> src = source.drop(HoodieRecord.HOODIE_META_COLUMNS.stream()
    .filter(x -> !x.equals(HoodieRecord.PARTITION_PATH_METADATA_FIELD)).toArray(String[]::new));
String[] colsToDrop = dropAllMetaFields
    ? HoodieRecord.HOODIE_META_COLUMNS.stream().toArray(String[]::new)
    : HoodieRecord.HOODIE_META_COLUMNS.stream().filter(x -> !x.equals(HoodieRecord.PARTITION_PATH_METADATA_FIELD)).toArray(String[]::new);
Member

Do we know why every meta field except the partition path was dropped in the first place?

Contributor Author

I could not figure that out. I don't see a need for it unless we want to carry over the partitioning from tableA to tableB.
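
The two modes discussed in this thread can be sketched without Spark as follows. This is a simplified model, not the PR's code: a plain `List<String>` of column names stands in for `Dataset<Row>`, `remainingColumns` is a hypothetical stand-in for `Dataset.drop`, and the meta column names are written out as literals on the assumption that they match the contents of `HoodieRecord.HOODIE_META_COLUMNS`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MetaFieldDropSketch {
    static final String PARTITION_PATH_METADATA_FIELD = "_hoodie_partition_path";

    // Assumed to mirror HoodieRecord.HOODIE_META_COLUMNS.
    static final List<String> HOODIE_META_COLUMNS = Arrays.asList(
        "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
        PARTITION_PATH_METADATA_FIELD, "_hoodie_file_name");

    // Same selection as the diff: keep the partition path unless dropAllMetaFields is set.
    static String[] colsToDrop(boolean dropAllMetaFields) {
        return HOODIE_META_COLUMNS.stream()
            .filter(c -> dropAllMetaFields || !c.equals(PARTITION_PATH_METADATA_FIELD))
            .toArray(String[]::new);
    }

    // Hypothetical stand-in for Dataset.drop(String...): returns the columns that are left.
    static List<String> remainingColumns(List<String> sourceColumns, boolean dropAllMetaFields) {
        List<String> remaining = new ArrayList<>(sourceColumns);
        remaining.removeAll(Arrays.asList(colsToDrop(dropAllMetaFields)));
        return remaining;
    }

    public static void main(String[] args) {
        // Illustrative source schema: the meta columns plus two made-up data columns.
        List<String> source = new ArrayList<>(HOODIE_META_COLUMNS);
        source.addAll(Arrays.asList("id", "value"));

        System.out.println(remainingColumns(source, false)); // partition path is carried over
        System.out.println(remainingColumns(source, true));  // only data columns remain
    }
}
```

With the flag off, `_hoodie_partition_path` survives into the downstream table's input, which matches the carry-over behavior mentioned in the reply above. With the flag on, no meta column is passed along.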

@codope changed the title from "[HUDI-51577] Adding capability to remove all meta fields from source hudi table with Hudi incr source" to "[HUDI-5157] Adding capability to remove all meta fields from source hudi table with Hudi incr source" on Nov 22, 2022
@codope merged commit ceb94b4 into apache:master on Nov 22, 2022
satishkotha pushed a commit to satishkotha/incubator-hudi that referenced this pull request Dec 12, 2022
alexeykudinkin pushed a commit to onehouseinc/hudi that referenced this pull request Dec 14, 2022
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
vinishjail97 pushed a commit to vinishjail97/hudi that referenced this pull request Dec 15, 2023
…ith hudi incr source (apache#7132) (apache#162)

Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>