[HUDI-3103] Enable MultiTableDeltaStreamer to update a single target table from multiple source tables. #4637

Closed
watermelon12138 wants to merge 1 commit into apache:master from watermelon12138:SupportForMultiSource

@watermelon12138
Contributor

What is the purpose of the pull request

This pull request enables HoodieMultiTableDeltaStreamer to update a single target table from multiple source tables.

Brief change log

  • Modify HoodieMultiTableDeltaStreamer so that it generates each table's execution context based on its source tables.
  • Modify DeltaSync.java so that a source table can be associated with other tables and each source can be configured with an independent checkpoint.
  • Add unit tests.

@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

Contributor

@nsivabalan nsivabalan left a comment

I haven't taken a detailed look at the patch, but it seems we are changing DeltaSync to make it aware of multiple sources/tables.

I would prefer to keep DeltaSync unaware of multiple tables. It is meant to read from a single source and write to a single Hudi table.

CC @pratyakshsharma @codope

@watermelon12138
Contributor Author

@nsivabalan
Thank you for the advice.
However, the way DeltaSync calculates resumeCheckpointStr only works when a single source updates a single target table. When multiple sources update one target table, that calculation breaks down: each source needs its own independent checkpoint so that it can resume independently of the others. In DeltaSync, I only changed how resumeCheckpointStr is calculated and how checkpoints are saved to checkpointCommitMetadata, and I did this by adding new methods. The calculation logic for a single source updating a single target is not affected.
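
To illustrate the idea of independent per-source checkpoints, here is a minimal, self-contained sketch. It is not the code from this patch and does not use any Hudi API: the class name, the key prefix, and the modeling of the timeline as a list of metadata maps are all assumptions made for illustration. Each commit records a checkpoint under a key that includes the source's identifier, and resuming a source means walking the commits from newest to oldest until one written by that source is found.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: per-source checkpoints kept in commit metadata.
final class PerSourceCheckpointSketch {
    // Hypothetical key prefix; NOT a key from the Hudi code base.
    static final String KEY_PREFIX = "sketch.checkpoint.";

    // The timeline is modeled as per-commit metadata maps, oldest first.
    private final List<Map<String, String>> commits = new ArrayList<>();

    // Record a commit made by one source, storing that source's checkpoint.
    public void commit(String sourceId, String checkpoint) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(KEY_PREFIX + sourceId, checkpoint);
        commits.add(metadata);
    }

    // Find the checkpoint to resume from for one source by scanning
    // commits from newest to oldest; commits from other sources are skipped.
    public Optional<String> resumeCheckpoint(String sourceId) {
        for (int i = commits.size() - 1; i >= 0; i--) {
            String value = commits.get(i).get(KEY_PREFIX + sourceId);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

With a single shared checkpoint, the latest commit from one source would overwrite the resume position of every other source. In this sketch, a commit from a hypothetical source "users" leaves the resume position of a source "orders" untouched, and a source that has never committed gets an empty result and would start from its configured initial position.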

@nsivabalan
Contributor

oh I see. my bad, I did not look closely. sorry about that. We can keep it open. give me a day or two. I will take a detailed look and leave comments.

@nsivabalan nsivabalan reopened this Jan 20, 2022
@watermelon12138
Contributor Author

> oh I see. my bad, I did not look closely. sorry about that. We can keep it open. give me a day or two. I will take a detailed look and leave comments.

@nsivabalan
OK, thank you very much. You can see my new pull request. I want to close the current pull request.
#4645

@nsivabalan nsivabalan closed this Jan 20, 2022