[HUDI-2413] fix Sql source's checkpoint issue #3648

Merged on Feb 14, 2022 (4 commits)

fengjian428
Contributor

@fengjian428 fengjian428 commented Sep 13, 2021

https://issues.apache.org/jira/browse/HUDI-2413

What is the purpose of the pull request

  • SqlSource does not produce a checkpoint and is usually used only for one-time backfill use cases. This PR therefore adds a DeltaStreamer config named "--allow-commit-on-no-checkpoint-change".
  • Users are expected to set this config to true when using SqlSource.
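
As a rough illustration of what such a flag could gate, the sketch below models the commit decision. Only the flag name comes from this PR; the class `CommitDecisionSketch`, the method `shouldCommit`, and its parameters are hypothetical and are not taken from the DeltaStreamer code.

```java
import java.util.Objects;

// Hypothetical model of the commit decision; not the actual DeltaSync code.
final class CommitDecisionSketch {

  /**
   * @param previousCheckpoint checkpoint stored by the last commit, or null if there is none
   * @param newCheckpoint checkpoint returned by the source, or null if it provides none
   * @param allowCommitOnNoCheckpointChange models --allow-commit-on-no-checkpoint-change
   * @return true if the batch should be committed
   */
  static boolean shouldCommit(String previousCheckpoint, String newCheckpoint,
                              boolean allowCommitOnNoCheckpointChange) {
    // Assumption: an unchanged checkpoint is what signals "no new data".
    boolean checkpointUnchanged = Objects.equals(previousCheckpoint, newCheckpoint);
    return allowCommitOnNoCheckpointChange || !checkpointUnchanged;
  }

  public static void main(String[] args) {
    // A source without checkpoints reports null on every run.
    System.out.println(shouldCommit(null, null, false)); // false: batch is skipped
    System.out.println(shouldCommit(null, null, true));  // true: batch is committed
  }
}
```

Under these assumptions, a checkpoint-less source only gets its data committed when the flag is set, which matches the expectation stated above for SqlSource.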

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

This change added tests and can be verified as follows:

  • Added tests to TestSqlSource
  • Added tests to TestHoodieDeltaStreamer to test SqlSource.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@pratyakshsharma
Contributor

@fengjian428 Just a high level question, how is empty string solving the problems caused by null checkpoint?

@fengjian428
Contributor Author

fengjian428 commented Sep 13, 2021

> @fengjian428 Just a high level question, how is empty string solving the problems caused by null checkpoint?

In DeltaSync (line 432), if both resumeCheckpointStr and checkpointStr are null, the run is judged to have no new data and the remaining steps are skipped. In this case, SqlSource is designed to run as a one-time job.

@dongkelun
Contributor

dongkelun commented Sep 17, 2021

Hi, I ran into the same problem earlier and simply wrote the checkpoint as "0". Overall LGTM, but I have a question: is the empty string redundant with the 'formatAdapter.getSource() instanceof SqlSource' check, and would it be enough to add a check only in DeltaSync?
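
The behaviour discussed in the comments above can be sketched as a plain equality check between the checkpoint to resume from and the checkpoint the source returns. This is a minimal model under that assumption, not code copied from DeltaSync; the class `CheckpointComparisonSketch` and the method `treatedAsNoNewData` are hypothetical names.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical model of the checkpoint comparison; not copied from DeltaSync.
final class CheckpointComparisonSketch {

  // Returns true when the run would be treated as "no new data" and skipped.
  // Assumption: the decision is an equality check between the two checkpoints.
  static boolean treatedAsNoNewData(Optional<String> resumeCheckpoint, String fetchedCheckpoint) {
    return Objects.equals(fetchedCheckpoint, resumeCheckpoint.orElse(null));
  }

  public static void main(String[] args) {
    Optional<String> firstRun = Optional.empty(); // no earlier commit to resume from

    // Source returns no checkpoint: null equals null, so the run is skipped.
    System.out.println(treatedAsNoNewData(firstRun, null)); // true

    // Placeholder checkpoints ("" or "0") differ from null, so the run proceeds.
    System.out.println(treatedAsNoNewData(firstRun, ""));   // false
    System.out.println(treatedAsNoNewData(firstRun, "0"));  // false

    // A later run returning the same placeholder would be skipped again.
    System.out.println(treatedAsNoNewData(Optional.of("0"), "0")); // true
  }
}
```

In this model a placeholder value only helps on the first run, which fits a source that is meant for one-time jobs.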

@nsivabalan
Contributor

@codope : can you review this as well.

Member

@vinothchandar vinothchandar left a comment

I think the nicer way would be to not have deltastreamer error out (or at least to control this behavior) if there is no checkpoint, but to provide an empty one? This PR adds source-specific checks to the DeltaSync class, which I'd like to avoid.

@vinothchandar
Member

Good for @codope to review

@fengjian428
Contributor Author

> I think the nicer way would be to not have deltastreamer error out (or at least to control this behavior) if there is no checkpoint, but to provide an empty one? This PR adds source-specific checks to the DeltaSync class, which I'd like to avoid.

I removed the part that throws the exception and added a warning instead.

@nsivabalan
Contributor

nsivabalan commented Oct 6, 2021

@fengjian428 : can you please address the feedback when you get a chance.

@nsivabalan
Contributor

@fengjian428 : Do you think you can address the feedback in the next 2 to 3 weeks? We would like to get this in for 0.11.0. If you are occupied, one of us from the community can take it forward. Let us know.

@nsivabalan nsivabalan added the "priority:critical" label (production down; pipelines stalled; need help asap) on Feb 8, 2022

@nsivabalan
Contributor

@codope : this is ready for review now.

Member

@codope codope left a comment

LGTM. Just one minor comment.

Member

@codope codope left a comment

LGTM. CI passed.

@codope codope merged commit 55777fe into apache:master Feb 14, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
* [HUDI-2413] fix Sql source's checkpoint

* Fixing sql source checkpoint handling

* Fixing docs

Co-authored-by: jian.feng <fengjian428@gmial.com>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
@fengjian428 fengjian428 deleted the HUDI-2413 branch December 29, 2022 07:50