[HUDI-1790] Added SqlSource to fetch data from any partitions for backfill use case #2896
Codecov Report
@@ Coverage Diff @@
## master #2896 +/- ##
============================================
- Coverage 55.35% 8.43% -46.92%
+ Complexity 4025 62 -3963
============================================
Files 520 70 -450
Lines 25291 2880 -22411
Branches 2872 359 -2513
============================================
- Hits 13999 243 -13756
+ Misses 9905 2616 -7289
+ Partials 1387 21 -1366
Hi @n3nash, can you please review this PR, since you have more context on this feature?
public void testSqlSource() throws IOException {
UtilitiesTestBase.dfs.mkdirs(new Path(dfsRoot));
TypedProperties props = new TypedProperties();
props.setProperty("hoodie.deltastreamer.source.sql", "select * from test_sql_table");
Minor: see if we can reuse the variable declared in the source code rather than hardcoding the config key here, so we avoid any typos.
The variable declared in the source is private, so I hardcoded the key here. To avoid typos, I made it a final variable and used it everywhere.
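The approach described in this reply can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name and the constant name `SOURCE_SQL_KEY` are hypothetical, and `java.util.Properties` stands in for Hudi's `TypedProperties` so that the sketch runs without Hudi on the classpath. Only the key string and the query come from the test snippet above.

```java
import java.util.Properties;

public class SqlSourceConfigKeySketch {
    // Key string copied from the test snippet; the constant name is hypothetical.
    static final String SOURCE_SQL_KEY = "hoodie.deltastreamer.source.sql";

    // Stores the query under the shared constant, so the literal key is typed only once.
    // java.util.Properties is a stand-in for Hudi's TypedProperties (an assumption for this sketch).
    static Properties buildProps(String query) {
        Properties props = new Properties();
        props.setProperty(SOURCE_SQL_KEY, query);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps("select * from test_sql_table");
        // Read the value back through the same constant.
        System.out.println(props.getProperty(SOURCE_SQL_KEY));
    }
}
```

Every test that needs the key would then refer to the constant, so a typo in the key shows up as a compile error in one place rather than as a silently ignored property.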
@vingov: Hey Vinoth, did you get a chance to check out my feedback? We can merge this in once it is addressed.
@nsivabalan - I've addressed all the review comments. Please review it again, thanks!
b2cba5c to 9ac435c
The test failures are not related to this change; they are in the hudi-client/hudi-spark-client modules.
Awesome, LGTM. I have added the test flakiness to the tracking ticket: https://issues.apache.org/jira/browse/HUDI-1989
Thanks for your contribution :)
What is the purpose of the pull request
This pull request adds a new source, SqlSource, to delta streamer. It performs snapshot queries and is mainly intended for backfilling historical partitions.
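As a rough illustration of the backfill use case, the sketch below builds a query restricted to a range of partitions and stores it under the `hoodie.deltastreamer.source.sql` key used in the PR's test. Several things here are assumptions rather than details from the PR: the partition column `datestr`, the date range, the helper and class names, and the use of `java.util.Properties` in place of Hudi's `TypedProperties`. Whether a given filter is accepted depends on the SQL engine that runs the query.

```java
import java.util.Properties;

public class BackfillQuerySketch {
    // Config key as it appears in the PR's test.
    static final String SOURCE_SQL_KEY = "hoodie.deltastreamer.source.sql";

    // Builds a query over an inclusive partition range.
    // Values are concatenated without escaping, so this is only suitable for trusted input.
    static String backfillQuery(String table, String partitionColumn, String from, String to) {
        return String.format("select * from %s where %s >= '%s' and %s <= '%s'",
                table, partitionColumn, from, partitionColumn, to);
    }

    public static void main(String[] args) {
        // "datestr" and the dates are hypothetical; "test_sql_table" is the table name from the test.
        String query = backfillQuery("test_sql_table", "datestr", "2021-01-01", "2021-01-31");

        // java.util.Properties stands in for Hudi's TypedProperties (an assumption for this sketch).
        Properties props = new Properties();
        props.setProperty(SOURCE_SQL_KEY, query);

        System.out.println(props.getProperty(SOURCE_SQL_KEY));
    }
}
```

Keeping the partition bounds as parameters means the same job definition can be rerun for different historical ranges by changing only the two dates.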
Brief change log
Verify this pull request
This change added tests and can be verified as follows:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.