-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-1743] Added support for SqlFileBasedTransformer #2747
[HUDI-1743] Added support for SqlFileBasedTransformer #2747
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2747 +/- ##
============================================
- Coverage 52.06% 50.35% -1.71%
+ Complexity 3625 3253 -372
============================================
Files 479 425 -54
Lines 22804 20815 -1989
Branches 2415 2179 -236
============================================
- Hits 11872 10482 -1390
+ Misses 9907 9439 -468
+ Partials 1025 894 -131
Flags with carried forward coverage won't be shown. Click here to find out more. |
@vingov Thanks for your contribution. IMO, this is a feature, it would be better to file a Jira ticket to track it. |
@yanghua - I've already filled Jira and linked in the description https://issues.apache.org/jira/browse/HUDI-1743 Updated the title as well. |
@vingov Thanks, but the CI has failed, would you please check the reason? If you not sure, you can push an empty commit to retrigger the Travis. |
61d0222
to
6a76483
Compare
@yanghua - I've fixed the build, can you please merge this code? |
Thanks, IMO, can you add a unit test for the feature. |
@yanghua - I don't see the unit tests for the existing transformers except for two functions, I don't have time now to write unit tests, can I handle it in a separate pull request where I can write unit tests for all transformers? This is blocking my data pipelines, can we make an exception and merge this pull request? I'm happy to create a JIRA to track the unit tests for all transformers. thoughts? |
It's better to follow a unified contribution guide. If we can test it, we should test it, so that we can make sure the code quality.
You can pick this patch into your inner branch. wdyt? |
Codecov Report
@@ Coverage Diff @@
## master #2747 +/- ##
============================================
+ Coverage 52.06% 55.42% +3.36%
- Complexity 3625 3887 +262
============================================
Files 479 489 +10
Lines 22804 23657 +853
Branches 2415 2533 +118
============================================
+ Hits 11872 13113 +1241
+ Misses 9907 9376 -531
- Partials 1025 1168 +143
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@yanghua - I have added the unit tests, Can you please review and merge? |
thanks for addressing my concerns. @vinothchandar will take over this PR. |
hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlFileBasedTransformer.java
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlFileBasedTransformer.java
Show resolved
Hide resolved
hudi-utilities/src/test/resources/delta-streamer-config/sql-file-transformer.sql
Show resolved
Hide resolved
@vingov : Did you get a chance to check out my feedback. Once addressed, we can land this. |
@nsivabalan - Addressed all the feedback, please review and merge this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vingov : couple of minor comments. We can land this in once addressed.
...utilities/src/test/java/org/apache/hudi/utilities/transform/TestSqlFileBasedTransformer.java
Outdated
Show resolved
Hide resolved
...utilities/src/test/java/org/apache/hudi/utilities/transform/TestSqlFileBasedTransformer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. will land this in once CI succeeds
What is the purpose of the pull request
The current SQLQuery based transformer is limited in functionality, you can't pass multiple Spark SQL statements separated by a semicolon which is necessary if your transformation is complex.
This pull-request adds a new SQLFileBasedTransformer which takes a Spark SQL file as input with multiple Spark SQL statements to support complex transformation logic and applies the transformation to the delta streamer payload.
Jira: https://issues.apache.org/jira/browse/HUDI-1743
Brief change log
Verify this pull request
This pull request is trivial work without any test coverage.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.