[HUDI-1743] Added support for SqlFileBasedTransformer #2747

Merged: 12 commits merged into apache:master on Jun 8, 2021

vingov
Contributor

@vingov vingov commented Mar 31, 2021

What is the purpose of the pull request

The current SQL query-based transformer is limited in functionality: you can't pass multiple Spark SQL statements separated by semicolons, which is necessary if your transformation is complex.

This pull request adds a new SqlFileBasedTransformer, which takes a Spark SQL file containing multiple Spark SQL statements as input to support complex transformation logic, and applies the transformation to the delta streamer payload.
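
For readers unfamiliar with the idea, here is a minimal sketch of running a multi-statement SQL script one statement at a time. It is not the code in this pull request: the function names, the `run_sql` callback, the view names in the sample script, and the choice to return the result of the last statement are all assumptions made for illustration. With PySpark, `run_sql` could be `spark.sql`.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def split_sql_statements(script: str) -> List[str]:
    """Split a SQL script on semicolons and drop empty statements.

    Deliberately naive: a semicolon inside a string literal or a comment
    would also be treated as a statement separator.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def run_sql_script(script: str, run_sql: Callable[[str], T]) -> Optional[T]:
    """Run each statement in order and return the result of the last one.

    `run_sql` is a hypothetical callback that executes a single statement.
    Returning the last result is an assumption of this sketch.
    """
    result = None
    for statement in split_sql_statements(script):
        result = run_sql(statement)
    return result


# Hypothetical two-step transformation: an intermediate view, then the final query.
script = """
CREATE OR REPLACE TEMPORARY VIEW cleaned AS
  SELECT id, amount FROM source_view WHERE amount IS NOT NULL;

SELECT id, amount * 2 AS doubled_amount FROM cleaned;
"""

# Stand-in executor that records each statement instead of calling Spark.
executed: List[str] = []
last = run_sql_script(script, lambda stmt: executed.append(stmt) or stmt)
print(len(executed))
```

The intermediate view is what a single-query transformer cannot express: the second statement depends on state created by the first, so the statements have to run in order within the same session.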

Jira: https://issues.apache.org/jira/browse/HUDI-1743

Brief change log

  • Adds a Spark SQL file-based transformer to support complex multiline transformations.

Verify this pull request

This pull request is trivial work without any test coverage.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@vingov vingov changed the title from "[HUDI-1743] Added support for SqlFileBasedTransformer" to "[MINOR] Added support for SqlFileBasedTransformer" on Mar 31, 2021
@codecov-io

codecov-io commented Mar 31, 2021

Codecov Report

Merging #2747 (1b5622a) into master (aa0da72) will decrease coverage by 1.70%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master    #2747      +/-   ##
============================================
- Coverage     52.06%   50.35%   -1.71%     
+ Complexity     3625     3253     -372     
============================================
  Files           479      425      -54     
  Lines         22804    20815    -1989     
  Branches       2415     2179     -236     
============================================
- Hits          11872    10482    -1390     
+ Misses         9907     9439     -468     
+ Partials       1025      894     -131     
| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | 37.01% <ø> (ø) | 0.00 <ø> (ø) |
| hudiclient | ∅ <ø> (∅) | 0.00 <ø> (ø) |
| hudicommon | 50.94% <ø> (-0.03%) | 0.00 <ø> (ø) |
| hudiflink | 56.01% <ø> (ø) | 0.00 <ø> (ø) |
| hudihadoopmr | 33.44% <ø> (ø) | 0.00 <ø> (ø) |
| hudisparkdatasource | 70.87% <ø> (ø) | 0.00 <ø> (ø) |
| hudisync | 45.47% <ø> (ø) | 0.00 <ø> (ø) |
| huditimelineservice | 64.36% <ø> (ø) | 0.00 <ø> (ø) |
| hudiutilities | ? | ? |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...ache/hudi/common/fs/inline/InMemoryFileSystem.java | 79.31% <0.00%> (-10.35%) | 15.00% <0.00%> (-1.00%) |
| ...udi/utilities/deltastreamer/BootstrapExecutor.java | | |
| .../org/apache/hudi/utilities/sources/JsonSource.java | | |
| ...g/apache/hudi/utilities/sources/JsonDFSSource.java | | |
| .../hudi/utilities/schema/RowBasedSchemaProvider.java | | |
| ...apache/hudi/utilities/deltastreamer/DeltaSync.java | | |
| ...lities/schema/SchemaProviderWithPostProcessor.java | | |
| ...ties/exception/HoodieIncrementalPullException.java | | |
| .../hudi/utilities/schema/SchemaRegistryProvider.java | | |
| ...ck/kafka/HoodieWriteCommitKafkaCallbackConfig.java | | |
... and 45 more

@yanghua
Contributor

yanghua commented Mar 31, 2021

@vingov Thanks for your contribution. IMO, this is a feature, so it would be better to file a Jira ticket to track it.

@vingov
Contributor Author

vingov commented Mar 31, 2021

@yanghua - I've already filed a Jira and linked it in the description: https://issues.apache.org/jira/browse/HUDI-1743

Updated the title as well.

@vingov vingov changed the title from "[MINOR] Added support for SqlFileBasedTransformer" to "[HUDI_1743] Added support for SqlFileBasedTransformer" on Mar 31, 2021
@yanghua
Contributor

yanghua commented Mar 31, 2021

@vingov Thanks, but the CI has failed. Would you please check the reason? If you are not sure, you can push an empty commit to retrigger Travis.

@yanghua yanghua changed the title from "[HUDI_1743] Added support for SqlFileBasedTransformer" to "[HUDI-1743] Added support for SqlFileBasedTransformer" on Mar 31, 2021
@vingov vingov force-pushed the HUDI-1743_sql_file_based_transformer branch from 61d0222 to 6a76483 Compare March 31, 2021 18:55
@vingov
Contributor Author

vingov commented Apr 1, 2021

@yanghua - I've fixed the build, can you please merge this code?

@yanghua
Contributor

yanghua commented Apr 1, 2021

> @yanghua - I've fixed the build, can you please merge this code?

Thanks. Can you add a unit test for the feature?

@vingov
Contributor Author

vingov commented Apr 7, 2021

@yanghua - I don't see unit tests for the existing transformers, except for two functions, and I don't have time to write unit tests right now. Can I handle this in a separate pull request where I write unit tests for all transformers?

This is blocking my data pipelines. Can we make an exception and merge this pull request? I'm happy to create a JIRA to track the unit tests for all transformers. Thoughts?

@yanghua
Contributor

yanghua commented Apr 8, 2021

> @yanghua - I don't see unit tests for the existing transformers, except for two functions, and I don't have time to write unit tests right now. Can I handle this in a separate pull request where I write unit tests for all transformers?

It's better to follow a unified contribution guide. If we can test it, we should test it, so that we can ensure code quality.

> This is blocking my data pipelines. Can we make an exception and merge this pull request? I'm happy to create a JIRA to track the unit tests for all transformers. Thoughts?

You can cherry-pick this patch into your internal branch. WDYT?

@vinothchandar vinothchandar added this to Ready For Review in PR Tracker Board Apr 15, 2021
@vinothchandar vinothchandar self-assigned this Apr 19, 2021
@vinothchandar vinothchandar moved this from Opened PRs to Ready for Review in PR Tracker Board Apr 19, 2021
@codecov-commenter

codecov-commenter commented Apr 24, 2021

Codecov Report

Merging #2747 (4d1f482) into master (aa0da72) will increase coverage by 3.36%.
The diff coverage is 91.30%.

@@             Coverage Diff              @@
##             master    #2747      +/-   ##
============================================
+ Coverage     52.06%   55.42%   +3.36%     
- Complexity     3625     3887     +262     
============================================
  Files           479      489      +10     
  Lines         22804    23657     +853     
  Branches       2415     2533     +118     
============================================
+ Hits          11872    13113    +1241     
+ Misses         9907     9376     -531     
- Partials       1025     1168     +143     
| Flag | Coverage Δ |
|---|---|
| hudicli | 39.95% <ø> (+2.94%) ⬆️ |
| hudiclient | ∅ <ø> (∅) |
| hudicommon | 50.30% <ø> (-0.68%) ⬇️ |
| hudiflink | 63.25% <ø> (+7.23%) ⬆️ |
| hudihadoopmr | 51.43% <ø> (+17.98%) ⬆️ |
| hudisparkdatasource | 74.30% <ø> (+3.42%) ⬆️ |
| hudisync | 51.32% <ø> (+5.85%) ⬆️ |
| huditimelineservice | 64.36% <ø> (ø) |
| hudiutilities | 71.11% <91.30%> (+1.37%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ |
|---|---|
| ...i/utilities/transform/SqlFileBasedTransformer.java | 91.30% <91.30%> (ø) |
| .../main/java/org/apache/hudi/util/AvroConvertor.java | 0.00% <0.00%> (-82.36%) ⬇️ |
| .../apache/hudi/sink/compact/CompactionPlanEvent.java | 50.00% <0.00%> (-50.00%) ⬇️ |
| ...pache/hudi/sink/compact/CompactionCommitEvent.java | 43.75% <0.00%> (-43.75%) ⬇️ |
| ...ache/hudi/sink/compact/CompactionPlanOperator.java | 58.00% <0.00%> (-21.07%) ⬇️ |
| .../org/apache/hudi/sink/utils/NonThrownExecutor.java | 66.66% <0.00%> (-11.12%) ⬇️ |
| ...ache/hudi/common/fs/inline/InMemoryFileSystem.java | 79.31% <0.00%> (-10.35%) ⬇️ |
| ...hadoop/realtime/RealtimeCompactedRecordReader.java | 65.62% <0.00%> (-7.11%) ⬇️ |
| ...he/hudi/common/bootstrap/index/BootstrapIndex.java | 94.11% <0.00%> (-5.89%) ⬇️ |
| ...g/apache/hudi/sink/partitioner/BucketAssigner.java | 82.29% <0.00%> (-5.64%) ⬇️ |
... and 126 more

@vingov
Contributor Author

vingov commented Apr 24, 2021

@yanghua - I have added the unit tests. Can you please review and merge?

@yanghua
Contributor

yanghua commented Apr 25, 2021

> @yanghua - I have added the unit tests. Can you please review and merge?

Thanks for addressing my concerns. @vinothchandar will take over this PR.

@nsivabalan nsivabalan added the priority:minor everything else; usability gaps; questions; feature reqs label May 11, 2021
@vinothchandar vinothchandar removed their assignment May 18, 2021
PR Tracker Board automation moved this from Ready for Review to Nearing Landing May 19, 2021
@nsivabalan
Contributor

@vingov : Did you get a chance to check out my feedback? Once it is addressed, we can land this.

@vingov
Contributor Author

vingov commented Jun 6, 2021

@nsivabalan - Addressed all the feedback, please review and merge this PR.

Contributor

@nsivabalan nsivabalan left a comment

@vingov : A couple of minor comments. We can land this once they are addressed.

Contributor

@nsivabalan nsivabalan left a comment

LGTM. Will land this once CI succeeds.

@nsivabalan nsivabalan merged commit 57611d1 into apache:master Jun 8, 2021
PR Tracker Board automation moved this from Nearing Landing to Done Jun 8, 2021
6 participants