Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-1929] Support configure KeyGenerator by type #2993

Merged
merged 1 commit into from Jun 8, 2021

Conversation

wangxianghu
Copy link
Contributor

@wangxianghu wangxianghu commented May 25, 2021

…y type

Tips

What is the purpose of the pull request

Support configure KeyGenerator by type

Verify this pull request

This pull request is already covered by existing tests, such as
org.apache.hudi.keygen.factory.TestHoodieSparkKeyGeneratorFactory
org.apache.hudi.keygen.TestCustomKeyGenerator
org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@wangxianghu wangxianghu force-pushed the HUDI-1929 branch 2 times, most recently from 3a43669 to 89e4fa7 Compare May 26, 2021 01:09
@codecov-commenter
Copy link

codecov-commenter commented May 26, 2021

Codecov Report

Merging #2993 (bd59a24) into master (870e97b) will decrease coverage by 15.42%.
The diff coverage is 80.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master    #2993       +/-   ##
=============================================
- Coverage     70.83%   55.41%   -15.43%     
- Complexity      385     3880     +3495     
=============================================
  Files            54      488      +434     
  Lines          2016    23632    +21616     
  Branches        241     2530     +2289     
=============================================
+ Hits           1428    13095    +11667     
- Misses          454     9369     +8915     
- Partials        134     1168     +1034     
Flag Coverage Δ
hudicli 39.95% <ø> (?)
hudiclient ∅ <ø> (∅)
hudicommon 50.30% <ø> (?)
hudiflink 63.39% <72.72%> (?)
hudihadoopmr 51.43% <ø> (?)
hudisparkdatasource 74.30% <100.00%> (?)
hudisync 51.32% <ø> (?)
huditimelineservice 64.36% <ø> (?)
hudiutilities 70.83% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../org/apache/hudi/streamer/FlinkStreamerConfig.java 0.00% <0.00%> (ø)
...c/main/java/org/apache/hudi/util/StreamerUtil.java 57.39% <ø> (ø)
...va/org/apache/hudi/configuration/FlinkOptions.java 91.53% <85.71%> (ø)
...e/hudi/sink/transform/RowDataToHoodieFunction.java 100.00% <100.00%> (ø)
...i/bootstrap/SparkParquetBootstrapDataProvider.java 80.00% <100.00%> (ø)
...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala 70.14% <100.00%> (ø)
...main/scala/org/apache/hudi/HoodieWriterUtils.scala 83.33% <100.00%> (ø)
...apache/hudi/utilities/deltastreamer/DeltaSync.java 70.84% <100.00%> (ø)
...a/org/apache/hudi/cli/commands/CommitsCommand.java 54.83% <0.00%> (ø)
...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java 55.66% <0.00%> (ø)
... and 432 more

@wangxianghu wangxianghu force-pushed the HUDI-1929 branch 3 times, most recently from a065b46 to d0156e3 Compare May 26, 2021 04:47
@wangxianghu
Copy link
Contributor Author

@yanghua please take a look when free

@yanghua yanghua self-assigned this May 26, 2021
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments.

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment on two key gen configs.

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess we need to fix quite a few more places.

val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,

public static KeyGenerator createKeyGenerator(TypedProperties props) throws IOException {

Also, I see you have added a keygen factory for spark. what's the plan for Flink ?

@wangxianghu
Copy link
Contributor Author

guess we need to fix quite a few more places.

val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,

public static KeyGenerator createKeyGenerator(TypedProperties props) throws IOException {

Also, I see you have added a keygen factory for spark. what's the plan for Flink ?
The keyGenerator of flink is in hudi-client-common, I plan to add a factory for flink and java engine. they can share it

@wangxianghu
Copy link
Contributor Author

few

Yes, several places. I will change them one by one

@wangxianghu
Copy link
Contributor Author

wangxianghu commented May 31, 2021

guess we need to fix quite a few more places.

val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,

public static KeyGenerator createKeyGenerator(TypedProperties props) throws IOException {

Also, I see you have added a keygen factory for spark. what's the plan for Flink ?

Currently, flink and java engine shares key generator, so I plan to add a factory in hudi-client-common to support both of them.

@yanghua yanghua assigned nsivabalan and unassigned yanghua Jun 1, 2021
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you able to fix all places where keygen is instantiated.

@nsivabalan
Copy link
Contributor

May I know what do you mean by "Yes, several places. I will change them one by one" ? Do you mean to say, you plan to address in a follow up patch ? IMO, it makes sense to do it in this patch itself.

@wangxianghu
Copy link
Contributor Author

May I know what do you mean by "Yes, several places. I will change them one by one" ? Do you mean to say, you plan to address in a follow up patch ? IMO, it makes sense to do it in this patch itself.

Ok, I'll update this pr to change all of them at once

@wangxianghu wangxianghu marked this pull request as draft June 3, 2021 03:31
@wangxianghu wangxianghu changed the title [HUDI-1929] Make HoodieDeltaStreamer support configure KeyGenerator by type [HUDI-1929] Support configure KeyGenerator by type Jun 3, 2021
@wangxianghu wangxianghu marked this pull request as ready for review June 3, 2021 12:40
@wangxianghu wangxianghu force-pushed the HUDI-1929 branch 2 times, most recently from 2c083a4 to 249c67a Compare June 3, 2021 12:46
@wangxianghu
Copy link
Contributor Author

hi @nsivabalan, I have addressed all your concerns, please take a look when free

@wangxianghu wangxianghu force-pushed the HUDI-1929 branch 2 times, most recently from bd19c7a to 82ae8d3 Compare June 5, 2021 00:57
@wangxianghu wangxianghu marked this pull request as ready for review June 5, 2021 00:57
@wangxianghu
Copy link
Contributor Author

@nsivabalan this pr is ready now, please take a look when free

PR Tracker Board automation moved this from Under Discussion PRs to Nearing Landing Jun 5, 2021
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last few minor comments. We can merge this in once addressed.

@nsivabalan
Copy link
Contributor

btw, do not forget to update our config page with the new config https://issues.apache.org/jira/browse/HUDI-1983. Please assign the ticket to yourself and take it up once the PR is landed. thanks in advance.

@wangxianghu
Copy link
Contributor Author

@nsivabalan thanks for the detailed review, I have addressed all your concerns.
PTAL

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor comments. we are almost there :)

}

@ParameterizedTest
@ValueSource(strings = {"SIMPLE", "TIMESTAMP","COMPLEX", "CUSTOM", "NON_PARTITION", "GLOBAL_DELETE"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Will merge once CI succeeds. thanks for patiently addressing all feedback :)

@nsivabalan
Copy link
Contributor

Can you check the CI failure.

@wangxianghu wangxianghu closed this Jun 8, 2021
PR Tracker Board automation moved this from Nearing Landing to Done Jun 8, 2021
@wangxianghu
Copy link
Contributor Author

reopen to trigger the ci

@wangxianghu wangxianghu reopened this Jun 8, 2021
PR Tracker Board automation moved this from Done to Under Discussion PRs Jun 8, 2021
@wangxianghu wangxianghu force-pushed the HUDI-1929 branch 2 times, most recently from bd59a24 to c439bb0 Compare June 8, 2021 07:33
@wangxianghu wangxianghu closed this Jun 8, 2021
PR Tracker Board automation moved this from Under Discussion PRs to Done Jun 8, 2021
@wangxianghu
Copy link
Contributor Author

reopen to trigger the ci

@wangxianghu wangxianghu reopened this Jun 8, 2021
PR Tracker Board automation moved this from Done to Under Discussion PRs Jun 8, 2021
@nsivabalan
Copy link
Contributor

Thanks for the contribution :) Will land this in.

@nsivabalan nsivabalan merged commit 7261f08 into apache:master Jun 8, 2021
PR Tracker Board automation moved this from Under Discussion PRs to Done Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants