-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty #8184
Conversation
Need to address #8152 (comment) |
groupName = ConfigGroups.Names.DELTA_STREAMER, | ||
subGroupName = ConfigGroups.SubGroupNames.DELTA_STREAMER_SOURCE, | ||
description = "Configs that are common during ingestion across different cloud stores") | ||
public class CloudSourceConfig extends HoodieConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCS event incr source is using this one and S3 event incr source is using S3EventsHoodieIncrSourceConfig
. We should merge the latter into the former.
For S3 event incr source, we should check CloudSourceConfig
first then fallback to the keys in S3EventsHoodieIncrSourceConfig
, which should be marked as deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's take this separately. This refactoring PR does not intend to change any behavior and purely for moving configs to use ConfigProperty
class.
@lokeshj1703 could you file JIRA to track this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.withDocumentation("Ignore objects in the bucket whose relative path contains this substring"); | ||
|
||
public static final ConfigProperty<String> SPARK_DATASOURCE_OPTIONS = ConfigProperty | ||
.key("hoodie.deltastreamer.source.cloud.data.datasource.options") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a precise name would be hoodie.deltastreamer.source.cloud_incr.spark.datasource.options
, at least we should highlight "spark". We should use newer name and keep the old one as deprecated alternative values.
Also refer to keys from S3EventsHoodieIncrSourceConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. @lokeshj1703 let's track this in a new JIRA ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this up @lokeshj1703 quite some work of refactoring. LGTM overall. I left a few minor comments which I'll address before merging the PR.
groupName = ConfigGroups.Names.DELTA_STREAMER, | ||
subGroupName = ConfigGroups.SubGroupNames.DELTA_STREAMER_SOURCE, | ||
description = "Configs that are common during ingestion across different cloud stores") | ||
public class CloudSourceConfig extends HoodieConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's take this separately. This refactoring PR does not intend to change any behavior and purely for moving configs to use ConfigProperty
class.
@lokeshj1703 could you file JIRA to track this?
.withDocumentation("Ignore objects in the bucket whose relative path contains this substring"); | ||
|
||
public static final ConfigProperty<String> SPARK_DATASOURCE_OPTIONS = ConfigProperty | ||
.key("hoodie.deltastreamer.source.cloud.data.datasource.options") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. @lokeshj1703 let's track this in a new JIRA ticket.
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/GCSEventsSourceConfig.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/GcsEventsHoodieIncrSource.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SqlSource.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/SqlSourceConfig.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/HoodieIncrSourceConfig.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/SourceTestConfig.java
Outdated
Show resolved
Hide resolved
…/ConfigProperty (apache#8184) This commit refactors Deltastreamer source configs to use HoodieConfig/ConfigProperty. The following classes are relevant: HiveIncrPullSource HoodieIncrSource JdbcSource PulsarSource S3EventsHoodieIncrSource SqlSource CloudObjectsSelector CloudStoreIngestionConfig DatePartitionPathSelector DFSPathSelector KafkaOffsetGen GcsIngestionConfig MaxwellJsonKafkaSourcePostProcessor Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Change Logs
Relevant classes:
HiveIncrPullSource
HoodieIncrSource
JdbcSource
PulsarSource
S3EventsHoodieIncrSource
SqlSource
CloudObjectsSelector
CloudStoreIngestionConfig
DatePartitionPathSelector
DFSPathSelector
KafkaOffsetGen
GcsIngestionConfig
MaxwellJsonKafkaSourcePostProcessor
org.apache.hudi.utilities.testutils.sources.config.SourceConfigs
Impact
NA
Risk level (write none, low medium or high below)
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist