[HUDI-4647] Keep the hive sync settings in spark sql consistent #6448
base: master
@hudi-bot run azure
ec9886f to b2c2844
@@ -495,7 +496,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
   KEYGENERATOR_CLASS_NAME.key -> classOf[SqlKeyGenerator].getCanonicalName,
   SqlKeyGenerator.ORIGIN_KEYGEN_CLASS_NAME -> tableConfig.getKeyGeneratorClassName,
   HoodieSyncConfig.META_SYNC_ENABLED.key -> enableHive.toString,
-  HiveSyncConfigHolder.HIVE_SYNC_MODE.key -> hiveSyncConfig.getString(HiveSyncConfigHolder.HIVE_SYNC_MODE),
+  HiveSyncConfigHolder.HIVE_SYNC_MODE.key -> hiveSyncConfig.getStringOrDefault(HiveSyncConfigHolder.HIVE_SYNC_MODE, HiveSyncMode.HMS.name()),
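To make the effect of this one-line change concrete, here is a minimal sketch of the two lookup behaviors. It assumes plain `java.util.Properties` as a stand-in for Hudi's config object; the class `SyncConfigSketch` and its methods are hypothetical and only mimic the idea of `getString` versus `getStringOrDefault`, not Hudi's actual implementation.

```java
import java.util.Properties;

// Hypothetical helper that mimics the two lookups in the diff; it is not
// Hudi's HoodieConfig, just java.util.Properties with the same idea.
final class SyncConfigSketch {
    // Key of HiveSyncConfigHolder.HIVE_SYNC_MODE, as shown in the review thread.
    static final String HIVE_SYNC_MODE = "hoodie.datasource.hive_sync.mode";

    // Before the change: an unset mode comes back as null.
    static String getString(Properties props, String key) {
        return props.getProperty(key);
    }

    // After the change: an unset mode falls back to the caller-supplied default.
    static String getStringOrDefault(Properties props, String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }

    public static void main(String[] args) {
        Properties unset = new Properties();
        // "HMS" is what HiveSyncMode.HMS.name() returns for an enum constant named HMS.
        System.out.println(getString(unset, HIVE_SYNC_MODE));                 // null
        System.out.println(getStringOrDefault(unset, HIVE_SYNC_MODE, "HMS")); // HMS

        Properties explicit = new Properties();
        explicit.setProperty(HIVE_SYNC_MODE, "jdbc");
        // An explicit user choice still wins over the fallback.
        System.out.println(getStringOrDefault(explicit, HIVE_SYNC_MODE, "HMS")); // jdbc
    }
}
```

Because the fallback is applied only when the key is absent, a user who explicitly sets a mode keeps it; only the unset case changes.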
Should we just add the default value below inside the HiveSyncConfigHolder class?
public static final ConfigProperty<String> HIVE_SYNC_MODE = ConfigProperty
.key("hoodie.datasource.hive_sync.mode")
.noDefaultValue()
.withDocumentation("Mode to choose for Hive ops. Valid values are hms, jdbc and hiveql.");
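The question here is where the default should live. The sketch below assumes a simplified property definition (`PropertySketch` is hypothetical, not Hudi's `ConfigProperty`) to show the difference: a default declared on the property definition is seen by every component that reads the key, whereas a call-site fallback only affects the one caller that supplies it.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical, simplified property definition; not Hudi's ConfigProperty.
final class PropertySketch {
    final String key;
    final Optional<String> defaultValue;

    private PropertySketch(String key, Optional<String> defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    // Analogous to .noDefaultValue() in the snippet above.
    static PropertySketch noDefault(String key) {
        return new PropertySketch(key, Optional.empty());
    }

    // Analogous to declaring a default on the property itself.
    static PropertySketch withDefault(String key, String value) {
        return new PropertySketch(key, Optional.of(value));
    }

    // Every reader resolves through the definition, so a default declared here
    // applies to all callers that read this key, not just Spark SQL.
    String resolve(Properties props) {
        String value = props.getProperty(key);
        return value != null ? value : defaultValue.orElse(null);
    }
}
```

With `withDefault(..., "HMS")`, any module that resolves `hoodie.datasource.hive_sync.mode` without setting it would now get HMS, which is a global behavior change rather than a Spark SQL-only one.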
cc @xushiyan
That would modify the global default value, which could affect a lot, and I am not sure it is appropriate for every module. This PR only intends to change the default value used by Spark SQL.
@dongkelun Thanks for the patch. There are two reasons why we have to close this for now:
- we should keep the default value consistent across different scenarios
- we don't want to introduce breaking changes (unless there is a strong reason) until 1.0, when we will batch these breaking changes together
Hence I'm tracking the tasks here: https://issues.apache.org/jira/browse/HUDI-5062
@dongkelun OK, in this case it's a different story. We should keep it aligned for all SQL scenarios. I'm reopening this PR. Can you please re-purpose this PR to move
Map(
  HoodieSyncConfig.META_SYNC_ENABLED.key -> hiveSyncConfig.getString(HoodieSyncConfig.META_SYNC_ENABLED.key),
  HiveSyncConfigHolder.HIVE_SYNC_ENABLED.key -> hiveSyncConfig.getString(HiveSyncConfigHolder.HIVE_SYNC_ENABLED.key),
  HiveSyncConfigHolder.HIVE_SYNC_MODE.key -> hiveSyncConfig.getString(HiveSyncConfigHolder.HIVE_SYNC_MODE),
Since most SQL scenarios already use HMS mode, we should ensure that existing SQL behavior is not affected. So we need to keep it as HMS and change it only for MERGE INTO (which is acceptable for consistency reasons).
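One way to keep every SQL path aligned on HMS is to build the sync options in a single place. The sketch below assumes such a shared helper; `SqlHiveSyncOptions.build` is hypothetical, not an existing Hudi method, and only the mode key from the thread is modeled. Each SQL command would call it, so INSERT INTO and MERGE INTO resolve the mode identically, and the HMS default applies only when the user has not chosen a mode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class SqlHiveSyncOptions {
    // Key of HiveSyncConfigHolder.HIVE_SYNC_MODE, as shown in the review thread.
    static final String HIVE_SYNC_MODE = "hoodie.datasource.hive_sync.mode";
    // "HMS" is the name of the HiveSyncMode.HMS enum constant referenced in the diff.
    static final String SQL_DEFAULT_MODE = "HMS";

    // Hypothetical shared helper: every Spark SQL command builds its hive sync
    // options here, so individual commands cannot drift apart.
    static Map<String, String> build(Properties userProps) {
        Map<String, String> opts = new HashMap<>();
        // Copy whatever the user configured.
        for (String name : userProps.stringPropertyNames()) {
            opts.put(name, userProps.getProperty(name));
        }
        // Apply the SQL-level default only when the user did not choose a mode.
        opts.putIfAbsent(HIVE_SYNC_MODE, SQL_DEFAULT_MODE);
        return opts;
    }
}
```

Scoping the default to the SQL helper rather than the property definition keeps non-SQL writers unaffected, in line with the concern about not changing existing behavior elsewhere.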
4644aaa to 1767369
@dongkelun Can you please rebase?
The job running on agent Azure Pipelines 8 ran longer than the maximum time of 150 minutes.
@xushiyan Hi, the CI has passed.
@xushiyan Hello, could you please help review this PR?
By default, syncing to Hive reports an error, and when using Spark SQL it is tedious to fill in the JDBC URL parameter every time.
Change Logs
Keep the hive sync settings in spark sql consistent
Impact
Keep the hive sync settings in spark sql consistent
Risk level (write none, low, medium or high below)
none
Contributor's checklist