[HUDI-1194] Refactor HoodieHiveClient based on the way to call Hive API #1975
Conversation
Force-pushed from 304ef32 to a68b0ac.
if (optParams(HIVE_USE_JDBC_OPT_KEY).equals("true")) {
  optParams ++ Map(HIVE_CLIENT_CLASS_OPT_KEY -> DEFAULT_HIVE_CLIENT_CLASS_OPT_VAL)
} else if (optParams(HIVE_USE_JDBC_OPT_KEY).equals("false")) {
  optParams ++ Map(HIVE_CLIENT_CLASS_OPT_KEY -> classOf[HoodieHiveDriverClient].getCanonicalName)
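For readers following along, the Scala fragment above maps the boolean use_jdbc flag onto a client class name. A minimal self-contained Java sketch of the same idea — the key strings and fully qualified class names here are assumptions for illustration, not Hudi's actual constants:

```java
import java.util.HashMap;
import java.util.Map;

public class HiveClientClassTranslator {
    // Assumed key/class names mirroring the snippet above; the real
    // constants live in Hudi's DataSourceWriteOptions.
    static final String HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc";
    static final String HIVE_CLIENT_CLASS_OPT_KEY = "hoodie.datasource.hive_sync.hive_client_class";
    static final String JDBC_CLIENT_CLASS = "org.apache.hudi.hive.HoodieHiveJDBCClient";
    static final String DRIVER_CLIENT_CLASS = "org.apache.hudi.hive.HoodieHiveDriverClient";

    // Translate use_jdbc=true/false into an explicit client class, but only
    // when the user has not already picked a client class themselves.
    static Map<String, String> translate(Map<String, String> optParams) {
        Map<String, String> out = new HashMap<>(optParams);
        if (out.containsKey(HIVE_USE_JDBC_OPT_KEY) && !out.containsKey(HIVE_CLIENT_CLASS_OPT_KEY)) {
            if ("true".equals(out.get(HIVE_USE_JDBC_OPT_KEY))) {
                out.put(HIVE_CLIENT_CLASS_OPT_KEY, JDBC_CLIENT_CLASS);
            } else if ("false".equals(out.get(HIVE_USE_JDBC_OPT_KEY))) {
                out.put(HIVE_CLIENT_CLASS_OPT_KEY, DRIVER_CLIENT_CLASS);
            }
        }
        return out;
    }
}
```

The point of the two-branch check is backward compatibility: an explicitly set client class always wins over the legacy boolean flag.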
Could you add a comment either here or in HoodieHiveDriverClient explaining why this is used when HIVE_USE_JDBC_OPT_KEY is false?
Sure. Here I just want to keep the same behavior as before. Will add a comment.
@@ -94,7 +94,7 @@
  <joda.version>2.9.9</joda.version>
  <hadoop.version>2.7.3</hadoop.version>
  <hive.groupid>org.apache.hive</hive.groupid>
- <hive.version>2.3.1</hive.version>
+ <hive.version>2.3.6</hive.version>
Why are we updating Hive?
When running Hive sync through Spark 2.x, it uses hive-spark as the Hive dependency, which is version 1.2.1-spark2. So I need to make sure all the Hive APIs used in HoodieHiveClient are compatible with both hive-spark 1.2.1-spark2 and Hive 2.3.x.
Actually, for the alter_partition API client.alter_partition(String, String, Partition), I didn't find a signature compatible between Hive 2.3.1 and 1.2.1, but I did find one compatible with both Hive 2.3.6 and hive-spark 1.2.1.
So I am thinking of bumping the Hive version to 2.3.6. Is this acceptable to the community?
Force-pushed from 8f8e126 to d941f7a.
Codecov Report
@@ Coverage Diff @@
## master #1975 +/- ##
============================================
- Coverage 53.54% 53.52% -0.02%
Complexity 2770 2770
============================================
Files 348 348
Lines 16109 16120 +11
Branches 1643 1646 +3
============================================
+ Hits 8626 8629 +3
- Misses 6785 6792 +7
- Partials 698 699 +1
Flags with carried forward coverage won't be shown.
Force-pushed from d941f7a to 22757e3.
@vinothchandar Can you remove the
@zhedoubushishi done
@lw309637554 could you review this as well?
@n3nash: I started reviewing this, but it looks like modi already reviewed it. Can you ask one of the Uber folks to review and take this to the finish line? It's been open for quite some time.
val DEFAULT_HIVE_USE_JDBC_OPT_VAL = "true"
val DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "true"
val DEFAULT_HIVE_SKIP_RO_SUFFIX_VAL = "false"
val DEFAULT_HIVE_SUPPORT_TIMESTAMP = "false"

def translateUseJDBCToHiveClientClass(optParams: Map[String, String]) : Map[String, String] = {
  if (optParams.contains(HIVE_USE_JDBC_OPT_KEY) && !optParams.contains(HIVE_CLIENT_CLASS_OPT_KEY)) {
Minor optimization: all this matters only if hoodie.datasource.hive_sync.enable is enabled, right? Otherwise we don't need to translate any of these.
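The guard the reviewer suggests could look like the following. This is a hypothetical sketch: the enable key string comes from the comment itself, and the translation call is a stand-in for the real one quoted earlier in the diff.

```java
import java.util.Map;

public class HiveSyncGuard {
    static final String HIVE_SYNC_ENABLED_OPT_KEY = "hoodie.datasource.hive_sync.enable";

    // Skip the use_jdbc -> client class translation entirely when Hive sync
    // is disabled, since none of these options are consulted in that case.
    static Map<String, String> maybeTranslate(Map<String, String> optParams) {
        if (!"true".equals(optParams.get(HIVE_SYNC_ENABLED_OPT_KEY))) {
            return optParams; // nothing to do: hive sync is off
        }
        return translateUseJDBCToHiveClientClass(optParams);
    }

    // Stand-in for the real translation shown earlier in the diff.
    static Map<String, String> translateUseJDBCToHiveClientClass(Map<String, String> optParams) {
        return optParams;
    }
}
```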
hiveSyncConfig.autoCreateDatabase = Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_AUTO_CREATE_DATABASE_OPT_KEY(),
    DataSourceWriteOptions.DEFAULT_HIVE_AUTO_CREATE_DATABASE_OPT_KEY()));
hiveSyncConfig.skipROSuffix = Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_SKIP_RO_SUFFIX(),
    DataSourceWriteOptions.DEFAULT_HIVE_SKIP_RO_SUFFIX_VAL()));
hiveSyncConfig.supportTimestamp = Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_SUPPORT_TIMESTAMP(),
    DataSourceWriteOptions.DEFAULT_HIVE_SUPPORT_TIMESTAMP()));
hiveSyncConfig.hiveClientClass =
I see these are used in some test data files as well, e.g. docker/demo/sparksql-incremental.commands. Can you fix them in this patch or create a follow-up JIRA?
import java.util.Collections;
import java.util.List;

public class HoodieHiveDriverClient extends HoodieHiveClient {
Java docs, please.
import java.util.Map;
import java.util.stream.Collectors;

public class HoodieHiveClient extends AbstractSyncHoodieClient {
Java docs, please.
@jsbali Can you review this diff? Since you are looking to add ways to invoke JDBC as well as Metastore.
Sorry for the delay in responding. A similar PR was in progress and has been merged: #2879.
Yes, since it's duplicated work, I closed this PR.
What is the purpose of the pull request
JIRA: https://issues.apache.org/jira/browse/HUDI-1194
Brief change log
Separate HoodieHiveClient into three classes:
- HoodieHiveClient, which implements all the APIs through the Metastore API.
- HoodieHiveJDBCClient, which extends HoodieHiveClient and overrides several of the APIs through Hive JDBC.
- HoodieHiveDriverClient, which extends HoodieHiveClient and overrides several of the APIs through the Hive Driver.
Also introduce a new parameter, hoodie.datasource.hive_sync.hive_client_class, which lets you choose which Hive client class to use.
Verify this pull request
This change added tests and can be verified as follows:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
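As a closing illustration of the hoodie.datasource.hive_sync.hive_client_class parameter introduced by this PR, the sync layer could instantiate the configured client by class name via reflection. This is a hypothetical, self-contained sketch: the base class and the no-arg constructor shape are stand-ins, not Hudi's real AbstractSyncHoodieClient.

```java
// Hypothetical sketch: pick the Hive client implementation by class name,
// the way a hive_client_class-style option would allow.
public class HiveClientFactory {
    public abstract static class SyncClient {
        public abstract String mode();
    }

    // Stand-in for a pure-Metastore client implementation.
    public static class MetastoreClient extends SyncClient {
        public String mode() { return "metastore"; }
    }

    // Load and instantiate the configured client class via its no-arg constructor.
    static SyncClient create(String clientClassName) {
        try {
            Class<?> clazz = Class.forName(clientClassName);
            return (SyncClient) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load Hive client class: " + clientClassName, e);
        }
    }
}
```

The reflective lookup is what makes the option extensible: users can supply any subclass on the classpath without Hudi enumerating the choices.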