-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-27984: Support backward compatibility of hms thrift struct about column stats #4984
Conversation
@@ -725,7 +725,7 @@ struct SetPartitionsStatsRequest { | |||
2: optional bool needMerge, //stats need to be merged with the existing stats | |||
3: optional i64 writeId=-1, // writeId for the current query that updates the stats | |||
4: optional string validWriteIdList, // valid write id list for the table for which this struct is being sent | |||
5: required string engine //engine creating the current request | |||
5: optional string engine = "hive" //engine creating the current request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the engine
field optional, and give it a default value hive
which is consistent with the current existing code logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we set the engine
to default value hive
without changing it to optional
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't work. required
will enforce check engine
field in construction method though it has a default value. For some others framework(Trino) will use this class SetPartitionsStatsRequest
but don't want to care the engine
field, and they just reuse hive stats.
optional
has no any problem in current hive code. only need partial trival changes in HiveMetaStoreClient.java
. I think it is safe. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I am not familiar with Thrift, I expect the following behaviors.
Client protocol | HMS version | Behavior |
---|---|---|
Hive 3 (no engine is set) | Hive 4 w/o HIVE-27984 | Error |
Hive 4 w/o HIVE-27984 | Hive 4 w/o HIVE-27984 | The engine type specified by the client is used |
Hive 4 w/ HIVE-27984 | Hive 4 w/o HIVE-27984 | The engine type specified by the client is used. Error is a client nullified engine explicitly? |
Hive 3 (no engine is set) | Hive 4 w/ HIVE-27984 | engine=hive |
Hive 4 w/o HIVE-27984 | Hive 4 w/ HIVE-27984 | The engine type specified by the client is used |
Hive 4 w/ HIVE-27984 | Hive 4 w/ HIVE-27984 | The engine type specified by the client is used. engine-hive if a client nullifies engine explicitly? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are almost right. To be more specific, for example, Trino will get hms table column statistics by its thrift hms client which is compatible with hive3.
https://github.com/trinodb/trino/blob/057f2d435d6461855ff0f904dbd675b39db97d1e/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastoreClient.java#L331-L336
But TableStatsRequest
in Hive4 need pass engine
through constructor function, and the engine
field is required, therefore:
- for hive3 thrift client, this api call will fail due to the missed
engine
field if HMS server doesn't have this PR, As there is no place for hive3 thrift client to passengine
field. But this api call will sucessfully if HMS has this PR(engine
field is optional and has the defalut value). - for hive4 thrift client without this PR, you also must pass
engine
field inTableStatsRequest
to call this api correctly no matter if HMS server has this PR. Becauseengine
field is needed inTableStatsRequest
constructor function. - for hive4 thrift client with this PR. Whether HMS server have this PR or not
, you can call the api sucessfully without passengine
as theengine
field is not mandatory inTableStatsRequest
constructor function, and you can explicitly reset theengine
fileld byTableStatsRequest::setEngine
if you don't want the defaulthive engine
column statistics. But i think most framework(like trino) will want reuse hive engine column statistics.
Sounds a little verbose.. :) In one word, this PR can make hive3 thrift hms client worked well with hive4 hms when calling statistics related api. I have tested this PR with Trino, and i think it is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for elaborating the behaviors. I'd say it is flexible and less surprising for people.
@zhangbutao hi, have you observed backward compatibility issues with Hive 2.3 client and Hive 4 HMS? also cc @sunchao |
@pan3793 I think it depends your use case, for this change, TrinoDB&StarRocks will use its customized thrift client and call |
@zhangbutao great! thanks for such information |
Can i request your review? @ayushtkn @dengzhhu653 @deniskuzZ |
e5f0074
to
c0551e5
Compare
Quality Gate passedThe SonarCloud Quality Gate passed, but some issues were introduced. 6 New issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for handling this issue!
@@ -725,7 +725,7 @@ struct SetPartitionsStatsRequest { | |||
2: optional bool needMerge, //stats need to be merged with the existing stats | |||
3: optional i64 writeId=-1, // writeId for the current query that updates the stats | |||
4: optional string validWriteIdList, // valid write id list for the table for which this struct is being sent | |||
5: required string engine //engine creating the current request | |||
5: optional string engine = "hive" //engine creating the current request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I am not familiar with Thrift, I expect the following behaviors.
Client protocol | HMS version | Behavior |
---|---|---|
Hive 3 (no engine is set) | Hive 4 w/o HIVE-27984 | Error |
Hive 4 w/o HIVE-27984 | Hive 4 w/o HIVE-27984 | The engine type specified by the client is used |
Hive 4 w/ HIVE-27984 | Hive 4 w/o HIVE-27984 | The engine type specified by the client is used. Error is a client nullified engine explicitly? |
Hive 3 (no engine is set) | Hive 4 w/ HIVE-27984 | engine=hive |
Hive 4 w/o HIVE-27984 | Hive 4 w/ HIVE-27984 | The engine type specified by the client is used |
Hive 4 w/ HIVE-27984 | Hive 4 w/ HIVE-27984 | The engine type specified by the client is used. engine-hive if a client nullifies engine explicitly? |
… column stats (apache#4984)(Butao Zhang, reviewed by okumin, Zhihua Deng)
… column stats (apache#4984)(Butao Zhang, reviewed by okumin, Zhihua Deng)
What changes were proposed in this pull request?
Why are the changes needed?
Please see details from HIVE-27984.
HIVE-22046 indroduced the must required
engine
field in hms thrift api, which broken the backward compatibility. This lead to hive3 hms thrift client can not call the hive4 column stats api.This PR try to make
engine
field optional and then this will reduce the incompatibility issues.Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Test with local cluster. And this change has no negative effect to Hive.