-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-20189: Separate metastore client code into its own module #5924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
<dependencies> | ||
<dependency> | ||
<groupId>org.apache.hive</groupId> | ||
<artifactId>hive-standalone-metastore-common</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new module depends on the metastore-common
, which doesn't reduce the size or dependencies from metastore-common
, not sure if it's ok, for the user what will benefit from the new module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's expected for a client module to depend on a common module. The client module delivers core client functionality and can be enhanced later with features like caching.
Offering a 'client' JAR aligns with common conventions and is expected to be user-friendly.
ATM we have 2 distinct cache wrappers in different Hive modules, just because there was no structure:
- org.apache.hadoop.hive.metastore.HiveClientCache
- org.apache.iceberg.hive.CachedClientPool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another point I don't understand, why ql
and beeline
modules depend on metastore-server
? I think we should drop the dependency on server and move the classes into the metastore common or client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I basically agree that we can potentially improve the structure in the future. We will have obvious guidelines about how to organize files.
- metastore-client: Client-specific files, which Server doesn't need
- metastore-server: Server-specific files, which Client doesn't need
- metastore-common: Common modules
another point I don't understand, why ql and beeline modules depend on metastore-server? I think we should drop the dependency on server and move the classes into the metastore common or client.
I guess ql
requires metastore-server to use an embedded HMS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't embedded HMS only used in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it isn't. It can be used when metastore.thrift.uris
is empty. For example, our Hive docker image(not HMS docker image) can set up a HiveServer2 without HMS. It probably uses the embedded mode that runs HMS-equivalent threads in HS2.
Lines 2654 to 2663 in d7b3a5b
/** | |
* Check if metastore is being used in embedded mode. | |
* This utility function exists so that the logic for determining the mode is same | |
* in HiveConf and HiveMetaStoreClient | |
* @param msUri - metastore server uri | |
* @return true if the metastore is embedded | |
*/ | |
public static boolean isEmbeddedMetaStore(String msUri) { | |
return (msUri == null) || msUri.trim().isEmpty(); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we can say it is for testing purposes and separate the classes from hive-exec.
5461f07
to
24ca325
Compare
959dfa7
to
bc40344
Compare
I've kept the package unchanged for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks reasonable to me. I quickly checked whether there are any remaining candidates that could be moved to standalone-metastore-client, and it seems that all client-only-related classes have been gathered into the client package. I'm also in favor of the new HiveMetaStoreClientBuilder
class.
I left a few minor comments where I had questions. Also, it seems that the CI failed due to the unification of HiveClientCache. Could you update the TestPassProperties#testSequenceTableWriteReadMR
accordingly?
Regarding IMetaStoreClient
, HiveMetaStoreClient
, and RetryingMetaStoreClient
, I'm leaning towards refactoring them in a follow-up PR, as the large refactoring might obscure the existing changes.
@@ -41,7 +41,7 @@ | |||
import org.apache.hadoop.hive.metastore.api.PartitionsByExprRequest; | |||
import org.apache.hadoop.hive.metastore.api.PartitionsStatsRequest; | |||
import org.apache.hadoop.hive.metastore.api.Table; | |||
import org.apache.hadoop.hive.metastore.client.HiveMetaStoreClientUtils; | |||
import org.apache.hadoop.hive.metastore.client.utils.HiveMetaStoreClientUtils; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the order of import might be swapped as follows:
import org.apache.hadoop.hive.metastore.client.MetaStoreClientWrapper;
import org.apache.hadoop.hive.metastore.client.utils.HiveMetaStoreClientUtils;
@@ -92,7 +92,7 @@ | |||
import org.apache.hadoop.hive.metastore.api.TableValidWriteIds; | |||
import org.apache.hadoop.hive.metastore.api.UniqueConstraintsRequest; | |||
import org.apache.hadoop.hive.metastore.api.UniqueConstraintsResponse; | |||
import org.apache.hadoop.hive.metastore.client.HiveMetaStoreClientUtils; | |||
import org.apache.hadoop.hive.metastore.client.utils.HiveMetaStoreClientUtils; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same import order issue here.
@@ -65,15 +67,21 @@ public class RetryingMetaStoreClient implements InvocationHandler { | |||
private final UserGroupInformation ugi; | |||
private final int retryLimit; | |||
private final long retryDelaySeconds; | |||
private final ConcurrentHashMap<String, Long> metaCallTimeMap; | |||
private final Map<String, Long> metaCallTimeMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about removing all uses of ConcurrentHashMap
from this file? It seems that only one method uses ConcurrentHashMap
, and the actual value passed to that method is always null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the tab size is set to 4 inside Iterator { }
. Could we restore it to 2?
import org.apache.hadoop.hive.metastore.HiveMetaHookLoader; | ||
import org.apache.hadoop.hive.metastore.IMetaStoreClient; | ||
import org.apache.hadoop.hive.metastore.api.MetaException; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove the empty line and sort the import list?
9593477
to
9598be2
Compare
|
|
||
public HiveMetaStoreClientBuilder(Configuration conf) { | ||
Preconditions.checkNotNull(conf); | ||
this.conf = conf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simplify it with one-line like this.conf = Preconditions.checkNotNull(conf)
. I prefer Objects.requireNonNull
for new code, though.
@@ -107,7 +107,7 @@ public HiveClientCache(final int timeout) { | |||
/** | |||
* @param timeout the length of time in seconds after a client is created that it should be automatically removed | |||
*/ | |||
private HiveClientCache(final int timeout, final int initialCapacity, final int maxCapacity, final boolean enableStats) { | |||
protected HiveClientCache(final int timeout, final int initialCapacity, final int maxCapacity, final boolean enableStats) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it simpler to make this public and directly instantiate HiveClientCache in HCatUtil?
What changes were proposed in this pull request?
move client classes into it's own module
Why are the changes needed?
improve the structure of standalone-metastore classes
Does this PR introduce any user-facing change?
No
How was this patch tested?
jenkins