hoodie-hive-bundle doesn't include Hive jars #736

Closed
cdmikechen opened this issue Jun 14, 2019 · 14 comments

Comments

@cdmikechen
Contributor

When using run_sync_tool.sh to sync a table like this:

./run_sync_tool.sh --user hdfs --database xxx --jdbc-url "jdbc:hive2://ip:10000/" --base-path /hive/warehouse/xxx/xx/ --table xx --pass  ""

Hoodie returns this error:

2019-06-14 04:33:25,746 ERROR [main] metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(204)) - java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1674)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:64)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6891)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:129)
	at com.uber.hoodie.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:102)
	at com.uber.hoodie.hive.HiveSyncTool.<init>(HiveSyncTool.java:61)
	at com.uber.hoodie.hive.HiveSyncTool.main(HiveSyncTool.java:189)
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 23 more

I found that this class is pulled in by the hoodie-hive-bundle pom (via hive-metastore), but it is missing from the packaged hoodie-hive-bundle-0.4.8-SNAPSHOT.jar. In addition, run_sync_tool.sh does not put the Hive libs for hive-metastore and its dependencies on the classpath.
I think we could include the Hive jars in the hoodie-hive-bundle pom.
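
To confirm whether the class really is absent from the packaged bundle, a quick check along these lines may help. This is only a sketch: it assumes `unzip` is installed, and the function names `class_to_entry` and `jar_has_class` are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hudi): check whether a jar contains a class.

# Convert a dotted class name into the entry path used inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeeds if the jar listing mentions the class entry; assumes `unzip` exists.
jar_has_class() {
  local jar="$1" entry listing
  entry="$(class_to_entry "$2")"
  # If the jar cannot be listed at all, report the class as not found.
  listing="$(unzip -l "$jar" 2>/dev/null)" || return 1
  case "$listing" in
    *"$entry"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `jar_has_class hoodie-hive-bundle-0.4.8-SNAPSHOT.jar org.datanucleus.NucleusContext && echo present || echo missing`.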

@vinothchandar
Member

We are revisiting the jars/bundles from the ground up. Will factor this in and get back to you.

You can find some progress in the hackathon-0619 branch.

@eisig
Contributor

eisig commented Jun 24, 2019

@cdmikechen
Have you set the Hive metastore URIs in the Hive config file?
Alternatively, you can export an environment variable:
export HOODIE_ENV_hive_DOT_metastore_DOT_uris="thrift://xx.xx.xx.xx:9083"
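
As far as I understand it, Hudi copies environment variables that start with `HOODIE_ENV_` into the Hadoop configuration and turns each `_DOT_` into a `.`. The sketch below only mimics that naming rule so you can see which property a variable would map to. The function name `hoodie_env_to_prop` is hypothetical and not part of Hudi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the config property name that a HOODIE_ENV_*
# variable is assumed to map to (prefix stripped, _DOT_ replaced by ".").
hoodie_env_to_prop() {
  printf '%s\n' "${1#HOODIE_ENV_}" | sed 's/_DOT_/./g'
}
```

For example, `hoodie_env_to_prop HOODIE_ENV_hive_DOT_metastore_DOT_uris` prints `hive.metastore.uris`.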

@cdmikechen
Contributor Author

@eisig
Yes, it is set. I ran the script with all the Hive jars on the classpath, like this:

#java -cp $HOODIE_HIVE_UBER_JAR:${HADOOP_HIVE_JARS}:${HADOOP_CONF_DIR} com.uber.hoodie.hive.HiveSyncTool "$@"
java -cp "$HOODIE_HIVE_UBER_JAR:${HADOOP_HIVE_JARS}:${HADOOP_CONF_DIR}:${HIVE_HOME}/lib/*" com.uber.hoodie.hive.HiveSyncTool "$@"

and then it runs.
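
The classpath used here can be assembled in one place, as in the rough sketch below. The trailing `/lib/*` is a JVM classpath wildcard (the JVM expands it to the jars in that directory), so the string should reach `java` unexpanded. The function name `build_sync_classpath` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the pieces of the sync tool classpath.
# Arguments: bundle jar, extra Hadoop/Hive jars, Hadoop conf dir, Hive home.
build_sync_classpath() {
  # The final "*" is printed literally; the JVM, not the shell, expands it.
  printf '%s:%s:%s:%s/lib/*\n' "$1" "$2" "$3" "$4"
}
```

It could then be used as `java -cp "$(build_sync_classpath "$HOODIE_HIVE_UBER_JAR" "$HADOOP_HIVE_JARS" "$HADOOP_CONF_DIR" "$HIVE_HOME")" com.uber.hoodie.hive.HiveSyncTool "$@"`, with the quotes keeping the shell from touching the wildcard.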

@vinothchandar
Member

@cdmikechen these jars are in the Hive installation, that's why we don't bundle them.

ls hive/lib/datanucleus-*
hive/lib/datanucleus-api-jdo-4.2.4.jar	hive/lib/datanucleus-core-4.1.17.jar  hive/lib/datanucleus-rdbms-4.1.19.jar
root@adhoc-2:/opt#

Is it possible that the script is just not picking them up? Are you able to repro this on top of #751 and see if it is still an issue?
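
To see what the script would actually find, something like the following sketch can list the datanucleus jars under a given lib directory. It assumes the jars follow the `datanucleus-*.jar` naming shown in the listing above. The function name `list_datanucleus_jars` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print datanucleus jars in a lib directory.
# Returns non-zero when none are found.
list_datanucleus_jars() {
  local lib_dir="$1" found=1 jar
  for jar in "$lib_dir"/datanucleus-*.jar; do
    # An unmatched glob stays literal, so check that the file exists.
    [ -e "$jar" ] || continue
    printf '%s\n' "$jar"
    found=0
  done
  return "$found"
}
```

For example: `list_datanucleus_jars "${HIVE_HOME}/lib" || echo "no datanucleus jars under ${HIVE_HOME}/lib"`.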

@vinothchandar
Member

@cdmikechen any updates on this?

@cdmikechen
Contributor Author

@vinothchandar
Sorry, I may not have time to re-validate this soon. I will upgrade my Hudi to 0.5.0 next week and then test again.

@vinothchandar
Member

No worries.

@vinothchandar
Member

Closing due to inactivity

@haospotai

> @vinothchandar
> Sorry, I may not have time to re-validate it recently. I will upgrade my Hudi to 0.5.0 next week and then test again.

It's still the same issue on version 0.5.0.

@vinothchandar
Member

https://github.com/apache/incubator-hudi/blob/master/hudi-hive/run_sync_tool.sh#L30 adds in all the jars, and we use the script in the docker setup successfully. Could you reproduce this in our docker setup so we can go from there? I'm trying to understand whether this is a HIVE_HOME config issue.

@haospotai

haospotai commented Dec 23, 2019

> https://github.com/apache/incubator-hudi/blob/master/hudi-hive/run_sync_tool.sh#L30 adds in all the jars, and we use the script in the docker setup successfully. Could you reproduce this in our docker setup so we can go from there? I'm trying to understand whether this is a HIVE_HOME config issue.

Hi, I added the code below to run_sync_tool.sh and then it works, but I did not try it in Docker.

if [ -z "$HIVE_CONF_DIR" ]; then
  echo "setting hive conf dir"
  HIVE_CONF_DIR="${HIVE_HOME}/conf"
fi

Otherwise, this kind of exception is thrown:

Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "dbcp-builtin" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:117)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
	... 58 more
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
	at org.datanucleus.store.rdbms.connectionpool.DBCPBuiltinConnectionPoolFactory.createConnectionPool(DBCPBuiltinConnectionPoolFactory.java:49)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:213)
	... 60 more

@vinothchandar
Member

I think the code assumes everything is in the Hadoop conf:

if [ -z "$HADOOP_CONF_DIR" ]; then
  echo "setting hadoop conf dir"
  HADOOP_CONF_DIR="${HADOOP_HOME}/etc/hadoop"
fi

In either case, I still can't understand how adding the conf directory to the classpath resolves the driver not being found. Is there a direct link? That is, do you know exactly why adding the conf directory helps?
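
One possible link, offered as a guess rather than a confirmed diagnosis: without a hive-site.xml on the classpath, Hive falls back to its built-in defaults, which point at an embedded Derby metastore, and that would explain why it goes looking for the Derby driver. A rough check of whether a conf directory actually supplies hive.metastore.uris might look like this (the function name `has_metastore_uris` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if <conf_dir>/hive-site.xml exists
# and declares the hive.metastore.uris property.
has_metastore_uris() {
  local site="$1/hive-site.xml"
  [ -f "$site" ] && grep -qF '<name>hive.metastore.uris</name>' "$site"
}
```

For example: `has_metastore_uris "${HIVE_CONF_DIR:-${HIVE_HOME}/conf}" || echo "hive.metastore.uris not found; Hive may fall back to embedded Derby"`.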

@badri03iter

badri03iter commented Feb 11, 2021

Hi Team,
While running

./run_sync_tool.sh --jdbc-url jdbc:hive2://localhost:10000/default --user *** --pass *** --partitioned-by data_load_date --base-path XXX --database default --table default.table

I am facing the same issue as mentioned above:

2021-02-10 16:07:16,381 ERROR [main] metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(204)) - java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1674)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:64)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6891)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1706)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:91)
at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:66)
at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:231)
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 41 more

The datanucleus jars are present in /mnt/apache-hive-2.3.5-bin/lib.

Any pointers?

@stym06
Contributor

stym06 commented Jul 2, 2021

This also works for Hive 3 if we include all the jars from the $HIVE_HOME/lib/ folder when running hive_sync.sh:
java -cp "$HUDI_HIVE_UBER_JAR:${HADOOP_HIVE_JARS}:${HADOOP_CONF_DIR}:${HIVE_HOME}/lib/*" org.apache.hudi.hive.HiveSyncTool "$@"

vinishjail97 pushed a commit to vinishjail97/hudi that referenced this issue Jun 5, 2024