
[HUDI-1609] How to disable Hive JDBC and enable metastore #1679

Closed

selvarajperiyasamy opened this issue May 28, 2020 · 35 comments
Labels: meta-sync, priority:major (degraded perf; unable to move forward; potential bugs)

Comments

@selvarajperiyasamy

selvarajperiyasamy commented May 28, 2020

Team,

My spark version is 2.3.0
Scala version 2.11.8
Hive version 1.2.2

I see the below comment in the Hudi code. How can I start using the metastore client for Hive registrations? Is there a way to disable the useJdbc flag?

// Support both JDBC and metastore based implementations for backwards compatiblity. Future users should
// disable jdbc and depend on metastore client for all hive registrations

Below is my log. It makes a Hive JDBC connection and fails with a "method not available" error.

20/05/26 15:38:15 INFO HoodieSparkSqlWriter$: Syncing to Hive Metastore (URL: jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2)
20/05/26 15:38:15 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://oprhqanameservice], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/etc/spark2/2.6.5.179-4/0/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1153590032_1, ugi=svchdc36q@VISA.COM (auth:KERBEROS)]]]
20/05/26 15:38:15 INFO HiveConf: Found configuration file file:/etc/spark2/2.6.5.179-4/0/hive-site.xml
20/05/26 15:38:16 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from /projects/cdp/data/cdp_reporting/trr
20/05/26 15:38:16 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://oprhqanameservice], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/etc/spark2/2.6.5.179-4/0/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1153590032_1, ugi=svchdc36q@VISA.COM (auth:KERBEROS)]]]
20/05/26 15:38:16 INFO HoodieTableConfig: Loading dataset properties from /projects/cdp/data/cdp_reporting/trr/.hoodie/hoodie.properties
20/05/26 15:38:16 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE from /projects/cdp/data/cdp_reporting/trr
20/05/26 15:38:16 INFO HoodieTableMetaClient: Loading Active commit timeline for /projects/cdp/data/cdp_reporting/trr
20/05/26 15:38:16 INFO HoodieActiveTimeline: Loaded instants java.util.stream.ReferencePipeline$Head@a1fca5a
20/05/26 15:38:16 INFO HoodieHiveClient: Creating hive connection jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
20/05/26 15:38:16 INFO Utils: Supplied authorities: server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181
20/05/26 15:38:16 INFO CuratorFrameworkImpl: Starting
20/05/26 15:38:16 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-4--1, built on 08/09/2019 23:18 GMT
20/05/26 15:38:16 INFO ZooKeeper: Client environment:host.name=server4.visa.com
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.version=1.8.0_241
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.home=/usr/java/jdk1.8.0_241-amd64/jre
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.class.path=/usr/hdp/current/spark2-client/conf/:/usr/hdp/current/spark2-client/jars/hk2-api-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/JavaEWAH-0.3.2.jar:/usr/hdp/current/spark2-client/jars/commons-pool-1.5.4.jar:/usr/hdp/current/spark2-client/jars/RoaringBitmap-0.5.11.jar:/usr/hdp/current/spark2-client/jars/hk2-locator-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/ST4-4.0.4.jar:/usr/hdp/current/spark2-client/jars/compress-lzf-1.0.3.jar:/usr/hdp/current/spark2-client/jars/activation-1.1.1.jar:/usr/hdp/current/spark2-client/jars/core-1.1.2.jar:/usr/hdp/current/spark2-client/jars/aircompressor-0.8.jar:/usr/hdp/current/spark2-client/jars/hk2-utils-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/antlr-2.7.7.jar:/usr/hdp/current/spark2-client/jars/curator-client-2.7.1.jar:/usr/hdp/current/spark2-client/jars/antlr-runtime-3.4.jar:/usr/hdp/current/spark2-client/jars/curator-framework-2.7.1.jar:/usr/hdp/current/spark2-client/jars/antlr4-runtime-4.7.jar:/usr/hdp/current/spark2-client/jars/ivy-2.4.0.jar:/usr/hdp/current/spark2-client/jars/aopalliance-1.0.jar:/usr/hdp/current/spark2-client/jars/commons-io-2.4.jar:/usr/hdp/current/spark2-client/jars/janino-3.0.8.jar:/usr/hdp/current/spark2-client/jars/aopalliance-repackaged-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/commons-collections-3.2.2.jar:/usr/hdp/current/spark2-client/jars/apache-log4j-extras-1.2.17.jar:/usr/hdp/current/spark2-client/jars/curator-recipes-2.7.1.jar:/usr/hdp/current/spark2-client/jars/apacheds-i18n-2.0.0-M15.jar:/usr/hdp/current/spark2-client/jars/commons-cli-1.2.jar:/usr/hdp/current/spark2-client/jars/javax.inject-1.jar:/usr/hdp/current/spark2-client/jars/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/current/spark2-client/jars/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/current/spark2-client/jars/api-asn1-api-1.0.0-M20.jar:/usr/hdp/current/spark2-client/jars/datanucleus-core-3.2.10.jar:/usr/hdp/current/spark2-client/jars/api-util-1.0.0-M20.jar:/usr/hdp/current/spark2-client/jars/datanucleus-rdbms-3.2.9.jar:/usr/hdp/current/spark2-client/jars/arpack_combined_all-0.1.jar:/usr/hdp/current/spark2-client/jars/derby-10.12.1.1.jar:/usr/hdp/current/spark2-client/jars/arrow-format-0.8.0.jar:/usr/hdp/current/spark2-client/jars/eigenbase-properties-1.1.5.jar:/usr/hdp/current/spark2-client/jars/arrow-memory-0.8.0.jar:/usr/hdp/current/spark2-client/jars/flatbuffers-1.2.0-3f79e055.jar:/usr/hdp/current/spark2-client/jars/arrow-vector-0.8.0.jar:/usr/hdp/current/spark2-client/jars/hppc-0.7.2.jar:/usr/hdp/current/spark2-client/jars/avro-1.7.7.jar:/usr/hdp/current/spark2-client/jars/httpclient-4.5.2.jar:/usr/hdp/current/spark2-client/jars/avro-ipc-1.7.7.jar:/usr/hdp/current/spark2-client/jars/commons-compiler-3.0.8.jar:/usr/hdp/current/spark2-client/jars/avro-mapred-1.7.7-hadoop2.jar:/usr/hdp/current/spark2-client/jars/commons-compress-1.4.1.jar:/usr/hdp/current/spark2-client/jars/aws-java-sdk-core-1.10.6.jar:/usr/hdp/current/spark2-client/jars/guava-14.0.1.jar:/usr/hdp/current/spark2-client/jars/aws-java-sdk-kms-1.10.6.jar:/usr/hdp/current/spark2-client/jars/gson-2.2.4.jar:/usr/hdp/current/spark2-client/jars/aws-java-sdk-s3-1.10.6.jar:/usr/hdp/current/spark2-client/jars/commons-configuration-1.6.jar:/usr/hdp/current/spark2-client/jars/azure-data-lake-store-sdk-2.1.4.jar:/usr/hdp/current/spark2-client/jars/commons-lang-2.6.jar:/usr/hdp/current/spark2-client/jars/jpam-1.1.jar:/usr/hdp/current/spark2-client/jars/azure-keyvault-core-0.8.0.jar:/usr/hdp/current/spark2-client/jars
/guice-3.0.jar:/usr/hdp/current/spark2-client/jars/azure-storage-5.4.0.jar:/usr/hdp/current/spark2-client/jars/httpcore-4.4.4.jar:/usr/hdp/current/spark2-client/jars/base64-2.3.8.jar:/usr/hdp/current/spark2-client/jars/guice-servlet-3.0.jar:/usr/hdp/current/spark2-client/jars/bcprov-jdk15on-1.58.jar:/usr/hdp/current/spark2-client/jars/hadoop-aws-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/bonecp-0.8.0.RELEASE.jar:/usr/hdp/current/spark2-client/jars/commons-lang3-3.5.jar:/usr/hdp/current/spark2-client/jars/breeze-macros_2.11-0.13.2.jar:/usr/hdp/current/spark2-client/jars/hadoop-auth-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/breeze_2.11-0.13.2.jar:/usr/hdp/current/spark2-client/jars/commons-codec-1.10.jar:/usr/hdp/current/spark2-client/jars/jta-1.1.jar:/usr/hdp/current/spark2-client/jars/calcite-avatica-1.2.0-incubating.jar:/usr/hdp/current/spark2-client/jars/commons-logging-1.1.3.jar:/usr/hdp/current/spark2-client/jars/calcite-core-1.2.0-incubating.jar:/usr/hdp/current/spark2-client/jars/commons-math3-3.4.1.jar:/usr/hdp/current/spark2-client/jars/calcite-linq4j-1.2.0-incubating.jar:/usr/hdp/current/spark2-client/jars/hadoop-azure-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/chill-java-0.8.4.jar:/usr/hdp/current/spark2-client/jars/hadoop-client-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/chill_2.11-0.8.4.jar:/usr/hdp/current/spark2-client/jars/hadoop-common-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/commons-beanutils-1.7.0.jar:/usr/hdp/current/spark2-client/jars/gcs-connector-1.8.1.2.6.5.179-4-shaded.jar:/usr/hdp/current/spark2-client/jars/commons-beanutils-core-1.8.0.jar:/usr/hdp/current/spark2-client/jars/commons-crypto-1.0.0.jar:/usr/hdp/current/spark2-client/jars/hadoop-hdfs-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/commons-dbcp-1.4.jar:/usr/hdp/current/spark2-client/jars/hive-beeline-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/commons-digester-1.8.jar:/usr/hdp/current/spark2-client/jars/hive-cli-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/commons-httpclient-3.1.jar:/usr/hdp/current/spark2-client/jars/jackson-core-2.6.7.jar:/usr/hdp/current/spark2-client/jars/commons-net-2.2.jar:/usr/hdp/current/spark2-client/jars/javolution-5.5.1.jar:/usr/hdp/current/spark2-client/jars/jersey-server-2.22.2.jar:/usr/hdp/current/spark2-client/jars/xz-1.0.jar:/usr/hdp/current/spark2-client/jars/hadoop-annotations-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jets3t-0.9.4.jar:/usr/hdp/current/spark2-client/jars/okhttp-2.7.5.jar:/usr/hdp/current/spark2-client/jars/hadoop-azure-datalake-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-container-servlet-2.22.2.jar:/usr/hdp/current/spark2-client/jars/hadoop-mapreduce-client-app-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jdo-api-3.0.1.jar:/usr/hdp/current/spark2-client/jars/libfb303-0.9.3.jar:/usr/hdp/current/spark2-client/jars/hadoop-mapreduce-client-common-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-container-servlet-core-2.22.2.jar:/usr/hdp/current/spark2-client/jars/hadoop-mapreduce-client-core-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-client-2.22.2.jar:/usr/hdp/current/spark2-client/jars/netty-3.9.9.Final.jar:/usr/hdp/current/spark2-client/jars/hadoop-mapreduce-client-jobclient-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-common-2.22.2.jar:/usr/hdp/current/spark2-client/jars/netty-all-4.1.17.Final.jar:/usr/hdp/current/spark2-client/jars/
hadoop-mapreduce-client-shuffle-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jetty-6.1.26.hwx.jar:/usr/hdp/current/spark2-client/jars/okio-1.6.0.jar:/usr/hdp/current/spark2-client/jars/hadoop-openstack-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jetty-sslengine-6.1.26.hwx.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-api-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jetty-util-6.1.26.hwx.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-client-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jline-2.12.1.jar:/usr/hdp/current/spark2-client/jars/opencsv-2.3.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-common-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/joda-time-2.9.3.jar:/usr/hdp/current/spark2-client/jars/oro-2.0.8.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-registry-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-guava-2.22.2.jar:/usr/hdp/current/spark2-client/jars/paranamer-2.8.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-server-common-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-media-jaxb-2.22.2.jar:/usr/hdp/current/spark2-client/jars/py4j-0.10.6.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-server-web-proxy-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/json4s-ast_2.11-3.2.11.jar:/usr/hdp/current/spark2-client/jars/hive-exec-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/json4s-core_2.11-3.2.11.jar:/usr/hdp/current/spark2-client/jars/hive-jdbc-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jodd-core-3.5.2.jar:/usr/hdp/current/spark2-client/jars/pyrolite-4.13.jar:/usr/hdp/current/spark2-client/jars/hive-metastore-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/json4s-jackson_2.11-3.2.11.jar:/usr/hdp/current/spark2-client/jars/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark2-client/jars/jsp-api-2.1.jar:/usr/hdp/current/spark2-client/jars/jackson-annotations-2.6.7.jar:/usr/hdp/current/spark2-client/jars/libthrift-0.9.3.jar:/usr/hdp/current/spark2-client/jars/jackson-core-asl-1.9.13.jar:/usr/hdp/current/spark2-client/jars/jsr305-1.3.9.jar:/usr/hdp/current/spark2-client/jars/jackson-databind-2.6.7.1.jar:/usr/hdp/current/spark2-client/jars/jtransforms-2.4.0.jar:/usr/hdp/current/spark2-client/jars/jackson-dataformat-cbor-2.6.7.jar:/usr/hdp/current/spark2-client/jars/log4j-1.2.17.jar:/usr/hdp/current/spark2-client/jars/jackson-jaxrs-1.9.13.jar:/usr/hdp/current/spark2-client/jars/jul-to-slf4j-1.7.16.jar:/usr/hdp/current/spark2-client/jars/jackson-mapper-asl-1.9.13.jar:/usr/hdp/current/spark2-client/jars/kryo-shaded-3.0.3.jar:/usr/hdp/current/spark2-client/jars/jackson-module-paranamer-2.7.9.jar:/usr/hdp/current/spark2-client/jars/json-smart-1.3.1.jar:/usr/hdp/current/spark2-client/jars/scalap-2.11.8.jar:/usr/hdp/current/spark2-client/jars/jackson-module-scala_2.11-2.6.7.1.jar:/usr/hdp/current/spark2-client/jars/lz4-java-1.4.0.jar:/usr/hdp/current/spark2-client/jars/jackson-xc-1.9.13.jar:/usr/hdp/current/spark2-client/jars/machinist_2.11-0.6.1.jar:/usr/hdp/current/spark2-client/jars/java-xmlbuilder-1.1.jar:/usr/hdp/current/spark2-client/jars/macro-compat_2.11-1.1.1.jar:/usr/hdp/current/spark2-client/jars/javassist-3.18.1-GA.jar:/usr/hdp/current/spark2-client/jars/leveldbjni-all-1.8.jar:/usr/hdp/current/spark2-client/jars/javax.annotation-api-1.2.jar:/usr/hdp/current/spark2-client/jars/metrics-core-3.1.5.jar:/usr/hdp/current/spark2-client/jars/javax.inject-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/metric
s-graphite-3.1.5.jar:/usr/hdp/current/spark2-client/jars/javax.servlet-api-3.1.0.jar:/usr/hdp/current/spark2-client/jars/metrics-json-3.1.5.jar:/usr/hdp/current/spark2-client/jars/javax.ws.rs-api-2.0.1.jar:/usr/hdp/current/spark2-client/jars/nimbus-jose-jwt-4.41.1.jar:/usr/hdp/current/spark2-client/jars/jaxb-api-2.2.2.jar:/usr/hdp/current/spark2-client/jars/metrics-jvm-3.1.5.jar:/usr/hdp/current/spark2-client/jars/jcip-annotations-1.0-1.jar:/usr/hdp/current/spark2-client/jars/minlog-1.3.0.jar:/usr/hdp/current/spark2-client/jars/jcl-over-slf4j-1.7.16.jar:/usr/hdp/current/spark2-client/jars/parquet-column-1.8.2.jar:/usr/hdp/current/spark2-client/jars/objenesis-2.1.jar:/usr/hdp/current/spark2-client/jars/spire_2.11-0.13.0.jar:/usr/hdp/current/spark2-client/jars/orc-core-1.4.3.2.6.5.179-4-nohive.jar:/usr/hdp/current/spark2-client/jars/stax-api-1.0-2.jar:/usr/hdp/current/spark2-client/jars/orc-mapreduce-1.4.3.2.6.5.179-4-nohive.jar:/usr/hdp/current/spark2-client/jars/osgi-resource-locator-1.0.1.jar:/usr/hdp/current/spark2-client/jars/parquet-common-1.8.2.jar:/usr/hdp/current/spark2-client/jars/parquet-encoding-1.8.2.jar:/usr/hdp/current/spark2-client/jars/parquet-format-2.3.1.jar:/usr/hdp/current/spark2-client/jars/parquet-hadoop-1.8.2.jar:/usr/hdp/current/spark2-client/jars/parquet-hadoop-bundle-1.6.0.jar:/usr/hdp/current/spark2-client/jars/parquet-jackson-1.8.2.jar:/usr/hdp/current/spark2-client/jars/protobuf-java-2.5.0.jar:/usr/hdp/current/spark2-client/jars/scala-compiler-2.11.8.jar:/usr/hdp/current/spark2-client/jars/scala-library-2.11.8.jar:/usr/hdp/current/spark2-client/jars/stax-api-1.0.1.jar:/usr/hdp/current/spark2-client/jars/scala-parser-combinators_2.11-1.0.4.jar:/usr/hdp/current/spark2-client/jars/scala-reflect-2.11.8.jar:/usr/hdp/current/spark2-client/jars/scala-xml_2.11-1.0.5.jar:/usr/hdp/current/spark2-client/jars/shapeless_2.11-2.3.2.jar:/usr/hdp/current/spark2-client/jars/slf4j-api-1.7.16.jar:/usr/hdp/current/spark2-client/jars/slf4j-log4j12-1.7.16.jar:/usr/hdp/current/spark2-client/jars/snappy-0.2.jar:/usr/hdp/current/spark2-client/jars/snappy-java-1.1.2.6.jar:/usr/hdp/current/spark2-client/jars/stream-2.7.0.jar:/usr/hdp/current/spark2-client/jars/spark-catalyst_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/stringtemplate-3.2.1.jar:/usr/hdp/current/spark2-client/jars/spark-cloud_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/super-csv-2.2.0.jar:/usr/hdp/current/spark2-client/jars/spark-core_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/univocity-parsers-2.5.9.jar:/usr/hdp/current/spark2-client/jars/spark-graphx_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-unsafe_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-hadoop-cloud_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-tags_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-hive-thriftserver_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/validation-api-1.1.0.Final.jar:/usr/hdp/current/spark2-client/jars/spark-hive_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/xbean-asm5-shaded-4.4.jar:/usr/hdp/current/spark2-client/jars/spark-kvstore_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/xercesImpl-2.9.1.jar:/usr/hdp/current/spark2-client/jars/spark-launcher_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-mllib-local_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/xmlenc-0.52.jar:/usr/hdp/current/spark2-client/jars/spar
k-mllib_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-yarn_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-network-common_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spire-macros_2.11-0.13.0.jar:/usr/hdp/current/spark2-client/jars/spark-network-shuffle_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/zookeeper-3.4.6.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-repl_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/zstd-jni-1.3.2-2.jar:/usr/hdp/current/spark2-client/jars/spark-sketch_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-sql_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-streaming_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/2.6.5.179-4/hadoop/conf/
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.library.path=:/export/home/sobla/oracle_client/instantclient_19_5:/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.compiler=<NA>
20/05/26 15:38:16 INFO ZooKeeper: Client environment:os.name=Linux
20/05/26 15:38:16 INFO ZooKeeper: Client environment:os.arch=amd64
20/05/26 15:38:16 INFO ZooKeeper: Client environment:os.version=3.10.0-1062.9.1.el7.x86_64
20/05/26 15:38:16 INFO ZooKeeper: Client environment:user.name=svchdc36q
20/05/26 15:38:16 INFO ZooKeeper: Client environment:user.home=/home/svchdc36q
20/05/26 15:38:16 INFO ZooKeeper: Client environment:user.dir=/home/svchdc36q
20/05/26 15:38:16 INFO ZooKeeper: Initiating client connection, connectString=server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@4ed31bc9
20/05/26 15:38:16 INFO ClientCnxn: Opening socket connection to server server2.visa.com/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
20/05/26 15:38:16 INFO ClientCnxn: Socket connection established, initiating session, client: /x.x.x.x:36938, server: server2.visa.com/x.x.x.x:2181
20/05/26 15:38:16 INFO ClientCnxn: Session establishment complete on server server2.visa.com/x.x.x.x:2181, sessionid = 0x27234630fb51f5b, negotiated timeout = 40000
20/05/26 15:38:16 INFO ConnectionStateManager: State change: CONNECTED
20/05/26 15:38:17 INFO ZooKeeper: Session: 0x27234630fb51f5b closed
20/05/26 15:38:17 INFO ClientCnxn: EventThread shut down
20/05/26 15:38:17 INFO Utils: Resolved authority: server2.visa.com:10000
20/05/26 15:38:17 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://server2.visa.com:10000/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
20/05/26 15:38:19 INFO HoodieHiveClient: Successfully established Hive connection to  jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
20/05/26 15:38:19 INFO metastore: Trying to connect to metastore with URI thrift://server2.visa.com:9083
20/05/26 15:38:19 INFO metastore: Opened a connection to metastore, current connections: 1
20/05/26 15:38:19 INFO metastore: Connected to metastore.
20/05/26 15:38:19 INFO HiveSyncTool: Trying to sync hoodie table trr with base path /projects/cdp/data/cdp_reporting/trr of type COPY_ON_WRITE
20/05/26 15:38:19 ERROR DBUtil$: [App] *********************** Exception occurred in baseTableWrite for trr : Failed to check if table exists trr
org.apache.hudi.hive.HoodieHiveSyncException: Failed to check if table exists trr
	at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:459)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:91)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:67)
	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
	at com.cybs.cdp.reporting.trr.DBUtil$.transactionTableWrite(DBUtil.scala:62)
	at com.cybs.cdp.reporting.trr.TRREngine$.startEngine(TRREngine.scala:45)
	at com.cybs.cdp.reporting.trr.TRREngine$.main(TRREngine.scala:23)
	at com.cybs.cdp.reporting.trr.TRREngine.main(TRREngine.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
	at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
	at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
	at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1443)
	at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:457)
	... 38 more
Exception in thread "main" java.lang.Exception: Failed to check if table exists trr
	at com.cybs.cdp.reporting.trr.DBUtil$.transactionTableWrite(DBUtil.scala:69)
	at com.cybs.cdp.reporting.trr.TRREngine$.startEngine(TRREngine.scala:45)
	at com.cybs.cdp.reporting.trr.TRREngine$.main(TRREngine.scala:23)
	at com.cybs.cdp.reporting.trr.TRREngine.main(TRREngine.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)

Thanks,
Selva

@lamberken
Member

lamberken commented May 28, 2020

Hello @selvarajperiyasamy, try the option --use-jdbc false.

@lamberken lamberken self-assigned this May 28, 2020
@selvarajperiyasamy
Author

@lamber-ken Do you mean something like the below in the data source writer?
option("use-jdbc", "false")

@lamberken
Member

Hi @selvarajperiyasamy, I guess you used HiveSyncTool directly just now. For Spark, use:

option("hoodie.datasource.hive_sync.use_jdbc", "false")

@selvarajperiyasamy
Author

Hi @lamber-ken, is this Spark option available in 0.5.0? I tried it, and it didn't work. When I checked the Hudi code base, this string was not found anywhere. Attached are the images.

(screenshots attached)

@vinothchandar
Member

cc @n3nash as well who made similar changes

@selvarajperiyasamy
Author

I have already used the below settings, and the error is still the same as mentioned in the ticket.

option(HIVE_SYNC_ENABLED_OPT_KEY, true).
option(HIVE_URL_OPT_KEY, "jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2").
option(HIVE_DATABASE_OPT_KEY, "cdp_reporting").
option(HIVE_TABLE_OPT_KEY, "trr").
option(HIVE_PARTITION_FIELDS_OPT_KEY, "transaction_day,transaction_hour").
option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[MultiPartKeysValueExtractor].getName).
option("hoodie.datasource.hive_sync.use_jdbc", "false").

@vinothchandar
Member

@selvarajperiyasamy Actually, the stack trace does show it going over thrift to the metastore.

Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
	at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
	at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
	at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1443)
	at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:457)
	... 38 more

This might be an issue with Hive 1.2 (we test with Hive 2.x). I assume you are running with CDH? cc @bvaradar, who may know more about this combo.

@bvaradar bvaradar self-assigned this May 31, 2020
@bvaradar
Contributor

@selvarajperiyasamy: This is indeed caused by the Hive version mismatch. Enabling/disabling JDBC will not help here. With 0.5.0, Hudi moved to Hive 2.x, which was predominantly being used across various deployments. Hive 1.2.x is really old :) and a Hive 1.2.x server is not compatible with Hive 2.x clients. Is it possible to upgrade the Hive environment to Hive 2.x (2.3.3, for example)?

@selvarajperiyasamy
Author

selvarajperiyasamy commented May 31, 2020 via email

@vinothchandar
Member

Let us know how that goes. @bvaradar raised a JIRA to see what, if anything, we can do here.
But to add my 2c, Hadoop/Hive vendors are increasingly moving to Hive 3, so upgrading to Hive 2 is a good thing to do regardless. At least at Uber, it improved Hive overall from what I remember.

@selvarajperiyasamy
Author

selvarajperiyasamy commented May 31, 2020 via email

@bvaradar
Contributor

bvaradar commented Jun 6, 2020

@selvarajperiyasamy : Hope you were able to resolve the issue. Let us know if any help is needed.

@cdmikechen
Contributor

cdmikechen commented Jun 22, 2020

@bvaradar I've tested deltastreamer from the Hudi master branch. If I set hoodie.datasource.hive_sync.use_jdbc=false and use the Hive driver class to create the Hive table, it reports the error java.lang.NoClassDefFoundError: org/json/JSONException.
I checked the Spark libs and Hudi jars, and found that Hudi uses Hive 2's jars to sync Hive, but hudi-utilities-bundle does not contain Hive 2's dependency libs.

@bvaradar
Contributor

@cdmikechen : Long time :)

The Hudi utilities bundle includes the following Hive jars in shaded form:

                  <include>org.apache.hive:hive-service</include>
                  <include>org.apache.hive:hive-service-rpc</include>
                  <include>org.apache.hive:hive-metastore</include>
                  <include>org.apache.hive:hive-jdbc</include>

Can you attach the whole exception you are seeing? We had a compliance reason for not including org.json classes (due to licensing issues).

@vinothchandar
Member

@cdmikechen Please let us know the whole exception. If we can repro, we'd ideally like to fix it before 0.6.0 goes out.

@ruztbucket

Hi, I'm facing the same issue when trying to sync to Hive with hoodie.datasource.hive_sync.use_jdbc=false.

This is the complete stack trace:

20/12/01 11:47:07 INFO Client: Application report for application_1606814313006_0010 (state: RUNNING)
20/12/01 11:47:08 INFO Client: Application report for application_1606814313006_0010 (state: FINISHED)
20/12/01 11:47:08 INFO Client: 
	 client token: N/A
	 diagnostics: User class threw exception: java.lang.NoClassDefFoundError: org/json/JSONException
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
	at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
	at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
	at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
	at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
	at com.amazon.smt.wes.r2.SimpleJob$.main(SimpleJob.scala:54)
	at com.amazon.smt.wes.r2.SimpleJob.main(SimpleJob.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 58 more

	 ApplicationMaster host: ip-10-0-1-32.ec2.internal
	 ApplicationMaster RPC port: 35829
	 queue: default
	 start time: 1606822996408
	 final status: FAILED
	 tracking URL: http://ip-10-0-1-75.ec2.internal:20888/proxy/application_1606814313006_0010/
	 user: hadoop
20/12/01 11:47:08 ERROR Client: Application diagnostics message: User class threw exception: java.lang.NoClassDefFoundError: org/json/JSONException
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
	at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
	at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
	at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
	at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
	at com.amazon.smt.wes.r2.SimpleJob$.main(SimpleJob.scala:54)
	at com.amazon.smt.wes.r2.SimpleJob.main(SimpleJob.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 58 more

Exception in thread "main" org.apache.spark.SparkException: Application application_1606814313006_0010 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1529)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/12/01 11:47:08 INFO ShutdownHookManager: Shutdown hook called

@rakeshramakrishnan

I'm also facing the same issue as documented by @ruztbucket.
I'm using Hudi 0.6.0.

@kimberlyamandalu

I am also experiencing this error on Hudi 0.6.0, EMR 5.31.0.
I am setting the following property to false: hoodie.datasource.hive_sync.use_jdbc, but it does not fix anything. What do we need to set as our Hive URL when we have specified the Glue catalog as the metastore in both the hive-site and hive-spark-site configs?

I've tried referencing the json-1.8.jar found in /usr/lib/hive/lib/json-1.8.jar on my EMR server in my --jars parameter, but that does not fix the issue either.

@nsivabalan
Contributor

@bvaradar : Can you please follow up on this ticket when you can.

@bvaradar
Contributor

@kimberlyamandalu: Sorry for the delay. This is weird. Can you check if org/json/JSONException is present in /usr/lib/hive/lib/json-1.8.jar?
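A quick sketch of checking that from spark-shell on the node (jar path as above; nothing Hudi-specific, just scanning the jar's entries):

import java.util.jar.JarFile
import scala.collection.JavaConverters._

val jar = new JarFile("/usr/lib/hive/lib/json-1.8.jar")
val present = jar.entries().asScala.exists(_.getName == "org/json/JSONException.class")
println(s"org/json/JSONException present: $present")
jar.close()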

@nikspatel03

I'm also facing the same issue mentioned by @ruztbucket
I'm using EMR 5.31.0 - Hudi 0.6.0 - Hive 2.3.7

@nsivabalan nsivabalan added the priority:critical production down; pipelines stalled; Need help asap. label Feb 9, 2021
@nsivabalan
Contributor

@bvaradar: FYI, I have created a critical-priority JIRA for this: https://issues.apache.org/jira/browse/HUDI-1609. Please reduce the priority if you feel otherwise.

@nsivabalan nsivabalan changed the title How to disable Hive JDBC and enable metastore [HUDI-1609] How to disable Hive JDBC and enable metastore Feb 10, 2021
@nsivabalan
Contributor

@bvaradar: I could not reproduce with a local Docker setup. Do you have any pointers on how to go about triaging this? Also, I am running into some other issue locally, which I documented in https://issues.apache.org/jira/browse/HUDI-1609.

@nsivabalan
Contributor

@kimberlyamandalu: in the meantime, would you mind responding to Balaji's questions?

@nsivabalan
Contributor

Here is my understanding.
Hudi does not use a fat jar for hive-exec (as the fat jar pulls in a lot of unwanted jars) and relies on the jar being available in the environment. Hudi does not pull this jar explicitly into any of the bundles, but this flow has been working before without any issues.
@bvaradar @n3nash: Could it be some jar version mismatch? Any pointers would be appreciated. As I mentioned earlier, I am running into some issues locally and hence couldn't reproduce. Your help is much appreciated here.
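To make the "jars available in the environment" point concrete, here is a small probe sketch (class names taken from the stack traces in this thread); it can be run in the same JVM as the writer, e.g. spark-shell, to see what the classpath actually provides:

// Sketch only: check whether the environment supplies the classes the bundle
// expects but does not shade (hive-exec's Driver and the org.json classes it needs).
def onClasspath(name: String): Boolean =
  try { Class.forName(name); true } catch { case _: Throwable => false }

Seq(
  "org.apache.hadoop.hive.ql.Driver", // from hive-exec, used by the non-JDBC sync path
  "org.json.JSONException"            // not bundled by Hudi for licensing reasons
).foreach(c => println(s"$c -> ${onClasspath(c)}"))

If either prints false, the next thing to try would be putting the missing jar on the driver classpath (for example via --jars or spark.driver.extraClassPath), since the sync runs in the driver.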

@nsivabalan nsivabalan added priority:major degraded perf; unable to move forward; potential bugs and removed priority:critical production down; pipelines stalled; Need help asap. labels Feb 19, 2021
@rubenssoto

@nsivabalan I had a different error.
Hudi 0.7.0
EMR 6.2
Exception in thread "ForkJoinPool-1-worker-13" java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:318) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:484) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:401) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:384) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:374) at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:263) at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:181) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:136) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94) at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:355) at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4(HoodieSparkSqlWriter.scala:403) at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4$adapted(HoodieSparkSqlWriter.scala:399) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399) at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123) at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104) at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227) at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104) at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132) at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288) at hudiwriter.HudiWriter.createHudiTable(HudiWriter.scala:48) at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:46) at jobs.TableProcessor.start(TableProcessor.scala:86) at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at scala.util.Success.$anonfun$map$1(Try.scala:255) at scala.util.Success.map(Try.scala:213) at scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) Caused by: java.lang.ClassNotFoundException: org.apache.calcite.rel.type.RelDataTypeSystem at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 65 more

@rubenssoto

rubenssoto commented Feb 24, 2021

I gave it another try on:
EMR 5.32
Hudi 0.6.0

First ERROR:

21/02/24 22:27:25 WARN FileUtils: Error setting permissions of hdfs://ip-10-0-28-211.us-west-2.compute.internal:8020/user/spark/warehouse/raw_courier_api_hudi.db java.io.IOException: Unable to set permissions of hdfs://ip-10-0-28-211.us-west-2.compute.internal:8020/user/spark/warehouse/raw_courier_api_hudi.db at org.apache.hadoop.hive.shims.Hadoop23Shims.setFullFileStatus(Hadoop23Shims.java:758) at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:527) at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:201) at com.amazonaws.glue.catalog.util.MetastoreClientUtils.makeDirs(MetastoreClientUtils.java:43) at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.createDatabase(GlueMetastoreClientDelegate.java:236) at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.createDatabase(AWSCatalogMetastoreClient.java:272) at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:316) at org.apache.hadoop.hive.ql.exec.DDLTask.createDatabase(DDLTask.java:3895) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:271) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:121) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94) at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321) at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363) at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359) at scala.collection.mutable.HashSet.foreach(HashSet.scala:78) at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359) at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194) at 
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696) at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94) at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141) at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at hudiwriter.HudiWriter.createHudiTable(HudiWriter.scala:48) at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:38) at jobs.TableProcessor.start(TableProcessor.scala:77) at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply$mcV$sp(TableProcessorWrapper.scala:23) at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply(TableProcessorWrapper.scala:23) at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply(TableProcessorWrapper.scala:23) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) Caused by: java.lang.NumberFormatException: For input string: "testingforemptydefaultvalue" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:580) at java.lang.Integer.parseInt(Integer.java:615) at org.apache.hadoop.conf.Configuration.getInts(Configuration.java:1402) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:332) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:355) at org.apache.hadoop.fs.FsShell.init(FsShell.java:96) at org.apache.hadoop.fs.FsShell.run(FsShell.java:296) at org.apache.hadoop.hive.shims.HadoopShimsSecure.run(HadoopShimsSecure.java:377) at org.apache.hadoop.hive.shims.Hadoop23Shims.setFullFileStatus(Hadoop23Shims.java:729) ... 66 more

Second ERROR:

Exception in thread "ForkJoinPool-1-worker-2" java.lang.NoClassDefFoundError: org/json/JSONException
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
	at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
	at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
	at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
	at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
	at hudiwriter.HudiWriter.createHudiTable(HudiWriter.scala:48)
	at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:38)
	at jobs.TableProcessor.start(TableProcessor.scala:77)
	at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply$mcV$sp(TableProcessorWrapper.scala:23)
	at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply(TableProcessorWrapper.scala:23)
	at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply(TableProcessorWrapper.scala:23)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 64 more

@kimberlyamandalu

@kimberlyamandalu: Sorry for the delay. This is weird. Can you check whether org/json/JSONException is present in /usr/lib/hive/lib/json-1.8.jar?

@bvaradar Sorry for the delayed response. Yes, the JSONException class is present in this jar:
$ jar -xvf json-1.8.jar
created: META-INF/
inflated: META-INF/MANIFEST.MF
created: org/
created: org/json/
inflated: org/json/JSON.class
inflated: org/json/JSONArray.class
inflated: org/json/JSONException.class
inflated: org/json/JSONObject$1.class
inflated: org/json/JSONObject.class
inflated: org/json/JSONString.class
inflated: org/json/JSONStringer$Scope.class
inflated: org/json/JSONStringer.class
inflated: org/json/JSONTokener.class
created: META-INF/maven/
created: META-INF/maven/com.tdunning/
created: META-INF/maven/com.tdunning/json/
inflated: META-INF/maven/com.tdunning/json/pom.xml
inflated: META-INF/maven/com.tdunning/json/pom.properties

@ismailsimsek

This comment might help for the second error: #1751 (comment). The missing class there was:

java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem

and the dependencies listed were:

org.apache.calcite:calcite-core:1.16.0
org.apache.thrift:libfb303:0.9.3
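
One way to attach those jars is at submit time; a rough sketch (the jar paths below are placeholders, adjust them to wherever the artifacts live in your environment):

spark-submit \
  --jars /path/to/calcite-core-1.16.0.jar,/path/to/libfb303-0.9.3.jar \
  --class your.MainClass your-app.jar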

@ngonik

ngonik commented Apr 18, 2021

Hey, I'm having the same issue with JSONException on EMR as mentioned above. Is there any update on that? Anything I can help with to make it work? Thanks!

@ngonik

ngonik commented Apr 20, 2021

I was able to fix the JSONException error on EMR. I just needed to manually add the org.json package (https://mvnrepository.com/artifact/org.json/json) to both the executor and driver extraClassPath configs when deploying the cluster.
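
For anyone else hitting this, a rough sketch of that cluster-level configuration as an EMR spark-defaults classification (this assumes the jar sits at /usr/lib/hive/lib/json-1.8.jar, as mentioned earlier in this thread; on EMR the extraClassPath properties already carry a long default value, so append to it rather than replace it):

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "/usr/lib/hive/lib/json-1.8.jar:<existing default driver classpath>",
      "spark.executor.extraClassPath": "/usr/lib/hive/lib/json-1.8.jar:<existing default executor classpath>"
    }
  }
]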

@diogodilcl

Hudi version: 0.7.0
EMR: 6.2

Hi,

when I use:

"hoodie.datasource.hive_sync.use_jdbc": "false"

I get the following exception:

21/04/28 22:19:49 ERROR HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:406)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:384)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:374)
	at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:263)
	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:181)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:136)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:355)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4(HoodieSparkSqlWriter.scala:403)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4$adapted(HoodieSparkSqlWriter.scala:399)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
	at sun.reflect.GeneratedMethodAccessor222.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Socket Factory class not found: java.lang.ClassNotFoundException: Class testingforemptydefaultvalue not found
	at org.apache.hadoop.net.NetUtils.getSocketFactoryFromProperty(NetUtils.java:143)
	at org.apache.hadoop.net.NetUtils.getSocketFactory(NetUtils.java:98)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:309)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:290)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:171)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3358)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:483)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:234)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:583)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:548)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:528)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:394)
	... 51 more

Existing tables are updated, but for tables that need to be created I get the exception above.
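
For context, the write call looks roughly like this (shown in Scala as a minimal sketch; the table, field, and path names are illustrative placeholders, and the only option taken from this discussion is hoodie.datasource.hive_sync.use_jdbc):

import org.apache.spark.sql.SaveMode

// df is the DataFrame being written; all names below are placeholders.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "updated_at")
  .option("hoodie.datasource.write.partitionpath.field", "dt")
  .option("hoodie.datasource.hive_sync.enable", "true")
  // route hive sync through the Hive driver/metastore path instead of HiveServer2 JDBC
  .option("hoodie.datasource.hive_sync.use_jdbc", "false")
  .option("hoodie.datasource.hive_sync.database", "my_db")
  .option("hoodie.datasource.hive_sync.table", "my_table")
  .option("hoodie.datasource.hive_sync.partition_fields", "dt")
  .mode(SaveMode.Append)
  .save("s3://my-bucket/hudi/my_table")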

@n3nash
Contributor

n3nash commented Jun 3, 2021

@diogodilcl Are you able to reproduce this issue consistently? Could you share steps to reproduce it so we can find a resolution.

n3nash added the hive, meta-sync and awaiting-user-response labels and removed the hive label on Jun 4, 2021
@n3nash
Contributor

n3nash commented Jun 16, 2021

Closing this ticket due to inactivity. There is a PR open that will provide ways to disable JDBC.
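
(For readers finding this later: more recent Hudi releases expose the sync mechanism as an explicit option, so, as far as I understand, syncing through the Hive metastore client can be requested directly with something like:

  .option("hoodie.datasource.hive_sync.mode", "hms")  // use the Hive metastore client instead of JDBC

Check the Hudi version you are on for whether this option is available.)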
