[SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver #1398

Closed
eigakow opened this issue Mar 11, 2020 · 5 comments

@eigakow

eigakow commented Mar 11, 2020

Describe the problem you faced

Running DeltaStreamer with --enable-hive-sync fails with java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver.
Should I change something in the default build process to include this class?

To Reproduce

Steps to reproduce the behavior:

  1. Properties file:
hoodie.datasource.write.recordkey.field=ts
hoodie.datasource.write.partitionpath.field=ts
hoodie.deltastreamer.schemaprovider.source.schema.file=file:///home/director/me/hudi-0.5.1-incubating/schema.avro
hoodie.deltastreamer.schemaprovider.target.schema.file=file:///home/director/me/hudi-0.5.1-incubating/schema.avro
source-class=FR24JsonKafkaSource
bootstrap.servers=streaming-kafka-broker-1:9092,streaming-kafka-broker-2:9092,streaming-kafka-broker-3:9092
group.id=hudi_testing
hoodie.deltastreamer.source.kafka.topic=fr-bru
enable.auto.commit=false
schemaprovider-class=org.apache.hudi.utilities.schema.FilebasedSchemaProvider
auto.offset.reset=earliest

hoodie.datasource.hive_sync.database=fr24raw
hoodie.datasource.hive_sync.table=test_hudi
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://master-1.bigdatapoc.local:10000/default;principal=hive/master-1.bigdatapoc.local@BIGDATAPOC.LOCAL
hoodie.datasource.hive_sync.assume_date_partitioning=true
hoodie.datasource.hive_sync.useJdbc=false
  2. Launch spark-submit with HoodieDeltaStreamer
spark-submit --master yarn \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --jars $(pwd)/../my-app-1-jar-with-dependencies.jar \
  $(pwd)/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.5.1-incubating.jar \
  --props hdfs:///tmp/hudi-fr24.properties \
  --target-base-path adl://XXX.azuredatalakestore.net/test-hudi \
  --table-type MERGE_ON_READ \
  --target-table test_hudi \
  --source-class FR24JsonKafkaSource \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --enable-hive-sync --continuous --source-limit 100

Expected behavior

Sync to Hive works.

Environment Description

  • Hudi version : hudi-0.5.1-incubating

  • Spark version : 2.4.0-cdh6.1.0

  • Hive version : 2.1.1-cdh6.1.0

  • Hadoop version : 3.0.0-cdh6.1.0

  • Storage (HDFS/S3/GCS..) : ADLS

  • Running on Docker? (yes/no) : no

Stacktrace

20/03/11 16:04:47 INFO cluster.YarnScheduler: Removed TaskSet 37.0, whose tasks have all completed, from pool
20/03/11 16:04:47 INFO scheduler.DAGScheduler: ResultStage 37 (collect at HoodieMergeOnReadTableCompactor.java:208) finished in 0.679 s
20/03/11 16:04:47 INFO scheduler.DAGScheduler: Job 12 finished: collect at HoodieMergeOnReadTableCompactor.java:208, took 0.680344 s
20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total of 0 compactions are retrieved
20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of latest files slices 4
20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of log files 0
20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of file slices 4
20/03/11 16:04:47 WARN compact.HoodieMergeOnReadTableCompactor: After filtering, Nothing to compact for adl://ecintpocdl.azuredatalakestore.net/FlightRadar24/test-hudi3
20/03/11 16:04:47 INFO deltastreamer.DeltaSync: Syncing target hoodie table with hive table(test_hudi). Hive metastore URL :jdbc:hive2://master-1.bigdatapoc.local:10000/default;principal=hive/master-1.bigdatapoc.local@BIGDATAPOC.LOCAL, basePath :adl://XXX.azuredatalakestore.net/test-hudi
20/03/11 16:04:47 INFO deltastreamer.HoodieDeltaStreamer: Delta Sync shutdown. Error ?false
20/03/11 16:04:47 WARN deltastreamer.HoodieDeltaStreamer: Gracefully shutting down compactor
20/03/11 16:05:00 INFO deltastreamer.HoodieDeltaStreamer: Compactor shutting down properly!!
20/03/11 16:05:00 ERROR deltastreamer.AbstractDeltaStreamerService: Service shutdown with error
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
        at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:72)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:117)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:295)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver
        at org.apache.hudi.hive.HoodieHiveClient.<clinit>(HoodieHiveClient.java:80)
        at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:66)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:481)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:423)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:238)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:393)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 10 more
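
One way to check whether a given jar actually bundles the missing driver class is to list the jar's entries and look for the class file. The following is a minimal sketch, assuming `unzip` is installed; the helper names `class_to_entry` and `find_class_in_jars` are hypothetical and not part of Hudi or Hive:

```shell
# Convert a fully qualified class name into the entry path used inside a jar,
# e.g. org.apache.hive.jdbc.HiveDriver -> org/apache/hive/jdbc/HiveDriver.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar (from the remaining arguments) that contains the given class.
# Assumes `unzip` is available; a jar is a zip archive, so `unzip -Z1` lists
# one entry name per line.
find_class_in_jars() {
  entry="$(class_to_entry "$1")"
  shift
  for jar in "$@"; do
    if unzip -Z1 "$jar" 2>/dev/null | grep -qxF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For example, `find_class_in_jars org.apache.hive.jdbc.HiveDriver packaging/hudi-utilities-bundle/target/*.jar` would print nothing if none of those jars contains the driver.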
@lamberken
Member

@eigakow
Author

eigakow commented Mar 12, 2020

Yes, I have tried adding both hive-jdbc-2.1.1-cdh6.1.0.jar and hive-service-2.1.1-cdh6.1.0.jar and ended up with another missing class error, therefore I was questioning if that is the right path. Thank you for clarifying.
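
Extra jars like these are usually passed to Spark through the comma-separated `--jars` option of `spark-submit`. A minimal sketch of assembling that list follows; the `/path/to/hive/lib` directory and the `join_by_comma` helper are hypothetical placeholders, and the option is only printed, not executed:

```shell
# Join all arguments into one comma-separated string, the format --jars expects.
join_by_comma() {
  out=""
  for item in "$@"; do
    if [ -z "$out" ]; then
      out="$item"
    else
      out="$out,$item"
    fi
  done
  printf '%s\n' "$out"
}

# Hypothetical location of the Hive client jars; adjust to the local install.
HIVE_LIB="/path/to/hive/lib"

EXTRA_JARS="$(join_by_comma \
  "$HIVE_LIB/hive-jdbc-2.1.1-cdh6.1.0.jar" \
  "$HIVE_LIB/hive-service-2.1.1-cdh6.1.0.jar")"

# Dry run: show the option that would be added to the spark-submit call.
echo "--jars $EXTRA_JARS"
```

Any jar already passed with `--jars` would go into the same list, since the option expects a single comma-separated value.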

With both jars added, I now get Cannot find class 'org.apache.hudi.hadoop.HoodieParquetInputFormat' when Hudi tries to create a new table.

This class is present in the utilities bundle:

$ jar tf packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.5.1-incubating.jar | grep HoodieParquetInputFormat
org/apache/hudi/hadoop/HoodieParquetInputFormat.class

Stacktrace:

20/03/12 07:29:31 INFO hive.HiveSyncTool: Hive table test_hudi is not found. Creating it
20/03/12 07:29:31 INFO hive.HoodieHiveClient: Creating table with CREATE EXTERNAL TABLE  IF NOT EXISTS `fr24raw`.`test_hudi`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `flights` ARRAY< STRUCT< `fcode` : string, `finfo` : ARRAY< string>>>, `full_count` int, `ts` string, `version` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'adl://XXX.azuredatalakestore.net/test-hudi'
20/03/12 07:29:31 INFO hive.HoodieHiveClient: Executing SQL CREATE EXTERNAL TABLE  IF NOT EXISTS `fr24raw`.`test_hudi`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `flights` ARRAY< STRUCT< `fcode` : string, `finfo` : ARRAY< string>>>, `full_count` int, `ts` string, `version` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'adl://XXX.azuredatalakestore.net/test-hudi'
20/03/12 07:29:31 ERROR hive.HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL CREATE EXTERNAL TABLE  IF NOT EXISTS `fr24raw`.`test_hudi`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `flights` ARRAY< STRUCT< `fcode` : string, `finfo` : ARRAY< string>>>, `full_count` int, `ts` string, `version` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'adl://XXX.azuredatalakestore.net/test-hudi'
        at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:488)
        at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:276)
        at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:150)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:118)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:91)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:481)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:423)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:238)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:393)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:266)
        at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:252)
        at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:318)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:259)
        at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:486)
        ... 12 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
        at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:207)
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:266)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:504)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:490)
        at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
        at com.sun.proxy.$Proxy38.executeStatementAsync(Unknown Source)
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:295)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:507)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        ... 3 more
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Cannot find class 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
        at org.apache.hadoop.hive.ql.parse.ParseUtils.ensureClassExists(ParseUtils.java:263)
        at org.apache.hadoop.hive.ql.parse.StorageFormat.fillStorageFormat(StorageFormat.java:57)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:11666)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10839)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10949)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10639)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:600)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1414)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1391)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:205)
        ... 26 more
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.hive.ql.parse.ParseUtils.ensureClassExists(ParseUtils.java:261)
        ... 36 more
20/03/12 07:29:31 INFO hive.metastore: Closed a connection to metastore, current connections: 0

@lamberken
Member

Hi @eigakow, the Hive server parses the CREATE TABLE ... statement itself, so you also need to add that jar to the Hive server classpath.
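
One common way to put a jar on the HiveServer2 classpath is to place it in a directory referenced by Hive's `hive.aux.jars.path` setting and then restart the server. The sketch below covers only the copy step. The directory, the choice of jar, and the `stage_aux_jar` helper are assumptions for illustration, since the thread does not say which mechanism was used:

```shell
# Copy a jar into the Hive auxiliary jars directory, creating it if needed.
# Usage: stage_aux_jar <jar> <aux_dir>
stage_aux_jar() {
  jar_path="$1"
  aux_dir="$2"
  if [ ! -f "$jar_path" ]; then
    echo "jar not found: $jar_path" >&2
    return 1
  fi
  mkdir -p "$aux_dir" && cp "$jar_path" "$aux_dir/"
}

# Hypothetical invocation on the HiveServer2 host (both paths are placeholders):
# stage_aux_jar /path/to/jar-containing-HoodieParquetInputFormat.jar /path/to/hive/auxlib
```

The helper returns a non-zero status when the source jar is missing, so a deployment script can stop before restarting the server.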

@eigakow
Author

eigakow commented Mar 12, 2020

@lamber-ken Yes, that works, thank you!

@eigakow eigakow closed this as completed Mar 12, 2020
@Harish0411

I am facing an issue when trying to execute a jar file.
The exact error is:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONException
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:488)
at java.base/java.lang.Class.forName(Class.java:467)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:56)
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)

Please provide steps to solve this error.
